Debian Bug report logs - #561203
threads and fork on machine with VIPT-WB cache

Package: linux-2.6; Maintainer for linux-2.6 is Debian Kernel Team <debian-kernel@lists.debian.org>;

Reported by: dann frazier <dannf@debian.org>

Date: Tue, 15 Dec 2009 05:36:02 UTC

Severity: important

Tags: moreinfo, patch, upstream

Found in versions linux-2.6/2.6.33-1, linux-2.6/3.2.2-1

Done: Moritz Muehlenhoff <jmm@inutil.org>

Bug is archived. No further changes may be made.

Forwarded to http://thread.gmane.org/gmane.linux.ports.parisc/4267


Report forwarded to debian-bugs-dist@lists.debian.org, Debian Qt/KDE Maintainers <debian-qt-kde@lists.debian.org>:
Bug#561203; Package src:kde4libs. (Tue, 15 Dec 2009 05:36:05 GMT)

Acknowledgement sent to dann frazier <dannf@debian.org>:
New Bug report received and forwarded. Copy sent to Debian Qt/KDE Maintainers <debian-qt-kde@lists.debian.org>. (Tue, 15 Dec 2009 05:36:05 GMT)

Message #5 received at submit@bugs.debian.org:

From: dann frazier <dannf@debian.org>
To: submit@bugs.debian.org
Subject: FTBFS [hppa] - build hangs
Date: Mon, 14 Dec 2009 22:32:08 -0700
Source: kde4libs
Version: 4:4.3.4-1
Severity: serious
User: debian-hppa@lists.debian.org
Usertags: hppa

kde4libs hangs during build on hppa, but the hang location varies.

First attempt:
[...]
[ 48%] Generating org.kde.kded.xml
cd kded && /usr/bin/qdbuscpp2xml /build/buildd/kde4libs-4.3.4/kded/kdedadaptor.h > /build/buildd/kde4libs-4.3.4/obj-hppa-linux-gnu/kded/org.kde.kded.xml
qdbuscpp2xml: exit code 0 from moc. Aborting
QProcess: Destroyed while process is still running.
make[3]: *** Deleting file `kded/org.kde.kded.xml'
make[3]: *** wait: No child processes.  Stop.
make[3]: *** Waiting for unfinished jobs....
make[3]: *** wait: No child processes.  Stop.
make[2]: *** [kded/CMakeFiles/kdeinit_kded4.dir/all] Error 2
make[1]: *** [all] Terminated
make: *** [debian/stamp-makefile-build] Terminated
Build killed with signal 15 after 300 minutes of inactivity


Second attempt:
[...]
make[3]: Entering directory `/build/buildd/kde4libs-4.3.4/obj-hppa-linux-gnu'
cd kio && /usr/bin/automoc4 /build/buildd/kde4libs-4.3.4/obj-hppa-linux-gnu/kio/kio_automoc.cpp /build/buildd/kde4libs-4.3.4/kio /build/buildd/kde4libs-4.3.4/obj-hppa-linux-gnu/kio /usr/bin/moc-qt4 /usr/bin/cmake
Generating kdirwatch_p.moc
make[3]: *** wait: No child processes.  Stop.
make[3]: *** Waiting for unfinished jobs....
make[3]: *** wait: No child processes.  Stop.
make[2]: *** [kio/CMakeFiles/kio_automoc.dir/all] Error 2
make[1]: *** [all] Terminated
make: *** [debian/stamp-makefile-build] Terminated
Build killed with signal 15 after 300 minutes of inactivity


The third attempt is currently hung at:
[...]
make[3]: Entering directory `/build/buildd/kde4libs-4.3.4/obj-hppa-linux-gnu'
cd kdecore && /usr/bin/automoc4 /build/buildd/kde4libs-4.3.4/obj-hppa-linux-gnu/kdecore/kdecore_automoc.cpp /build/buildd/kde4libs-4.3.4/kdecore /build/buildd/kde4libs-4.3.4/obj-hppa-linux-gnu/kdecore /usr/bin/moc-qt4 /usr/bin/cmake
Generating klibrary.moc


The build system looks wedged: it isn't consuming any CPU.
The build-related processes are:

  buildd    4199   766  0 02:53 ?        00:00:00 /usr/bin/make -C obj-hppa-linux-gnu
  buildd    4212  4199  0 02:53 ?        00:00:00 /usr/bin/make -f CMakeFiles/Makefile2 all
  buildd    5739  4212  0 02:55 ?        00:00:00 /usr/bin/make -f kdecore/CMakeFiles/kdecore_automoc.dir/build.make kdecore/CMakeFiles/kdecore_automoc.dir/build
  buildd    5740  5739  0 02:55 ?        00:00:00 /bin/sh -c cd kdecore && /usr/bin/automoc4 /build/buildd/kde4libs-4.3.4/obj-hppa-linux-gnu/kdecore/kdecore_automoc.cpp /build/buildd/kde4libs-4.3.4/kdecore /build/buildd/kde4libs-4.3.4/obj-hppa-linux-gnu/kdecore /usr/bin/moc-qt4 /usr/bin/cmake
  buildd    5741  5740  0 02:55 ?        00:00:00 /usr/bin/automoc4 /build/buildd/kde4libs-4.3.4/obj-hppa-linux-gnu/kdecore/kdecore_automoc.cpp /build/buildd/kde4libs-4.3.4/kdecore /build/buildd/kde4libs-4.3.4/obj-hppa-linux-gnu/kdecore /usr/bin/moc-qt4 /usr/bin/cmake
  buildd    5755  5741  0 02:56 ?        00:00:00 [cmake] <defunct>


strace shows that automoc4 is stuck in a select call:

  Process 5741 attached - interrupt to quit
  _newselect(1024, [10], [], NULL, NULL
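As we read the Qt 4 Unix sources (an assumption; nothing in this report confirms it), QProcess learns that a child has exited through a SIGCHLD handler that writes a byte into a pipe, and waitForFinished() sleeps in select() on a pipe's read end with no timeout. That would fit automoc4 sitting in _newselect() on a single fd next to a <defunct> cmake. The sketch below shows the generic "self-pipe" pattern in isolation; it is not Qt's code, and wait_child_via_self_pipe() is a hypothetical name. If the wakeup byte were ever lost, the select() loop would block forever while the child stays a zombie.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Self-pipe: [0] is read by the waiter, [1] is written by the handler.
static int g_sigchld_pipe[2] = {-1, -1};

// Async-signal-safe SIGCHLD handler: only pokes the pipe. errno is
// preserved because write() may clobber it in the interrupted code.
static void on_sigchld(int) {
    int saved = errno;
    char byte = 0;
    ssize_t w = write(g_sigchld_pipe[1], &byte, 1);
    (void)w;
    errno = saved;
}

// Forks a child that exits with `code`, then waits for it only through the
// self-pipe. Returns the child's exit status, or -1 on error.
int wait_child_via_self_pipe(int code) {
    assert(code >= 0 && code <= 255);  // exit statuses are 8 bits
    if (pipe(g_sigchld_pipe) != 0) return -1;
    // Non-blocking write end, so a full pipe can never stall the handler.
    fcntl(g_sigchld_pipe[1], F_SETFL, O_NONBLOCK);

    struct sigaction sa = {};
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP;
    struct sigaction old = {};
    sigaction(SIGCHLD, &sa, &old);  // installed before fork(): no lost-exit race

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) _exit(code);  // child: exit immediately
    while (pid > 0) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(g_sigchld_pipe[0], &rfds);
        // No timeout: only a byte in the pipe can wake us up.
        int n = select(g_sigchld_pipe[0] + 1, &rfds, nullptr, nullptr, nullptr);
        if (n < 0) {
            if (errno == EINTR) continue;  // the handler interrupted select()
            break;
        }
        char byte;
        ssize_t got = read(g_sigchld_pipe[0], &byte, 1);  // drain one wakeup
        (void)got;
        int st = 0;
        if (waitpid(pid, &st, WNOHANG) == pid) {
            result = WIFEXITED(st) ? WEXITSTATUS(st) : -1;
            break;
        }
    }
    sigaction(SIGCHLD, &old, nullptr);
    close(g_sigchld_pipe[0]);
    close(g_sigchld_pipe[1]);
    return result;
}
```

Because the handler is installed before fork() and the byte stays in the pipe until read, an exit that happens before select() is called is still noticed; the pattern only hangs if the notification itself goes missing.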





Information forwarded to debian-bugs-dist@lists.debian.org, Debian Qt/KDE Maintainers <debian-qt-kde@lists.debian.org>:
Bug#561203; Package src:kde4libs. (Tue, 15 Dec 2009 16:15:05 GMT)

Acknowledgement sent to Modestas Vainius <modestas@vainius.eu>:
Extra info received and forwarded to list. Copy sent to Debian Qt/KDE Maintainers <debian-qt-kde@lists.debian.org>. (Tue, 15 Dec 2009 16:15:05 GMT)

Message #10 received at 561203@bugs.debian.org:

From: Modestas Vainius <modestas@vainius.eu>
To: dann frazier <dannf@debian.org>, 561203@bugs.debian.org
Subject: Re: Bug#561203: FTBFS [hppa] - build hangs
Date: Tue, 15 Dec 2009 18:12:45 +0200
[Message part 1 (text/plain, inline)]
Hello,

On Tuesday 15 December 2009 07:32:08 dann frazier wrote:
> Source: kde4libs
> Version: 4:4.3.4-1
> Severity: serious
> User: debian-hppa@lists.debian.org
> Usertags: hppa
> 
> kde4libs hangs during build on hppa, but the hang location varies.

Doesn't this evidence suggest that either c++ toolchain, hardware, kernel etc. 
is broken on hppa in general / on peri? automoc4 has not been rebuilt for half 
a year and numerous previous kde4libs versions have built successfully since 
then...


-- 
Modestas Vainius <modestas@vainius.eu>
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Qt/KDE Maintainers <debian-qt-kde@lists.debian.org>:
Bug#561203; Package src:kde4libs. (Tue, 15 Dec 2009 16:33:06 GMT)

Acknowledgement sent to dann frazier <dannf@dannf.org>:
Extra info received and forwarded to list. Copy sent to Debian Qt/KDE Maintainers <debian-qt-kde@lists.debian.org>. (Tue, 15 Dec 2009 16:33:06 GMT)

Message #15 received at 561203@bugs.debian.org:

From: dann frazier <dannf@dannf.org>
To: Modestas Vainius <modestas@vainius.eu>
Cc: 561203@bugs.debian.org
Subject: Re: Bug#561203: FTBFS [hppa] - build hangs
Date: Tue, 15 Dec 2009 09:29:58 -0700
On Tue, Dec 15, 2009 at 06:12:45PM +0200, Modestas Vainius wrote:
> Hello,
> 
> On Tuesday 15 December 2009 07:32:08 dann frazier wrote:
> > Source: kde4libs
> > Version: 4:4.3.4-1
> > Severity: serious
> > User: debian-hppa@lists.debian.org
> > Usertags: hppa
> > 
> > kde4libs hangs during build on hppa, but the hang location varies.
> 
> Doesn't this evidence suggest that either c++ toolchain, hardware, kernel etc. 
> is broken on hppa in general / on peri? automoc4 has not been rebuilt for half 
> a year and numerous previous kde4libs versions have built successfully since 
> then...

Yes, it certainly does. The hppa porters are looking into such issues,
using the above usertag as a todo list.

-- 
dann frazier





Information forwarded to debian-bugs-dist@lists.debian.org, Debian Qt/KDE Maintainers <debian-qt-kde@lists.debian.org>:
Bug#561203; Package src:kde4libs. (Tue, 22 Dec 2009 09:12:08 GMT)

Acknowledgement sent to Modestas Vainius <modestas@vainius.eu>:
Extra info received and forwarded to list. Copy sent to Debian Qt/KDE Maintainers <debian-qt-kde@lists.debian.org>. (Tue, 22 Dec 2009 09:12:08 GMT)

Message #20 received at 561203@bugs.debian.org:

From: Modestas Vainius <modestas@vainius.eu>
To: dann frazier <dannf@dannf.org>, 561203@bugs.debian.org
Subject: Re: Bug#561203: FTBFS [hppa] - build hangs
Date: Tue, 22 Dec 2009 11:06:25 +0200
[Message part 1 (text/plain, inline)]
Hello,

On Tuesday 15 December 2009 18:29:58 dann frazier wrote:
> > On Tuesday 15 December 2009 07:32:08 dann frazier wrote:
> > > Source: kde4libs
> > > Version: 4:4.3.4-1
> > > Severity: serious
> > > User: debian-hppa@lists.debian.org
> > > Usertags: hppa
> > >
> > > kde4libs hangs during build on hppa, but the hang location varies.
> >
> > Doesn't this evidence suggest that either c++ toolchain, hardware, kernel
> > etc. is broken on hppa in general / on peri? automoc4 has not been
> > rebuilt for half a year and numerous previous kde4libs versions have
> > built successfully since then...
> 
> Yes, it certainly does. The hppa porters are looking into such issues,
> using the above usertag as a todo list.

To reproduce:

$ cat minifail.cpp
#include <QtCore/QProcess>
#include <QtCore/QCoreApplication>

int main(int argc, char** argv) {
        QCoreApplication app(argc, argv);

        QProcess proc;
        proc.start("/usr/bin/cut", QStringList(), QIODevice::NotOpen);
        proc.waitForFinished(-1);
        return 0;
}
$ g++-4.4 -I/usr/include/qt4 -lQtCore minifail.cpp -o minifail
$ i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail; done
Run 1
Run 2
Run 3
Run 4
Run 5
Run 6
Run 7
Run 8
Run 9
Run 10

$ ps aux | grep ...
modax    11058  0.0  0.0  19728  3020 pts/1    Sl+  08:45   0:00 ./minifail
modax    11060  0.0  0.0      0     0 pts/1    Z+   08:45   0:00 [cut] <defunct>

$ gdb -p 11058
(gdb) bt
#0  0x4082fe30 in select () from /lib/libc.so.6
#1  0x405d9644 in qt_native_select (fdread=0xfb091410, fdwrite=0xfb091490, timeout=<value optimized out>) at io/qprocess_unix.cpp:936
#2  0x405dc3ac in QProcessPrivate::waitForFinished (this=0x27f80, msecs=-1) at io/qprocess_unix.cpp:1158
#3  0x4058b6f8 in QProcess::waitForFinished (this=0xfb0912dc, msecs=-1) at io/qprocess.cpp:1318
#4  0x00010f24 in main ()
(gdb)


Frame #1 is at 
http://qt.gitorious.org/qt/qt/blobs/4.5/src/corelib/io/qprocess_unix.cpp#line1158

Sometimes minifail crashes: 

$ i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail; done; 
Run 1
Run 2
Run 3                                                                              
Run 4                                                                              
Run 5                                                                              
Run 6                                                                              
Run 7                                                                              
Run 8                                                                              
Run 9                                                                              
Run 10                                                                             
Run 11
Run 12
Run 13
Run 14
Segmentation fault
Run 15
Run 16
Segmentation fault
Run 17
Run 18
Run 19
Run 20
Run 21
Run 22
Run 23
Run 24
Run 25
Run 26
Run 27
Run 28
Run 29
Run 30
Run 31
Run 32
Run 33
Run 34
Run 35

Repeatedly running it under gdb, I see a couple of SIGSEGVs:

$ gdb ./minifail
GNU gdb (GDB) 7.0-debian
Reading symbols from /home/modax/minifail...(no debugging symbols found)...done.
(gdb) r                                                                         
Starting program: /home/modax/minifail                                          
[Thread debugging using libthread_db enabled]                                   
[New Thread 0x41b5c480 (LWP 15889)]                                             
[Thread 0x41b5c480 (LWP 15889) exited]                                          

Program exited normally.
(gdb)                   
(gdb) r                              
Starting program: /home/modax/minifail 
[Thread debugging using libthread_db enabled]
[New Thread 0x41b5c480 (LWP 15892)]          
[Thread 0x41b5c480 (LWP 15892) exited]       

Program exited normally.
(gdb)                   
(gdb) r                 
Starting program: /home/modax/minifail 
[Thread debugging using libthread_db enabled]
[New Thread 0x41b5c480 (LWP 15895)]          
[Thread 0x41b5c480 (LWP 15895) exited]       

Program exited normally.
(gdb) r                 
Starting program: /home/modax/minifail 
[Thread debugging using libthread_db enabled]
[New Thread 0x41b5c480 (LWP 15909)]          
[Thread 0x41b5c480 (LWP 15909) exited]       

Program exited normally.
(gdb) r                 
Starting program: /home/modax/minifail 
[Thread debugging using libthread_db enabled]
[New Thread 0x41b5c480 (LWP 15912)]          
[Thread 0x41b5c480 (LWP 15912) exited]       

Program exited normally.
(gdb) r                 
Starting program: /home/modax/minifail 
[Thread debugging using libthread_db enabled]
[New Thread 0x41b5c480 (LWP 15915)]          

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x41b5c480 (LWP 15915)]        
0x00000000 in ?? ()                                 
(gdb) bt                                            
#0  0x00000000 in ?? ()                             
#1  0x00000000 in ?? ()                             
(gdb) r                                             
The program being debugged has been started already.
Start it from the beginning? (y or n) n             
Program not restarted.                              
(gdb) kill                                          
Kill the program being debugged? (y or n) y         
(gdb) r                                             
Starting program: /home/modax/minifail              
[Thread debugging using libthread_db enabled]       
[New Thread 0x41b5c480 (LWP 15943)]                 
[Thread 0x41b5c480 (LWP 15943) exited]              

Program exited normally.
(gdb) r                 
Starting program: /home/modax/minifail 
[Thread debugging using libthread_db enabled]
[New Thread 0x41b5c480 (LWP 16100)]          
[Switching to Thread 0x41b5c480 (LWP 16100)] 
0x40f676d0 in start_thread () from /lib/libpthread.so.0
ptrace: No such process.                               
(gdb) bt                                               
#0  0x40f676d0 in start_thread () from /lib/libpthread.so.0
#1  0x40f676d0 in start_thread () from /lib/libpthread.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) r                                                                   
The program being debugged has been started already.                      
Start it from the beginning? (y or n) y                                   
[Thread 0x41b5c480 (LWP 16100) exited]                                    
[Thread 0x400040c0 (LWP 16001) exited]                                    
Starting program: /home/modax/minifail                                    
[Thread debugging using libthread_db enabled]                             
[New Thread 0x41b5c480 (LWP 16230)]                                       
[Thread 0x41b5c480 (LWP 16230) exited]                                    

Program exited normally.
(gdb) r                 
Starting program: /home/modax/minifail 
[Thread debugging using libthread_db enabled]
[New Thread 0x41b5c480 (LWP 16233)]          
[Thread 0x41b5c480 (LWP 16233) exited]       

Program exited normally.
(gdb) r
Starting program: /home/modax/minifail
[Thread debugging using libthread_db enabled]
[New Thread 0x41b5c480 (LWP 16236)]
[Thread 0x41b5c480 (LWP 16236) exited]

Program exited normally.
(gdb) r
Starting program: /home/modax/minifail
[Thread debugging using libthread_db enabled]
[New Thread 0x41b5c480 (LWP 16239)]
[Thread 0x41b5c480 (LWP 16239) exited]

Program exited normally.
(gdb) r
Starting program: /home/modax/minifail
[Thread debugging using libthread_db enabled]
[New Thread 0x41b5c480 (LWP 16242)]
[Switching to Thread 0x41b5c480 (LWP 16242)]
0x40f676d0 in start_thread () from /lib/libpthread.so.0
ptrace: No such process.
(gdb) bt
#0  0x40f676d0 in start_thread () from /lib/libpthread.so.0
#1  0x40f676d0 in start_thread () from /lib/libpthread.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)


The bug is NOT reproducible when minifail is run under strace.

-- 
Modestas Vainius <modestas@vainius.eu>
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Qt/KDE Maintainers <debian-qt-kde@lists.debian.org>:
Bug#561203; Package src:kde4libs. (Tue, 22 Dec 2009 19:57:04 GMT)

Acknowledgement sent to Modestas Vainius <modestas@vainius.eu>:
Extra info received and forwarded to list. Copy sent to Debian Qt/KDE Maintainers <debian-qt-kde@lists.debian.org>. (Tue, 22 Dec 2009 19:57:04 GMT)

Message #25 received at 561203@bugs.debian.org:

From: Modestas Vainius <modestas@vainius.eu>
To: 561203@bugs.debian.org
Cc: dann frazier <dannf@dannf.org>, debian-hppa@lists.debian.org
Subject: Bug#561203: FTBFS [hppa] - pthread_create() (QThread) + fork() = crash
Date: Tue, 22 Dec 2009 21:54:16 +0200
[Message part 1 (text/plain, inline)]
retitle 561203 FTBFS [hppa] - pthread_create() (or QThread) + fork() = crash
reassign 561203 libc6 2.10.2-2
affects 561203 kde4libs
tags 561203 help
thanks

Hello,

when investigating this issue further, I determined that fork() following 
pthread_create() sometimes makes the application crash. In order to reproduce, 
build attached minifail.cpp with:

$ g++ -I/usr/include/qt4 -lQtCore minifail.cpp -o minifail -O0 -g

(pipe()/read()/write() are only used to sync parent with child after fork(), 
they are irrelevant for the problem).
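The attachment itself is not reproduced in this log. Purely to illustrate the pattern described (pthread_create() followed by fork(), with a pipe used only so the parent waits for the child), a hypothetical reconstruction might look like the following. pure_test_sketch() and thread_body() are invented names, and the real minifail.cpp may differ in detail. On a working system it is expected to print "Thread OK." and "Child OK." in either order, which matches the varying order in the runs below.

```cpp
#include <cassert>
#include <cstdio>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical thread body: just reports that the thread ran.
static void* thread_body(void*) {
    std::puts("Thread OK.");
    return nullptr;
}

// Hypothetical pure_test()-style check. Returns 0 if the child reported back.
int pure_test_sketch() {
    int sync_fds[2];
    if (pipe(sync_fds) != 0) return 1;

    pthread_t th;
    if (pthread_create(&th, nullptr, thread_body, nullptr) != 0) {
        close(sync_fds[0]);
        close(sync_fds[1]);
        return 1;
    }

    // fork() while the new thread may still be starting or running.
    pid_t pid = fork();
    if (pid == 0) {
        // Child: only the forking thread exists here, so use only
        // async-signal-safe calls and leave with _exit().
        close(sync_fds[0]);
        char ok = 'k';
        _exit(write(sync_fds[1], &ok, 1) == 1 ? 0 : 1);
    }

    close(sync_fds[1]);  // so read() sees EOF if the child dies early
    bool child_ok = false;
    if (pid > 0) {
        char c = 0;
        child_ok = read(sync_fds[0], &c, 1) == 1 && c == 'k';
        int status = 0;
        child_ok = waitpid(pid, &status, 0) == pid && child_ok &&
                   WIFEXITED(status) && WEXITSTATUS(status) == 0;
        if (child_ok) std::puts("Child OK.");
    }
    close(sync_fds[0]);
    pthread_join(th, nullptr);
    return child_ok ? 0 : 1;
}
```

The pipe carries no information beyond "the child got this far", which is consistent with the note above that pipe()/read()/write() are incidental to the failure.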

------------------------------------

When repeatedly running it as `minifail` (pure_test() mode), I get:

$ i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail; done; 
Run 1                                                                              
Child OK.                                                                          
Thread OK.                                                                         
Run 2                                                                              
Thread OK.                                                                         
Child OK.                                                                          
Run 3                                                                              
Thread OK.                                                                         
Segmentation fault                                                                 
Run 4                                                                              
Thread OK.                                                                         
Segmentation fault                                                                 
Run 5                                                                              
Thread OK.                                                                         
Segmentation fault                                                                 
Run 6                                                                              
Child OK.                                                                          
Thread OK.                                                                         
Run 7                                                                              
Child OK.                                                                          
Thread OK.                                                                         
Run 8                                                                              
Child OK.                                                                          
Thread OK.                                                                         
Run 9                                                                              
Child OK.                                                                          
Thread OK.                                                                         
Run 10                                                                             
Child OK.                                                                          
Thread OK.                                                                         
Run 11                                                                             
Child OK.                                                                          
Thread OK.                                                                         
Run 12                                                                             
Thread OK.                                                                         
Child OK.                                                                          
Run 13                                                                             
Child OK.                                                                          
Thread OK.                                                                         
Run 14                                                                             
Thread OK.                                                                         
Child OK.                                                                          
Run 15                                                                             
Thread OK.                                                                         
Child OK.                                                                          
Run 16                                                                             
Child OK.                                                                          
Thread OK.                                                                         
Run 17                                                                             
Child OK.                                                                          
Thread OK.                                                                         
Run 18                                                                             
Thread OK.                                                                         
Child OK.                                                                          
Run 19                                                                             
Thread OK.                                                                         
Child OK.                                                                          
Run 20                                                                             
Segmentation fault                                                                 
Run 21                                                                             
Segmentation fault                                                                 
Run 22                                                                             
Child OK.                                                                          
Thread OK.                                                                         
Run 23                                                                             
Thread OK.
Segmentation fault
Run 24
Thread OK.
Child OK.
Run 25
Thread OK.
Child OK.
Run 26
Thread OK.
Segmentation fault
Run 27
Child OK.
Thread OK.
Run 28
Child OK.
Thread OK.
Run 29
Child OK.
Thread OK.
Run 30
Child OK.
Thread OK.
Run 31
Child OK.
Thread OK.

The hang, which is the original problem behind this FTBFS, can be reproduced with
`./minifail qt` (qt_test() mode, which uses QThread + fork()). QThread
internally uses pthreads, but unfortunately I was not able to reproduce the
hang with pure pthread_* calls.

$ i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail qt; done;
Run 1
Child OK.
Thread OK.
Run 2
Child OK.
Thread OK.
Run 3
Child OK.
Thread OK.
Run 4
Segmentation fault
Run 5
Child OK.
Thread OK.
Run 6
Child OK.
Thread OK.
Run 7
Child OK.

When attaching gdb to the hung minifail process, I get:

0x40f6db90 in __pthread_cond_wait_internal () from /lib/libpthread.so.0                                      
(gdb) thread apply all bt                                                                                    

Thread 2 (Thread 0x41b5c480 (LWP 28185)):
#0  0x40838214 in clone () from /lib/libc.so.6
#1  0x00000000 in ?? ()                       

Thread 1 (Thread 0x400040c0 (LWP 28184)):
#0  0x40f6db90 in __pthread_cond_wait_internal () from /lib/libpthread.so.0
#1  0x40f6e278 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x40847e88 in pthread_cond_wait () from /lib/libc.so.6                   
#3  0x404e5a68 in QWaitConditionPrivate::wait (this=0x27fd8, mutex=0x27fc4, time=4294967295) at thread/qwaitcondition_unix.cpp:87
#4  QWaitCondition::wait (this=0x27fd8, mutex=0x27fc4, time=4294967295) at thread/qwaitcondition_unix.cpp:159
#5  0x404e4128 in QThread::wait (this=<value optimized out>, time=4294967295) at thread/qthread_unix.cpp:484
#6  0x000110f0 in qt_test (argc=2, argv=0xfaf8e020) at minifail.cpp:39
#7  0x00011328 in main (argc=2, argv=0xfaf8e020) at minifail.cpp:77
(gdb) t 2
[Switching to thread 2 (Thread 0x41b5c480 (LWP 28185))]#0  0x40838214 in clone () from /lib/libc.so.6
(gdb) x/10i 0x40838214                                                                                                           
0x40838214 <clone+24>:  stw,ma r26,40(r25)                                                                                       
0x40838218 <clone+28>:  stw r23,-3c(r25)                                                                                         
0x4083821c <clone+32>:  stw r24,-38(r25)                                                                                         
0x40838220 <clone+36>:  copy r24,r26                                                                                             
0x40838224 <clone+40>:  ldw -74(sp),r24                                                                                          
0x40838228 <clone+44>:  ldw -78(sp),r23                                                                                          
0x4083822c <clone+48>:  ldw -7c(sp),r22                                                                                          
0x40838230 <clone+52>:  copy r19,r4                                                                                              
0x40838234 <clone+56>:  be,l 100(sr2,r0),sr0,r31                                                                                 
0x40838238 <clone+60>:  ldi 78,r20                                                                                               
(gdb) x/20i clone
0x408381fc <clone>:     stw rp,-14(sp)
0x40838200 <clone+4>:   stw,ma r4,40(sp)
0x40838204 <clone+8>:   stw sp,-4(sp)
0x40838208 <clone+12>:  stw r19,-20(sp)
0x4083820c <clone+16>:  cmpib,=,n 0,r26,0x40838270 <clone+116>
0x40838210 <clone+20>:  cmpib,=,n 0,r25,0x40838270 <clone+116>
0x40838214 <clone+24>:  stw,ma r26,40(r25)
0x40838218 <clone+28>:  stw r23,-3c(r25)
0x4083821c <clone+32>:  stw r24,-38(r25)
0x40838220 <clone+36>:  copy r24,r26
0x40838224 <clone+40>:  ldw -74(sp),r24
0x40838228 <clone+44>:  ldw -78(sp),r23
0x4083822c <clone+48>:  ldw -7c(sp),r22
0x40838230 <clone+52>:  copy r19,r4
0x40838234 <clone+56>:  be,l 100(sr2,r0),sr0,r31
0x40838238 <clone+60>:  ldi 78,r20
0x4083823c <clone+64>:  ldi -1000,r1
0x40838240 <clone+68>:  cmpclr,>>= r1,ret0,r0
0x40838244 <clone+72>:  b,l,n 0x4083825c <clone+96>,r0
0x40838248 <clone+76>:  copy r4,r19
(gdb)

This means that thread 2 was not started at all and hung at clone().
Relevant QThread code at 
http://qt.gitorious.org/qt/qt/blobs/4.5/src/corelib/thread/qthread_unix.cpp
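For readers following the disassembly: the listing is glibc's clone() wrapper, which takes (fn, child_stack, flags, arg). Reading it with the usual PA-RISC argument registers (r26 first, r25 second), it rejects a NULL fn or stack and then stores fn onto the new stack. The clone(2) manual notes that stacks grow downward on every Linux processor except HP PA, where the stack argument is the low end of the buffer. A minimal sketch of calling the wrapper directly, under those assumptions (run_clone_child() and child_fn() are hypothetical names):

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // clone() is a GNU extension (g++ usually defines this)
#endif
#include <cassert>
#include <cstdlib>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs on the new stack; its return value becomes the child's exit status.
static int child_fn(void* arg) {
    return *static_cast<int*>(arg);
}

// Starts child_fn in a new process via glibc's clone() wrapper and returns
// its exit status, or -1 on failure.
int run_clone_child(int code) {
    assert(code >= 0 && code <= 255);  // exit statuses are 8 bits
    const size_t kStackSize = 64 * 1024;
    char* stack = static_cast<char*>(std::malloc(kStackSize));
    if (stack == nullptr) return -1;
#if defined(__hppa__)
    void* stack_arg = stack;               // PA-RISC: stack grows upward
#else
    void* stack_arg = stack + kStackSize;  // elsewhere: stack grows downward
#endif
    // No CLONE_VM: the child works on its own copy of memory, so &code and
    // the malloc()ed stack stay valid for it. SIGCHLD as the exit signal
    // lets the parent reap it with an ordinary waitpid().
    pid_t pid = clone(child_fn, stack_arg, SIGCHLD, &code);
    int result = -1;
    if (pid > 0) {
        int st = 0;
        if (waitpid(pid, &st, 0) == pid && WIFEXITED(st)) {
            result = WEXITSTATUS(st);
        }
    }
    std::free(stack);
    return result;
}
```

pthread_create() goes through the same kind of clone() call but with thread flags such as CLONE_VM and CLONE_THREAD, which is why a thread that never gets past clone() shows up as a second LWP parked in that function.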

------------------------------------

I strongly believe that if you fix the first problem, the 2nd one will 
disappear too as their origin is the same. Both tests work just fine on amd64.

Contrary to what I said previously, the bug is reproducible under strace -f, 
but you have to wait much longer (up to 10000th run).
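Because the failure appears only once in many runs, a driver that repeats the test and classifies each outcome can make comparisons between kernels or libc versions less tedious. The following is a hypothetical C++ counterpart of the shell loops used in this report, assuming a hang can be bounded with alarm(); stress() and StressResult are invented names. Each run happens in a freshly forked child, so a crash or hang in one run cannot contaminate the next.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Outcome counts for a batch of runs.
struct StressResult {
    int ok = 0;       // child exited with status 0
    int failed = 0;   // non-zero exit status, or fork()/waitpid() failed
    int crashed = 0;  // killed by a signal other than SIGALRM (e.g. SIGSEGV)
    int hung = 0;     // killed by SIGALRM after timeout_sec seconds
};

// Runs test() `runs` times, each in its own child, and classifies how each
// child ended. Assumes test() does not use SIGALRM or alarm() itself.
StressResult stress(int (*test)(), int runs, unsigned timeout_sec) {
    assert(runs > 0 && timeout_sec > 0);
    StressResult r;
    for (int i = 0; i < runs; ++i) {
        pid_t pid = fork();
        if (pid < 0) {
            ++r.failed;
            continue;
        }
        if (pid == 0) {
            // Child: make SIGALRM deliverable and fatal, then arm the timer.
            signal(SIGALRM, SIG_DFL);
            sigset_t set;
            sigemptyset(&set);
            sigaddset(&set, SIGALRM);
            sigprocmask(SIG_UNBLOCK, &set, nullptr);
            alarm(timeout_sec);
            _exit(test() == 0 ? 0 : 1);
        }
        int st = 0;
        if (waitpid(pid, &st, 0) != pid) {
            ++r.failed;
        } else if (WIFEXITED(st)) {
            if (WEXITSTATUS(st) == 0) {
                ++r.ok;
            } else {
                ++r.failed;
            }
        } else if (WIFSIGNALED(st) && WTERMSIG(st) == SIGALRM) {
            ++r.hung;
        } else {
            ++r.crashed;
        }
    }
    return r;
}
```

Any function with the signature int() that returns 0 on success can be passed in; a call such as stress(my_test, 10000, 60), with my_test as a placeholder, would cover the "up to the 10000th run" range mentioned above.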

ii  libc6                         2.10.2-2                      GNU C Library: Shared libraries

-- 
Modestas Vainius <modestas@vainius.eu>
[minifail.cpp (text/x-c++src, attachment)]
[signature.asc (application/pgp-signature, inline)]

Changed Bug title to 'FTBFS [hppa] - pthread_create() (or QThread) + fork() = crash' from 'FTBFS [hppa] - build hangs' Request was from Modestas Vainius <modestas@vainius.eu> to control@bugs.debian.org. (Tue, 22 Dec 2009 19:57:06 GMT)

Bug reassigned from package 'src:kde4libs' to 'libc6'. Request was from Modestas Vainius <modestas@vainius.eu> to control@bugs.debian.org. (Tue, 22 Dec 2009 19:57:07 GMT)

Bug No longer marked as found in versions kde4libs/4:4.3.4-1. Request was from Modestas Vainius <modestas@vainius.eu> to control@bugs.debian.org. (Tue, 22 Dec 2009 19:57:07 GMT)

Bug Marked as found in versions eglibc/2.10.2-2. Request was from Modestas Vainius <modestas@vainius.eu> to control@bugs.debian.org. (Tue, 22 Dec 2009 19:57:08 GMT)

Added indication that 561203 affects kde4libs Request was from Modestas Vainius <modestas@vainius.eu> to control@bugs.debian.org. (Tue, 22 Dec 2009 19:57:09 GMT)

Added tag(s) help. Request was from Modestas Vainius <modestas@vainius.eu> to control@bugs.debian.org. (Tue, 22 Dec 2009 19:57:10 GMT)

Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Tue, 22 Dec 2009 20:45:08 GMT)

Acknowledgement sent to "Carlos O'Donell" <carlos@systemhalted.org>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Tue, 22 Dec 2009 20:45:08 GMT)

Message #42 received at 561203@bugs.debian.org:

From: "Carlos O'Donell" <carlos@systemhalted.org>
To: Modestas Vainius <modestas@vainius.eu>
Cc: 561203@bugs.debian.org, dann frazier <dannf@dannf.org>, debian-hppa@lists.debian.org
Subject: Re: Bug#561203: FTBFS [hppa] - pthread_create() (QThread) + fork() = crash
Date: Tue, 22 Dec 2009 15:41:40 -0500
On Tue, Dec 22, 2009 at 2:54 PM, Modestas Vainius <modestas@vainius.eu> wrote:
> when investigating this issue further, I determined that fork() following
> pthread_create() sometimes makes the application crash. In order to reproduce,
> build attached minifail.cpp with:
>
> $ g++ -I/usr/include/qt4 -lQtCore minifail.cpp -o minifail -O0 -g

Thank you for the test case.

> This means that thread 2 was not started at all and hung at clone().
> Relevant QThread code at
> http://qt.gitorious.org/qt/qt/blobs/4.5/src/corelib/thread/qthread_unix.cpp

I don't believe that it would be stuck at the store instruction which
is pointed to by the PC (iaoqh).

However, I do believe that something is wrong and I am going to
investigate the kde4libs failure tonight.

> I strongly believe that if you fix the first problem, the 2nd one will
> disappear too as their origin is the same. Both tests work just fine on amd64.
>
> Contrary to what I said previously, the bug is reproducible under strace -f,
> but you have to wait much longer (up to 10000th run).
>
> ii  libc6                         2.10.2-2                      GNU C Library: Shared libraries

Thanks. I'll look into this tonight.

Cheers,
Carlos.




Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Tue, 22 Dec 2009 23:24:02 GMT)

Acknowledgement sent to Helge Deller <deller@gmx.de>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Tue, 22 Dec 2009 23:24:02 GMT)

Message #47 received at 561203@bugs.debian.org:

From: Helge Deller <deller@gmx.de>
To: Modestas Vainius <modestas@vainius.eu>
Cc: 561203@bugs.debian.org, dann frazier <dannf@dannf.org>, debian-hppa@lists.debian.org
Subject: Re: Bug#561203: FTBFS [hppa] - pthread_create() (QThread) + fork() = crash
Date: Wed, 23 Dec 2009 00:23:07 +0100
On 12/22/2009 08:54 PM, Modestas Vainius wrote:
> when investigating this issue further, I determined that fork() following
> pthread_create() sometimes makes the application crash. In order to reproduce,
> build attached minifail.cpp with:
>
> $ g++ -I/usr/include/qt4 -lQtCore minifail.cpp -o minifail -O0 -g
>
> (pipe()/read()/write() are only used to sync parent with child after fork(),
> they are irrelevant for the problem).
> ------------------------------------

Thanks!

Good testcase!
I could verify all problems you reported (segfaults and hangs).
Kernel 2.6.33-rc1-32bit, UP-machine, c3000.

Helge




Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Sun, 27 Dec 2009 14:48:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Luk Claes <luk@debian.org>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Sun, 27 Dec 2009 14:48:03 GMT) Full text and rfc822 format available.

Message #52 received at 561203@bugs.debian.org (full text, mbox):

From: Luk Claes <luk@debian.org>
To: debian-hppa@lists.debian.org
Cc: 561203@bugs.debian.org
Subject: Re: Re: Bug#561203: FTBFS [hppa] - pthread_create() (QThread) + fork() = crash
Date: Sun, 27 Dec 2009 15:47:17 +0100
Hi

What's the status of this bug? It's holding the KDE transition which is
blocking the Xorg and python transitions...

Cheers

Luk




Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Sun, 27 Dec 2009 15:42:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Carlos O'Donell" <carlos@systemhalted.org>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Sun, 27 Dec 2009 15:42:03 GMT) Full text and rfc822 format available.

Message #57 received at 561203@bugs.debian.org (full text, mbox):

From: "Carlos O'Donell" <carlos@systemhalted.org>
To: Luk Claes <luk@debian.org>
Cc: debian-hppa@lists.debian.org, 561203@bugs.debian.org
Subject: Re: Re: Bug#561203: FTBFS [hppa] - pthread_create() (QThread) + fork() = crash
Date: Sun, 27 Dec 2009 10:38:45 -0500
On Sun, Dec 27, 2009 at 9:47 AM, Luk Claes <luk@debian.org> wrote:
> What's the status of this bug? It's holding the KDE transition which is
> blocking the Xorg and python transitions...

I'm working on this bug. The current status is "under investigation."
I don't have a good idea of what is going on or why it's crashing.

Cheers,
Carlos.




Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Sun, 27 Dec 2009 15:57:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Helge Deller <deller@gmx.de>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Sun, 27 Dec 2009 15:57:05 GMT) Full text and rfc822 format available.

Message #62 received at 561203@bugs.debian.org (full text, mbox):

From: Helge Deller <deller@gmx.de>
To: Carlos O'Donell <carlos@systemhalted.org>
Cc: Luk Claes <luk@debian.org>, debian-hppa@lists.debian.org, 561203@bugs.debian.org
Subject: Re: Bug#561203: FTBFS [hppa] - pthread_create() (QThread) + fork() = crash
Date: Sun, 27 Dec 2009 16:54:41 +0100
On 12/27/2009 04:38 PM, Carlos O'Donell wrote:
> On Sun, Dec 27, 2009 at 9:47 AM, Luk Claes<luk@debian.org>  wrote:
>> What's the status of this bug? It's holding the KDE transition which is
>> blocking the Xorg and python transitions...
>
> I'm working on this bug. The current status is "under investigation."
> I don't have a good idea of what is going on or why it's crashing.

I could reproduce this bug as well and will continue to debug as soon
as I return from Christmas family visits.


My current analysis/assumption is:

I assume that the NPTL userspace implementation is correct. In that case,
the only difference I see is in how the clone() syscall is called from
pthread_create() and from fork(). fork() always worked, while pthread_create() sometimes failed.

pthread_create() uses
clone(child_stack=0x4088d040,
  flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
  parent_tidptr=0x4108c4e8, tls=0x4108c900, child_tidptr=0x4108c4e8),

while fork() uses
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40002028)

So, maybe the kernel implementation of clone() misses
some cache flush instructions for the newly created child
in the pthread_create() case... ?
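The pattern being compared here can be exercised with a small C sketch. pthread_create() issues a clone() with CLONE_VM|CLONE_THREAD|..., and fork() then issues a clone() without CLONE_VM while the new thread is still running. This is not minifail.cpp, which is not reproduced in this log. The function names (heap_churn, thread_then_fork, count_failures) and the iteration counts are hypothetical. Like minifail, it uses a pipe only to let the child report back to the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Worker thread: churn the heap so that shared data (malloc's free
   lists) is being modified while the main thread calls fork(). */
static void *heap_churn(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        void *p = malloc(32 + (size_t)(i % 64));
        free(p);
    }
    return NULL;
}

/* One round: pthread_create() followed by fork() while the thread may
   still be running. Returns 0 if the child ran and reported back. */
static int thread_then_fork(void)
{
    int fds[2];
    pthread_t tid;

    if (pipe(fds) != 0)
        return -1;
    if (pthread_create(&tid, NULL, heap_churn, NULL) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: use the inherited heap, then tell the parent we got here.
           glibc reinitialises malloc's locks in the child of fork(). */
        close(fds[0]);
        free(malloc(128));
        char ok = 'k';
        _exit(write(fds[1], &ok, 1) == 1 ? 0 : 1);
    }

    /* Parent: close our write end so read() sees EOF if the child dies. */
    close(fds[1]);
    int rc = -1;
    if (pid > 0) {
        char buf;
        int status = 0;
        ssize_t n = read(fds[0], &buf, 1);
        if (waitpid(pid, &status, 0) == pid && n == 1 &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            rc = 0;
    }
    close(fds[0]);
    pthread_join(tid, NULL);
    return rc;
}

/* Repeat the round and count how many rounds failed. */
int count_failures(int rounds)
{
    assert(rounds >= 0);
    int failures = 0;
    for (int i = 0; i < rounds; i++)
        if (thread_then_fork() != 0)
            failures++;
    return failures;
}
```

On an unaffected system, count_failures() is expected to return 0. On an affected machine the symptom would be a crash or a hang somewhere within a large number of rounds. Older glibc versions need -pthread at link time.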

That said... I still need some more time for testing...

Helge




Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Sun, 27 Dec 2009 18:03:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Luk Claes <luk@debian.org>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Sun, 27 Dec 2009 18:03:03 GMT) Full text and rfc822 format available.

Message #67 received at 561203@bugs.debian.org (full text, mbox):

From: Luk Claes <luk@debian.org>
To: debian-hppa@lists.debian.org
Cc: 561203@bugs.debian.org
Subject: Re: Bug#561203: FTBFS [hppa] - pthread_create() (QThread) + fork() = crash
Date: Sun, 27 Dec 2009 18:59:53 +0100
Helge Deller wrote:
> On 12/27/2009 04:38 PM, Carlos O'Donell wrote:
>> On Sun, Dec 27, 2009 at 9:47 AM, Luk Claes<luk@debian.org>  wrote:
>>> What's the status of this bug? It's holding the KDE transition which is
>>> blocking the Xorg and python transitions...
>>
>> I'm working on this bug. The current status is "under investigation."
>> I don't have a good idea of what is going on or why it's crashing.
> 
> I could reproduce this bug as well and will continue to debug as soon
> as I return from Christmas family visits.
> 
> 
> My current analysis/assumption is:

> So, maybe the kernel implementation of clone() misses
> some cache flush instructions for the newly created child
> in the pthread_create() case... ?
> 
> That said... I still need some more time for testing...

Thanks to both of you for the update. Hopefully the cause can be
identified and fixed soon.

Cheers

Luk




Removed indication that 561203 affects kde4libs. Added indication that 561203 affects src:kde4libs. Request was from Modestas Vainius <modax@debian.org> to control@bugs.debian.org. (Tue, 29 Dec 2009 10:03:02 GMT) Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Mon, 04 Jan 2010 11:57:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Modestas Vainius <modestas@vainius.eu>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Mon, 04 Jan 2010 11:57:03 GMT) Full text and rfc822 format available.

Message #74 received at 561203@bugs.debian.org (full text, mbox):

From: Modestas Vainius <modestas@vainius.eu>
To: 561203@bugs.debian.org
Cc: dann frazier <dannf@dannf.org>, debian-hppa@lists.debian.org, "Carlos O'Donell" <carlos@systemhalted.org>, Helge Deller <deller@gmx.de>, Luk Claes <luk@debian.org>
Subject: Re: Bug#561203: FTBFS [hppa] - pthread_create() (QThread) + fork() = crash
Date: Mon, 4 Jan 2010 13:53:58 +0200
[Message part 1 (text/plain, inline)]
tags 561203 pending
thanks

Hello,

On Tuesday 22 December 2009 21:54:16 Modestas Vainius wrote:
> when investigating this issue further, I determined that fork() following
> pthread_create() sometimes makes the application crash. In order to
>  reproduce, build attached minifail.cpp with:

[.. snip ..]

> When repeatedly running it as `minifail` (pure_test() mode), I get:
> 
> $ i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail; done;

[.. snip ..]
> 
> The hang which is original problem of this FTBFS, can be reproduced with
> `./minifail qt` (qt_test() mode that uses QThread + fork()). QThread
> internally uses pthreads but unfortunately I was not able to reproduce the
> hang with pure pthread_* calls.
> 
> $ i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail qt; done;

[.. snip ..]

> 
> ii  libc6                         2.10.2-2                      GNU C
>  Library: Shared libraries
> 

It seems libc6 2.10.2-3 fixed the problem. I can no longer reproduce the bug
with either test case above. As far as I can tell from the changelog, the
rebuild with gcc-4.4 helped. I will close this bug once a couple of KDE
packages have been built successfully on hppa.

-- 
Modestas Vainius <modestas@vainius.eu>
[signature.asc (application/pgp-signature, inline)]

Added tag(s) pending. Request was from Modestas Vainius <modestas@vainius.eu> to control@bugs.debian.org. (Mon, 04 Jan 2010 11:57:08 GMT) Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Mon, 04 Jan 2010 15:21:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Helge Deller" <deller@gmx.de>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Mon, 04 Jan 2010 15:21:03 GMT) Full text and rfc822 format available.

Message #81 received at 561203@bugs.debian.org (full text, mbox):

From: "Helge Deller" <deller@gmx.de>
To: Modestas Vainius <modestas@vainius.eu>, 561203@bugs.debian.org
Cc: luk@debian.org, carlos@systemhalted.org, debian-hppa@lists.debian.org, dannf@dannf.org
Subject: Re: Bug#561203: FTBFS [hppa] - pthread_create() (QThread) + fork() = crash
Date: Mon, 04 Jan 2010 16:18:40 +0100
> It seems libc6 2.10.2-3 fixed the problem. I can no longer reproduce the bug
> with either test case above. As far as I can tell from the changelog, the
> rebuild with gcc-4.4 helped. I will close this bug once a couple of KDE
> packages have been built successfully on hppa.

Hello Modestas,

libc6-2.10.2-3 made it much, *much* better (I'm not sure yet why!!).
But I can still reproduce the bug on my system with your testcases. It's just much harder to reproduce, but it still happens.
So, it's not fixed yet; it just happens much less often.

I'm continuing to look into this issue, but at least we have some progress...

Helge




Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Mon, 04 Jan 2010 15:51:09 GMT) Full text and rfc822 format available.

Acknowledgement sent to Modestas Vainius <modestas@vainius.eu>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Mon, 04 Jan 2010 15:51:09 GMT) Full text and rfc822 format available.

Message #86 received at 561203@bugs.debian.org (full text, mbox):

From: Modestas Vainius <modestas@vainius.eu>
To: 561203@bugs.debian.org, "Helge Deller" <deller@gmx.de>
Cc: luk@debian.org, carlos@systemhalted.org, debian-hppa@lists.debian.org, dannf@dannf.org
Subject: Re: Bug#561203: FTBFS [hppa] - pthread_create() (QThread) + fork() = crash
Date: Mon, 4 Jan 2010 17:47:54 +0200
[Message part 1 (text/plain, inline)]
tags 561203 - pending
thanks

Hello,

On Monday 04 January 2010 17:18:40 Helge Deller wrote:
> libc6-2.10.2-3 made it much, *much* better (I'm not sure yet why!!).
> But I can still reproduce the bug on my system with your testcases. It's
>  just much harder to reproduce it, but it still happens. So, it's not fixed
>  yet, it just happens much less often.

Indeed, you are right. I was able to run `./minifail qt` 90k times without a
hang, but the next time it hung after 3000+ runs. Anyway, the probability of
hitting this bug is now much lower, so maybe KDE will finally build on hppa.
Even if the build fails with a timeout as before, it should be enough to give
it back once again.

Btw, backtrace of the hang is different now:

(gdb) t 2
[Switching to thread 2 (Thread 0x41d26480 (LWP 4088))]#0  0x0000046c in ?? ()
(gdb) bt                                                                     
#0  0x0000046c in ?? ()                                                      
#1  0x40a06380 in ?? () from /lib/libc.so.6                                  
#2  0x40a06380 in ?? () from /lib/libc.so.6                                  
#3  0x40a060b4 in malloc () from /lib/libc.so.6                              
#4  0x4093b2b4 in operator new(unsigned int) () from /usr/lib/libstdc++.so.6 
#5  0x404e45e8 in QThreadPrivate::createEventDispatcher (data=0x16c40) at thread/qthread_unix.cpp:159
#6  0x404e4858 in QThreadPrivate::start (arg=0x168f8) at thread/qthread_unix.cpp:183
#7  0x403080a0 in start_thread () from /lib/libpthread.so.0                                          
#8  0x40a66898 in clone () from /lib/libc.so.6                                                       
#9  0x04010300 in ?? ()                                                                              
#10 0x04010300 in ?? ()                                                                              
Backtrace stopped: previous frame identical to this frame (corrupt stack?)                           
(gdb) x/20i 0x40a060b4
0x40a060b4 <malloc+1208>:       b,l 0x40a05d60 <malloc+356>,r0
0x40a060b8 <malloc+1212>:       copy ret0,r5                  
0x40a060bc <malloc+1216>:       mfctl tr3,ret0                
0x40a060c0 <malloc+1220>:       ldi 0,r23                     
0x40a060c4 <malloc+1224>:       ldw -478(ret0),r25            
0x40a060c8 <malloc+1228>:       ldi 1,r24                     
0x40a060cc <malloc+1232>:       depwi -1,31,1,r25             
0x40a060d0 <malloc+1236>:       copy r3,r26                   
0x40a060d4 <malloc+1240>:       copy r19,r4                   
0x40a060d8 <malloc+1244>:       be,l 100(sr2,r0),sr0,r31      
0x40a060dc <malloc+1248>:       ldi d2,r20                    
0x40a060e0 <malloc+1252>:       copy r4,r19                   
0x40a060e4 <malloc+1256>:       b,l 0x40a05d28 <malloc+300>,r0
0x40a060e8 <malloc+1260>:       ldo -8(r5),r20                
0x40a060ec <malloc+1264>:       mfctl tr3,ret0                
0x40a060f0 <malloc+1268>:       ldi 0,r23                     
0x40a060f4 <malloc+1272>:       ldw -478(ret0),r25            
0x40a060f8 <malloc+1276>:       ldi 1,r24                     
0x40a060fc <malloc+1280>:       depwi -1,31,1,r25             
0x40a06100 <malloc+1284>:       copy r7,r26                   
(gdb) x/20i 0x40a06380
0x40a06380:     b,l 0x40a0622c,r0
0x40a06384:     copy r4,r19      
0x40a06388:     mfctl tr3,ret0   
0x40a0638c:     copy r5,r26      
0x40a06390:     ldw -478(ret0),r25
0x40a06394:     ldi 0,r23         
0x40a06398:     depwi -1,31,1,r25 
0x40a0639c:     ldi 1,r24         
0x40a063a0:     copy r19,r4       
0x40a063a4:     be,l 100(sr2,r0),sr0,r31
0x40a063a8:     ldi d2,r20              
0x40a063ac:     copy r4,r19             
0x40a063b0:     b,l,n 0x40a06290,r0     
0x40a063b4:     stw rp,-14(sp)          
0x40a063b8:     addil L%1000,r19,r1     
0x40a063bc:     ldo 40(sp),sp           
0x40a063c0:     ldw 35c(r1),ret0        
0x40a063c4:     stw r4,-34(sp)
0x40a063c8:     copy r19,r4
0x40a063cc:     stw r19,-20(sp)


-- 
Modestas Vainius <modestas@vainius.eu>
[signature.asc (application/pgp-signature, inline)]

Removed tag(s) pending. Request was from Modestas Vainius <modestas@vainius.eu> to control@bugs.debian.org. (Mon, 04 Jan 2010 15:51:11 GMT) Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Mon, 04 Jan 2010 16:13:44 GMT) Full text and rfc822 format available.

Acknowledgement sent to dann frazier <dannf@dannf.org>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Mon, 04 Jan 2010 16:13:44 GMT) Full text and rfc822 format available.

Message #93 received at 561203@bugs.debian.org (full text, mbox):

From: dann frazier <dannf@dannf.org>
To: Modestas Vainius <modestas@vainius.eu>
Cc: 561203@bugs.debian.org, Helge Deller <deller@gmx.de>, luk@debian.org, carlos@systemhalted.org, debian-hppa@lists.debian.org
Subject: Re: Bug#561203: FTBFS [hppa] - pthread_create() (QThread) + fork() = crash
Date: Mon, 4 Jan 2010 08:54:16 -0700
On Mon, Jan 04, 2010 at 05:47:54PM +0200, Modestas Vainius wrote:
> On Monday 04 January 2010 17:18:40 Helge Deller wrote:
> > libc6-2.10.2-3 made it much, *much* better (I'm not sure yet why!!).
> > But I can still reproduce the bug on my system with your testcases. It's
> >  just much harder to reproduce it, but it still happens. So, it's not fixed
> >  yet, it just happens much less often.
> 
> Indeed, you are right. I was able to run `./minifail qt` 90k times without a
> hang, but the next time it hung after 3000+ runs. Anyway, the probability of
> hitting this bug is now much lower, so maybe KDE will finally build on hppa.
> Even if the build fails with a timeout as before, it should be enough to give
> it back once again.

given back.




Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Fri, 02 Apr 2010 03:03:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to NIIBE Yutaka <gniibe@fsij.org>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Fri, 02 Apr 2010 03:03:04 GMT) Full text and rfc822 format available.

Message #98 received at 561203@bugs.debian.org (full text, mbox):

From: NIIBE Yutaka <gniibe@fsij.org>
To: linux-parisc@vger.kernel.org
Cc: pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: threads and fork on machine with VIPT-WB cache
Date: Fri, 02 Apr 2010 11:41:16 +0900
Hi there,

I think that I have caught a bug involving threads and fork.  I found it
when debugging the FTBFS of Gauche, a Scheme interpreter.  As I think that
Debian bug #561203 has the same cause, I am CC:-ing the BTS too.
Please Cc: me, as I am not on the linux-parisc list.

Here, I am talking about the uniprocessor case.
I assume that PA-RISC has a virtually indexed, physically tagged,
write-back cache, which I call VIPT-WB.

I am reading the source in Debian:
	linux-source-2.6.32/kernel/fork.c
	linux-source-2.6.32/mm/memory.c
	linux-source-2.6.32/arch/parisc/include/asm/pgtable.h

To have the same semantics as other archs, I think that a VIPT-WB cache
machine should flush the cache at ptep_set_wrprotect, so that the memory
of the page has up-to-date data.  Yes, it will have a huge performance
impact on fork.  But I haven't found any good solution other than this
yet.  Well, I will need to write to linux-arch.

Let me explain our case.  As I couldn't observe the event itself, only
its result, this includes some imagination and interpretation of mine.
Correct me if I'm wrong.

(1) We have process A with threads.  One of the threads calls fork(2)
    (in fact, it's clone(2) without CLONE_VM) while other threads are
    still alive.  Let's call this thread A-1.

(2) As a result of clone(2), we have process B.

(3) The memory of process A is copied to process B by dup_mmap
    (fork.c), run by A-1 in the context of process A.  There,
    flush_cache_dup_mm is called.

    In the single-threaded case, flush_cache_dup_mm is enough: all data
    in the cache goes to memory.  But in this case we have other threads.

(4) From dup_mmap, copy_page_range (memory.c) is called.

    Note that copy_page_range may sleep.  Allocating a page in
    pud_alloc, pmd_alloc, or pte_alloc_map_lock may require thread A-1
    to be scheduled out (waking up the swapper or other processes).

(5) Suppose thread A-1 sleeps in copy_page_range, and another thread
    of process A, A-2, is woken up and touches memory.  Then the new
    data is only in the cache; memory has stale data.

(6) Thread A-2 sleeps, and thread A-1 is woken up to continue
    copy_page_range -> copy_*_range -> copy_one_pte.

(7) From copy_one_pte, thread A-1 calls ptep_set_wrprotect, as
    this is a COW mapping. (*)

(8) Thread A-1 sleeps again in copy_page_range and process B is woken up.

(9) Process B reads the memory and gets the *NEW* data from the cache
    (if the process space identifier color is the same).
    Process B then writes to the memory, which causes a fault, as it
    is COW memory.

    Note: Process B sees the *NEW* data because it's a VIPT-WB cache.
    At this point B still shares the same memory.

(10) A new page is allocated and the memory contents are copied,
     including the stale data.

     I assume that the kernel accesses the memory through a different
     cache line and does not see A-2's data in the cache.

(11) After the fault, process B gets the *OLD* data from memory.


(*) When we make a COW mapping between process A and process B,
    we assume the memory is up to date.  As this assumption is
    incorrect, I think that we need to flush the cache data to memory
    here.
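Steps (3) to (11) can be replayed in a toy model of one word of one page. The model encodes NIIBE's assumptions, not established hardware behaviour. A and B share a single user cache line (same virtual address, same space color). The kernel's copy in step (10) goes through an aliased address, so it reads only RAM. The struct and function names are hypothetical. Setting flush_at_wrprotect also shows the effect of the flush he proposes at ptep_set_wrprotect.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one word of a page on a uniprocessor VIPT write-back cache. */
struct toy_page {
    int memory;   /* contents of RAM */
    int line;     /* contents of the user-space cache line (always valid here) */
    bool dirty;   /* line holds data not yet written back to RAM */
};

/* User write through the shared user virtual address: lands in the cache only. */
static void user_write(struct toy_page *pg, int value)
{
    pg->line = value;
    pg->dirty = true;
}

/* User read through the same virtual address: served by the cache line. */
static int user_read(const struct toy_page *pg)
{
    return pg->line;
}

/* Write the dirty line back to RAM, as a cache flush would. */
static void flush_line(struct toy_page *pg)
{
    if (pg->dirty) {
        pg->memory = pg->line;
        pg->dirty = false;
    }
}

/* Kernel COW copy through an aliased kernel address: by assumption (10)
   it misses the user cache line and reads RAM. */
static int kernel_cow_copy(const struct toy_page *pg)
{
    return pg->memory;
}

/* Replay the scenario; returns the value process B ends up with. */
int replay_fork_race(bool flush_at_wrprotect)
{
    /* (3) flush_cache_dup_mm has made RAM and cache agree on OLD = 1. */
    struct toy_page pg = { .memory = 1, .line = 1, .dirty = false };

    user_write(&pg, 2);            /* (5) A-2 writes NEW = 2, cache only   */
    if (flush_at_wrprotect)
        flush_line(&pg);           /* (7) proposed flush before wrprotect */
    assert(user_read(&pg) == 2);   /* (9) B reads NEW via the shared line  */
    return kernel_cow_copy(&pg);   /* (10)-(11) B's private copy           */
}
```

Without the flush, B ends up with the OLD value 1 after the COW fault. With it, B keeps the NEW value 2.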


If you are interested, or like ASCII art, please keep reading.

In our Gauche case, we saw this problem in the linked list handling of
the pthread implementation (NPTL).  We have two linked list heads,
<used> and <cache>.

Initially, the situation of process A is like this:

      +-------------------------------------+
      |                                     |
used  v     ELEM                            |
+-----+     +-----+     +-----+     +-----+ |
|   ------->|   ------->|   ------->|   ----+
+-----+     +-----+     +-----+     +-----+
            |     |     |     |     |     |
            +-----+     +-----+     +-----+

      +-------------+
      |             |
cache v             |
+-----+     +-----+ |
|   ------->|   ----+
+-----+     +-----+                       This is in memory
            |     |
            +-----+

Thread A-2 removes ELEM from <used> and puts it on <cache> during the fork.
This is process A's final state, and what process B sees initially.

      +-------------------------------------+
      |                                     |
used  v                                     |
+-----+                 +-----+     +-----+ |
|   ------------------->|   ------->|   ----+
+-----+                 +-----+     +-----+
                        |     |     |     |
                        +-----+     +-----+

      +---------------------------+
      |     ELEM                  |
      |     +-----+               |
      | +-->|   -----+            |
      | |   +-----+  |            |
      | |   |     |  |            |
cache v |   +-----+  |            |        This is in cache
+-----+ |            |   +-----+  |
|   ----+            +-->|   -----+
+-----+                  +-----+
                         |     |
                         +-----+


Process B scans through the linked list from <cache> and updates data
in the list.  After process B touches ELEM, it sees the *OLD* data
of ELEM.


      +-------------------------------------+
      |                                     |
used  v                                     |
+-----+                 +-----+     +-----+ |
|   -----------------+->|   ------->|   ----+
+-----+              |  +-----+     +-----+
                     |  |     |     |     |
            ELEM     |  +-----+     +-----+
            +-----+  |
        +-->|   -----+ Wow!
        |   +-----+
        |   |*****|
cache   |   +-----+
+-----+ |                +-----+
|   ----+                |   ----> ... to cache
+-----+                  +-----+
                         |     |
                         +-----+

Process B follows the link, ends up in the wrong places, and
modifies them.

      +-------------------------------------+
      |                                     |
used  v                                     |
+-----+                 +-----+     +-----+ |
|   -----------------+->|   ------->|   ----+
+-----+              |  +-----+     +-----+
                     |  |*****|     |     |
            ELEM     |  +-----+     +-----+
            +-----+  |
        +-->|   -----+
        |   +-----+
        |   |*****|
cache   |   +-----+
+-----+ |                +-----+
|   ----+                |   ----> ... to cache
+-----+                  +-----+
                         |     |
                         +-----+

      +-------------------------------------+
      |                                     |
used  v                                     |
+-----+                 +-----+     +-----+ |
|   -----------------+->|   ------->|   ----+
+-----+              |  +-----+     +-----+
                     |  |*****|     |*****|
            ELEM     |  +-----+     +-----+
            +-----+  |
        +-->|   -----+
        |   +-----+
        |   |*****|
cache   |   +-----+
+-----+ |                +-----+
|   ----+                |   ----> ... to cache
+-----+                  +-----+
                         |     |
                         +-----+

Process B keeps scanning and reaches the <used> list head as if it
were an element of the list.  Process B can't stop, because its
termination condition is a comparison with the <cache> head.  When
process B touches that memory, it sees the *OLD* data of <used>.
Besides, since <cache> is on the same page as <used>, its contents,
from the viewpoint of process B, also change to *OLD*.

      +-------------------------------------+
      |                                     |
used  v                                     |
+-----+ Wow!            +-----+     +-----+ |
|   -----+           +->|   ------->|   ----+
+-----+  |           |  +-----+     +-----+
 *****   |           |  |*****|     |*****|
         |  ELEM     |  +-----+     +-----+
         |  +-----+  |
         +->|   -----+
            +-----+
            |*****|
cache       +-----+
+-----+ Wow!             +-----+
|   -------------------->|   ----> ... to cache
+-----+                  +-----+
                         |     |
                         +-----+

Process B continues scanning this linked list forever.
It entered this loop from <cache>, but <cache>
no longer points to ELEM.
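The endless scan in the diagrams can be reproduced with a toy circular list whose only termination test is "back at the head". This is a simplification: only forward links are modelled, and the names (used, cache, elem, x, y, c1, scan_cache_list) are hypothetical labels for the boxes in the diagrams, not NPTL's actual code. The max_steps bound stands in for "forever".

```c
#include <assert.h>
#include <stdbool.h>

/* Forward links only; the real NPTL list elements are modelled loosely. */
struct node {
    struct node *next;
};

/* Scan a circular list from head, stopping only when the scan returns
   to head. Returns the number of elements visited, or -1 if head was
   not reached within max_steps (the "scans forever" case). */
int scan_until_head(const struct node *head, int max_steps)
{
    assert(max_steps > 0);
    int steps = 0;
    for (const struct node *p = head->next; p != head; p = p->next) {
        if (++steps > max_steps)
            return -1;
    }
    return steps;
}

/* Build what process B sees after the fork: <used> without ELEM, and
   <cache> whose head points to ELEM. If stale_elem is true, ELEM still
   holds its OLD next pointer into the <used> list. */
int scan_cache_list(bool stale_elem, int max_steps)
{
    struct node used, x, y;        /* <used>:  used -> x -> y -> used       */
    struct node cache, elem, c1;   /* <cache>: cache -> elem -> c1 -> cache */

    used.next = &x;
    x.next = &y;
    y.next = &used;

    cache.next = &elem;
    c1.next = &cache;
    elem.next = stale_elem ? &x : &c1;   /* OLD link vs NEW link */

    return scan_until_head(&cache, max_steps);
}
```

With the NEW link, the scan visits two elements and returns to <cache>. With the OLD link, it circles the <used> list and never sees <cache> again.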
-- 




Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Fri, 02 Apr 2010 03:42:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to James Bottomley <James.Bottomley@HansenPartnership.com>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Fri, 02 Apr 2010 03:42:05 GMT) Full text and rfc822 format available.

Message #103 received at 561203@bugs.debian.org (full text, mbox):

From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: NIIBE Yutaka <gniibe@fsij.org>
Cc: linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Thu, 01 Apr 2010 23:30:25 -0400
On Fri, 2010-04-02 at 11:41 +0900, NIIBE Yutaka wrote:
> (9) Process B does read-access on memory, which gets *NEW* data in
>     cache (if process space identifier color is same).
>     Process B does write-access on memory which causes memory fault,
>     as it's COW memory.
> 
>     Note: Process B sees *NEW* data because it's VIPT-WB cache.
>     It shares same memory in this situation.

So I think the bug here is that you're confusing aliasing with SMP cache
coherence.  In an alias situation, the same physical line is mapped to
multiple lines in a processor's cache (at different virtual addresses),
which means you can get a different answer depending on which alias you
read.

In COW breaking, the page table entry is copied, so A and B no longer
have page table entries at the same physical location.  If the COW is
intact, A and B have the same physical page, but it's also accessed by
the same virtual address, hence no aliasing.

In an SMP incoherent system, A and B could get different results (if on
different CPUs) because the write protect is in the cache of A but not
B.  However, PA is SMP coherent, so the act of B reading a line which is
dirty in A's cache causes a flush before the read completes via the
cache chequerboard logic and B ends up reading the same value A would
have read.
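James's definition of aliasing can be made concrete with the usual notion of a cache "colour". In a virtually indexed cache, two virtual mappings of the same physical page use the same cache line only if the index bits above the page offset agree. The sketch below computes that. page_size and alias_span (the range of virtual address bits used for indexing) are hypothetical parameters, not PA-RISC's real values, and the model ignores the space identifiers that PA-RISC also uses in indexing.

In these terms, A and B mapping a COW page at the same virtual address share a colour. A kernel mapping of the same page at a different colour would alias.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Colour of a virtual address: the index bits above the page offset
   that select a cache set, within the (hypothetical) alias span. */
static uintptr_t colour(uintptr_t va, uintptr_t page_size, uintptr_t alias_span)
{
    return (va & (alias_span - 1)) / page_size;
}

/* Two mappings of the same physical page alias if their colours differ,
   i.e. the cache would hold the page's data in two different lines. */
bool addresses_alias(uintptr_t va1, uintptr_t va2,
                     uintptr_t page_size, uintptr_t alias_span)
{
    /* Both sizes must be powers of two, with alias_span >= page_size. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(alias_span >= page_size && (alias_span & (alias_span - 1)) == 0);
    return colour(va1, page_size, alias_span) != colour(va2, page_size, alias_span);
}
```

For example, with 4 KiB pages and a 4 MiB alias span, addresses one page apart alias, while addresses exactly 4 MiB apart do not.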

James






Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Fri, 02 Apr 2010 03:51:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to NIIBE Yutaka <gniibe@fsij.org>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Fri, 02 Apr 2010 03:51:04 GMT) Full text and rfc822 format available.

Message #108 received at 561203@bugs.debian.org (full text, mbox):

From: NIIBE Yutaka <gniibe@fsij.org>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Fri, 02 Apr 2010 12:48:11 +0900
Thanks for your quick reply.

James Bottomley wrote:
> In COW breaking, the page table entry is copied, so A and B no longer
> have page table entries at the same physical location.  If the COW is
> intact, A and B have the same physical page, but it's also accessed by
> the same virtual address, hence no aliasing.

Let me explain more.

In the scenario, I assume:

	No aliasing between A and B.
	We have aliasing between kernel access and user access.

Before the COW break, A and B share the same data (no aliasing: same
space identifier color), and B sees the data in the cache, while
memory has stale data.

At the COW break, the kernel copies the memory; it doesn't see the new
data in the cache because of aliasing.

Isn't it possible?
-- 




Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Fri, 02 Apr 2010 08:12:06 GMT) Full text and rfc822 format available.

Acknowledgement sent to NIIBE Yutaka <gniibe@fsij.org>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Fri, 02 Apr 2010 08:12:06 GMT) Full text and rfc822 format available.

Message #113 received at 561203@bugs.debian.org (full text, mbox):

From: NIIBE Yutaka <gniibe@fsij.org>
To: linux-parisc@vger.kernel.org
Cc: pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Fri, 02 Apr 2010 17:05:43 +0900
NIIBE Yutaka wrote:
> To have same semantics as other archs, I think that VIPT-WB cache
> machine should have cache flush at ptep_set_wrprotect, so that memory
> of the page has up-to-date data.  Yes, it will be huge performance
> impact for fork.  But I don't find any good solution other than this
> yet.

I think we could do something like (only for VIPT-WB cache machine):

-	static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep)
+	static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct mm_struct *mm, unsigned long addr, pte_t *ptep)
	{
		pte_t old_pte = *ptep;
+		if (atomic_read(&mm->mm_users) > 1)
+			flush_cache_page(vma, addr, pte_pfn(old_pte));
		set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
	}

Here, we can add a condition on the call to flush_cache_page to avoid
a big performance impact in the non-threaded case.
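The cost argument for the `mm_users` condition can be sketched in user space. `mock_mm`, `mock_pte`, `mock_set_wrprotect` and `mock_fork` are hypothetical stand-ins for `mm_struct`, a PTE, `ptep_set_wrprotect` and the `copy_one_pte` loop, and the flush is replaced by a counter. This illustrates only the shape of the proposed condition, not the real parisc code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct and a PTE; not kernel types. */
struct mock_mm {
    int mm_users;       /* threads sharing this address space           */
    size_t flushes;     /* number of per-page cache flushes issued      */
};

struct mock_pte {
    bool writable;
};

/* Same shape as the proposed change: flush only when another thread
 * could be dirtying the page concurrently, then write-protect it. */
static void mock_set_wrprotect(struct mock_mm *mm, struct mock_pte *pte)
{
    if (mm->mm_users > 1)
        mm->flushes++;  /* stands in for flush_cache_page(vma, addr, pfn) */
    pte->writable = false;
}

/* Write-protect every COW page of the parent, as copy_one_pte would,
 * and report how many flushes that cost. */
static size_t mock_fork(struct mock_mm *mm, struct mock_pte *ptes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        mock_set_wrprotect(mm, &ptes[i]);
    return mm->flushes;
}
```

With `mm_users == 1`, forking eight COW pages issues no per-page flushes; with three users it issues eight, one per page.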
-- 




Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Fri, 02 Apr 2010 12:27:03 GMT)

Acknowledgement sent to James Bottomley <James.Bottomley@HansenPartnership.com>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Fri, 02 Apr 2010 12:27:03 GMT)

Message #118 received at 561203@bugs.debian.org:

From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: NIIBE Yutaka <gniibe@fsij.org>
Cc: linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Fri, 02 Apr 2010 08:22:07 -0400
On Fri, 2010-04-02 at 12:48 +0900, NIIBE Yutaka wrote:
> Thanks for your quick reply.
> 
> James Bottomley wrote:
> > In COW breaking, the page table entry is copied, so A and B no longer
> > have page table entries at the same physical location.  If the COW is
> > intact, A and B have the same physical page, but it's also accessed by
> > the same virtual address, hence no aliasing.
> 
> Let me explain more.
> 
> In the scenario, I assume:
> 
> 	No aliasing between A and B.
> 	We have aliasing between kernel access and user access.
> 
> Before COW breaking A and B share same data (with no aliasing same
> space identifier color), and B sees data in cache, while memory has
> stale data.
> 
> At COW breaking, kernel copies the memory, it doesn't see new data
> in cache because of aliasing.
> 
> Isn't it possible?

So your theory is that the data the kernel sees doing the page copy can
be stale because of dirty cache lines in userspace (which is certainly
possible in the ordinary way)?  By design that shouldn't happen: the
idea behind COW breaking is that before it breaks, the page is read
only ... this means that processes can have clean cache copies of it,
but never dirty cache copies (because writes are forbidden).  As soon as
one or other process tries to write to the page, it gets a memory
protection trap long before the data it's trying to write goes into the
cache.  By the time the write is allowed to complete (and the cache
becomes dirty), the process will have the new copy of the page which
belongs exclusively to it.
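The invariant described here can be modelled in a few lines. This is a simplified sketch under stated assumptions: cache and memory for each page are collapsed into one `value` with a `line_dirty` flag, and `toy_page`, `toy_mapping` and `toy_store` are hypothetical names. A store to a read-only COW mapping first takes the COW break and only then reaches the cache, so the shared page never acquires a dirty line.

```c
#include <stdbool.h>

/* Hypothetical toy: one cached word per page (memory and cache collapsed). */
struct toy_page {
    int value;          /* page contents                                */
    bool line_dirty;    /* is there a dirty cache line for this page?   */
    int mapcount;       /* how many processes map it                    */
};

struct toy_mapping {
    struct toy_page *page;
    bool writable;
};

/* Simulate a user store.  A store to a read-only (COW) mapping traps
 * first: the fault handler gives the writer a private copy, and only
 * then does the store reach the cache, so the shared page stays clean. */
static void toy_store(struct toy_mapping *m, struct toy_page *spare, int value)
{
    if (!m->writable) {
        /* COW break: copy the old contents into the writer's own page. */
        spare->value = m->page->value;
        spare->line_dirty = false;
        spare->mapcount = 1;
        m->page->mapcount--;
        m->page = spare;
        m->writable = true;
    }
    m->page->value = value;
    m->page->line_dirty = true;
}
```

If parent and child both map a read-only page holding 5 and the parent stores 9, the parent ends up on its private copy holding 9, while the shared page still holds 5 with no dirty line.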

James






Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Fri, 02 Apr 2010 19:48:09 GMT)

Acknowledgement sent to John David Anglin <dave.anglin@nrc-cnrc.gc.ca>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Fri, 02 Apr 2010 19:48:09 GMT)

Message #123 received at 561203@bugs.debian.org:

From: John David Anglin <dave@hiauly1.hia.nrc.ca>
To: NIIBE Yutaka <gniibe@fsij.org>
Cc: linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Fri, 2 Apr 2010 15:35:14 -0400
On Fri, 02 Apr 2010, NIIBE Yutaka wrote:

> NIIBE Yutaka wrote:
>> To have same semantics as other archs, I think that VIPT-WB cache
>> machine should have cache flush at ptep_set_wrprotect, so that memory
>> of the page has up-to-date data.  Yes, it will be huge performance
>> impact for fork.  But I don't find any good solution other than this
>> yet.
>
> I think we could do something like (only for VIPT-WB cache machine):
>
> -	static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep)
>
> +	static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> 	{
> 		pte_t old_pte = *ptep;
> +		if (atomic_read(&mm->mm_users) > 1)
> +			flush_cache_page(vma, addr, pte_pfn(old_pte));
> 		set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
> 	}

I tested the hack below on two machines currently running 2.6.33.2
UP kernels.  The change seems to fix Debian #561203 (minifail bug)!
Thus, I definitely think you are on the right track.  I'll continue
to test.

I suspect the same issue is present for SMP kernels.

Thanks,
Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index a27d2e2..a5d730f 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -14,6 +14,7 @@
 #include <linux/bitops.h>
 #include <asm/processor.h>
 #include <asm/cache.h>
+extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn);
 
 /*
  * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
@@ -456,7 +457,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 	return old_pte;
 }
 
-static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
 #ifdef CONFIG_SMP
 	unsigned long new, old;
@@ -467,6 +468,8 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
 	} while (cmpxchg((unsigned long *) ptep, old, new) != old);
 #else
 	pte_t old_pte = *ptep;
+	if (atomic_read(&mm->mm_users) > 1)
+		flush_cache_page(vma, addr, pte_pfn(old_pte));
 	set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
 #endif
 }
diff --git a/mm/memory.c b/mm/memory.c
index 09e4b1b..21c2916 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -616,7 +616,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * in the parent and the child
 	 */
 	if (is_cow_mapping(vm_flags)) {
-		ptep_set_wrprotect(src_mm, addr, src_pte);
+		ptep_set_wrprotect(vma, src_mm, addr, src_pte);
 		pte = pte_wrprotect(pte);
 	}
 




Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Mon, 05 Apr 2010 00:42:07 GMT)

Acknowledgement sent to NIIBE Yutaka <gniibe@fsij.org>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Mon, 05 Apr 2010 00:42:07 GMT)

Message #128 received at 561203@bugs.debian.org:

From: NIIBE Yutaka <gniibe@fsij.org>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Mon, 05 Apr 2010 09:39:09 +0900
Thanks a lot for the discussion.

James Bottomley wrote:
> So your theory is that the data the kernel sees doing the page copy can
> be stale because of dirty cache lines in userspace (which is certainly
> possible in the ordinary way)?

Yes.

> By design that shouldn't happen: the idea behind COW breaking is
> that before it breaks, the page is read only ... this means that
> processes can have clean cache copies of it, but never dirty cache
> copies (because writes are forbidden).

That must be the design, I agree.

To keep this condition (no dirty cache lines for a COW page), we need
to flush the cache before ptep_set_wrprotect.  That's my point.

Please look at the code path:
   (kernel/fork.c)
   do_fork -> copy_process -> copy_mm -> dup_mm -> dup_mmap ->
   (mm/memory.c)
   copy_page_range -> copy_p*d_range -> copy_one_pte -> ptep_set_wrprotect

The function flush_cache_dup_mm is called from dup_mmap; that's enough
for a single-threaded process.

I think that, for a process with multiple threads, we need to flush
the cache before ptep_set_wrprotect.  Other threads may change memory
after one thread invokes do_fork and before it calls
ptep_set_wrprotect.  Specifically, the forking thread may sleep in
pte_alloc while it gets a page.
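The window described above can be replayed deterministically. In this hypothetical sketch (`toy_state`, `toy_flush_line`, `toy_thread_store` and `toy_fork_race` are illustrative names, and a single cache line stands in for the whole page), the flush done by `flush_cache_dup_mm` happens before another thread stores, so without a second flush at write-protect time the COW page is left with a dirty line.

```c
#include <stdbool.h>

/* Hypothetical toy: one page backed by one write-back cache line. */
struct toy_state {
    int memory;   /* value in RAM                        */
    int cached;   /* value in the cache line             */
    bool dirty;   /* is the cache line newer than RAM?   */
    bool cow;     /* has the page been write-protected?  */
};

/* Write back a dirty line (stands in for a cache flush). */
static void toy_flush_line(struct toy_state *s)
{
    if (s->dirty) {
        s->memory = s->cached;
        s->dirty = false;
    }
}

/* A store by another thread while the page is still writable. */
static void toy_thread_store(struct toy_state *s, int value)
{
    s->cached = value;
    s->dirty = true;
}

/* Replay: dup_mmap's flush, a concurrent store by thread A-2, then
 * ptep_set_wrprotect with or without the proposed per-page flush.
 * Returns true if the COW page is left with a dirty cache line. */
static bool toy_fork_race(bool flush_at_wrprotect)
{
    struct toy_state s = { .memory = 1, .cached = 1, .dirty = false, .cow = false };

    toy_flush_line(&s);          /* flush_cache_dup_mm() in dup_mmap()       */
    toy_thread_store(&s, 2);     /* A-2 stores while A-1 sleeps in pte_alloc */
    if (flush_at_wrprotect)
        toy_flush_line(&s);      /* proposed flush_cache_page()              */
    s.cow = true;                /* ptep_set_wrprotect()                     */
    return s.cow && s.dirty;
}
```

`toy_fork_race(false)` reports a dirty COW page, while `toy_fork_race(true)` does not.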
-- 




Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#561203; Package libc6. (Mon, 05 Apr 2010 01:21:03 GMT)

Acknowledgement sent to NIIBE Yutaka <gniibe@fsij.org>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Mon, 05 Apr 2010 01:21:04 GMT)

Message #133 received at 561203@bugs.debian.org:

From: NIIBE Yutaka <gniibe@fsij.org>
To: control@bugs.debian.org
Cc: 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Mon, 05 Apr 2010 10:17:09 +0900
retitle 561203 threads and fork on machine with VIPT-WB cache
reassign 561203 linux-2.6
thanks

Since I am sure that this bug lives in the kernel, I am reassigning and retitling it.
-- 




Changed Bug title to 'threads and fork on machine with VIPT-WB cache' from 'FTBFS [hppa] - pthread_create() (or QThread) + fork() = crash' Request was from NIIBE Yutaka <gniibe@fsij.org> to control@bugs.debian.org. (Mon, 05 Apr 2010 02:03:08 GMT)

Bug reassigned from package 'libc6' to 'linux-2.6'. Request was from NIIBE Yutaka <gniibe@fsij.org> to control@bugs.debian.org. (Mon, 05 Apr 2010 02:03:09 GMT)

Bug No longer marked as found in versions eglibc/2.10.2-2. Request was from NIIBE Yutaka <gniibe@fsij.org> to control@bugs.debian.org. (Mon, 05 Apr 2010 02:03:09 GMT)

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Mon, 05 Apr 2010 02:57:06 GMT)

Acknowledgement sent to "John David Anglin" <dave@hiauly1.hia.nrc.ca>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Mon, 05 Apr 2010 02:57:06 GMT)

Message #144 received at 561203@bugs.debian.org:

From: "John David Anglin" <dave@hiauly1.hia.nrc.ca>
To: gniibe@fsij.org (NIIBE Yutaka)
Cc: James.Bottomley@HansenPartnership.com, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Sun, 4 Apr 2010 22:51:53 -0400 (EDT)
> Thanks a lot for the discussion.
> 
> James Bottomley wrote:
> > So your theory is that the data the kernel sees doing the page copy can
> > be stale because of dirty cache lines in userspace (which is certainly
> > possible in the ordinary way)?
> 
> Yes.
> 
> > By design that shouldn't happen: the idea behind COW breaking is
> > that before it breaks, the page is read only ... this means that
> > processes can have clean cache copies of it, but never dirty cache
> > copies (because writes are forbidden).
> 
> That must be design, I agree.
> 
> To keep this condition (no dirty cache for COW page), we need to flush
> cache before ptep_set_wrprotect.  That's my point.
> 
> Please look at the code path:
>    (kernel/fork.c)
>    do_fork -> copy_process -> copy_mm -> dup_mm -> dup_mmap ->
>    (mm/memory.c)
>    copy_page_range -> copy_p*d_range -> copy_one_pte -> ptep_set_wrprotect
> 
> The function flush_cache_dup_mm is called from dup_mmap, that's enough
> for a case of a process with single thread.
> I think that:
> We need to flush cache before ptep_set_wrprotect for a process with
> multiple threads.  Other threads may change memory after a thread
> invokes do_fork and before calling ptep_set_wrprotect.  Specifically,
> a process may sleep at pte_alloc function to get a page.

I agree.  It is interesting that, in the case of the Debian bug, a
thread of the parent process causes the COW break and thereby corrupts
its own memory.  As far as I can tell, the fork'd child never writes
to the memory that causes the fault.

My testing indicates that your suggested change fixes the Debian
bug.  I've attached below my latest test version.  This seems to fix
the bug on both SMP and UP kernels.

However, it doesn't fix all page/cache related issues on parisc
SMP kernels that I commonly see.

My first inclination, even before reading your analysis, was to
assume that copy_user_page was broken (i.e., that even if a
processor cache was dirty when the COW page was write-protected,
it should be possible to do the flush before the page is copied).
However, this didn't seem to work...  Possibly, there are issues
with aliased addresses.

I note that sparc flushes the entire cache and purges the entire
TLB in kmap_atomic/kunmap_atomic for highmem.  Although the breakage
that I see is not limited to PA8800/PA8900, I'm not convinced
that we maintain the coherency required for these processors
in copy_user_page when we have multiple threads.

As a side note, kmap_atomic/kunmap_atomic seem to lack calls to
pagefault_disable()/pagefault_enable() on PA8800.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index a27d2e2..b140d5c 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -14,6 +14,7 @@
 #include <linux/bitops.h>
 #include <asm/processor.h>
 #include <asm/cache.h>
+extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn);
 
 /*
  * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
@@ -456,17 +457,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 	return old_pte;
 }
 
-static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
 #ifdef CONFIG_SMP
 	unsigned long new, old;
+#endif
+	pte_t old_pte = *ptep;
+
+	if (atomic_read(&mm->mm_users) > 1)
+		flush_cache_page(vma, addr, pte_pfn(old_pte));
 
+#ifdef CONFIG_SMP
 	do {
 		old = pte_val(*ptep);
 		new = pte_val(pte_wrprotect(__pte (old)));
 	} while (cmpxchg((unsigned long *) ptep, old, new) != old);
 #else
-	pte_t old_pte = *ptep;
 	set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
 #endif
 }
diff --git a/mm/memory.c b/mm/memory.c
index 09e4b1b..21c2916 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -616,7 +616,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * in the parent and the child
 	 */
 	if (is_cow_mapping(vm_flags)) {
-		ptep_set_wrprotect(src_mm, addr, src_pte);
+		ptep_set_wrprotect(vma, src_mm, addr, src_pte);
 		pte = pte_wrprotect(pte);
 	}
 




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Mon, 05 Apr 2010 03:03:07 GMT)

Acknowledgement sent to "John David Anglin" <dave@hiauly1.hia.nrc.ca>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Mon, 05 Apr 2010 03:03:07 GMT)

Message #149 received at 561203@bugs.debian.org:

From: "John David Anglin" <dave@hiauly1.hia.nrc.ca>
To: dave@hiauly1.hia.nrc.ca (John David Anglin)
Cc: gniibe@fsij.org, James.Bottomley@HansenPartnership.com, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Sun, 4 Apr 2010 22:58:40 -0400 (EDT)
> > > By design that shouldn't happen: the idea behind COW breaking is
> > > that before it breaks, the page is read only ... this means that
> > > processes can have clean cache copies of it, but never dirty cache
> > > copies (because writes are forbidden).
> > 
> > That must be design, I agree.
> > 
> > To keep this condition (no dirty cache for COW page), we need to flush
> > cache before ptep_set_wrprotect.  That's my point.

Is it possible that a sleep/reschedule could cause the cache to become
dirty again before it is write protected?

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Mon, 05 Apr 2010 16:21:03 GMT)

Acknowledgement sent to James Bottomley <James.Bottomley@HansenPartnership.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Mon, 05 Apr 2010 16:21:03 GMT)

Message #154 received at 561203@bugs.debian.org:

From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: John David Anglin <dave@hiauly1.hia.nrc.ca>
Cc: NIIBE Yutaka <gniibe@fsij.org>, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Mon, 05 Apr 2010 11:18:38 -0500
On Sun, 2010-04-04 at 22:51 -0400, John David Anglin wrote:
> > Thanks a lot for the discussion.
> > 
> > James Bottomley wrote:
> > > So your theory is that the data the kernel sees doing the page copy can
> > > be stale because of dirty cache lines in userspace (which is certainly
> > > possible in the ordinary way)?
> > 
> > Yes.
> > 
> > > By design that shouldn't happen: the idea behind COW breaking is
> > > that before it breaks, the page is read only ... this means that
> > > processes can have clean cache copies of it, but never dirty cache
> > > copies (because writes are forbidden).
> > 
> > That must be design, I agree.
> > 
> > To keep this condition (no dirty cache for COW page), we need to flush
> > cache before ptep_set_wrprotect.  That's my point.
> > 
> > Please look at the code path:
> >    (kernel/fork.c)
> >    do_fork -> copy_process -> copy_mm -> dup_mm -> dup_mmap ->
> >    (mm/memory.c)
> >    copy_page_range -> copy_p*d_range -> copy_one_pte -> ptep_set_wrprotect
> > 
> > The function flush_cache_dup_mm is called from dup_mmap, that's enough
> > for a case of a process with single thread.
> > I think that:
> > We need to flush cache before ptep_set_wrprotect for a process with
> > multiple threads.  Other threads may change memory after a thread
> > invokes do_fork and before calling ptep_set_wrprotect.  Specifically,
> > a process may sleep at pte_alloc function to get a page.
> 
> I agree.  It is interesting that in the case of the Debian bug that
> a thread of the parent process causes the COW break and thereby corrupts
> its own memory.  As far as I can tell, the fork'd child never writes
> to the memory that causes the fault.
> 
> My testing indicates that your suggested change fixes the Debian
> bug.  I've attached below my latest test version.  This seems to fix
> the bug on both SMP and UP kernels.
> 
> However, it doesn't fix all page/cache related issues on parisc
> SMP kernels that I commonly see.
> 
> My first inclination after even before reading your analysis was
> to assume that copy_user_page was broken (i.e, that even if a
> processor cache was dirty when the COW page was write protected,
> it should be possible to do the flush before the page is copied).
> However, this didn't seem to work...  Possibly, there are issues
> with aliased addresses.
> 
> I note that sparc flushes the entire cache and purges the entire
> tlb in kmap_atomic/kunmap_atomic for highmem.  Although the breakage
> that I see is not limited to PA8800/PA8900, I'm not convinced
> that we maintain coherency that is required for these processors
> in copy_user_page when we have multiple threads.
> 
> As a side note, kmap_atomic/kunmap_atomic seem to lack calls to
> pagefault_disable()/pagefault_enable() on PA8800.
> 
> Dave
> -- 
> J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
> National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)
> 
> diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
> index a27d2e2..b140d5c 100644
> --- a/arch/parisc/include/asm/pgtable.h
> +++ b/arch/parisc/include/asm/pgtable.h
> @@ -14,6 +14,7 @@
>  #include <linux/bitops.h>
>  #include <asm/processor.h>
>  #include <asm/cache.h>
> +extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn);
>  
>  /*
>   * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
> @@ -456,17 +457,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
>  	return old_pte;
>  }
>  
> -static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> +static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>  {
>  #ifdef CONFIG_SMP
>  	unsigned long new, old;
> +#endif
> +	pte_t old_pte = *ptep;
> +
> +	if (atomic_read(&mm->mm_users) > 1)

Just to verify there's nothing this is hiding, can you make this 

	if (pte_dirty(old_pte))

and reverify?  The if clause should only trip on the case where the
parent has dirtied the line between flush and now.

> +		flush_cache_page(vma, addr, pte_pfn(old_pte));
>  
> +#ifdef CONFIG_SMP
>  	do {
>  		old = pte_val(*ptep);
>  		new = pte_val(pte_wrprotect(__pte (old)));
>  	} while (cmpxchg((unsigned long *) ptep, old, new) != old);
>  #else
> -	pte_t old_pte = *ptep;
>  	set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
>  #endif
>  }
> diff --git a/mm/memory.c b/mm/memory.c
> index 09e4b1b..21c2916 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -616,7 +616,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  	 * in the parent and the child
>  	 */
>  	if (is_cow_mapping(vm_flags)) {
> -		ptep_set_wrprotect(src_mm, addr, src_pte);
> +		ptep_set_wrprotect(vma, src_mm, addr, src_pte);

So this is going to be a hard sell because of the arch churn. There are,
however, three ways to do it with the original signature.

     1. implement copy_user_highpage ... this allows us to copy through
        the child's page cache (which is coherent with the parent's
        before the cow) and thus pick up any cache changes without a
        flush
     2. use the mm identically to flush_user_cache_page_noncurrent.  The
        only reason that needs the vma is for the icache check ... but
        that shouldn't happen here (if the parent is actually doing a
        self modifying exec region, it needs to manage coherency
        itself).
     3. Flush in kmap ... this is something that's been worrying me
        since the flamewars over kmap for pio.

James






Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Tue, 06 Apr 2010 05:03:03 GMT)

Acknowledgement sent to NIIBE Yutaka <gniibe@fsij.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Tue, 06 Apr 2010 05:03:04 GMT)

Message #159 received at 561203@bugs.debian.org:

From: NIIBE Yutaka <gniibe@fsij.org>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: John David Anglin <dave@hiauly1.hia.nrc.ca>, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Tue, 06 Apr 2010 13:57:07 +0900
John David Anglin wrote:
> It is interesting that in the case of the Debian bug that
> a thread of the parent process causes the COW break and thereby corrupts
> its own memory.  As far as I can tell, the fork'd child never writes
> to the memory that causes the fault.

Thanks for writing and testing a patch.

The case in #561203 is the second scenario.  I think that this case is
also relevant to VIVT-WB machines (provided the kernel does the copy
through the kernel address).

James Bottomley wrote:
> So this is going to be a hard sell because of the arch churn. There are,
> however, three ways to do it with the original signature.

Currently, I think that a signature change would be inevitable for
ptep_set_wrprotect.

>      1. implement copy_user_highpage ... this allows us to copy through
>         the child's page cache (which is coherent with the parent's
>         before the cow) and thus pick up any cache changes without a
>         flush

Let me think about this approach.

It would improve both my first scenario and the second scenario.

But... I think that even if we had a copy_user_highpage that copies
through the user address, we would still need to flush at
ptep_set_wrprotect.  I think that we need to keep the invariant: no
dirty cache lines for a COW page.

Consider a third scenario of threads and fork:

(1) In process A, there are multiple threads, and thread A-1 invokes
    fork.  This creates process B, with a different space identifier
    color.

(2) Another thread, A-2, in process A runs while A-1 copies memory in
    dup_mmap.  A-2 writes to address <x> in a page.  Let's call
    this page <oldpage>.

(3) There is a dirty cache line for <x>, written by A-2, at the time
    A-1 calls ptep_set_wrprotect.  Suppose that we don't flush
    here.

(4) A-1 finishes the copy and sleeps.

(5) Child process B is woken up and sees the old value at <x> in
    <oldpage>, through a different cache line.  B sleeps.

(6) A-2 is woken up.  A-2 touches the memory again and breaks COW.  A-2
    copies the data in <oldpage> to <newpage>.  OK, <newpage> is
    consistent if copy_user_highpage copies through the user address.

    Note that during this copy, A-2's cache line for <x> is
    flushed out to <oldpage>.  This causes another memory fault and
    COW break.  (I think that this memory fault is unhealthy.)
    Then the new value reaches <x> in <oldpage> (given a physically
    tagged cache).

    A-2 sleeps.

(7) Child process B is woken up.  When it accesses <x>, it suddenly
    sees the new value.


If we flush the cache to <oldpage> at ptep_set_wrprotect, this could
not occur.


			*	*	*


I know that we should not do "threads and fork".  It is difficult to
define clean semantics.  Because another thread may touch memory while
the forking thread copies memory, the memory the child process sees
may be inconsistent.  For the child, one page might be new while
another page might be old.

For VIVT-WB cache machines, I am considering the possibility that the
child process sees inconsistent memory even within a single page
(when we have no flush at ptep_set_wrprotect).

I will need to bring this up on linux-arch sooner or later.
-- 




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Tue, 06 Apr 2010 13:42:08 GMT)

Acknowledgement sent to James Bottomley <James.Bottomley@HansenPartnership.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Tue, 06 Apr 2010 13:42:08 GMT)

Message #164 received at 561203@bugs.debian.org:

From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: NIIBE Yutaka <gniibe@fsij.org>
Cc: John David Anglin <dave@hiauly1.hia.nrc.ca>, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Tue, 06 Apr 2010 08:37:49 -0500
On Tue, 2010-04-06 at 13:57 +0900, NIIBE Yutaka wrote:
> John David Anglin wrote:
> > It is interesting that in the case of the Debian bug that
> > a thread of the parent process causes the COW break and thereby corrupts
> > its own memory.  As far as I can tell, the fork'd child never writes
> > to the memory that causes the fault.
> 
> Thanks for writing and testing a patch.
> 
> The case of #561203 is second scenario.  I think that this case is
> relevant to VIVT-WB machine too (provided kernel does copy by kernel
> address).
> 
> James Bottomley wrote:
> > So this is going to be a hard sell because of the arch churn. There are,
> > however, three ways to do it with the original signature.
> 
> Currently, I think that signature change would be inevitable for
> ptep_set_wrprotect.

Well we can't do it by claiming several architectures are wrong in their
implementation.  We might do it by claiming to need vma knowledge ...
however, even if you want the flush, as I said, you don't need to change
the signature.

> >      1. implement copy_user_highpage ... this allows us to copy through
> >         the child's page cache (which is coherent with the parent's
> >         before the cow) and thus pick up any cache changes without a
> >         flush
> 
> Let me think about this way.
> 
> Well, this would improve both cases of the first scenario of mine and
> the second scenario.
> 
> But... I think that even if we would have copy_user_highpage which
> does copy by user address, we need to flush at ptep_set_wrprotect.  I
> think that we need to keep the condition: no dirty cache for COW page.
> 
> Think about third scenario of threads and fork:
> 
> (1) In process A, there are multiple threads, and a thread A-1 invokes
>     fork.  We have process B, with a different space identifier color.

I don't understand what you mean by space colour ... there's cache
colour, which refers to the line in the cache to which the physical
memory maps.  The way PA is set up, space ID doesn't factor into cache
colour.

> (2) Another thread A-2 in process A runs while A-1 copies memory by
>     dup_mmap.  A-2 writes to the address <x> in a page.  Let's call
>     this page <oldpage>.
> 
> (3) We have dirty cache for <x> by A-2 at the time of
>     ptep_set_wrprotect of thread A-1.  Suppose that we don't flush
>     here.
> 
> (4) A-1 finishes copy, and sleeps.
> 
> (5) Child process B is waken up and sees old value at <x> in <oldpage>,
>     through different cache line.  B sleeps.

This isn't possible.  At this point, A and B have the same virtual
address and mapping for <oldpage>; this means they are the same cache
colour, so they both see the cached value.

James

> (6) A-2 is waken up.  A-2 touches the memory again, breaks COW.  A-2
>     copies data on <oldpage> to <newpage>.  OK, <newpage> is
>     consistent with copy_user_highpage by user address.
> 
>     Note that during this copy, the cache line of <x> by A-2 is
>     flushed out to <oldpage>.  It invokes another memory fault and COW
>     break.  (I think that this memory fault is unhealthy.)
>     Then, new value goes to <x> on <oldpage> (when it's physically
>     tagged cache).
> 
>     A-2 sleeps.
> 
> (7) Child process B is waken up.  When it accesses at <x>, it sees new
>     value suddenly.
> 
> 
> If we flush cache to <oldpage> at ptep_set_wrprotect, this couldn't
> occur.
> 
> 
> 			*	*	*
> 
> 
> I know that we should not do "threads and fork".  It is difficult to
> define clean semantics.  Because another thread may touch memory while
> the forking thread is copying memory, the memory the child process
> sees may be inconsistent.  For the child, one page might be new while
> another page might be old.
> 
> For a VIVT-WB cache machine, I am considering the possibility that the
> child process has inconsistent memory even within a single page
> (when we have no flush at ptep_set_wrprotect).
> 
> I will need to raise this on linux-arch sooner or later.
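The "threads and fork" pattern itself can be exercised from user space. The sketch below is hypothetical (it is not the minifail test case from this bug) and assumes a POSIX system with pthreads and C11 atomics: one thread keeps incrementing a shared counter, playing A-2, while the calling thread forks repeatedly, playing A-1. Each child reads the counter twice, a millisecond apart, and exits non-zero if its private copy changed, which correct copy-on-write semantics should never allow.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Shared counter on an ordinary data page: <x> on <oldpage>. */
static atomic_ulong counter;
static atomic_int stop_writer;

/* Plays thread A-2: keeps dirtying the page while the fork happens. */
static void *writer(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_writer))
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    return NULL;
}

/* Plays thread A-1: forks `trials` children while the writer runs.
 * Returns the number of children whose snapshot changed (or that
 * could not be forked), or -1 if the writer thread failed to start. */
static int fork_while_writing(int trials)
{
    pthread_t tid;
    int bad = 0;

    atomic_store(&stop_writer, 0);
    if (pthread_create(&tid, NULL, writer, NULL) != 0)
        return -1;

    for (int i = 0; i < trials; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            bad++;
            break;
        }
        if (pid == 0) {
            /* Child B: only async-signal-safe calls after fork(). */
            unsigned long first = atomic_load(&counter);
            struct timespec ts = { 0, 1000000 };   /* 1 ms */
            nanosleep(&ts, NULL);
            unsigned long second = atomic_load(&counter);
            _exit(first == second ? 0 : 1);
        }
        int status;
        if (waitpid(pid, &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            bad++;
    }

    atomic_store(&stop_writer, 1);
    pthread_join(tid, NULL);
    return bad;
}
```

On a system with working copy-on-write, `fork_while_writing(100)` is expected to return 0. A non-zero count would mean some child observed its own copy of the counter change after fork().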






Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Tue, 06 Apr 2010 13:48:06 GMT)

Acknowledgement sent to James Bottomley <James.Bottomley@HansenPartnership.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Tue, 06 Apr 2010 13:48:07 GMT)

Message #169 received at 561203@bugs.debian.org:

From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: NIIBE Yutaka <gniibe@fsij.org>
Cc: John David Anglin <dave@hiauly1.hia.nrc.ca>, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Tue, 06 Apr 2010 08:44:41 -0500
On Tue, 2010-04-06 at 08:37 -0500, James Bottomley wrote:
> > (5) Child process B is woken up and sees old value at <x> in <oldpage>,
> >     through different cache line.  B sleeps.
> 
> This isn't possible.  At this point, A and B have the same virtual
> address and mapping for <oldpage>; this means they are the same cache
> colour, so they both see the cached value.

To add more detail to this: in spite of what the arch manual
says (it says the congruence stride is 16MB), the congruence stride on
all manufactured parisc processors is 4MB.  This means that any virtual
addresses, regardless of space id, that are equal modulo 4MB have the
same cache colour.
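The rule stated above can be written down directly. A minimal sketch, assuming the 4MB congruence stride quoted for manufactured PA-RISC processors (the manual's 16MB would only change the constant): two virtual addresses share a cache colour exactly when they are equal modulo the stride, and the space ID takes no part. `same_cache_colour()` and `PA_CONGRUENCE_STRIDE` are hypothetical names for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Congruence stride quoted above for manufactured PA-RISC CPUs. */
#define PA_CONGRUENCE_STRIDE (4UL * 1024 * 1024)

/* The space IDs are accepted only to show that they are ignored. */
static bool same_cache_colour(uint32_t space_a, uintptr_t va_a,
                              uint32_t space_b, uintptr_t va_b)
{
    (void)space_a;
    (void)space_b;
    return (va_a % PA_CONGRUENCE_STRIDE) == (va_b % PA_CONGRUENCE_STRIDE);
}
```

Because a forked child maps <oldpage> at the same virtual address as its parent, this predicate is true for the parent/child pair whatever their space IDs, which is the basis of the objection to step (5).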

James
 





Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Thu, 08 Apr 2010 21:18:07 GMT)

Acknowledgement sent to Helge Deller <deller@gmx.de>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Apr 2010 21:18:07 GMT)

Message #174 received at 561203@bugs.debian.org:

From: Helge Deller <deller@gmx.de>
To: John David Anglin <dave.anglin@nrc-cnrc.gc.ca>
Cc: John David Anglin <dave@hiauly1.hia.nrc.ca>, NIIBE Yutaka <gniibe@fsij.org>, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Thu, 08 Apr 2010 23:11:41 +0200
On 04/02/2010 09:35 PM, John David Anglin wrote:
> On Fri, 02 Apr 2010, NIIBE Yutaka wrote:
> 
>> NIIBE Yutaka wrote:
>>> To have same semantics as other archs, I think that VIPT-WB cache
>>> machine should have cache flush at ptep_set_wrprotect, so that memory
>>> of the page has up-to-date data.  Yes, it will be huge performance
>>> impact for fork.  But I don't find any good solution other than this
>>> yet.
>>
>> I think we could do something like (only for VIPT-WB cache machine):
>>
>> -	static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep)
>>
>> +	static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>> 	{
>> 		pte_t old_pte = *ptep;
>> +		if (atomic_read(&mm->mm_users) > 1)
>> +			flush_cache_page(vma, addr, pte_pfn(old_pte));
>> 		set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
>> 	}
> 
> I tested the hack below on two machines currently running 2.6.33.2
> UP kernels.  The change seems to fix Debian #561203 (minifail bug)!
> Thus, I definitely think you are on the right track.  I'll continue
> to test.
> 
> I suspect the same issue is present for SMP kernels.

Hi Dave,

I tested your patch today on one of my machines with plain kernel 2.6.33 (32bit, SMP, B2000 I think).
Sadly I still did see the minifail bug.

Are you sure, that the patch fixed this bug for you?

Helge

do_page_fault() pid=21470 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=7986 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=19952 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=13549 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=21862 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=4615 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=17336 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=21986 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=2157 command='minifail3' type=15 address=0x000000dc
do_page_fault() pid=23886 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=2681 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=3229 command='minifail3' type=15 address=0x000000ec
do_page_fault() pid=26095 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=20722 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=19912 command='minifail3' type=15 address=0x000000ec
...
pagealloc: memory corruption
7db0c780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
7db0c790: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
7db0c7a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
7db0c7b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Backtrace:
 [<1011ec14>] show_stack+0x18/0x28
 [<10117ba0>] dump_stack+0x1c/0x2c
 [<101c6594>] kernel_map_pages+0x2a0/0x2b8
 [<1019e6c8>] get_page_from_freelist+0x3d4/0x614
 [<1019ea3c>] __alloc_pages_nodemask+0x134/0x610
 [<101b1d20>] do_wp_page+0x268/0xac0
 [<101b3b34>] handle_mm_fault+0x4d4/0x7c4
 [<1011d854>] do_page_fault+0x1f8/0x2fc
 [<1011f450>] handle_interruption+0xec/0x730
 [<10103078>] intr_check_sig+0x0/0x34
...
do_page_fault() pid=13414 command='minifail3' type=15 address=0x000000dc
do_page_fault() pid=22776 command='minifail3' type=15 address=0x00000000
do_page_fault() pid=26290 command='minifail3' type=15 address=0x000000ec
do_page_fault() pid=1399 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=16130 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=26401 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=3383 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=3400 command='minifail3' type=15 address=0x00000004
do_page_fault() pid=18659 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=3730 command='minifail3' type=6 address=0x00000003
do_page_fault() pid=28828 command='minifail3' type=6 address=0x00000003




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Thu, 08 Apr 2010 21:57:10 GMT)

Acknowledgement sent to John David Anglin <dave.anglin@nrc-cnrc.gc.ca>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Apr 2010 21:57:10 GMT)

Message #179 received at 561203@bugs.debian.org:

From: John David Anglin <dave@hiauly1.hia.nrc.ca>
To: Helge Deller <deller@gmx.de>
Cc: John David Anglin <dave.anglin@nrc-cnrc.gc.ca>, NIIBE Yutaka <gniibe@fsij.org>, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Thu, 8 Apr 2010 17:54:54 -0400
On Thu, 08 Apr 2010, Helge Deller wrote:

> On 04/02/2010 09:35 PM, John David Anglin wrote:
> > On Fri, 02 Apr 2010, NIIBE Yutaka wrote:
> > 
> >> NIIBE Yutaka wrote:
> >>> To have same semantics as other archs, I think that VIPT-WB cache
> >>> machine should have cache flush at ptep_set_wrprotect, so that memory
> >>> of the page has up-to-date data.  Yes, it will be huge performance
> >>> impact for fork.  But I don't find any good solution other than this
> >>> yet.
> >>
> >> I think we could do something like (only for VIPT-WB cache machine):
> >>
> >> -	static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep)
> >>
> >> +	static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> >> 	{
> >> 		pte_t old_pte = *ptep;
> >> +		if (atomic_read(&mm->mm_users) > 1)
> >> +			flush_cache_page(vma, addr, pte_pfn(old_pte));
> >> 		set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
> >> 	}
> > 
> > I tested the hack below on two machines currently running 2.6.33.2
> > UP kernels.  The change seems to fix Debian #561203 (minifail bug)!
> > Thus, I definitely think you are on the right track.  I'll continue
> > to test.
> > 
> > I suspect the same issue is present for SMP kernels.
> 
> Hi Dave,
> 
> I tested your patch today on one of my machines with plain kernel 2.6.33 (32bit, SMP, B2000 I think).
> Sadly I still did see the minifail bug.
> 
> Are you sure, that the patch fixed this bug for you?

Seemed to, but I have a bunch of other changes installed.  Possibly,
the change to cacheflush.h is important.  It affects all PA8000.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

diff --git a/arch/parisc/hpux/wrappers.S b/arch/parisc/hpux/wrappers.S
index 58c53c8..bdcea33 100644
--- a/arch/parisc/hpux/wrappers.S
+++ b/arch/parisc/hpux/wrappers.S
@@ -88,7 +88,7 @@ ENTRY(hpux_fork_wrapper)
 
 	STREG	%r2,-20(%r30)
 	ldo	64(%r30),%r30
-	STREG	%r2,PT_GR19(%r1)	;! save for child
+	STREG	%r2,PT_SYSCALL_RP(%r1)	;! save for child
 	STREG	%r30,PT_GR21(%r1)	;! save for child
 
 	LDREG	PT_GR30(%r1),%r25
@@ -132,7 +132,7 @@ ENTRY(hpux_child_return)
 	bl,n	schedule_tail, %r2
 #endif
 
-	LDREG	TASK_PT_GR19-TASK_SZ_ALGN-128(%r30),%r2
+	LDREG	TASK_PT_SYSCALL_RP-TASK_SZ_ALGN-128(%r30),%r2
 	b fork_return
 	copy %r0,%r28
 ENDPROC(hpux_child_return)
diff --git a/arch/parisc/include/asm/atomic.h b/arch/parisc/include/asm/atomic.h
index 716634d..d7fabc4 100644
--- a/arch/parisc/include/asm/atomic.h
+++ b/arch/parisc/include/asm/atomic.h
@@ -24,29 +24,46 @@
  * Hash function to index into a different SPINLOCK.
  * Since "a" is usually an address, use one spinlock per cacheline.
  */
-#  define ATOMIC_HASH_SIZE 4
-#  define ATOMIC_HASH(a) (&(__atomic_hash[ (((unsigned long) (a))/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
+#  define ATOMIC_HASH_SIZE (4096/L1_CACHE_BYTES)  /* 4 */
+#  define ATOMIC_HASH(a)      (&(__atomic_hash[ (((unsigned long) (a))/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
+#  define ATOMIC_USER_HASH(a) (&(__atomic_user_hash[ (((unsigned long) (a))/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
 
 extern arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned;
+extern arch_spinlock_t __atomic_user_hash[ATOMIC_HASH_SIZE] __lock_aligned;
 
 /* Can't use raw_spin_lock_irq because of #include problems, so
  * this is the substitute */
-#define _atomic_spin_lock_irqsave(l,f) do {	\
-	arch_spinlock_t *s = ATOMIC_HASH(l);		\
+#define _atomic_spin_lock_irqsave_template(l,f,hash_func) do {	\
+	arch_spinlock_t *s = hash_func;		\
 	local_irq_save(f);			\
 	arch_spin_lock(s);			\
 } while(0)
 
-#define _atomic_spin_unlock_irqrestore(l,f) do {	\
-	arch_spinlock_t *s = ATOMIC_HASH(l);			\
+#define _atomic_spin_unlock_irqrestore_template(l,f,hash_func) do {	\
+	arch_spinlock_t *s = hash_func;			\
 	arch_spin_unlock(s);				\
 	local_irq_restore(f);				\
 } while(0)
 
+/* kernel memory locks */
+#define _atomic_spin_lock_irqsave(l,f)	\
+	_atomic_spin_lock_irqsave_template(l,f,ATOMIC_HASH(l))
+
+#define _atomic_spin_unlock_irqrestore(l,f)	\
+	_atomic_spin_unlock_irqrestore_template(l,f,ATOMIC_HASH(l))
+
+/* userspace memory locks */
+#define _atomic_spin_lock_irqsave_user(l,f)	\
+	_atomic_spin_lock_irqsave_template(l,f,ATOMIC_USER_HASH(l))
+
+#define _atomic_spin_unlock_irqrestore_user(l,f)	\
+	_atomic_spin_unlock_irqrestore_template(l,f,ATOMIC_USER_HASH(l))
 
 #else
 #  define _atomic_spin_lock_irqsave(l,f) do { local_irq_save(f); } while (0)
 #  define _atomic_spin_unlock_irqrestore(l,f) do { local_irq_restore(f); } while (0)
+#  define _atomic_spin_lock_irqsave_user(l,f) _atomic_spin_lock_irqsave(l,f)
+#  define _atomic_spin_unlock_irqrestore_user(l,f) _atomic_spin_unlock_irqrestore(l,f)
 #endif
 
 /* This should get optimized out since it's never called.
diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 7a73b61..ab87176 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -2,6 +2,7 @@
 #define _PARISC_CACHEFLUSH_H
 
 #include <linux/mm.h>
+#include <linux/uaccess.h>
 
 /* The usual comment is "Caches aren't brain-dead on the <architecture>".
  * Unfortunately, that doesn't apply to PA-RISC. */
@@ -113,11 +114,20 @@ static inline void *kmap(struct page *page)
 
 #define kunmap(page)			kunmap_parisc(page_address(page))
 
-#define kmap_atomic(page, idx)		page_address(page)
+static inline void *kmap_atomic(struct page *page, enum km_type idx)
+{
+	pagefault_disable();
+	return page_address(page);
+}
 
-#define kunmap_atomic(addr, idx)	kunmap_parisc(addr)
+static inline void kunmap_atomic(void *addr, enum km_type idx)
+{
+	kunmap_parisc(addr);
+	pagefault_enable();
+}
 
-#define kmap_atomic_pfn(pfn, idx)	page_address(pfn_to_page(pfn))
+#define kmap_atomic_prot(page, idx, prot)	kmap_atomic(page, idx)
+#define kmap_atomic_pfn(pfn, idx)	kmap_atomic(pfn_to_page(pfn), (idx))
 #define kmap_atomic_to_page(ptr)	virt_to_page(ptr)
 #endif
 
diff --git a/arch/parisc/include/asm/futex.h b/arch/parisc/include/asm/futex.h
index 0c705c3..7bc963e 100644
--- a/arch/parisc/include/asm/futex.h
+++ b/arch/parisc/include/asm/futex.h
@@ -55,6 +55,7 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
 {
 	int err = 0;
 	int uval;
+	unsigned long flags;
 
 	/* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is
 	 * our gateway page, and causes no end of trouble...
@@ -65,10 +66,15 @@ futex_atomic_cmpxchg_inatomic(int __user *uaddr, int oldval, int newval)
 	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(int)))
 		return -EFAULT;
 
+	_atomic_spin_lock_irqsave_user(uaddr, flags);
+
 	err = get_user(uval, uaddr);
-	if (err) return -EFAULT;
-	if (uval == oldval)
-		err = put_user(newval, uaddr);
+	if (!err)
+		if (uval == oldval)
+			err = put_user(newval, uaddr);
+
+	_atomic_spin_unlock_irqrestore_user(uaddr, flags);
+
 	if (err) return -EFAULT;
 	return uval;
 }
diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index a27d2e2..53ba987 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -14,6 +14,7 @@
 #include <linux/bitops.h>
 #include <asm/processor.h>
 #include <asm/cache.h>
+extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn);
 
 /*
  * kern_addr_valid(ADDR) tests if ADDR is pointing to valid kernel
@@ -456,17 +457,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 	return old_pte;
 }
 
-static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+static inline void ptep_set_wrprotect(struct vm_area_struct *vma, struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
 #ifdef CONFIG_SMP
 	unsigned long new, old;
+#endif
+	pte_t old_pte = *ptep;
+
+	if (pte_dirty(old_pte))
+		flush_cache_page(vma, addr, pte_pfn(old_pte));
 
+#ifdef CONFIG_SMP
 	do {
 		old = pte_val(*ptep);
 		new = pte_val(pte_wrprotect(__pte (old)));
 	} while (cmpxchg((unsigned long *) ptep, old, new) != old);
 #else
-	pte_t old_pte = *ptep;
 	set_pte_at(mm, addr, ptep, pte_wrprotect(old_pte));
 #endif
 }
diff --git a/arch/parisc/include/asm/system.h b/arch/parisc/include/asm/system.h
index d91357b..4653c77 100644
--- a/arch/parisc/include/asm/system.h
+++ b/arch/parisc/include/asm/system.h
@@ -160,7 +160,7 @@ static inline void set_eiem(unsigned long val)
    ldcd). */
 
 #define __PA_LDCW_ALIGNMENT	4
-#define __ldcw_align(a) ((volatile unsigned int *)a)
+#define __ldcw_align(a) (&(a)->slock)
 #define __LDCW	"ldcw,co"
 
 #endif /*!CONFIG_PA20*/
diff --git a/arch/parisc/kernel/asm-offsets.c b/arch/parisc/kernel/asm-offsets.c
index ec787b4..b2f35b2 100644
--- a/arch/parisc/kernel/asm-offsets.c
+++ b/arch/parisc/kernel/asm-offsets.c
@@ -137,6 +137,7 @@ int main(void)
 	DEFINE(TASK_PT_IAOQ0, offsetof(struct task_struct, thread.regs.iaoq[0]));
 	DEFINE(TASK_PT_IAOQ1, offsetof(struct task_struct, thread.regs.iaoq[1]));
 	DEFINE(TASK_PT_CR27, offsetof(struct task_struct, thread.regs.cr27));
+	DEFINE(TASK_PT_SYSCALL_RP, offsetof(struct task_struct, thread.regs.pad0));
 	DEFINE(TASK_PT_ORIG_R28, offsetof(struct task_struct, thread.regs.orig_r28));
 	DEFINE(TASK_PT_KSP, offsetof(struct task_struct, thread.regs.ksp));
 	DEFINE(TASK_PT_KPC, offsetof(struct task_struct, thread.regs.kpc));
@@ -225,6 +226,7 @@ int main(void)
 	DEFINE(PT_IAOQ0, offsetof(struct pt_regs, iaoq[0]));
 	DEFINE(PT_IAOQ1, offsetof(struct pt_regs, iaoq[1]));
 	DEFINE(PT_CR27, offsetof(struct pt_regs, cr27));
+	DEFINE(PT_SYSCALL_RP, offsetof(struct pt_regs, pad0));
 	DEFINE(PT_ORIG_R28, offsetof(struct pt_regs, orig_r28));
 	DEFINE(PT_KSP, offsetof(struct pt_regs, ksp));
 	DEFINE(PT_KPC, offsetof(struct pt_regs, kpc));
@@ -290,5 +292,11 @@ int main(void)
 	BLANK();
 	DEFINE(ASM_PDC_RESULT_SIZE, NUM_PDC_RESULT * sizeof(unsigned long));
 	BLANK();
+
+#ifdef CONFIG_SMP
+	DEFINE(ASM_ATOMIC_HASH_SIZE_SHIFT, __builtin_ffs(ATOMIC_HASH_SIZE)-1);
+	DEFINE(ASM_ATOMIC_HASH_ENTRY_SHIFT, __builtin_ffs(sizeof(__atomic_hash[0]))-1);
+#endif
+
 	return 0;
 }
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 3a44f7f..a7e9472 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -364,32 +364,6 @@
 	.align		32
 	.endm
 
-	/* The following are simple 32 vs 64 bit instruction
-	 * abstractions for the macros */
-	.macro		EXTR	reg1,start,length,reg2
-#ifdef CONFIG_64BIT
-	extrd,u		\reg1,32+(\start),\length,\reg2
-#else
-	extrw,u		\reg1,\start,\length,\reg2
-#endif
-	.endm
-
-	.macro		DEP	reg1,start,length,reg2
-#ifdef CONFIG_64BIT
-	depd		\reg1,32+(\start),\length,\reg2
-#else
-	depw		\reg1,\start,\length,\reg2
-#endif
-	.endm
-
-	.macro		DEPI	val,start,length,reg
-#ifdef CONFIG_64BIT
-	depdi		\val,32+(\start),\length,\reg
-#else
-	depwi		\val,\start,\length,\reg
-#endif
-	.endm
-
 	/* In LP64, the space contains part of the upper 32 bits of the
 	 * fault.  We have to extract this and place it in the va,
 	 * zeroing the corresponding bits in the space register */
@@ -442,19 +416,19 @@
 	 */
 	.macro		L2_ptep	pmd,pte,index,va,fault
 #if PT_NLEVELS == 3
-	EXTR		\va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
+	extru		\va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
 #else
-	EXTR		\va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
+	extru		\va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
 #endif
-	DEP             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
+	dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
 	copy		%r0,\pte
 	ldw,s		\index(\pmd),\pmd
 	bb,>=,n		\pmd,_PxD_PRESENT_BIT,\fault
-	DEP		%r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */
+	dep		%r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */
 	copy		\pmd,%r9
 	SHLREG		%r9,PxD_VALUE_SHIFT,\pmd
-	EXTR		\va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
-	DEP		%r0,31,PAGE_SHIFT,\pmd  /* clear offset */
+	extru		\va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
+	dep		%r0,31,PAGE_SHIFT,\pmd  /* clear offset */
 	shladd		\index,BITS_PER_PTE_ENTRY,\pmd,\pmd
 	LDREG		%r0(\pmd),\pte		/* pmd is now pte */
 	bb,>=,n		\pte,_PAGE_PRESENT_BIT,\fault
@@ -605,7 +579,7 @@
 	depdi		0,31,32,\tmp
 #endif
 	copy		\va,\tmp1
-	DEPI		0,31,23,\tmp1
+	depi		0,31,23,\tmp1
 	cmpb,COND(<>),n	\tmp,\tmp1,\fault
 	ldi		(_PAGE_DIRTY|_PAGE_WRITE|_PAGE_READ),\prot
 	depd,z		\prot,8,7,\prot
@@ -758,6 +732,10 @@ ENTRY(__kernel_thread)
 
 	STREG	%r22, PT_GR22(%r1)	/* save r22 (arg5) */
 	copy	%r0, %r22		/* user_tid */
+	copy	%r0, %r21		/* child_tid */
+#else
+	stw	%r0, -52(%r30)	     	/* user_tid */
+	stw	%r0, -56(%r30)	     	/* child_tid */
 #endif
 	STREG	%r26, PT_GR26(%r1)  /* Store function & argument for child */
 	STREG	%r25, PT_GR25(%r1)
@@ -765,7 +743,7 @@ ENTRY(__kernel_thread)
 	ldo	CLONE_VM(%r26), %r26   /* Force CLONE_VM since only init_mm */
 	or	%r26, %r24, %r26      /* will have kernel mappings.	 */
 	ldi	1, %r25			/* stack_start, signals kernel thread */
-	stw	%r0, -52(%r30)	     	/* user_tid */
+	ldi	0, %r23			/* child_stack_size */
 #ifdef CONFIG_64BIT
 	ldo	-16(%r30),%r29		/* Reference param save area */
 #endif
@@ -972,7 +950,10 @@ intr_check_sig:
 	BL	do_notify_resume,%r2
 	copy	%r16, %r26			/* struct pt_regs *regs */
 
-	b,n	intr_check_sig
+	mfctl   %cr30,%r16		/* Reload */
+	LDREG	TI_TASK(%r16), %r16	/* thread_info -> task_struct */
+	b	intr_check_sig
+	ldo	TASK_REGS(%r16),%r16
 
 intr_restore:
 	copy            %r16,%r29
@@ -997,13 +978,6 @@ intr_restore:
 
 	rfi
 	nop
-	nop
-	nop
-	nop
-	nop
-	nop
-	nop
-	nop
 
 #ifndef CONFIG_PREEMPT
 # define intr_do_preempt	intr_restore
@@ -1026,14 +1000,12 @@ intr_do_resched:
 	ldo	-16(%r30),%r29		/* Reference param save area */
 #endif
 
-	ldil	L%intr_check_sig, %r2
-#ifndef CONFIG_64BIT
-	b	schedule
-#else
-	load32	schedule, %r20
-	bv	%r0(%r20)
-#endif
-	ldo	R%intr_check_sig(%r2), %r2
+	BL	schedule,%r2
+	nop
+	mfctl   %cr30,%r16		/* Reload */
+	LDREG	TI_TASK(%r16), %r16	/* thread_info -> task_struct */
+	b	intr_check_sig
+	ldo	TASK_REGS(%r16),%r16
 
 	/* preempt the current task on returning to kernel
 	 * mode from an interrupt, iff need_resched is set,
@@ -1772,9 +1744,9 @@ ENTRY(sys_fork_wrapper)
 	ldo	-16(%r30),%r29		/* Reference param save area */
 #endif
 
-	/* These are call-clobbered registers and therefore
-	   also syscall-clobbered (we hope). */
-	STREG	%r2,PT_GR19(%r1)	/* save for child */
+	STREG	%r2,PT_SYSCALL_RP(%r1)
+
+	/* WARNING - Clobbers r21, userspace must save! */
 	STREG	%r30,PT_GR21(%r1)
 
 	LDREG	PT_GR30(%r1),%r25
@@ -1804,7 +1776,7 @@ ENTRY(child_return)
 	nop
 
 	LDREG	TI_TASK-THREAD_SZ_ALGN-FRAME_SIZE-FRAME_SIZE(%r30), %r1
-	LDREG	TASK_PT_GR19(%r1),%r2
+	LDREG	TASK_PT_SYSCALL_RP(%r1),%r2
 	b	wrapper_exit
 	copy	%r0,%r28
 ENDPROC(child_return)
@@ -1823,8 +1795,9 @@ ENTRY(sys_clone_wrapper)
 	ldo	-16(%r30),%r29		/* Reference param save area */
 #endif
 
-	/* WARNING - Clobbers r19 and r21, userspace must save these! */
-	STREG	%r2,PT_GR19(%r1)	/* save for child */
+	STREG	%r2,PT_SYSCALL_RP(%r1)
+
+	/* WARNING - Clobbers r21, userspace must save! */
 	STREG	%r30,PT_GR21(%r1)
 	BL	sys_clone,%r2
 	copy	%r1,%r24
@@ -1847,7 +1820,9 @@ ENTRY(sys_vfork_wrapper)
 	ldo	-16(%r30),%r29		/* Reference param save area */
 #endif
 
-	STREG	%r2,PT_GR19(%r1)	/* save for child */
+	STREG	%r2,PT_SYSCALL_RP(%r1)
+
+	/* WARNING - Clobbers r21, userspace must save! */
 	STREG	%r30,PT_GR21(%r1)
 
 	BL	sys_vfork,%r2
@@ -2076,9 +2051,10 @@ syscall_restore:
 	LDREG	TASK_PT_GR31(%r1),%r31	   /* restore syscall rp */
 
 	/* NOTE: We use rsm/ssm pair to make this operation atomic */
+	LDREG   TASK_PT_GR30(%r1),%r1              /* Get user sp */
 	rsm     PSW_SM_I, %r0
-	LDREG   TASK_PT_GR30(%r1),%r30             /* restore user sp */
-	mfsp	%sr3,%r1			   /* Get users space id */
+	copy    %r1,%r30                           /* Restore user sp */
+	mfsp    %sr3,%r1                           /* Get user space id */
 	mtsp    %r1,%sr7                           /* Restore sr7 */
 	ssm     PSW_SM_I, %r0
 
diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
index cb71f3d..84b3239 100644
--- a/arch/parisc/kernel/setup.c
+++ b/arch/parisc/kernel/setup.c
@@ -128,6 +128,14 @@ void __init setup_arch(char **cmdline_p)
 	printk(KERN_INFO "The 32-bit Kernel has started...\n");
 #endif
 
+	/* Consistency check on the size and alignments of our spinlocks */
+#ifdef CONFIG_SMP
+	BUILD_BUG_ON(sizeof(arch_spinlock_t) != __PA_LDCW_ALIGNMENT);
+	BUG_ON((unsigned long)&__atomic_hash[0] & (__PA_LDCW_ALIGNMENT-1));
+	BUG_ON((unsigned long)&__atomic_hash[1] & (__PA_LDCW_ALIGNMENT-1));
+#endif
+	BUILD_BUG_ON((1<<L1_CACHE_SHIFT) != L1_CACHE_BYTES);
+
 	pdc_console_init();
 
 #ifdef CONFIG_64BIT
diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
index f5f9602..68e75ce 100644
--- a/arch/parisc/kernel/syscall.S
+++ b/arch/parisc/kernel/syscall.S
@@ -47,18 +47,17 @@ ENTRY(linux_gateway_page)
 	KILL_INSN
 	.endr
 
-	/* ADDRESS 0xb0 to 0xb4, lws uses 1 insns for entry */
+	/* ADDRESS 0xb0 to 0xb8, lws uses two insns for entry */
 	/* Light-weight-syscall entry must always be located at 0xb0 */
 	/* WARNING: Keep this number updated with table size changes */
 #define __NR_lws_entries (2)
 
 lws_entry:
-	/* Unconditional branch to lws_start, located on the 
-	   same gateway page */
-	b,n	lws_start
+	gate	lws_start, %r0		/* increase privilege */
+	depi	3, 31, 2, %r31		/* Ensure we return into user mode. */
 
-	/* Fill from 0xb4 to 0xe0 */
-	.rept 11
+	/* Fill from 0xb8 to 0xe0 */
+	.rept 10
 	KILL_INSN
 	.endr
 
@@ -423,9 +422,6 @@ tracesys_sigexit:
 
 	*********************************************************/
 lws_start:
-	/* Gate and ensure we return to userspace */
-	gate	.+8, %r0
-	depi	3, 31, 2, %r31	/* Ensure we return to userspace */
 
 #ifdef CONFIG_64BIT
 	/* FIXME: If we are a 64-bit kernel just
@@ -442,7 +438,7 @@ lws_start:
 #endif	
 
         /* Is the lws entry number valid? */
-	comiclr,>>=	__NR_lws_entries, %r20, %r0
+	comiclr,>>	__NR_lws_entries, %r20, %r0
 	b,n	lws_exit_nosys
 
 	/* WARNING: Trashing sr2 and sr3 */
@@ -473,7 +469,7 @@ lws_exit:
 	/* now reset the lowest bit of sp if it was set */
 	xor	%r30,%r1,%r30
 #endif
-	be,n	0(%sr3, %r31)
+	be,n	0(%sr7, %r31)
 
 
 	
@@ -529,7 +525,6 @@ lws_compare_and_swap32:
 #endif
 
 lws_compare_and_swap:
-#ifdef CONFIG_SMP
 	/* Load start of lock table */
 	ldil	L%lws_lock_start, %r20
 	ldo	R%lws_lock_start(%r20), %r28
@@ -572,8 +567,6 @@ cas_wouldblock:
 	ldo	2(%r0), %r28				/* 2nd case */
 	b	lws_exit				/* Contended... */
 	ldo	-EAGAIN(%r0), %r21			/* Spin in userspace */
-#endif
-/* CONFIG_SMP */
 
 	/*
 		prev = *addr;
@@ -601,13 +594,11 @@ cas_action:
 1:	ldw	0(%sr3,%r26), %r28
 	sub,<>	%r28, %r25, %r0
 2:	stw	%r24, 0(%sr3,%r26)
-#ifdef CONFIG_SMP
 	/* Free lock */
 	stw	%r20, 0(%sr2,%r20)
-# if ENABLE_LWS_DEBUG
+#if ENABLE_LWS_DEBUG
 	/* Clear thread register indicator */
 	stw	%r0, 4(%sr2,%r20)
-# endif
 #endif
 	/* Return to userspace, set no error */
 	b	lws_exit
@@ -615,12 +606,10 @@ cas_action:
 
 3:		
 	/* Error occured on load or store */
-#ifdef CONFIG_SMP
 	/* Free lock */
 	stw	%r20, 0(%sr2,%r20)
-# if ENABLE_LWS_DEBUG
+#if ENABLE_LWS_DEBUG
 	stw	%r0, 4(%sr2,%r20)
-# endif
 #endif
 	b	lws_exit
 	ldo	-EFAULT(%r0),%r21	/* set errno */
@@ -672,7 +661,6 @@ ENTRY(sys_call_table64)
 END(sys_call_table64)
 #endif
 
-#ifdef CONFIG_SMP
 	/*
 		All light-weight-syscall atomic operations 
 		will use this set of locks 
@@ -694,8 +682,6 @@ ENTRY(lws_lock_start)
 	.endr
 END(lws_lock_start)
 	.previous
-#endif
-/* CONFIG_SMP for lws_lock_start */
 
 .end
 
diff --git a/arch/parisc/lib/bitops.c b/arch/parisc/lib/bitops.c
index 353963d..bae6a86 100644
--- a/arch/parisc/lib/bitops.c
+++ b/arch/parisc/lib/bitops.c
@@ -15,6 +15,9 @@
 arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned = {
 	[0 ... (ATOMIC_HASH_SIZE-1)]  = __ARCH_SPIN_LOCK_UNLOCKED
 };
+arch_spinlock_t __atomic_user_hash[ATOMIC_HASH_SIZE] __lock_aligned = {
+	[0 ... (ATOMIC_HASH_SIZE-1)]  = __ARCH_SPIN_LOCK_UNLOCKED
+};
 #endif
 
 #ifdef CONFIG_64BIT
diff --git a/kernel/fork.c b/kernel/fork.c
index f88bd98..108b1ed 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -608,7 +608,10 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 			 * We don't check the error code - if userspace has
 			 * not set up a proper pointer then tough luck.
 			 */
+			unsigned long flags;
+			_atomic_spin_lock_irqsave_user(tsk->clear_child_tid, flags);
 			put_user(0, tsk->clear_child_tid);
+			_atomic_spin_unlock_irqrestore_user(tsk->clear_child_tid, flags);
 			sys_futex(tsk->clear_child_tid, FUTEX_WAKE,
 					1, NULL, NULL, 0);
 		}
@@ -1432,8 +1435,12 @@ long do_fork(unsigned long clone_flags,
 
 		nr = task_pid_vnr(p);
 
-		if (clone_flags & CLONE_PARENT_SETTID)
+		if (clone_flags & CLONE_PARENT_SETTID) {
+			unsigned long flags;
+			_atomic_spin_lock_irqsave_user(parent_tidptr, flags);
 			put_user(nr, parent_tidptr);
+			_atomic_spin_unlock_irqrestore_user(parent_tidptr, flags);
+		}
 
 		if (clone_flags & CLONE_VFORK) {
 			p->vfork_done = &vfork;
diff --git a/mm/memory.c b/mm/memory.c
index 09e4b1b..21c2916 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -616,7 +616,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * in the parent and the child
 	 */
 	if (is_cow_mapping(vm_flags)) {
-		ptep_set_wrprotect(src_mm, addr, src_pte);
+		ptep_set_wrprotect(vma, src_mm, addr, src_pte);
 		pte = pte_wrprotect(pte);
 	}
 
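The atomic.h and futex.h hunks above follow a common pattern for CPUs without a compare-and-swap instruction: hash the target address to one of a small table of spinlocks (one lock per cache line, as the ATOMIC_HASH comment says) and do the load, compare and store while holding it. The sketch below shows the same idea in user space, with POSIX mutexes standing in for arch_spinlock_t plus irqsave. The 64-byte line size and all names (`lock_for()`, `emulated_cmpxchg()`) are assumptions for illustration, not the kernel's code.

```c
#include <pthread.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64                         /* assumed L1_CACHE_BYTES */
#define HASH_SIZE        (4096 / CACHE_LINE_BYTES)  /* as in ATOMIC_HASH_SIZE */

static pthread_mutex_t lock_table[HASH_SIZE];
static pthread_once_t lock_table_once = PTHREAD_ONCE_INIT;

static void init_lock_table(void)
{
    for (int i = 0; i < HASH_SIZE; i++)
        pthread_mutex_init(&lock_table[i], NULL);
}

/* Same shape as ATOMIC_HASH(a): one lock per cache line, wrapped
 * around a power-of-two table. */
static pthread_mutex_t *lock_for(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    return &lock_table[(a / CACHE_LINE_BYTES) & (HASH_SIZE - 1)];
}

/* Emulated cmpxchg in the spirit of futex_atomic_cmpxchg_inatomic():
 * returns the value seen before the (possible) store.  It is only
 * atomic with respect to writers that take the same lock. */
static int emulated_cmpxchg(int *p, int oldval, int newval)
{
    pthread_once(&lock_table_once, init_lock_table);
    pthread_mutex_t *m = lock_for(p);

    pthread_mutex_lock(m);
    int seen = *p;
    if (seen == oldval)
        *p = newval;
    pthread_mutex_unlock(m);
    return seen;
}
```

The last comment is the important design constraint: a plain store that bypasses the lock can land between the load and the store of an emulated cmpxchg on the same word. That is presumably why the patch also wraps the put_user() calls in kernel/fork.c in the user lock.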




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Thu, 08 Apr 2010 22:48:03 GMT)

Acknowledgement sent to "John David Anglin" <dave@hiauly1.hia.nrc.ca>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Apr 2010 22:48:03 GMT)

Message #184 received at 561203@bugs.debian.org:

From: "John David Anglin" <dave@hiauly1.hia.nrc.ca>
To: dave.anglin@nrc-cnrc.gc.ca
Cc: deller@gmx.de, gniibe@fsij.org, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Thu, 8 Apr 2010 18:44:45 -0400 (EDT)
> On Thu, 08 Apr 2010, Helge Deller wrote:

> > I tested your patch today on one of my machines with plain kernel 2.6.33 (32bit, SMP, B2000 I think).
> > Sadly I still did see the minifail bug.
> > 
> > Are you sure, that the patch fixed this bug for you?
> 
> Seemed to, but I have a bunch of other changes installed.  Possibly,
> the change to cacheflush.h is important.  It affects all PA8000.

I also think the change suggested by James

+       if (pte_dirty(old_pte))

is important for SMP.  With the patch set that I sent, my rp3440 and
gsyprf11 seem reasonably stable running 2.6.33.2 SMP.  I doubt all
problems are solved but things are a lot better than before.
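The pgtable.h hunk in the patch above combines two ideas: flush the page only when its PTE is dirty (presumably on the reasoning that a page never written through this mapping should hold no dirty lines from it), and, on SMP, clear the write bit with a compare-and-swap retry loop so that a concurrent update to other PTE bits is not overwritten. A user-space sketch with C11 atomics; the bit values and the `toy_*` names are hypothetical and are not the real PA-RISC `_PAGE_*` layout or kernel functions.

```c
#include <stdatomic.h>

/* Hypothetical PTE bits, for illustration only. */
#define TOY_PAGE_WRITE 0x1UL
#define TOY_PAGE_DIRTY 0x2UL

static int flushes;   /* counts calls standing in for flush_cache_page() */

static void toy_flush_cache_page(unsigned long pte)
{
    (void)pte;
    flushes++;
}

static void toy_ptep_set_wrprotect(_Atomic unsigned long *ptep)
{
    unsigned long old = atomic_load(ptep);

    /* Flush only pages that were written through this mapping. */
    if (old & TOY_PAGE_DIRTY)
        toy_flush_cache_page(old);

    /* Clear the write bit.  If another CPU changed the PTE meanwhile,
     * the exchange fails, `old` is reloaded, and we retry. */
    while (!atomic_compare_exchange_weak(ptep, &old, old & ~TOY_PAGE_WRITE))
        ;
}
```

Compared with NIIBE's earlier `mm_users > 1` condition, testing the dirty bit skips the flush for clean pages even in multithreaded processes, which should reduce the cost to fork.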

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)




Added tag(s) sid and squeeze. Request was from Gerfried Fuchs <rhonda@debian.at> to control@bugs.debian.org. (Fri, 09 Apr 2010 08:48:14 GMT)

Added indication that bug 561203 blocks 573991. Request was from Clint Adams <schizo@debian.org> to control@bugs.debian.org. (Sun, 23 May 2010 15:48:11 GMT)

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Wed, 02 Jun 2010 15:39:06 GMT)

Acknowledgement sent to Modestas Vainius <modestas@vainius.eu>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 02 Jun 2010 15:39:06 GMT)

Message #193 received at 561203@bugs.debian.org:

From: Modestas Vainius <modestas@vainius.eu>
To: "John David Anglin" <dave@hiauly1.hia.nrc.ca>
Cc: dave.anglin@nrc-cnrc.gc.ca, deller@gmx.de, gniibe@fsij.org, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Wed, 2 Jun 2010 18:33:12 +0300
[Message part 1 (text/plain, inline)]
Hello,

this bug [1] is back to the "very common" department with eglibc 2.11
(libc6-dev_2.11.1-1) builds. The majority of KDE applications are failing to
build on hppa again. Is there really nothing that could be done to fix it?

1. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203
2. https://buildd.debian.org/fetch.cgi?pkg=kde4libs;ver=4%3A4.4.4-1;arch=hppa;stamp=1275467025
3. https://buildd.debian.org/fetch.cgi?pkg=basket;ver=1.80-1;arch=hppa;stamp=1275483241

-- 
Modestas Vainius <modestas@vainius.eu>
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Wed, 02 Jun 2010 17:27:10 GMT)

Acknowledgement sent to John David Anglin <dave.anglin@nrc-cnrc.gc.ca>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 02 Jun 2010 17:27:10 GMT)

Message #198 received at 561203@bugs.debian.org:

From: John David Anglin <dave@hiauly1.hia.nrc.ca>
To: Modestas Vainius <modestas@vainius.eu>
Cc: dave.anglin@nrc-cnrc.gc.ca, deller@gmx.de, gniibe@fsij.org, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, 561203@bugs.debian.org
Subject: Re: threads and fork on machine with VIPT-WB cache
Date: Wed, 2 Jun 2010 13:16:01 -0400
On Wed, 02 Jun 2010, Modestas Vainius wrote:

> Hello,
> 
> this bug [1] is back to the "very common" department with eglibc 2.11
> (libc6-dev_2.11.1-1) builds. The majority of KDE applications are failing to
> build on hppa again. Is there really nothing that could be done to fix it?

I will just say it is very tricky.  I think a fix is possible (arm and mips
had similar cache problems) but the victim replacement present in PA8800/PA8900
caches makes the problem especially difficult for hardware using these
processors.

I have spent the last few months testing various alternatives and have
now done hundreds of kernel builds.  I did post some experimental patches
that fix the problem on UP kernels.  However, the problem is not resolved
for SMP kernels.

The minifail test is a good one to demonstrate the problem.  Indeed,
a very similar test was given in the thread below:
http://readlist.com/lists/vger.kernel.org/linux-kernel/54/270861.html

This thread also discusses the PA8800 problem:
http://readlist.com/lists/vger.kernel.org/linux-kernel/54/271417.html

I currently surmise that we have a problem with the cache victim
replacement, although the cause isn't clear.  I did find recently
that the cache prefetch in copy_user_page_asm extends to the line
beyond the end of the page, but fixing this doesn't resolve the problem.

I am still experimenting with using equivalent aliasing.  It does
help to flush in ptep_set_wrprotect.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)
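The "equivalent aliasing" mentioned above means keeping every virtual mapping of a physical page congruent modulo the cache's alias boundary, so that in a virtually indexed cache all of those mappings select the same cache lines. The arithmetic can be sketched in C as follows. The 4 MiB boundary is an assumption chosen for illustration (the real value should be taken from the parisc `asm/shmparam.h` header of the kernel in use), and `equivalent_alias` / `next_equivalent_address` are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: 4 MiB alias boundary, for illustration only. */
#define ALIAS_BOUNDARY ((uintptr_t)0x00400000)

/* Two virtual addresses mapping the same physical page are equivalent
 * aliases when they agree in all bits below the alias boundary, so a
 * virtually indexed cache selects the same lines for both. */
bool equivalent_alias(uintptr_t va1, uintptr_t va2)
{
    return ((va1 ^ va2) & (ALIAS_BOUNDARY - 1)) == 0;
}

/* Hypothetical helper: the lowest address >= candidate that is an
 * equivalent alias of target (no overflow check near the top of the
 * address space). */
uintptr_t next_equivalent_address(uintptr_t candidate, uintptr_t target)
{
    uintptr_t offset = target & (ALIAS_BOUNDARY - 1);   /* colour of target */
    uintptr_t base = candidate & ~(ALIAS_BOUNDARY - 1); /* boundary below candidate */
    uintptr_t addr = base + offset;
    if (addr < candidate)
        addr += ALIAS_BOUNDARY;                         /* next boundary up */
    return addr;
}
```

Addresses that fail this check are non-equivalent aliases: on a write-back cache, data written through one of them has to be flushed before it can safely be read through the other.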




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Wed, 02 Jun 2010 18:00:03 GMT)

Acknowledgement sent to dann frazier <dannf@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 02 Jun 2010 18:00:03 GMT)

Message #203 received at 561203@bugs.debian.org:

From: dann frazier <dannf@debian.org>
To: John David Anglin <dave.anglin@nrc-cnrc.gc.ca>, 561203@bugs.debian.org
Cc: Modestas Vainius <modestas@vainius.eu>, deller@gmx.de, gniibe@fsij.org, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org
Subject: Re: Bug#561203: threads and fork on machine with VIPT-WB cache
Date: Wed, 2 Jun 2010 11:56:17 -0600
On Wed, Jun 02, 2010 at 01:16:01PM -0400, John David Anglin wrote:
> On Wed, 02 Jun 2010, Modestas Vainius wrote:
> 
> > Hello,
> > 
> > this bug [1] is back to the "very common" department with eglibc 2.11
> > (libc6-dev_2.11.1-1) builds. The majority of KDE applications are failing to
> > build on hppa again. Is there really nothing that could be done to fix it?
> 
> I will just say it is very tricky.  I think a fix is possible (arm and mips
> had similar cache problems) but the victim replacement present in PA8800/PA8900
> caches makes the problem especially difficult for hardware using these
> processors.
> 
> I have spent the last few months testing various alternatives and have
> now done hundreds of kernel builds.  I did post some experimental patches
> that fix the problem on UP kernels.  However, the problem is not resolved
> for SMP kernels.

Note that Debian's buildds run a UP kernel, so as soon as those fixes
go upstream we can pull them in. Thanks for all your work here!

> The minifail test is a good one to demonstrate the problem.  Indeed,
> a very similar test was given in the thread below:
> http://readlist.com/lists/vger.kernel.org/linux-kernel/54/270861.html
> 
> This thread also discusses the PA8800 problem:
> http://readlist.com/lists/vger.kernel.org/linux-kernel/54/271417.html
> 
> I currently surmise that we have a problem with the cache victim
> replacement, although the cause isn't clear.  I did find recently
> that the cache prefetch in copy_user_page_asm extends to the line
> beyond the end of the page, but fixing this doesn't resolve the problem.
> 
> I am still experimenting with using equivalent aliasing.  It does
> help to flush in ptep_set_wrprotect.
> 
> Dave

-- 
dann frazier





Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Thu, 03 Jun 2010 08:54:03 GMT)

Acknowledgement sent to Modestas Vainius <modestas@vainius.eu>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 03 Jun 2010 08:54:03 GMT)

Message #208 received at 561203@bugs.debian.org:

From: Modestas Vainius <modestas@vainius.eu>
To: dann frazier <dannf@debian.org>
Cc: John David Anglin <dave.anglin@nrc-cnrc.gc.ca>, 561203@bugs.debian.org, deller@gmx.de, gniibe@fsij.org, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, carlos@systemhalted.org
Subject: Re: Bug#561203: threads and fork on machine with VIPT-WB cache
Date: Thu, 3 Jun 2010 11:50:05 +0300
[Message part 1 (text/plain, inline)]
# Breaks unrelated applications
tags 561203 critical
thanks

Hello,

On Wednesday 02 June 2010 20:56:17 dann frazier wrote:
> On Wed, Jun 02, 2010 at 01:16:01PM -0400, John David Anglin wrote:
> > On Wed, 02 Jun 2010, Modestas Vainius wrote:
> > > Hello,
> > > 
> > > this bug [1] is back to the "very common" department with eglibc 2.11
> > > (libc6-dev_2.11.1-1) builds. The majority of KDE applications are failing
> > > to build on hppa again. Is there really nothing that could be done to
> > > fix it?
> > 
> > I will just say it is very tricky.  I think a fix is possible (arm and
> > mips had similar cache problems) but the victim replacement present in
> > PA8800/PA8900 caches makes the problem especially difficult for
> > hardware using these processors.
> > 
> > I have spent the last few months testing various alternatives and have
> > now done hundreds of kernel builds.  I did post some experimental patches
> > that fix the problem on UP kernels.  However, the problem is not resolved
> > for SMP kernels.
> 
> Note that Debian's buildds run a UP kernel, so as soon as those fixes
> go upstream we can pull them in. Thanks for all your work here!
> 

Well, as long as this is unfixed or at least "common", I don't see how hppa 
can be considered to be a release arch. Is that UP patch available somewhere?

All KDE applications have been stuck in unstable before due to this and 
history is about to repeat itself unless something is done. While apparently a 
failing test in eglibc can be ignored, other applications have to suffer real 
world problems...

-- 
Modestas Vainius <modestas@vainius.eu>
[signature.asc (application/pgp-signature, inline)]

Severity set to 'critical' from 'serious'. Request was from Modestas Vainius <modax@debian.org> to control@bugs.debian.org. (Thu, 03 Jun 2010 08:57:06 GMT)

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Fri, 04 Jun 2010 01:27:03 GMT)

Acknowledgement sent to NIIBE Yutaka <gniibe@fsij.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Fri, 04 Jun 2010 01:27:03 GMT)

Message #215 received at 561203@bugs.debian.org:

From: NIIBE Yutaka <gniibe@fsij.org>
To: Modestas Vainius <modestas@vainius.eu>
Cc: dann frazier <dannf@debian.org>, John David Anglin <dave.anglin@nrc-cnrc.gc.ca>, 561203@bugs.debian.org, deller@gmx.de, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, carlos@systemhalted.org
Subject: Re: Bug#561203: threads and fork on machine with VIPT-WB cache
Date: Fri, 04 Jun 2010 10:03:07 +0900
Modestas Vainius wrote:
>> Note that Debian's buildds run a UP kernel, so as soon as those fixes
>> go upstream we can pull them in. Thanks for all your work here!
>>
>
> Well, as long as this is unfixed or at least "common", I don't see how hppa
> can be considered to be a release arch. Is that UP patch available somewhere?

My case and my analysis talked about UP kernel, and John David Anglin
made a patch:
	http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144

After that, the discussion went to SMP cases.

It would be better to evaluate the patch again, and make sure it works
for UP case and fix failures of buildd, then apply for Linux in Debian
(only) for HPPA.

I know that the patch is not that ideal because it touches
architecture independent part of Linux, but it is worth for Linux in
Debian (or Linux for the HPPA machine of buildd, at least).
-- 




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Fri, 04 Jun 2010 05:24:03 GMT)

Acknowledgement sent to dann frazier <dannf@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Fri, 04 Jun 2010 05:24:03 GMT)

Message #220 received at 561203@bugs.debian.org:

From: dann frazier <dannf@debian.org>
To: NIIBE Yutaka <gniibe@fsij.org>, 561203@bugs.debian.org
Cc: Modestas Vainius <modestas@vainius.eu>, John David Anglin <dave.anglin@nrc-cnrc.gc.ca>, deller@gmx.de, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, carlos@systemhalted.org
Subject: Re: Bug#561203: threads and fork on machine with VIPT-WB cache
Date: Thu, 3 Jun 2010 23:21:06 -0600
On Fri, Jun 04, 2010 at 10:03:07AM +0900, NIIBE Yutaka wrote:
> Modestas Vainius wrote:
>>> Note that Debian's buildds run a UP kernel, so as soon as those fixes
>>> go upstream we can pull them in. Thanks for all your work here!
>>>
>>
>> Well, as long as this is unfixed or at least "common", I don't see how hppa
>> can be considered to be a release arch. Is that UP patch available somewhere?
>
> My case and my analysis talked about UP kernel, and John David Anglin
> made a patch:
> 	http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
>
> After that, the discussion went to SMP cases.
>
> It would be better to evaluate the patch again, and make sure it works
> for UP case and fix failures of buildd, then apply for Linux in Debian
> (only) for HPPA.
>
> I know that the patch is not that ideal because it touches
> architecture independent part of Linux, but it is worth for Linux in
> Debian (or Linux for the HPPA machine of buildd, at least).

I'm happy to test the patch if necessary to help push this change
upstream. However, we do need the change to go upstream before we can
include it in the Debian kernel.




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Fri, 04 Jun 2010 07:57:12 GMT)

Acknowledgement sent to Bastian Blank <waldi@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Fri, 04 Jun 2010 07:57:12 GMT)

Message #225 received at 561203@bugs.debian.org:

From: Bastian Blank <waldi@debian.org>
To: Modestas Vainius <modestas@vainius.eu>, 561203@bugs.debian.org
Subject: Re: Bug#561203: threads and fork on machine with VIPT-WB cache
Date: Fri, 4 Jun 2010 09:56:44 +0200
severity 561203 serious
thanks

On Thu, Jun 03, 2010 at 11:50:05AM +0300, Modestas Vainius wrote:
> # Breaks unrelated applications

Sorry, no. Almost all applications are related to the kernel.

> Well, as long as this is unfixed or at least "common", I don't see how hppa 
> can be considered to be a release arch. Is that UP patch available somewhere?

This is up to the release team.

Bastian

-- 
You're dead, Jim.
		-- McCoy, "Amok Time", stardate 3372.7




Severity set to 'serious' from 'critical'. Request was from Bastian Blank <waldi@debian.org> to control@bugs.debian.org. (Fri, 04 Jun 2010 07:57:14 GMT)

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Fri, 04 Jun 2010 10:58:30 GMT)

Acknowledgement sent to Thibaut VARENE <varenet@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Fri, 04 Jun 2010 10:58:30 GMT)

Message #232 received at 561203@bugs.debian.org:

From: Thibaut VARENE <varenet@debian.org>
To: dann frazier <dannf@debian.org>
Cc: NIIBE Yutaka <gniibe@fsij.org>, 561203@bugs.debian.org, Modestas Vainius <modestas@vainius.eu>, John David Anglin <dave.anglin@nrc-cnrc.gc.ca>, deller@gmx.de, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, carlos@systemhalted.org
Subject: Re: Bug#561203: threads and fork on machine with VIPT-WB cache
Date: Fri, 4 Jun 2010 12:44:55 +0200
On Fri, Jun 4, 2010 at 7:21 AM, dann frazier <dannf@debian.org> wrote:
> On Fri, Jun 04, 2010 at 10:03:07AM +0900, NIIBE Yutaka wrote:
>> Modestas Vainius wrote:
>>>> Note that Debian's buildds run a UP kernel, so as soon as those fixes
>>>> go upstream we can pull them in. Thanks for all your work here!
>>>>
>>>
>>> Well, as long as this is unfixed or at least "common", I don't see how hppa
>>> can be considered to be a release arch. Is that UP patch available somewhere?
>>
>> My case and my analysis talked about UP kernel, and John David Anglin
>> made a patch:
>>       http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
>>
>> After that, the discussion went to SMP cases.
>>
>> It would be better to evaluate the patch again, and make sure it works
>> for UP case and fix failures of buildd, then apply for Linux in Debian
>> (only) for HPPA.
>>
>> I know that the patch is not that ideal because it touches
>> architecture independent part of Linux, but it is worth for Linux in
>> Debian (or Linux for the HPPA machine of buildd, at least).
>
> I'm happy to test the patch if necessary to help push this change
> upstream. However, we do need the change to go upstream before we can
> include it in the Debian kernel.

Just for reference, I've summarized the test cases and related patches here:
http://wiki.parisc-linux.org/TestCases

HTH

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/




Removed indication that 561203 affects src:kde4libs. Added indication that 561203 affects libqtcore4. Request was from Modestas Vainius <modax@debian.org> to control@bugs.debian.org. (Sat, 05 Jun 2010 23:46:48 GMT)

Added indication that 561203 affects libqtcore4 and src:kde4libs. Request was from Modestas Vainius <modax@debian.org> to control@bugs.debian.org. (Sun, 06 Jun 2010 00:03:10 GMT)

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Sun, 06 Jun 2010 01:06:03 GMT)

Acknowledgement sent to Modestas Vainius <modestas@vainius.eu>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Sun, 06 Jun 2010 01:06:03 GMT)

Message #241 received at 561203@bugs.debian.org:

From: Modestas Vainius <modestas@vainius.eu>
To: dann frazier <dannf@debian.org>
Cc: NIIBE Yutaka <gniibe@fsij.org>, 561203@bugs.debian.org, John David Anglin <dave.anglin@nrc-cnrc.gc.ca>, deller@gmx.de, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, carlos@systemhalted.org
Subject: Re: Bug#561203: threads and fork on machine with VIPT-WB cache
Date: Sun, 6 Jun 2010 04:01:23 +0300
[Message part 1 (text/plain, inline)]
Hello,

On Friday 04 June 2010 08:21:06 dann frazier wrote:
> > My case and my analysis talked about UP kernel, and John David Anglin
> > 
> > made a patch:
> > 	http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
> > 
> > After that, the discussion went to SMP cases.
> > 
> > It would be better to evaluate the patch again, and make sure it works
> > for UP case and fix failures of buildd, then apply for Linux in Debian
> > (only) for HPPA.
> > 
> > I know that the patch is not that ideal because it touches
> > architecture independent part of Linux, but it is worth for Linux in
> > Debian (or Linux for the HPPA machine of buildd, at least).
> 
> I'm happy to test the patch if necessary to help push this change
> upstream. However, we do need the change to go upstream before we can
> include it in the Debian kernel.

I made a hackish patch for QProcess in Qt (usleep(1000) before fork()) which 
seems to reduce likelihood of the failure to very rare again. Once a new 
revision of qt4-x11 is uploaded to sid (soon I believe), KDE applications 
should be able to build again (hopefully).

Obviously it would be better to get this bug fixed for real but at least now 
the whole KDE stack won't be held by it while we wait.

-- 
Modestas Vainius <modestas@vainius.eu>
[signature.asc (application/pgp-signature, inline)]
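The actual Qt patch is not included in this log. The idea of the workaround (sleep about 1 ms, then fork) can be sketched in plain C as below; `run_with_prefork_delay` is a hypothetical name, and `nanosleep` stands in for the obsolescent `usleep(1000)` with the same 1 ms delay. As the message says, such a delay only makes the failure rarer; it does not fix the underlying kernel cache problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper mirroring the hppa QProcess hack: pause ~1 ms before
 * fork(), exec argv[0] in the child, and wait for it.
 * Returns the child's exit status, or -1 on error. */
int run_with_prefork_delay(char *const argv[])
{
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 1000000L }; /* 1 ms */
    while (nanosleep(&delay, &delay) == -1 && errno == EINTR)
        ;  /* resume the remaining sleep if a signal interrupted it */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* exec failed; conventional "not found" code */
    }

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;          /* e.g. ECHILD */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```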

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Mon, 07 Jun 2010 17:15:03 GMT)

Acknowledgement sent to dann frazier <dannf@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Mon, 07 Jun 2010 17:15:03 GMT)

Message #246 received at 561203@bugs.debian.org:

From: dann frazier <dannf@debian.org>
To: Thibaut VARENE <varenet@debian.org>, 561203@bugs.debian.org
Cc: NIIBE Yutaka <gniibe@fsij.org>, Modestas Vainius <modestas@vainius.eu>, John David Anglin <dave.anglin@nrc-cnrc.gc.ca>, deller@gmx.de, linux-parisc@vger.kernel.org, pkg-gauche-devel@lists.alioth.debian.org, carlos@systemhalted.org
Subject: Re: Bug#561203: threads and fork on machine with VIPT-WB cache
Date: Mon, 7 Jun 2010 11:11:37 -0600
On Fri, Jun 04, 2010 at 12:44:55PM +0200, Thibaut VARENE wrote:
> On Fri, Jun 4, 2010 at 7:21 AM, dann frazier <dannf@debian.org> wrote:
> > On Fri, Jun 04, 2010 at 10:03:07AM +0900, NIIBE Yutaka wrote:
> >> Modestas Vainius wrote:
> >>>> Note that Debian's buildds run a UP kernel, so as soon as those fixes
> >>>> go upstream we can pull them in. Thanks for all your work here!
> >>>>
> >>>
> >>> Well, as long as this is unfixed or at least "common", I don't see how hppa
> >>> can be considered to be a release arch. Is that UP patch available somewhere?
> >>
> >> My case and my analysis talked about UP kernel, and John David Anglin
> >> made a patch:
> >>       http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
> >>
> >> After that, the discussion went to SMP cases.
> >>
> >> It would be better to evaluate the patch again, and make sure it works
> >> for UP case and fix failures of buildd, then apply for Linux in Debian
> >> (only) for HPPA.
> >>
> >> I know that the patch is not that ideal because it touches
> >> architecture independent part of Linux, but it is worth for Linux in
> >> Debian (or Linux for the HPPA machine of buildd, at least).
> >
> > I'm happy to test the patch if necessary to help push this change
> > upstream. However, we do need the change to go upstream before we can
> > include it in the Debian kernel.
> 
> Just for reference, I've summarized the test cases and related patches here:
> http://wiki.parisc-linux.org/TestCases

Cool - that is helpful. I've updated the kernel on peri/penalosa with
the various patches listed there that have gone upstream, but I'm not
seeing better results with any failing packages.

btw, I thought it would be useful to edit that page and tag each patch
with its status in Debian (in-official-kernel, installed-on-buildds,
etc), but the page appears to be immutable.

-- 
dann frazier





Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Thu, 10 Jun 2010 16:33:07 GMT)

Acknowledgement sent to Modestas Vainius <modestas@vainius.eu>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 10 Jun 2010 16:33:07 GMT)

Message #251 received at 561203@bugs.debian.org:

From: Modestas Vainius <modestas@vainius.eu>
To: dann frazier <dannf@debian.org>, 561203@bugs.debian.org
Subject: Re: Bug#561203: threads and fork on machine with VIPT-WB cache
Date: Thu, 10 Jun 2010 19:30:45 +0300
[Message part 1 (text/plain, inline)]
Hello,

On Sunday 06 June 2010 04:01:23 Modestas Vainius wrote:
> On Friday 04 June 2010 08:21:06 dann frazier wrote:
> > > My case and my analysis talked about UP kernel, and John David Anglin
> > > 
> > > made a patch:
> > > 	http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
> > > 
> > > After that, the discussion went to SMP cases.
> > > 
> > > It would be better to evaluate the patch again, and make sure it works
> > > for UP case and fix failures of buildd, then apply for Linux in Debian
> > > (only) for HPPA.
> > > 
> > > I know that the patch is not that ideal because it touches
> > > architecture independent part of Linux, but it is worth for Linux in
> > > Debian (or Linux for the HPPA machine of buildd, at least).
> > 
> > I'm happy to test the patch if necessary to help push this change
> > upstream. However, we do need the change to go upstream before we can
> > include it in the Debian kernel.
> 
> I made a hackish patch for QProcess in Qt (usleep(1000) before fork())
> which seems to reduce likelihood of the failure to very rare again. Once a
> new revision of qt4-x11 is uploaded to sid (soon I believe), KDE
> applications should be able to build again (hopefully).

qt4-x11/hppa 4:4.6.3-1 has recently been uploaded to incoming. It has my hppa 
hack applied. Therefore please give back the following KDE packages on hppa:

kde4libs basket kdesvn webkitkde kraft konq-plugins


-- 
Modestas Vainius <modestas@vainius.eu>
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#561203; Package linux-2.6. (Thu, 10 Jun 2010 17:30:05 GMT)

Acknowledgement sent to dann frazier <dannf@dannf.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 10 Jun 2010 17:30:05 GMT)

Message #256 received at 561203@bugs.debian.org:

From: dann frazier <dannf@dannf.org>
To: Modestas Vainius <modestas@vainius.eu>
Cc: 561203@bugs.debian.org
Subject: Re: Bug#561203: threads and fork on machine with VIPT-WB cache
Date: Thu, 10 Jun 2010 11:27:16 -0600
On Thu, Jun 10, 2010 at 07:30:45PM +0300, Modestas Vainius wrote:
> Hello,
> 
> On Sunday 06 June 2010 04:01:23 Modestas Vainius wrote:
> > On Friday 04 June 2010 08:21:06 dann frazier wrote:
> > > > My case and my analysis talked about UP kernel, and John David Anglin
> > > > 
> > > > made a patch:
> > > > 	http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
> > > > 
> > > > After that, the discussion went to SMP cases.
> > > > 
> > > > It would be better to evaluate the patch again, and make sure it works
> > > > for UP case and fix failures of buildd, then apply for Linux in Debian
> > > > (only) for HPPA.
> > > > 
> > > > I know that the patch is not that ideal because it touches
> > > > architecture independent part of Linux, but it is worth for Linux in
> > > > Debian (or Linux for the HPPA machine of buildd, at least).
> > > 
> > > I'm happy to test the patch if necessary to help push this change
> > > upstream. However, we do need the change to go upstream before we can
> > > include it in the Debian kernel.
> > 
> > I made a hackish patch for QProcess in Qt (usleep(1000) before fork())
> > which seems to reduce likelihood of the failure to very rare again. Once a
> > new revision of qt4-x11 is uploaded to sid (soon I believe), KDE
> > applications should be able to build again (hopefully).
> 
> qt4-x11/hppa 4:4.6.3-1 has recently been uploaded to incoming. It has my hppa 
> hack applied. Therefore please give back the following KDE packages on hppa:
> 
> kde4libs basket kdesvn webkitkde kraft konq-plugins

done.





Added indication that bug 561203 blocks 579835. Request was from Mark Purcell <msp@debian.org> to control@bugs.debian.org. (Fri, 11 Jun 2010 23:45:04 GMT)

Removed indication that bug 561203 blocks 579835. Request was from Mark Purcell <msp@debian.org> to control@bugs.debian.org. (Wed, 16 Jun 2010 21:09:08 GMT)

Added indication that bug 561203 blocks 590889. Request was from Modestas Vainius <modax@debian.org> to control@bugs.debian.org. (Thu, 29 Jul 2010 22:42:05 GMT)

Severity set to 'important' from 'serious'. Request was from Adam D. Barratt <adam@adam-barratt.org.uk> to control@bugs.debian.org. (Fri, 24 Sep 2010 17:51:09 GMT)

Added tag(s) wheezy. Request was from Kurt Roeckx <kurt@roeckx.be> to control@bugs.debian.org. (Wed, 16 Feb 2011 19:03:44 GMT)

Added tag(s) upstream and experimental. Request was from Jonathan Nieder <jrnieder@gmail.com> to control@bugs.debian.org. (Sat, 01 Oct 2011 02:33:14 GMT)

Set Bug forwarded-to-address to 'http://thread.gmane.org/gmane.linux.ports.parisc/3161/focus=3291'. Request was from Jonathan Nieder <jrnieder@gmail.com> to control@bugs.debian.org. (Sat, 01 Oct 2011 02:33:15 GMT)

Bug marked as found in versions linux-2.6/3.2.2-1. Request was from Jonathan Nieder <jrnieder@gmail.com> to control@bugs.debian.org. (Fri, 03 Feb 2012 23:45:07 GMT)

Changed Bug forwarded-to-address to 'http://thread.gmane.org/gmane.linux.ports.parisc/4085/focus=4120' from 'http://thread.gmane.org/gmane.linux.ports.parisc/3161/focus=3291'. Request was from Jonathan Nieder <jrnieder@gmail.com> to control@bugs.debian.org. (Fri, 03 Feb 2012 23:45:08 GMT)

Added tag(s) moreinfo and patch. Request was from Jonathan Nieder <jrnieder@gmail.com> to control@bugs.debian.org. (Fri, 03 Feb 2012 23:45:08 GMT)

Removed tag(s) sid, squeeze, wheezy, and experimental. Request was from Jonathan Nieder <jrnieder@gmail.com> to control@bugs.debian.org. (Mon, 12 Mar 2012 02:15:02 GMT)

Bug marked as found in versions linux-2.6/2.6.33-1. Request was from Jonathan Nieder <jrnieder@gmail.com> to control@bugs.debian.org. (Mon, 12 Mar 2012 02:15:03 GMT)

Removed tag(s) help. Request was from Jonathan Nieder <jrnieder@gmail.com> to control@bugs.debian.org. (Mon, 12 Mar 2012 02:15:04 GMT)

Changed Bug forwarded-to-address to 'http://thread.gmane.org/gmane.linux.ports.parisc/4267' from 'http://thread.gmane.org/gmane.linux.ports.parisc/4085/focus=4120'. Request was from Jonathan Nieder <jrnieder@gmail.com> to control@bugs.debian.org. (Sun, 20 May 2012 10:42:41 GMT)

Reply sent to Moritz Muehlenhoff <jmm@inutil.org>:
You have taken responsibility. (Wed, 14 Aug 2013 14:39:17 GMT)

Notification sent to dann frazier <dannf@debian.org>:
Bug acknowledged by developer. (Wed, 14 Aug 2013 14:39:17 GMT)

Message #289 received at 561203-done@bugs.debian.org:

From: Moritz Muehlenhoff <jmm@inutil.org>
To: dann frazier <dannf@dannf.org>
Cc: 561203-done@bugs.debian.org
Subject: Re: Bug#561203: threads and fork on machine with VIPT-WB cache
Date: Wed, 14 Aug 2013 16:33:06 +0200
On Thu, Jun 10, 2010 at 11:27:16AM -0600, dann frazier wrote:
> On Thu, Jun 10, 2010 at 07:30:45PM +0300, Modestas Vainius wrote:
> > Hello,
> > 
> > On Sunday 06 June 2010 04:01:23 Modestas Vainius wrote:
> > > On Friday 04 June 2010 08:21:06 dann frazier wrote:
> > > > > My case and my analysis talked about UP kernel, and John David Anglin
> > > > > 
> > > > > made a patch:
> > > > > 	http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=561203#144
> > > > > 
> > > > > After that, the discussion went to SMP cases.
> > > > > 
> > > > > It would be better to evaluate the patch again, and make sure it works
> > > > > for UP case and fix failures of buildd, then apply for Linux in Debian
> > > > > (only) for HPPA.
> > > > > 
> > > > > I know that the patch is not that ideal because it touches
> > > > > architecture independent part of Linux, but it is worth for Linux in
> > > > > Debian (or Linux for the HPPA machine of buildd, at least).
> > > > 
> > > > I'm happy to test the patch if necessary to help push this change
> > > > upstream. However, we do need the change to go upstream before we can
> > > > include it in the Debian kernel.
> > > 
> > > I made a hackish patch for QProcess in Qt (usleep(1000) before fork())
> > > which seems to reduce likelihood of the failure to very rare again. Once a
> > > new revision of qt4-x11 is uploaded to sid (soon I believe), KDE
> > > applications should be able to build again (hopefully).
> > 
> > qt4-x11/hppa 4:4.6.3-1 has recently been uploaded to incoming. It has my hppa 
> > hack applied. Therefore please give back the following KDE packages on hppa:
> > 
> > kde4libs basket kdesvn webkitkde kraft konq-plugins

hppa is no longer a supported arch, but this was fixed upstream recently: The changes
from http://thread.gmane.org/gmane.linux.ports.parisc/4267 are part of 3.10.

Cheers,
        Moritz



Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Thu, 12 Sep 2013 07:25:57 GMT)


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Mon Apr 21 02:45:15 2014; Machine Name: beach.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.