Debian Bug report logs - #376329
utf-8 total weirdness
Report forwarded to debian-bugs-dist@lists.debian.org, Brendan O'Dea <bod@debian.org>:
Bug#376329; Package perl.
Acknowledgement sent to Joey Hess <joeyh@debian.org>:
New Bug report received and forwarded. Copy sent to Brendan O'Dea <bod@debian.org>.
Message #5 received at submit@bugs.debian.org:
Package: perl
Version: 5.8.8-6
Severity: normal
Please excuse the bug title; after working on this for something like 5
hours, I cannot think clearly enough to write a short title describing
this very weird bug. Let the code speak for me. I have attached a testcase;
untar it and run the "repro" program.
joey@kodama:~/tmp/repor/testcase>./repro
a
b
Wide character in subroutine entry at /usr/bin/markdown line 360.
zsh: exit 255 ./repro
Now, edit the repro file. There are 4 comments suggesting changes; if you make
any one of the changes, the wide character failure disappears.
Notice that several of the changes should not be able to affect anything,
but do. For example, uncommenting the s/// line should be a null change because
$mommy is otherwise utterly unused. But uncommenting that line "fixes"
the problem. This smells deeply of a perl bug to me. I boiled this test case down
from several thousand lines of code, dealing with many changes like this that
inexplicably hid the problem.
I should probably do a similar reduction on markdown and possibly HTML::Scrubber,
but it's getting late. Their versions here are listed below.
Here's some analysis of what's going on inside markdown when it fails:
<paravoid> watch this:
<paravoid> print 'text is utf: ', utf8::is_utf8($text) ? 'yes' : 'no', "\n";
<paravoid> $text =~ s{
<paravoid> ( # save in $1
<paravoid> ^ # start of line (with /m)
<paravoid> <($block_tags_a) # start tag = $2
<paravoid> \b # word break
<paravoid> (.*\n)*? # any number of lines, minimally matching
<paravoid> </\2> # the matching end tag
<paravoid> [ \t]* # trailing spaces/tabs
<paravoid> )
<paravoid> }{
<paravoid> print '$1 is utf: ', utf8::is_utf8($1) ? 'yes' : 'no', "\n";
<paravoid> my $key = md5_hex($1);
<paravoid> $g_html_blocks{$key} = $1;
<paravoid> "\n\n" . $key . "\n\n";
<paravoid> }egmx;
<paravoid> I added the two 'prints'
<paravoid> text is utf: no
<paravoid> $1 is utf: yes
<paravoid> that's freaking weird
<paravoid> the utf8 flag gets enabled after the regexp is run
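The flag check paravoid added can be tried in isolation. What follows is a minimal sketch, not the markdown code: the helper name flag_state and the sample text are made up for illustration. It copies $1 into a lexical before inspecting it, so the capture's get magic has already run by the time the flag is read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: describe whether a scalar carries the internal utf8 flag.
# The value is copied out of @_ first, so any get magic has been applied.
sub flag_state {
    my ($label, $value) = @_;
    return sprintf '%s is utf: %s', $label, utf8::is_utf8($value) ? 'yes' : 'no';
}

# A byte string: the UTF-8 encoding of U+00E4 inside a block tag, utf8 flag off.
my $text = "<div>\303\244</div>\n";
print flag_state('$text', $text), "\n";

if ($text =~ m{^<(div)\b(.*)</\1>}m) {
    my $tag = $1;    # copy the capture, then inspect the copy
    print flag_state('copy of $1', $tag), "\n";
}
```

Because both the subject string and the copied capture are plain byte strings, neither should report the flag as set; a "yes" on the copy would point at something other than the stale-flag behaviour described in this report.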
Also note that paravoid had a version (much larger; a small modification to
ikiwiki) that reproduced the bug w/o HTML::Scrubber being loaded. As far as
I can guess, the HTML::Scrubber stuff doesn't really have any bearing on the bug
and is just one more mysterious thing that hides the bug if it's removed.
-- System Information:
Debian Release: testing/unstable
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.17-1-686
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Versions of packages perl depends on:
ii libc6 2.3.6-15 GNU C Library: Shared libraries
ii libdb4.4 4.4.20-6 Berkeley v4.4 Database Libraries [
ii libgdbm3 1.8.3-3 GNU dbm database routines (runtime
ii perl-base 5.8.8-6 The Pathologically Eclectic Rubbis
ii perl-modules 5.8.8-6 Core Perl modules
Versions of packages perl recommends:
ii perl-doc 5.8.8-6 Perl documentation
Other software:
ii markdown 1.0.1-3 Text-to-HTML conversion tool
ii libhtml-scrubb 0.08-2 Perl extension for scrubbing/sanitizing html
paravoid reproduced it using a similar test case on a system running sarge with:
<paravoid> ii perl 5.8.4-8sarge4 Larry Wall's Practical Extraction and Report
<paravoid> ii markdown 1.0.1-2 Text-to-HTML conversion tool
<paravoid> ii libhtml-scrubb 0.08-1 Perl extension for scrubbing/sanitizing html
--
see shy jo
Information forwarded to debian-bugs-dist@lists.debian.org, Brendan O'Dea <bod@debian.org>:
Bug#376329; Package perl.
Acknowledgement sent to Joey Hess <joeyh@debian.org>:
Extra info received and forwarded to list. Copy sent to Brendan O'Dea <bod@debian.org>.
Message #10 received at 376329@bugs.debian.org:
Of course I forgot the attachment..
--
see shy jo
[testcase.tgz (application/x-gtar, attachment)]
Information forwarded to debian-bugs-dist@lists.debian.org, Brendan O'Dea <bod@debian.org>:
Bug#376329; Package perl.
Acknowledgement sent to Joey Hess <joeyh@debian.org>:
Extra info received and forwarded to list. Copy sent to Brendan O'Dea <bod@debian.org>.
Message #15 received at 376329@bugs.debian.org:
I can still reproduce using the test case with
ii perl 5.8.8-12 Larry Wall's Practical Extraction and Report
ii markdown 1.0.1-3 Text-to-HTML conversion tool
With newer versions of markdown, I have to modify the repro program
to use Text::Markdown. After doing so, I successfully reproduced it
with markdown 1.0.1-6. However, I failed to reproduce it with markdown
1.0.2~b8-2 (currently in experimental).
--
see shy jo
Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#376329; Package perl.
Acknowledgement sent to Brendan O'Dea <bod@debian.org>:
Extra info received and forwarded to list.
Message #20 received at 376329@bugs.debian.org:
Version: 5.10.0~rc1-1
On Tue, Nov 27, 2007 at 04:20:34PM -0500, Joey Hess wrote:
>I can still reproduce using the test case with
>ii perl 5.8.8-12 Larry Wall's Practical Extraction and Report
Also a problem with 5.10.0-RC1.
--bod
Information forwarded
to debian-bugs-dist@lists.debian.org, Niko Tyni <ntyni@debian.org>:
Bug#376329; Package perl.
(Thu, 07 Jan 2010 10:27:07 GMT).
Acknowledgement sent
to "Eugene V. Lyubimkin" <jackyf@debian.org>:
Extra info received and forwarded to list. Copy sent to Niko Tyni <ntyni@debian.org>.
(Thu, 07 Jan 2010 10:27:07 GMT).
Message #25 received at 376329@bugs.debian.org:
Hello Joey,
This bug is currently unreproducible by me. And by you?
Versions of software in my system:
ii libhtml-scrubber-perl 0.08-4
ii libtext-markdown-perl 1.0.26-1
ii perl 5.10.1-8
Information forwarded
to debian-bugs-dist@lists.debian.org, Niko Tyni <ntyni@debian.org>:
Bug#376329; Package perl.
(Thu, 07 Jan 2010 18:15:02 GMT).
Acknowledgement sent
to Joey Hess <joeyh@debian.org>:
Extra info received and forwarded to list. Copy sent to Niko Tyni <ntyni@debian.org>.
(Thu, 07 Jan 2010 18:15:02 GMT).
Message #30 received at 376329@bugs.debian.org:
Eugene V. Lyubimkin wrote:
> Hello Joey,
>
> This bug is currently unreproducible by me. And by you?
>
> Versions of software in my system:
>
> ii libhtml-scrubber-perl 0.08-4
> ii libtext-markdown-perl 1.0.26-1
No, you need to have the markdown package installed to reproduce this
bug. Yes, it is still reproducible with
ii libhtml-scrubb 0.08-4 Perl extension for scrubbing/sanitizing html
ii markdown 1.0.1-7 Text-to-HTML conversion tool
ii perl 5.10.1-8 Larry Wall's Practical Extraction and Report
This bug is so fragile that removing even one line of the testcase that
should not possibly have any bearing masks it. So even if the bug stopped
being reproducible, unless perl had a big changelog entry along the lines of
"fixed memory corruption problem that resulted in utf8 flag being incorrectly
set", I would assume the bug was still present and only hiding.
--
see shy jo
Information forwarded
to debian-bugs-dist@lists.debian.org:
Bug#376329; Package perl.
(Tue, 26 Jan 2010 21:06:03 GMT).
Acknowledgement sent
to Niko Tyni <ntyni@debian.org>:
Extra info received and forwarded to list.
(Tue, 26 Jan 2010 21:06:03 GMT).
Message #35 received at 376329@bugs.debian.org:
This is a bug report for perl from Niko Tyni <ntyni@debian.org>
generated with the help of perlbug 1.39 running under perl 5.11.4.
-----------------------------------------------------------------
The following script croaks with 'Wide character in subroutine entry'
although $1 should not have the utf8 flag set at all.
#!perl
use Digest::MD5 qw(md5_hex);
qq[\x{263a}] =~ /(.)/ and "$1"; # vivify $1 with utf8 flag
"\303\244" =~ /(.)/ and do {
    # "$1"; # uncomment this and it goes away
    print md5_hex($1), "\n";
};
__END__
The exception is thrown from the SvPVbyte call in cpan/Digest-MD5/MD5.xs:726.
AIUI, SvPVbyte() looks at the utf8 flag before handling get magic for
$1 and therefore gets the old value.
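For context on the error text itself: Digest::MD5 works on bytes, so it croaks whenever it is handed a string that really does contain a character above 0xFF. The sketch below shows that legitimate case next to the usual remedy of encoding to UTF-8 first. The helper name try_digest is hypothetical and is not part of Digest::MD5.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical helper: return the hex digest, or undef if md5_hex croaks.
sub try_digest {
    my ($string) = @_;
    my $digest = eval { md5_hex($string) };
    return $digest;
}

my $wide  = "\x{263a}";          # one character above 0xFF, cannot be expressed as bytes
my $bytes = encode_utf8($wide);  # the same character as a UTF-8 byte string

print 'wide string: ', (defined try_digest($wide)  ? 'digested' : 'croaked'), "\n";
print 'byte string: ', (defined try_digest($bytes) ? 'digested' : 'croaked'), "\n";
```

In the script above the croak is spurious because the second match only captures a single byte; the stale flag left over from the first match is what makes that byte look like a wide character.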
I'm attaching a patch that adds a TODO test in XS-APItest for this.
Originally reported by Joey Hess as http://bugs.debian.org/376329 .
-----------------------------------------------------------------
---
Flags:
category=core
severity=low
---
Site configuration information for perl 5.11.4:
Configured by niko at Mon Jan 25 19:04:36 EET 2010.
Summary of my perl5 (revision 5 version 11 subversion 4) configuration:
Local Commit: 403e9a2445a629525fa5e35afa699e4140495dcf
Ancestor: fe61459e95657c432074058bd8854fec03559335
Platform:
osname=linux, osvers=2.6.32-trunk-amd64, archname=x86_64-linux-gnu-thread-multi
uname='linux madeleine 2.6.32-trunk-amd64 #1 smp sun jan 10 22:40:40 utc 2010 x86_64 gnulinux '
config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.11 -Darchlib=/usr/lib/perl/5.11 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.11.4 -Dsitearch=/usr/local/lib/perl/5.11.4 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=both -Doptimize=-O0 -Dusedevel -Uuseshrplib -des'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O0 -g',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
ccversion='', gccversion='4.4.3 20100108 (prerelease)', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib /lib64 /usr/lib64
libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
libc=/lib/libc-2.10.2.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.10.2'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -O0 -g -L/usr/local/lib -fstack-protector'
Locally applied patches:
---
@INC for perl 5.11.4:
lib
/usr/local/lib/perl/5.11.4
/usr/local/share/perl/5.11.4
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.11
/usr/share/perl/5.11
.
---
Environment for perl 5.11.4:
HOME=/home/niko
LANG=en_US.UTF-8
LANGUAGE (unset)
LC_CTYPE=fi_FI.UTF-8
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/niko/bin:/home/niko/bin:/home/niko/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/sbin:/usr/sbin:/sbin:/usr/sbin
PERL_BADLANG (unset)
SHELL=/bin/zsh
[0001-TODO-test-SvPVbyte-should-handle-get-magic-before-ch.patch (text/x-diff, attachment)]
Information forwarded
to debian-bugs-dist@lists.debian.org:
Bug#376329; Package perl.
(Tue, 26 Jan 2010 21:09:06 GMT).
Acknowledgement sent
to Niko Tyni <ntyni@debian.org>:
Extra info received and forwarded to list.
(Tue, 26 Jan 2010 21:09:06 GMT).
Message #40 received at 376329@bugs.debian.org:
forwarded 376329 http://rt.perl.org/rt3/Public/Bug/Display.html?id=72398
thanks
On Sun, Jul 02, 2006 at 01:37:42AM -0400, Joey Hess wrote:
> Package: perl
> Version: 5.8.8-6
> Severity: normal
> joey@kodama:~/tmp/repor/testcase>./repro
> a
> b
> Wide character in subroutine entry at /usr/bin/markdown line 360.
> zsh: exit 255 ./repro
> <paravoid> that's freaking weird
> <paravoid> the utf8 flag gets enabled after the regexp is run
Here's a reduced testcase:
#!perl
use Digest::MD5 qw(md5_hex);
qq[\x{263a}] =~ /(.)/ and "$1"; # vivify $1 with utf8 flag
"\303\244" =~ /(.)/ and do {
    # "$1"; # uncomment this and it goes away
    print md5_hex($1), "\n";
};
__END__
The problem seems to be that the utf8 flag of $1 is checked before its
"get magic" has been applied to it, and the old value of the flag makes
Digest::MD5 croak incorrectly.
Workarounds include peeking at the contents of $1 somehow (for example by
stringifying it with md5_hex("$1")) or resetting it first with another
match.
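A minimal sketch of the first workaround, assuming the caller controls the code that hands the capture to Digest::MD5: copy $1 into a lexical so that its get magic runs, and pass the copy. The sub name digest_of_first_char is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: digest the first character (or byte) of $subject.
# The capture is copied before it reaches the XS function, so md5_hex
# sees a plain scalar whose utf8 flag matches its current contents.
sub digest_of_first_char {
    my ($subject) = @_;
    return unless $subject =~ /(.)/;
    my $captured = $1;    # copying applies the capture's get magic
    return md5_hex($captured);
}

# Leave $1 holding a character above 0xFF, as in the testcase.
my $earlier = qq[\x{263a}] =~ /(.)/ ? "$1" : '';

# Then digest a capture taken from a byte string.
print digest_of_first_char("\303\244"), "\n";
```

This sidesteps the stale flag rather than fixing it, so other XS code that reads $1 directly could still be affected on the perl versions named in this report.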
This is still reproducible on bleadperl (5.11.4 or so), and I just filed
upstream ticket [perl #72398].
--
Niko Tyni ntyni@debian.org
Added tag(s) fixed-upstream.
Request was from bts-link-upstream@lists.alioth.debian.org
to control@bugs.debian.org.
(Mon, 25 Oct 2010 16:36:19 GMT).
Information forwarded
to debian-bugs-dist@lists.debian.org, Niko Tyni <ntyni@debian.org>:
Bug#376329; Package perl.
(Tue, 31 May 2011 20:06:03 GMT).
Acknowledgement sent
to Dominic Hargreaves <dom@earth.li>:
Extra info received and forwarded to list. Copy sent to Niko Tyni <ntyni@debian.org>.
(Tue, 31 May 2011 20:06:03 GMT).
Message #49 received at 376329@bugs.debian.org:
fixed 376329 5.14.0-1
thanks
On Thu, Jan 07, 2010 at 01:12:11PM -0500, Joey Hess wrote:
> Eugene V. Lyubimkin wrote:
> > Hello Joey,
> >
> > This bug is currently unreproducible by me. And by you?
> >
> > Versions of software in my system:
> >
> > ii libhtml-scrubber-perl 0.08-4
> > ii libtext-markdown-perl 1.0.26-1
>
> No, you need to have the markdown package installed to reproduce this
> bug. Yes, it is still reproducible with
>
> ii libhtml-scrubb 0.08-4 Perl extension for scrubbing/sanitizing html
> ii markdown 1.0.1-7 Text-to-HTML conversion tool
> ii perl 5.10.1-8 Larry Wall's Practical Extraction and Report
>
> This bug is so fragile that removing even one line of the testcase that
> should not possibly have any bearing masks it. So even if the bug stopped
> being reproducible, unless perl had a big changelog entry along the lines of
> "fixed memory corruption problem that resulted in utf8 flag being incorrectly
> set", I would assume the bug was still present and only hiding.
I believe this has now been fixed upstream (in 5.14). See
<http://rt.perl.org/rt3/Ticket/Display.html?id=72398>.
Niko's test script succeeds with 5.14.0-1 from experimental.
Cheers,
Dominic.
--
Dominic Hargreaves | http://www.larted.org.uk/~dom/
PGP key 5178E2A5 from the.earth.li (keyserver,web,email)
Bug Marked as fixed in versions perl/5.14.0-1.
Request was from Dominic Hargreaves <dom@earth.li>
to control@bugs.debian.org.
(Tue, 31 May 2011 20:06:04 GMT).
Reply sent
to Dominic Hargreaves <dom@earth.li>:
You have taken responsibility.
(Sun, 13 Nov 2011 17:09:27 GMT).
Notification sent
to Joey Hess <joeyh@debian.org>:
Bug acknowledged by developer.
(Sun, 13 Nov 2011 17:09:27 GMT).
Message #56 received at 376329-done@bugs.debian.org:
I believe that these bugs have all been fixed in perl 5.14, which
has now migrated from experimental to unstable.
Dominic.
--
Dominic Hargreaves | http://www.larted.org.uk/~dom/
PGP key 5178E2A5 from the.earth.li (keyserver,web,email)
Bug archived.
Request was from Debbugs Internal Request <owner@bugs.debian.org>
to internal_control@bugs.debian.org.
(Mon, 12 Dec 2011 07:36:26 GMT).
Debian bug tracking system administrator <owner@bugs.debian.org>.
Last modified:
Tue Aug 14 22:02:57 2018;
Machine Name:
buxtehude