Debian Bug report logs - #854821
iconv: behavior change with C.UTF-8

Package: libc-bin; Maintainer for libc-bin is GNU Libc Maintainers <debian-glibc@lists.debian.org>; Source for libc-bin is src:glibc.

Affects: phpmyadmin

Reported by: Nishanth Aravamudan <nish.aravamudan@canonical.com>

Date: Fri, 10 Feb 2017 17:57:01 UTC

Severity: normal

Found in version glibc/2.24-9


Report forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#854821; Package libc-bin. (Fri, 10 Feb 2017 17:57:03 GMT)


Acknowledgement sent to Nishanth Aravamudan <nish.aravamudan@canonical.com>:
New Bug report received and forwarded. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Fri, 10 Feb 2017 17:57:03 GMT)


Message #5 received at submit@bugs.debian.org:

From: Nishanth Aravamudan <nish.aravamudan@canonical.com>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: iconv: behavior change with C.UTF-8
Date: Fri, 10 Feb 2017 09:55:27 -0800
Package: libc-bin
Version: 2.24-9
Severity: normal

Dear Maintainer,

I am trying to track down the root cause of an FTBFS in phpmyadmin (e.g.,
https://launchpadlibrarian.net/301371947/buildlog_ubuntu-zesty-amd64.phpmyadmin_4%3A4.6.5.2-1_BUILDING.txt.gz),
which is due to a testcase failure at build time:

"iconv(): Detected an illegal character in input string"

The test in question is basically doing:

$ echo "This is the Euro symbol '€'" |iconv -f UTF-8 -t ISO-8859-1//TRANSLIT

Since the builders default to C.UTF-8, if one prefaces this with

$ export LC_ALL=C.UTF-8

in various environments, we get:

Yakkety (libc-bin == 2.24-3ubuntu2) produces:
This is the Euro symbol 'EUR'

Zesty (libc-bin == 2.24-7ubuntu2) produces:
This is the Euro symbol 'iconv: illegal input sequence at position 25

Stretch & Sid (libc-bin == 2.24-9) produce:
This is the Euro symbol 'iconv: illegal input sequence at position 25

Given that phpmyadmin did build in Sid (earlier), I'm guessing that on
the next rebuild of phpmyadmin, it will fail in the same way as Ubuntu.

If the LC_ALL is set to POSIX or en_US.UTF-8 or C, the testcase passes
in all environments. I am not sure if this is due to the change back to
combining for transliteration in C.UTF-8, the update to Unicode 9, or a
combination of the two, but I think this behavior change was unintended?
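
For anyone who wants to compare the locales side by side, the following is
a minimal sketch and not part of the original test. The helper name
try_translit is hypothetical; it only wraps the iconv command shown above
and reports iconv's exit status next to its output. The Euro sign is written
as its UTF-8 octal escape so the script does not depend on the editor's
encoding. A locale that is not installed may silently behave like C, so it
is worth checking `locale -a` first.

```shell
#!/bin/sh
# Hypothetical helper: try_translit LOCALE
# Runs the Euro-sign transliteration from the report under LOCALE and prints
# the locale name, iconv's exit status, and whatever iconv wrote to stdout.
try_translit() {
    loc=$1
    # \342\202\254 is the UTF-8 encoding of U+20AC (the Euro sign).
    out=$(printf "This is the Euro symbol '\342\202\254'\n" |
          LC_ALL=$loc iconv -f UTF-8 -t ISO-8859-1//TRANSLIT 2>/dev/null)
    status=$?   # exit status of iconv, the last command in the pipeline
    printf '%-12s exit=%s output=%s\n' "$loc" "$status" "$out"
}

# The locales named in this report.
for loc in C POSIX en_US.UTF-8 C.UTF-8; do
    try_translit "$loc"
done
```

A non-zero exit status together with truncated output corresponds to the
"illegal input sequence" failure described above.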

The following is from my reporting system (running Ubuntu), but I am
able to reproduce the issue in a Sid schroot, as mentioned.

-- System Information:
Debian Release: stretch/sid
  APT prefers yakkety-updates
  APT policy: (500, 'yakkety-updates'), (500, 'yakkety-security'), (500, 'yakkety'), (100, 'yakkety-backports')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.8.0-37-generic (SMP w/4 CPU cores)
Locale: LANG=en_CA.UTF-8, LC_CTYPE=en_CA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages libc-bin depends on:
ii  libc6  2.24-3ubuntu2

libc-bin recommends no packages.

Versions of packages libc-bin suggests:
ii  manpages  4.07-1

-- no debconf information

-- 
Nishanth Aravamudan
Ubuntu Server
Canonical Ltd



Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#854821; Package libc-bin. (Fri, 10 Feb 2017 19:54:08 GMT)


Acknowledgement sent to Nish Aravamudan <nish.aravamudan@canonical.com>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Fri, 10 Feb 2017 19:54:08 GMT)


Message #10 received at 854821@bugs.debian.org:

From: Nish Aravamudan <nish.aravamudan@canonical.com>
To: 854821@bugs.debian.org
Subject: Reverting 29be63fd restores the old behavior
Date: Fri, 10 Feb 2017 11:50:22 -0800
I just ran a quick test using a PPA build of glibc with 29be63fd
("debian/patches/localedata/locale-C.diff: switch back transliterations
to combining. Closes: #840199" [0]) reverted, and the test passes in a
17.04 (Zesty) container again.

Thanks,
Nish

[0]
https://anonscm.debian.org/cgit/pkg-glibc/glibc.git/commit/?id=29be63fde23ee497bb83fc9fee77171d26a7d447

-- 
Nishanth Aravamudan
Ubuntu Server
Canonical Ltd



Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#854821; Package libc-bin. (Tue, 14 Feb 2017 19:33:05 GMT)


Acknowledgement sent to Aurelien Jarno <aurelien@aurel32.net>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Tue, 14 Feb 2017 19:33:05 GMT)


Message #15 received at 854821@bugs.debian.org:

From: Aurelien Jarno <aurelien@aurel32.net>
To: Nish Aravamudan <nish.aravamudan@canonical.com>, 854821@bugs.debian.org
Subject: Re: Bug#854821: Reverting 29be63fd restores the old behavior
Date: Tue, 14 Feb 2017 20:30:52 +0100
On 2017-02-10 11:50, Nish Aravamudan wrote:
> Just ran a quick test using a PPA build of glibc with 29be63fd
> ("debian/patches/localedata/locale-C.diff: switch back transliterations
> to combining. Closes: #840199" [0] reverted and the test passes in a
> 17.04 (Zesty) container again.

This change is intentional, and was done to revert an unintentional
change (see #840199). Now the behaviour is consistent between jessie
and stretch/sid.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net



Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#854821; Package libc-bin. (Wed, 22 Feb 2017 01:42:03 GMT)


Acknowledgement sent to Nish Aravamudan <nish.aravamudan@canonical.com>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Wed, 22 Feb 2017 01:42:03 GMT)


Message #20 received at 854821@bugs.debian.org:

From: Nish Aravamudan <nish.aravamudan@canonical.com>
To: Aurelien Jarno <aurelien@aurel32.net>
Cc: 854821@bugs.debian.org
Subject: Re: Bug#854821: Reverting 29be63fd restores the old behavior
Date: Tue, 21 Feb 2017 17:38:03 -0800
On 14.02.2017 [20:30:52 +0100], Aurelien Jarno wrote:
> On 2017-02-10 11:50, Nish Aravamudan wrote:
> > Just ran a quick test using a PPA build of glibc with 29be63fd
> > ("debian/patches/localedata/locale-C.diff: switch back transliterations
> > to combining. Closes: #840199" [0] reverted and the test passes in a
> > 17.04 (Zesty) container again.
> 
> This change is intentional, and was done to revert an unintentional
> change (see #840199). Now the behaviour is consistent between jessie
> and stretch/sid.

I understand that; the above revert was mostly informational. Reading
the other bug, I see the reasoning behind not changing behavior. But it
seems like this also changes behavior, even if only within Unstable, and
needs some follow-up, as phpmyadmin in Unstable will fail to rebuild
(verified just now in the chroot).

Given that this appears in an upstream test that is making an assumption
about //TRANSLIT support from iconv (meaning that the behavior they are
testing for might be consistent across distributions), I'm not sure what
the best next step would be. Note that the phpmyadmin tests were only
relatively recently enabled at build time, so that may be why this
wasn't noticed.

Any advice would be greatly appreciated!

Thanks,
Nish

-- 
Nishanth Aravamudan
Ubuntu Server
Canonical Ltd



Added indication that 854821 affects phpmyadmin. Request was from Michal Čihař <nijel@debian.org> to control@bugs.debian.org. (Fri, 31 Mar 2017 18:42:03 GMT)


Information forwarded to debian-bugs-dist@lists.debian.org, GNU Libc Maintainers <debian-glibc@lists.debian.org>:
Bug#854821; Package libc-bin. (Fri, 31 Mar 2017 19:09:02 GMT)


Acknowledgement sent to Michal Čihař <michal@cihar.com>:
Extra info received and forwarded to list. Copy sent to GNU Libc Maintainers <debian-glibc@lists.debian.org>. (Fri, 31 Mar 2017 19:09:02 GMT)


Message #27 received at 854821@bugs.debian.org:

From: Michal Čihař <michal@cihar.com>
To: 854821@bugs.debian.org
Cc: Nish Aravamudan <nish.aravamudan@canonical.com>
Subject: Transliteration in C.UTF-8 locales
Date: Fri, 31 Mar 2017 21:05:35 +0200
Hi

I was just forced to look at this again (see #859219) and I think the
transliteration is not working as it should.

What is actually the reason for making it behave differently in C.UTF-8
than in other UTF-8 locales? Does it really have to be that either
transliteration of "ç" is broken or transliteration of "€" is broken
for this locale?

In most other UTF-8 locales (if not all, I've not tested this) both of
them work just fine:

$ echo "ça va €" | LC_ALL=en_GB.UTF-8 iconv -f UTF-8 -t "ascii//TRANSLIT"
ca va EUR
$ echo "ça va €" | LC_ALL=de_DE.UTF-8 iconv -f UTF-8 -t "ascii//TRANSLIT"
ca va EUR
$ echo "ça va €" | LC_ALL=cs_CZ.UTF-8 iconv -f UTF-8 -t "ascii//TRANSLIT"
ca va EUR
$ echo "ça va €" | LC_ALL=C.UTF-8 iconv -f UTF-8 -t "ascii//TRANSLIT"
ca va iconv: illegal input sequence at position 7

Thanks for looking into this
-- 
	Michal Čihař | https://cihar.com/ | https://weblate.org/

Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Wed Jan 10 17:42:27 2018; Machine Name: buxtehude
