Debian Bug report logs - #555331
[col] improperly fails with Invalid or incomplete multibyte or wide character

Package: man-db; Maintainer for man-db is Colin Watson <cjwatson@debian.org>; Source for man-db is src:man-db.

Reported by: Raphael Hertzog <hertzog@debian.org>

Date: Mon, 9 Nov 2009 12:03:02 UTC

Severity: serious

Tags: fixed-upstream

Found in version man-db/2.5.6-3

Fixed in version man-db/2.5.6-4

Done: Colin Watson <cjwatson@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, lintian@packages.debian.org, man-db@packages.debian.org, Debian Bsdmainutils Team <pkg-bsdmainutils@teams.debian.net>:
Bug#555331; Package bsdmainutils. (Mon, 09 Nov 2009 12:03:05 GMT).


Acknowledgement sent to Raphael Hertzog <hertzog@debian.org>:
New Bug report received and forwarded. Copy sent to lintian@packages.debian.org, man-db@packages.debian.org, Debian Bsdmainutils Team <pkg-bsdmainutils@teams.debian.net>. (Mon, 09 Nov 2009 12:03:05 GMT).


Message #5 received at submit@bugs.debian.org:

From: Raphael Hertzog <hertzog@debian.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: [col] improperly fails with Invalid or incomplete multibyte or wide character
Date: Mon, 9 Nov 2009 12:48:03 +0100
Package: bsdmainutils
Version: 8.0.1
Severity: serious

Since today I get lots of lintian warnings (manpage-has-errors-from-man)
on my dpkg builds because col fails with:
col: Invalid or incomplete multibyte or wide character

You can reproduce it by doing this:
LANG=C man --warnings -E UTF-8 -l /usr/share/man/man8/update-alternatives.8.gz >/dev/null

I don't know if it's col's fault or if it's man-db that does not use col
properly but since col changed recently (and not man-db), I filed the bug
against col. Note that dropping LANG=C makes the warning go away so it's
most certainly locale related. Using any other locale seems to work, even
one that is not UTF-8.

Severity serious to avoid propagation to testing until we know more on the
nature of the problem. 

Cheers,

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (150, 'experimental')
Architecture: i386 (x86_64)

Kernel: Linux 2.6.30-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages bsdmainutils depends on:
ii  bsdutils                  1:2.16.1-4     Basic utilities from 4.4BSD-Lite
ii  debianutils               3.2.1          Miscellaneous utilities specific t
ii  libc6                     2.10.1-5       GNU C Library: Shared libraries
ii  libncurses5               5.7+20090803-2 shared libraries for terminal hand

bsdmainutils recommends no packages.

Versions of packages bsdmainutils suggests:
ii  cpp                           4:4.3.4-1  The GNU C preprocessor (cpp)
pn  vacation                      <none>     (no description available)
ii  wamerican [wordlist]          6-3        American English dictionary words 
ii  wfrench [wordlist]            1.2.3-7    French dictionary words for /usr/s
ii  whois                         4.7.36     an intelligent whois client

-- no debconf information

-- 
Raphaël Hertzog




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Bsdmainutils Team <pkg-bsdmainutils@teams.debian.net>:
Bug#555331; Package bsdmainutils. (Mon, 09 Nov 2009 14:33:09 GMT).


Acknowledgement sent to Michael Meskes <meskes@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Bsdmainutils Team <pkg-bsdmainutils@teams.debian.net>. (Mon, 09 Nov 2009 14:33:09 GMT).


Message #10 received at 555331@bugs.debian.org:

From: Michael Meskes <meskes@debian.org>
To: Raphael Hertzog <hertzog@debian.org>, 555331@bugs.debian.org
Subject: Re: Bug#555331: [col] improperly fails with Invalid or incomplete multibyte or wide character
Date: Mon, 9 Nov 2009 15:03:57 +0100
On Mon, Nov 09, 2009 at 12:48:03PM +0100, Raphael Hertzog wrote:
> I don't know if it's col's fault or if it's man-db that does not use col
> properly but since col changed recently (and not man-db), I filed the bug
> against col. Note that dropping LANG=C makes the warning go away so it's
> most certainly locale related. Using any other locale seems to work, even
> one that is not UTF-8.

Please see #555330 for some details as I already saw the same thing. What
happens is that col sets the locale accordingly (to C) and then reads the
document using getwchar(). This operation returns the error you mentioned upon
reading the UTF-8 hyphen (e2 80 90). To me this doesn't look like a bug in col,
but an incorrect call from lintian as man is asked to produce UTF-8 encoding
while col isn't switched to it. Apparently the C locale does not define the
encoding.

Michael

-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ: 179140304, AIM/Yahoo/Skype: michaelmeskes, Jabber: meskes@jabber.org
VfL Borussia! Forca Barca! Go SF 49ers! Use: Debian GNU/Linux, PostgreSQL




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Bsdmainutils Team <pkg-bsdmainutils@teams.debian.net>:
Bug#555331; Package bsdmainutils. (Mon, 09 Nov 2009 15:36:05 GMT).


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Bsdmainutils Team <pkg-bsdmainutils@teams.debian.net>. (Mon, 09 Nov 2009 15:36:06 GMT).


Message #15 received at 555331@bugs.debian.org:

From: Colin Watson <cjwatson@debian.org>
To: Raphael Hertzog <hertzog@debian.org>, 555331@bugs.debian.org
Subject: Re: Bug#555331: [col] improperly fails with Invalid or incomplete multibyte or wide character
Date: Mon, 9 Nov 2009 15:15:02 +0000
On Mon, Nov 09, 2009 at 12:48:03PM +0100, Raphael Hertzog wrote:
> Package: bsdmainutils
> Version: 8.0.1
> Severity: serious
> 
> Since today I get lots of lintian warnings (manpage-has-errors-from-man)
> on my dpkg builds because col fails with:
> col: Invalid or incomplete multibyte or wide character
> 
> You can reproduce it by doing this:
> LANG=C man --warnings -E UTF-8 -l /usr/share/man/man8/update-alternatives.8.gz >/dev/null
> 
> I don't know if it's col's fault or if it's man-db that does not use col
> properly but since col changed recently (and not man-db), I filed the bug
> against col. Note that dropping LANG=C makes the warning go away so it's
> most certainly locale related. Using any other locale seems to work, even
> one that is not UTF-8.
> 
> Severity serious to avoid propagation to testing until we know more on the
> nature of the problem. 

This bug is somewhere in the intersection of bsdmainutils, man-db,
lintian, and locales. Have fun. :-)

The proximate cause is that man uses -Tutf8 and thus outputs UTF-8
hyphens even under LANG=C (compare #547695), and that confuses col now
that it knows about the encoding of its input data.

However, the upstream patch referred to in #547695 is not sufficient
here. lintian uses the '-E UTF-8' option, which forces man to use UTF-8,
overriding the default. This used to work fine when col was dumb; now
that it's smart, things are a bit more problematic. The reason that
lintian does this is that it needs to force UTF-8 output somehow or else
CJK manual pages tend not to work properly, but there is no UTF-8 locale
that's guaranteed to be available on all systems.

In the short term, I think the best approach would be for man to set
LC_CTYPE to some appropriate locale that matches the encoding requested
by -E while running col. I'll see if I can arrange for this. However,
such a locale is not actually guaranteed to exist. Perhaps lintian needs
to generate a UTF-8 locale if it can't find one otherwise, a bit like
the hack in installation-locale; or perhaps we should just make sure
that there's always a C.UTF-8 locale on the system, which could be used
to get UTF-8 character type semantics without implying a particular
language or country.

-- 
Colin Watson                                       [cjwatson@debian.org]




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Bsdmainutils Team <pkg-bsdmainutils@teams.debian.net>:
Bug#555331; Package bsdmainutils. (Mon, 09 Nov 2009 16:42:10 GMT).


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Bsdmainutils Team <pkg-bsdmainutils@teams.debian.net>. (Mon, 09 Nov 2009 16:42:10 GMT).


Message #20 received at 555331@bugs.debian.org:

From: Colin Watson <cjwatson@debian.org>
To: Raphael Hertzog <hertzog@debian.org>, 555331@bugs.debian.org
Subject: Re: Bug#555331: [col] improperly fails with Invalid or incomplete multibyte or wide character
Date: Mon, 9 Nov 2009 16:35:12 +0000
reassign 555331 man-db 2.5.6-3
user man-db@packages.debian.org
usertags 555331 target-2.5.7
tags 555331 fixed-upstream
clone 555331 -1
reassign -1 lintian 2.2.17
retitle -1 lintian: ensure that there's always a UTF-8 locale for use when running man?
severity -1 wishlist
thanks

On Mon, Nov 09, 2009 at 03:15:02PM +0000, Colin Watson wrote:
> In the short term, I think the best approach would be for man to set
> LC_CTYPE to some appropriate locale that matches the encoding requested
> by -E while running col. I'll see if I can arrange for this.

Fixed upstream, so I'm going to claim this as a man-db bug:

Mon Nov  9 16:27:44 GMT 2009  Colin Watson  <cjwatson@debian.org>

        * src/encodings.c (find_charset_locale): New function.
        * src/encodings.h (find_charset_locale): Add prototype.
        * src/man.c (make_roff_command): When invoking col, ensure that
          LC_CTYPE is set to an appropriate locale for the selected
          character set (Debian bug #555331).
        * NEWS: Document this.

> However, such a locale is not actually guaranteed to exist. Perhaps
> lintian needs to generate a UTF-8 locale if it can't find one
> otherwise, a bit like the hack in installation-locale; or perhaps we
> should just make sure that there's always a C.UTF-8 locale on the
> system, which could be used to get UTF-8 character type semantics
> without implying a particular language or country.

I've cloned a bug for this.

-- 
Colin Watson                                       [cjwatson@debian.org]




Bug reassigned from package 'bsdmainutils' to 'man-db'. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Mon, 09 Nov 2009 16:57:13 GMT).


Bug No longer marked as found in versions bsdmainutils/8.0.1. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Mon, 09 Nov 2009 16:57:13 GMT).


Bug Marked as found in versions man-db/2.5.6-3. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Mon, 09 Nov 2009 16:57:14 GMT).


Added tag(s) fixed-upstream. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Mon, 09 Nov 2009 16:57:15 GMT).


Bug 555331 cloned as bug 555408. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Mon, 09 Nov 2009 16:57:16 GMT).


Reply sent to Colin Watson <cjwatson@debian.org>:
You have taken responsibility. (Tue, 10 Nov 2009 12:36:12 GMT).


Notification sent to Raphael Hertzog <hertzog@debian.org>:
Bug acknowledged by developer. (Tue, 10 Nov 2009 12:36:12 GMT).


Message #35 received at 555331-close@bugs.debian.org:

From: Colin Watson <cjwatson@debian.org>
To: 555331-close@bugs.debian.org
Subject: Bug#555331: fixed in man-db 2.5.6-4
Date: Tue, 10 Nov 2009 12:33:42 +0000
Source: man-db
Source-Version: 2.5.6-4

We believe that the bug you reported is fixed in the latest version of
man-db, which is due to be installed in the Debian FTP archive:

man-db_2.5.6-4.diff.gz
  to main/m/man-db/man-db_2.5.6-4.diff.gz
man-db_2.5.6-4.dsc
  to main/m/man-db/man-db_2.5.6-4.dsc
man-db_2.5.6-4_i386.deb
  to main/m/man-db/man-db_2.5.6-4_i386.deb



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 555331@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Colin Watson <cjwatson@debian.org> (supplier of updated man-db package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.8
Date: Tue, 10 Nov 2009 11:58:25 +0000
Source: man-db
Binary: man-db
Architecture: source i386
Version: 2.5.6-4
Distribution: unstable
Urgency: low
Maintainer: Colin Watson <cjwatson@debian.org>
Changed-By: Colin Watson <cjwatson@debian.org>
Description: 
 man-db     - on-line manual pager
Closes: 547695 553623 554914 555331
Changes: 
 man-db (2.5.6-4) unstable; urgency=low
 .
   * Backport from trunk:
     - If the locale encoding is ASCII, then use the ascii device even if
       preconv is available; it will do a better job than producing UTF-8
       output and then recoding that to ASCII (closes: #547695).
     - Include <unistd.h> in src/encodings.c for dup and STDIN_FILENO
       (closes: #553623).
     - When invoking col, ensure that LC_CTYPE is set to an appropriate
       locale for the selected character set (closes: #555331).
   * Add man-db/auto-update debconf template, which may be preseeded to false
     to disable rebuilding the database when man-db is triggered (closes:
     #554914).
Checksums-Sha1: 
 908e668f6580e03e10af58c8e22fe98b5e6ce05c 1090 man-db_2.5.6-4.dsc
 f80c80d65f5286188222807f0aa79b9916dee98e 67315 man-db_2.5.6-4.diff.gz
 9795b9780522a5a23e17105ecc66dc2c609f5d68 1176396 man-db_2.5.6-4_i386.deb
Checksums-Sha256: 
 a2baa707bb6296e94ede4adc4fd556051fab07831fa5ab28b65ebc9f790271aa 1090 man-db_2.5.6-4.dsc
 0f2d7d9492d0dcd308b2f3f346cfbb8b9eef68cdb0f52203cac833b9f83e383f 67315 man-db_2.5.6-4.diff.gz
 a563ab65f8a635df85a0cd9a93a37aa68b58c25c7d00611021998180f9dc548e 1176396 man-db_2.5.6-4_i386.deb
Files: 
 d5bf3146bade6d031fd6d245b7383312 1090 doc important man-db_2.5.6-4.dsc
 86cf07f2efb8528d3a65ac73d43663dc 67315 doc important man-db_2.5.6-4.diff.gz
 ece6139ef95a3461fce49e9e1ac4113d 1176396 doc important man-db_2.5.6-4_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Colin Watson <cjwatson@debian.org> -- Debian developer

iD8DBQFK+VY89t0zAhD6TNERAttLAJ9loPO1pnWEcTqrrgnbLAtrxJ1L8wCfR0VP
ttFRsYCG9eHFO0wfswY+d3Q=
=f8p/
-----END PGP SIGNATURE-----





Information forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#555331; Package man-db. (Sat, 14 Nov 2009 13:57:03 GMT).


Acknowledgement sent to Paul Wise <pabs@debian.org>:
Extra info received and forwarded to list. Copy sent to Colin Watson <cjwatson@debian.org>. (Sat, 14 Nov 2009 13:57:03 GMT).


Message #40 received at 555331@bugs.debian.org:

From: Paul Wise <pabs@debian.org>
To: 555331@bugs.debian.org
Cc: control <control@bugs.debian.org>
Subject: man-db: 555331: needs to depend on the locales package?
Date: Sat, 14 Nov 2009 21:53:37 +0800
usertag 555331 + bittenby
found 555331 man-db/2.5.6-4
thanks

In an up-to-date cowbuilder chroot I still get this issue:

(cowbuilder)root@chianamo:~# LANG=C man --warnings -E UTF-8 -l /usr/share/man/man8/update-alternatives.8.gz >/dev/null
col: Invalid or incomplete multibyte or wide character
(cowbuilder)root@chianamo:~# apt-cache policy man-db
man-db:
  Installed: 2.5.6-4
  Candidate: 2.5.6-4
  Version table:
 *** 2.5.6-4 0
        500 ftp://xxxxxxxxxxxxxxx sid/main Packages
        100 /var/lib/dpkg/status

Looking at the patch, I thought it would be because the locales package
is not installed and thus /usr/share/i18n/SUPPORTED is not available.
Unfortunately, installing locales does not silence the warning:

(cowbuilder)root@chianamo:~# apt-get install locales
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  locales
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 4749kB of archives.
After this operation, 12.9MB of additional disk space will be used.
Get:1 ftp://xxxxxxxxxxxxxxxxx sid/main locales 2.10.1-7 [4749kB]
Fetched 4749kB in 35s (133kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously deselected package locales.
(Reading database ... 15286 files and directories currently installed.)
Unpacking locales (from .../locales_2.10.1-7_all.deb) ...
Processing triggers for man-db ...
Setting up locales (2.10.1-7) ...
Generating locales (this might take a while)...
Generation complete.
(cowbuilder)root@chianamo:~# LANG=C man --warnings -E UTF-8 -l /usr/share/man/man8/update-alternatives.8.gz >/dev/null
col: Invalid or incomplete multibyte or wide character

This is on amd64 in a sid cowbuilder chroot.

-- 
bye,
pabs

http://wiki.debian.org/PaulWise

Bug Marked as found in versions man-db/2.5.6-4; no longer marked as fixed in versions man-db/2.5.6-4 and reopened. Request was from Paul Wise <pabs@debian.org> to control@bugs.debian.org. (Sat, 14 Nov 2009 13:57:04 GMT).


Reply sent to Colin Watson <cjwatson@debian.org>:
You have taken responsibility. (Sun, 15 Nov 2009 12:57:06 GMT).


Notification sent to Raphael Hertzog <hertzog@debian.org>:
Bug acknowledged by developer. (Sun, 15 Nov 2009 12:57:06 GMT).


Message #47 received at 555331-done@bugs.debian.org:

From: Colin Watson <cjwatson@debian.org>
To: Paul Wise <pabs@debian.org>, 555331-done@bugs.debian.org
Subject: Re: Bug#555331: man-db: 555331: needs to depend on the locales package?
Date: Sun, 15 Nov 2009 12:53:22 +0000
Source: man-db
Source-Version: 2.5.6-4

On Sat, Nov 14, 2009 at 09:53:37PM +0800, Paul Wise wrote:
> In an up-to-date cowbuilder chroot I still get this issue:
> 
> (cowbuilder)root@chianamo:~# LANG=C man --warnings -E UTF-8 -l /usr/share/man/man8/update-alternatives.8.gz >/dev/null
> col: Invalid or incomplete multibyte or wide character
> (cowbuilder)root@chianamo:~# apt-cache policy man-db
> man-db:
>   Installed: 2.5.6-4
>   Candidate: 2.5.6-4
>   Version table:
>  *** 2.5.6-4 0
>         500 ftp://xxxxxxxxxxxxxxx sid/main Packages
>         100 /var/lib/dpkg/status
> 
> Looking at the patch, I thought it would be because the locales package
> is not installed and thus /usr/share/i18n/SUPPORTED is not available.
> Unfortunately, installing locales does not silence the warning:

You've hit the corner case which I already cloned as bug 555408. I don't
think we need to keep this bug open for that as well.

As I said in a previous message:

  However, such a locale is not actually guaranteed to exist. Perhaps
  lintian needs to generate a UTF-8 locale if it can't find one
  otherwise, a bit like the hack in installation-locale; or perhaps we
  should just make sure that there's always a C.UTF-8 locale on the
  system, which could be used to get UTF-8 character type semantics
  without implying a particular language or country.

If you generate some random UTF-8 locale (uncomment it in
/etc/locale.gen and run 'sudo locale-gen'), then that will work around
the problem for you.

Regards,

-- 
Colin Watson                                       [cjwatson@debian.org]




Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Tue, 15 Dec 2009 07:36:26 GMT).




Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Tue Jan 9 17:04:17 2018; Machine Name: buxtehude