Debian Bug report logs - #555330
[checks/manpages] man syntax check requires a UTF-8 locale for col

Package: lintian; Maintainer for lintian is Debian Lintian Maintainers <lintian-maint@debian.org>; Source for lintian is src:lintian.

Reported by: Michael Meskes <meskes@debian.org>

Date: Mon, 9 Nov 2009 11:57:01 UTC

Severity: minor

Merged with 555408, 566121

Found in versions lintian/2.2.17, lintian/2.3.2

Fixed in version lintian/2.3.3

Done: Raphael Geissert <geissert@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#555330; Package lintian. (Mon, 09 Nov 2009 11:57:04 GMT)

Acknowledgement sent to Michael Meskes <meskes@debian.org>:
New Bug report received and forwarded. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>. (Mon, 09 Nov 2009 11:57:04 GMT)

Message #5 received at submit@bugs.debian.org:

From: Michael Meskes <meskes@debian.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: lintian: manpage-has-errors-from-man does not work correctly
Date: Mon, 09 Nov 2009 12:54:54 +0100
Package: lintian
Version: 2.2.17
Severity: normal

When running under locale C the test does not work correctly:

michael@feivel:~$ LANG=C man --warnings -E UTF-8 -l /usr/share/man/man8/acpid.8.gz >/dev/null
col: Invalid or incomplete multibyte or wide character
michael@feivel:~$ LANG=de_DE.UTF-8 man --warnings -E UTF-8 -l /usr/share/man/man8/acpid.8.gz >/dev/null
michael@feivel:~$

The reason seems to be that man pipes its output through col when not
displaying on a terminal, with col setting the locale to C, which apparently
does not mean UTF-8. Therefore the Unicode hyphen (e2 80 90) triggers the
above-mentioned error message in col's file I/O.

Michael

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (101, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.31-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages lintian depends on:
ii  binutils               2.20-2            The GNU assembler, linker and bina
ii  diffstat               1.47-1            produces graph of changes introduc
ii  dpkg-dev               1.15.4.1          Debian package development tools
ii  file                   5.03-2            Determines file type using "magic"
ii  gettext                0.17-8            GNU Internationalization utilities
ii  intltool-debian        0.35.0+20060710.1 Help i18n of RFC822 compliant conf
ii  libapt-pkg-perl        0.1.24            Perl interface to libapt-pkg
ii  libclass-accessor-perl 0.34-1            Perl module that automatically gen
ii  libipc-run-perl        0.84-1            Perl module for running processes
ii  libparse-debianchangel 1.1.1-2           parse Debian changelogs and output
ii  libtimedate-perl       1.1900-1          Time and date functions for Perl
ii  liburi-perl            1.37+dfsg-1       Manipulates and accesses URI strin
ii  man-db                 2.5.6-3           on-line manual pager
ii  perl [libdigest-sha-pe 5.10.1-7          Larry Wall's Practical Extraction 

lintian recommends no packages.

Versions of packages lintian suggests:
pn  binutils-multiarch            <none>     (no description available)
ii  libtext-template-perl         1.45-1     Text::Template perl module
ii  man-db                        2.5.6-3    on-line manual pager

-- no debconf information
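The hyphen bytes quoted in the report are the UTF-8 encoding of U+2010 HYPHEN. The minimal shell sketch below dumps them as hex; the function name `hyphen_bytes` is illustrative only, and it assumes POSIX `printf` octal escapes and an `od` that supports `-An -tx1`. All three bytes have the high bit set, so none of them is a valid character in the ASCII character set of the C locale that col was running under.

```sh
#!/bin/sh
# Dump the UTF-8 encoding of U+2010 HYPHEN as hex.
# \342 \200 \220 are the octal forms of 0xe2 0x80 0x90.
hyphen_bytes() {
    printf '\342\200\220' | od -An -tx1 | tr -d ' \n'
}
```

Calling `hyphen_bytes` prints `e28090`, the same byte sequence Michael identified in acpid.8.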




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#555330; Package lintian. (Mon, 09 Nov 2009 12:57:03 GMT)

Acknowledgement sent to "Adam D. Barratt" <adam@adam-barratt.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>. (Mon, 09 Nov 2009 12:57:03 GMT)

Message #10 received at 555330@bugs.debian.org:

From: "Adam D. Barratt" <adam@adam-barratt.org.uk>
To: "Michael Meskes" <meskes@debian.org>, <555330@bugs.debian.org>
Subject: Re: Bug#555330: lintian: manpage-has-errors-from-man does not work correctly
Date: Mon, 9 Nov 2009 12:43:35 -0000
Michael Meskes wrote:
> When running under locale C the test does not work correctly:
>
> michael@feivel:~$ LANG=C man --warnings -E UTF-8 -l /usr/share/man/man8/acpid.8.gz >/dev/null
> col: Invalid or incomplete multibyte or wide character
[...]
> The reason seems to be that man pipes its output through col when not
> displaying on a terminal, with col setting the locale to C, which
> apparently does not mean UTF-8. Therefore the Unicode hyphen (e2 80 90)
> triggers the above-mentioned error message in col's file I/O.

The above usage has, however, previously worked without issues.  col was 
recently upgraded to a new upstream version, which seems to be when the 
issue occurred.

I'll leave this bug filed against Lintian for the moment until the situation 
is clearer; #555331 is a related bug against bsdmainutils.

Regards,

Adam 




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#555330; Package lintian. (Mon, 09 Nov 2009 12:57:05 GMT)

Acknowledgement sent to "Adam D. Barratt" <adam@adam-barratt.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>. (Mon, 09 Nov 2009 12:57:05 GMT)

Message #15 received at 555330@bugs.debian.org:

From: "Adam D. Barratt" <adam@adam-barratt.org.uk>
To: "Michael Meskes" <meskes@debian.org>, <555330@bugs.debian.org>
Subject: Re: Bug#555330: lintian: manpage-has-errors-from-man does not work correctly
Date: Mon, 9 Nov 2009 12:45:38 -0000
Adam D. Barratt wrote:
> Michael Meskes wrote:
>> When running under locale C the test does not work correctly:
>>
>> michael@feivel:~$ LANG=C man --warnings -E UTF-8 -l /usr/share/man/man8/acpid.8.gz >/dev/null
>> col: Invalid or incomplete multibyte or wide character
> [...]
>> The reason seems to be that man pipes its output through col when not
>> displaying on a terminal, with col setting the locale to C, which
>> apparently does not mean UTF-8. Therefore the Unicode hyphen (e2 80 90)
>> triggers the above-mentioned error message in col's file I/O.
>
> The above usage has, however, previously worked without issues.  col
> was recently upgraded to a new upstream version, which seems to be
> when the issue occurred.
>
> I'll leave this bug filed against Lintian for the moment until the
> situation is clearer; #555331 is a related bug against bsdmainutils.

Yes, I've just noticed who made the most recent bsdmainutils upload; mea 
culpa. :-) In any case, I have to admit I'm not sure exactly which part of 
the lintian -> man -> col chain is at fault here.

Adam 




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#555330; Package lintian. (Mon, 09 Nov 2009 14:27:03 GMT)

Acknowledgement sent to "Adam D. Barratt" <adam@adam-barratt.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>. (Mon, 09 Nov 2009 14:27:03 GMT)

Message #20 received at 555330@bugs.debian.org:

From: "Adam D. Barratt" <adam@adam-barratt.org.uk>
To: "Michael Meskes" <meskes@debian.org>
Cc: <555330@bugs.debian.org>
Subject: Re: Bug#555330: lintian: manpage-has-errors-from-man does not work correctly
Date: Mon, 9 Nov 2009 14:24:21 -0000
On Mon, 9 Nov 2009 15:06:35 +0100, Michael Meskes wrote:
> On Mon, Nov 09, 2009 at 12:45:38PM -0000, Adam D. Barratt wrote:
>> Yes, I've just noticed who made the most recent bsdmainutils upload;
>> mea culpa. :-) In any case, I have to admit I'm not sure exactly
>> which part of the lintian -> man -> col chain is at fault here.
>
> I think the call in lintian is at fault, as it asks man to produce
> UTF-8 without telling col to accept UTF-8.

You'll have to excuse my ignorance here, but can lintian actually tell col 
to do that as part of the man call?

> Hmm, which of the bug reports do we use for discussion? :-)

I have a slight preference for using this report, as we've already started a 
discussion here and I automatically get any mail sent to it, but I'm open to 
using either.

Cheers,

Adam 




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#555330; Package lintian. (Mon, 09 Nov 2009 14:33:07 GMT)

Acknowledgement sent to Michael Meskes <meskes@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>. (Mon, 09 Nov 2009 14:33:07 GMT)

Message #25 received at 555330@bugs.debian.org:

From: Michael Meskes <meskes@debian.org>
To: "Adam D. Barratt" <adam@adam-barratt.org.uk>
Cc: Michael Meskes <meskes@debian.org>, 555330@bugs.debian.org
Subject: Re: Bug#555330: lintian: manpage-has-errors-from-man does not work correctly
Date: Mon, 9 Nov 2009 15:06:35 +0100
On Mon, Nov 09, 2009 at 12:45:38PM -0000, Adam D. Barratt wrote:
> Yes, I've just noticed who made the most recent bsdmainutils upload;
> mea culpa. :-) In any case, I have to admit I'm not sure exactly
> which part of the lintian -> man -> col chain is at fault here.

I think the call in lintian is at fault, as it asks man to produce UTF-8
without telling col to accept UTF-8.

Hmm, which of the bug reports do we use for discussion? :-)

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ: 179140304, AIM/Yahoo/Skype: michaelmeskes, Jabber: meskes@jabber.org
VfL Borussia! Forca Barca! Go SF 49ers! Use: Debian GNU/Linux, PostgreSQL




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#555330; Package lintian. (Mon, 09 Nov 2009 14:57:14 GMT)

Acknowledgement sent to Michael Meskes <meskes@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>. (Mon, 09 Nov 2009 14:57:14 GMT)

Message #30 received at 555330@bugs.debian.org:

From: Michael Meskes <meskes@debian.org>
To: "Adam D. Barratt" <adam@adam-barratt.org.uk>
Cc: Michael Meskes <meskes@debian.org>, 555330@bugs.debian.org
Subject: Re: Bug#555330: lintian: manpage-has-errors-from-man does not work correctly
Date: Mon, 9 Nov 2009 15:53:53 +0100
On Mon, Nov 09, 2009 at 02:24:21PM -0000, Adam D. Barratt wrote:
> You'll have to excuse my ignorance here, but can lintian actually
> tell col to do that as part of the man call?

Yes, by using a locale that sets the encoding to UTF-8. But I assume you didn't
mean that and would prefer a command-line option. No, there is none.

> I have a slight preference for using this report, as we've already
> started a discussion here and I automatically get any mail sent to
> it, but I'm open to using either.

I'm involved in both anyway. :-)

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ: 179140304, AIM/Yahoo/Skype: michaelmeskes, Jabber: meskes@jabber.org
VfL Borussia! Forca Barca! Go SF 49ers! Use: Debian GNU/Linux, PostgreSQL




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#555330; Package lintian. (Mon, 09 Nov 2009 15:21:03 GMT)

Acknowledgement sent to "Adam D. Barratt" <adam@adam-barratt.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>. (Mon, 09 Nov 2009 15:21:03 GMT)

Message #35 received at 555330@bugs.debian.org:

From: "Adam D. Barratt" <adam@adam-barratt.org.uk>
To: "Michael Meskes" <meskes@debian.org>
Cc: <555330@bugs.debian.org>
Subject: Re: Bug#555330: lintian: manpage-has-errors-from-man does not work correctly
Date: Mon, 9 Nov 2009 15:19:34 -0000
Michael Meskes wrote:
> On Mon, Nov 09, 2009 at 02:24:21PM -0000, Adam D. Barratt wrote:
>> You'll have to excuse my ignorance here, but can lintian actually
>> tell col to do that as part of the man call?
>
> Yes, by using a locale that sets the encoding to UTF-8. But I assume
> you didn't mean that and would prefer a command-line option. No,
> there is none.

Yeah, that would be better.  We're intentionally using LANG=C to avoid 
localisation issues with the output; obviously we can't assume that random 
Lintian users have any of the en_* locales installed so we can't force one 
of those either.

Colin Watson added the explicit UTF-8 encoding to avoid recoding localised 
manpages to ASCII and producing bogus warnings in the process; as he has 
both man-db and lintian hats I'm hoping he might have a cunning idea as to 
how to fix this.

Regards,

Adam 




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#555330; Package lintian. (Mon, 09 Nov 2009 15:42:11 GMT)

Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>. (Mon, 09 Nov 2009 15:42:11 GMT)

Message #40 received at 555330@bugs.debian.org:

From: Colin Watson <cjwatson@debian.org>
To: Raphael Hertzog <hertzog@debian.org>, 555331@bugs.debian.org
Subject: Re: Bug#555331: [col] improperly fails with Invalid or incomplete multibyte or wide character
Date: Mon, 9 Nov 2009 15:15:02 +0000
On Mon, Nov 09, 2009 at 12:48:03PM +0100, Raphael Hertzog wrote:
> Package: bsdmainutils
> Version: 8.0.1
> Severity: serious
> 
> Since today I get lots of lintian warnings (manpage-has-errors-from-man)
> on my dpkg builds because col fails with:
> col: Invalid or incomplete multibyte or wide character
> 
> You can reproduce it by doing this:
> LANG=C man --warnings -E UTF-8 -l /usr/share/man/man8/update-alternatives.8.gz >/dev/null
> 
> I don't know if it's col's fault or if it's man-db that does not use col
> properly but since col changed recently (and not man-db), I filed the bug
> against col. Note that dropping LANG=C makes the warning go away, so it's
> almost certainly locale-related. Using any other locale seems to work, even
> one that is not UTF-8.
> 
> Severity serious to avoid propagation to testing until we know more on the
> nature of the problem. 

This bug is somewhere in the intersection of bsdmainutils, man-db,
lintian, and locales. Have fun. :-)

The proximate cause is that man uses -Tutf8 and thus outputs UTF-8
hyphens even under LANG=C (compare #547695), and that confuses col now
that it knows about the encoding of its input data.

However, the upstream patch referred to in #547695 is not sufficient
here. lintian uses the '-E UTF-8' option, which forces man to use UTF-8,
overriding the default. This used to work fine when col was dumb; now
that it's smart, things are a bit more problematic. The reason that
lintian does this is that it needs to force UTF-8 output somehow or else
CJK manual pages tend not to work properly, but there is no UTF-8 locale
that's guaranteed to be available on all systems.

In the short term, I think the best approach would be for man to set
LC_CTYPE to some appropriate locale that matches the encoding requested
by -E while running col. I'll see if I can arrange for this. However,
such a locale is not actually guaranteed to exist. Perhaps lintian needs
to generate a UTF-8 locale if it can't find one otherwise, a bit like
the hack in installation-locale; or perhaps we should just make sure
that there's always a C.UTF-8 locale on the system, which could be used
to get UTF-8 character type semantics without implying a particular
language or country.

-- 
Colin Watson                                       [cjwatson@debian.org]
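Colin's suggestion of running col under a UTF-8 LC_CTYPE can also be approximated from the caller's side. The shell sketch below is hypothetical and is not the fix lintian shipped: `pick_utf8_locale` and `check_manpage` are illustrative names, and the sketch assumes glibc-style `locale -a` output, where UTF-8 locales are often listed with a normalised `.utf8` suffix. It prefers a C.UTF-8 locale when one exists, otherwise takes any UTF-8 locale, and sets only LC_CTYPE so that messages stay untranslated under LANG=C.

```sh
#!/bin/sh
# Hypothetical sketch: give col a UTF-8 LC_CTYPE while keeping LANG=C.

# Read locale names (one per line) on stdin and print the preferred
# UTF-8 one, or nothing if none is listed.
pick_utf8_locale() {
    # Keep names with a UTF-8 codeset; glibc often lists them as ".utf8".
    candidates=$(grep -i -E '\.utf-?8$' || true)
    # Prefer C.UTF-8: UTF-8 character semantics without a language/country.
    preferred=$(printf '%s\n' "$candidates" | grep -i -E '^C\.utf-?8$' | head -n 1)
    if [ -n "$preferred" ]; then
        printf '%s\n' "$preferred"
    elif [ -n "$candidates" ]; then
        printf '%s\n' "$candidates" | head -n 1
    fi
}

# Run the man syntax check on one manual page (path in $1).
check_manpage() {
    loc=$(locale -a 2>/dev/null | pick_utf8_locale)
    if [ -z "$loc" ]; then
        echo "no UTF-8 locale available; col may still fail" >&2
        return 2
    fi
    # An empty LC_ALL does not override LC_CTYPE; LANG=C keeps the
    # remaining categories (including messages) in the C locale.
    LC_ALL= LANG=C LC_CTYPE="$loc" man --warnings -E UTF-8 -l "$1" >/dev/null
}
```

For example, `check_manpage /usr/share/man/man8/acpid.8.gz` would be expected to avoid the col error on a system that has some UTF-8 locale installed. When `locale -a` lists none, the sketch gives up. That is the gap Colin points out: lintian would then have to generate a locale itself, or the system would need a guaranteed C.UTF-8.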





Changed Bug title to '[checks/manpages] man syntax check requires a UTF-8 locale for col' from 'lintian: manpage-has-errors-from-man does not work correctly'. Request was from Russ Allbery <rra@debian.org> to control@bugs.debian.org. (Fri, 13 Nov 2009 06:03:07 GMT)

Severity set to 'minor' from 'normal'. Request was from Russ Allbery <rra@debian.org> to control@bugs.debian.org. (Fri, 25 Dec 2009 20:21:08 GMT)

Merged 555330 555408. Request was from Russ Allbery <rra@debian.org> to control@bugs.debian.org. (Fri, 25 Dec 2009 20:21:10 GMT)

Added tag(s) pending. Request was from Russ Allbery <rra@debian.org> to control@bugs.debian.org. (Mon, 11 Jan 2010 05:51:03 GMT)

Forcibly merged 555330 555408 566121. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Thu, 21 Jan 2010 14:36:06 GMT)

Bug no longer marked as fixed in versions lintian/2.3.2 and reopened. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Fri, 29 Jan 2010 05:06:08 GMT)

Bug marked as found in versions lintian/2.3.2. Request was from Ben Finney <ben+debian@benfinney.id.au> to control@bugs.debian.org. (Fri, 29 Jan 2010 05:06:11 GMT)

Added tag(s) pending. Request was from Raphael Geissert <geissert@debian.org> to control@bugs.debian.org. (Sun, 31 Jan 2010 07:57:04 GMT)

Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Thu, 11 Mar 2010 07:28:02 GMT)


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sat Apr 19 12:31:53 2014; Machine Name: buxtehude.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.