Debian Bug report logs - #426362
w3mman2html.cgi eats section headings


Package: w3m; Maintainer for w3m is Tatsuya Kinoshita <tats@debian.org>; Source for w3m is src:w3m.

Reported by: Miciah Dashiel Butler Masters <miciah.masters@gmail.com>

Date: Mon, 28 May 2007 08:54:06 UTC

Severity: normal

Tags: patch

Found in version w3m/0.5.1-5.1

Fixed in version 0.5.2-3

Done: Tatsuya Kinoshita <tats@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, Fumitoshi UKAI <ukai@debian.or.jp>:
Bug#426362; Package w3m.


Acknowledgement sent to Miciah Dashiel Butler Masters <miciah.masters@gmail.com>:
New Bug report received and forwarded. Copy sent to Fumitoshi UKAI <ukai@debian.or.jp>.


Message #5 received at submit@bugs.debian.org:

From: Miciah Dashiel Butler Masters <miciah.masters@gmail.com>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: w3mman2html.cgi eats section headings
Date: Mon, 28 May 2007 08:52:45 +0000
Package: w3m
Version: 0.5.1-5.1

w3mman2html.cgi drops section headings.  A frequent example is the
"SEE ALSO" section; e.g., view the manpage for ls(1) with the man
command and notice that the "SEE ALSO" heading is there.  Then view the
same manpage with w3mman2html.cgi and notice that the heading is absent.

The following works around the problem; I hope that it will help
in developing a proper fix:

@@ -111,7 +111,7 @@
     next;
   } elsif (!/\010/ && /^$space[\w\200-\377].*\s\S/o) { # delete footer
     $blank = -1;
-    next;
+    #next
   }
   if ($SQUEEZE) {
     if (/^\s*$/) {
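To see why "SEE ALSO" disappears, here is the `# delete footer` test from the hunk context rewritten in Python as a sketch. Two assumptions, neither stated in the report: the Perl variable `$space` (modelled by the hypothetical parameter `space`) is taken to be empty by default, and the input has already passed through col, so bold headings arrive without backspaces. Under those assumptions any unindented line containing two words matches the footer pattern and is skipped by `next`, which is the `next` the workaround above comments out.

```python
import re


def is_footer_line(line: bytes, space: bytes = b"") -> bool:
    r"""Python rendering of the '# delete footer' test from the hunk:
        !/\010/ && /^$space[\w\200-\377].*\s\S/o
    `space` stands in for $space; its value here is an assumption.
    In a bytes pattern, \w is ASCII-only.
    """
    if b"\x08" in line:  # !/\010/ : overstruck (bold/underlined) lines are kept
        return False
    pattern = rb"^" + re.escape(space) + rb"[\w\x80-\xff].*\s\S"
    return re.search(pattern, line) is not None


# After col has stripped the overstrike, a two-word heading looks like a footer:
print(is_footer_line(b"SEE ALSO\n"))   # True  -> skipped by `next`
# A one-word heading lacks the "\s\S" part of the pattern:
print(is_footer_line(b"NAME\n"))       # False
# Body text is indented, so its first byte fails the character class:
print(is_footer_line(b"       ls - list directory contents\n"))  # False
# With the overstrike kept, the backspaces protect the heading:
print(is_footer_line(b"S\x08SE\x08EE\x08E A\x08AL\x08LS\x08SO\x08O\n"))  # False
```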


Much love,

-- 
Miciah Masters <miciah.masters@gmail.com> / <mdm0304@ecu.edu> / <miciah@myrealbox.com>



Information forwarded to debian-bugs-dist@lists.debian.org, Fumitoshi UKAI <ukai@debian.or.jp>:
Bug#426362; Package w3m. (Tue, 30 Jun 2009 11:09:04 GMT).


Acknowledgement sent to Colin Watson <cjwatson@ubuntu.com>:
Extra info received and forwarded to list. Copy sent to Fumitoshi UKAI <ukai@debian.or.jp>. (Tue, 30 Jun 2009 11:09:04 GMT).


Message #10 received at 426362@bugs.debian.org:

From: Colin Watson <cjwatson@ubuntu.com>
To: Stepan Golosunov <stepan@golosunov.pp.ru>
Cc: 325699@bugs.debian.org, Miciah Dashiel Butler Masters <miciah.masters@gmail.com>, 426362@bugs.debian.org
Subject: Re: w3mman eats underlined characters
Date: Tue, 30 Jun 2009 12:06:44 +0100
tags 325699 patch
tags 426362 patch
user ubuntu-devel@lists.ubuntu.com
usertags 325699 ubuntu-patch karmic
usertags 426362 ubuntu-patch karmic
thanks

On Tue, Aug 30, 2005 at 02:33:43PM +0500, Stepan Golosunov wrote:
> Package: w3m
> Version: 0.5.1-3
> Severity: normal
> 
> w3mman does not display underlined characters in Russian man pages in
> utf-8 terminals; man shows them correctly.
> 
> Output of
> LANG=ru_RU.UTF-8 lxterm -e w3mman dpkg
> and
> LANG=ru_RU.UTF-8 lxterm -e man dpkg
> attached.

w3mman should set MAN_KEEP_FORMATTING=1 in the environment to instruct
man not to invoke col to strip away formatting characters, which it
normally does by default when writing to a pipe. I added this feature to
man-db with the express intention that it should be used by programs
like pinfo and w3mman that invoke man and can do something with its
formatted output. Patch attached.

Doing this looks as though it should fix #426362 too.
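A minimal sketch of the proposed change, written in Python for illustration (the patch attached to this message, w3m.325699.diff, is not reproduced here): copy the environment, set MAN_KEEP_FORMATTING=1 (man-db documents that any non-empty value keeps the formatting characters), and capture man's output as raw bytes so the backspace sequences survive for later parsing. Both helper names are hypothetical, and `formatted_man_page` assumes man-db is on PATH.

```python
import os
import subprocess


def man_env(base=None):
    """Return a copy of `base` (default: os.environ) with MAN_KEEP_FORMATTING=1.

    The input mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["MAN_KEEP_FORMATTING"] = "1"
    return env


def formatted_man_page(page):
    """Run man(1) on `page` and return its raw bytes with overstrike intact.

    Raises CalledProcessError if man exits non-zero.
    """
    result = subprocess.run(
        ["man", page],
        env=man_env(),
        stdout=subprocess.PIPE,  # a pipe: without the variable, man would run col
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    env = man_env({"LANG": "ru_RU.UTF-8"})
    print(env["MAN_KEEP_FORMATTING"])  # 1
```

Setting the variable only in the child's environment keeps the caller's own environment untouched.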

Thanks,

-- 
Colin Watson                                       [cjwatson@ubuntu.com]
[w3m.325699.diff (text/x-diff, attachment)]

Tags added: patch. Request was from Colin Watson <cjwatson@ubuntu.com> to control@bugs.debian.org. (Tue, 30 Jun 2009 11:09:08 GMT).


Information forwarded to debian-bugs-dist@lists.debian.org, Fumitoshi UKAI <ukai@debian.or.jp>:
Bug#426362; Package w3m. (Wed, 01 Jul 2009 18:06:03 GMT).


Acknowledgement sent to Stepan Golosunov <stepan@golosunov.pp.ru>:
Extra info received and forwarded to list. Copy sent to Fumitoshi UKAI <ukai@debian.or.jp>. (Wed, 01 Jul 2009 18:06:03 GMT).


Message #17 received at 426362@bugs.debian.org:

From: Stepan Golosunov <stepan@golosunov.pp.ru>
To: Colin Watson <cjwatson@ubuntu.com>
Cc: 325699@bugs.debian.org, Miciah Dashiel Butler Masters <miciah.masters@gmail.com>, 426362@bugs.debian.org
Subject: Re: w3mman eats underlined characters
Date: Wed, 1 Jul 2009 23:04:03 +0500
On 30.06.2009 at 12:06:44 +0100, Colin Watson wrote:
> w3mman should set MAN_KEEP_FORMATTING=1 in the environment to instruct
> man not to invoke col to strip away formatting characters, which it
> normally does by default when writing to a pipe. I added this feature to
> man-db with the express intention that it should be used by programs
> like pinfo and w3mman that invoke man and can do something with its
> formatted output. Patch attached.

Actually, w3mman in lenny shows underlined characters *unless* called
with MAN_KEEP_FORMATTING=1 (they just aren't underlined).



But it hides non-ASCII section headings when called *without*
MAN_KEEP_FORMATTING=1.
This seems to be because man produces malformed output in this
case.


This is the first section heading (ИМЯ, i.e. NAME), generated by
"MAN_KEEP_FORMATTING=1 man cp|hd" (in the ru_RU.UTF-8 locale):
d0 98 08 d0 98 d0 9c 08 d0 9c d0 af 08 d0 af 0a


But "man cp|hd" generates invalid UTF-8:
d0 d0 98 d0 d0 9c d0 d0 af 0a

It should match the output of "echo ИМЯ|hd":
d0 98 d0 9c d0 af 0a
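The three dumps can be checked mechanically. This Python sketch decodes each one, collapses the bold overstrike (<char> BACKSPACE <same char>) to a single character, and confirms that the MAN_KEEP_FORMATTING=1 output reduces to exactly the `echo` bytes, while the col output is not valid UTF-8. The hex strings are copied from the message; the helper names are hypothetical.

```python
import re

# Hex dumps copied from the message above.
KEEP_FORMATTING = bytes.fromhex("d0 98 08 d0 98 d0 9c 08 d0 9c d0 af 08 d0 af 0a")
THROUGH_COL = bytes.fromhex("d0 d0 98 d0 d0 9c d0 d0 af 0a")
PLAIN_ECHO = bytes.fromhex("d0 98 d0 9c d0 af 0a")

# Bold is <char> BACKSPACE <same char>; matching on decoded text makes a
# two-byte Cyrillic letter a single unit for "(.)" and "\1".
BOLD = re.compile(r"(.)\x08\1")


def strip_bold(raw: bytes) -> bytes:
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    return BOLD.sub(r"\1", text).encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(strip_bold(KEEP_FORMATTING) == PLAIN_ECHO)  # True
print(is_valid_utf8(THROUGH_COL))                 # False
```

In the col output each 0xd0 lead byte is followed by another 0xd0 instead of a continuation byte, which is why decoding fails.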




Information forwarded to debian-bugs-dist@lists.debian.org, Fumitoshi UKAI <ukai@debian.or.jp>:
Bug#426362; Package w3m. (Wed, 01 Jul 2009 22:48:05 GMT).


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. Copy sent to Fumitoshi UKAI <ukai@debian.or.jp>. (Wed, 01 Jul 2009 22:48:05 GMT).


Message #22 received at 426362@bugs.debian.org:

From: Colin Watson <cjwatson@debian.org>
To: Stepan Golosunov <stepan@golosunov.pp.ru>
Cc: 325699@bugs.debian.org, Miciah Dashiel Butler Masters <miciah.masters@gmail.com>, 426362@bugs.debian.org
Subject: Re: w3mman eats underlined characters
Date: Wed, 1 Jul 2009 23:44:23 +0100
On Wed, Jul 01, 2009 at 11:04:03PM +0500, Stepan Golosunov wrote:
> On 30.06.2009 at 12:06:44 +0100, Colin Watson wrote:
> > w3mman should set MAN_KEEP_FORMATTING=1 in the environment to instruct
> > man not to invoke col to strip away formatting characters, which it
> > normally does by default when writing to a pipe. I added this feature to
> > man-db with the express intention that it should be used by programs
> > like pinfo and w3mman that invoke man and can do something with its
> > formatted output. Patch attached.
> 
> Actually, w3mman in lenny shows underlined characters *unless* called
> with MAN_KEEP_FORMATTING=1 (they just aren't underlined).

Assuming that you're referring to the same test case (LC_ALL=ru_RU.UTF-8
w3mman cp), this appears to be a separate bug; w3mman2html.cgi is
failing to deal with the sequence "_" BACKSPACE <UTF-8 character>,
presumably stripping off the first byte of the UTF-8 character and
attempting to underline that. I imagine it has the same trouble with
bold (<UTF-8 character> BACKSPACE <same UTF-8 character>).

This should be straightforward enough to fix if you have the patience to
dig through the relevant regular expressions. :-) It clearly ought to be
fixed.
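Colin suspects that a pattern matching one *byte* after the backspace captures only the lead byte of a multibyte character. The Python sketch below contrasts a byte-oriented pattern with the same pattern applied to decoded text. Both patterns are illustrative stand-ins, not the actual regular expressions in w3mman2html.cgi, and the helper omits HTML escaping for brevity.

```python
import re

# Byte-oriented: "." matches a single byte, so a UTF-8 letter is split.
BYTE_UNDERLINE = re.compile(rb"_\x08(.)", re.DOTALL)

# Character-oriented: applied to decoded text, "." is one code point.
CHAR_BOLD = re.compile(r"(.)\x08\1", re.DOTALL)
CHAR_UNDERLINE = re.compile(r"_\x08(.)", re.DOTALL)


def overstrike_to_html(raw: bytes) -> str:
    """Convert overstrike to <b>/<u> markup (hypothetical helper, no escaping)."""
    text = raw.decode("utf-8")
    # Bold first, so "_\b_" (a bold underscore) is not read as an underline.
    text = CHAR_BOLD.sub(r"<b>\1</b>", text)
    return CHAR_UNDERLINE.sub(r"<u>\1</u>", text)


underlined = "_\bИ".encode("utf-8")                # b"_\x08\xd0\x98"
print(BYTE_UNDERLINE.search(underlined).group(1))  # b'\xd0' (lead byte only)
print(overstrike_to_html(underlined))              # <u>И</u>
print(overstrike_to_html("И\bИ".encode("utf-8")))  # <b>И</b>
```

Decoding before matching is what makes the fix straightforward: the same regular expressions work unchanged once they operate on characters rather than bytes.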

> But it hides non-ASCII section headings when called *without*
> MAN_KEEP_FORMATTING=1.
> This seems to be because man produces malformed output in this
> case.
> 
> 
> This is the first section heading (ИМЯ, i.e. NAME), generated by
> "MAN_KEEP_FORMATTING=1 man cp|hd" (in the ru_RU.UTF-8 locale):
> d0 98 08 d0 98 d0 9c 08 d0 9c d0 af 08 d0 af 0a
> 
> 
> But "man cp|hd" generates invalid UTF-8:
> d0 d0 98 d0 d0 9c d0 d0 af 0a
> 
> It should match the output of "echo ИМЯ|hd":
> d0 98 d0 9c d0 af 0a

Sure, I'm entirely familiar with that symptom, which is actually a col
bug, namely #319952. The point of MAN_KEEP_FORMATTING=1 is to skip the
call to col, thus as a side-effect dodging that bug.

For a program that handles the formatting typically emitted by groff, it
is unambiguously correct to set MAN_KEEP_FORMATTING=1 to skip the col
invocation. It hadn't occurred to me that w3mman would handle UTF-8
characters wrongly in this mode, but that should be easy enough to fix.

-- 
Colin Watson                                       [cjwatson@debian.org]




Reply sent to Tatsuya Kinoshita <tats@debian.org>:
You have taken responsibility. (Wed, 07 Jul 2010 14:42:03 GMT).


Notification sent to Miciah Dashiel Butler Masters <miciah.masters@gmail.com>:
Bug acknowledged by developer. (Wed, 07 Jul 2010 14:42:03 GMT).


Message #27 received at 426362-done@bugs.debian.org:

From: Tatsuya Kinoshita <tats@debian.org>
To: 426362-done@bugs.debian.org
Subject: Re: Bug#426362: w3mman eats underlined characters
Date: Wed, 07 Jul 2010 23:38:25 +0900 (JST)
Version: 0.5.2-3

Already fixed.

Thanks,
--
Tatsuya Kinoshita

Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Thu, 05 Aug 2010 07:40:21 GMT).



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Mon Jun 5 03:08:12 2023; Machine Name: buxtehude
