Debian Bug report logs - #737416
lynx: "lynx -dump" treats input as iso-8859-1 rather than utf-8

Package: lynx; Maintainer for lynx is Atsuhito KOHDA <kohda@debian.org>; Source for lynx is src:lynx-cur.

Reported by: Eric Cooper <ecc@cooper-siegel.org>

Date: Sun, 2 Feb 2014 17:39:02 UTC

Severity: normal

Found in version lynx-cur/2.8.8pre3-1

Fixed in version lynx-cur/2.8.8pre4-1

Done: Atsuhito Kohda <kohda@debian.org>

Bug is archived. No further changes may be made.

Report forwarded to debian-bugs-dist@lists.debian.org, Atsuhito KOHDA <kohda@debian.org>:
Bug#737416; Package lynx. (Sun, 02 Feb 2014 17:39:06 GMT)

Acknowledgement sent to Eric Cooper <ecc@cooper-siegel.org>:
New Bug report received and forwarded. Copy sent to Atsuhito KOHDA <kohda@debian.org>. (Sun, 02 Feb 2014 17:39:06 GMT)

Message #5 received at submit@bugs.debian.org:

From: Eric Cooper <ecc@cooper-siegel.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: lynx: "lynx -dump" treats input as iso-8859-1 rather than utf-8
Date: Sun, 02 Feb 2014 11:56:16 -0500
Package: lynx
Version: 2.8.8pre3-1
Severity: normal

If I run "lynx -dump" on this HTML:

    <html>
    <body>
    This ( ) is a UTF-8 unbreakable space.
    </body>
    </html>

I get this output:

   This (Â ) is a UTF-8 unbreakable space.

Note the "capital A with circumflex".  This seems to be because the C2
A0 sequence is being interpreted as two iso-8859-1 characters, rather
than a single utf-8 character.

If I add the "-assume_charset=utf8" option, it does what I expect, but
I believe that should be the default (especially since I have
LANG=en.utf8 as my locale).
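
The misreading described above can be reproduced without lynx. The following is a minimal Python sketch, not lynx's own code: it encodes the sample sentence as UTF-8 and then decodes the same bytes once as iso-8859-1 and once as UTF-8. The variable names are illustrative only.

```python
# Illustration only: this does not invoke lynx, it just mimics the decoding step.
html = "This (\u00a0) is a UTF-8 unbreakable space."

# In UTF-8 the no-break space U+00A0 is stored as the two bytes C2 A0.
raw = html.encode("utf-8")

# Reading those bytes as iso-8859-1 maps each byte to its own character:
# C2 becomes U+00C2 (capital A with circumflex) and A0 becomes U+00A0.
wrong = raw.decode("iso-8859-1")

# Reading them as UTF-8 recovers the single original character.
right = raw.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

The iso-8859-1 reading is one character longer than the UTF-8 reading, which accounts for the extra character in the "lynx -dump" output quoted in the report.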



Information forwarded to debian-bugs-dist@lists.debian.org, Atsuhito KOHDA <kohda@debian.org>:
Bug#737416; Package lynx. (Mon, 03 Feb 2014 01:45:12 GMT)

Acknowledgement sent to dickey@his.com:
Extra info received and forwarded to list. Copy sent to Atsuhito KOHDA <kohda@debian.org>. (Mon, 03 Feb 2014 01:45:12 GMT)

Message #10 received at 737416@bugs.debian.org:

From: Thomas Dickey <dickey@his.com>
To: Eric Cooper <ecc@cooper-siegel.org>, 737416@bugs.debian.org
Cc: 737416-submitter@bugs.debian.org
Subject: Re: Bug#737416: lynx: "lynx -dump" treats input as iso-8859-1 rather than utf-8
Date: Sun, 02 Feb 2014 20:39:40 -0500
On Sun, Feb 02, 2014 at 11:56:16AM -0500, Eric Cooper wrote:
> Package: lynx
> Version: 2.8.8pre3-1
> Severity: normal
...
> If I add the "-assume_charset=utf8" option, it does what I expect, but
> I believe that should be the default (especially since I have
> LANG=en.utf8 as my locale).

something like that.  Offhand, I would have expected the locale-charset feature to
enable the behavior you're expecting.  I'm investigating to see why/why not.
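
As a rough illustration of the behavior under discussion (an explicit charset wins, otherwise the locale's charset is assumed), here is a hedged Python sketch. It is not how lynx implements its locale-charset feature; `assumed_charset` and its parameters are hypothetical names introduced only for this example.

```python
import codecs
import locale


def assumed_charset(explicit=None, fallback="iso-8859-1"):
    """Pick a charset for a document that declares none (hypothetical helper)."""
    if explicit:
        # An explicit choice, comparable to passing -assume_charset, wins.
        return explicit
    try:
        # Adopt the user's locale from the environment (LANG, LC_ALL, ...).
        # Note: this changes the process-wide LC_CTYPE setting.
        locale.setlocale(locale.LC_CTYPE, "")
        name = locale.getpreferredencoding(False)
        codecs.lookup(name)  # raises LookupError if Python has no such codec
        return name
    except (locale.Error, LookupError):
        # Unusable locale or unknown codec: fall back to the old default.
        return fallback


print(assumed_charset())
print(assumed_charset("utf-8"))
```

Under a UTF-8 locale the first call would be expected to report a UTF-8 codec name, though the exact spelling depends on the platform.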

If it's a simple change, I'll add that to pre.4

-- 
Thomas E. Dickey <dickey@invisible-island.net>
http://invisible-island.net
ftp://invisible-island.net

Message sent on to Eric Cooper <ecc@cooper-siegel.org>:
Bug#737416. (Mon, 03 Feb 2014 01:45:20 GMT)

Reply sent to Atsuhito Kohda <kohda@debian.org>:
You have taken responsibility. (Wed, 05 Feb 2014 03:39:10 GMT)

Notification sent to Eric Cooper <ecc@cooper-siegel.org>:
Bug acknowledged by developer. (Wed, 05 Feb 2014 03:39:10 GMT)

Message #18 received at 737416-close@bugs.debian.org:

From: Atsuhito Kohda <kohda@debian.org>
To: 737416-close@bugs.debian.org
Subject: Bug#737416: fixed in lynx-cur 2.8.8pre4-1
Date: Wed, 05 Feb 2014 03:36:19 +0000
Source: lynx-cur
Source-Version: 2.8.8pre4-1

We believe that the bug you reported is fixed in the latest version of
lynx-cur, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 737416@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Atsuhito Kohda <kohda@debian.org> (supplier of updated lynx-cur package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.8
Date: Wed, 05 Feb 2014 10:54:27 +0900
Source: lynx-cur
Binary: lynx-cur lynx-cur-wrapper lynx
Architecture: source all amd64
Version: 2.8.8pre4-1
Distribution: unstable
Urgency: medium
Maintainer: Atsuhito KOHDA <kohda@debian.org>
Changed-By: Atsuhito Kohda <kohda@debian.org>
Description: 
 lynx       - Text-mode WWW Browser (transitional package)
 lynx-cur   - Text-mode WWW Browser with NLS support (development version)
 lynx-cur-wrapper - Wrapper for lynx-cur (transitional package)
Closes: 737416
Changes: 
 lynx-cur (2.8.8pre4-1) unstable; urgency=medium
 .
   * New upstream release
    - modify the LOCALE_CHARSET feature (Closes: #737416)
Checksums-Sha1: 
 6adc1d2a478757f134c044160cd29b5cc7b9c590 1274 lynx-cur_2.8.8pre4-1.dsc
 abdfac3aca6752a0d5c36107f5555306b1ce2831 3579113 lynx-cur_2.8.8pre4.orig.tar.gz
 8d9e2db17c3166117b6c7f37c1ef3eb07689cd78 33064 lynx-cur_2.8.8pre4-1.diff.gz
 66302da9a90a196d3260ab29d0dd13413e0190bb 230166 lynx-cur-wrapper_2.8.8pre4-1_all.deb
 e07cfc7ff86ace19065b11a617d86fd1f86c7326 230560 lynx_2.8.8pre4-1_all.deb
 b77e4ea97cf8d0547911ed013a6108a741dc5d02 1608176 lynx-cur_2.8.8pre4-1_amd64.deb
Checksums-Sha256: 
 dcb6383a753d47914be999b92fe21e390f1008ed73886e1ca46fa81997945f26 1274 lynx-cur_2.8.8pre4-1.dsc
 ee5f177808100163eff9e0df3f9aa074afa5ce4b9de652fe9d801db835195fa8 3579113 lynx-cur_2.8.8pre4.orig.tar.gz
 7869196451e7588374fe5c657d8212ad9d319cb4836649b2ea568c94e631ab62 33064 lynx-cur_2.8.8pre4-1.diff.gz
 25c7654deab29fad699c619341e5635980a28856cc780f4242cb08e10677428a 230166 lynx-cur-wrapper_2.8.8pre4-1_all.deb
 f2a26a4e53dc69c636a14de238043751fb723ada4c07f16a7e266058c6eb9f4a 230560 lynx_2.8.8pre4-1_all.deb
 4331c10d6f7d07cb8f7380983cff71f45930ca67f63a64390600dcaa5f224619 1608176 lynx-cur_2.8.8pre4-1_amd64.deb
Files: 
 d1575baf108c464729e0272d3debe418 1274 web extra lynx-cur_2.8.8pre4-1.dsc
 e0f6a8aed96f0824451a02b9bcb41760 3579113 web extra lynx-cur_2.8.8pre4.orig.tar.gz
 302d7a99aab2e2cbd26a39bae648ff13 33064 web extra lynx-cur_2.8.8pre4-1.diff.gz
 1168f13b04318debc67a85af3c6adf4a 230166 oldlibs extra lynx-cur-wrapper_2.8.8pre4-1_all.deb
 e79993d9e7342c5b7325a09a8157debb 230560 oldlibs extra lynx_2.8.8pre4-1_all.deb
 9dd5703a68c825c984deb302240e0f3c 1608176 web extra lynx-cur_2.8.8pre4-1_amd64.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEARECAAYFAlLxo30ACgkQ1IXdL1v6kOwT3ACfRo4Wk3EWh8cYtNRW2K0NWbv8
nNwAnRjWclQkDN+yzzAL/3caEDQkr7G3
=eL7m
-----END PGP SIGNATURE-----




Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Mon, 10 Mar 2014 07:27:08 GMT)

Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sat Apr 19 19:57:55 2014; Machine Name: beach.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.