Debian Bug report logs - #279221
should transcode characters from utf-8 if the terminal is not utf-8 capable
Report forwarded to debian-bugs-dist@lists.debian.org, Fumitoshi UKAI <ukai@debian.or.jp>:
Bug#279221; Package w3m.
Acknowledgement sent to Joey Hess <joeyh@debian.org>:
New Bug report received and forwarded. Copy sent to Fumitoshi UKAI <ukai@debian.or.jp>.
Message #5 received at submit@bugs.debian.org:
Package: w3m
Version: 0.5.1-3
Severity: wishlist
Here's the problem:
joey@dragon:~>locale | grep CTYPE
LC_CTYPE="POSIX"
joey@dragon:~>echo '&mdash;' > foo.html
joey@dragon:~>w3m -dump foo.html
?
That comes out as a '?' because w3m apparently internally converts it to the
utf-8 character for mdash (which is not '-', but the other dash), and then
discovers it's not in the character set for this terminal and decides to render
it as a question mark. When reading a document with lots of &mdash;, &ldquo;,
&hellip; and other fancy entities, this gets very annoying.
Instead, w3m should be aware of the character set and just use available
characters that are close to the right ones, like "-". Other browsers, such
as lynx, do that.
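One way to picture the fallback requested above is a small lookup table from Unicode code points to close ASCII stand-ins. The C sketch below is illustrative only: the table contents and the names `ascii_fallback` and `ascii_fallback_for` are hypothetical and are not taken from w3m or lynx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fallback table; the entries are chosen for illustration. */
struct ascii_fallback {
    uint32_t codepoint;
    const char *replacement;
};

static const struct ascii_fallback fallbacks[] = {
    { 0x2013, "-"   },  /* en dash */
    { 0x2014, "-"   },  /* em dash (&mdash;) */
    { 0x201C, "\""  },  /* left double quotation mark (&ldquo;) */
    { 0x201D, "\""  },  /* right double quotation mark (&rdquo;) */
    { 0x2026, "..." },  /* horizontal ellipsis (&hellip;) */
};

/* Return an ASCII stand-in for code point `cp`, or "?" if the table
 * has no entry for it. */
const char *ascii_fallback_for(uint32_t cp)
{
    assert(cp <= 0x10FFFF);  /* must be a valid Unicode code point */
    for (size_t i = 0; i < sizeof fallbacks / sizeof fallbacks[0]; i++) {
        if (fallbacks[i].codepoint == cp)
            return fallbacks[i].replacement;
    }
    return "?";
}
```

A renderer would consult such a table only after finding that the character cannot be encoded in the terminal's character set.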
-- System Information:
Debian Release: 3.1
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.4.27
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Versions of packages w3m depends on:
ii libc6 2.3.2.ds1-18 GNU C Library: Shared libraries an
ii libgc1 1:6.3-1 Conservative garbage collector for
ii libgpmg1 1.19.6-19 General Purpose Mouse - shared lib
ii libncurses5 5.4-4 Shared libraries for terminal hand
ii libssl0.9.7 0.9.7d-5 SSL shared libraries
ii zlib1g 1:1.2.2-1 compression library - runtime
-- no debconf information
--
see shy jo
Information forwarded to debian-bugs-dist@lists.debian.org, Fumitoshi UKAI <ukai@debian.or.jp>:
Bug#279221; Package w3m.
Acknowledgement sent to Samuel Thibault <samuel.thibault@ens-lyon.org>:
Extra info received and forwarded to list. Copy sent to Fumitoshi UKAI <ukai@debian.or.jp>.
Message #10 received at 279221@bugs.debian.org:
Hi,
For this, iconv can be very helpful:
$ hexdump foo
0000000 e2 80 94 0a
$ iconv -f utf-8 -t latin1//translit < foo
--
$
The //translit suffix tells iconv to transliterate any character that the target charset cannot represent.
So w3m should do something like:
#define TRANSLIT "//translit"
char *codeset = nl_langinfo(CODESET);
int len = strlen(codeset);
char *charset = malloc(len+strlen(TRANSLIT)+1);
memcpy(charset,codeset,len);
memcpy(charset+len,TRANSLIT,strlen(TRANSLIT)+1);
conv = iconv_open(charset, page_charset);
iconv(conv, ...);
Regards,
Samuel
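The suggestion above can be fleshed out into a self-contained C sketch. The helper name `utf8_to_charset_translit` is hypothetical and is not w3m code. It assumes an iconv implementation that accepts the `//TRANSLIT` suffix (a GNU extension) and that the input is UTF-8. In w3m the target would presumably be `nl_langinfo(CODESET)` after `setlocale(LC_CTYPE, "")`, and the source would be the page's charset.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the UTF-8 string `in` to `to_charset`,
 * asking iconv to transliterate characters the target cannot represent.
 * Returns a malloc'd NUL-terminated string, or NULL on failure.
 * Stateful target encodings would also need a final flush call. */
char *utf8_to_charset_translit(const char *in, const char *to_charset)
{
    static const char suffix[] = "//TRANSLIT";
    assert(in != NULL && to_charset != NULL);

    /* Build "<charset>//TRANSLIT"; sizeof suffix includes the NUL. */
    size_t name_size = strlen(to_charset) + sizeof suffix;
    char *name = malloc(name_size);
    if (name == NULL)
        return NULL;
    snprintf(name, name_size, "%s%s", to_charset, suffix);

    iconv_t cd = iconv_open(name, "UTF-8");
    free(name);
    if (cd == (iconv_t)-1)
        return NULL;

    char *in_ptr = (char *)in;      /* iconv's prototype is not const-correct */
    size_t in_left = strlen(in);
    size_t cap = in_left * 4 + 16;  /* initial guess, grown on E2BIG */
    char *out = malloc(cap + 1);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }
    char *out_ptr = out;
    size_t out_left = cap;

    while (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t)-1) {
        /* Only "output buffer full" is recoverable; anything else
         * (for example invalid input) aborts the conversion. */
        size_t used = (size_t)(out_ptr - out);
        char *bigger = (errno == E2BIG) ? realloc(out, cap * 2 + 1) : NULL;
        if (bigger == NULL) {
            free(out);
            iconv_close(cd);
            return NULL;
        }
        cap *= 2;
        out = bigger;
        out_ptr = out + used;
        out_left = cap - used;
    }
    *out_ptr = '\0';
    iconv_close(cd);
    return out;
}
```

Which replacement is chosen for a given character (for instance `-` or `--` for an em dash) depends on the iconv implementation and the active locale, so callers should not rely on a specific substitution.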
Severity set to `minor'.
Request was from Samuel Thibault <samuel.thibault@ens-lyon.org> to control@bugs.debian.org.
Changed Bug title to `should transcode characters from utf-8 if the terminal is not utf-8 capable' from `should transcde characters from utf-8 if the terminal is not utf-8 capable'.
Request was from Vincent Lefevre <vincent@vinc17.org> to control@bugs.debian.org.
(Wed, 06 Jun 2007 12:09:02 GMT)
Information forwarded to debian-bugs-dist@lists.debian.org, Fumitoshi UKAI <ukai@debian.or.jp>:
Bug#279221; Package w3m.
Acknowledgement sent to Vincent Lefevre <vincent@vinc17.org>:
Extra info received and forwarded to list. Copy sent to Fumitoshi UKAI <ukai@debian.or.jp>.
Message #21 received at 279221@bugs.debian.org:
Hi,
On 2005-06-07 14:56:16 +0200, Samuel Thibault wrote:
> The //translit suffix tells iconv to transliterate any character that the target charset cannot represent.
>
> So w3m should do something like:
>
> #define TRANSLIT "//translit"
> char *codeset = nl_langinfo(CODESET);
> int len = strlen(codeset);
> char *charset = malloc(len+strlen(TRANSLIT)+1);
> memcpy(charset,codeset,len);
> memcpy(charset+len,TRANSLIT,strlen(TRANSLIT)+1);
> conv = iconv_open(charset, page_charset);
> iconv(conv, ...);
Any news?
--
Vincent Lefèvre <vincent@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
Unset Bug forwarded-to-address
Request was from d+deb@vdr.jp to control@bugs.debian.org.
(Fri, 23 Jul 2010 18:27:04 GMT)
Information forwarded to debian-bugs-dist@lists.debian.org, Tatsuya Kinoshita <tats@debian.org>:
Bug#279221; Package w3m.
(Sun, 12 Oct 2014 12:45:04 GMT)
Acknowledgement sent to Markus Hiereth <markus.hiereth@freenet.de>:
Extra info received and forwarded to list. Copy sent to Tatsuya Kinoshita <tats@debian.org>.
(Sun, 12 Oct 2014 12:45:04 GMT)
Message #30 received at 279221@bugs.debian.org:
Package: w3m
Followup-For: Bug #279221
Dear Maintainer,
I wonder if this bug report can be closed for w3m in Debian 7.
I got the correct output
$ echo '&mdash;' > foo.html
$ w3m -dump < foo.html
—
Regards
Markus
-- System Information:
Debian Release: 7.6
APT prefers stable
APT policy: (500, 'stable')
Architecture: i386 (i686)
Kernel: Linux 3.2.0-4-486
Locale: LANG=de_DE, LC_CTYPE=de_DE (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/dash
Versions of packages w3m depends on:
ii libc6 2.13-38+deb7u2
ii libgc1c2 1:7.1-9.1
ii libgpm2 1.20.4-6
ii libssl1.0.0 1.0.1e-2+deb7u11
ii libtinfo5 5.9-10
ii zlib1g 1:1.2.7.dfsg-13
Versions of packages w3m recommends:
ii ca-certificates 20130119
Versions of packages w3m suggests:
ii man-db 2.6.2-1
pn menu <none>
pn migemo <none>
ii mime-support 3.52-1
pn w3m-el <none>
ii w3m-img 0.5.3-8
-- no debconf information
Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#279221; Package w3m.
(Sun, 12 Oct 2014 15:00:05 GMT)
Acknowledgement sent to Tatsuya Kinoshita <tats@debian.org>:
Extra info received and forwarded to list.
(Sun, 12 Oct 2014 15:00:05 GMT)
Message #35 received at 279221@bugs.debian.org:
On October 12, 2014 at 2:31PM +0200, markus.hiereth (at freenet.de) wrote:
> I got the correct output
>
> $ echo '&mdash;' > foo.html
> $ w3m -dump < foo.html
> —
Still not improved.
$ w3m -dump foo.html
?
$ w3m -dump -T text/html < foo.html
?
Thanks,
--
Tatsuya Kinoshita
Debian bug tracking system administrator <owner@bugs.debian.org>.
Last modified: Mon Jun 5 03:09:25 2023