Debian Bug report logs - #553490
wdiff: Does not handle UTF-8 properly


Package: wdiff; Maintainer for wdiff is Santiago Vila <sanvila@debian.org>; Source for wdiff is src:wdiff.

Reported by: Josh Triplett <josh@joshtriplett.org>

Date: Sat, 31 Oct 2009 18:51:02 UTC

Severity: normal

Tags: upstream

Found in version wdiff/0.5-19

Fixed in version wdiff/1.1.0-1

Done: Santiago Vila <sanvila@debian.org>

Bug is archived. No further changes may be made.

Forwarded to wdiff-bugs@gnu.org



Report forwarded to debian-bugs-dist@lists.debian.org, josh@joshtriplett.org, Santiago Vila <sanvila@debian.org>:
Bug#553490; Package wdiff. (Sat, 31 Oct 2009 18:51:05 GMT)

Message #3 received at submit@bugs.debian.org:

From: Josh Triplett <josh@joshtriplett.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: wdiff: Does not handle UTF-8 properly
Date: Sat, 31 Oct 2009 11:39:08 -0700
Package: wdiff
Version: 0.5-19
Severity: normal

"wdiff -a" uses backspace and overstrike to provide emphasis; thus, it
will emphasize 'x' by printing 'x^Hx'.  When it encounters a UTF-8
character, it does this for each byte, rather than for each character;
thus, emphasis of <E2><80><99> (U+2019 RIGHT SINGLE QUOTATION MARK)
looks like '<E2>^H<E2><80>^H<80><99>^H<99>', when it should look
like '<E2><80><99>^H<E2><80><99>'.
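
The difference between the two outputs can be reproduced with a short
sketch. This is only an illustration of the behaviour described above, not
wdiff's actual code; the function names are hypothetical.

```python
BS = b"\x08"  # backspace, written as ^H in the report


def overstrike_per_byte(text: str) -> bytes:
    """Reported behaviour: every byte of the UTF-8 encoding is overstruck."""
    data = text.encode("utf-8")
    return b"".join(bytes([b]) + BS + bytes([b]) for b in data)


def overstrike_per_char(text: str) -> bytes:
    """Expected behaviour: every whole character is overstruck once."""
    return b"".join(
        ch.encode("utf-8") + BS + ch.encode("utf-8") for ch in text
    )


quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK, encoded as E2 80 99
print(overstrike_per_byte(quote))  # b'\xe2\x08\xe2\x80\x08\x80\x99\x08\x99'
print(overstrike_per_char(quote))  # b'\xe2\x80\x99\x08\xe2\x80\x99'
```

For plain ASCII input both functions give the same result, which is why
the problem only shows up with multi-byte characters.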

- Josh Triplett

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.31-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages wdiff depends on:
ii  dpkg                      1.15.4.1       Debian package management system
ii  install-info              4.13a.dfsg.1-5 Manage installed documentation in 
ii  libc6                     2.10.1-3       GNU C Library: Shared libraries
ii  libncurses5               5.7+20090803-2 shared libraries for terminal hand

wdiff recommends no packages.

wdiff suggests no packages.

-- no debconf information




Information forwarded to debian-bugs-dist@lists.debian.org, Santiago Vila <sanvila@debian.org>:
Bug#553490; Package wdiff. (Wed, 30 Mar 2011 16:21:09 GMT)

Acknowledgement sent to Arunan Balasubramaniam <foss@abala.me>:
Extra info received and forwarded to list. Copy sent to Santiago Vila <sanvila@debian.org>. (Wed, 30 Mar 2011 16:21:09 GMT)

Message #8 received at 553490@bugs.debian.org:

From: Arunan Balasubramaniam <foss@abala.me>
To: 553490@bugs.debian.org
Subject: Appears fixed upstream
Date: Wed, 30 Mar 2011 16:56:15 +0100

  I tested that this bug still appears in the Squeeze version 0.6.3. I
then built 0.6.5 and the bug appears to be fixed in that.





Information forwarded to debian-bugs-dist@lists.debian.org, Santiago Vila <sanvila@debian.org>:
Bug#553490; Package wdiff. (Mon, 04 Apr 2011 09:45:04 GMT)

Acknowledgement sent to Santiago Vila <sanvila@unex.es>:
Extra info received and forwarded to list. Copy sent to Santiago Vila <sanvila@debian.org>. (Mon, 04 Apr 2011 09:45:11 GMT)

Message #13 received at 553490@bugs.debian.org:

From: Santiago Vila <sanvila@unex.es>
To: Arunan Balasubramaniam <foss@abala.me>, 553490@bugs.debian.org
Subject: Re: Bug#553490: Appears fixed upstream
Date: Mon, 4 Apr 2011 11:36:44 +0200 (CEST)

On Wed, 30 Mar 2011, Arunan Balasubramaniam wrote:

>   I tested that this bug still appears in the Squeeze version 0.6.3. I
> then built 0.6.5 and the bug appears to be fixed in that.

Thanks for the info. This serves as a reminder to me to upload 0.6.5
sometime soon.




Information forwarded to debian-bugs-dist@lists.debian.org, Santiago Vila <sanvila@debian.org>:
Bug#553490; Package wdiff. (Tue, 05 Apr 2011 15:21:03 GMT)

Acknowledgement sent to Santiago Vila <sanvila@unex.es>:
Extra info received and forwarded to list. Copy sent to Santiago Vila <sanvila@debian.org>. (Tue, 05 Apr 2011 15:21:03 GMT)

Message #18 received at 553490@bugs.debian.org:

From: Santiago Vila <sanvila@unex.es>
To: Arunan Balasubramaniam <foss@abala.me>, 553490@bugs.debian.org
Subject: Re: Bug#553490: Appears fixed upstream
Date: Tue, 5 Apr 2011 17:16:28 +0200 (CEST)

On Wed, 30 Mar 2011, Arunan Balasubramaniam wrote:

>   I tested that this bug still appears in the Squeeze version 0.6.3. I
> then built 0.6.5 and the bug appears to be fixed in that.

Hmm, I can't reproduce that it's fixed in 0.6.5.
Could you please provide a test case?

(Emails with attachments are welcome).




Information forwarded to debian-bugs-dist@lists.debian.org, Santiago Vila <sanvila@debian.org>:
Bug#553490; Package wdiff. (Tue, 05 Apr 2011 17:36:03 GMT)

Acknowledgement sent to Arunan Balasubramaniam <foss@abala.me>:
Extra info received and forwarded to list. Copy sent to Santiago Vila <sanvila@debian.org>. (Tue, 05 Apr 2011 17:36:03 GMT)

Message #23 received at 553490@bugs.debian.org:

From: Arunan Balasubramaniam <foss@abala.me>
To: Santiago Vila <sanvila@unex.es>, 553490 <553490@bugs.debian.org>
Subject: Re: Bug#553490: Appears fixed upstream
Date: Tue, 05 Apr 2011 18:32:04 +0100

On Tue, 2011-04-05 at 17:16 +0200, Santiago Vila wrote:
> Hmm, I can't reproduce that it's fixed in 0.6.5.
> Could you please provide a test case?

  It turns out I made a mistake. My built version was not outputting to
a pager. When I set PAGER, then the error still occurs. Without PAGER,
'wdiff -a' displays the correctly highlighted diff on the following
line.

  I can see the same behaviour in the Squeeze version using:

  $ export PAGER=
  $ wdiff -a ...

  Sorry for wasting your time.






Added tag(s) upstream. Request was from Santiago Vila <sanvila@unex.es> to control@bugs.debian.org. (Thu, 20 Oct 2011 10:48:03 GMT)

Information forwarded to debian-bugs-dist@lists.debian.org, Santiago Vila <sanvila@debian.org>:
Bug#553490; Package wdiff. (Thu, 20 Oct 2011 11:06:04 GMT)

Acknowledgement sent to Santiago Vila <sanvila@unex.es>:
Extra info received and forwarded to list. Copy sent to Santiago Vila <sanvila@debian.org>. (Thu, 20 Oct 2011 11:06:06 GMT)

Message #30 received at 553490@bugs.debian.org:

From: Santiago Vila <sanvila@unex.es>
To: wdiff-bugs@gnu.org
Cc: 553490-forwarded@bugs.debian.org, 553490@bugs.debian.org, Josh Triplett <josh@joshtriplett.org>
Subject: Bug#553490: wdiff: Does not handle UTF-8 properly (fwd)
Date: Thu, 20 Oct 2011 13:02:34 +0200 (CEST)

Hello.

I received this from the Debian bug system.
I've checked and the current version (1.0.1) still shows the bug.
[ Please keep the Cc: lines when replying, thanks ].

[ Apologies to the submitter for taking so long to process this ]

---------- Forwarded message ----------
From: Josh Triplett <josh@joshtriplett.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Date: Sat, 31 Oct 2009 11:39:08 -0700
Subject: wdiff: Does not handle UTF-8 properly

Package: wdiff
Version: 0.5-19
Severity: normal

"wdiff -a" uses backspace and overstrike to provide emphasis; thus, it
will emphasize 'x' by printing 'x^Hx'.  When it encounters a UTF-8
character, it does this for each byte, rather than for each character;
thus, emphasis of <E2><80><99> (U+2019 RIGHT SINGLE QUOTATION MARK)
looks like '<E2>^H<E2><80>^H<80><99>^H<99>', when it should look
like '<E2><80><99>^H<E2><80><99>'.

- Josh Triplett

[...]




Reply sent to Santiago Vila <sanvila@unex.es>:
You have marked Bug as forwarded. (Thu, 20 Oct 2011 11:06:13 GMT)

Information forwarded to debian-bugs-dist@lists.debian.org, Santiago Vila <sanvila@debian.org>:
Bug#553490; Package wdiff. (Thu, 20 Oct 2011 19:09:03 GMT)

Acknowledgement sent to Martin von Gagern <Martin.vGagern@gmx.net>:
Extra info received and forwarded to list. Copy sent to Santiago Vila <sanvila@debian.org>. (Thu, 20 Oct 2011 19:09:03 GMT)

Message #38 received at 553490@bugs.debian.org:

From: Martin von Gagern <Martin.vGagern@gmx.net>
To: Santiago Vila <sanvila@unex.es>
Cc: wdiff-bugs@gnu.org, 553490@bugs.debian.org, 553490-forwarded@bugs.debian.org, Josh Triplett <josh@joshtriplett.org>
Subject: Re: [wdiff-bugs] Bug#553490: wdiff: Does not handle UTF-8 properly (fwd)
Date: Thu, 20 Oct 2011 21:05:56 +0200

Dear Santiago, Dear Josh,

I've already noticed that bug in your bug tracker, and added it to the
wdiff bug tracker at Savannah: https://savannah.gnu.org/bugs/?34224

Right now, I'm not sure how best to handle this case. Unicode support is
a big problem for the current wdiff implementation, in many ways. For
example, I guess that the most sensible way to really simulate
overstrike printing would be detecting grapheme clusters, i.e. even
treat sequences of multiple code points as a single entity if some of the
code points are combining.
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries has the
details on this, but I don't think I'll implement this in wdiff myself.
I've been toying with the idea of writing wdiff up from scratch with
stuff like this in mind, using ICU break iterators or similar. Won't
happen too soon, though.
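
As a rough illustration of the idea, the sketch below attaches combining
code points to the preceding base character before overstriking. It is
only an approximation built on Python's unicodedata module and does not
implement the full boundary rules from the Unicode report linked above;
the function names are hypothetical and nothing here is taken from wdiff.

```python
import unicodedata
from typing import List


def approx_clusters(text: str) -> List[str]:
    """Group each base character with the combining marks that follow it.

    Approximation only: real grapheme cluster segmentation has more rules.
    """
    clusters: List[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # combining mark: attach to previous cluster
        else:
            clusters.append(ch)  # base character: start a new cluster
    return clusters


def overstrike_clusters(text: str) -> str:
    """Emit cluster, backspace, cluster for every approximate cluster."""
    return "".join(c + "\b" + c for c in approx_clusters(text))


# "e" followed by U+0301 COMBINING ACUTE ACCENT is treated as one unit.
print(approx_clusters("e\u0301x"))
print(repr(overstrike_clusters("e\u0301")))
```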

I'm also not sure what versions of less are behaving in what ways. For
one, I doubt that all of them will know about grapheme clusters when
reading their input, so they might fail to turn it back into character
attributes as expected. I also think that most less implementations
these days will handle terminal control codes just fine, particularly if
called as "less -R". So that overstriking thing might be obsolete in any
case.

Therefore I hope to roll a release soon which will pass terminal control
sequences to less, thus avoiding that overstrike stuff. I'll have to
give a bit more thought to the best combination of configure switches,
environment variables and command line options, though.
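
A minimal sketch of the alternative, assuming the standard SGR escape
sequences for bold and underline. The constant and function names are
hypothetical, and this does not claim to show what any wdiff release
actually emits.

```python
BOLD = "\x1b[1m"       # SGR 1: bold
UNDERLINE = "\x1b[4m"  # SGR 4: underline
RESET = "\x1b[0m"      # SGR 0: reset all attributes


def emphasize(text: str, style: str = BOLD) -> str:
    """Wrap text in a terminal attribute instead of overstriking it."""
    return f"{style}{text}{RESET}"


# The multi-byte character is written once and stays intact, so no
# per-byte or per-character overstrike decision is needed.
print(emphasize("\u2019").encode("utf-8"))
```

Output of this kind is only rendered as intended when the pager passes
the sequences through, for example "less -R" as mentioned above.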

Greetings,
 Martin von Gagern


Reply sent to Santiago Vila <sanvila@debian.org>:
You have taken responsibility. (Sun, 26 Feb 2012 17:09:09 GMT)

Notification sent to Josh Triplett <josh@joshtriplett.org>:
Bug acknowledged by developer. (Sun, 26 Feb 2012 17:09:09 GMT)

Message #44 received at 553490-close@bugs.debian.org:

From: Santiago Vila <sanvila@debian.org>
To: 553490-close@bugs.debian.org
Subject: Bug#553490: fixed in wdiff 1.1.0-1
Date: Sun, 26 Feb 2012 17:04:25 +0000

Source: wdiff
Source-Version: 1.1.0-1

We believe that the bug you reported is fixed in the latest version of
wdiff, which is due to be installed in the Debian FTP archive:

wdiff-doc_1.1.0-1_all.deb
  to main/w/wdiff/wdiff-doc_1.1.0-1_all.deb
wdiff_1.1.0-1.debian.tar.gz
  to main/w/wdiff/wdiff_1.1.0-1.debian.tar.gz
wdiff_1.1.0-1.dsc
  to main/w/wdiff/wdiff_1.1.0-1.dsc
wdiff_1.1.0-1_amd64.deb
  to main/w/wdiff/wdiff_1.1.0-1_amd64.deb
wdiff_1.1.0.orig.tar.gz
  to main/w/wdiff/wdiff_1.1.0.orig.tar.gz



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 553490@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Santiago Vila <sanvila@debian.org> (supplier of updated wdiff package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 1.8
Date: Sun, 26 Feb 2012 15:45:12 +0100
Source: wdiff
Binary: wdiff wdiff-doc
Architecture: source amd64 all
Version: 1.1.0-1
Distribution: unstable
Urgency: low
Maintainer: Santiago Vila <sanvila@debian.org>
Changed-By: Santiago Vila <sanvila@debian.org>
Description: 
 wdiff      - Compares two files word by word
 wdiff-doc  - Documentation for GNU wdiff
Closes: 553490
Changes: 
 wdiff (1.1.0-1) unstable; urgency=low
 .
   * New upstream release.
   * Support for UTF-8 has improved. Closes: #553490.
   * Switch to dh.
Checksums-Sha1: 
 8fb6955ad99b471a3054fa8ee09f933f73f66a22 1445 wdiff_1.1.0-1.dsc
 89147bf81aeb9ba4607aabd57d05bc56002d941e 1384900 wdiff_1.1.0.orig.tar.gz
 37717f8a92aadf350b918482ae5410ca142e68d0 5077 wdiff_1.1.0-1.debian.tar.gz
 5e3fe69e50cf6dab3790fe1bdaab9b727627124c 199128 wdiff_1.1.0-1_amd64.deb
 02109e2ee9e63783d45990c13624b484a53dd35e 43742 wdiff-doc_1.1.0-1_all.deb
Checksums-Sha256: 
 01b7e050ac3a30fc0ba966934cf61505fb9efe473b5e9560e92c90407c1d0e25 1445 wdiff_1.1.0-1.dsc
 b154bba7f5a6b76c9eff1ddd5d5850b0a6dc2332d2a1eda29444c68ecea7e5d1 1384900 wdiff_1.1.0.orig.tar.gz
 b52a41a631e8846c39084caa8e28fbf3993f73d044e1e6df4c9fbc094c4b1b33 5077 wdiff_1.1.0-1.debian.tar.gz
 b25090e08dfeeac7500c1bffdbb7e788bed3e0d7bdc5ad53507f871b2f960952 199128 wdiff_1.1.0-1_amd64.deb
 8d45cc40b76d9cd08389f56286781d3141b0ec87ec22f23d1f23f4774e304cdf 43742 wdiff-doc_1.1.0-1_all.deb
Files: 
 527c3160ad56af455d9414e823e3e7a4 1445 text optional wdiff_1.1.0-1.dsc
 aa4dd87a9140a96ee85d2502673d19f3 1384900 text optional wdiff_1.1.0.orig.tar.gz
 281258ada39bc9c1f213c0d728aaaeb4 5077 text optional wdiff_1.1.0-1.debian.tar.gz
 82196ed97ccb00df04af004c42ba06c8 199128 text optional wdiff_1.1.0-1_amd64.deb
 4033b74ba0004d4c31558125aa96ec0c 43742 doc optional wdiff-doc_1.1.0-1_all.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBCAAGBQJPSmOJAAoJEEHOfwufG4sysjMIAJ/5jMx2S81QaUu7xwpOi8SJ
j2dfljHKlN8b3KYrPgL0mncamtKJJVZbaoXxQylYWz/H7uA+Wamg3UoBr/ALw5lI
1+uY2Lbtlq6kneNO3JphrGu0X6IX30/fS7a0YLo4g/+EluGPqI4UdeSQJR5+MXve
K28tM5aTwDF9ildA8Inzg79v+T4N8xriissJA1WBcwnkGBU1zksvLapy50IJNiMi
ahrYzsYMeWfJDwlLV+PDmVxcegWSlfPTXnZEg//2/DXRkLlqgSuCr22aPf9tkuWk
FpQ3zAyydUTbQE9Mma1Y7oAR4i1fARxqlvprOXA5RQ/4pkOL8lYT6sFk6fVfLp8=
=kZms
-----END PGP SIGNATURE-----





Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Mon, 26 Mar 2012 07:40:40 GMT)



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Wed Apr 23 08:27:55 2014; Machine Name: beach.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.