Debian Bug report logs - #175370
Use UTF-8 internally, convert to locale's encoding for output

Package: dpkg; Maintainer for dpkg is Dpkg Developers <debian-dpkg@lists.debian.org>; Source for dpkg is src:dpkg.

Reported by: Colin Walters <walters@debian.org>

Date: Sun, 5 Jan 2003 00:48:10 UTC

Severity: normal

Report forwarded to debian-bugs-dist@lists.debian.org, Dpkg Development <debian-dpkg@lists.debian.org>, dpkg@packages.qa.debian.org:
Bug#175370; Package dpkg. Full text and rfc822 format available.

Acknowledgement sent to Colin Walters <walters@debian.org>:
New Bug report received and forwarded. Copy sent to Dpkg Development <debian-dpkg@lists.debian.org>, dpkg@packages.qa.debian.org. Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: Colin Walters <walters@debian.org>
To: submit@bugs.debian.org
Subject: patch: use UTF-8 internally, convert to locale's encoding for output
Date: 04 Jan 2003 19:43:33 -0500
Package: dpkg
Tags: patch

Here is a patch against the current dpkg source (1.10.9), which uses my
little "localeio" library to output UTF-8 strings in the user's current
locale encoding.

With this patch, the idea is that dpkg will use UTF-8 for all strings
internally, and only convert to the locale's encoding on output.  This
patch is still experimental, but it works for me on some tests.  

Wichert mentioned on IRC that he'd like to rename locale_* to just l*,
and I'm fine with that.

Now, one big remaining issue is that dpkg will need UTF-8 aware string
functions.  glib has a nice set of these...

Comments welcome!

[dpkg-localeio.patch (text/plain, attachment)]

Message #10 received at 175370@bugs.debian.org (full text, mbox):

From: barbier@linuxfr.org (Denis Barbier)
To: Colin Walters <walters@debian.org>, 175370@bugs.debian.org
Subject: Re: Bug#175370: patch: use UTF-8 internally, convert to locale's encoding for output
Date: Sun, 5 Jan 2003 22:31:54 +0100
On Sat, Jan 04, 2003 at 07:43:33PM -0500, Colin Walters wrote:
[...]
> +void locale_printf_init()
> +{
> +  char *lang = getenv ("LANG");
> +  char *dot;
> +
> +  if (locale_encoding)
> +    return;
> +
> +  if (!lang)
> +    goto out_ascii;
> +  dot = strchr (lang, '.');
> +  if (dot && dot+1)
> +    locale_encoding = strdup(dot+1);
> +  if (!locale_encoding)
> +    goto out_ascii;
> +  return;
> + out_ascii:
> +  locale_encoding = strdup("US-ASCII");
> +  if (!strcmp (locale_encoding, "UTF-8"))
> +    locale_is_utf8 = 1;
> +}

Unfortunately the situation is much more complex (you must check the LC_ALL
and LC_CTYPE variables too); the simplest solution is to run 'locale charmap'
to retrieve the current encoding.
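
Editorial sketch (not part of the original message): the precedence Denis describes can be written down in a few lines of C. This is a minimal illustration under stated assumptions, not the code from the attached patch. The function names are hypothetical, and parsing environment variables is only an approximation, since a name such as "fr_FR" carries no codeset and the C library picks a default for it. Note also that in the quoted hunk `dot+1` is never a null pointer when `dot` is set, and the "UTF-8" comparison runs only on the US-ASCII fallback path, so `locale_is_utf8` is never set there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pick the locale name that governs LC_CTYPE.
 * POSIX precedence is LC_ALL, then LC_CTYPE, then LANG; an empty
 * value counts as unset, and "C" is the final fallback. */
const char *effective_ctype_locale(const char *lc_all,
                                   const char *lc_ctype,
                                   const char *lang)
{
    if (lc_all && *lc_all)
        return lc_all;
    if (lc_ctype && *lc_ctype)
        return lc_ctype;
    if (lang && *lang)
        return lang;
    return "C";
}

/* Hypothetical helper: copy the codeset part of a name such as
 * "de_DE.ISO-8859-1@euro" into out (truncating if needed). Falls
 * back to "US-ASCII" when the name carries no codeset. */
void codeset_from_locale_name(const char *name, char *out, size_t outsz)
{
    assert(name != NULL && out != NULL && outsz > 0);

    const char *dot = strchr(name, '.');
    const char *start = "US-ASCII";
    size_t len = strlen(start);

    if (dot && dot[1] != '\0' && dot[1] != '@') {
        start = dot + 1;
        len = strcspn(start, "@");   /* drop any "@modifier" suffix */
    }
    if (len >= outsz)
        len = outsz - 1;
    memcpy(out, start, len);
    out[len] = '\0';
}

/* Convenience wrapper that reads the real environment. */
void codeset_from_environment(char *out, size_t outsz)
{
    codeset_from_locale_name(
        effective_ctype_locale(getenv("LC_ALL"), getenv("LC_CTYPE"),
                               getenv("LANG")),
        out, outsz);
}
```

With LC_ALL empty, LC_CTYPE=ja_JP.eucJP and LANG=en_US.UTF-8, this yields "eucJP", whereas a LANG-only check would report "UTF-8".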

Did you test your patch on a system where no UTF-8 locales are generated?
I am not sure it works then.

Denis



Message #15 received at 175370@bugs.debian.org (full text, mbox):

From: Colin Walters <walters@debian.org>
To: Denis Barbier <barbier@linuxfr.org>, 175370@bugs.debian.org
Subject: Re: Bug#175370: patch: use UTF-8 internally, convert to locale's encoding for output
Date: 06 Jan 2003 12:25:33 -0500
On Sun, 2003-01-05 at 16:31, Denis Barbier wrote:

> Unfortunately the situation is much more complex (you must check the LC_ALL
> and LC_CTYPE variables too); the simplest solution is to run 'locale charmap'
> to retrieve the current encoding.

Ok, yeah, I was kind of aware of that, but I forgot to fix it before
submitting the patch.  I will definitely fix it; I saw some code in glib
that looked perfect.  When the dpkg maintainers indicate their
acceptance of the idea in this patch, I'll go ahead and update the
patch.

> Did you test your patch on a system where no UTF-8 locales are generated?
> I am not sure it works then.

No...but I don't see why it wouldn't.




Message #20 received at 175370@bugs.debian.org (full text, mbox):

From: Wichert Akkerman <wichert@wiggy.net>
To: Colin Walters <walters@debian.org>, 175370@bugs.debian.org
Subject: Re: Bug#175370: patch: use UTF-8 internally, convert to locale's encoding for output
Date: Mon, 6 Jan 2003 20:40:33 +0100
Previously Colin Walters wrote:
> When the dpkg maintainers indicate their acceptance of the idea in
> this patch, I'll go ahead and update the patch.

Please update it :)

Wichert.

-- 
Wichert Akkerman <wichert@wiggy.net>           http://www.wiggy.net/
A random hacker



Message #25 received at 175370@bugs.debian.org (full text, mbox):

From: Wichert Akkerman <wichert@wiggy.net>
To: Colin Walters <walters@debian.org>, 175370@bugs.debian.org
Subject: Re: Bug#175370: patch: use UTF-8 internally, convert to locale's encoding for output
Date: Mon, 6 Jan 2003 20:41:06 +0100
Previously Colin Walters wrote:
> When the dpkg maintainers indicate their acceptance of the idea in
> this patch, I'll go ahead and update the patch.

Can you also update the patch to make dselect handle UTF-8 as well?

Wichert.

-- 
Wichert Akkerman <wichert@wiggy.net>           http://www.wiggy.net/
A random hacker



Message #30 received at 175370@bugs.debian.org (full text, mbox):

From: Colin Walters <walters@debian.org>
To: 175370@bugs.debian.org
Subject: Re: Bug#175370: patch: use UTF-8 internally, convert to locale's encoding for output
Date: 07 Jan 2003 00:47:12 -0500
On Mon, 2003-01-06 at 14:41, Wichert Akkerman wrote:
> Previously Colin Walters wrote:
> > When the dpkg maintainers indicate their acceptance of the idea in
> > this patch, I'll go ahead and update the patch.
> 
> Can you also update the patch to make dselect handle UTF-8 as well?

My C++-fu is very weak :/




Message #35 received at 175370@bugs.debian.org (full text, mbox):

From: Wichert Akkerman <wichert@wiggy.net>
To: Colin Walters <walters@debian.org>, 175370@bugs.debian.org
Subject: Re: Bug#175370: patch: use UTF-8 internally, convert to locale's encoding for output
Date: Tue, 7 Jan 2003 11:52:48 +0100
Previously Colin Walters wrote:
> My C++-fu is very weak :/

dselect isn't real C++ so you should be able to manage :)

Wichert.

-- 
Wichert Akkerman <wichert@wiggy.net>           http://www.wiggy.net/
A random hacker



Message #40 received at 175370@bugs.debian.org (full text, mbox):

From: barbier@linuxfr.org (Denis Barbier)
To: Colin Walters <walters@debian.org>, 175370@bugs.debian.org
Subject: Re: Bug#175370: patch: use UTF-8 internally, convert to locale's encoding for output
Date: Fri, 10 Jan 2003 11:22:06 +0100
On Sat, Jan 04, 2003 at 07:43:33PM -0500, Colin Walters wrote:
> Package: dpkg
> Tags: patch
> 
> Here is a patch against the current dpkg source (1.10.9), which uses my
> little "localeio" library to output UTF-8 strings in the user's current
> locale encoding.
> 
> With this patch, the idea is that dpkg will use UTF-8 for all strings
> internally, and only convert to the locale's encoding on output.  This
> patch is still experimental, but it works for me on some tests.  
[...]

For the record, http://www.cl.cam.ac.uk/~mgk25/unicode.html (and
especially the #mod section) contains a lot of valuable information.

Denis



Message #45 received at 175370@bugs.debian.org (full text, mbox):

From: Tomohiro KUBOTA <debian@tmail.plala.or.jp>
To: 175370@bugs.debian.org
Subject: A reliable way to get LC_CTYPE encoding
Date: Mon, 13 Jan 2003 09:58:04 +0900 (JST)
barbier@linuxfr.org (Denis Barbier) writes:
> Unfortunately the situation is much more complex (you must check the LC_ALL
> and LC_CTYPE variables too); the simplest solution is to run 'locale charmap'
> to retrieve the current encoding.

In a C program, nl_langinfo(CODESET) works reliably.

Though the function is mandated by XPG5 (X/Open Portability Guide,
Issue 5), some UNIX variants lack it.  (Since GNU libc supports it,
dpkg and so on can safely use this.)
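
Editorial sketch (not part of the original message): a minimal example of the nl_langinfo(CODESET) approach, assuming a platform that provides <langinfo.h> with CODESET. The helper names are hypothetical. The "US-ASCII" fallback for an empty result is an assumption of this sketch, as is the accepted spelling "UTF8" alongside "UTF-8".

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <strings.h>

/* Hypothetical helper: codeset of the current LC_CTYPE locale.
 * Falls back to "US-ASCII" if the C library reports nothing. */
const char *current_codeset(void)
{
    const char *cs = nl_langinfo(CODESET);
    return (cs && *cs) ? cs : "US-ASCII";
}

/* Hypothetical helper: adopt the user's LC_CTYPE settings from the
 * environment, then report the resulting codeset. If setlocale()
 * fails, the process simply stays in the "C" locale. */
const char *init_locale_codeset(void)
{
    setlocale(LC_CTYPE, "");
    return current_codeset();
}

/* Codeset names are compared case-insensitively. */
int codeset_is_utf8(const char *codeset)
{
    assert(codeset != NULL);
    return strcasecmp(codeset, "UTF-8") == 0 ||
           strcasecmp(codeset, "UTF8") == 0;
}
```

A program would call init_locale_codeset() once at startup and skip any conversion when codeset_is_utf8() reports true. Unlike parsing LANG by hand, this leaves the LC_ALL / LC_CTYPE / LANG precedence and any default codeset to the C library.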

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/





Message #50 received at 175370@bugs.debian.org (full text, mbox):

From: Tomohiro KUBOTA <debian@tmail.plala.or.jp>
To: 175370@bugs.debian.org
Subject: Why so many locale_printf?
Date: Mon, 13 Jan 2003 16:12:35 +0900 (JST)
Hi,

I found that many calls to the printf() family of functions
have been replaced with locale_printf() and so on.  I guess it
converts from UTF-8 into the locale encoding.

However, I also found that gettext strings, _("something"),
are used as parameters for locale_printf().  Since gettext
passes translated strings in the locale encoding, I don't think
further conversion is needed.  (Rather, it is harmful.)
Am I missing something?


I mean,

#include <stdio.h>
main(){
    all needed initialization for gettext and locale;
    printf(_("something\n"));  /* not locale_printf */
}

works, for example, both in de_DE.ISO-8859-1 and de_DE.UTF-8,
regardless of the encoding of po file.


---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/







Message #55 received at 175370@bugs.debian.org (full text, mbox):

From: Wichert Akkerman <wichert@wiggy.net>
To: Tomohiro KUBOTA <debian@tmail.plala.or.jp>, 175370@bugs.debian.org
Subject: Re: Bug#175370: A reliable way to get LC_CTYPE encoding
Date: Mon, 13 Jan 2003 12:48:44 +0100
Previously Tomohiro KUBOTA wrote:
> Since GNU libc supports it, dpkg and so on can safely use this.

That is not true, dpkg runs (and should keep running) on non-GNU systems
as well.

Wichert.

-- 
Wichert Akkerman <wichert@wiggy.net>           http://www.wiggy.net/
A random hacker



Message #60 received at 175370@bugs.debian.org (full text, mbox):

From: Tomohiro KUBOTA <debian@tmail.plala.or.jp>
To: 175370@bugs.debian.org
Subject: Re: Bug#175370: A reliable way to get LC_CTYPE encoding
Date: Mon, 13 Jan 2003 21:24:48 +0900 (JST)
Hi,

From: Wichert Akkerman <wichert@wiggy.net>
Subject: Re: Bug#175370: A reliable way to get LC_CTYPE encoding
Date: Mon, 13 Jan 2003 12:48:44 +0100

> > Since GNU libc supports it, dpkg and so on can safely use this.
> 
> That is not true, dpkg runs (and should keep running) on non-GNU systems
> as well.

In such a case, libcharset from GNU would be helpful as a substitute
for nl_langinfo().

---
Tomohiro KUBOTA <kubota@debian.org>
http://www.debian.or.jp/~kubota/





Message #65 received at 175370@bugs.debian.org (full text, mbox):

From: Colin Walters <walters@debian.org>
To: Tomohiro KUBOTA <debian@tmail.plala.or.jp>, 175370@bugs.debian.org
Subject: Re: Bug#175370: Why so many locale_printf?
Date: 13 Jan 2003 11:38:32 -0500
On Mon, 2003-01-13 at 02:12, Tomohiro KUBOTA wrote:
> Hi,
> 
> I found that many calls to the printf() family of functions
> have been replaced with locale_printf() and so on.  I guess it
> converts from UTF-8 into the locale encoding.

Right.

> However, I also found that gettext strings, _("something"),
> are used as parameters for locale_printf().  Since gettext
> passes translated strings in the locale encoding, I don't think
> further conversion is needed.  (Rather, it is harmful.)
> Am I missing something?

Yes; I told gettext to return strings in UTF-8 encoding, by saying:

bind_textdomain_codeset(PACKAGE, "UTF-8");

That way dpkg manipulates all strings in UTF-8 encoding internally.
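
Editorial sketch (not part of the original message): a minimal gettext setup along the lines described above, assuming GNU gettext's <libintl.h>. The function name is hypothetical, and the domain and catalog directory are placeholders supplied by the caller, not dpkg's real values.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>

#define _(msgid) gettext(msgid)

/* Hypothetical setup routine: after this call, gettext() hands back
 * translations for `domain` encoded as UTF-8, whatever the locale's
 * own codeset is. Converting to the locale encoding then becomes the
 * job of the output layer. */
void init_utf8_messages(const char *domain, const char *localedir)
{
    assert(domain != NULL && localedir != NULL);

    setlocale(LC_ALL, "");                     /* adopt the user's locale */
    bindtextdomain(domain, localedir);         /* where the catalogs live */
    bind_textdomain_codeset(domain, "UTF-8");  /* force UTF-8 results */
    textdomain(domain);                        /* default domain for _() */
}
```

Without the bind_textdomain_codeset() call, gettext converts each translation to the locale's codeset itself, which is the behaviour Tomohiro describes. With it, every string that reaches the program is UTF-8, so passing _("...") to a converting output function does not convert twice.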




Message #70 received at 175370@bugs.debian.org (full text, mbox):

From: Steve Langasek <vorlon@netexpress.net>
To: 175370@bugs.debian.org
Subject: Re: A reliable way to get LC_CTYPE encoding
Date: Tue, 14 Jan 2003 17:40:45 -0600
Hello,

>> Since GNU libc supports it, dpkg and so on can safely use this.

> That is not true, dpkg runs (and should keep running) on non-GNU
> systems as well.

FWIW, I did some research into nl_langinfo() for an upstream recently,
and found that it's implemented on at least these platforms:

   Linux (glibc)
   FreeBSD
   NetBSD
   Solaris
   AIX

If there are other platforms that are on the "must work" list for dpkg, I
can try to do a little more research.

Cheers,
-- 
Steve Langasek
postmodern programmer

Message #75 received at 175370@bugs.debian.org (full text, mbox):

From: Colin Walters <walters@debian.org>
To: vorlon@netexpress.net
Cc: 175370@bugs.debian.org
Subject: Re: Bug#175370: A reliable way to get LC_CTYPE encoding
Date: 14 Jan 2003 20:16:17 -0500
On Tue, 2003-01-14 at 18:40, Steve Langasek wrote:

> FWIW, I did some research into nl_langinfo() for an upstream recently,
> and found that it's implemented on at least these platforms:

Ok, that is useful information.  What I personally would be quite
interested in is how many of those platforms support iconv().  That's a
requirement for glib, which I am investigating for use in dpkg.




Message #80 received at 175370@bugs.debian.org (full text, mbox):

From: Wichert Akkerman <wichert@wiggy.net>
To: Steve Langasek <vorlon@netexpress.net>, 175370@bugs.debian.org
Subject: Re: Bug#175370: A reliable way to get LC_CTYPE encoding
Date: Wed, 15 Jan 2003 11:42:39 +0100
Previously Steve Langasek wrote:
> FWIW, I did some research into nl_langinfo() for an upstream recently,
> and found that it's implemented on at least these platforms:
> 
>    Linux (glibc)
>    FreeBSD
>    NetBSD
>    Solaris
>    AIX

Great, that is very useful information.

> If there are other platforms that are on the "must work" list for dpkg, I
> can try to do a little more research.

HP-UX and IRIX are the other two platforms on which I know dpkg can run;
it would be great if you can find information for those two as well.

Wichert.

-- 
Wichert Akkerman <wichert@wiggy.net>           http://www.wiggy.net/
A random hacker



Message #85 received at 175370@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: Steve Langasek <vorlon@netexpress.net>, 175370@bugs.debian.org
Subject: Re: Bug#175370: A reliable way to get LC_CTYPE encoding
Date: Wed, 15 Jan 2003 07:31:42 -0600
On Tue, Jan 14, 2003 at 05:40:45PM -0600, Steve Langasek wrote:
> Hello,
> > That is not true, dpkg runs (and should keep running) on non-GNU
> > systems as well.
> 
> FWIW, I did some research into nl_langinfo() for an upstream recently,
> and found that it's implemented on at least these platforms:
> 
>    Linux (glibc)
>    FreeBSD
>    NetBSD
>    Solaris
>    AIX

Hmm, groff needed a patch not all that long ago because Debian
GNU/NetBSD didn't have nl_langinfo(CHARSET). Maybe it's a recent
addition.

-- 
Colin Watson                                  [cjwatson@flatline.org.uk]



Message #90 received at 175370@bugs.debian.org (full text, mbox):

From: Steve Langasek <vorlon@netexpress.net>
To: Colin Watson <cjwatson@debian.org>
Cc: 175370@bugs.debian.org
Subject: Re: Bug#175370: A reliable way to get LC_CTYPE encoding
Date: Wed, 15 Jan 2003 09:34:45 -0600
On Wed, Jan 15, 2003 at 07:31:42AM -0600, Colin Watson wrote:
> On Tue, Jan 14, 2003 at 05:40:45PM -0600, Steve Langasek wrote:
> > Hello,
> > > That is not true, dpkg runs (and should keep running) on non-GNU
> > > systems as well.
> > 
> > FWIW, I did some research into nl_langinfo() for an upstream recently,
> > and found that it's implemented on at least these platforms:
> > 
> >    Linux (glibc)
> >    FreeBSD
> >    NetBSD
> >    Solaris
> >    AIX

> Hmm, groff needed a patch not all that long ago because Debian
> GNU/NetBSD didn't have nl_langinfo(CHARSET). Maybe it's a recent
> addition.

According to NetBSD's own on-line manpages (I've misplaced the URL ATM),
they've supported it since 1994.  Perhaps it's implemented but buggy?

Is Debian GNU/NetBSD using the same libc that NetBSD uses?

I'll see what I can dig up regarding HP-UX and Irix support.

-- 
Steve Langasek
postmodern programmer

Message #95 received at 175370@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: Steve Langasek <vorlon@netexpress.net>
Cc: 175370@bugs.debian.org, debian-bsd@lists.debian.org
Subject: Re: Bug#175370: A reliable way to get LC_CTYPE encoding
Date: Wed, 15 Jan 2003 17:15:34 +0000
On Wed, Jan 15, 2003 at 09:34:45AM -0600, Steve Langasek wrote:
> On Wed, Jan 15, 2003 at 07:31:42AM -0600, Colin Watson wrote:
> > On Tue, Jan 14, 2003 at 05:40:45PM -0600, Steve Langasek wrote:
> > > FWIW, I did some research into nl_langinfo() for an upstream recently,
> > > and found that it's implemented on at least these platforms:
> > > 
> > >    Linux (glibc)
> > >    FreeBSD
> > >    NetBSD
> > >    Solaris
> > >    AIX
> 
> > Hmm, groff needed a patch not all that long ago because Debian
> > GNU/NetBSD didn't have nl_langinfo(CHARSET). Maybe it's a recent
                                       ^^^^^^^ I meant CODESET ...
> > addition.
> 
> According to NetBSD's own on-line manpages (I've misplaced the URL ATM),
> they've supported it since 1994.  Perhaps it's implemented but buggy?

Bug #130356 prompted the change to groff. The BSD list may know more.

> Is Debian GNU/NetBSD using the same libc that NetBSD uses?

Yep.

Cheers,

-- 
Colin Watson                                  [cjwatson@flatline.org.uk]



Message #100 received at 175370@bugs.debian.org (full text, mbox):

From: matthew green <mrg@eterna.com.au>
To: Colin Watson <cjwatson@debian.org>
Cc: 175370@bugs.debian.org, debian-bsd@lists.debian.org, Steve Langasek <vorlon@netexpress.net>
Subject: re: Bug#175370: A reliable way to get LC_CTYPE encoding
Date: Thu, 16 Jan 2003 12:38:07 +1100
   On Wed, Jan 15, 2003 at 09:34:45AM -0600, Steve Langasek wrote:
   > On Wed, Jan 15, 2003 at 07:31:42AM -0600, Colin Watson wrote:
   > > On Tue, Jan 14, 2003 at 05:40:45PM -0600, Steve Langasek wrote:
   > > > FWIW, I did some research into nl_langinfo() for an upstream recently,
   > > > and found that it's implemented on at least these platforms:
   > > > 
   > > >    Linux (glibc)
   > > >    FreeBSD
   > > >    NetBSD
   > > >    Solaris
   > > >    AIX
   > 
   > > Hmm, groff needed a patch not all that long ago because Debian
   > > GNU/NetBSD didn't have nl_langinfo(CHARSET). Maybe it's a recent
                                          ^^^^^^^ I meant CODESET ...
   > > addition.
   > 
   > According to NetBSD's own on-line manpages (I've misplaced the URL ATM),
   > they've supported it since 1994.  Perhaps it's implemented but buggy?
   
   Bug #130356 prompted the change to groff. The BSD list may know more.
   
   > Is Debian GNU/NetBSD using the same libc that NetBSD uses?
   
   Yep.


looking at the netbsd code, nl_langinfo() has been supported since
relatively forever, but CODESET is a newer addition.  given these
dates, CODESET should appear in netbsd 1.6.

from src/lib/libc/locale/nl_langinfo.c:

	[ .. ]
	1.1          (jtc      21-Jun-94): char *
	1.1          (jtc      21-Jun-94): nl_langinfo(item)
	1.1          (jtc      21-Jun-94):      nl_item item;
	[ .. ]
	1.7          (tshiozak 26-Mar-01):      case CODESET:
	1.8          (tshiozak 17-Mar-02):              s = _CurrentRuneLocale->rl_codeset;
	1.7          (tshiozak 26-Mar-01):              if (!s)
	1.7          (tshiozak 26-Mar-01):                      s = "";


i don't know anything about freebsd for this.


.mrg.



Changed Bug title. Request was from Adam Heath <doogie@brainfood.com> to control@bugs.debian.org. Full text and rfc822 format available.

Tags added: l10n Request was from Christian Perrier <bubulle@debian.org> to control@bugs.debian.org. Full text and rfc822 format available.

Tags removed: l10n Request was from Guillem Jover <guillem@debian.org> to control@bugs.debian.org. (Sat, 24 Mar 2007 04:42:02 GMT) Full text and rfc822 format available.

Changed Bug title to Use UTF-8 internally, convert to locale's encoding for output from [UTF-8] patch: use UTF-8 internally, convert to locale's encoding for output. Request was from Guillem Jover <guillem@debian.org> to control@bugs.debian.org. (Sun, 25 Mar 2007 04:57:08 GMT) Full text and rfc822 format available.

Message #113 received at 175370@bugs.debian.org (full text, mbox):

From: Vincent Lefevre <vincent@vinc17.org>
To: 175370@bugs.debian.org
Subject: Use UTF-8 internally, convert to locale's encoding for output
Date: Wed, 16 May 2007 15:44:55 +0200
Any news?

-- 
Vincent Lefèvre <vincent@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)



Message #118 received at 175370@bugs.debian.org (full text, mbox):

From: Kurt Roeckx <kurt@roeckx.be>
To: 175370@bugs.debian.org
Subject: Re: patch: use UTF-8 internally, convert to locale's encoding for output
Date: Sun, 2 Mar 2008 17:26:22 +0100
On Sat, Jan 04, 2003 at 07:43:33PM -0500, Colin Walters wrote:
> Package: dpkg
> Tags: patch
> 
> Here is a patch against the current dpkg source (1.10.9), which uses my
> little "localeio" library to output UTF-8 strings in the user's current
> locale encoding.
> 
> With this patch, the idea is that dpkg will use UTF-8 for all strings
> internally, and only convert to the locale's encoding on output.  This
> patch is still experimental, but it works for me on some tests.  

Can someone comment on what needs to happen for such a patch to be
included?

It seems that all packages have a control file in utf-8 now, but dpkg
doesn't properly display the maintainer and descriptions if you're not
on a utf-8 terminal.

I see 2 ways to deal with this:
- Do everything internally in utf-8 as the proposed patch does, and
  convert to CODESET on output.
- Convert the contents of the files we read from utf-8 to CODESET.

In either case we need 2 functions that seem to have been a problem in
the past and resulted in this not getting applied yet:
- nl_langinfo(CODESET)
- iconv()

Are those still considered to be a problem?  Both were at least
documented in SUS v2 in 1997.
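
Editorial sketch (not part of the original message): both options listed above come down to the same conversion step, shown here with iconv(). This is a minimal illustration rather than the code from the proposed patch. The helper name is hypothetical, and treating any conversion error as a plain failure is a simplification, since a real tool would probably want a fallback such as a replacement character.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string into
 * `codeset`, writing a NUL-terminated result into out.
 * Returns 0 on success and -1 on failure (unknown codeset, output
 * buffer too small, or a conversion error reported by iconv). */
int utf8_to_codeset(const char *utf8, const char *codeset,
                    char *out, size_t outsz)
{
    assert(utf8 != NULL && codeset != NULL && out != NULL && outsz > 0);

    iconv_t cd = iconv_open(codeset, "UTF-8");   /* (to, from) */
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)utf8;       /* iconv() does not modify the input */
    size_t inleft = strlen(utf8);
    char *dst = out;
    size_t outleft = outsz - 1;    /* reserve room for the terminator */

    size_t rc = iconv(cd, &in, &inleft, &dst, &outleft);
    if (rc != (size_t)-1)          /* flush state of stateful encodings */
        rc = iconv(cd, NULL, NULL, &dst, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    *dst = '\0';
    return 0;
}
```

Passing the result of nl_langinfo(CODESET) as `codeset` gives the "convert to CODESET on output" behaviour. When the locale is already UTF-8 the call can be skipped entirely.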


Kurt





Removed tag(s) patch. Request was from Raphaël Hertzog <hertzog@debian.org> to control@bugs.debian.org. (Thu, 06 May 2010 13:33:09 GMT) Full text and rfc822 format available.


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Mon Apr 21 02:46:39 2014; Machine Name: buxtehude.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.