Debian Bug report logs - #440420
[AMENDMENT 11/02/2008] Manual page encoding

Package: debian-policy; Maintainer for debian-policy is Debian Policy List <debian-policy@lists.debian.org>; Source for debian-policy is src:debian-policy.

Reported by: Colin Watson <cjwatson@debian.org>

Date: Sat, 1 Sep 2007 12:09:01 UTC

Severity: normal

Fixed in version debian-policy/3.8.0.0

Done: Russ Allbery <rra@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, debian-i18n@lists.debian.org, debian-doc@lists.debian.org, debhelper@packages.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#440420; Package debian-policy. Full text and rfc822 format available.

Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
New Bug report received and forwarded. Copy sent to debian-i18n@lists.debian.org, debian-doc@lists.debian.org, debhelper@packages.debian.org, Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: submit@bugs.debian.org
Subject: [PROPOSAL] Manual page encoding
Date: Sat, 1 Sep 2007 13:02:33 +0100
Package: debian-policy
Severity: wishlist

[CCs: debian-i18n and debian-doc for obvious reasons, and the debhelper
maintainer since there's a dh_installman change mentioned in the
transition plan further down.]

Recently I have encountered some confusion as to the proper encoding of
manual pages (which is entirely understandable given that this subsystem
is lagging somewhat behind the rest of the world in terms of UTF-8
support). As the man-db maintainer, I would like to clarify this in
policy.

Note that, while there are one or two instances of deviation which
prompted this proposal, this documents current practice in that it is
what has been implemented in man-db for some time and it is already
followed by the vast majority of packages. I don't believe that I'm
making large swathes of packages instantly buggy here; if they did not
follow this policy, they would already be buggy in that pages would be
displayed with visible encoding damage. Accordingly, I've tentatively
used a "must" for the encoding rules. I'm prepared to back off to a
"should" if consensus on the list is against me here.

I have used the language "not yet recommended" regarding installation of
UTF-8 manual pages. My intent here was not so much to normatively state
that this is a bug as to discourage it for the time being. As I noted in
a footnote, I do expect this to be supported properly in man-db 2.5.0,
which I've been working on for a while now (and in earnest for about the
last week).

I thus propose the following amendment, generated against
debian-policy@lists.debian.org--lenny/debian-policy--devel--3.7--base-0.
I am seeking comments on and seconds for this proposal.

--- orig/policy.sgml
+++ mod/policy.sgml
@@ -8450,6 +8450,39 @@
 	      be present in the future.
  	  </footnote>
  	</p>
+
+	<p>
+	  Manual pages that are installed under
+	  <file>/usr/share/man/</file><var>ll</var>, where <var>ll</var>
+	  is an ISO-639 language code, must be encoded with the usual
+	  legacy (non-UTF-8) character set for that language, as shown
+	  by:
+	  <example compact="compact">
+egrep -v '\.|@|UTF-8' /usr/share/i18n/SUPPORTED
+	  </example>
+	  <footnote>
+	    This is necessary because many packages have historically
+	    included manual pages encoded thus, and changing the
+	    encoding of the whole hierarchy would involve a difficult
+	    transitional period.
+	  </footnote>
+	  Manual pages that are installed under
+	  <file>/usr/share/man/</file><var>locale</var>, where
+	  <var>locale</var> is a full locale name listed in
+	  <file>/usr/share/i18n/SUPPORTED</file>, must be encoded with
+	  the character set implied by that locale.
+	</p>
+
+	<p>
+	  At present, it is not generally possible to install a manual
+	  page encoded in UTF-8 such that it will be used in all locales
+	  for that language (for example, a page installed under
+	  <file>/usr/share/man/fr_FR.UTF-8</file> will not be used in
+	  the <tt>fr_BE.UTF-8</tt> locale). It is therefore not yet
+	  recommended to install pages encoded in UTF-8, but rather to
+	  continue using the legacy encoding.<footnote>This is expected
+	  to change as of man-db 2.5.0.</footnote>
+	</p>
       </sect>
 
       <sect>


It will perhaps be helpful if I describe my transition plan for getting
manual pages into UTF-8. Contrary to what occasionally seems to be
popular belief, a newer version of groff is not necessary here (which is
just as well as repeated attempts to merge in the CJK patch have been
exceedingly painful, though I still hold out hope to get it done
eventually). man-db is capable of shoving in iconv pipes as necessary.

  1. Status at time of writing: packages should use only
     /usr/share/man/<ll>/ (although some packages have anticipated an
     approximation of the transition plan; we ignore these for the
     moment as there is little point in changing them only to change
     them back later), and must use the legacy encoding for pages
     installed there.

  2. man-db 2.5.0-1 uploaded, including support for installing pages in
     /usr/share/man/<ll>.<codeset>/ (e.g. /usr/share/man/fr.UTF-8). The
     basename of this directory is not typically a well-formed locale,
     but it is appropriate because it allows a clear specification of
     the hierarchy's encoding while applying to all countries using that
     language.

  3. man-db 2.5.0-1 moves into testing.

  4. Packages encouraged (via debian-devel-announce) to begin using
     /usr/share/man/<ll>.UTF-8/; installation in other hierarchies will
     not be necessary as man-db will recode as needed. Packages using
     these hierarchies will be encouraged to declare Conflicts: man-db
     (<< 2.5.0-1) (or will Breaks: be allowed by that point? is either
     one just overkill?).

  5. Update dh_installman to recode manual pages to UTF-8 automatically
     and install them under /usr/share/man/<ll>.UTF-8/. Getting the
     Conflicts:/Breaks: in here might be difficult, plus I'm not sure
     I'm wild about creating several thousand more arcs in our
     dependency graph. Maybe it's better just to wait for a stable
     release before changing debhelper, and not worry too much about the
     Conflicts:/Breaks: as it's not like the whole system will break as
     a result.

  6. Policy updated once this has been shaken down and confirmed to work
     properly.

  7. Distant future: deprecate /usr/share/man/<ll>/. This will only be
     for consistency, so there's no need to rush.

This shouldn't be too difficult from where I am now, and at the moment I
see no obstacles to landing UTF-8 manual page support for lenny. Note
that the implementation using iconv will mean that any characters used
that are not recodable to the corresponding legacy encoding will be
discarded; this is difficult to avoid without upgrading groff, but I
don't anticipate it being a substantial problem. Likewise, we'll
probably still be unable to handle Arabic and Indic scripts properly,
and CJK will probably still be a massive hack; but it'll be an
improvement.

Thanks,

-- 
Colin Watson                                       [cjwatson@debian.org]
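
The recoding step in the transition plan above can be sketched roughly as follows. This is only an illustration in Python, not man-db's actual code; the function name `recode_to_legacy` and the sample text are invented for the example. It shows the lossy behaviour the message describes: characters that the legacy charset cannot represent are discarded.

```python
def recode_to_legacy(utf8_page: bytes, legacy_encoding: str) -> bytes:
    """Recode a UTF-8 page to a legacy charset, dropping unencodable characters."""
    text = utf8_page.decode("utf-8")
    # errors="ignore" silently discards characters that the target charset
    # lacks, mirroring the lossy conversion described in the message.
    return text.encode(legacy_encoding, errors="ignore")


if __name__ == "__main__":
    # U+2014 (em dash) does not exist in ISO-8859-1, so it is discarded.
    page = "R\u00e9sum\u00e9 \u2014 na\u00efve".encode("utf-8")
    print(recode_to_legacy(page, "iso-8859-1"))
```

A real pipeline would then hand the legacy-encoded bytes to groff and recode the formatted output back for the user's locale; that part is omitted here.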

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#440420; Package debian-policy. Full text and rfc822 format available.

Acknowledgement sent to Kurt Roeckx <kurt@roeckx.be>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #10 received at 440420@bugs.debian.org (full text, mbox):

From: Kurt Roeckx <kurt@roeckx.be>
To: Colin Watson <cjwatson@debian.org>, 440420@bugs.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Sat, 1 Sep 2007 14:49:20 +0200
On Sat, Sep 01, 2007 at 01:02:33PM +0100, Colin Watson wrote:
> +	  Manual pages that are installed under
> +	  <file>/usr/share/man/</file><var>ll</var>, where <var>ll</var>
> +	  is an ISO-639 language code, must be encoded with the usual
> +	  legacy (non-UTF-8) character set for that language
[...]
> +	  Manual pages that are installed under
> +	  <file>/usr/share/man/</file><var>locale</var>, where
> +	  <var>locale</var> is a full locale name listed in
> +	  <file>/usr/share/i18n/SUPPORTED</file>, must be encoded with
> +	  the character set implied by that locale.
[...]

>   2. man-db 2.5.0-1 uploaded, including support for installing pages in
>      /usr/share/man/<ll>.<codeset>/ (e.g. /usr/share/man/fr.UTF-8). The
>      basename of this directory is not typically a well-formed locale,
>      but it is appropriate because it allows a clear specification of
>      the hierarchy's encoding while applying to all countries using that
>      language.

This part doesn't seem to be documented, or should the locale from the
second case also apply to the not well-formed locale <ll>.<codeset>?


Kurt




Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #15 received at 440420@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: Kurt Roeckx <kurt@roeckx.be>
Cc: 440420@bugs.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Sat, 1 Sep 2007 14:57:21 +0100
On Sat, Sep 01, 2007 at 02:49:20PM +0200, Kurt Roeckx wrote:
> On Sat, Sep 01, 2007 at 01:02:33PM +0100, Colin Watson wrote:
> > +	  Manual pages that are installed under
> > +	  <file>/usr/share/man/</file><var>ll</var>, where <var>ll</var>
> > +	  is an ISO-639 language code, must be encoded with the usual
> > +	  legacy (non-UTF-8) character set for that language
> [...]
> > +	  Manual pages that are installed under
> > +	  <file>/usr/share/man/</file><var>locale</var>, where
> > +	  <var>locale</var> is a full locale name listed in
> > +	  <file>/usr/share/i18n/SUPPORTED</file>, must be encoded with
> > +	  the character set implied by that locale.
> [...]
> 
> >   2. man-db 2.5.0-1 uploaded, including support for installing pages in
> >      /usr/share/man/<ll>.<codeset>/ (e.g. /usr/share/man/fr.UTF-8). The
> >      basename of this directory is not typically a well-formed locale,
> >      but it is appropriate because it allows a clear specification of
> >      the hierarchy's encoding while applying to all countries using that
> >      language.
> 
> This part doesn't seem to be documented, or should the locale from the
> second case also apply to the not well-formed locale <ll>.<codeset>?

Point 2. above is part of a transition plan that is not yet implemented,
and so isn't part of the policy amendment I'm proposing now. (I only
mentioned it in the same mail to anticipate questions about support for
UTF-8 manual pages.) When it is implemented, I'll propose a further
amendment that requires pages under /usr/share/man/<ll>.<codeset> to be
encoded with <codeset>.

Cheers,

-- 
Colin Watson                                       [cjwatson@debian.org]
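
To make the directory naming rules discussed above concrete, here is a small hedged sketch in Python. It is not taken from man-db. The names `ManDir`, `parse_man_dir`, `expected_encoding` and the two-entry `ILLUSTRATIVE_LEGACY` table are assumptions made for this example. It splits a hierarchy name such as `fr.UTF-8` into its parts and reports which encoding the pages there would be expected to use: an explicit codeset wins, and a bare language code falls back to a legacy table.

```python
from typing import NamedTuple, Optional


class ManDir(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]


# Illustrative only: a real table would cover every supported language.
ILLUSTRATIVE_LEGACY = {"fr": "ISO-8859-1", "bg": "CP1251"}


def parse_man_dir(name: str) -> ManDir:
    """Split e.g. 'fr_FR.UTF-8' into language, territory and codeset."""
    base, _, _modifier = name.partition("@")  # any @modifier is ignored here
    lang_terr, _, codeset = base.partition(".")
    language, _, territory = lang_terr.partition("_")
    return ManDir(language, territory or None, codeset or None)


def expected_encoding(name: str) -> Optional[str]:
    """Return the encoding that pages under this hierarchy should use."""
    parsed = parse_man_dir(name)
    if parsed.codeset:
        return parsed.codeset  # <ll>.<codeset> or <locale> with a codeset
    if parsed.territory is None:
        return ILLUSTRATIVE_LEGACY.get(parsed.language)  # bare <ll>
    return None  # full locale without a codeset: consult SUPPORTED instead


if __name__ == "__main__":
    for name in ("fr", "fr.UTF-8", "fr_FR.UTF-8", "bg"):
        print(name, "->", expected_encoding(name))
```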



Acknowledgement sent to Jens Seidel <jensseidel@users.sf.net>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #20 received at 440420@bugs.debian.org (full text, mbox):

From: Jens Seidel <jensseidel@users.sf.net>
To: Colin Watson <cjwatson@debian.org>, 440420@bugs.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Sun, 2 Sep 2007 22:24:43 +0200
On Sat, Sep 01, 2007 at 01:02:33PM +0100, Colin Watson wrote:
> --- orig/policy.sgml
> +++ mod/policy.sgml
> @@ -8450,6 +8450,39 @@
>  	      be present in the future.
>   	  </footnote>
>   	</p>
> +
> +	<p>
> +	  Manual pages that are installed under
> +	  <file>/usr/share/man/</file><var>ll</var>, where <var>ll</var>

Please use <file>/usr/share/man/<var>ll</var></file> as ll is part of
the filename.

> +	  is an ISO-639 language code, must be encoded with the usual
> +	  legacy (non-UTF-8) character set for that language, as shown
> +	  by:
> +	  <example compact="compact">
> +egrep -v '\.|@|UTF-8' /usr/share/i18n/SUPPORTED

Are you aware that some languages, such as Vietnamese, have an 8-bit
encoding that this regular expression filters out
(vi_VN.TCVN TCVN5712-1)?

> +	  At present, it is not generally possible to install a manual
> +	  page encoded in UTF-8 such that it will be used in all locales
> +	  for that language (for example, a page installed under
> +	  <file>/usr/share/man/fr_FR.UTF-8</file> will not be used in
> +	  the <tt>fr_BE.UTF-8</tt> locale). It is therefore not yet
> +	  recommended to install pages encoded in UTF-8, but rather to
> +	  continue using the legacy encoding.<footnote>This is expected
> +	  to change as of man-db 2.5.0.</footnote>

Maybe it would be a good idea to explain what to do with non supported
encodings these days. What to do with a Vietnamese page? Installing it
now UTF-8 encoded into vi.UTF-8/ seems fine to me but you write "not yet
recommended"!

>   1. Status at time of writing: packages should use only
>      /usr/share/man/<ll>/ (although some packages have anticipated an

Except for languages not yet supported by a classical encoding ...

>   5. Update dh_installman to recode manual pages to UTF-8 automatically
>      and install them under /usr/share/man/<ll>.UTF-8/. Getting the

This requires an option to specify the encoding of the manual page. Or
assume UTF-8 by default for all languages not having a matching regular
expression.

>      Conflicts:/Breaks: in here might be difficult, plus I'm not sure

Why not just ignore this? If updating man-db is sufficient, let's
ignore dependencies. (If an HTML documentation file uses the new
(fictitious) HTML version 9, there is no need to list all browsers
supporting this in the dependencies.)

Jens
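
The effect of the proposed `egrep -v '\.|@|UTF-8'` filter, and the Vietnamese gap raised above, can be sketched in Python as follows. The function name `legacy_charsets` and the sample lines are assumptions for illustration. The lines only imitate the "locale charset" layout of `/usr/share/i18n/SUPPORTED`, except for the `vi_VN.TCVN TCVN5712-1` entry, which is quoted in the thread.

```python
import re

# Same pattern as the egrep invocation in the proposal: drop any line that
# contains a dot, an '@' modifier, or the string 'UTF-8'.
EXCLUDE = re.compile(r"\.|@|UTF-8")


def legacy_charsets(lines):
    """Map each language code to the charset of its first surviving locale."""
    table = {}
    for line in lines:
        line = line.strip()
        # Blank and comment lines are also skipped (an addition to egrep -v).
        if not line or line.startswith("#") or EXCLUDE.search(line):
            continue
        locale, charset = line.split()
        language = locale.split("_")[0]
        table.setdefault(language, charset)
    return table


if __name__ == "__main__":
    sample = [
        "fr_FR.UTF-8 UTF-8",
        "fr_FR ISO-8859-1",
        "fr_FR@euro ISO-8859-15",
        "vi_VN.TCVN TCVN5712-1",
    ]
    # French keeps its legacy charset; Vietnamese is filtered out entirely
    # because its only 8-bit entry contains a dot.
    print(legacy_charsets(sample))
```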



Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #25 received at 440420@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: Jens Seidel <jensseidel@users.sf.net>
Cc: 440420@bugs.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Sun, 2 Sep 2007 23:31:45 +0100
On Sun, Sep 02, 2007 at 10:24:43PM +0200, Jens Seidel wrote:
> On Sat, Sep 01, 2007 at 01:02:33PM +0100, Colin Watson wrote:
> > --- orig/policy.sgml
> > +++ mod/policy.sgml
> > @@ -8450,6 +8450,39 @@
> >  	      be present in the future.
> >   	  </footnote>
> >   	</p>
> > +
> > +	<p>
> > +	  Manual pages that are installed under
> > +	  <file>/usr/share/man/</file><var>ll</var>, where <var>ll</var>
> 
> Please use <file>/usr/share/man/<var>ll</var></file> as ll is part of
> the filename.

Thanks; consider it amended.

> > +	  is an ISO-639 language code, must be encoded with the usual
> > +	  legacy (non-UTF-8) character set for that language, as shown
> > +	  by:
> > +	  <example compact="compact">
> > +egrep -v '\.|@|UTF-8' /usr/share/i18n/SUPPORTED
> 
> Are you aware that some languages, such as Vietnamese, have an 8-bit
> encoding that this regular expression filters out
> (vi_VN.TCVN TCVN5712-1)?

Hmm, yes. I'm not sure what to do about Vietnamese at the moment; I
doubt it works properly right now. I'll check it out.

> > +	  At present, it is not generally possible to install a manual
> > +	  page encoded in UTF-8 such that it will be used in all locales
> > +	  for that language (for example, a page installed under
> > +	  <file>/usr/share/man/fr_FR.UTF-8</file> will not be used in
> > +	  the <tt>fr_BE.UTF-8</tt> locale). It is therefore not yet
> > +	  recommended to install pages encoded in UTF-8, but rather to
> > +	  continue using the legacy encoding.<footnote>This is expected
> > +	  to change as of man-db 2.5.0.</footnote>
> 
> Maybe it would be a good idea to explain what to do with non supported
> encodings these days. What to do with a Vietnamese page? Installing it
> now UTF-8 encoded into vi.UTF-8/ seems fine to me but you write "not yet
> recommended"!

Well, that just plain won't work; man won't look there. There are some
locales that are unfortunately left out in the cold at the moment. I'm
working to improve the situation.

(While man will look in /usr/share/man/vi_VN.UTF-8, that won't work
properly either because groff doesn't accept UTF-8 input and man doesn't
know how to recode that to an 8-bit encoding that can be passed through
groff's ascii8 device and recoded back to UTF-8 on the other side.
Basically, if man doesn't know about the legacy encoding for your
language, you're currently screwed, and no amount of changes to policy
will help you. Yes, this is far from ideal.)

> >   5. Update dh_installman to recode manual pages to UTF-8 automatically
> >      and install them under /usr/share/man/<ll>.UTF-8/. Getting the
> 
> This requires an option to specify the encoding of the manual page. Or
> assume UTF-8 by default for all languages not having a matching regular
> expression.

I was thinking of having dh_installman recode to UTF-8, and yes, you
would need to know the encoding somehow (maybe a table of legacy
encodings as is currently in man-db would do the job). This is the least
well-thought-out part of my transition plan, though, so ideas are good.

> >      Conflicts:/Breaks: in here might be difficult, plus I'm not sure
> 
> Why not just ignore this? If updating man-db is sufficient, let's
> ignore dependencies. (If an HTML documentation file uses the new
> (fictitious) HTML version 9, there is no need to list all browsers
> supporting this in the dependencies.)

Yeah, I'm open to just ignoring this. It's probably the pragmatic
approach.

Thanks,

-- 
Colin Watson                                       [cjwatson@debian.org]



Acknowledgement sent to Jens Seidel <jensseidel@users.sf.net>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #30 received at 440420@bugs.debian.org (full text, mbox):

From: Jens Seidel <jensseidel@users.sf.net>
To: Colin Watson <cjwatson@debian.org>, 440420@bugs.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Mon, 3 Sep 2007 08:30:39 +0200
On Sun, Sep 02, 2007 at 11:31:45PM +0100, Colin Watson wrote:
> On Sun, Sep 02, 2007 at 10:24:43PM +0200, Jens Seidel wrote:
> > On Sat, Sep 01, 2007 at 01:02:33PM +0100, Colin Watson wrote:
> > > +	  is an ISO-639 language code, must be encoded with the usual
> > > +	  legacy (non-UTF-8) character set for that language, as shown
> > > +	  by:
> > > +	  <example compact="compact">
> > > +egrep -v '\.|@|UTF-8' /usr/share/i18n/SUPPORTED
> > 
> > Are you aware that some languages, such as Vietnamese, have an 8-bit
> > encoding that this regular expression filters out
> > (vi_VN.TCVN TCVN5712-1)?
> 
> Hmm, yes. I'm not sure what to do about Vietnamese at the moment; I
> doubt it works properly right now. I'll check it out.

I doubt it too...
 
> > > +	  the <tt>fr_BE.UTF-8</tt> locale). It is therefore not yet
> > > +	  recommended to install pages encoded in UTF-8, but rather to
> > 
> > Maybe it would be a good idea to explain what to do with non supported
> > encodings these days. What to do with a Vietnamese page? Installing it
> > now UTF-8 encoded into vi.UTF-8/ seems fine to me but you write "not yet
> > recommended"!
> 
> Well, that just plain won't work; man won't look there. There are some

Yup, I'm aware of it. But once proper support is added to man-db it will
work. There should be no need to upload a large number of packages just
to fix manual pages after the man-db transition if this can already
happen now. (Or should currently unsupported manual pages not
be installed at all?)

Isn't this the core idea of extending the policy? To guide the
developer on what should/will be used once the transition has happened?

hex-a-hop already installs the Vietnamese and the Bulgarian manpages;
both are currently not supported (at least in Etch and according to the
changelog also in Sid -- and can be used as a test for you). (I will
file a bug for Bulgarian on man-db soon.)

Jens



Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #35 received at 440420@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: Jens Seidel <jensseidel@users.sf.net>
Cc: 440420@bugs.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Mon, 3 Sep 2007 13:11:04 +0100
On Mon, Sep 03, 2007 at 08:30:39AM +0200, Jens Seidel wrote:
> On Sun, Sep 02, 2007 at 11:31:45PM +0100, Colin Watson wrote:
> > On Sun, Sep 02, 2007 at 10:24:43PM +0200, Jens Seidel wrote:
> > > On Sat, Sep 01, 2007 at 01:02:33PM +0100, Colin Watson wrote:
> > > > +	  is an ISO-639 language code, must be encoded with the usual
> > > > +	  legacy (non-UTF-8) character set for that language, as shown
> > > > +	  by:
> > > > +	  <example compact="compact">
> > > > +egrep -v '\.|@|UTF-8' /usr/share/i18n/SUPPORTED
> > > 
> > > Are you aware that some languages, such as Vietnamese, have an 8-bit
> > > encoding that this regular expression filters out
> > > (vi_VN.TCVN TCVN5712-1)?
> > 
> > Hmm, yes. I'm not sure what to do about Vietnamese at the moment; I
> > doubt it works properly right now. I'll check it out.
> 
> I doubt it too...

Regardless, to make it work with current groff (which reserves a part of
the input character set for its own use and thereby conflicts with UTF-8
input), a legacy character set is needed; what I was trying to express
is that this should be the "most usual" legacy encoding for that
language.

Vietnamese is an odd case. In the long term, I think being explicit
(vi.UTF-8) is the right answer anyway.

> > > > +	  the <tt>fr_BE.UTF-8</tt> locale). It is therefore not yet
> > > > +	  recommended to install pages encoded in UTF-8, but rather to
> > > 
> > > Maybe it would be a good idea to explain what to do with non supported
> > > encodings these days. What to do with a Vietnamese page? Installing it
> > > now UTF-8 encoded into vi.UTF-8/ seems fine to me but you write "not yet
> > > recommended"!
> > 
> > Well, that just plain won't work; man won't look there. There are some
> 
> Yup, I'm aware of it. But once proper support is added to man-db it will
> work. There should be no need to upload a large number of packages just
> to fix manual pages after the man-db transition if this can already
> happen now. (Or should currently unsupported manual pages not
> be installed at all?)

This is sort of what my caveat about the "not yet recommended" language
was about. I agree with you that if it doesn't work with current man
anyway then there is no harm in installing it in the future location.
I'm not sure how to word this in policy though; do you have any
suggestions?

Maybe it would be better for me to just focus on getting man-db 2.5.0
done ASAP and not worry too much about policy in the meantime. :-)

> Isn't this the core idea of extending the policy? To guide the
> developer on what should/will be used once the transition has happened?
> 
> hex-a-hop already installs the Vietnamese and the Bulgarian manpages;
> both are currently not supported (at least in Etch and according to the
> changelog also in Sid -- and can be used as a test for you). (I will
> file a bug for Bulgarian on man-db soon.)

That Bulgarian page is a particularly unfortunate example because it
uses the ѝ character which is not in CP1251 (the encoding of the bg_BG
locale), so right now we have no reliable path to render this page. I've
added Bulgarian support anyway, it's just that this page will be a bit
broken. I think you would be best advised to move this page to
/usr/share/man/bg.UTF-8 given that it definitely won't work in
/usr/share/man/bg.

In the case of the Vietnamese page, please change the "—" character
(U+2014) to "\-" as is standard in NAME sections; otherwise this works
fine when recoded via TCVN5712-1 so I've added support for this too.
Again, I think you would be best advised to install this in
/usr/share/man/vi.UTF-8.

Cheers,

-- 
Colin Watson                                       [cjwatson@debian.org]
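
The two page problems described above can be checked with a short Python sketch. The helper names `unencodable` and `fix_name_dash`, and the sample NAME line, are invented for this illustration. Python's standard codecs include CP1251 but not TCVN5712-1, so only the Bulgarian case is tested against a real charset here.

```python
def unencodable(text: str, encoding: str) -> list:
    """Return the distinct characters of text that the encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


def fix_name_dash(line: str) -> str:
    """Replace a literal em dash (U+2014) with the roff sequence \\- ."""
    return line.replace("\u2014", "\\-")


if __name__ == "__main__":
    # U+045D (Cyrillic small letter i with grave) is missing from CP1251.
    print(unencodable("\u045d", "cp1251"))
    # Hypothetical NAME line; the result uses the conventional "\-" separator.
    print(fix_name_dash("hex-a-hop \u2014 puzzle game"))
```

Running a check like `unencodable` over a translated page before choosing its install directory would show early whether the bare `<ll>` hierarchy can hold it at all.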



Acknowledgement sent to Jens Seidel <jensseidel@users.sf.net>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #40 received at 440420@bugs.debian.org (full text, mbox):

From: Jens Seidel <jensseidel@users.sf.net>
To: Colin Watson <cjwatson@debian.org>, 440420@bugs.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Mon, 3 Sep 2007 16:15:31 +0200
Hi Colin,

This is now becoming slightly off-topic, but let me reply one last time
to this bug report.

On Mon, Sep 03, 2007 at 01:11:04PM +0100, Colin Watson wrote:
> On Mon, Sep 03, 2007 at 08:30:39AM +0200, Jens Seidel wrote:
> Vietnamese is an odd case. In the long term, I think being explicit
> (vi.UTF-8) is the right answer anyway.
 
OK.

> I'm not sure how to word this in policy though; do you have any
> suggestions?

How about:

"It is therefore not yet recommended to install UTF-8 encoded pages if a
classical encoding can be used instead, but rather to continue using the
legacy encoding."
 
> Maybe it would be better for me to just focus on getting man-db 2.5.0
> done ASAP and not worry too much about policy in the meantime. :-)

Right :-)
 
> That Bulgarian page is a particularly unfortunate example because it
> uses the ѝ character which is not in CP1251 (the encoding of the bg_BG

Hm, OK. Bulgarian is very similar to Russian, so I was really surprised
that it wasn't supported. charsets(7) mentions the 8859-5 encoding for
Bulgarian, but I wrongly assumed this manpage documents valid man file
encodings :-)

> locale), so right now we have no reliable path to render this page. I've
> added Bulgarian support anyway, it's just that this page will be a bit

Thanks. 

> broken. I think you would be best advised to move this page to
> /usr/share/man/bg.UTF-8 given that it definitely won't work in

OK, will do so. But as dh_installman doesn't support it yet I need to do
it manually (not a big task).

> /usr/share/man/bg.
> 
> In the case of the Vietnamese page, please change the "—" character
> (U+2014) to "\-" as is standard in NAME sections; otherwise this works

Oops, Clytie used a Unicode character and neither Lintian nor po4a
complained :-( I will file a bug against Lintian ...

> fine when recoded via TCVN5712-1 so I've added support for this too.

Great, but I think TCVN5712-1 is not supported by po4a as an output
encoding.

> Again, I think you would be best advised to install this in
> /usr/share/man/vi.UTF-8.

Thanks, I will do so.

It's really nice that we will soon have a first Bulgarian manpage in
Debian. There are probably no other Vietnamese pages installed either.

Thanks, again
Jens



Acknowledgement sent to "Giacomo A. Catenazzi" <cate@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #45 received at 440420@bugs.debian.org (full text, mbox):

From: "Giacomo A. Catenazzi" <cate@debian.org>
To: Colin Watson <cjwatson@debian.org>, 440420@bugs.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Mon, 03 Sep 2007 17:38:10 +0200
Colin Watson wrote:
> Package: debian-policy
> Severity: wishlist
> 
(...)

> --- orig/policy.sgml
> +++ mod/policy.sgml
> @@ -8450,6 +8450,39 @@
>  	      be present in the future.
>   	  </footnote>
>   	</p>
> +
> +	<p>
> +	  Manual pages that are installed under
> +	  <file>/usr/share/man/</file><var>ll</var>, where <var>ll</var>
> +	  is an ISO-639 language code, must be encoded with the usual
> +	  legacy (non-UTF-8) character set for that language, as shown
> +	  by:
> +	  <example compact="compact">
> +egrep -v '\.|@|UTF-8' /usr/share/i18n/SUPPORTED
> +	  </example>
> +	  <footnote>
> +	    This is necessary because many packages have historically
> +	    included manual pages encoded thus, and changing the
> +	    encoding of the whole hierarchy would involve a difficult
> +	    transitional period.
> +	  </footnote>
> +	  Manual pages that are installed under
> +	  <file>/usr/share/man/</file><var>locale</var>, where
> +	  <var>locale</var> is a full locale name listed in
> +	  <file>/usr/share/i18n/SUPPORTED</file>, must be encoded with
> +	  the character set implied by that locale.
> +	</p>

I don't like the proposal ;-)
It is not very POSIX-like and too application-specific.

1-
The POSIX way to specify a locale is:
language[_territory][.codeset]
(or [language[_territory][.codeset][@modifier]] for some LC_ variables)

The "legacy (non-UTF-8) character set" wording is confusing.
Every locale has a charset, so the man page should be
encoded according to the right locale (in the manual PATH).

2-
I have a problem with
/usr/share/i18n/SUPPORTED
Who generates this file?
IIRC our glibc has more locales.
I can't find "en" or "de" in it.

3-
Given the above point, I think that "en" (as an example) has
a charset (from glibc), so the man page should be encoded in
that charset.
Any other charset in a man page is a bug.


> +
> +	<p>
> +	  At present, it is not generally possible to install a manual
> +	  page encoded in UTF-8 such that it will be used in all locales
> +	  for that language (for example, a page installed under
> +	  <file>/usr/share/man/fr_FR.UTF-8</file> will not be used in
> +	  the <tt>fr_BE.UTF-8</tt> locale). It is therefore not yet
> +	  recommended to install pages encoded in UTF-8, but rather to
> +	  continue using the legacy encoding.<footnote>This is expected
> +	  to change as of man-db 2.5.0.</footnote>
> +	</p>
>        </sect>
>  
>        <sect>

If I understand correctly, this is only a transitional comment, so
I think we should forget about this and update the policy when
man-db/man is corrected.


> It will perhaps be helpful if I describe my transition plan for getting
> manual pages into UTF-8. Contrary to what occasionally seems to be
> popular belief, a newer version of groff is not necessary here (which is
> just as well as repeated attempts to merge in the CJK patch have been
> exceedingly painful, though I still hold out hope to get it done
> eventually). man-db is capable of shoving in iconv pipes as necessary.
> 
>   1. Status at time of writing: packages should use only
>      /usr/share/man/<ll>/ (although some packages have anticipated an
>      approximation of the transition plan; we ignore these for the
>      moment as there is little point in changing them only to change
>      them back later), and must use the legacy encoding for pages
>      installed there.

As above, I don't think it is incorrect.
But I agree that it will cause difficulties if the default encoding
is ever changed, or when trying to see what the encoding of a given
language is.


>   2. man-db 2.5.0-1 uploaded, including support for installing pages in
>      /usr/share/man/<ll>.<codeset>/ (e.g. /usr/share/man/fr.UTF-8). The
>      basename of this directory is not typically a well-formed locale,
>      but it is appropriate because it allows a clear specification of
>      the hierarchy's encoding while applying to all countries using that
>      language.

Use locales and locale priorities as specified in POSIX, and
allow a full <locale>, not only a subclass.

>   3. man-db 2.5.0-1 moves into testing.
> 
>   4. Packages encouraged (via debian-devel-announce) to begin using
>      /usr/share/man/<ll>.UTF-8/; installation in other hierarchies will
>      not be necessary as man-db will recode as needed. Packages using
>      these hierarchies will be encouraged to declare Conflicts: man-db
>      (<< 2.5.0-1) (or will Breaks: be allowed by that point? is either
>      one just overkill?).

I don't think we should go to UTF-8; rather, we should allow users
to use any charset that is good for the language.  It is also very
difficult to change the charset or to change upstreams.

So I propose that manpages specify a charset (i.e. not use the
default locale with only the language (and territory)).

>   5. Update dh_installman to recode manual pages to UTF-8 automatically
>      and install them under /usr/share/man/<ll>.UTF-8/. Getting the
>      Conflicts:/Breaks: in here might be difficult, plus I'm not sure
>      I'm wild about creating several thousand more arcs in our
>      dependency graph. Maybe it's better just to wait for a stable
>      release before changing debhelper, and not worry too much about the
>      Conflicts:/Breaks: as it's not like the whole system will break as
>      a result.

Change this to: encode in the relevant charset.
BTW I think it should be done dynamically in the "man" program.

BTW there should be only one "original" man page per language, and
the other encodings should be generated from it (except in very special
cases). Otherwise it would be difficult to maintain the versions in
parallel.

> 
>   6. Policy updated once this has been shaken down and confirmed to work
>      properly.

So without the transition comment.
> 
>   7. Distant future: deprecate /usr/share/man/<ll>/. This will only be
>      for consistency, so there's no need to rush.

No, but in the near future it should be a symbolic link to the
right (as defined in the locale) ll.charset

Eventually we should discuss locale definitions with the glibc
people, and how to export that information
to other programs (and thus "man")

ciao
	cate



Message #50 received at 440420@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: Jens Seidel <jensseidel@users.sf.net>
Cc: 440420@bugs.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Mon, 3 Sep 2007 17:09:12 +0100
On Mon, Sep 03, 2007 at 04:15:31PM +0200, Jens Seidel wrote:
> On Mon, Sep 03, 2007 at 01:11:04PM +0100, Colin Watson wrote:
> > I'm not sure how to word this in policy though; do you have any
> > suggestions?
> 
> How about:
> 
> "It is therefore not yet recommended to install UTF-8 encoded pages if a
> classical encoding can be used instead, but rather to continue using the
> legacy encoding."

That sounds reasonable, though I'd use the same term (either "classical
encoding" or "legacy encoding") in both places to avoid creating
confusion about whether these might be two different things.

> > /usr/share/man/bg.
> > 
> > In the case of the Vietnamese page, please change the "—" character
> > (U+2014) to "\-" as is standard in NAME sections; otherwise this works
> 
> Oops, Clytie used a Unicode character and Lintian or po4a did not
> complain :-( I will file a bug against Lintian ...

There's already a to-do comment in Lintian noting that it doesn't check
non-English manual pages yet; this can be changed once man-db 2.5.0 is
uploaded, but not really before.
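
The substitution requested above for the NAME section can be sketched
as follows. This is only an illustration in Python: the function name
and the sample line are hypothetical, and it is not part of Lintian,
po4a or man-db.

```python
# Hypothetical helper: replace U+2014 with the roff sequence "\-",
# which is the conventional separator in a NAME section.
EM_DASH = "\u2014"


def fix_name_separator(line):
    # Every occurrence of U+2014 becomes backslash-hyphen; all other
    # characters (including legitimate non-ASCII letters) are kept.
    return line.replace(EM_DASH, "\\-")


if __name__ == "__main__":
    sample = "example " + EM_DASH + " a hypothetical NAME line"
    print(fix_name_separator(sample))
```

Only the dash is touched, so the rest of a translated page is left
alone.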

> > fine when recoded via TCVN5712-1 so I've added support for this too.
> 
> Great, but I think TCVN5712-1 is not supported by po4a as output
> encoding.

I meant having man do this automatically.

Cheers,

-- 
Colin Watson                                       [cjwatson@debian.org]



Message #55 received at 440420@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: "Giacomo A. Catenazzi" <cate@debian.org>
Cc: 440420@bugs.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Mon, 3 Sep 2007 17:47:19 +0100
On Mon, Sep 03, 2007 at 05:38:10PM +0200, Giacomo A. Catenazzi wrote:
> Colin Watson wrote:
> >--- orig/policy.sgml
> >+++ mod/policy.sgml
> >@@ -8450,6 +8450,39 @@
> > 	      be present in the future.
> >  	  </footnote>
> >  	</p>
> >+
> >+	<p>
> >+	  Manual pages that are installed under
> >+	  <file>/usr/share/man/</file><var>ll</var>, where <var>ll</var>
> >+	  is an ISO-639 language code, must be encoded with the usual
> >+	  legacy (non-UTF-8) character set for that language, as shown
> >+	  by:
> >+	  <example compact="compact">
> >+egrep -v '\.|@|UTF-8' /usr/share/i18n/SUPPORTED
> >+	  </example>
> >+	  <footnote>
> >+	    This is necessary because many packages have historically
> >+	    included manual pages encoded thus, and changing the
> >+	    encoding of the whole hierarchy would involve a difficult
> >+	    transitional period.
> >+	  </footnote>
> >+	  Manual pages that are installed under
> >+	  <file>/usr/share/man/</file><var>locale</var>, where
> >+	  <var>locale</var> is a full locale name listed in
> >+	  <file>/usr/share/i18n/SUPPORTED</file>, must be encoded with
> >+	  the character set implied by that locale.
> >+	</p>
> 
> I don't like the proposal ;-)
> It is not very POSIXly and to application specific.

Of course it is application-specific; /usr/share/man is
application-specific (i.e. specific to the man application). Methods of
processing /usr/share/man that don't use /usr/bin/man are already broken
in other ways. (man exports a number of specialised interfaces that can
be used by frontends, and I'm happy to add more on request.)

POSIX does not specify anything about the layout of /usr/share/man. The
FHS makes an attempt, but it's horribly broken (speaking as one who has
attempted to implement it), predates widespread deployment of UTF-8, and
does not really help with the problem to hand anyway.

> 1-
> The POSIX way to specify locale is:
> language[_territory][.codeset] or
> [language[_territory][.codeset][@modifier]] for some LC_ variables)

Note that e.g. fr.UTF-8 matches this pattern, so I don't see your
problem. The territory is intentionally omitted from the installation
directory in my transition plan because it causes real problems.
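
To make the pattern concrete, here is a minimal sketch in Python that
splits a name of the form language[_territory][.codeset][@modifier]
into its parts. The regular expression and the function name are my
own assumptions for illustration; this is not how glibc or man-db
parse locale names.

```python
import re

# Illustrative pattern for language[_territory][.codeset][@modifier].
LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z]+))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def parse_locale(name):
    """Return the four components as a dict; absent parts are None."""
    match = LOCALE_RE.match(name)
    if match is None:
        raise ValueError("not of the expected form: %r" % name)
    return match.groupdict()


if __name__ == "__main__":
    # fr.UTF-8 fits the pattern with the optional territory left out.
    print(parse_locale("fr.UTF-8"))
    print(parse_locale("fr_FR.UTF-8"))
```

A name such as fr.UTF-8 parses with the territory simply absent.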

man will support full locale names under /usr/share/man, but in my
transition plan I do not recommend using them because you don't
typically want to make your French manual pages available only to users
in France; they should be available to Belgians, French Canadians, Swiss
French, and Luxembourgers as well. The standard exceptions well-known to
internationalisation implementors are Chinese (zh_CN and zh_TW are
different dialects and different scripts) and Portuguese (pt_PT and
pt_BR are more or less different languages).

> It is confusing the "legacy (non-UTF-8) character".

Yes, it is, but it is current practice and I merely document it. If we
were starting from scratch with the benefit of hindsight then obviously
we wouldn't have done it this way.

I think it's unambiguous for all languages where we actually have
existing manual pages to worry about.

> Every locale has a charset. So the man page should be
> encoded according the right locale (in the manual PATH).

My proposal (the diff, as opposed to the transition plan later in my
original message) documents current practice, in which manual pages are
installed in directories such as /usr/share/man/fr. "fr" is not a full
locale name recognised by glibc, and does not have a defined character
set in our system. Thus, we must define its character set by means of
observing that historically pages installed there have been encoded in
ISO-8859-1, and standardising that to prevent unsolvable encoding
conflicts.

In future, it absolutely makes sense to install the pages in
/usr/share/man/fr.UTF-8 instead, which is where my transition plan takes
us. But, for now, the only available alternatives are
/usr/share/man/fr_FR.ISO-8859-1 and /usr/share/man/fr_FR.UTF-8, which
(as above) have fundamental problems, and in any case are not
well-supported at the moment (in man-db 2.4.*,
/usr/share/man/fr_FR.UTF-8 will only be used if you are using that exact
locale; in man-db 2.5.0, it will be used for users of the fr_FR
(ISO-8859-1) locale as well and recoded on the fly, so that you don't
have to install one manual page per possible encoding).
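
The difference between the two behaviours can be sketched like this.
It is a simplified model in Python with hypothetical function names,
assuming that a directory without a territory applies to every
territory of that language; the real lookup code in man-db is more
involved.

```python
def split_locale(name):
    """Split language[_territory][.codeset][@modifier]; drop the modifier."""
    name = name.split("@", 1)[0]
    base, _, codeset = name.partition(".")
    language, _, territory = base.partition("_")
    return language, territory or None, codeset or None


def used_with_exact_match(directory, user_locale):
    """Model of the 2.4.* behaviour described above: exact locale only."""
    return directory == user_locale


def used_ignoring_codeset(directory, user_locale):
    """Model of the 2.5.0 behaviour described above: the codeset of the
    directory only says how the pages are encoded, it does not restrict
    who may read them."""
    dir_lang, dir_terr, _ = split_locale(directory)
    user_lang, user_terr, _ = split_locale(user_locale)
    return dir_lang == user_lang and dir_terr in (None, user_terr)


if __name__ == "__main__":
    print(used_with_exact_match("fr_FR.UTF-8", "fr_FR"))
    print(used_ignoring_codeset("fr_FR.UTF-8", "fr_FR"))
    print(used_ignoring_codeset("fr_FR.UTF-8", "fr_BE.UTF-8"))
    print(used_ignoring_codeset("fr.UTF-8", "fr_BE.UTF-8"))
```

In this model the territory in the directory name still restricts who
sees the page, which is why it is left out in the transition plan.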

> 2-
> I've some problem with
> /usr/share/i18n/SUPPORTED
> Who generate this file?
> IIRC our glibc has more locales.

glibc ships this file.

  $ dpkg -S /usr/share/i18n/SUPPORTED
  locales: /usr/share/i18n/SUPPORTED
  $ apt-cache show locales | grep Source:
  Source: glibc

> I don't find "en", "de".

That's because glibc does not recognise those as valid locales. If you
believe that a locale exists in our system but it is not in
/usr/share/i18n/SUPPORTED, you are by definition mistaken. :-)

> 3-
> With the above point, I think that "en" (as example) has
> a charset (from glibc), so man page should be set with
> such charset.

Your assumption is mistaken, I'm afraid. /usr/share/i18n/SUPPORTED is
the canonical list of available locales in our system. There is no
straightforward way to ask the question "what is the conventional legacy
character set for <language>?" without also specifying a country, which
doesn't help when trying to determine the character set of files under
/usr/share/man/fr. That's why man has its own table for this.
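
As a rough illustration of why the lookup needs a country, the sketch
below applies the same filter as the egrep command in the proposed
policy text to a small sample. The sample lines are an assumption
made for illustration in the "locale charset" layout of that file,
not a copy of it, and the function names are hypothetical.

```python
import re

# Illustrative sample only; the real file is much longer.
SAMPLE = """\
fr_FR.UTF-8 UTF-8
fr_FR ISO-8859-1
fr_FR@euro ISO-8859-15
fr_BE.UTF-8 UTF-8
fr_BE ISO-8859-1
"""

# Same expression as the egrep -v filter: drop lines that name a
# codeset, carry a modifier, or mention UTF-8.
FILTER = re.compile(r"\.|@|UTF-8")


def legacy_entries(text):
    """Return a dict mapping locale name to charset for the kept lines."""
    table = {}
    for line in text.splitlines():
        if line and not FILTER.search(line):
            locale, charset = line.split()
            table[locale] = charset
    return table


if __name__ == "__main__":
    table = legacy_entries(SAMPLE)
    print(table)
    # A bare language code is not a key, so it cannot be looked up.
    print("fr" in table)
```

Every remaining key includes a territory, so a bare "fr" has no entry
of its own.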

> >+
> >+	<p>
> >+	  At present, it is not generally possible to install a manual
> >+	  page encoded in UTF-8 such that it will be used in all locales
> >+	  for that language (for example, a page installed under
> >+	  <file>/usr/share/man/fr_FR.UTF-8</file> will not be used in
> >+	  the <tt>fr_BE.UTF-8</tt> locale). It is therefore not yet
> >+	  recommended to install pages encoded in UTF-8, but rather to
> >+	  continue using the legacy encoding.<footnote>This is expected
> >+	  to change as of man-db 2.5.0.</footnote>
> >+	</p>
> >       </sect>
> > 
> >       <sect>
> 
> If I understand correctly, this is only a transitional comment, so
> I think we should forget about this, and update the policy when
> the man-db/man is corrected.

I'm happy to go that route too; I simply thought in the event that a
policy upload was coming soon then it might be helpful to document
current practice. It also gives me something to document the new policy
against after man-db 2.5.0. :-)

> >  2. man-db 2.5.0-1 uploaded, including support for installing pages
> >  in /usr/share/man/<ll>.<codeset>/ (e.g. /usr/share/man/fr.UTF-8).
> >  The basename of this directory is not typically a well-formed
> >  locale, but it is appropriate because it allows a clear
> >  specification of the hierarchy's encoding while applying to all
> >  countries using that language.
> 
> Use locale and locale priorities as specified on POSIX, and allow full
> <locale> not only a subclass.

man-db permits them and will continue to do so, but as above I strongly
believe that with the exception of Chinese and Portuguese it is not
generally to our users' advantage to install manual pages under full
locale names, unless you're lucky enough to use a language spoken in
only one country. (IIRC you're in Switzerland; do you use it_CH.UTF-8?
If so, you would not be well-served by pages specifying it_IT.UTF-8, in
the same way that you would not be well-served by .po files specifying
"it_IT" rather than just "it".)

> >  3. man-db 2.5.0-1 moves into testing.
> >
> >  4. Packages encouraged (via debian-devel-announce) to begin using
> >  /usr/share/man/<ll>.UTF-8/; installation in other hierarchies will
> >  not be necessary as man-db will recode as needed. Packages using
> >  these hierarchies will be encouraged to declare Conflicts: man-db
> >  (<< 2.5.0-1) (or will Breaks: be allowed by that point? is either
> >  one just overkill?).
> 
> I don't think we should go to UTF-8, but we should allow users to use
> any good (for the language) charset.  It is also a lot difficult to
> change charset or upstreams.

I should clarify that /usr/share/man/<ll>.UTF-8/ will be used by man for
all <ll>* locales, not merely for those where the user requested UTF-8;
man will recode to the appropriate character set on the fly.
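
Recoding on the fly amounts to something like the following sketch.
man-db does this with iconv pipes; here Python's built-in codecs
stand in for iconv, and the function name and the choice to
substitute "?" for unrepresentable characters are assumptions made
for illustration.

```python
def recode_for_locale(page_bytes, locale_charset, source_charset="UTF-8"):
    """Recode a page stored in source_charset into the user's charset.

    Characters that the target charset cannot represent are replaced
    with "?" rather than aborting the conversion.
    """
    text = page_bytes.decode(source_charset)
    return text.encode(locale_charset, errors="replace")


if __name__ == "__main__":
    stored = "caf\u00e9".encode("UTF-8")  # page as installed
    print(recode_for_locale(stored, "ISO-8859-1"))
    print(recode_for_locale(stored, "UTF-8") == stored)
```

The installed page is never modified; only the bytes sent to the
user's terminal differ between locales.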

It is true that manual pages could be installed using any character set
and would work fine, but since we will be able to standardise on UTF-8 I
think we should do so, for all the same reasons that we should
standardise on UTF-8 elsewhere: for one, it greatly simplifies things if
you're looking at manual page source for whatever reason.

Upstreams do not need to change, or at least can change at their
leisure; it's trivial to recode the page to UTF-8 in debian/rules.

> So I propose that manpage specify a charset (i.e. not using the defaul
> local with only the language (and territory)).

That is what I'm doing here. The character set named in the directory
name specifies the encoding for all manual pages installed under that
directory; it does not mandate that only users of that character set may
use these manual pages. (I understand your confusion since this is not
what is implemented in current man-db, but frankly that implementation
doesn't benefit anyone.)

There are other ways of specifying the encoding such as by putting them
in a header in the page itself, but those are much less convenient in
practice and are less efficient when implemented (since you have to
decompress and open the page before you can find its encoding).
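
A minimal sketch of the directory-based approach, in Python, might
look like this. The table below is a hypothetical stand-in for man's
own table and contains only the one entry mentioned in this thread;
the function name is likewise made up.

```python
import os

# Hypothetical stand-in for man's table of conventional legacy
# encodings; only the entry discussed in this thread is listed.
LEGACY_CHARSETS = {"fr": "ISO-8859-1"}


def hierarchy_encoding(directory):
    """Work out a hierarchy's encoding from its name alone.

    No page is opened or decompressed: an explicit codeset in the
    directory name wins, otherwise the legacy table is consulted.
    Returns None when nothing is known about the language.
    """
    name = os.path.basename(directory.rstrip("/"))
    name = name.split("@", 1)[0]
    base, dot, codeset = name.partition(".")
    if dot:
        return codeset
    language = base.split("_", 1)[0]
    return LEGACY_CHARSETS.get(language)


if __name__ == "__main__":
    print(hierarchy_encoding("/usr/share/man/fr.UTF-8/"))
    print(hierarchy_encoding("/usr/share/man/fr"))
```

A header inside the page would need the file to be read first, which
is the cost this avoids.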

> >  5. Update dh_installman to recode manual pages to UTF-8
> >  automatically and install them under /usr/share/man/<ll>.UTF-8/.
> >  Getting the Conflicts:/Breaks: in here might be difficult, plus I'm
> >  not sure I'm wild about creating several thousand more arcs in our
> >  dependency graph. Maybe it's better just to wait for a stable
> >  release before changing debhelper, and not worry too much about the
> >  Conflicts:/Breaks: as it's not like the whole system will break as
> >  a result.
> 
> change: to encode on relevant charset. BTW I think it should be done
> on dynamically on "man" program.

As above, you appear to have misunderstood the transition plan; man will
recode dynamically.

> BTW there should be only one "original" man page per language, and
> this page should create the other encodings (but for very special
> cases). Otherwise it should be difficult to maintain in parallel the
> versions.

There should be only one manual page per language, full stop. In the new
world order, it should be installed under /usr/share/man/<ll>.UTF-8 and
all other encodings will be generated on the fly.

> >  7. Distant future: deprecate /usr/share/man/<ll>/. This will only
> >  be for consistency, so there's no need to rush.
> 
> No, but in a short future: it should be a symbolic link to the right
> (as defined in locale) ll.charset

No, this cannot be done safely (it will create incompatibility) and is
furthermore unnecessary and confusing. In any case it is not possible
for a symbolic link on the filesystem to be dependent on the user's
locale. This is handled in other ways.

> Eventually we should discuss with glibc people about locale
> definition, and how to export information to other programs (and thus
> "man")

I've implemented all this personally; glibc already provides all the
information I need, aside from the strange question of "conventional
legacy encodings" which is an extremely ambiguous and debatable request
to make of glibc in any case and which is already handled in a good
enough way in man. There is no need for glibc to change here.

Cheers,

-- 
Colin Watson                                       [cjwatson@debian.org]



Message #60 received at 440420@bugs.debian.org (full text, mbox):

From: "Giacomo A. Catenazzi" <cate@debian.org>
To: Colin Watson <cjwatson@debian.org>, 440420@bugs.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Tue, 04 Sep 2007 10:55:03 +0200
Colin Watson wrote:
> On Mon, Sep 03, 2007 at 05:38:10PM +0200, Giacomo A. Catenazzi wrote:
>> Colin Watson wrote:

>> I don't like the proposal ;-)
>> It is not very POSIXly and to application specific.
> 
> Of course it is application-specific; /usr/share/man is
> application-specific (i.e. specific to the man application). Methods of
> processing /usr/share/man that don't use /usr/bin/man are already broken
> in other ways. (man exports a number of specialised interfaces that can
> be used by frontends, and I'm happy to add more on request.)

But we have the same problem with info, with the HOWTOs, with the
docs, ....
[For most such programs it is not a huge problem: the build
scripts could correct it (i.e. from 8-bit to TeX or HTML "symbol" codes)]

For this reason, I would like a general policy and solution.
(The /usr/share/man policy would then be a follow-up.)

Or are there fewer problems with other docs?


> POSIX does not specify anything about the layout of /usr/share/man. The
> FHS makes an attempt, but it's horribly broken (speaking as one who has
> attempted to implement it), predates widespread deployment of UTF-8, and
> does not really help with the problem to hand anyway.

Yes, I saw (and there are some strange considerations), but I meant:
POSIX defines locales and how applications use locales.
If we convert manpages to UTF-8, I think we break POSIX:
the user could see the wrong encoding.
But OK, this should be only an implementation detail, as long as
it does not break valid non-UTF-8 locales.


>> 1-
>> The POSIX way to specify locale is:
>> language[_territory][.codeset] or
>> [language[_territory][.codeset][@modifier]] for some LC_ variables)
> 
> Note that e.g. fr.UTF-8 matches this pattern, so I don't see your
> problem. The territory is intentionally omitted from the installation
> directory in my transition plan because it causes real problems.

Yes, it was only a comment. I misread the rest of the thread
(a patch correcting markup and not an alternate proposal).

> man will support full locale names under /usr/share/man, but in my
> transition plan I do not recommend using them because you don't
> typically want to make your French manual pages available only to users
> in France; they should be available to Belgians, French Canadians, Swiss
> French, and Luxembourgers as well. The standard exceptions well-known to
> internationalisation implementors are Chinese (zh_CN and zh_TW are
> different dialects and different scripts) and Portuguese (pt_PT and
> pt_BR are more or less different languages).

Yes, I was sure there were exceptions (with locales there are special
cases everywhere).
But I was thinking of a possible over-engineering: manpages that
explain the output of a program: in an ideal world that output should
be written in the user's locale (numbers and dates).
I think this should not be done now for fr_FR, fr_CH, fr_CA, fr_BE,
but in future (and in an automatic way) maybe.
So in the policy I would mention the possible triplets
(for applications reading the files), but OTOH, man pages
should not yet be installed with a territory (and eventually policy
could list zh_ and pt_ [and whatever we forgot])

>> It is confusing the "legacy (non-UTF-8) character".
> 
> Yes, it is, but it is current practice and I merely document it. If we
> were starting from scratch with the benefit of hindsight then obviously
> we wouldn't have done it this way.
> 
> I think it's unambiguous for all languages where we actually have
> existing manual pages to worry about.

I don't like the wording.  Now it seems that UTF-8 is superior
to other encodings, but we should take UTF-8 as the ultimate
encoding.  I propose a simple "non-UTF-8 character".
Anyway this is a very minor point.


>> Every locale has a charset. So the man page should be
>> encoded according the right locale (in the manual PATH).
> 
> My proposal (the diff, as opposed to the transition plan later in my
> original message) documents current practice, in which manual pages are
> installed in directories such as /usr/share/man/fr. "fr" is not a full
> locale name recognised by glibc, and does not have a defined character
> set in our system. Thus, we must define its character set by means of
> observing that historically pages installed there have been encoded in
> ISO-8859-1, and standardising that to prevent unsolvable encoding
> conflicts.
> 
> In future, it absolutely makes sense to install the pages in
> /usr/share/man/fr.UTF-8 instead, which is where my transition plan takes
> us. But, for now, the only available alternatives are
> /usr/share/man/fr_FR.ISO-8859-1 and /usr/share/man/fr_FR.UTF-8, which
> (as above) have fundamental problems, and in any case are not
> well-supported at the moment (in man-db 2.4.*,
> /usr/share/man/fr_FR.UTF-8 will only be used if you are using that exact
> locale; in man-db 2.5.0, it will be used for users of the fr_FR
> (ISO-8859-1) locale as well and recoded on the fly, so that you don't
> have to install one manual page per possible encoding).

OK. I was working from a wrong assumption ("fr" is not a legal locale).
See my other comment on point 4 of the transition plan.


>> 2-
>> I've some problem with
>> /usr/share/i18n/SUPPORTED
(...)
>> I don't find "en", "de".
> 
> That's because glibc does not recognise those as valid locales. If you
> believe that a locale exists in our system but it is not in
> /usr/share/i18n/SUPPORTED, you are by definition mistaken. :-)

Yes. I was confusing this with HTTP language syndication (I don't
remember the exact word). In POSIX I found nothing about
priorities.

>> 3-
>> With the above point, I think that "en" (as example) has
>> a charset (from glibc), so man page should be set with
>> such charset.
> 
> Your assumption is mistaken, I'm afraid. /usr/share/i18n/SUPPORTED is
> the canonical list of available locales in our system. There is no
> straightforward way to ask the question "what is the conventional legacy
> character set for <language>?" without also specifying a country, which
> doesn't help when trying to determine the character set of files under
> /usr/share/man/fr. That's why man has its own table for this.

yes, wrong assumptions.

>>>  2. man-db 2.5.0-1 uploaded, including support for installing pages
>>>  in /usr/share/man/<ll>.<codeset>/ (e.g. /usr/share/man/fr.UTF-8).
>>>  The basename of this directory is not typically a well-formed
>>>  locale, but it is appropriate because it allows a clear
>>>  specification of the hierarchy's encoding while applying to all
>>>  countries using that language.
>> Use locale and locale priorities as specified on POSIX, and allow full
>> <locale> not only a subclass.
> 
> man-db permits them and will continue to do so, but as above I strongly
> believe that with the exception of Chinese and Portuguese it is not
> generally to our users' advantage to install manual pages under full
> locale names, unless you're lucky enough to use a language spoken in
> only one country. (IIRC you're in Switzerland; do you use it_CH.UTF-8?
> If so, you would not be well-served by pages specifying it_IT.UTF-8, in
> the same way that you would not be well-served by .po files specifying
> "it_IT" rather than just "it".)

My language and locale is "C", although I created it_CH for glibc ;-)
As explained above, I would add a note so that programs can expect a
territory, but for now we should not install man pages under a triplet
(except possibly for pt_ and zh_)

>>>  3. man-db 2.5.0-1 moves into testing.
>>>
>>>  4. Packages encouraged (via debian-devel-announce) to begin using
>>>  /usr/share/man/<ll>.UTF-8/; installation in other hierarchies will
>>>  not be necessary as man-db will recode as needed. Packages using
>>>  these hierarchies will be encouraged to declare Conflicts: man-db
>>>  (<< 2.5.0-1) (or will Breaks: be allowed by that point? is either
>>>  one just overkill?).
>> I don't think we should go to UTF-8, but we should allow users to use
>> any good (for the language) charset.  It is also a lot difficult to
>> change charset or upstreams.
> 
> I should clarify that /usr/share/man/<ll>.UTF-8/ will be used by man for
> all <ll>* locales, not merely for those where the user requested UTF-8;
> man will recode to the appropriate character set on the fly.
> 
> It is true that manual pages could be installed using any character set
> and would work fine, but since we will be able to standardise on UTF-8 I
> think we should do so, for all the same reasons that we should
> standardise on UTF-8 elsewhere: for one, it greatly simplifies things if
> you're looking at manual page source for whatever reason.
> 
> Upstreams do not need to change, or at least can change at their
> leisure; it's trivial to recode the page to UTF-8 in debian/rules.

"man will recode to the appropriate character set on the fly.",
so in point 3 you should also mention a new "man" version.

I like UTF-8, but I don't like us setting UTF-8 as the
predefined Debian encoding.
And in that case, I would set a default policy (not only
for manpages, but also for debian/changelog, ...).

Anyway, IIRC there were some negative comments about email
in UTF-8, in the discussion about the DPL vote and wrong
MUA handling of signed UTF-8 votes.

Do you think it is feasible to convert manpages to UTF-8
from non-Latin alphabets?
On this point we should see the comments on the i18n list.

> 
>> So I propose that manpage specify a charset (i.e. not using the defaul
>> local with only the language (and territory)).
> 
> That is what I'm doing here. The character set named in the directory
> name specifies the encoding for all manual pages installed under that
> directory; it does not mandate that only users of that character set may
> use these manual pages. (I understand your confusion since this is not
> what is implemented in current man-db, but frankly that implementation
> doesn't benefit anyone.)

But you propose only the "UTF-8" encoding.
Unfortunately Debian is no longer the upstream of man-db.

> There are other ways of specifying the encoding such as by putting them
> in a header in the page itself, but those are much less convenient in
> practice and are less efficient when implemented (since you have to
> decompress and open the page before you can find its encoding).

No, I agree that directory based selection of encoding is better.

>> BTW there should be only one "original" man page per language, and
>> this page should create the other encodings (but for very special
>> cases). Otherwise it should be difficult to maintain in parallel the
>> versions.
> 
> There should be only one manual page per language, full stop. In the new
> world order, it should be installed under /usr/share/man/<ll>.UTF-8 and
> all other encodings will be generated on the fly.

ok

>>>  7. Distant future: deprecate /usr/share/man/<ll>/. This will only
>>>  be for consistency, so there's no need to rush.
>> No, but in a short future: it should be a symbolic link to the right
>> (as defined in locale) ll.charset
> 
> No, this cannot be done safely (it will create incompatibility) and is
> furthermore unnecessary and confusing. In any case it is not possible
> for a symbolic link on the filesystem to be dependent on the user's
> locale. This is handled in other ways.

No, I meant that "fr" would point to "fr_ISO-8859-1".  But I was
working from the wrong assumption, so forget my comment.

>> Eventually we should discuss with glibc people about locale
>> definition, and how to export information to other programs (and thus
>> "man")
> 
> I've implemented all this personally; glibc already provides all the
> information I need, aside from the strange question of "conventional
> legacy encodings" which is an extremely ambiguous and debatable request
> to make of glibc in any case and which is already handled in a good
> enough way in man. There is no need for glibc to change here.

This was also based on the wrong assumption. I could not find an
option in locale(1) or in other files telling me what the
default encoding of "fr" was.  But considering it is not a valid
locale, there is no problem here.


In summary, I'm now OK with your proposal.
I don't like the "hardcoded" UTF-8, and I'm not sure that
automatic conversion is feasible for some non-Latin alphabets.
But it is the only clean and reasonable solution.

ciao
	cate




Message #65 received at 440420@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: "Giacomo A. Catenazzi" <cate@debian.org>
Cc: 440420@bugs.debian.org, debian-i18n@lists.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Tue, 4 Sep 2007 11:52:57 +0100
On Tue, Sep 04, 2007 at 10:55:03AM +0200, Giacomo A. Catenazzi wrote:
> Colin Watson wrote:
> >On Mon, Sep 03, 2007 at 05:38:10PM +0200, Giacomo A. Catenazzi wrote:
> >>I don't like the proposal ;-)
> >>It is not very POSIXly and to application specific.
> >
> >Of course it is application-specific; /usr/share/man is
> >application-specific (i.e. specific to the man application). Methods of
> >processing /usr/share/man that don't use /usr/bin/man are already broken
> >in other ways. (man exports a number of specialised interfaces that can
> >be used by frontends, and I'm happy to add more on request.)
> 
> But we have the same problem with info, with the HOWTO, with the
> doc, ....

Manual pages are different because:

  * They are not typically read directly, but via a toolset that is
    capable of dealing with such matters as encoding translation in a
    manner appropriate to the user's locale. In other words, we can
    safely recommend UTF-8 in the comfortable knowledge that it can be
    done transparently.

    info may share this property; I'm not sure because I'm not familiar
    with it at the implementation level and I haven't noticed it having
    much in the way of internationalisation support in general.

    I don't particularly object to recommending UTF-8 for HTML
    documentation as such, but it is clearly less convenient as you need
    to adjust the files themselves to declare a character set rather
    than just installing them in a different place.

    Other documentation is often read with a simple pager. UTF-8 is
    probably the most convenient encoding long-term in order that you
    can read documentation in more than one language without
    reconfiguring your software, but I imagine there is plenty of room
    for local exceptions here and it is certainly less clear.

  * As a general rule, manual pages are much better localised than other
    documentation. That is, they actually get localised. We may not be
    anywhere close to completion, but compare it to the other forms of
    documentation you mentioned: info has a handful of translations with
    a variety of naming conventions (is there any client support for
    selecting them automatically?), and random files in /usr/share/doc
    typically aren't localised or at best maybe have one or two
    translations (usually in the upstream author's native language). The
    only other form of documentation I'm aware of with a comparable
    level of localisation is the HOWTOs from the Linux Documentation
    Project.

  * Because our current groff implementation imposes quite strict
    restrictions on what input and output encodings are possible, and
    usually needs to know detailed information about these encodings in
    order to achieve correct typography, it is if anything more
    important than usual for man to have an accurate idea of the
    document's character set.

  * Because manual page encoding is specified by means of file system
    location, and because only a strict subset of the file system is
    allowed, it is important for policy to specify how this is to be
    handled across many packages for interoperability, more so than for
    forms of documentation where file system location is immaterial.

> For this reason, I would like a general policy and solution.
> (The /usr/share/man policy would then be a follow-up.)
> 
> Or are there fewer problems with other docs?

I don't think it's really reasonable or necessary to create a general
policy covering both /usr/share/man and other documentation in a single
piece of text. The requirements are too different, and several different
documentation formats have their own special requirements and need to
move at their own pace. Current policy wisely does not attempt to treat
them as a single unit, but has subsections for the two major specialised
formats (man and info).

> >POSIX does not specify anything about the layout of /usr/share/man. The
> >FHS makes an attempt, but it's horribly broken (speaking as one who has
> >attempted to implement it), predates widespread deployment of UTF-8, and
> >does not really help with the problem to hand anyway.
> 
> Yes, I saw (and there are some strange considerations), but I meant:
> POSIX defines locales and how applications use locales.
> If we convert manpages to UTF-8, I think we break POSIX:
> the user could see the wrong encoding.

No, you still don't understand. The conversion is only applied to the
source files, not what users see. POSIX does not impose requirements on
the encoding of applications' data files: each file clearly has to have
an encoding and an application that can know what encoding is in use and
convert it to the user's locale is clearly doing a better job than one
that can't.
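
A minimal sketch of that idea, assuming the source encoding is known from
the directory name and the output charset from the user's locale. The
function name is hypothetical and this is not man-db's actual code:

```python
def recode_for_locale(source_bytes, source_encoding, locale_charset):
    """Decode page source from its on-disk encoding and re-encode it
    in the charset of the user's locale."""
    text = source_bytes.decode(source_encoding)
    return text.encode(locale_charset)


# The page is stored as UTF-8; a user in an ISO-8859-1 locale still
# receives ISO-8859-1 bytes.
page = "Résumé des options".encode("utf-8")
latin1_output = recode_for_locale(page, "utf-8", "iso-8859-1")
utf8_output = recode_for_locale(page, "utf-8", "utf-8")
```

The stored encoding and the displayed encoding are independent, which is
why the choice of source encoding does not restrict the user's locale.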

> But I was thinking of a possible over-engineering: manpages that
> explain the output of the program: in an ideal world the output
> should be written in the user's locale (numbers and dates).

You mean the LC_NUMERIC and LC_TIME locale categories? There is no
support for this in groff and I think this is unlikely to happen. As you
suggest yourself, this is overengineering; a manual page is probably
better advised to explain in prose, as it's not at all impossible for a
user to look at a manual page in a different locale.

In any case, I would appreciate it if you didn't turn this proposal,
which is purely about encodings, into a general debate about wishlists
for locale handling in manual pages.

> So in the policy I would mention the possible triplets
> (for application reading the files),

Triplets? Do you mean language[_territory][.codeset]? Just say "locales"
rather than inventing a new term.

I'm not sure what you want to be mentioned, though. Are you looking for
a complete specification of the possible subdirectory names under
/usr/share/man? Perhaps it would be better to document that in man-db,
and leave policy to recommend the best choice rather than document all
possible choices. After all, the policy group's job is to take
decisions.
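
For illustration, the language[_territory][.codeset] form can be split
into its parts roughly as follows. This is a sketch under assumptions:
the pattern and function name are hypothetical, locale modifiers such as
"@euro" are deliberately not handled, and it is not a specification of
what man-db accepts:

```python
import re

# language, optional _TERRITORY, optional .codeset
_SUBDIR_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<territory>[A-Z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?$"
)


def parse_man_subdir(name):
    """Return (language, territory, codeset) for a locale-style
    subdirectory name, or None if the name does not have that shape."""
    match = _SUBDIR_RE.match(name)
    if match is None:
        return None
    return (match.group("language"),
            match.group("territory"),
            match.group("codeset"))


examples = {name: parse_man_subdir(name)
            for name in ("fr", "fr.UTF-8", "de_DE.ISO-8859-1", "man1")}
```

Section directories such as "man1" do not match and yield None, so they
are not mistaken for a language.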

> >>It is confusing the "legacy (non-UTF-8) character".
> >
> >Yes, it is, but it is current practice and I merely document it. If we
> >were starting from scratch with the benefit of hindsight then obviously
> >we wouldn't have done it this way.
> >
> >I think it's unambiguous for all languages where we actually have
> >existing manual pages to worry about.
> 
> I don't like the wording.  Now it seems that UTF-8 is superior
> to other encoding, but we should take UTF-8 as the ultimate
> encoding.  I propose a simple "non-UTF-8 character".
> Anyway this is a very minor point.

I'm not sure this is the right place to debate UTF-8's superiority to
earlier 8-bit encodings such as ISO-8859-1 or the double-byte character
sets. I think it's self-evident while it's not clear that you do, and
this doesn't seem like the place to reach agreement on that. I also
don't think in this case that we need to be afraid to adopt the best
available encoding now for fear that a better one might come along
later; should that happen, we can simply move along gradually to it and
have man recode on the fly, just as I'm proposing we do here.

Sure, we can say "non-UTF-8" rather than "legacy", though I think policy
should be unafraid to take a strong stance on this. I borrow the
"legacy" term from Unicode advocates such as Markus Kuhn. I think it's
quite an accurate and justified description of the encodings that are
only useful for one or a small number of languages.

> >>> 3. man-db 2.5.0-1 moves into testing.
[...]
> >I should clarify that /usr/share/man/<ll>.UTF-8/ will be used by man for
> >all <ll>* locales, not merely for those where the user requested UTF-8;
> >man will recode to the appropriate character set on the fly.
[...]
> "man will recode to the appropriate character set on the fly.",
> so on point 3, you should mention also a new "man" version.

"3. man-db 2.5.0-1 moves into testing."

  $ ls -l /usr/bin/man
  lrwxrwxrwx 1 root root 17 2007-08-26 23:29 /usr/bin/man -> ../lib/man-db/man
  $ dpkg -S /usr/lib/man-db/man
  man-db: /usr/lib/man-db/man

This is the second time in this thread that you've apparently forgotten
to do basic fact-checking before posting. Could you please adjust your
behaviour here? This is getting a little tedious.

> I like UTF-8, but I don't like that we set UTF-8 as the
> predefined Debian encoding.
> And in that case, I would set a default policy (not only
> for manpages, but also for debian/changelog, ...).

Policy is already moving in the direction of a default here. See the
footnote to section C.2.2 (which recommends UTF-8 for changelogs):

  I think it is fairly obvious that we need to eventually transition to
  UTF-8 for our package infrastructure; it is really the only sane
  char-set in an international environment. Now, we can't switch to
  using UTF-8 for package control fields and the like until dpkg has
  better support, but one thing we can start doing today is requesting
  that Debian changelogs are UTF-8 encoded. At some point in time, we
  can start requiring them to do so.

> Anyway, IIRC there were some negative comments about email
> in UTF-8, in the discussion about the DPL vote and wrong
> MUA handling of signed UTF-8 votes.

E-mail is a difficult case because some mail user agents are stuck in a
bygone age, but that is not comparable to the case of a tree of files
for use essentially by a single program under our clear control.

I don't wish to be arrogant here, but I have six years of practical
experience implementing this kind of stuff in man-db (obviously with
lots of help from experts in particular languages etc.). I do not want
to deal with speculative worries that aren't even about the same
subsystem. For the purposes of this proposal, please restrict your
concerns to real examples regarding manual pages, not half-remembered
comments about e-mail.

> Do you think it is feasible to convert manpages to UTF-8
> from non-Latin alphabets?
> On this point we should see comments from the i18n list.

Yes, I do. The Debian CJK patch to groff already implements CJK
encodings (the only case that presents any kind of problem here, to my
knowledge) by converting them to UCS-2 internally and then back to the
source encoding for output. If there is a problem with the conversion,
which as far as I have heard there is not right now, then we would
already be encountering it.

The only other non-Latin encoding currently supported by man-db in
Debian is KOI8-R. Since it's a simple 8-bit encoding, I doubt there is
any kind of round-trip problem with Unicode, and I have not heard of
one.
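
A round-trip check of this kind can be sketched as below. It assumes
Python's built-in "koi8_r" and "euc_jp" codecs are an adequate stand-in
for the conversions groff and iconv perform, which is an assumption; the
sample strings are made up for the example:

```python
def round_trips(data, encoding):
    """True if decoding the bytes to Unicode and encoding them back
    reproduces exactly the same bytes."""
    return data.decode(encoding).encode(encoding) == data


# Sample page text in two non-Latin legacy encodings.
russian = "Руководство пользователя".encode("koi8_r")
japanese = "マニュアル".encode("euc_jp")

results = {
    "koi8_r": round_trips(russian, "koi8_r"),
    "euc_jp": round_trips(japanese, "euc_jp"),
}
```

A sample passing this check does not prove that every byte sequence in
an encoding survives; it only shows how a given page can be tested.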

Though the CC hasn't been preserved, I CCed debian-i18n on my initial
bug report, so I hope they're aware of this proposal. I have reinstated
the CC here.

> >>So I propose that manpages specify a charset (i.e. not use the default
> >>locale with only the language (and territory)).
> >
> >That is what I'm doing here. The character set named in the directory
> >name specifies the encoding for all manual pages installed under that
> >directory; it does not mandate that only users of that character set may
> >use these manual pages. (I understand your confusion since this is not
> >what is implemented in current man-db, but frankly that implementation
> >doesn't benefit anyone.)
> 
> But you propose only "UTF-8" encoding.

I propose that policy should standardise that we move to using UTF-8 as
the source encoding for all manual pages since it clearly makes sense to
do so. This will still need to be specified by each manual page (by
means of the directory in which it is installed), and it does *not*
affect what user locales are supported in any way. The
internationalisation changes in man-db 2.5.0 will arrange for users to
see pages in their native language when they did not before; I do not
expect it to cause any users to fail to see pages in their native
language when they previously did.

Once man-db 2.5.0 is in place, the change in policy to recommend
installing pages with UTF-8 encoding in a properly marked directory will
have *no* effect on users, no matter what their locale. It is purely for
improved maintenance of the system.

> Unfortunately Debian is no longer the upstream of man-db.

Excuse me! I'm sorry, but on this point you seem to be quite rude. *I*
am the upstream for man-db, and I do so wearing my Debian developer hat
and using my @debian.org address. After Fabrizio's death in 2001, when I
took over as Debian maintainer of man-db, I contacted Graeme Wilford
informing him of my wish to take over as upstream; I received a reply in
mid-April giving me permission. I released man-db 2.3.18 in May 2001,
and since then have made seven further upstream releases, the last one
being in February of this year.

I use the Debian bug tracking system for upstream purposes, typically
take account of Debian release cycles when doing upstream development,
and upload new upstream versions to Debian promptly. The only thing I
don't do is use the native packaging format, which was really never a
particularly good idea for man-db and which I don't find helpful in this
case. If I as a Debian developer am not the upstream maintainer for
man-db, I should very much like to know who is.

Please retract this misstatement. The most cursory examination of
/usr/share/doc/man-db/copyright would have overturned it. What was the
point of saying that, anyway?

> In summary, I'm now OK with your proposal.
> I don't like the "hardcoded" UTF-8, and I'm not sure that
> automatic conversion is feasible for some non-Latin alphabets.
> But it is the only clean and reasonable solution.

Thanks. I hope that my comments above clarify some further confusion. I
would still appreciate concrete information and examples on why you
don't like the idea of manual pages being installed in UTF-8 (noting
that as a package maintainer or a translator you wouldn't have to
actually edit it in that encoding if you didn't want to, it doesn't have
to be done urgently or on any kind of flag day, I have addressed the
non-Latin concern above, and it will not have a negative effect on users
of non-UTF-8 locales).

Regards,

-- 
Colin Watson                                       [cjwatson@debian.org]




Acknowledgement sent to Jens Seidel <jensseidel@users.sf.net>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #70 received at 440420@bugs.debian.org (full text, mbox):

From: Jens Seidel <jensseidel@users.sf.net>
To: Colin Watson <cjwatson@debian.org>
Cc: "Giacomo A. Catenazzi" <cate@debian.org>, 440420@bugs.debian.org, debian-i18n@lists.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Tue, 4 Sep 2007 14:04:32 +0200
Hi Colin,

there is maybe one item which should be clarified.

On Tue, Sep 04, 2007 at 11:52:57AM +0100, Colin Watson wrote:
> Thanks. I hope that my comments above clarify some further confusion. I
> would still appreciate concrete information and examples on why you
> don't like the idea of manual pages being installed in UTF-8 (noting
> that as a package maintainer or a translator you wouldn't have to
> actually edit it in that encoding if you didn't want to, it doesn't have
> to be done urgently or on any kind of flag day, I have addressed the
> non-Latin concern above, and it will not have a negative effect on users
> of non-UTF-8 locales).

Is it safe to use UTF-8 characters if a very similar character exists in
ASCII or can be expressed using groff macros? Think about the many
dashes which exist in typography. Is it OK to use a UTF-8 hyphen sign
instead of \(hy (same for en-dash, em-dash, ...), especially as the
ordinary minus "-" is very similar in the output?

Will man-db support all kinds of whitespace (such as &nbsp;) ...?

This could make recoding a difficult task, and I also do not know how to
recognize such a character without a hex editor.

Of course there exist transliterations of all the characters I'm
currently talking about, but it would probably make life easier to
restrict ourselves to ASCII where possible, right?

Isn't there also more than one way to express accented characters
such as ä (as a single character and as "'a' followed by accent")?

Jens




Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #75 received at 440420@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: Jens Seidel <jensseidel@users.sf.net>
Cc: "Giacomo A. Catenazzi" <cate@debian.org>, 440420@bugs.debian.org, debian-i18n@lists.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Tue, 4 Sep 2007 13:35:48 +0100
On Tue, Sep 04, 2007 at 02:04:32PM +0200, Jens Seidel wrote:
> On Tue, Sep 04, 2007 at 11:52:57AM +0100, Colin Watson wrote:
> > Thanks. I hope that my comments above clarify some further confusion. I
> > would still appreciate concrete information and examples on why you
> > don't like the idea of manual pages being installed in UTF-8 (noting
> > that as a package maintainer or a translator you wouldn't have to
> > actually edit it in that encoding if you didn't want to, it doesn't have
> > to be done urgently or on any kind of flag day, I have addressed the
> > non-Latin concern above, and it will not have a negative effect on users
> > of non-UTF-8 locales).
> 
> Is it safe to use UTF-8 characters if a very similar character exists in
> ASCII or can be expressed using groff macros? Think about the many
> dashes which exist in typography. Is it OK to use a UTF-8 hyphen sign
> instead of \(hy (same for en-dash, em-dash, ...), especially as the
> ordinary minus "-" is very similar in the output?
> 
> Will man-db support all kinds of whitespace (such as &nbsp;) ...?

You'll need to use the characters documented in groff_char(7) for this,
at least for the time being. See below.

> Of course there exist transliterations of all the characters I'm
> currently talking about, but it would probably make life easier to
> restrict ourselves to ASCII where possible, right?

I do appreciate that there are a few gotchas here. I think it is unduly
cumbersome to express all non-ASCII alphanumeric characters using groff
named characters, though. That option has been available for ages and
translators have generally not taken advantage of it; I can entirely
understand why not.

> Isn't there also more than one way to express accented characters
> such as ä (as a single character and as "'a' followed by accent")?

groff 1.19 supports full Unicode-style composite glyphs, but the version
we have doesn't (see the comment in my original bug report about groff
versioning). Both our version and newer versions support named
characters such as \[:a] or \(:a (variant spellings), again documented
in groff_char(7). There's also the \N escape which can give you
font-dependent numbered glyphs, which are Unicode codepoints if you
happen to know that the utf8 device is in use.

As above, though, these have been available and translators generally
haven't used them; I can imagine that they're insanely cumbersome to use
in practice for e.g. Japanese. So I'd really rather just support plain
UTF-8 input for alphanumerics, which I think will actually get used.

Do you think we will need explicit language in policy for this? For the
time being, until we have a version of groff supporting direct UTF-8
input, the implementation will require that the page be convertible to
the legacy encoding for that language using iconv (it'll use 'iconv -c'
so that unknown characters are dropped rather than breaking the whole
page, but all the same): so e.g. for German pages characters without a
direct equivalent in ISO-8859-1 should be avoided. This seems like a
reasonable thing to document after man-db 2.5.0, and would cover things
like UTF-8 hyphen characters.
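
The constraint can be checked ahead of time. The sketch below is an
illustration only: it uses Python's codecs rather than iconv itself, the
function names are hypothetical, and the errors="ignore" handler is
merely analogous to the dropping behaviour of 'iconv -c' described
above:

```python
def unrepresentable(text, legacy_encoding):
    """Return the distinct characters in text, in order of first
    appearance, that legacy_encoding cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode(legacy_encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


def drop_unknown(text, legacy_encoding):
    """Encode text, silently discarding unconvertible characters."""
    return text.encode(legacy_encoding, errors="ignore")


# German text containing U+2010 HYPHEN, which ISO-8859-1 lacks.
source = "Gr\u00fc\u00dfe \u2010 siehe Handbuch"
offenders = unrepresentable(source, "iso-8859-1")
converted = drop_unknown(source, "iso-8859-1")
```

Here the umlaut and sharp s survive, while the U+2010 hyphen is reported
as an offender and vanishes from the converted output, leaving two
adjacent spaces.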

I'm not sure how groff will handle such characters once it does have
UTF-8 input support. I suspect it would convert U+2010 to its internal
"hy" glyph and render that in whatever way is appropriate for the output
device; that would really be ideal. However, I don't have enough
information to make a decision based on that guess.

In general, I think it's worthwhile for policy to make comments on
encoding for purposes of interoperability and standardisation, but I'd
be inclined to draw the line at filling it up with instructions on how
to use groff correctly. Does this sound reasonable?

Thanks,

-- 
Colin Watson                                       [cjwatson@debian.org]




Acknowledgement sent to Jens Seidel <jensseidel@users.sf.net>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #80 received at 440420@bugs.debian.org (full text, mbox):

From: Jens Seidel <jensseidel@users.sf.net>
To: Colin Watson <cjwatson@debian.org>
Cc: "Giacomo A. Catenazzi" <cate@debian.org>, 440420@bugs.debian.org, debian-i18n@lists.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Tue, 4 Sep 2007 14:59:49 +0200
On Tue, Sep 04, 2007 at 01:35:48PM +0100, Colin Watson wrote:
> On Tue, Sep 04, 2007 at 02:04:32PM +0200, Jens Seidel wrote:
> > Is it safe to use UTF-8 characters if a very similar character exists in
> > ASCII or can be expressed using groff macros? Think about the many
> > dashes which exist in typography. Is it OK to use a UTF-8 hyphen sign
> > instead of \(hy (same for en-dash, em-dash, ...), especially as the
> > ordinary minus "-" is very similar in the output?
> 
> > Of course there exist transliterations of all the characters I'm
> > currently talking about, but it would probably make life easier to
> > restrict ourselves to ASCII where possible, right?
> 
> I do appreciate that there are a few gotchas here. I think it is unduly
 
> So I'd really rather just support plain
> UTF-8 input for alphanumerics, which I think will actually get used.
> 
> Do you think we will need explicit language in policy for this? For the

Ah, no. But it should be documented somewhere and I wondered about this
after reading again your proposed patch (and the further info).

> This seems like a
> reasonable thing to document after man-db 2.5.0, and would cover things
> like UTF-8 hyphen characters.

Right. Without documentation every maintainer could now start fine tuning
man pages using all the stuff provided by Unicode ...
 
> In general, I think it's worthwhile for policy to make comments on
> encoding for purposes of interoperability and standardisation, but I'd
> be inclined to draw the line at filling it up with instructions on how
> to use groff correctly. Does this sound reasonable?

Yes, it does.

Thanks,
Jens




Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #85 received at 440420@bugs.debian.org (full text, mbox):

From: Russ Allbery <rra@debian.org>
To: Colin Watson <cjwatson@debian.org>
Cc: 440420@bugs.debian.org, debian-i18n@lists.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Sun, 30 Dec 2007 22:28:12 -0800
Colin Watson <cjwatson@debian.org> writes:

> I propose that policy should standardise that we move to using UTF-8 as
> the source encoding for all manual pages since it clearly makes sense to
> do so. This will still need to be specified by each manual page (by
> means of the directory in which it is installed), and it does *not*
> affect what user locales are supported in any way. The
> internationalisation changes in man-db 2.5.0 will arrange for users to
> see pages in their native language when they did not before; I do not
> expect it to cause any users to fail to see pages in their native
> language when they previously did.
>
> Once man-db 2.5.0 is in place, the change in policy to recommend
> installing pages with UTF-8 encoding in a properly marked directory will
> have *no* effect on users, no matter what their locale. It is purely for
> improved maintenance of the system.

Hi Colin,

I assume that now that man-db 2.5.0 is in the archive, the original patch
in this bug report is no longer current and we should now be saying
something different.  We're trying to increase the speed of Policy work,
so hopefully this time we can get a change made in a timely fashion.  :)

Could you send a new patch to document the current recommendations for how
to encode man pages and deal with different locales when you get a chance?

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>





Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. Full text and rfc822 format available.

Message #90 received at 440420@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: Russ Allbery <rra@debian.org>
Cc: 440420@bugs.debian.org, debian-i18n@lists.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Mon, 31 Dec 2007 14:37:48 +0000
On Sun, Dec 30, 2007 at 10:28:12PM -0800, Russ Allbery wrote:
> Colin Watson <cjwatson@debian.org> writes:
> > I propose that policy should standardise that we move to using UTF-8 as
> > the source encoding for all manual pages since it clearly makes sense to
> > do so. This will still need to be specified by each manual page (by
> > means of the directory in which it is installed), and it does *not*
> > affect what user locales are supported in any way. The
> > internationalisation changes in man-db 2.5.0 will arrange for users to
> > see pages in their native language when they did not before; I do not
> > expect it to cause any users to fail to see pages in their native
> > language when they previously did.
> >
> > Once man-db 2.5.0 is in place, the change in policy to recommend
> > installing pages with UTF-8 encoding in a properly marked directory will
> > have *no* effect on users, no matter what their locale. It is purely for
> > improved maintenance of the system.
> 
> Hi Colin,
> 
> I assume that now that man-db 2.5.0 is in the archive, the original patch
> in this bug report is no longer current and we should now be saying
> something different.  We're trying to increase the speed of Policy work,
> so hopefully this time we can get a change made in a timely fashion.  :)

Right. Here's an update; I think I've captured most of the discussion in
the thread so far. The following patch could in principle be applied
now, given seconds. Wordsmithing welcome, as I'm aware that this is a
rather dense recommendation; I'm also looking for seconds for this
proposal.

--- orig/policy.sgml
+++ mod/policy.sgml
@@ -8521,6 +8521,43 @@
 	      be present in the future.
  	  </footnote>
  	</p>
+
+	<p>
+	  Manual pages installed under subdirectories of
+	  <file>/usr/share/man</file> with a codeset specification (e.g.
+	  <file>/usr/share/man/fr.UTF-8</file> or
+	  <file>/usr/share/man/de_DE.ISO-8859-1</file>) must be encoded
+	  using the named character encoding. The subdirectory name does
+	  not need to be a well-formed locale as in
+	  <file>/usr/share/i18n/SUPPORTED</file>; a language and
+	  codeset, for example <file>de.UTF-8</file>, is all that is
+	  necessary for most languages.<footnote>In fact, specifying a
+	  country is often harmful, as it excludes users of the language
+	  in other countries; de_DE would apply only to speakers of
+	  German in Germany, and not to those in Austria.</footnote>
+	</p>
+
+	<p>
+	  For compatibility with both previous versions of Debian and
+	  other systems, manual pages in other locale-specific
+	  subdirectories of <file>/usr/share/man</file> should use
+	  either UTF-8 or the usual legacy encoding for that language
+	  (usually the one corresponding to the shortest relevant locale
+	  name in <file>/usr/share/i18n/SUPPORTED</file>). For example,
+	  pages under <file>/usr/share/man/fr</file> should use either
+	  UTF-8 or ISO-8859-1.<footnote><prgn>man</prgn> will
+	  automatically detect whether UTF-8 is in use. In future, all
+	  manual pages will be required to use UTF-8.</footnote>
+	</p>
+
+	<p>
+	  Due to limitations in current implementations, all characters
+	  in the manual page source should be representable in the usual
+	  legacy encoding for that language, even if the file is
+	  actually encoded in UTF-8. Safe alternative ways to write many
+	  characters outside that range may be found in
+	  <manref name="groff_char" section="7">.
+	</p>
       </sect>
 
       <sect>

Not lying about your encoding is a safe "must", I think, because this is
pretty much indisputable and I know of no cases of this rule being
broken in today's archive (though I haven't done a full scan).

Once we're a little further into the transition, I would like to replace
the second paragraph above with one that says that all manual pages
"should" be encoded in UTF-8.

I'm still open to whether new-world-order pages should go in
/usr/share/man/LL.UTF-8 or just /usr/share/man/LL. Pros for LL.UTF-8:

  * Non-compliant implementations (I'm guessing xman, yelp, etc.) will
    display English manual pages rather than misencoded garbage. This
    might not be such a big deal for European languages, but for e.g.
    Japanese I suspect most people would prefer English to the spew you
    get by trying to interpret UTF-8 as EUC-JP.

  * Determining progress towards universal UTF-8 encoding can trivially
    be done by scanning Contents files rather than having to unpack the
    archive and run iconv over everything.

  * In the event that we later want to migrate to yet another
    "universal" encoding that can't be automatically distinguished from
    UTF-8, we already have the encoding name right there and migration
    will be straightforward. (I think this is an unlikely scenario.)
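
The Contents-file scan mentioned in the second point could look roughly
like this. It is a sketch assuming the usual "path, whitespace,
section/package" line layout of Contents files; the sample lines and
package names are made up, and the function name is hypothetical:

```python
import re
from collections import Counter

# Matches localised pages only: usr/share/man/<subdir>/man<section>/...
# English pages live directly in usr/share/man/man<section>/ and are
# not matched.
_MAN_PATH = re.compile(r"^usr/share/man/([^/\s]+)/man[^/\s]*/\S+")


def count_localised_pages(contents_lines):
    """Count localised manual pages by whether their directory name
    carries an explicit .UTF-8 codeset."""
    counts = Counter()
    for line in contents_lines:
        match = _MAN_PATH.match(line)
        if match is None:
            continue
        subdir = match.group(1)
        key = "utf8" if subdir.upper().endswith(".UTF-8") else "other"
        counts[key] += 1
    return counts


sample = [
    "usr/share/man/man1/ls.1.gz               utils/coreutils",
    "usr/share/man/fr.UTF-8/man1/ls.1.gz      doc/example-fr",
    "usr/share/man/de/man1/ls.1.gz            doc/example-de",
    "usr/bin/ls                               utils/coreutils",
]
counts = count_localised_pages(sample)
```

With plain /usr/share/man/LL directories the same question can only be
answered by unpacking packages and inspecting file contents.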

And cons:

  * Many upstream developers using Debian systems will follow along
    without realising that this only works with man-db. The result will
    be that e.g. Red Hat users will miss out on localised manual pages
    even though (AIUI) their man implementation expects UTF-8 in
    /usr/share/man/LL.

  * Changing dh_installman to move these files around might break a few
    debian/rules files that name subdirectories of /usr/share/man
    explicitly.

  * As an aesthetic point, the debris of this transition will be visible
    forever.

I think I am increasingly leaning towards just using /usr/share/man/LL,
seeing as man has to try decoding pages there as UTF-8 first anyway, but
please comment if you care.
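
The "try UTF-8 first" approach can be sketched as follows. This is an
illustration under assumptions, not man-db's implementation: the
function name is hypothetical and the caller is assumed to know the
usual legacy encoding for the page's language:

```python
def decode_page(source_bytes, legacy_encoding):
    """Decode page source, preferring UTF-8 and falling back to the
    legacy encoding for the language. Returns (text, encoding_used)."""
    try:
        return source_bytes.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return source_bytes.decode(legacy_encoding), legacy_encoding


# The same French word stored in both encodings under /usr/share/man/fr.
as_utf8 = decode_page("café".encode("utf-8"), "iso-8859-1")
as_latin1 = decode_page("café".encode("iso-8859-1"), "iso-8859-1")
```

The heuristic works because non-ASCII text in an 8-bit legacy encoding
is rarely valid UTF-8, although a short legacy sequence can in principle
decode as UTF-8 by accident.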

> Could you send a new patch to document the current recommendations for how
> to encode man pages and deal with different locales when you get a chance?

Unfortunately 2.5.0 wasn't quite enough. Aside from a couple of stupid
bugs (mostly fixed now), it turns out that we need an extra feature to
allow debhelper to produce UTF-8 versions of manual pages without
needing the source encoding to be explicitly specified, by guessing the
encoding in the same way that man does:

  http://lists.debian.org/debian-i18n/2007/10/msg00063.html

I committed this feature to my development trunk earlier today, and will
be working on a 2.5.1 release over the next couple of weeks. After that
I'll send Joey a patch for debhelper.

Thus, an updated transition plan:

  1. Initial status: packages should use only /usr/share/man/<ll>/
     (although some packages have anticipated an approximation of the
     transition plan; we ignore these for the moment as there is little
     point in changing them only to change them back later), and must
     use the legacy encoding for pages installed there.

  2. man-db 2.5.0-1 uploaded, including support for installing pages in
     /usr/share/man/<ll>.<codeset>/ (e.g. /usr/share/man/fr.UTF-8). The
     basename of this directory is not typically a well-formed locale,
     but it allows a clear specification of the hierarchy's encoding
     while applying to all countries using that language. [DONE]

  3. man-db 2.5.1-1 uploaded, including 'man --recode'.

  4. dh_installman updated to recode manual pages to UTF-8 automatically
     (and install them under /usr/share/man/<ll>.UTF-8/?), using 'man
     --recode UTF-8' to guess the original encoding. debhelper Depends:
     man-db (>= 2.5.1-1) for this. Pages for which the DWIM fails can
     include an explicit coding: directive, which will be documented.

  5. man-db 2.5.1-1 and the corresponding debhelper move into testing.

  6. Packages encouraged (via debian-devel-announce) to begin using
     UTF-8 for manual pages (and /usr/share/man/<ll>.UTF-8/?). They do
     not need to declare any package relationship on man-db for this.

  7. Policy updated to recommend UTF-8 once this has been shaken down,
     confirmed to work properly, and deployed through a reasonable chunk
     of the archive thanks to debhelper.

  8. Distant future: deprecate /usr/share/man/<ll>/. This will only be
     for consistency, so there's no need to rush.
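
For illustration, the directory names used in the plan above (<ll>,
<ll>_<CC> and <ll>.<codeset>) can be split apart as in the Python sketch
below. The regular expression is an assumption about what such names
look like, not a pattern taken from man-db, and parse_man_dir is a
hypothetical helper.

```python
import re
from typing import NamedTuple, Optional


class ManDir(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]


# <ll>, optionally followed by _<CC>, optionally followed by .<codeset>
_MAN_DIR = re.compile(
    r"(?P<language>[a-z]{2,3})"
    r"(?:_(?P<territory>[A-Z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?"
)


def parse_man_dir(name: str) -> Optional[ManDir]:
    """Split a /usr/share/man subdirectory name; None if it is not locale-like."""
    match = _MAN_DIR.fullmatch(name)
    if match is None:
        return None
    return ManDir(
        match.group("language"),
        match.group("territory"),
        match.group("codeset"),
    )
```

Section directories such as man1 do not match, so they are reported as
None rather than being mistaken for a language.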

Thanks,

-- 
Colin Watson                                       [cjwatson@debian.org]





Message #95 received at 440420@bugs.debian.org (full text, mbox):

From: Russ Allbery <rra@debian.org>
To: Colin Watson <cjwatson@debian.org>
Cc: 440420@bugs.debian.org, debian-i18n@lists.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Tue, 01 Jan 2008 11:37:30 -0800
Colin Watson <cjwatson@debian.org> writes:

> Right. Here's an update; I think I've captured most of the discussion in
> the thread so far. The following patch could in principle be applied
> now, given seconds. Wordsmithing welcome, as I'm aware that this is a
> rather dense recommendation; I'm also looking for seconds for this
> proposal.

This proposal and patch look good to me, although I'd prefer to see a few
more seconds before I queue it up for applying to Policy 3.7.4.

> I'm still open to whether new-world-order pages should go in
> /usr/share/man/LL.UTF-8 or just /usr/share/man/LL. Pros for LL.UTF-8:
>
>   * Non-compliant implementations (I'm guessing xman, yelp, etc.) will
>     display English manual pages rather than misencoded garbage. This
>     might not be such a big deal for European languages, but for e.g.
>     Japanese I suspect most people would prefer English to the spew you
>     get by trying to interpret UTF-8 as EUC-JP.

I'd rather fix the other implementations, frankly.  All of Debian is
moving towards UTF-8, as is all of the rest of the Linux world, and I'd
rather not leave transitional measures around forever.

>   * Determining progress towards universal UTF-8 encoding can trivially
>     be done by scanning Contents files rather than having to unpack the
>     archive and run iconv over everything.

Yeah, but we already have an unpacked version of the archive available in
the lintian lab, so doing this isn't too bad.

>   * In the event that we later want to migrate to yet another
>     "universal" encoding that can't be automatically distinguished from
>     UTF-8, we already have the encoding name right there and migration
>     will be straightforward. (I think this is an unlikely scenario.)

Yes, this seems extremely unlikely to me.  UTF-8 isn't perfect, but it
seems to have reached the "good enough" level that people will work around
its flaws rather than replace it with something else.

> I think I am increasingly leaning towards just using /usr/share/man/LL,
> seeing as man has to try decoding pages there as UTF-8 first anyway, but
> please comment if you care.

I agree with this position.

> Unfortunately 2.5.0 wasn't quite enough. Aside from a couple of stupid
> bugs (mostly fixed now), it turns out that we need an extra feature to
> allow debhelper to produce UTF-8 versions of manual pages without
> needing the source encoding to be explicitly specified, by guessing the
> encoding in the same way that man does:
>
>   http://lists.debian.org/debian-i18n/2007/10/msg00063.html
>
> I committed this feature to my development trunk earlier today, and will
> be working on a 2.5.1 release over the next couple of weeks. After that
> I'll send Joey a patch for debhelper.

It sounds like the same feature could be used by other man implementations
that currently can't deal with UTF-8.

The transition plan looks good to me.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>





Message #100 received at 440420@bugs.debian.org (full text, mbox):

From: Christian Perrier <bubulle@debian.org>
To: 440420@bugs.debian.org, debian-i18n@lists.debian.org
Cc: cjwatson@debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Wed, 2 Jan 2008 07:32:54 +0100
(Colin CC'ed in case he's not subscribed to the bug or to the
debian-policy package PTS)

Quoting Russ Allbery (rra@debian.org):

> This proposal and patch look good to me, although I'd prefer to see a few
> more seconds before I queue it up for applying to Policy 3.7.4.


I can't really make this very long but this manpages encoding issue is
something that should have been taken care of a long time ago. Colin's
proposal is very complete and I'm entirely confident in his knowledge
of the topic so, even if I can't judge each and every detail in the
proposal, I'm entirely sure this is something we need to have.

In short, seconded warmly.





Message #105 received at 440420@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: Russ Allbery <rra@debian.org>
Cc: 440420@bugs.debian.org, debian-i18n@lists.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Mon, 7 Jan 2008 09:55:23 +0000
On Tue, Jan 01, 2008 at 11:37:30AM -0800, Russ Allbery wrote:
> Colin Watson <cjwatson@debian.org> writes:
> > I'm still open to whether new-world-order pages should go in
> > /usr/share/man/LL.UTF-8 or just /usr/share/man/LL. Pros for LL.UTF-8:
> >
> >   * Non-compliant implementations (I'm guessing xman, yelp, etc.) will
> >     display English manual pages rather than misencoded garbage. This
> >     might not be such a big deal for European languages, but for e.g.
> >     Japanese I suspect most people would prefer English to the spew you
> >     get by trying to interpret UTF-8 as EUC-JP.
> 
> I'd rather fix the other implementations, frankly.  All of Debian is
> moving towards UTF-8, as is all of the rest of the Linux world, and I'd
> rather not leave transitional measures around forever.

Mm.

> > I think I am increasingly leaning towards just using /usr/share/man/LL,
> > seeing as man has to try decoding pages there as UTF-8 first anyway, but
> > please comment if you care.
> 
> I agree with this position.

OK. I've changed man-db upstream to install its own translated manual
pages in non-.UTF-8 directories again, and will incorporate this option
into the transition plan when it comes time to send it to
-devel-announce.

> > Unfortunately 2.5.0 wasn't quite enough. Aside from a couple of stupid
> > bugs (mostly fixed now), it turns out that we need an extra feature to
> > allow debhelper to produce UTF-8 versions of manual pages without
> > needing the source encoding to be explicitly specified, by guessing the
> > encoding in the same way that man does:
> >
> >   http://lists.debian.org/debian-i18n/2007/10/msg00063.html
> >
> > I committed this feature to my development trunk earlier today, and will
> > be working on a 2.5.1 release over the next couple of weeks. After that
> > I'll send Joey a patch for debhelper.
> 
> It sounds like the same feature could be used by other man implementations
> that currently can't deal with UTF-8.

Yes; doing so would also fix their misfeatures of attempting to locate
manual pages on disk themselves rather than letting man do it. :-)

Cheers,

-- 
Colin Watson                                       [cjwatson@debian.org]





Message #110 received at 440420@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: Russ Allbery <rra@debian.org>
Cc: 440420@bugs.debian.org, debian-i18n@lists.debian.org, Christian Perrier <bubulle@debian.org>
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Mon, 28 Jan 2008 12:29:35 +0000
On Mon, Dec 31, 2007 at 02:37:48PM +0000, Colin Watson wrote:
> On Sun, Dec 30, 2007 at 10:28:12PM -0800, Russ Allbery wrote:
> > Colin Watson <cjwatson@debian.org> writes:
> > > I propose that policy should standardise that we move to using UTF-8 as
> > > the source encoding for all manual pages since it clearly makes sense to
> > > do so.
[...]
> Right. Here's an update; I think I've captured most of the discussion in
> the thread so far. The following patch could in principle be applied
> now, given seconds. Wordsmithing welcome, as I'm aware that this is a
> rather dense recommendation; I'm also looking for seconds for this
> proposal.

Christian Perrier seconded this here:

  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=440420#100

However, since later discussion indicated that we should drop the .UTF-8
business, I think we can also drop it from the policy proposal. (Manual
pages still shouldn't lie about their encoding if they install files
there, but since this will not be the recommended default there is no
reason to bloat policy with it.)

Here's another updated version. Christian, are you still OK with this?
I'm also looking for at least one more second for this proposal.

--- orig/policy.sgml
+++ mod/policy.sgml
@@ -8521,6 +8521,37 @@
 	      be present in the future.
  	  </footnote>
  	</p>
+
+	<p>
+	  Manual pages in locale-specific subdirectories of
+	  <file>/usr/share/man</file> should use either UTF-8 or the usual
+	  legacy encoding for that language (normally the one corresponding
+	  to the shortest relevant locale name in
+	  <file>/usr/share/i18n/SUPPORTED</file>). For example, pages under
+	  <file>/usr/share/man/fr</file> should use either UTF-8 or
+	  ISO-8859-1.<footnote><prgn>man</prgn> will automatically detect
+	  whether UTF-8 is in use. In future, all manual pages will be
+	  required to use UTF-8.</footnote>
+	</p>
+
+	<p>
+	  A country name (e.g. <file>de_DE</file>) should not be included in
+	  the subdirectory name unless it indicates a significant difference
+	  in the language, as this excludes speakers of the language in
+	  other countries.<footnote>At the time of writing, Chinese and
+	  Portuguese are the main languages with such differences, so
+	  <file>pt_BR</file>, <file>zh_CN</file>, and <file>zh_TW</file> are
+	  all allowed.</footnote>
+	</p>
+
+	<p>
+	  Due to limitations in current implementations, all characters
+	  in the manual page source should be representable in the usual
+	  legacy encoding for that language, even if the file is
+	  actually encoded in UTF-8. Safe alternative ways to write many
+	  characters outside that range may be found in
+	  <manref name="groff_char" section="7">.
+	</p>
       </sect>
 
       <sect>
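
The third paragraph of the patch can be checked mechanically. The Python
sketch below lists the characters in a page's source that the legacy
encoding cannot represent. The helper name is hypothetical, and the
codec name passed in is whatever Python calls the legacy encoding for
the language (an assumption, since Python codec names need not match the
names in /usr/share/i18n/SUPPORTED).

```python
from typing import List


def unrepresentable_chars(source: str, legacy: str) -> List[str]:
    """Return the distinct characters of source that legacy cannot encode."""
    bad: List[str] = []
    for ch in source:
        try:
            ch.encode(legacy)
        except UnicodeEncodeError:
            # Report each offending character once, in order of appearance.
            if ch not in bad:
                bad.append(ch)
    return bad
```

Any character it reports is a candidate for one of the escapes listed in
groff_char(7), for example \[oe] in place of a literal oe ligature.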

> Thus, an updated transition plan:
> 
>   1. Initial status: packages should use only /usr/share/man/<ll>/
>      (although some packages have anticipated an approximation of the
>      transition plan; we ignore these for the moment as there is little
>      point in changing them only to change them back later), and must
>      use the legacy encoding for pages installed there.
> 
>   2. man-db 2.5.0-1 uploaded, including support for installing pages in
>      /usr/share/man/<ll>.<codeset>/ (e.g. /usr/share/man/fr.UTF-8). The
>      basename of this directory is not typically a well-formed locale,
>      but it allows a clear specification of the hierarchy's encoding
>      while applying to all countries using that language. [DONE]
> 
>   3. man-db 2.5.1-1 uploaded, including 'man --recode'.

We are now here; I uploaded man-db 2.5.1-1 this morning.

>   4. dh_installman updated to recode manual pages to UTF-8 automatically
>      (and install them under /usr/share/man/<ll>.UTF-8/?), using 'man
>      --recode UTF-8' to guess the original encoding. debhelper Depends:
>      man-db (>= 2.5.1-1) for this. Pages for which the DWIM fails can
>      include an explicit coding: directive, which will be documented.

Bug filed:

  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=462937

>   8. Distant future: deprecate /usr/share/man/<ll>/. This will only be
>      for consistency, so there's no need to rush.

This step is deleted.

Thanks,

-- 
Colin Watson                                       [cjwatson@debian.org]





Message #115 received at 440420@bugs.debian.org (full text, mbox):

From: Guillem Jover <guillem@debian.org>
To: Colin Watson <cjwatson@debian.org>, 440420@bugs.debian.org
Cc: Russ Allbery <rra@debian.org>, debian-i18n@lists.debian.org, Christian Perrier <bubulle@debian.org>
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Tue, 29 Jan 2008 02:03:08 +0200
Hi,

On Mon, 2008-01-28 at 12:29:35 +0000, Colin Watson wrote:
> On Mon, Dec 31, 2007 at 02:37:48PM +0000, Colin Watson wrote:
> > On Sun, Dec 30, 2007 at 10:28:12PM -0800, Russ Allbery wrote:
> > > Colin Watson <cjwatson@debian.org> writes:
> > > > I propose that policy should standardise that we move to using UTF-8 as
> > > > the source encoding for all manual pages since it clearly makes sense to
> > > > do so.
> [...]
> > Right. Here's an update; I think I've captured most of the discussion in
> > the thread so far. The following patch could in principle be applied
> > now, given seconds. Wordsmithing welcome, as I'm aware that this is a
> > rather dense recommendation; I'm also looking for seconds for this
> > proposal.

> I'm also looking for at least one more second for this proposal.
> 
> --- orig/policy.sgml
> +++ mod/policy.sgml
> @@ -8521,6 +8521,37 @@
>  	      be present in the future.
>   	  </footnote>
>   	</p>
> +
> +	<p>
> +	  Manual pages in locale-specific subdirectories of
> +	  <file>/usr/share/man</file> should use either UTF-8 or the usual
> +	  legacy encoding for that language (normally the one corresponding
> +	  to the shortest relevant locale name in
> +	  <file>/usr/share/i18n/SUPPORTED</file>). For example, pages under
> +	  <file>/usr/share/man/fr</file> should use either UTF-8 or
> +	  ISO-8859-1.<footnote><prgn>man</prgn> will automatically detect
> +	  whether UTF-8 is in use. In future, all manual pages will be
> +	  required to use UTF-8.</footnote>
> +	</p>
> +
> +	<p>
> +	  A country name (e.g. <file>de_DE</file>) should not be included in
> +	  the subdirectory name unless it indicates a significant difference
> +	  in the language, as this excludes speakers of the language in
> +	  other countries.<footnote>At the time of writing, Chinese and
> +	  Portuguese are the main languages with such differences, so
> +	  <file>pt_BR</file>, <file>zh_CN</file>, and <file>zh_TW</file> are
> +	  all allowed.</footnote>
> +	</p>
> +
> +	<p>
> +	  Due to limitations in current implementations, all characters
> +	  in the manual page source should be representable in the usual
> +	  legacy encoding for that language, even if the file is
> +	  actually encoded in UTF-8. Safe alternative ways to write many
> +	  characters outside that range may be found in
> +	  <manref name="groff_char" section="7">.
> +	</p>
>        </sect>

Seconded.

regards,
guillem


Message #120 received at 440420@bugs.debian.org (full text, mbox):

From: Christian Perrier <bubulle@kheops.frmug.org>
To: Colin Watson <cjwatson@debian.org>, Russ Allbery <rra@debian.org>, 440420@bugs.debian.org, debian-i18n@lists.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Sun, 10 Feb 2008 07:09:25 +0100
(resending that mail that apparently never made it to its recipients)

Quoting Colin Watson (cjwatson@debian.org):

> Christian Perrier seconded this here:
> 
>   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=440420#100
> 
> However, since later discussion indicated that we should drop the .UTF-8
> business, I think we can also drop it from the policy proposal. (Manual
> pages still shouldn't lie about their encoding if they install files
> there, but since this will not be the recommended default there is no
> reason to bloat policy with it.)
> 
> Here's another updated version. Christian, are you still OK with this?
> I'm also looking for at least one more second for this proposal.


Yes, I'm still OK. But as I said, not because I have that much
authority on that matter but more because I trust you to do what's
right. :)

It seems that very few seconds came in for that proposal. I don't think
this means it is a bad suggestion; more likely, most people were
intimidated by the level of detail of the rationale.

That does not make the proposal bad, and I really warmly suggest we
adopt it.







Message #125 received at 440420@bugs.debian.org (full text, mbox):

From: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
To: Colin Watson <cjwatson@debian.org>, 440420@bugs.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Sun, 10 Feb 2008 19:58:46 +0100
On Mon, Jan 28, 2008 at 12:29:35PM +0000, Colin Watson wrote:
> On Mon, Dec 31, 2007 at 02:37:48PM +0000, Colin Watson wrote:
> > On Sun, Dec 30, 2007 at 10:28:12PM -0800, Russ Allbery wrote:
> > > Colin Watson <cjwatson@debian.org> writes:
> > > > I propose that policy should standardise that we move to using UTF-8 as
> > > > the source encoding for all manual pages since it clearly makes sense to
> > > > do so.
> [...]
> > Right. Here's an update; I think I've captured most of the discussion in
> > the thread so far. The following patch could in principle be applied
> > now, given seconds. Wordsmithing welcome, as I'm aware that this is a
> > rather dense recommendation; I'm also looking for seconds for this
> > proposal.
> 
> Christian Perrier seconded this here:
> 
>   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=440420#100
> 
> However, since later discussion indicated that we should drop the .UTF-8
> business, I think we can also drop it from the policy proposal. (Manual
> pages still shouldn't lie about their encoding if they install files
> there, but since this will not be the recommended default there is no
> reason to bloat policy with it.)
> 
> Here's another updated version. Christian, are you still OK with this?
> I'm also looking for at least one more second for this proposal.
> 
> --- orig/policy.sgml
> +++ mod/policy.sgml
> @@ -8521,6 +8521,37 @@
>  	      be present in the future.
>   	  </footnote>
>   	</p>
> +
> +	<p>
> +	  Manual pages in locale-specific subdirectories of
> +	  <file>/usr/share/man</file> should use either UTF-8 or the usual
> +	  legacy encoding for that language (normally the one corresponding
> +	  to the shortest relevant locale name in
> +	  <file>/usr/share/i18n/SUPPORTED</file>). For example, pages under
> +	  <file>/usr/share/man/fr</file> should use either UTF-8 or
> +	  ISO-8859-1.<footnote><prgn>man</prgn> will automatically detect
> +	  whether UTF-8 is in use. In future, all manual pages will be
> +	  required to use UTF-8.</footnote>
> +	</p>
> +
> +	<p>
> +	  A country name (e.g. <file>de_DE</file>) should not be included in
> +	  the subdirectory name unless it indicates a significant difference
> +	  in the language, as this excludes speakers of the language in
> +	  other countries.<footnote>At the time of writing, Chinese and
> +	  Portuguese are the main languages with such differences, so
> +	  <file>pt_BR</file>, <file>zh_CN</file>, and <file>zh_TW</file> are
> +	  all allowed.</footnote>
> +	</p>
> +
> +	<p>
> +	  Due to limitations in current implementations, all characters
> +	  in the manual page source should be representable in the usual
> +	  legacy encoding for that language, even if the file is
> +	  actually encoded in UTF-8. Safe alternative ways to write many
> +	  characters outside that range may be found in
> +	  <manref name="groff_char" section="7">.
> +	</p>
>        </sect>
>  
>        <sect>

Seconded.

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 


Message #130 received at 440420@bugs.debian.org (full text, mbox):

From: Colin Watson <cjwatson@debian.org>
To: 440420@bugs.debian.org
Cc: debian-i18n@lists.debian.org, control@bugs.debian.org
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Mon, 11 Feb 2008 10:19:49 +0000
retitle 440420 [AMENDMENT 11/02/2008] Manual page encoding
severity 440420 normal
thanks

On Mon, Jan 28, 2008 at 12:29:35PM +0000, Colin Watson wrote:
> --- orig/policy.sgml
> +++ mod/policy.sgml
> @@ -8521,6 +8521,37 @@
>  	      be present in the future.
>   	  </footnote>
>   	</p>
> +
> +	<p>
> +	  Manual pages in locale-specific subdirectories of
> +	  <file>/usr/share/man</file> should use either UTF-8 or the usual
> +	  legacy encoding for that language (normally the one corresponding
> +	  to the shortest relevant locale name in
> +	  <file>/usr/share/i18n/SUPPORTED</file>). For example, pages under
> +	  <file>/usr/share/man/fr</file> should use either UTF-8 or
> +	  ISO-8859-1.<footnote><prgn>man</prgn> will automatically detect
> +	  whether UTF-8 is in use. In future, all manual pages will be
> +	  required to use UTF-8.</footnote>
> +	</p>
> +
> +	<p>
> +	  A country name (e.g. <file>de_DE</file>) should not be included in
> +	  the subdirectory name unless it indicates a significant difference
> +	  in the language, as this excludes speakers of the language in
> +	  other countries.<footnote>At the time of writing, Chinese and
> +	  Portuguese are the main languages with such differences, so
> +	  <file>pt_BR</file>, <file>zh_CN</file>, and <file>zh_TW</file> are
> +	  all allowed.</footnote>
> +	</p>
> +
> +	<p>
> +	  Due to limitations in current implementations, all characters
> +	  in the manual page source should be representable in the usual
> +	  legacy encoding for that language, even if the file is
> +	  actually encoded in UTF-8. Safe alternative ways to write many
> +	  characters outside that range may be found in
> +	  <manref name="groff_char" section="7">.
> +	</p>
>        </sect>
>  
>        <sect>

This has now acquired seconds from Guillem Jover, Christian Perrier, and
Bill Allombert, so I'm raising it to the status of a formal amendment.

-- 
Colin Watson                                       [cjwatson@debian.org]

Changed Bug title to `[AMENDMENT 11/02/2008] Manual page encoding' from `[PROPOSAL] Manual page encoding'. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Mon, 11 Feb 2008 10:21:10 GMT)

Severity set to `normal' from `wishlist'. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Mon, 11 Feb 2008 10:21:11 GMT)


Message #139 received at 440420@bugs.debian.org (full text, mbox):

From: Russ Allbery <rra@debian.org>
To: Colin Watson <cjwatson@debian.org>
Cc: 440420@bugs.debian.org, debian-i18n@lists.debian.org, Christian Perrier <bubulle@debian.org>
Subject: Re: Bug#440420: [PROPOSAL] Manual page encoding
Date: Tue, 04 Mar 2008 18:39:20 -0800
Colin Watson <cjwatson@debian.org> writes:

> Christian Perrier seconded this here:
>
>   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=440420#100
>
> However, since later discussion indicated that we should drop the .UTF-8
> business, I think we can also drop it from the policy proposal. (Manual
> pages still shouldn't lie about their encoding if they install files
> there, but since this will not be the recommended default there is no
> reason to bloat policy with it.)
>
> Here's another updated version. Christian, are you still OK with this?
> I'm also looking for at least one more second for this proposal.

I have applied this patch to my Policy arch repository.  Sorry about the
delay.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>




Tags added: pending. Request was from Russ Allbery <rra@debian.org> to control@bugs.debian.org. (Wed, 05 Mar 2008 03:06:01 GMT)

Reply sent to Russ Allbery <rra@debian.org>:
You have taken responsibility.

Notification sent to Colin Watson <cjwatson@debian.org>:
Bug acknowledged by developer.

Message #146 received at 440420-close@bugs.debian.org (full text, mbox):

From: Russ Allbery <rra@debian.org>
To: 440420-close@bugs.debian.org
Subject: Bug#440420: fixed in debian-policy 3.8.0.0
Date: Wed, 04 Jun 2008 23:32:03 +0000
Source: debian-policy
Source-Version: 3.8.0.0

We believe that the bug you reported is fixed in the latest version of
debian-policy, which is due to be installed in the Debian FTP archive:

debian-policy_3.8.0.0.dsc
  to pool/main/d/debian-policy/debian-policy_3.8.0.0.dsc
debian-policy_3.8.0.0.tar.gz
  to pool/main/d/debian-policy/debian-policy_3.8.0.0.tar.gz
debian-policy_3.8.0.0_all.deb
  to pool/main/d/debian-policy/debian-policy_3.8.0.0_all.deb



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 440420@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Russ Allbery <rra@debian.org> (supplier of updated debian-policy package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.8
Date: Wed, 04 Jun 2008 15:53:27 -0700
Source: debian-policy
Binary: debian-policy
Architecture: source all
Version: 3.8.0.0
Distribution: unstable
Urgency: low
Maintainer: Debian Policy List <debian-policy@lists.debian.org>
Changed-By: Russ Allbery <rra@debian.org>
Description: 
 debian-policy - Debian Policy Manual and related documents
Closes: 65577 186700 209008 250202 291460 367984 379150 392362 403391 422552 430649 431813 440420 442070 452105 455602 458910 473761 475731 480551 481640 481954
Changes: 
 debian-policy (3.8.0.0) unstable; urgency=low
 .
   * Bug fix: "[PROPOSAL] "debian/README.source" file for packages with
     non-trivial source", thanks to Wouter Verhelst, Jörg Sommer, Colin Watson,
     and Junichi Uekawa                                       (Closes: #250202).
   * Bug fix: "[AMENDMENT 11/02/2008] Manual page encoding", thanks to
     Colin Watson                                             (Closes: #440420).
   * Bug fix: "[PROPOSAL] common interface for parallel building in
     DEB_BUILD_OPTIONS", thanks to Loïc Minier, Peter Samuelson, and Robert
     Millan                                                   (Closes: #209008).
   * Bug fix: "Please clarify splitting/syntax of DEB_BUILD_OPTIONS", thanks to
     Loïc Minier, Peter Samuelson, Robert Millan, and Guillem Jover
                                                              (Closes: #430649).
   * Bug fix: "Documentation for Breaks in dpkg", thanks to Ian Jackson
                                                              (Closes: #379150).
   * Bug fix: "support for wrapped Uploaders should now be mandatory"
                                                              (Closes: #431813).
   * Bug fix: "[PROPOSAL] Add should not embed code from other packages",
     thanks to Neil McGovern, Colin Watson, Bill Allombert, Steve Langasek,
     Kurt Roeckx, and others                                  (Closes: #392362).
   * Bug fix: "Homepage field in debian/control undocumented", thanks to
     Mario Iseli                                              (Closes: #452105).
   * Bug fix: "Policy inconsistent with reality: base subsection no longer
     used", thanks to Magnus Holmgren, Bernd Zeimetz, and Colin Watson
                                                              (Closes: #442070).
   * Bug fix: "Inclusion of Apache Software License versions in
     /usr/share/common-licenses", thanks to Barry Hawkins     (Closes: #291460).
   * Bug fix: "[Amended] copyright should include notice if a package is
     not a part of Debian distribution", thanks to Taketoshi Sano
                                                              (Closes: #65577).
   * Bug fix: "scripts as configuration files: should vs. must", thanks to Frank
     Küster                                                   (Closes: #403391).
   * Bug fix: "debconf specification should allow underscores in template
     names", thanks to Colin Watson                           (Closes: #473761).
   * Bug fix: "clarify handling of run-time and compile-time support programs",
     thanks to Goswin Brederlow and Raphael Hertzog           (Closes: #367984).
   * Policy: better document version ranking and empty Debian revisions
     Wording: Russ Allbery <rra@debian.org>
     Seconded: Raphaël Hertzog <hertzog@debian.org>
     Seconded: Manoj Srivastava <srivasta@debian.org>
     Seconded: Guillem Jover <guillem@debian.org>
     Closes: #186700, #458910
   * Policy: remove obsolete app-defaults and Xresources provisions
     Wording: Julien Cristau <jcristau@debian.org>
     Seconded: Russ Allbery <rra@debian.org>
     Closes: #480551
   * Bug fix: "Examples of dpkg frontends should mention apt now", thanks
     to Josh Triplett                                         (Closes: #455602).
   * Bug fix: "Minor typos and wording suggestions", thanks to Michael
     Tautschnig                                               (Closes: #422552).
   * Bug fix: "substvar reference moved from dpkg-source(1) to
     deb-substvars(5)", thanks to Ian Beckwith                (Closes: #475731).
   * Policy: bugs fixed in NMUs are now closed rather than marked fixed
     Wording: Russ Allbery <rra@debian.org> (thanks, Sandro Tosi)
     Closes: #481640
   * Policy: C.1.4, C.1.8: minor typos
     Wording: Sandro Tosi <matrixhasu@gmail.com>
     Closes: #481954
   * Remove the now-obsolete policy-process document.
   * Add an md5sums control file.
   * Add Vcs-Browser and Vcs-Git control fields.
   * Remove build system support for FHS 2.1 and FSSTND, mostly commented out.
   * Remove more temporary files created by the build.
   * Remove the FSSTND license from debian/copyright; no FSSTND files are
     currently part of policy.
   * Update FHS copyright dates in debian/copyright.
   * Standardize the spacing around headings in upgrading-checklist.html.
   * Remove old ChangeLog files and metadata headers in maintainer scripts
     and debian/rules.
Checksums-Sha1: 
 f42b9921908670eb41c04940875084bc07750592 1095 debian-policy_3.8.0.0.dsc
 3eda45d7ca5563bab8bfda93286137071979385c 638655 debian-policy_3.8.0.0.tar.gz
 73680c98bc62507858aa055bcf1f1688a812f5ba 1588552 debian-policy_3.8.0.0_all.deb
Checksums-Sha256: 
 507a048bc7c84039910843e284d8e0e305778224346fd981c6f749176cc79220 1095 debian-policy_3.8.0.0.dsc
 8321b1dddd3ddd55a09539c842084ea05a731265c4c5847997957a552ba1aaa4 638655 debian-policy_3.8.0.0.tar.gz
 6c2083f50ccaa5a2f2d7a89febd320cf3a862b3204157324ffd9b363daac3e58 1588552 debian-policy_3.8.0.0_all.deb
Files: 
 37ff33fb3ccebc4f87e23fd7b91e7859 1095 doc optional debian-policy_3.8.0.0.dsc
 2565d6eaceac0aa2d093538048c1b8ed 638655 doc optional debian-policy_3.8.0.0.tar.gz
 3b153faeec899cdf1199d4d46c5d8859 1588552 doc optional debian-policy_3.8.0.0_all.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFIRyNB+YXjQAr8dHYRAt4NAKDbO1f3BlmKT5SgMVf4AHE2Z7bPTgCffcnI
Kwa3jEGgq+PV6dwiurjmSAc=
=wCDz
-----END PGP SIGNATURE-----





Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Fri, 04 Jul 2008 07:29:28 GMT)

Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Fri Apr 18 17:01:50 2014; Machine Name: beach.debian.org