Debian Bug report logs - #701081
debian-policy: mandate an encoding for filenames in binary packages

Package: debian-policy; Maintainer for debian-policy is Debian Policy List <debian-policy@lists.debian.org>; Source for debian-policy is src:debian-policy.

Reported by: Helmut Grohne <helmut@subdivi.de>

Date: Thu, 21 Feb 2013 11:45:02 UTC

Severity: wishlist

Tags: patch

Fixed in version debian-policy/3.9.5.0

Done: Charles Plessy <plessy@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Thu, 21 Feb 2013 11:45:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Helmut Grohne <helmut@subdivi.de>:
New Bug report received and forwarded. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Thu, 21 Feb 2013 11:45:04 GMT) Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: Helmut Grohne <helmut@subdivi.de>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: debian-policy: mandate an encoding for filenames in binary packages
Date: Thu, 21 Feb 2013 12:43:28 +0100
Package: debian-policy
Severity: wishlist

Apparently the debian-policy currently says nothing about the characters
used in filenames contained in binary packages. Most packages use common
sense and only use a small subset of US-ASCII. In Debian sid main most
filenames can be represented using the following subset of US-ASCII
characters (written as a regular expression):

	[][a-zA-Z0-9{}<>() ^/,=:&!*%#$~@+._-]

The number of exceptions is about 200 contained in about 50 binary
packages. In those packages some filenames are not representable as
UTF-8 (for example aspell-is) and others don't make any sense in
ISO-8859-15 (for example ca-certificates).
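
A minimal sketch of how a name could be tested against that character
class, assuming Python's `re` module and filenames held as raw bytes. The
function name `bytes_outside_subset` is hypothetical, and the brackets are
escaped only to suit Python's regex syntax:

```python
import re

# The character class from the report; "]" and "[" are escaped for Python.
ALLOWED = re.compile(rb"[\]\[a-zA-Z0-9{}<>() ^/,=:&!*%#$~@+._-]")


def bytes_outside_subset(name: bytes) -> bytes:
    """Return the bytes of `name` that the character class does not match."""
    return ALLOWED.sub(b"", name)
```

An empty result means the name stays within the subset; anything else lists
the offending bytes in order of appearance.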

It would be nice if some common ground concerning filename encoding
could be reached. The options range from a rather restrictive definition
of acceptable characters via requiring filenames to be representable in
US-ASCII to mandating a particular encoding (such as UTF-8). This could
be first introduced as a SHOULD and later turned into a MUST.

Personally I do not really care about what the precise restriction is as
long as it permits a mechanical transformation to Unicode.

Helmut



Message #10 received at 701081@bugs.debian.org (full text, mbox):

From: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
To: Helmut Grohne <helmut@subdivi.de>, Debian Bug Tracking System <701081@bugs.debian.org>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Thu, 21 Feb 2013 15:48:15 +0100
On Thu, Feb 21, 2013 at 12:43:28PM +0100, Helmut Grohne wrote:
> Package: debian-policy
> Severity: wishlist
> 
> Apparently the debian-policy currently says nothing about the characters
> used in filenames contained in binary packages. Most packages use common
> sense and only use a small subset of US-ASCII. In Debian sid main most
> filenames can be represented using the following subset of US-ASCII
> characters (written as a regular expression):
> 
> 	[][a-zA-Z0-9{}<>() ^/,=:&!*%#$~@+._-]
> 
> The number of exceptions is about 200 contained in about 50 binary
> packages. In those packages some filenames are not representable as
> UTF-8 (for example aspell-is) and others don't make any sense in
> ISO-8859-15 (for example ca-certificates).
> 
> It would be nice if some common ground concerning filename encoding
> could be reached. The options range from a rather restrictive definition
> of acceptable characters via requiring filenames to be representable in
> US-ASCII to mandating a particular encoding (such as UTF-8). This could
> be first introduced as a SHOULD and later turned into a MUST.
> 
> Personally I do not really care about what the precise restriction is as
> long as it permits a mechanical transformation to unicode.

I raised a similar issue in 
http://lists.debian.org/debian-policy/2011/03/msg00212.html
In most cases, 8-bit characters in filenames are bugs.

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 



Message #15 received at 701081@bugs.debian.org (full text, mbox):

From: Charles Plessy <plessy@debian.org>
To: 701081@bugs.debian.org, Helmut Grohne <helmut@subdivi.de>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 23 Feb 2013 13:31:32 +0900
Le Thu, Feb 21, 2013 at 03:48:15PM +0100, Bill Allombert a écrit :
> On Thu, Feb 21, 2013 at 12:43:28PM +0100, Helmut Grohne wrote:
> > 
> > It would be nice if some common ground concerning filename encoding
> > could be reached. The options range from a rather restrictive definition
> > of acceptable characters via requiring filenames to be representable in
> > US-ASCII to mandating a particular encoding (such as UTF-8). This could
> > be first introduced as a SHOULD and later turned into a MUST.
> > 
> > Personally I do not really care about what the precise restriction is as
> > long as it permits a mechanical transformation to unicode.
> 
> I raised a similar issue in 
> http://lists.debian.org/debian-policy/2011/03/msg00212.html
> In most cases, 8-bit characters in filenames are bugs.

Hello everybody,

quick notes in random order:

 - There are here and there discussions raising possible corner cases
   where distributing files with a name not representable in UTF-8 might
   be justified, for instance in test suites.

 - Fedora's policy is: "filenames that contain non-ASCII characters must be
   encoded as UTF-8. Since there's no way to note which encoding the filename
   is in, using the same encoding for all filenames is the best way to ensure
   users can read the filenames properly. If upstream ships filenames that are
   not encoded in UTF-8 you can use a utility like convmv (from the convmv
   package) to convert the filename in your %install section."

 - POSIX.1-2008, section 3.276 (Portable Filename Character Set), mentions:

   The set of characters from which portable filenames are constructed.
   
   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
   a b c d e f g h i j k l m n o p q r s t u v w x y z
   0 1 2 3 4 5 6 7 8 9 . _ -
   
   The last three characters are the <period>, <underscore>, and <hyphen>
   characters, respectively.
   
   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_276

 - Similar discussion also took place in #99933.  I wonder about merging this
   bug (#701081) and #99933.

 - Is there anybody following the preparation of the FHS 3.0 or the LSB, who
   could tell us if a broader guideline on name encoding for files distributed
   in core directories is under discussion there ?
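
The portable filename character set quoted above is small enough to check
directly. A rough sketch, assuming names are already decoded to text and
using hypothetical helper names:

```python
import string

# Letters, digits, period, underscore and hyphen, as listed by POSIX.
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")


def is_portable_filename(component: str) -> bool:
    """True if a single, non-empty path component uses only portable characters."""
    return bool(component) and all(ch in PORTABLE_CHARS for ch in component)


def is_portable_path(path: str) -> bool:
    """True if every non-empty component of a slash-separated path is portable."""
    return all(is_portable_filename(part) for part in path.split("/") if part)
```

This set is much stricter than the UTF-8 rule from Fedora: a name containing
a space or an accented letter is rejected.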

Altogether, I think that it would be useful to have a policy on filename encoding.

Have a nice week-end,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan



Message #20 received at 701081@bugs.debian.org (full text, mbox):

From: Helmut Grohne <helmut@subdivi.de>
To: Charles Plessy <plessy@debian.org>
Cc: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 23 Feb 2013 08:02:10 +0100
Thanks for your comments.

On Sat, Feb 23, 2013 at 01:31:32PM +0900, Charles Plessy wrote:
>  - There are here and there discussions raising possible corner cases
>    where distributing files with a name not representable in UTF-8 might
>    be justified, for instance in test suites.

Even though the general argument is correct, the particular example
probably applies to source packages in most cases. We don't control
source packages (unless we repack them), so I think they should not be
covered by a filename encoding policy.

>  - Similar discussion also took place in #99933.  I wonder about merging this
>    bug (#701081) and #99933.

I stumbled upon this bug before reporting this one and decided that the
issues were sufficiently separate from each other to warrant a new bug
number. I did not read the full bug log and therefore did not discover
that its scope widened to filenames as well. The discussion found
therein clearly is valuable. I still think that separating bugs for
filename encoding and file content encoding is a good idea, because
those issues can be solved independently. That said merging also makes
sense to point to the rest of the discussion. In the latter case, please
select a better summary message.

I have to admit that I am slightly in favour of just copying Fedora's
approach. Making distributions more compatible with each other seems
like a worthwhile thing to do.
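
Fedora's text points to convmv for renaming files whose names are in
another encoding. The underlying re-encoding step might look like the sketch
below. It assumes the source encoding is known in advance (it cannot be
detected reliably from the bytes alone) and uses a hypothetical helper name:

```python
def reencode_name(name: bytes, source_encoding: str) -> bytes:
    """Re-encode a raw filename from a known legacy encoding to UTF-8."""
    return name.decode(source_encoding).encode("utf-8")
```

For instance, `reencode_name(b"catal\xe0", "iso-8859-1")` yields the UTF-8
bytes for "català". An actual rename on disk would still have to handle
collisions with existing names, which this sketch leaves out.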

Helmut



Message #25 received at 701081@bugs.debian.org (full text, mbox):

From: Charles Plessy <plessy@debian.org>
To: Helmut Grohne <helmut@subdivi.de>, 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 24 Feb 2013 11:54:01 +0900
Le Sat, Feb 23, 2013 at 08:02:10AM +0100, Helmut Grohne a écrit :
> 
> I have to admit, that I am slightly in favour of just copying Fedora's
> approach. Making distributions more compatible with each other seems
> like a worthwhile thing to do.

This could be done by an addition like the following, after section 10.9
(Permissions and owners).  The wording is still a bit clumsy.  Also, I am not
sure whether "installed" includes files created by maintainer scripts (which
would be the intent here).  I named the section "File names", and not "File name
character set", in case we add other restrictions (such as length) in the
future.

+      <sec id="filenames">
+       <heading>File names</heading>
+
+       <p>
+         The name of the files installed by binary packages must be encoded in
+         UTF-8 and should be restricted to ASCII unless there is a justified
+         need for using other characters.
+       </p>
+      </sec>
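
Read literally, the proposed wording gives two levels: a "must" for UTF-8
and a "should" for ASCII. A sketch of a checker along those lines follows.
The function name and message texts are hypothetical, and it makes no claim
to match any existing tool:

```python
def check_name(name: bytes) -> list:
    """Return a list of findings for one raw filename (empty if none)."""
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        # Violates the "must": the name is not valid UTF-8 at all.
        return ["error: name is not valid UTF-8"]
    if not text.isascii():
        # Touches the "should": allowed only with a justified need.
        return ["warning: name contains non-ASCII characters"]
    return []
```

Whether a non-ASCII name has a "justified need" is a human judgment, so the
sketch can only flag such names, not decide on them.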

Some packages do not comply with the above.  Given the pace of Policy
releases, I am not sure that it is worth having first a should and then
a must, provided that you or somebody else has the time to tackle the issue
after the Wheezy release.

By the way, how about directories ?

Have a nice week-end,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan



Message #30 received at 701081@bugs.debian.org (full text, mbox):

From: Charles Plessy <plessy@debian.org>
To: 701081@bugs.debian.org
Cc: Helmut Grohne <helmut@subdivi.de>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 2 Mar 2013 21:09:05 +0900
Le Sun, Feb 24, 2013 at 11:54:01AM +0900, Charles Plessy a écrit :
> Le Sat, Feb 23, 2013 at 08:02:10AM +0100, Helmut Grohne a écrit :
> > 
> > I have to admit, that I am slightly in favour of just copying Fedora's
> > approach. Making distributions more compatible with each other seems
> > like a worthwhile thing to do.
 
> This could be done by an addition like the following, after section 10.9
> 
> +      <sec id="filenames">
> +       <heading>File names</heading>
> +
> +       <p>
> +         The name of the files installed by binary packages must be encoded in
> +         UTF-8 and should be restricted to ASCII unless there is a justified
> +         need for using other characters.
> +       </p>
> +      </sec>
 
> By the way, how about directories ?

Related to this, I just found the following in /usr/share/doc/dpkg-dev/triggers.txt.gz.

    Because of the restriction on trigger names, it is not possible to
    declare a file trigger for a directory whose name contains whitespace,
    i18n characters, etc.  Such a trigger should not be necessary.

Cheers,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan



Message #35 received at 701081@bugs.debian.org (full text, mbox):

From: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
To: Helmut Grohne <helmut@subdivi.de>, Charles Plessy <plessy@debian.org>, 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 2 Mar 2013 13:24:55 +0100
On Sat, Feb 23, 2013 at 08:02:10AM +0100, Helmut Grohne wrote:
> On Sat, Feb 23, 2013 at 01:31:32PM +0900, Charles Plessy wrote:
> >  - There are here and there discussions raising possible corner cases
> >    where distributing files with a name not representable in UTF-8 might
> >    be justified, for instance in test suites.
> 
> Even though the general argument is correct, the particular example
> probably applies to source packages in most cases. We don't control
> source packages (unless we repack them), so I think they should not be
> covered by a filename encoding policy.

Agreed.

> >  - Similar discussion also took place in #99933.  I wonder about merging this
> >    bug (#701081) and #99933.
> 
> I stumbled upon this bug before reporting this one and decided that the
> issues were sufficiently separate from each other to warrant a new bug
> number. I did not read the full bug log and therefore did not discover
> that its scope widened to filenames as well. The discussion found
> therein clearly is valuable. I still think that separating bugs for
> filename encoding and file content encoding is a good idea, because
> those issues can be solved independently. That said merging also makes
> sense to point to the rest of the discussion. In the latter case, please
> select a better summary message.
> 
> I have to admit, that I am slightly in favour of just copying Fedora's
> approach. Making distributions more compatible with each other seems
> like a worthwhile thing to do.

I would like to see examples of UTF-8 filenames in source packages that are not
bugs and do not cause issues with some users before allowing them in policy.
Policy still allows the use of non-UTF-8 locales.

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 



Message #40 received at 701081@bugs.debian.org (full text, mbox):

From: Guillem Jover <guillem@debian.org>
To: Charles Plessy <plessy@debian.org>, 701081@bugs.debian.org
Cc: Helmut Grohne <helmut@subdivi.de>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 2 Mar 2013 15:51:37 +0100
On Sat, 2013-02-23 at 13:31:32 +0900, Charles Plessy wrote:
> Le Thu, Feb 21, 2013 at 03:48:15PM +0100, Bill Allombert a écrit :
>  - Is there anybody following the preparation of the FHS 3.0 or the LSB, who
>    could tell us if a broader guideline on name encoding for files distributed
>    in core directories is under discussion there ?

I think the new FHS version is currently stalled, so I'd not expect
any update in the near future.

Thanks,
Guillem



Message #45 received at 701081@bugs.debian.org (full text, mbox):

From: Guillem Jover <guillem@debian.org>
To: Charles Plessy <plessy@debian.org>, 701081@bugs.debian.org
Cc: Helmut Grohne <helmut@subdivi.de>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 2 Mar 2013 16:38:49 +0100
Hi!

On Sun, 2013-02-24 at 11:54:01 +0900, Charles Plessy wrote:
> This could be done by an addition like the following, after section 10.9
> (Permissions and owners).  The wording is still a bit clumsy also, I am not
> sure if "installed" includes files created by maintainer scripts (which would
> be the intent here).  I named the section "File names", and not "File name
> character set", in case we would add other restrictions (such as length) in the
> future.

To make the installed situation pretty clear, it might make sense to
say something along the lines: «the files that have been created after
the binary package is "Installed"».

> +      <sec id="filenames">
> +       <heading>File names</heading>
> +
> +       <p>
> +         The name of the files installed by binary packages must be encoded in
> +         UTF-8 and should be restricted to ASCII unless there is a justified
> +         need for using other characters.
> +       </p>
> +      </sec>
> 
> Some packages do not comply with the above.  Given the pace of the releases
> of the Policy, I am not sure that it is worth having first a should and then
> a must, if you or somebody else would have the time to tackle the issue
> after the Wheezy release.

I'd second something like this, but I'd first like us to consider if
we really want any non-ASCII characters in filenames. Currently on sid
there do not appear to be many such filenames (64 from my check, if
that's not bogus):

  $ LC_ALL=C zgrep '[^[:print:]]' \
    ftp.debian.org_debian_dists_sid_*_Contents-amd64.gz | wc -l
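
The same count can be approximated in Python. This is a sketch with
hypothetical function names. It relies on the fact that, in the C locale,
`[^[:print:]]` matches any byte outside printable ASCII (0x20 to 0x7E):

```python
import gzip
import re

# Any byte outside printable ASCII, mirroring [^[:print:]] in the C locale.
NON_PRINTABLE = re.compile(rb"[^\x20-\x7e]")


def count_nonprintable(lines) -> int:
    """Count lines (as bytes) that contain at least one non-printable byte."""
    return sum(
        1 for line in lines if NON_PRINTABLE.search(line.rstrip(b"\n"))
    )


def count_in_gzip(path: str) -> int:
    """Apply the count to a gzip-compressed file such as a Contents index."""
    with gzip.open(path, "rb") as handle:
        return count_nonprintable(handle)
```

Like the grep above, this counts whole lines, so it also flags lines where
the non-printable byte sits outside the path column.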

> By the way, how about directories ?

This is a matter of terminology, directories are also filenames, and
part of pathnames, which point to a directory instead of a file. I
don't see why we'd want to exclude directories from filenames.

Thanks,
Guillem



Message #50 received at 701081@bugs.debian.org (full text, mbox):

From: Roger Leigh <rleigh@codelibre.net>
To: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>, 701081@bugs.debian.org
Cc: Helmut Grohne <helmut@subdivi.de>, Charles Plessy <plessy@debian.org>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Tue, 5 Mar 2013 00:06:06 +0000
On Sat, Mar 02, 2013 at 01:24:55PM +0100, Bill Allombert wrote:
> I would like to see examples of UTF-8 filenames in source packages that are
> not bugs and do not cause issues with some users before allowing them in
> policy.  Policy still allows the use of non-UTF-8 locales.

We have defaulted to UTF-8 locales for over a decade now.  Unless
there are compelling reasons to keep non-UTF-8 locales, maybe we
could consider retiring them and having everything be
UTF-8 by default at this point.  If we do require this in
userspace, then the naming restrictions could also be enforced
in-kernel e.g. with create/open with O_CREAT to disallow non-UTF-8
filename creation.  This would bring some much needed sanity to
filename handling, so it's a wider issue than just what's
permitted in packages.

WRT the point about allowing non-UTF-8 filenames for purposes
such as testsuites, if we require UTF-8 across the board, such
tests become unnecessary ;-)


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux    http://people.debian.org/~rleigh/
 `. `'   schroot and sbuild  http://alioth.debian.org/projects/buildd-tools
   `-    GPG Public Key      F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800



Message #55 received at 701081@bugs.debian.org (full text, mbox):

From: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
To: Roger Leigh <rleigh@codelibre.net>, 701081@bugs.debian.org
Cc: Helmut Grohne <helmut@subdivi.de>, Charles Plessy <plessy@debian.org>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Tue, 5 Mar 2013 01:16:52 +0100
On Tue, Mar 05, 2013 at 12:06:06AM +0000, Roger Leigh wrote:
> On Sat, Mar 02, 2013 at 01:24:55PM +0100, Bill Allombert wrote:
> > I would like to see examples of UTF-8 filenames in source packages that are
> > not bugs and do not cause issues with some users before allowing them in
> > policy.  Policy still allows the use of non-UTF-8 locales.

Uh, I meant binary packages, not source packages.

> We have defaulted to UTF-8 locales for over a decade now.  Unless
> there are compelling reasons not to use UTF-8 locales, maybe we
> could perhaps consider retiring them and having everything be
> UTF-8 by default at this point. 

My understanding is that we support some character sets that are still not
included in Unicode.

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 



Message #60 received at 701081@bugs.debian.org (full text, mbox):

From: "Thomas Preud'homme" <robotux@debian.org>
To: debian-policy@lists.debian.org, Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>, 701081@bugs.debian.org
Cc: Roger Leigh <rleigh@codelibre.net>, Helmut Grohne <helmut@subdivi.de>, Charles Plessy <plessy@debian.org>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Tue, 5 Mar 2013 10:17:58 +0100
Le mardi 5 mars 2013 01:16:52, Bill Allombert a écrit :
> On Tue, Mar 05, 2013 at 12:06:06AM +0000, Roger Leigh wrote:
> > We have defaulted to UTF-8 locales for over a decade now.  Unless
> > there are compelling reasons not to use UTF-8 locales, maybe we
> > could perhaps consider retiring them and having everything be
> > UTF-8 by default at this point. If we do require this in
> > userspace, then the naming restrictions could also be enforced
> > in-kernel e.g. with create/open with O_CREAT to disallow non-UTF-8
> > filename creation.
> 
> My understanding is that we support some character sets that are
> still not included in Unicode.

Forgive me if I missed something but it seems to me that even if we are
supporting only charsets included in Unicode, people could have files created
with another distribution / OS not encoded in UTF-8. So I don't think it's
possible / desirable to deny opening non-UTF-8 filenames in the kernel.

> 
> Cheers,

Best regards,

Thomas

Message #65 received at 701081@bugs.debian.org (full text, mbox):

From: Roger Leigh <rleigh@codelibre.net>
To: Thomas Preud'homme <robotux@debian.org>
Cc: debian-policy@lists.debian.org, Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>, 701081@bugs.debian.org, Helmut Grohne <helmut@subdivi.de>, Charles Plessy <plessy@debian.org>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Tue, 5 Mar 2013 10:45:35 +0000
On Tue, Mar 05, 2013 at 10:17:58AM +0100, Thomas Preud'homme wrote:
> Le mardi 5 mars 2013 01:16:52, Bill Allombert a écrit :
> > On Tue, Mar 05, 2013 at 12:06:06AM +0000, Roger Leigh wrote:
> > > We have defaulted to UTF-8 locales for over a decade now.  Unless
> > > there are compelling reasons not to use UTF-8 locales, maybe we
> > > could perhaps consider retiring them and having everything be
> > > UTF-8 by default at this point. If we do require this in
> > > userspace, then the naming restrictions could also be enforced
> > > in-kernel e.g. with create/open with O_CREAT to disallow non-UTF-8
> > > filename creation.
> > 
> > My understanding is that we support some character sets that are
> > still not included in Unicode.
> 
> Forgive me if I missed something but it seems to me that even if we are
> supporting only charsets included in Unicode, people could have files created
> with another distribution / OS not encoded in UTF-8. So I don't think it's
> possible / desirable to deny opening non-UTF-8 filenames in the kernel.

For opening, this is necessary for backward compatibility.  For
/creation/, we could certainly mandate UTF-8 for the addition of
new files, which is why I qualified with O_CREAT.  The same would
apply for other syscalls which create file paths (e.g. mknod, mkdir,
bind).  This would permit UTF-8 to be enforced going forward while
retaining a means for users to migrate broken naming to UTF-8.
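
Purely as an illustration of the creation-only rule, and not of any actual
kernel mechanism, a userspace wrapper could validate the name before passing
O_CREAT while leaving plain opens of existing names untouched. The function
name below is hypothetical:

```python
import os


def create_utf8_only(path: bytes, mode: int = 0o644) -> int:
    """Create a new file and return its descriptor, refusing non-UTF-8 names."""
    try:
        path.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("refusing to create non-UTF-8 filename") from exc
    # O_EXCL makes this strictly a creation call; existing files with
    # legacy names can still be opened through the normal os.open().
    return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, mode)
```

Because only creation is restricted, a user could still open a legacy-named
file and rename it to a UTF-8 name, which is the migration path described
above.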

Which locales don't currently have charsets mapping to Unicode?

Could we remove the non-UTF-8 locales which /do/ have complete
UTF-8 coverage and replace them with aliases?  That would at least
achieve the transition for the vast majority of locales.


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux    http://people.debian.org/~rleigh/
 `. `'   schroot and sbuild  http://alioth.debian.org/projects/buildd-tools
   `-    GPG Public Key      F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800



Message #70 received at 701081@bugs.debian.org (full text, mbox):

From: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
To: Roger Leigh <rleigh@codelibre.net>
Cc: Thomas Preud'homme <robotux@debian.org>, debian-policy@lists.debian.org, 701081@bugs.debian.org, Helmut Grohne <helmut@subdivi.de>, Charles Plessy <plessy@debian.org>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Tue, 5 Mar 2013 11:57:40 +0100
On Tue, Mar 05, 2013 at 10:45:35AM +0000, Roger Leigh wrote:
> Which locales don't currently have charsets mapping to Unicode?
> 
> Could we remove the non-UTF-8 locales which /do/ have complete
> UTF-8 coverage and replace them with aliases?  That would at least
> achieve the transition for the vast majority of locales.

The way locales work, this would prevent people from reading text files written
under such an encoding. Not an option, I think.

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 



Message #75 received at 701081@bugs.debian.org (full text, mbox):

From: Russ Allbery <rra@debian.org>
To: Roger Leigh <rleigh@codelibre.net>
Cc: Thomas Preud'homme <robotux@debian.org>, debian-policy@lists.debian.org, Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>, 701081@bugs.debian.org, Helmut Grohne <helmut@subdivi.de>, Charles Plessy <plessy@debian.org>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Tue, 05 Mar 2013 12:39:52 -0800
Roger Leigh <rleigh@codelibre.net> writes:

> Could we remove the non-UTF-8 locales which /do/ have complete UTF-8
> coverage and replace them with aliases?  That would at least achieve the
> transition for the vast majority of locales.

I don't think this is a good idea.  There are a lot of legacy ISO 8859-1
or KOI8-R or SJIS documents out there and people who work with those
documents may prefer to continue to operate in that locale.
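
A minimal sketch of why such documents matter, assuming a short ISO 8859-1
byte string as a stand-in for a legacy file (the helper name `decode_with` is
hypothetical): the same bytes decode cleanly under the legacy charset but are
rejected as UTF-8.

```python
from typing import Optional


def decode_with(raw: bytes, charset: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid in charset."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None


# Hypothetical legacy content: "café" stored as ISO 8859-1 (é is the byte 0xE9).
legacy = "caf\u00e9".encode("iso-8859-1")

as_latin1 = decode_with(legacy, "iso-8859-1")  # readable under the legacy charset
as_utf8 = decode_with(legacy, "utf-8")         # None: a lone 0xE9 is not valid UTF-8
```

This only illustrates the byte-level mismatch; how a given tool reacts to it
depends on the tool.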

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Wed, 06 Mar 2013 04:48:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Charles Plessy <plessy@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Wed, 06 Mar 2013 04:48:03 GMT) Full text and rfc822 format available.

Message #80 received at 701081@bugs.debian.org (full text, mbox):

From: Charles Plessy <plessy@debian.org>
To: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Wed, 6 Mar 2013 13:45:14 +0900
Le Sat, Mar 02, 2013 at 04:38:49PM +0100, Guillem Jover a écrit :
> 
> I'd second something like this, but I'd first like us to consider if
> we really want any non-ASCII characters in filenames. Currently on sid
> there does not appear to be many such filenames (64 from my check, if
> that's not bogus):
> 
>   $ LC_ALL=C zgrep '[^[:print:]]' \
>     ftp.debian.org_debian_dists_sid_*_Contents-amd64.gz | wc -l

Hi Guillem and everybody,

I had a closer look at these files.

 * There are dictionaries where the filename is the native name of the
   language, like català, español, bokmål, etc.  In all cases the
   characters are valid Unicode.  I think that it would be fair to allow
   such cases.

 * There are names that look rather arbitrary and replaceable
   with ASCII alternatives if needed.  For instance in python-pyramid,
   usr/lib/python2.6/dist-packages/pyramid/tests/fixtures/static/héhé.html

 * There are CA certificates with names like Certinomis_-_Autorité_Racine.crt.
   Since I do not know how these certificates work, I do not know if they
   can be renamed.

 * There is a file that needs to be in non-ASCII Unicode to fit its purpose:
   usr/share/doc/console-tools/examples/♪♬ in console-tools.  The package
   also distributes a file called README.strange-name in the same directory.

 * There are some more dubious names like 6Sze¶æ_Jab³ek.png in lletters-media,
   or Miroir_Sphérique in optgeo.  However, they do not cause much inconvenience
   with a Unicode locale.

 * The pitivi package gives entries with no obvious Unicode characters, like 
   usr/share/gnome/help/pitivi/C/figures/codecscontainers.jpg.
   I think that we should at least strongly recommend that if a name looks ASCII
   then it should be ASCII.

 * Lastly, there seems to be only a single package that ships non-Unicode filenames,
   non-free/ooohg with for instance 13_Afr d<U+0082>col.gif.

Requiring that all file and directory names are encoded in Unicode and
preferably in ASCII would therefore make only one package RC-buggy.  Requiring
all-ASCII would be also possible with a bit more work, but I am not sure that it
would be worth the effort, as most of the current examples above do not require
specialised fonts.  Altogether, there seems to be a good self-discipline.
However, if there are ways to test the following automatically, maybe we should
consider requesting that what is displayed ASCII should be ASCII.
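
One way such an automatic test could look, as a rough sketch only: classify
each raw file name as pure ASCII, valid UTF-8, or neither.  The function names
and the sample byte strings are assumptions for illustration, not taken from
any existing archive tool.

```python
def classify_filename(raw: bytes) -> str:
    """Classify a raw file name as 'ascii', 'utf-8', or 'other'."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def summarize(names):
    """Count how many raw names fall into each class."""
    counts = {"ascii": 0, "utf-8": 0, "other": 0}
    for raw in names:
        counts[classify_filename(raw)] += 1
    return counts


# Hypothetical samples; the last one contains a lone 0x82 byte,
# which cannot appear on its own in valid UTF-8.
samples = [
    b"usr/share/doc/foo/README",
    "Certinomis_-_Autorit\u00e9_Racine.crt".encode("utf-8"),
    b"13_Afr d\x82col.gif",
]
print(summarize(samples))
```

A check like this covers the "must be UTF-8" part; whether a name that
merely looks like ASCII really is ASCII falls out of the same classification.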

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Wed, 06 Mar 2013 05:00:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Wed, 06 Mar 2013 05:00:04 GMT) Full text and rfc822 format available.

Message #85 received at 701081@bugs.debian.org (full text, mbox):

From: Russ Allbery <rra@debian.org>
To: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Tue, 05 Mar 2013 20:56:00 -0800
Charles Plessy <plessy@debian.org> writes:

>  * There are names that look rather arbitrary and replaceable
>    with ASCII alternatives if needed.  For instance in python-pyramid,
>    usr/lib/python2.6/dist-packages/pyramid/tests/fixtures/static/héhé.html

At least some of these (for things located in a directory named tests) are
probably explicit tests of non-ASCII file names.

>  * There are CA certificates with names like Certinomis_-_Autorité_Racine.crt.
>    Since I do not know how these certificates work, I do not know if they
>    can be renamed.

This to me feels like a good use of Unicode.  One of the reasons why I'm in
favor of a general policy saying we should use UTF-8, rather than a policy
saying to use only ASCII names, is that names of things in the real world
(people and organizations) are often put into file names.  And it really
bothers me when we tell people they can't use their *actual* name or are
required to misspell it in some arbitrary way in order to shoehorn
themselves into ASCII.

In this case, I assume the name of the relevant certificate authority is
Certinomis - Autorité Racine.  I think it's quite reasonable to use the
actual name for the certificate authority in the file name.

>  * The pitivi package gives entries with no obvious Unicode characters,
>  like usr/share/gnome/help/pitivi/C/figures/codecscontainers.jpg.  I
>  think that we should at least strongly recommend that if a name looks
>  ASCII then it should be ASCII.

It's mildly difficult to be clear about this, since this can depend very
heavily on the font.  In general, the way this sort of requirement is
stated in the Unicode world is to require a normalized form, but I think
that's rather heavy-weight for what we're trying to accomplish.
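
To illustrate what a normalized form buys, here is a small sketch using
Python's standard `unicodedata` module.  The two spellings of the file name
and the helper names are assumptions for illustration: both render the same
in most fonts but differ as code points until normalized.

```python
import unicodedata

# Two spellings that usually render identically:
composed = "h\u00e9h\u00e9.html"        # é as a single code point
decomposed = "he\u0301he\u0301.html"    # e followed by a combining acute accent


def is_nfc(name: str) -> bool:
    """True if the name is already in Normalization Form C."""
    return unicodedata.normalize("NFC", name) == name


def same_after_nfc(a: str, b: str) -> bool:
    """True if two names are equal once both are normalized to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Normalization only handles composed versus decomposed sequences; it does not
by itself decide whether a look-alike character should have been plain ASCII.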

But yes, we can just make a general (but not formally precise)
recommendation.

> Requiring that all file and directory names are encoded in Unicode and
> preferably in ASCII would therefore make only one package RC-buggy.
> Requiring all-ASCII would be also possible with a bit more work, but I
> am not sure that it would be worth the effort, as most of the current
> examples above do not require specialised fonts.  Altogether, there
> seems to be a good self-discipline.  However, if there are ways to test
> the following automatically, maybe we should consider requesting that
> what is displayed ASCII should be ASCII.

I think it's reasonable to say that file names that can be represented in
ASCII should be in ASCII.  But I do think that it's entirely reasonable to
use Unicode for names that truly aren't ASCII names, and it would bother
me to tell people to misspell those names to squeeze them into ASCII.

For the other half of what's been discussed, I don't think that Debian
should have a position about what's *inside* files other than files where
we're already standardizing the contents (such as the copyright file).
There may be reasons why files should be encoded in legacy encodings for
specific uses, and I don't feel like it's the proper role of Policy to
dictate to all package maintainers that they can't work with those use
cases.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Wed, 06 Mar 2013 23:15:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Wed, 06 Mar 2013 23:15:03 GMT) Full text and rfc822 format available.

Message #90 received at 701081@bugs.debian.org (full text, mbox):

From: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
To: Charles Plessy <plessy@debian.org>, 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Thu, 7 Mar 2013 00:12:13 +0100
On Wed, Mar 06, 2013 at 01:45:14PM +0900, Charles Plessy wrote:
> Le Sat, Mar 02, 2013 at 04:38:49PM +0100, Guillem Jover a écrit :
> > 
> > I'd second something like this, but I'd first like us to consider if
> > we really want any non-ASCII characters in filenames. Currently on sid
> > there does not appear to be many such filenames (64 from my check, if
> > that's not bogus):
> > 
> >   $ LC_ALL=C zgrep '[^[:print:]]' \
> >     ftp.debian.org_debian_dists_sid_*_Contents-amd64.gz | wc -l
> 
> Hi Guillem and everybody,
> 
> I had a closer look at these files.
> 
>  * There are dictionaries where the filename is the native name of the
>    language, like català, español, bokmål, etc.  In all cases the
>    characters are valid Unicode.  I think that it would be fair to allow
>    such cases.

This is not the current practice:
In /usr/share/dict/ and /usr/lib/ispell/, only bokmål is 8bit.
Most dictionary names are in English,
sometimes with an alias in the native language
(catala, dansk, foeroyskt, bokmål, svenska).

In /usr/lib/aspell/, most dictionaries are named using the ISO 639 two-letter code
or the English name.  There are some non-English aliases like francais.alias,
which is missing the cedilla.  Only català, español and íslenska are 8bit.

So currently, there is no standard practice of naming dictionaries after the
UTF-8 encoding of the native spelling of the language, and it would be more
practical to standardize on ISO 639 language codes instead.

>  * There are names that look rather arbitrary and replaceable
>    with ASCII alternatives if needed.  For instance in python-pyramid,
>    usr/lib/python2.6/dist-packages/pyramid/tests/fixtures/static/héhé.html

Probably some test files that could be removed from the binary packages.

>  * There are CA certificates with names like Certinomis_-_Autorité_Racine.crt.
>    Since I do not know how these certificates work, I do not know if they
>    can be renamed.

The main reason they have such names is to avoid name clashes with other .crt files.

>  * There is a file that needs to be in non-ASCII Unicode to fit its purpose:
>    usr/share/doc/console-tools/examples/♪♬ in console-tools.  The package
>    also distributes a file called README.strange-name in the same directory.

The value of such a file is pretty low.

>  * There are some more dubious names like 6Sze¶æ_Jab³ek.png in lletters-media,
>    or Miroir_Sphérique in optgeo.  However, they do not cause much inconvenience
>    with a Unicode locale.

Miroir_Sphe♦rique is a bug in itself: it should be
Miroir_Sphérique.
'6Sze¶æ_Jab³ek.png' is probably misencoded (it is intended to be 6 in Polish, i.e.
sześć).
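
A sketch of how such a misencoding can arise, under the assumption that the
name was written as ISO 8859-2 (Latin-2) bytes and later read as ISO 8859-1
(Latin-1).  The intended name used below, including "Jabłek", is a guess for
illustration only.

```python
def misread_latin2_as_latin1(name: str) -> str:
    """Encode a name as ISO 8859-2, then decode the same bytes as ISO 8859-1."""
    return name.encode("iso-8859-2").decode("iso-8859-1")


# Assumed intended name: "6Sześć_Jabłek.png"
intended = "6Sze\u015b\u0107_Jab\u0142ek.png"
garbled = misread_latin2_as_latin1(intended)
print(garbled)  # 6Sze¶æ_Jab³ek.png
```

The bytes 0xB6, 0xE6 and 0xB3 stand for ś, ć and ł in ISO 8859-2 but for ¶, æ
and ³ in ISO 8859-1, which matches the name seen in the package.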

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sat, 09 Mar 2013 01:54:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Charles Plessy <plessy@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 09 Mar 2013 01:54:03 GMT) Full text and rfc822 format available.

Message #95 received at 701081@bugs.debian.org (full text, mbox):

From: Charles Plessy <plessy@debian.org>
To: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 9 Mar 2013 10:51:45 +0900
tag 701081 patch
thanks

Dear all,

I think that it emerges from the discussion that there are good uses of
Unicode, and that somebody would need to step up and ensure that a dozen of
packages are corrected if we were to restrict further the encoding of file
names.  Moreover, there seems to be a good self-discipline, and Unicode is
not used in paths that are central on non-Unicode systems.

Given that currently the Policy does not mention anything about file names, I
think that it would be fair to fill the gap by documenting the use of Unicode
as current practice and recommend ASCII for most cases.  This does not preclude
further restrictions if needed.  I volunteer to contact the maintainers of
lletters-media and ooohg, the only packages with non-Unicode file names.

I attached a slightly updated patch.  I have not added that the policy is for
'the files that have been created after the binary package is "Installed"',
because I think that it is clear throughout chapter 10 that "installed files"
means this.  Nevertheless, it would be nice to have such a definition black on
white somewhere else, to be discussed in another thread.

Have a nice week-end,

-- 
Charles
[0001-Installed-file-names-must-be-in-UTF-8-and-should-use.patch (text/x-diff, attachment)]

Added tag(s) patch. Request was from Charles Plessy <plessy@debian.org> to control@bugs.debian.org. (Sat, 09 Mar 2013 01:54:05 GMT) Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sat, 16 Mar 2013 12:15:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Julien Cristau <jcristau@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 16 Mar 2013 12:15:03 GMT) Full text and rfc822 format available.

Message #102 received at 701081@bugs.debian.org (full text, mbox):

From: Julien Cristau <jcristau@debian.org>
To: Charles Plessy <plessy@debian.org>, 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 16 Mar 2013 13:11:37 +0100
On Sat, Mar  9, 2013 at 10:51:45 +0900, Charles Plessy wrote:

> tag 701081 patch
> thanks
> 
> Dear all,
> 
> I think that it emerges from the discussion that there are good uses of
> Unicode, and that somebody would need to step up and ensure that a dozen of
> packages are corrected if we were to restrict further the encoding of file
> names.  Moreover, there seems to be a good self-discipline, and Unicode is
> not used in paths that are central on non-Unicode systems.
> 
> Given that currently the Policy does not mention anything about file names, I
> think that it would be fair to fill the gap by documenting the use of Unicode
> as current practice and recommend ASCII for most cases.  This does not preclude
> further restrictions if needed.  I volunteer to contact the maintainer of
> lletters-media and ooohg, the only packages with non-Unicode file names.
> 
> I attached a slightly updated patch.  I have not added that the policy is for
> 'the files that have been created after the binary package is "Installed"',
> because I think that it is clear throughout chapter 10 that "installed files"
> means this.  Nevertheless, it would be nice to have such a definition black on
> white somewhere else, to be discussed in another thread.
> 
You say unicode everywhere but you seem to actually mean utf-8...

Cheers,
Julien

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sat, 16 Mar 2013 14:21:10 GMT) Full text and rfc822 format available.

Acknowledgement sent to Charles Plessy <plessy@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 16 Mar 2013 14:21:10 GMT) Full text and rfc822 format available.

Message #107 received at 701081@bugs.debian.org (full text, mbox):

From: Charles Plessy <plessy@debian.org>
To: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 16 Mar 2013 23:20:27 +0900
Le Sat, Mar 16, 2013 at 01:11:37PM +0100, Julien Cristau a écrit :
> On Sat, Mar  9, 2013 at 10:51:45 +0900, Charles Plessy wrote:
> > 
> > I think that it emerges from the discussion that there are good uses of
> > Unicode, and that somebody would need to step up and ensure that a dozen of
> > packages are corrected if we were to restrict further the encoding of file
> > names.  Moreover, there seems to be a good self-discipline, and Unicode is
> > not used in paths that are central on non-Unicode systems.
> > 
> > Given that currently the Policy does not mention anything about file names, I
> > think that it would be fair to fill the gap by documenting the use of Unicode
> > as current practice and recommend ASCII for most cases.  This does not preclude
> > further restrictions if needed.  I volunteer to contact the maintainer of
> > lletters-media and ooohg, the only packages with non-Unicode file names.
> > 
> > I attached a slightly updated patch.  I have not added that the policy is for
> > 'the files that have been created after the binary package is "Installed"',
> > because I think that it is clear throughout chapter 10 that "installed files"
> > means this.  Nevertheless, it would be nice to have such a definition black on
> > white somewhere else, to be discussed in another thread.
> > 
> You say unicode everywhere but you seem to actually mean utf-8...

Indeed I meant UTF-8, sorry for being confusing.

The patch to the Policy already mentions UTF-8:

+      <sec id="filenames">
+       <heading>File names</heading>
+
+       <p>
+         The name of the files and directories installed by binary packages
+         must be encoded in UTF-8 and should be restricted to ASCII when they
+         can be represented in that character set.
+       </p>

I have just opened #703177 on ooohg, and figured out that there is already
#659345 for lletters-media.

Have a nice week-end,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sat, 16 Mar 2013 22:03:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 16 Mar 2013 22:03:04 GMT) Full text and rfc822 format available.

Message #112 received at 701081@bugs.debian.org (full text, mbox):

From: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
To: Charles Plessy <plessy@debian.org>, 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 16 Mar 2013 22:58:11 +0100
On Sat, Mar 09, 2013 at 10:51:45AM +0900, Charles Plessy wrote:
> tag 701081 patch
> thanks
> 
> Dear all,
> 
> I think that it emerges from the discussion that there are good uses of
> Unicode, and that somebody would need to step up and ensure that a dozen of
> packages are corrected if we were to restrict further the encoding of file
> names.  Moreover, there seems to be a good self-discipline, and Unicode is
> not used in paths that are central on non-Unicode systems.

I have yet to see any good use of 8bit filenames in Debian binary packages.

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sat, 16 Mar 2013 22:15:07 GMT) Full text and rfc822 format available.

Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 16 Mar 2013 22:15:07 GMT) Full text and rfc822 format available.

Message #117 received at 701081@bugs.debian.org (full text, mbox):

From: Russ Allbery <rra@debian.org>
To: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 16 Mar 2013 15:13:04 -0700
Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr> writes:
> On Sat, Mar 09, 2013 at 10:51:45AM +0900, Charles Plessy wrote:

>> I think that it emerges from the discussion that there are good uses of
>> Unicode, and that somebody would need to step up and ensure that a
>> dozen of packages are corrected if we were to restrict further the
>> encoding of file names.  Moreover, there seems to be a good
>> self-discipline, and Unicode is not used in paths that are central on
>> non-Unicode systems.

> I have yet to see any good use of 8bit filenames in Debian binary packages.

Many were posted to this thread.  I guess I just disagree with you on
whether those uses are "good."  For me, allowing the correct spellings of
words and the correct names of things to be represented in file names is
important enough to rise to an ethical goal that I would advocate
adopting.  A pure ASCII stance feels like a very English-centric stance.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sat, 16 Mar 2013 22:27:12 GMT) Full text and rfc822 format available.

Acknowledgement sent to Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 16 Mar 2013 22:27:12 GMT) Full text and rfc822 format available.

Message #122 received at 701081@bugs.debian.org (full text, mbox):

From: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
To: Russ Allbery <rra@debian.org>, 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 16 Mar 2013 23:24:17 +0100
On Sat, Mar 16, 2013 at 03:13:04PM -0700, Russ Allbery wrote:
> Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr> writes:
> > On Sat, Mar 09, 2013 at 10:51:45AM +0900, Charles Plessy wrote:
> 
> >> I think that it emerges from the discussion that there are good uses of
> >> Unicode, and that somebody would need to step up and ensure that a
> >> dozen of packages are corrected if we were to restrict further the
> >> encoding of file names.  Moreover, there seems to be a good
> >> self-discipline, and Unicode is not used in paths that are central on
> >> non-Unicode systems.
> 
> > I have yet to see any good use of 8bit filenames in Debian binary packages.
> 
> Many were posted to this thread.  I guess I just disagree with you on
> whether those uses are "good."  For me, allowing the correct spellings of
> words and the correct names of things to be represented in file names is
> important enough to rise to an ethical goal that I would advocate
> adopting.  A pure ASCII stance feels like a very English-centric stance.

Filenames are not translatable, so a better mechanism is needed anyway.

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sat, 16 Mar 2013 22:33:07 GMT) Full text and rfc822 format available.

Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 16 Mar 2013 22:33:07 GMT) Full text and rfc822 format available.

Message #127 received at 701081@bugs.debian.org (full text, mbox):

From: Russ Allbery <rra@debian.org>
To: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 16 Mar 2013 15:28:30 -0700
Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr> writes:
> On Sat, Mar 16, 2013 at 03:13:04PM -0700, Russ Allbery wrote:

>> Many were posted to this thread.  I guess I just disagree with you on
>> whether those uses are "good."  For me, allowing the correct spellings
>> of words and the correct names of things to be represented in file
>> names is important enough to rise to an ethical goal that I would
>> advocate adopting.  A pure ASCII stance feels like a very
>> English-centric stance.

> Filenames are not translatable, so a better mechanism is needed anyway.

This discussion isn't about translations, and I don't agree that they're
relevant to this decision.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sat, 16 Mar 2013 22:45:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jonathan Nieder <jrnieder@gmail.com>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 16 Mar 2013 22:45:04 GMT) Full text and rfc822 format available.

Message #132 received at 701081@bugs.debian.org (full text, mbox):

From: Jonathan Nieder <jrnieder@gmail.com>
To: 701081@bugs.debian.org
Subject: Re: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 16 Mar 2013 15:40:19 -0700
Russ Allbery wrote:

>                                 For me, allowing the correct spellings of
> words and the correct names of things to be represented in file names is
> important enough to rise to an ethical goal that I would advocate
> adopting.

This.  Among the examples listed the only one I found convincing was

	Certinomis_-_Autorité_Racine.crt

For test cases, it seems more sensible to just use a tarball, since
restricting oneself to UTF-8 filenames hurts test coverage in the same
way as sticking to ASCII.  But naming files after real entities (like
Certinomis) is both harmless and a good application of a universal
character encoding.



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sat, 16 Mar 2013 23:33:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Charles Plessy <plessy@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 16 Mar 2013 23:33:04 GMT) Full text and rfc822 format available.

Message #137 received at 701081@bugs.debian.org (full text, mbox):

From: Charles Plessy <plessy@debian.org>
To: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 17 Mar 2013 08:29:48 +0900
Le Sat, Mar 16, 2013 at 10:58:11PM +0100, Bill Allombert a écrit :
> On Sat, Mar 09, 2013 at 10:51:45AM +0900, Charles Plessy wrote:
> > 
> > I think that it emerges from the discussion that there are good uses of
> > Unicode, and that somebody would need to step up and ensure that a dozen of
> > packages are corrected if we were to restrict further the encoding of file
> > names.  Moreover, there seems to be a good self-discipline, and Unicode is
> > not used in paths that are central on non-Unicode systems.
> 
> > I have yet to see any good use of 8bit filenames in Debian binary packages.

Hi Bill,

I understand that you are critical of the idea of allowing 8bit file names.
I am sorry if my brief summary gave the impression that there
is no "good" reason to refrain from using 8bit file names as well.

At this point in the discussion, I do not see new arguments being added.  We
therefore need to move towards the resolution of this bug.  I see three
possible outcomes.

 a) Status quo: currently there is no policy, and we can decide to not write
    any policy instead of taking one that does not reach consensus. (not my
    favorite).

 b) Disallow non-UTF-8 encodings.  This requires little work (which I started),
    answers the original issue raised in this bug (there is no policy), and
    does not preclude further restrictions if there is consensus for doing so.

 c) Disallow non-ASCII encodings.  This requires more work, and I am fairly
    confident that if nobody takes action and leads the correction of
    the affected packages in the archive, nothing will happen and we will not
    be able to make the corresponding change in the Policy.

If we tackled the issue with a Condorcet vote, I think that b) would
be chosen, unless there are worries that once we reach b) it will not be
possible to propose c) anymore.  Personally, I trust that Debian's do-ocracy
will work well, and that if there are developers determined to propose c)
and make it happen if our community sees a benefit, being in the b) state
will be a bonus, not a drawback.  I nevertheless need to add that I personally
think that b) is better than c), as shown by the summary that I wrote with
too much bias (sorry again).

Shall we go for b) ?

Cheers,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sun, 17 Mar 2013 10:51:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 17 Mar 2013 10:51:04 GMT) Full text and rfc822 format available.

Message #142 received at 701081@bugs.debian.org (full text, mbox):

From: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
To: Russ Allbery <rra@debian.org>, 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 17 Mar 2013 11:46:23 +0100
On Sat, Mar 16, 2013 at 03:28:30PM -0700, Russ Allbery wrote:
> Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr> writes:
> > On Sat, Mar 16, 2013 at 03:13:04PM -0700, Russ Allbery wrote:
> 
> >> Many were posted to this thread.  I guess I just disagree with you on
> >> whether those uses are "good."  For me, allowing the correct spellings
> >> of words and the correct names of things to be represented in file
> >> names is important enough to rise to an ethical goal that I would
> >> advocate adopting.  A pure ASCII stance feels like a very
> >> English-centric stance.
> 
> > > Filenames are not translatable, so a better mechanism is needed anyway.
> 
> This discussion isn't about translations, and I don't agree that they're
> relevant to this decision.

Precisely, the situation is very different.  Instead of displaying text in the
user's preferred language and script, UTF-8 filenames will be in an arbitrary
script which might not be well supported on the user's terminal for either output
or input (the terminal might lack the correct fonts, right-to-left
support, ligatures, input methods, etc.), and which the user might not know how to
spell.

And that is assuming the user uses a UTF-8 locale (so the C locale does not qualify).
By contrast, 7-bit ASCII is well supported by all Debian systems and is generally
sufficient to carry the small quantity of information needed by filenames,
and in any case 7-bit ASCII is the current standard practice for filenames, so users
are used to it.

I am concerned that UTF-8 filenames in binary packages might hamper the ability
of the user/sysadmin to query and troubleshoot their system, because the names
are not readable, cannot be typed in and cannot be googled easily.

Dealing with a system where ls -R /usr/share/foo reports

%ls -R /usr/share/foo/
/usr/share/foo/:
??????????????????
??????????????
?????????????
????????????????
?????????????????
???????????????
??????????????????
??????????????????
?????????????????
??????????????
???????????????
???????????????????
????????????????
???????????????
???????????????
???????
??????????????

/usr/share/foo/???????/:
?????????????????
??????????????????
????????????

is likely to be painful.
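
To illustrate where the question marks come from, here is a minimal Python
sketch. It assumes the terminal shows one '?' for each byte outside printable
ASCII, which is roughly what ls does on a terminal in the C locale. The helper
name as_shown_in_c_locale is made up for this example; it is not how ls is
implemented.

```python
def as_shown_in_c_locale(name: bytes) -> str:
    """Approximate an ASCII-only listing: keep printable ASCII, mask the rest."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "?" for b in name)


# "é" takes two bytes in UTF-8, so it turns into two question marks.
print(as_shown_in_c_locale("Autorité".encode("utf-8")))

# A name written entirely in a non-Latin script disappears completely.
print(as_shown_in_c_locale("файл".encode("utf-8")))
```

With names that are entirely non-ASCII, only the length of each entry survives,
which is what the listing above shows.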

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sun, 24 Mar 2013 11:03:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Charles Plessy <plessy@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 24 Mar 2013 11:03:04 GMT) Full text and rfc822 format available.

Message #147 received at 701081@bugs.debian.org (full text, mbox):

From: Charles Plessy <plessy@debian.org>
To: Helmut Grohne <helmut@subdivi.de>, 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 24 Mar 2013 20:01:03 +0900
Le Thu, Feb 21, 2013 at 12:43:28PM +0100, Helmut Grohne a écrit :
> 
> Apparently the debian-policy currently says nothing about the characters
> used in filenames contained in binary packages. Most packages use common
> sense and only use a small subset of US-ASCII. In Debian sid main most
> filenames can be represented using the following subset of US-ASCII
> characters (written as a regular expression):
> 
> 	[][a-zA-Z0-9{}<>() ^/,=:&!*%#$~@+._-]
> 
> The number of exceptions is about 200 contained in about 50 binary
> packages. In those packages some filenames are not representable as
> UTF-8 (for example aspell-is) and others don't make any sense in
> ISO-8859-15 (for example ca-certificates).
> 
> It would be nice if some common ground concerning filename encoding
> could be reached. The options range from a rather restrictive definition
> of acceptable characters via requiring filenames to be representable in
> US-ASCII to mandating a particular encoding (such as UTF-8). This could
> be first introduced as a SHOULD and later turned into a MUST.
> 
> Personally I do not really care about what the precise restriction is as
> long as it permits a mechanical transformation to unicode.
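
As an illustration only, the bracket expression quoted above can be tried on
raw filename bytes with a short Python sketch. The Python spelling of the
character class (with both square brackets escaped) and the helper name
in_common_subset are assumptions of this example, not something taken from the
survey itself.

```python
import re

# The POSIX-style class from the report, with the brackets escaped for Python.
COMMON_SUBSET = re.compile(rb"[\]\[a-zA-Z0-9{}<>() ^/,=:&!*%#$~@+._-]*")


def in_common_subset(path: bytes) -> bool:
    """True if every byte of the path belongs to the quoted ASCII subset."""
    return COMMON_SUBSET.fullmatch(path) is not None


print(in_common_subset(b"/usr/share/doc/foo_1.0-1/README"))
print(in_common_subset("/etc/ssl/certs/Autorité.crt".encode("utf-8")))
```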

Dear all,

after more than one month of discussion, we have not reached a conclusion.

In the current situation there is no policy, which means that everything is
allowed.  Indeed, there is at least one package with filenames using more than
one set of non-ASCII characters, so no user can see correctly the names of
every file in this package at the same time.

However, I think it is clear from the discussion that it would not
satisfy anybody if we modified the Policy to codify the current
practice, that everything is permitted.

Given that this bug report asks for a policy about the encoding of filenames,
doing nothing is equivalent to rejecting it.  I therefore propose one more round
of consultation, and if it is not conclusive, I will tag this bug wontfix and
close it (we have 185 other bugs in the queue).

Of course, every developer is free to tackle the issue by working with all the
other package maintainers in order to change the current practice until it
matches something that we do not feel uncomfortable documenting in the Policy. 

On my side, I made a proposal with actionable items: fix the few packages that
are not using UTF-8, and modify the Policy to reflect the current practice
of using ASCII most of the time and other UTF-8 characters parsimoniously.

I understand very well the arguments against having any UTF-8 character at all,
but we currently have such packages in our archive, so if there is no plan to
modify these packages, then we can not plan to solve this bug.

Can others comment on how they would like to see this bug solved?

Have a nice day,

-- 
Charles



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sat, 30 Mar 2013 05:39:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 30 Mar 2013 05:39:05 GMT) Full text and rfc822 format available.

Message #152 received at 701081@bugs.debian.org (full text, mbox):

From: Russ Allbery <rra@debian.org>
To: 701081@bugs.debian.org, Helmut Grohne <helmut@subdivi.de>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Fri, 29 Mar 2013 22:38:10 -0700
Charles Plessy <plessy@debian.org> writes:

> On my side, I made a proposal with actionable items: fix the few
> packages that are not using UTF-8, and modify the Policy to reflect the
> current practice of using ASCII most of the time and other UTF-8
> characters parsimoniously.

> I understand very well the arguments against having any UTF-8 character
> at all, but we currently have such packages in our archive, so if there
> is no plan to modify these packages, then we can not plan to solve this
> bug.

> Can others comment on how they would like to see this bug solved?

I think we should require UTF-8 as the character encoding for file names
and fix the non-UTF-8 file names in the archive currently.  None of the
other courses of action really make any sense to me.

To me, that's obviously the right thing to do, so I have a hard time
stepping back far enough to even understand why it's an argument, I guess.
I certainly do agree that using non-ASCII characters in file names that
are unlikely to be in people's fonts or otherwise be difficult to display
is a problem, but I guess that seems like common sense.  But I don't mind
saying something to that effect in Policy.

We have files in the archive already using non-ASCII encodings, and asking
them to convert to ASCII feels like a real step back.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Mon, 01 Apr 2013 06:36:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jonathan Nieder <jrnieder@gmail.com>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Mon, 01 Apr 2013 06:36:05 GMT) Full text and rfc822 format available.

Message #157 received at 701081@bugs.debian.org (full text, mbox):

From: Jonathan Nieder <jrnieder@gmail.com>
To: Charles Plessy <plessy@debian.org>
Cc: Helmut Grohne <helmut@subdivi.de>, 701081@bugs.debian.org
Subject: Re: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 31 Mar 2013 23:32:22 -0700
Charles Plessy wrote:

> after more than one month of discussion, we have not reached a conclusion.
[...]
> Can others comment on how they would like to see this bug solved?

I think wording (requiring UTF-8 filenames) is probably the
appropriate next step.  Yes, maybe not everyone will agree on the
initial wording, but having a base to build on makes constructive
feedback a lot easier.

Some issues were mentioned before regarding different characters with
similar looking glyphs, normalization forms, and unusual characters
that are not widely supported.  But if the initial wording doesn't
manage to nudge the packager in the right direction on those issues, I
don't mind.
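
To make the normalization point concrete, here is a small Python sketch. It
only uses the standard unicodedata module; the sample name is borrowed from the
certificate example mentioned earlier in the thread, and the variable names are
illustrative.

```python
import unicodedata

name = "Autorité"

# Same visible text, two different byte sequences on disk.
nfc = unicodedata.normalize("NFC", name).encode("utf-8")  # é as one code point
nfd = unicodedata.normalize("NFD", name).encode("utf-8")  # e + combining accent

print(nfc, len(nfc))
print(nfd, len(nfd))
print(nfc == nfd)

# Similar-looking glyphs: Latin "a" and Cyrillic "а" are distinct characters.
print("a" == "\u0430")
```

Both byte strings are valid UTF-8, so a rule that only requires UTF-8 would
accept either form.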

Thanks,
Jonathan



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Mon, 01 Apr 2013 17:42:06 GMT) Full text and rfc822 format available.

Acknowledgement sent to Don Armstrong <don@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Mon, 01 Apr 2013 17:42:06 GMT) Full text and rfc822 format available.

Message #162 received at 701081@bugs.debian.org (full text, mbox):

From: Don Armstrong <don@debian.org>
To: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Mon, 1 Apr 2013 10:39:19 -0700
On Fri, 29 Mar 2013, Russ Allbery wrote:
> I think we should require UTF-8 as the character encoding for file
> names and fix the non-UTF-8 file names in the archive currently.
> None of the other courses of action really make any sense to me.

I think we should also forbid the use of non ASCII file names in PATH
and recommend that ASCII file names be used where possible, but I also
agree that where ASCII cannot serve, only UTF-8 should be used.


Don Armstrong

-- 
Unix, MS-DOS, and Windows NT (also known as the Good, the Bad, and
the Ugly).
 -- Matt Welsh

http://www.donarmstrong.com              http://rzlab.ucr.edu



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Mon, 01 Apr 2013 17:45:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Mon, 01 Apr 2013 17:45:04 GMT) Full text and rfc822 format available.

Message #167 received at 701081@bugs.debian.org (full text, mbox):

From: Russ Allbery <rra@debian.org>
To: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Mon, 01 Apr 2013 10:43:45 -0700
Don Armstrong <don@debian.org> writes:
> On Fri, 29 Mar 2013, Russ Allbery wrote:

>> I think we should require UTF-8 as the character encoding for file
>> names and fix the non-UTF-8 file names in the archive currently.
>> None of the other courses of action really make any sense to me.

> I think we should also forbid the use of non ASCII file names in PATH
> and recommend that ASCII file names be used where possible, but I also
> agree that where ASCII cannot serve, only UTF-8 should be used.

Yes, those sound like good ideas to me too.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Mon, 01 Apr 2013 19:27:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Helmut Grohne <helmut@subdivi.de>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Mon, 01 Apr 2013 19:27:04 GMT) Full text and rfc822 format available.

Message #172 received at 701081@bugs.debian.org (full text, mbox):

From: Helmut Grohne <helmut@subdivi.de>
To: Charles Plessy <plessy@debian.org>
Cc: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Mon, 1 Apr 2013 11:37:55 +0200
On Sun, Mar 24, 2013 at 08:01:03PM +0900, Charles Plessy wrote:
> after more than one month of discussion, we have not reached a conclusion.

Thanks for the ping.

> In the current situation there is no policy, which means that everything is
> allowed.  Indeed, there is at least one package with filenames using more than
> one set of non-ASCII characters, so no user can see correctly the names of
> every file in this package at the same time.

Some more data here. I checked sid main amd64 binary packages. The only
ones containing invalid UTF-8 sequences (and thus violating the current
proposal) are aspell-is and jpilot. This suggests that UTF-8 is already a
de facto standard. Fixing two packages shouldn't be that hard. I
have filed a wishlist bug #704446 against lintian to check for this
regardless of the outcome of this bug.
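
For reference, a check of this kind can be sketched in a few lines of Python.
This is only an assumption about how such a test could look (the helper name
is_valid_utf8 is hypothetical); it is neither the script used for the survey
nor the lintian check requested in #704446.

```python
def is_valid_utf8(name: bytes) -> bool:
    """True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"README"))                 # plain ASCII is valid UTF-8
print(is_valid_utf8("café".encode("utf-8")))    # UTF-8 encoded name
print(is_valid_utf8("café".encode("latin-1")))  # ISO-8859-1 bytes are rejected
```

The check works on bytes rather than strings because the filenames in a package
are byte sequences with no encoding attached.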

> On my side, I made a proposal with actionable items: fix the few packages that
> are not using UTF-8, and modify the Policy to reflect the current practice
> of using ASCII most of the time and other UTF-8 characters parsimoniously.

I am in favour of this solution.

 * Requiring any subset of UTF-8 has the direct benefit of being able to
   interpret all filenames used without guesswork.
 * This is in line with Fedora's policy.
 * I saw very little disagreement about whether to permit non-UTF-8
   sequences. Discussion seemed mostly to be around which subset to
   require.

> I understand very well the arguments against having any UTF-8 character at all,
> but we currently have such packages in our archive, so if there is no plan to
> modify these packages, then we can not plan to solve this bug.

I see little benefit in restricting to ASCII compared to the benefit
of restricting to UTF-8. Remember that the goal of this bug was to
make filenames machine readable. I think that further restrictions
should happen in the context of #99933. I asked not to merge these
issues, because I would like to keep the scope of this issue limited and
thus implementable.

> Can others comment on how they would like to see this bug solved?

Any proposal that limits to a subset of UTF-8 and a superset of
printable ASCII is fine with me. My preferred choice would be just
UTF-8. I have no objections to recommending the use of a subset of
printable ASCII either.

To me it appears to be a matter of wording right now. Consensus is
basically there. Implementing it would cause two policy violations
(aspell-is and jpilot), which imo is little impact.

Helmut



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sat, 06 Apr 2013 11:21:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Charles Plessy <plessy@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 06 Apr 2013 11:21:05 GMT) Full text and rfc822 format available.

Message #177 received at 701081@bugs.debian.org (full text, mbox):

From: Charles Plessy <plessy@debian.org>
To: 701081@bugs.debian.org, Helmut Grohne <helmut@subdivi.de>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 6 Apr 2013 20:20:15 +0900
Le Mon, Apr 01, 2013 at 10:39:19AM -0700, Don Armstrong a écrit :
> On Fri, 29 Mar 2013, Russ Allbery wrote:
> > I think we should require UTF-8 as the character encoding for file
> > names and fix the non-UTF-8 file names in the archive currently.
> > None of the other courses of action really make any sense to me.
> 
> I think we should also forbid the use of non ASCII file names in PATH
> and recommend that ASCII file names be used where possible, but I also
> agree that where ASCII cannot serve, only UTF-8 should be used.

Hello everybody,

Here is a somewhat clumsy proposition.

      <sec id="filenames">
        <heading>File names</heading>

        <p>
          The name of the files installed by binary packages in the system PATH 
          (namely <tt>/bin</tt>, <tt>/sbin</tt>, <tt>/usr/bin</tt>,
          <tt>/usr/sbin</tt> and <tt>/usr/games/</tt>) must be encoded in
          ASCII.
        </p>

        <p>
          The name of the files and directories installed by binary packages
          outside the system PATH must be encoded in UTF-8 and should be
          restricted to ASCII when they can be represented in that character
          set.
        </p>
      </sec>
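
To show how the two paragraphs above could be checked mechanically, here is a
rough Python sketch. The function name classify and the three result labels
are invented for this example, and only files placed directly in one of the
listed directories are treated as being in the system PATH; that is a
simplification, not part of the proposed wording.

```python
import posixpath

# Directories named in the proposed text.
PATH_DIRS = (b"/bin", b"/sbin", b"/usr/bin", b"/usr/sbin", b"/usr/games")


def classify(path: bytes) -> str:
    """Return 'ok', 'review' or 'error' for one installed path."""
    if all(b < 0x80 for b in path):
        return "ok"          # pure ASCII satisfies both paragraphs
    if posixpath.dirname(path) in PATH_DIRS:
        return "error"       # in PATH: "must be encoded in ASCII"
    try:
        path.decode("utf-8")
    except UnicodeDecodeError:
        return "error"       # elsewhere: "must be encoded in UTF-8"
    return "review"          # valid UTF-8; "should" prefer ASCII if possible


print(classify(b"/usr/bin/ls"))
print(classify("/usr/bin/café".encode("utf-8")))
print(classify("/usr/share/foo/Autorité.crt".encode("utf-8")))
print(classify(b"/usr/share/foo/caf\xe9"))
```

Whether a name "can be represented" in ASCII is a human judgement, so the
sketch only flags such names for review instead of rejecting them.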


What do you think ?

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sat, 06 Apr 2013 14:21:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sat, 06 Apr 2013 14:21:04 GMT) Full text and rfc822 format available.

Message #182 received at 701081@bugs.debian.org (full text, mbox):

From: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
To: Jonathan Nieder <jrnieder@gmail.com>, 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sat, 6 Apr 2013 16:19:34 +0200
On Sat, Mar 16, 2013 at 03:40:19PM -0700, Jonathan Nieder wrote:
> Russ Allbery wrote:
> 
> >                                 For me, allowing the correct spellings of
> > words and the correct names of things to be represented in file names is
> > important enough to rise to an ethical goal that I would advocate
> > adopting.
> 
> This.  Among the examples listed the only one I found convincing was
> 
> 	Certinomis_-_Autorité_Racine.crt

It might be advantageous for the certification authority to use UTF-8 to encode
its name, but the benefit for the user of the system is something entirely
different.

As long as the user is using a UTF-8 locale and the terminal is able to handle
the script properly, there might be little harm done. However, this need not
be the case.

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sun, 07 Apr 2013 22:33:19 GMT) Full text and rfc822 format available.

Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 07 Apr 2013 22:33:19 GMT) Full text and rfc822 format available.

Message #187 received at 701081@bugs.debian.org (full text, mbox):

From: Russ Allbery <rra@debian.org>
To: Charles Plessy <plessy@debian.org>
Cc: 701081@bugs.debian.org, Helmut Grohne <helmut@subdivi.de>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 07 Apr 2013 15:28:11 -0700
Charles Plessy <plessy@debian.org> writes:

> Hello everybody,

> Here is a somewhat clumsy proposition.

>       <sec id="filenames">
>         <heading>File names</heading>

>         <p>
>           The name of the files installed by binary packages in the system PATH 
>           (namely <tt>/bin</tt>, <tt>/sbin</tt>, <tt>/usr/bin</tt>,
>           <tt>/usr/sbin</tt> and <tt>/usr/games/</tt>) must be encoded in
>           ASCII.
>         </p>

>         <p>
>           The name of the files and directories installed by binary packages
>           outside the system PATH must be encoded in UTF-8 and should be
>           restricted to ASCII when they can be represented in that character
>           set.
>         </p>
>       </sec>

This looks good to me.  I think that strikes the right balance without
going into too many details about what justification should or shouldn't
be required for using UTF-8.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sun, 07 Apr 2013 22:42:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jonathan Nieder <jrnieder@gmail.com>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 07 Apr 2013 22:42:04 GMT) Full text and rfc822 format available.

Message #192 received at 701081@bugs.debian.org (full text, mbox):

From: Jonathan Nieder <jrnieder@gmail.com>
To: Charles Plessy <plessy@debian.org>
Cc: 701081@bugs.debian.org, Helmut Grohne <helmut@subdivi.de>
Subject: Re: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 7 Apr 2013 15:39:07 -0700
Charles Plessy wrote:

>       <sec id="filenames">
>         <heading>File names</heading>
>
>         <p>
>           The name of the files installed by binary packages in the system PATH 
>           (namely <tt>/bin</tt>, <tt>/sbin</tt>, <tt>/usr/bin</tt>,
>           <tt>/usr/sbin</tt> and <tt>/usr/games/</tt>) must be encoded in
>           ASCII.
>         </p>
>
>         <p>
>           The name of the files and directories installed by binary packages
>           outside the system PATH must be encoded in UTF-8 and should be
>           restricted to ASCII when they can be represented in that character
>           set.
>         </p>
>       </sec>

Seconded.

Thanks,
Jonathan



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sun, 07 Apr 2013 22:51:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Michael Shuler <michael@pbandjelly.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 07 Apr 2013 22:51:04 GMT) Full text and rfc822 format available.

Message #197 received at 701081@bugs.debian.org (full text, mbox):

From: Michael Shuler <michael@pbandjelly.org>
To: debian-policy@lists.debian.org
Cc: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 07 Apr 2013 17:50:12 -0500
On 04/07/2013 05:28 PM, Russ Allbery wrote:
> Charles Plessy <plessy@debian.org> writes:
>> Here is a somewhat clumsy proposition.

It sounds clear and concise to me.

>>       <sec id="filenames">
>>         <heading>File names</heading>
> 
>>         <p>
>>           The name of the files installed by binary packages in the system PATH 
>>           (namely <tt>/bin</tt>, <tt>/sbin</tt>, <tt>/usr/bin</tt>,
>>           <tt>/usr/sbin</tt> and <tt>/usr/games/</tt>) must be encoded in
>>           ASCII.
>>         </p>
> 
>>         <p>
>>           The name of the files and directories installed by binary packages
>>           outside the system PATH must be encoded in UTF-8 and should be
>>           restricted to ASCII when they can be represented in that character
>>           set.
>>         </p>
>>       </sec>
> 
> This looks good to me.  I think that strikes the right balance without
> going into too many details about what justification should or shouldn't
> be required for using UTF-8.

Agreed. As one of the concerned package maintainers, I think this sounds
fine.

-- 
Kind regards,
Michael Shuler



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sun, 07 Apr 2013 23:21:08 GMT) Full text and rfc822 format available.

Acknowledgement sent to Julian Gilbey <jdg@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 07 Apr 2013 23:21:09 GMT) Full text and rfc822 format available.

Message #202 received at 701081@bugs.debian.org (full text, mbox):

From: Julian Gilbey <jdg@debian.org>
To: Charles Plessy <plessy@debian.org>, 701081@bugs.debian.org
Cc: Helmut Grohne <helmut@subdivi.de>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Mon, 8 Apr 2013 00:18:37 +0100
On Sat, Apr 06, 2013 at 08:20:15PM +0900, Charles Plessy wrote:
> Here is a somewhat clumsy proposition.
> 
>       <sec id="filenames">
>         <heading>File names</heading>
> 
>         <p>
>           The name of the files installed by binary packages in the system PATH 
>           (namely <tt>/bin</tt>, <tt>/sbin</tt>, <tt>/usr/bin</tt>,
>           <tt>/usr/sbin</tt> and <tt>/usr/games/</tt>) must be encoded in
>           ASCII.
>         </p>

For consistency, I guess this should be /usr/games rather than
/usr/games/.

>         <p>
>           The name of the files and directories installed by binary packages
>           outside the system PATH must be encoded in UTF-8 and should be
>           restricted to ASCII when they can be represented in that character
>           set.
>         </p>
>       </sec>
> 
> 
> What do you think ?

That sounds a very reasonable proposal.

The final paragraph seems a little bit vague; would "should be
restricted to ASCII when it is possible to do so" be clearer?  For if
Unicode characters can be represented in ASCII, they almost always
would be.  This alternative wording would suggest that using
characters such as em-dashes or non-breaking spaces or the like is not
good (though I doubt people would use them as filenames of packaged
files!).
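
As a small illustration of the kind of character meant here, the following
Python sketch lists the non-ASCII characters in a name together with their
Unicode names. The helper non_ascii_chars and the sample filename are
hypothetical; only the standard unicodedata module is used.

```python
import unicodedata


def non_ascii_chars(name: str) -> list:
    """Return (code point, Unicode name) for each non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in name
        if ord(ch) > 0x7F
    ]


# An em-dash and a non-breaking space, both of which have ASCII stand-ins.
print(non_ascii_chars("release\u2014notes\u00a0v2.txt"))

# The same name written with "-" and " " contains nothing to report.
print(non_ascii_chars("release-notes v2.txt"))
```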

   Julian



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Mon, 08 Apr 2013 09:09:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Helmut Grohne <helmut@subdivi.de>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Mon, 08 Apr 2013 09:09:05 GMT) Full text and rfc822 format available.

Message #207 received at 701081@bugs.debian.org (full text, mbox):

From: Helmut Grohne <helmut@subdivi.de>
To: Charles Plessy <plessy@debian.org>
Cc: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Mon, 8 Apr 2013 11:04:30 +0200
On Sat, Apr 06, 2013 at 08:20:15PM +0900, Charles Plessy wrote:
>       <sec id="filenames">
>         <heading>File names</heading>
> 
>         <p>
>           The name of the files installed by binary packages in the system PATH 
>           (namely <tt>/bin</tt>, <tt>/sbin</tt>, <tt>/usr/bin</tt>,
>           <tt>/usr/sbin</tt> and <tt>/usr/games/</tt>) must be encoded in
>           ASCII.
>         </p>
> 
>         <p>
>           The name of the files and directories installed by binary packages
>           outside the system PATH must be encoded in UTF-8 and should be
>           restricted to ASCII when they can be represented in that character
>           set.
>         </p>
>       </sec>
> 
> 
> What do you think ?

Thanks to all involved parties for your work on this issue. I am very
much satisfied with the result and happy that it is met with consensus.
The suggestions of Julian Gilbey appear sensible, but do not touch the
general direction.

Helmut



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sun, 14 Apr 2013 09:03:09 GMT) Full text and rfc822 format available.

Acknowledgement sent to Charles Plessy <plessy@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 Apr 2013 09:03:09 GMT) Full text and rfc822 format available.

Message #212 received at 701081@bugs.debian.org (full text, mbox):

From: Charles Plessy <plessy@debian.org>
To: 701081@bugs.debian.org
Cc: Helmut Grohne <helmut@subdivi.de>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 14 Apr 2013 18:01:10 +0900
Le Mon, Apr 08, 2013 at 12:18:37AM +0100, Julian Gilbey a écrit :
> 
> For consistency, I guess this should be /usr/games rather than
> /usr/games/.
 
> The final paragraph seems a little bit vague; would "should be
> restricted to ASCII when it is possible to do so" be clearer?  For if
> Unicode characters can be represented in ASCII, they almost always
> would be.  This alternative wording would suggest that using
> characters such as em-dashes or non-breaking spaces or the like is not
> good (though I doubt people would use them as filenames of packaged
> files!).

Thanks everybody for the feedback.  I am ready to commit the patch,
updated following Julian's suggestions.  But strictly speaking, I
need one more formal seconding statement for this :)

Cheers,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sun, 14 Apr 2013 09:45:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Julian Gilbey <julian@d-and-j.net>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 Apr 2013 09:45:04 GMT) Full text and rfc822 format available.

Message #217 received at 701081@bugs.debian.org (full text, mbox):

From: Julian Gilbey <julian@d-and-j.net>
To: Charles Plessy <plessy@debian.org>, 701081@bugs.debian.org
Cc: Helmut Grohne <helmut@subdivi.de>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 14 Apr 2013 10:40:56 +0100
On Sun, Apr 14, 2013 at 06:01:10PM +0900, Charles Plessy wrote:
> Le Mon, Apr 08, 2013 at 12:18:37AM +0100, Julian Gilbey a écrit :
> > 
> > For consistency, I guess this should be /usr/games rather than
> > /usr/games/.
>  
> > The final paragraph seems a little bit vague; would "should be
> > restricted to ASCII when it is possible to do so" be clearer?  For if
> > Unicode characters can be represented in ASCII, they almost always
> > would be.  This alternative wording would suggest that using
> > characters such as em-dashes or non-breaking spaces or the like is not
> > good (though I doubt people would use them as filenames of packaged
> > files!).
> 
> Thanks everybody for the feedback.  I am ready to commit the patch,
> updated following Julian's suggestions.  But strictly speaking, I
> need one more formal seconding statement for this :)

I'm happy to second the proposal.

   Julian



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sun, 14 Apr 2013 10:00:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 Apr 2013 10:00:04 GMT) Full text and rfc822 format available.

Message #222 received at 701081@bugs.debian.org (full text, mbox):

From: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
To: Charles Plessy <plessy@debian.org>, 701081@bugs.debian.org
Cc: Helmut Grohne <helmut@subdivi.de>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 14 Apr 2013 11:58:03 +0200
On Sat, Apr 06, 2013 at 08:20:15PM +0900, Charles Plessy wrote:
> Le Mon, Apr 01, 2013 at 10:39:19AM -0700, Don Armstrong a écrit :
> > On Fri, 29 Mar 2013, Russ Allbery wrote:
> > > I think we should require UTF-8 as the character encoding for file
> > > names and fix the non-UTF-8 file names in the archive currently.
> > > None of the other courses of action really make any sense to me.
> > 
> > I think we should also forbid the use of non ASCII file names in PATH
> > and recommend that ASCII file names be used where possible, but I also
> > agree that where ASCII cannot serve, only UTF-8 should be used.
> 
> Hello everybody,
> 
> Here is a somewhat clumsy proposition.
> 
>       <sec id="filenames">
>         <heading>File names</heading>
> 
>         <p>
>           The name of the files installed by binary packages in the system PATH 
>           (namely <tt>/bin</tt>, <tt>/sbin</tt>, <tt>/usr/bin</tt>,
>           <tt>/usr/sbin</tt> and <tt>/usr/games/</tt>) must be encoded in
>           ASCII.
>         </p>
> 
>         <p>
>           The name of the files and directories installed by binary packages
>           outside the system PATH must be encoded in UTF-8 and should be
>           restricted to ASCII when they can be represented in that character
>           set.
>         </p>
>       </sec>
> 
> 
> What do you think ?

I think configuration files should also be included in the first list, because the
user is supposed to be able to interact directly with them.

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sun, 14 Apr 2013 10:30:15 GMT) Full text and rfc822 format available.

Acknowledgement sent to Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 Apr 2013 10:30:15 GMT) Full text and rfc822 format available.

Message #227 received at 701081@bugs.debian.org (full text, mbox):

From: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
To: Charles Plessy <plessy@debian.org>, 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 14 Apr 2013 12:13:59 +0200
On Sat, Apr 06, 2013 at 08:20:15PM +0900, Charles Plessy wrote:
> On Mon, Apr 01, 2013 at 10:39:19AM -0700, Don Armstrong wrote:
> > On Fri, 29 Mar 2013, Russ Allbery wrote:
> > > I think we should require UTF-8 as the character encoding for file
> > > names and fix the non-UTF-8 file names in the archive currently.
> > > None of the other courses of action really make any sense to me.
> > 
> > I think we should also forbid the use of non ASCII file names in PATH
> > and recommend that ASCII file names be used where possible, but I also
> > agree that where ASCII cannot serve, only UTF-8 should be used.
> 
> Hello everybody,
> 
> Here is a somewhat clumsy proposition.
> 
>       <sec id="filenames">
>         <heading>File names</heading>
> 
>         <p>
>           The name of the files installed by binary packages in the system PATH 
>           (namely <tt>/bin</tt>, <tt>/sbin</tt>, <tt>/usr/bin</tt>,
>           <tt>/usr/sbin</tt> and <tt>/usr/games/</tt>) must be encoded in
>           ASCII.
>         </p>

I am not sure I like the idea of indirectly defining the system PATH in the 
'File names' section. If we want policy to define the system PATH, we should do
it in 10.1, I think.

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sun, 14 Apr 2013 11:57:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Helmut Grohne <helmut@subdivi.de>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 Apr 2013 11:57:04 GMT) Full text and rfc822 format available.

Message #232 received at 701081@bugs.debian.org (full text, mbox):

From: Helmut Grohne <helmut@subdivi.de>
To: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
Cc: Charles Plessy <plessy@debian.org>, 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 14 Apr 2013 13:55:29 +0200
On Sun, Apr 14, 2013 at 11:58:03AM +0200, Bill Allombert wrote:
> I think configuration files should also be included in the first list, because the
> user is supposed to be able to interact directly with them.

I object to this extension of the proposal, because use of UTF-8
characters in conffile names is a current use case of ca-certificates.
If anything it could be treated as a "should" and turned into "must"
after working with the ca-certificates maintainers on a solution.

Helmut



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sun, 14 Apr 2013 12:27:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 Apr 2013 12:27:03 GMT) Full text and rfc822 format available.

Message #237 received at 701081@bugs.debian.org (full text, mbox):

From: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
To: Helmut Grohne <helmut@subdivi.de>, 701081@bugs.debian.org
Cc: Charles Plessy <plessy@debian.org>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 14 Apr 2013 14:22:47 +0200
On Sun, Apr 14, 2013 at 01:55:29PM +0200, Helmut Grohne wrote:
> On Sun, Apr 14, 2013 at 11:58:03AM +0200, Bill Allombert wrote:
> > I think configuration files should also be included in the first list, because the
> > user is supposed to be able to interact directly with them.
> 
> I object to this extension of the proposal, because use of UTF-8
> characters in conffile names is a current use case of ca-certificates.
> If anything it could be treated as a "should" and turned into "must"
> after working with the ca-certificates maintainers on a solution.

Why are the files in ca-certificates configuration files in the first place?
I doubt users are expected to edit PEM certificates.

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Sun, 14 Apr 2013 12:51:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Helmut Grohne <helmut@subdivi.de>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Sun, 14 Apr 2013 12:51:04 GMT) Full text and rfc822 format available.

Message #242 received at 701081@bugs.debian.org (full text, mbox):

From: Helmut Grohne <helmut@subdivi.de>
To: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr>
Cc: 701081@bugs.debian.org, Charles Plessy <plessy@debian.org>
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Sun, 14 Apr 2013 14:47:27 +0200
On Sun, Apr 14, 2013 at 02:22:47PM +0200, Bill Allombert wrote:
> Why are the files in ca-certificates configuration files in the first place?
> I doubt users are expected to edit PEM certificates.

Correction of what I said before: ca-certificates does not ship them as
conffiles, but as configuration files.

Actually they are symbolic links to the actual certificates shipped
within /usr/share. The purpose of the links is to allow the user to
remove particular certificates that she does not trust. As such, those
symbolic links express configuration choices.

As it stands I see ca-certificates as a valid use case of UTF-8
characters in configuration file names. I strongly suggest talking to
the ca-certificates maintainers before changing the policy in this
way.

The reason for reporting this bug was to get a way to interpret
filenames *now*. The proposed wording (by Charles Plessy) enables us to
do so. I would like to see further restrictions on filenames deferred to
another issue, because they have less of a perceived benefit and there is
no broad consensus or support for them. Clearly
further discussion is required for these.

Helmut



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Mon, 12 Aug 2013 23:39:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Charles Plessy <plessy@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Mon, 12 Aug 2013 23:39:04 GMT) Full text and rfc822 format available.

Message #247 received at 701081@bugs.debian.org (full text, mbox):

From: Charles Plessy <plessy@debian.org>
To: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Tue, 13 Aug 2013 08:36:15 +0900
Hello everybody,

in light of the discussion about UTF-8 on the debian-devel mailing list,
I would like to close the issue 701081 about filename encodings.

I reproduce here the addition that I worded, which was seconded by Jonathan
Nieder and Julian Gilbey and supported by others.

>       <sec id="filenames">
>         <heading>File names</heading>
> 
>         <p>
>           The name of the files installed by binary packages in the system PATH 
>           (namely <tt>/bin</tt>, <tt>/sbin</tt>, <tt>/usr/bin</tt>,
>           <tt>/usr/sbin</tt> and <tt>/usr/games/</tt>) must be encoded in
>           ASCII.
>         </p>
> 
>         <p>
>           The name of the files and directories installed by binary packages
>           outside the system PATH must be encoded in UTF-8 and should be
>           restricted to ASCII when they can be represented in that character
>           set.
>         </p>
>       </sec>
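
As an illustration only (it is not part of the proposed Policy text), the
two paragraphs above could be checked mechanically along the following
lines. This is a minimal sketch in Python: the function names and the
result labels are hypothetical, paths are assumed to be absolute and
given as raw bytes, and only direct children of the listed directories
are treated as being in the system PATH.

```python
import posixpath

# Directories named in the proposed wording as the "system PATH".
SYSTEM_PATH_DIRS = (b"/bin", b"/sbin", b"/usr/bin", b"/usr/sbin", b"/usr/games")


def is_ascii(raw: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(byte < 0x80 for byte in raw)


def is_utf8(raw: bytes) -> bool:
    """True if the bytes form a valid UTF-8 sequence."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_filename(path: bytes) -> str:
    """Classify an absolute path against the proposed wording.

    Returns one of three hypothetical labels:
      "ascii"     - pure ASCII, acceptable everywhere
      "utf8"      - valid non-ASCII UTF-8 outside the system PATH
                    (allowed, but ASCII is preferred where possible)
      "violation" - non-ASCII name in the system PATH, or not valid UTF-8
    """
    directory, name = posixpath.split(path)
    if directory in SYSTEM_PATH_DIRS:
        # First paragraph: names in the system PATH must be ASCII.
        return "ascii" if is_ascii(name) else "violation"
    # Second paragraph: everything else must be UTF-8, preferably ASCII.
    if not is_utf8(path):
        return "violation"
    return "ascii" if is_ascii(path) else "utf8"
```

For example, `check_filename(b"/usr/bin/ls")` gives "ascii", while
`check_filename(b"/usr/share/doc/\xe9.txt")` gives "violation" because the
lone byte 0xE9 is not valid UTF-8. The "utf8" result only flags a name for
human review, since a program cannot tell whether the name could have been
represented in ASCII.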

The last objections were that it does not mandate ASCII for configuration files,
and that the system PATH should not be defined here.

For the system PATH, I think that we can move the definition anytime to a new
dedicated section; it only requires somebody to work on it and propose a
wording.  Alternatively, what is in parentheses above can be turned into a
footnote.

For the configuration files, further restrictions would make some packages
non-compliant, and there is no consensus for them.  On the other hand, the
proposed patch respects the current practice, through its general
recommendation of ASCII with a "should".

Unless there are further objections, I will go ahead with the wording above
(or with the parentheses turned into a footnote).

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Policy List <debian-policy@lists.debian.org>:
Bug#701081; Package debian-policy. (Wed, 14 Aug 2013 23:21:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Charles Plessy <plessy@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Policy List <debian-policy@lists.debian.org>. (Wed, 14 Aug 2013 23:21:04 GMT) Full text and rfc822 format available.

Message #252 received at 701081@bugs.debian.org (full text, mbox):

From: Charles Plessy <plessy@debian.org>
To: 701081@bugs.debian.org
Subject: Re: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
Date: Thu, 15 Aug 2013 08:16:43 +0900
tag 701081 pending
thanks

On Tue, Aug 13, 2013 at 08:36:15AM +0900, Charles Plessy wrote:
> 
> >       <sec id="filenames">
> >         <heading>File names</heading>
> > 
> >         <p>
> >           The name of the files installed by binary packages in the system PATH 
> >           (namely <tt>/bin</tt>, <tt>/sbin</tt>, <tt>/usr/bin</tt>,
> >           <tt>/usr/sbin</tt> and <tt>/usr/games/</tt>) must be encoded in
> >           ASCII.
> >         </p>
> > 
> >         <p>
> >           The name of the files and directories installed by binary packages
> >           outside the system PATH must be encoded in UTF-8 and should be
> >           restricted to ASCII when they can be represented in that character
> >           set.
> >         </p>
> >       </sec>
 
> Unless there are further objections, I will go ahead with the wording above
> (or with the parentheses turned into a footnote).

Hello everybody,

I pushed it as it is.

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan



Added tag(s) pending. Request was from Charles Plessy <plessy@debian.org> to control@bugs.debian.org. (Wed, 14 Aug 2013 23:21:14 GMT) Full text and rfc822 format available.

Reply sent to Charles Plessy <plessy@debian.org>:
You have taken responsibility. (Mon, 28 Oct 2013 01:21:47 GMT) Full text and rfc822 format available.

Notification sent to Helmut Grohne <helmut@subdivi.de>:
Bug acknowledged by developer. (Mon, 28 Oct 2013 01:21:47 GMT) Full text and rfc822 format available.

Message #259 received at 701081-close@bugs.debian.org (full text, mbox):

From: Charles Plessy <plessy@debian.org>
To: 701081-close@bugs.debian.org
Subject: Bug#701081: fixed in debian-policy 3.9.5.0
Date: Mon, 28 Oct 2013 01:18:26 +0000
Source: debian-policy
Source-Version: 3.9.5.0

We believe that the bug you reported is fixed in the latest version of
debian-policy, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 701081@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Charles Plessy <plessy@debian.org> (supplier of updated debian-policy package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Format: 1.8
Date: Mon, 28 Oct 2013 09:40:48 +0900
Source: debian-policy
Binary: debian-policy
Architecture: source all
Version: 3.9.5.0
Distribution: unstable
Urgency: low
Maintainer: Debian Policy List <debian-policy@lists.debian.org>
Changed-By: Charles Plessy <plessy@debian.org>
Description: 
 debian-policy - Debian Policy Manual and related documents
Closes: 668394 669915 671355 676784 679326 690293 691352 697433 698030 700536 700574 701081 703022 704657 705403 706778 707077 707183 715804 720507
Changes: 
 debian-policy (3.9.5.0) unstable; urgency=low
 .
   * Policy: Document the Package-List field.
     Wording: Charles Plessy <plessy@debian.org>
     Seconded: Russ Allbery <rra@debian.org>
     Seconded: Guillem Jover <guillem@debian.org>
     Closes: #697433
   * Policy: DM-Upload-Allowed is now obsolete
     Wording: Charles Plessy <plessy@debian.org>
     Seconded: Russ Allbery <rra@debian.org>
     Seconded: Ansgar Burchardt <ansgar@debian.org>
     Seconded: Guillem Jover <guillem@debian.org>
     Closes: #679326
   * Policy: Checksums-{Sha1,Sha256} are now mandatory
     Wording: Charles Plessy <plessy@debian.org>
     Seconded: Guillem Jover <guillem@debian.org>
     Seconded: Ansgar Burchardt <ansgar@debian.org>
     Closes: #690293
   * Policy: Requirements for udebs are not well documented yet
     Wording: Russ Allbery <rra@debian.org>
     Wording: Jonathan Nieder <jrnieder@gmail.com>
     Seconded: Charles Plessy <plessy@debian.org>
     Seconded: Cyril Brulebois <kibi@debian.org>
     Seconded: Russ Allbery <rra@debian.org>
     Closes: #698030
   * Policy: install-info is run by a dpkg trigger.
     Wording: Jonathan Nieder <jrnieder@gmail.com>
     Seconded: Charles Plessy <plessy@debian.org>
     Seconded: Russ Allbery <rra@debian.org>
     Closes: #669915
   * Policy: Stop recommending to serve HTML documents from /usr/share/doc.
     Wording: Thomas Goirand <zigo@debian.org>
     Seconded: Charles Plessy <plessy@debian.org>
     Seconded: Jonathan Nieder <jrnieder@gmail.com>
     Closes: #715804
   * Policy: File names encoded in UTF-8. ASCII preferred and mandatory in PATH.
     Wording: Charles Plessy <plessy@debian.org>
     Seconded: Jonathan Nieder <jrnieder@gmail.com>
     Seconded: Julian Gilbey <julian@d-and-j.net>
     Closes: #701081
   * Policy: Document the Dgit field for Debian Source Control files.
     Wording: Ian Jackson <ijackson@chiark.greenend.org.uk>
     Seconded: Charles Plessy <plessy@debian.org>
     Seconded: Joey Hess <joeyh@debian.org>
     Seconded: Dmitrijs Ledkovs <xnox@debian.org>
     Closes: #720507
   * Policy: Remove the exception to the FHS for the /selinux directory.
     Wording: Charles Plessy <plessy@debian.org>
     Seconded: Steve Langasek <vorlon@debian.org>
     Seconded: Julien Cristau <jcristau@debian.org>
     Closes: #707183
   * Policy: on upgrades, recommend removing obsolete unchanged conf. files.
     Wording: Paul Wise <pabs@debian.org>
     Seconded: Jonathan Nieder <jrnieder@gmail.com>
     Seconded: Charles Plessy <plessy@debian.org>
     Closes: #707077
   * Policy: Control data fields must not start with a hyphen character.
     Wording: Niels Thykier <niels@thykier.net>
     Seconded: Russ Allbery <rra@debian.org>
     Seconded: Guillem Jover <guillem@debian.org>
     Closes: #706778
   * debconf_spec: Document the 'escape' capability.
     Wording: Jonathan Nieder <jrnieder@gmail.com>
     Seconded: Charles Plessy <plessy@debian.org>
     Seconded: Russ Allbery <rra@debian.org>
     Closes: #671355
   * virtual-package-names-list: removed mp3-decoder and mp3-encoder.
     Seconded: Jonathan Nieder <jrnieder@gmail.com>
     Seconded: Kurt Roeckx <kurt@roeckx.be>
     Seconded: Charles Plessy <plessy@debian.org>
     Closes: #668394
   * Clean outdated mentions of dpkg commands in appendix. Thanks, Guillem Jover
   * Remove outdated mention of dselect documentation.
     Closes: #700574.  Thanks, Guillem Jover.
   * Update dak reference from old katie name.
     Closes: #700536.  Thanks, Guillem Jover.
   * Fix typo in 8.6.4.  Thanks, Raúl Benencia.  (Closes: #691352)
   * Fix typo in 8.6.4.1.  Thanks, Salvatore Bonaccorso <carnil@debian.org>.
   * Added a warning in appendix G about diverting conffiles.
     Closes: #703022.  Thanks, Torsten Jerzembeck.
   * List build-arch and build-indep with the other required targets in 4.9.
     Closes: #704657.  Thanks, Philipp Hahn.
   * Replaced non-standard names of dpkg states by normalised ones.
     Closes: #705403
   * Clarify what is meant by "compressed" in section 10.5. (Closes: #676784)
   * Packaging: use the VCS URLs proposed by Lintian.
   * Packaging: normalised debian/control with the tool "config-model-edit".
   * Packaging: refreshed the names of the Policy Editors.
Checksums-Sha1: 
 49c5f971214313898754f1644b0ee93d2fba5b1a 1905 debian-policy_3.9.5.0.dsc
 44c176e8eb47b2ab31cf4a0f3b77d77d7b4ecea4 705836 debian-policy_3.9.5.0.tar.gz
 7d6edc865b3dd4d17f8a1f3f8a83febe0f6fa76d 1875912 debian-policy_3.9.5.0_all.deb
Checksums-Sha256: 
 39188d1779a5f79f5d742dc5909107440b236799d43df14d8ab2b1922a6f5b3e 1905 debian-policy_3.9.5.0.dsc
 2314a8daee0bbf212a5de119364d8350190f396a7104646c0540b2d80443dbf5 705836 debian-policy_3.9.5.0.tar.gz
 22def8fecebfc8332de984581602e68b584489ba512c2b9545daedd2248d7fb6 1875912 debian-policy_3.9.5.0_all.deb
Files: 
 7236d4ae22ea8706e70351241b33133f 1905 doc optional debian-policy_3.9.5.0.dsc
 e077997b84cb2463ca0cb02a03eb346b 705836 doc optional debian-policy_3.9.5.0.tar.gz
 b20057c896ccd385cb23d34cced3b5ac 1875912 doc optional debian-policy_3.9.5.0_all.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)

iQIcBAEBCgAGBQJSbbiQAAoJEMW9bI8ildUCNGUP/1R/Fy1fCZziEnf0jdWAhXK4
7hg58u4O9im+iDa47KjvmFs6lSibRn4gEGX4RbEAJzXAM7Yl8j/CNhZb3rX7vhs1
svX7zCXJSAgcYCCHiXXsR0QRWQdxuPCuAfRkx3aYsXcppWQePsr5QmhSJBADESQ1
UqLdi2bbhtV5ONJpBM5uAjIQDcIYyE3trHguPGO4aM46wJZT3RriMy09esiPMX7k
wdliCTHdUf3ozQxtnoj+FqisAP4bAGIUKVzNSFpVTbtKS9SOp/lUPCf5ePW6tQMZ
q9fB6Apigf3m9tUcrfiIhIkBM7EFtqtCT67OIdIGr3ZL5ku+DDrc83CmVvtYmsgs
+BycHdV5KCcD5LLMlFUKso0URgQSxoCYDBOgqUhDLm75fW7Sek2JRX4iLEoo4FF0
x4GQfbNmoToY8axJpU2oJYtd/pGfaZpXqWl6jvqKGIS/PaglS2F//oEIWN0q1grc
01WC2rppiumTb0eE+C7y7eQ9bIOEVqZFYYA52KHAeb7C0mzZoJ0bXR4wXrwhRZdm
8lODxmawknIFjDGZhar+Df8VdsGlvSRsP3SKDjEOu6MRJuDBEfGJtEd4BwqikYRL
gdUfNzyIIGOJF7J4Jrm7WaAoJMm9xWnBFuz/BPPAJr92ZQWr6QIoscOfeR/iPsaz
YUst3H9ResYJJ6xDGi4B
=+tfQ
-----END PGP SIGNATURE-----




Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Thu, 05 Dec 2013 07:40:26 GMT) Full text and rfc822 format available.



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Thu Apr 17 01:02:56 2014; Machine Name: buxtehude.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.