Debian Bug report logs - #698258
RFP: python-charade -- universal encoding detector for Python 2 and Python 3

Package: wnpp; Maintainer for wnpp is wnpp@debian.org;

Reported by: Daniele Tricoli <eriol@mornie.org>

Date: Wed, 16 Jan 2013 02:33:01 UTC

Severity: wishlist

Blocking fix for 729781: ITP: subliminal -- Command-line tool to search and download subtitles


Report forwarded to debian-bugs-dist@lists.debian.org, debian-devel@lists.debian.org, wnpp@debian.org:
Bug#698258; Package wnpp. (Wed, 16 Jan 2013 02:33:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Daniele Tricoli <eriol@mornie.org>:
New Bug report received and forwarded. Copy sent to debian-devel@lists.debian.org, wnpp@debian.org. (Wed, 16 Jan 2013 02:33:04 GMT) Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: Daniele Tricoli <eriol@mornie.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: ITP: python-charade -- universal encoding detector for Python 2 and Python 3
Date: Wed, 16 Jan 2013 03:29:20 +0100
Package: wnpp
Severity: wishlist
Owner: Daniele Tricoli <eriol@mornie.org>

* Package name    : python-charade
  Version         : 1.0.1
  Upstream Author : Ian Cordasco <graffatcolmingov@gmail.com>
* URL             : https://github.com/sigmavirus24/charade
* License         : LGPL
  Programming Lang: Python
  Description     : universal encoding detector for Python 2 and Python 3

 python-charade is a port of Mark Pilgrim's chardet with support for both
 Python 2 and Python 3.

 Supported encodings:

   - ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
   - Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and
     Simplified Chinese)
   - EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese)
   - EUC-KR, ISO-2022-KR (Korean)
   - KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251
     (Cyrillic)
   - ISO-8859-2, windows-1250 (Hungarian)
   - ISO-8859-5, windows-1251 (Bulgarian)
   - windows-1252 (English)
   - ISO-8859-7, windows-1253 (Greek)
   - ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
   - TIS-620 (Thai)

The package will be maintained under the umbrella of the DPMT and it's
a dependency for the new version (1.1.0) of python-requests.



Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Daniele Tricoli <eriol@mornie.org>:
Bug#698258; Package wnpp. (Thu, 17 Jan 2013 11:42:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Piotr Ożarowski <piotr@debian.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Daniele Tricoli <eriol@mornie.org>. (Thu, 17 Jan 2013 11:42:03 GMT) Full text and rfc822 format available.

Message #10 received at 698258@bugs.debian.org (full text, mbox):

From: Piotr Ożarowski <piotr@debian.org>
To: 698258@bugs.debian.org
Subject: Re: Bug#698258: ITP: python-charade -- universal encoding detector for Python 2 and Python 3
Date: Thu, 17 Jan 2013 12:38:12 +0100
>  python-charade is a port of Mark Pilgrim's chardet with support for both
>  Python 2 and Python 3.

if Python 3 support is the only reason why it was forked, note that we
already have python3-chardet in Debian. Are there any other advantages?

> The package will be maintained under the umbrella of the DPMT and it's
> a dependency for the new version (1.1.0) of python-requests.

can requests use chardet?



Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org:
Bug#698258; Package wnpp. (Fri, 18 Jan 2013 02:45:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Daniele Tricoli <eriol@mornie.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org. (Fri, 18 Jan 2013 02:45:04 GMT) Full text and rfc822 format available.

Message #15 received at 698258@bugs.debian.org (full text, mbox):

From: Daniele Tricoli <eriol@mornie.org>
To: Piotr Ożarowski <piotr@debian.org>, 698258@bugs.debian.org
Subject: Re: Bug#698258: ITP: python-charade -- universal encoding detector for Python 2 and Python 3
Date: Fri, 18 Jan 2013 03:42:55 +0100
Hello Piotr,
thanks for your comments!

On Thursday 17 January 2013 12:38:12 Piotr Ożarowski wrote:
> >  python-charade is a port of Mark Pilgrim's chardet with support for
> >  both Python 2 and Python 3.
> 
> if Python 3 support is the only reason why it was forked, note that we
> already have python3-chardet in Debian. Are there any other advantages?

Python 3 support is not what made me think about packaging
python-charade: right now python3-requests 0.12.1-1 is using python3-chardet.

Note that when I sent the ITP I missed that the following holds only for
the development version. I took the project information from the git
repository but overlooked that the default branch is the development one:

Inside clean and isolated virtualenv:

Python 2.7.3 (default, Jan  2 2013, 13:56:14) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import charade
>>> data = open('bom-utf-16-be.srt', 'rb').read()
>>> charade.detect(data)
{'confidence': 1.0, 'encoding': 'UTF-16BE'}

Python 3.2.3 (default, Sep 10 2012, 11:22:57) 
[GCC 4.7.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import charade
>>> data = open('bom-utf-16-be.srt', 'rb').read()
>>> charade.detect(data)
{'confidence': 1.0, 'encoding': 'UTF-16BE'}

Here, instead, the system wide Debian python-chardet:

Python 2.7.3 (default, Jan  2 2013, 13:56:14) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import chardet
>>> data = open('bom-utf-16-be.srt', 'rb').read()
>>> chardet.detect(data)
{'confidence': 1.0, 'encoding': 'UTF-16BE'}

Python 3.2.3 (default, Sep 10 2012, 11:22:57) 
[GCC 4.7.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import chardet
>>> data = open('bom-utf-16-be.srt', 'rb').read()
>>> chardet.detect(data)
{'confidence': 0.5, 'encoding': 'windows-1252'}

Is it worth backporting to python-chardet? Right now charade doesn't differ
too much from it, but in the future it might.
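
For reference, the BOM check that the sessions above exercise can be
sketched with the standard library alone. This is only a minimal
illustration, not the code that chardet or charade actually use; the
detect_bom helper and the sample bytes standing in for
'bom-utf-16-be.srt' are assumptions made for the example.

```python
import codecs

# BOM signatures, UTF-32 first: the UTF-32-LE BOM begins with the same
# two bytes as the UTF-16-LE BOM, so the longer one must be tried first.
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def detect_bom(data):
    """Return the encoding implied by a leading BOM, or None (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# Hypothetical stand-in for the contents of 'bom-utf-16-be.srt'.
text = "1\n00:00:01,000 --> 00:00:02,000\nHello\n"
sample = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

print(detect_bom(sample))
```

Because the input is compared as bytes against byte constants, the same
check behaves identically on Python 2 and Python 3.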
 
> > The package will be maintained under the umbrella of the DPMT and it's
> > a dependency for the new version (1.1.0) of python-requests.
> 
> can requests use chardet?

Right now, yes, since the two codebases don't differ much. requests is
currently embedding charade 1.0.1, so there should be no problems.

Maybe I can just update requests using python-chardet for now, but I'm a 
bit worried about that missed detection on Python 3.

What do you suggest?

Kind regards,

-- 
 Daniele Tricoli 'Eriol'
 http://mornie.org

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org:
Bug#698258; Package wnpp. (Mon, 21 Jan 2013 11:45:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Daniele Tricoli <eriol@mornie.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org. (Mon, 21 Jan 2013 11:45:03 GMT) Full text and rfc822 format available.

Message #20 received at 698258@bugs.debian.org (full text, mbox):

From: Daniele Tricoli <eriol@mornie.org>
To: 698258@bugs.debian.org
Subject: Re: Bug#698258: ITP: python-charade -- universal encoding detector for Python 2 and Python 3
Date: Mon, 21 Jan 2013 12:43:55 +0100
On Friday 18 January 2013 03:42:55 Daniele Tricoli wrote:
> Maybe I can just update requests using python-chardet for now, but I'm
> a  bit worried about that missed detection on Python 3.

Using python(3)-chardet, all of requests' tests pass, so for now I'm going
to use it. I'm keeping this bug open in case we need to switch to charade
in the future.

Kind regards,

-- 
 Daniele Tricoli 'Eriol'
 http://mornie.org



Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org:
Bug#698258; Package wnpp. (Tue, 19 Mar 2013 22:03:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Daniele Tricoli <eriol@mornie.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org. (Tue, 19 Mar 2013 22:03:04 GMT) Full text and rfc822 format available.

Message #25 received at 698258@bugs.debian.org (full text, mbox):

From: Daniele Tricoli <eriol@mornie.org>
To: Ian Cordasco <graffatcolmingov@gmail.com>
Cc: 698258@bugs.debian.org
Subject: Re: Bug#698258: ITP: python-charade -- universal encoding detector for Python 2 and Python 3
Date: Tue, 19 Mar 2013 22:58:37 +0100
Hello Ian,
sorry for this late response; I was a bit busy this month.

I'm CC'ing the BTS so your reply can be read by anyone.

On Wednesday 13 February 2013 15:42:51 Ian Cordasco wrote:
> I'm the maintainer of charade and I just noticed your email thread on
> Debian. I just wanted to point out that requests is using charade
> because of how it vendors all of its dependencies.

Many thanks for your reply!

> Trying to have both python-chardet and python3-chardet in the same 
> package was causing a lot of problems and overall hair-pulling. charade 
> is just a way of supporting both python 2 and python 3 in the same 
> package without needing separate versions for separate python versions.

I understand the problem of having both python-chardet and python3-chardet
in the same package but this problem doesn't exist in Debian, so I can't
rely on this to bring python-charade into Debian.

Don't think Debian just doesn't care about it, but having two similar
projects in the archive has a cost, so we have to illustrate the advantages
clearly.

> It's still being improved and we are adding new encodings as well, so 
> your effort to add it to the Debian repositories was not in vain. We 
> don't yet have support for those encodings, but I'm planning on working 
> on them over the next couple weekends. If you're interested in helping, 
> that would be awesome.

Enhancements in python-charade are a very good argument for bringing it into
Debian, and I underlined your enhancements for this reason.

I don't have a lot of time at the moment but I will try to devote a bit of 
time to help!
 
> Just thought I'd give you the short story as to why requests uses (and
> loves) charade, and give you some more reasons for arguing your case
> in the future.

Many thanks for your words! I will keep an eye on this ITP.

Kind regards,

-- 
 Daniele Tricoli 'Eriol'
 http://mornie.org



Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Daniele Tricoli <eriol@mornie.org>:
Bug#698258; Package wnpp. (Wed, 20 Mar 2013 02:45:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ian Cordasco <graffatcolmingov@gmail.com>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Daniele Tricoli <eriol@mornie.org>. (Wed, 20 Mar 2013 02:45:04 GMT) Full text and rfc822 format available.

Message #30 received at 698258@bugs.debian.org (full text, mbox):

From: Ian Cordasco <graffatcolmingov@gmail.com>
To: Daniele Tricoli <eriol@mornie.org>
Cc: 698258@bugs.debian.org
Subject: Re: Bug#698258: ITP: python-charade -- universal encoding detector for Python 2 and Python 3
Date: Tue, 19 Mar 2013 22:41:19 -0400
On Tue, Mar 19, 2013 at 5:58 PM, Daniele Tricoli <eriol@mornie.org> wrote:
> Hello Ian,
> sorry for this late response; I was a bit busy this month.

No worries

>> Trying to have both python-chardet and python3-chardet in the same
>> package was causing a lot of problems and overall hair-pulling. charade
>> is just a way of supporting both python 2 and python 3 in the same
>> package without needing separate versions for separate python versions.
>
> I understand the problem of having both python-chardet and python3-chardet
> in the same package but this problem doesn't exist in Debian, so I can't
> rely on this to bring python-charade into Debian.
>
> Don't think Debian just doesn't care about it, but having two similar
> projects in the archive has a cost, so we have to illustrate the advantages
> clearly.

I was only concerned because, if I remember correctly, Debian has a
python-requests package. As someone who doesn't use Debian (or Ubuntu)
I was concerned (as a maintainer of requests) that the vendored
dependencies were being stripped and other Debian packages used in
their stead.

>> It's still being improved and we are adding new encodings as well, so
>> your effort to add it to the Debian repositories was not in vain. We
>> don't yet have support for those encodings, but I'm planning on working
>> on them over the next couple weekends. If you're interested in helping,
>> that would be awesome.
>
> Enhancements in python-charade are a very good argument for bringing it into
> Debian, and I underlined your enhancements for this reason.
>
> I don't have a lot of time at the moment but I will try to devote a bit of
> time to help!

Yeah, frankly, I have been very busy since sending that email and
finding the information needed to add those character sets has not
been... simple. Hopefully I will finish my degree uneventfully and
find more time afterward.



Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org:
Bug#698258; Package wnpp. (Fri, 26 Apr 2013 19:54:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Daniele Tricoli <eriol@mornie.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org. (Fri, 26 Apr 2013 19:54:04 GMT) Full text and rfc822 format available.

Message #35 received at 698258@bugs.debian.org (full text, mbox):

From: Daniele Tricoli <eriol@mornie.org>
To: Ian Cordasco <graffatcolmingov@gmail.com>
Cc: 698258@bugs.debian.org
Subject: Re: Bug#698258: ITP: python-charade -- universal encoding detector for Python 2 and Python 3
Date: Fri, 26 Apr 2013 21:49:49 +0200
Hello Ian,
sorry for this late response.

[CCing Piotr to get his attention]

On Wednesday 20 March 2013 03:41:19 Ian Cordasco wrote:
> I was only concerned because, if I remember correctly, Debian has a
> python-requests package. As someone who doesn't use Debian (or Ubuntu)
> I was concerned (as a maintainer of requests) that the vendored
> dependencies were being stripped and other Debian packages used in
> their stead.

Yes, Debian has a python-requests package: I'm the maintainer of the
package, but I'm also a user of python-requests, so I want it in the best
shape ;)

I understand your concern, and I want to make clear why, as the maintainer
of python-requests, I decided to use chardet instead of charade: after doing a
diff between charade 1.0.1 and the version of chardet we have in Debian, there
was almost no difference.

My plan was to get charade into Debian before uploading a new version of
requests, but yesterday Thomas Goirand uploaded requests 1.2.0: I discovered
this by looking at the commit log on #debian-python :)

As you can tell from this ITP, I'm in favor of having charade in Debian
and this is my advocacy:
1) charade has an active upstream (which is very kind and supportive) :)
2) charade fixes long-standing bugs of chardet
3) charade is used by requests, a very popular library

Ian, can you add something more? I think points 1 and 2 are enough to have
charade in Debian.

I plan to start packaging charade so that I can upload a new revision of
requests 1.2.0 that uses charade instead of chardet.

Piotr, do you have any concerns?

Kind regards,

-- 
 Daniele Tricoli 'Eriol'
 http://mornie.org

Added indication that bug 698258 blocks 729781 Request was from Etienne Millon <me@emillon.org> to control@bugs.debian.org. (Mon, 18 Nov 2013 13:15:07 GMT) Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Daniele Tricoli <eriol@mornie.org>:
Bug#698258; Package wnpp. (Mon, 25 Nov 2013 20:51:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Piotr Ozarowski <ozarow@gmail.com>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Daniele Tricoli <eriol@mornie.org>. (Mon, 25 Nov 2013 20:51:05 GMT) Full text and rfc822 format available.

Message #42 received at 698258@bugs.debian.org (full text, mbox):

From: Piotr Ozarowski <ozarow@gmail.com>
To: 698258@bugs.debian.org
Subject: Re: Bug#698258: ITP: python-charade -- universal encoding detector for Python 2 and Python 3
Date: Mon, 25 Nov 2013 21:47:31 +0100
[Daniele Tricoli, 2013-11-25]
> Piotr, do you have any concerns?

not anymore, please send me RFS mail when it's ready
(you can also add python-chardet package that imports * from charade
and I will remove chardet from Debian)
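
The compatibility package suggested above could be as small as a single
star import. What follows is a minimal sketch under stated assumptions, not
an actual Debian package: the charade module here is a hypothetical
stand-in built in memory so the example runs without charade installed,
and its detect() returns a placeholder value.

```python
import sys
import types

# Hypothetical stand-in for the real charade module. Only the name
# detect() comes from the thread; its body here is a placeholder.
charade = types.ModuleType("charade")


def _detect(data):
    # Placeholder result; real detection logic is out of scope here.
    return {"encoding": "ascii", "confidence": 1.0}


charade.detect = _detect
charade.__all__ = ["detect"]
sys.modules["charade"] = charade

# The shim itself: what a compatibility chardet/__init__.py could contain.
SHIM_SOURCE = "from charade import *\n"

# Build the shim as an in-memory module instead of a file on disk.
chardet = types.ModuleType("chardet")
exec(SHIM_SOURCE, chardet.__dict__)

# Code written against the old name keeps working unchanged.
print(chardet.detect(b"hello"))
```

With such a shim, callers that import the old name get the same function
objects as callers that import charade directly.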

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, Daniele Tricoli <eriol@mornie.org>:
Bug#698258; Package wnpp. (Fri, 14 Feb 2014 13:51:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Etienne Millon <me@emillon.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, Daniele Tricoli <eriol@mornie.org>. (Fri, 14 Feb 2014 13:51:04 GMT) Full text and rfc822 format available.

Message #47 received at 698258@bugs.debian.org (full text, mbox):

From: Etienne Millon <me@emillon.org>
To: 698258@bugs.debian.org
Subject: Re: Bug#698258: ITP: python-charade -- universal encoding detector for Python 2 and Python 3
Date: Fri, 14 Feb 2014 14:49:41 +0100
Hi,

According to pypi, charade has been merged into chardet.
Is packaging charade still relevant?

* https://pypi.python.org/pypi/chardet:
> This is a continuation of Mark Pilgrim's excellent chardet.
> Previously, two versions needed to be maintained: one that supported
> python 2.x and one that supported python 3.x. We've recently merged
> with Ian Cordasco's charade fork, so now we have one coherent
> version that works for Python 2.6+.

-- 
Etienne Millon



Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org:
Bug#698258; Package wnpp. (Fri, 14 Feb 2014 14:12:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Daniele Tricoli <eriol@mornie.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org. (Fri, 14 Feb 2014 14:12:05 GMT) Full text and rfc822 format available.

Message #52 received at 698258@bugs.debian.org (full text, mbox):

From: Daniele Tricoli <eriol@mornie.org>
To: Etienne Millon <me@emillon.org>
Cc: 698258@bugs.debian.org, control@bugs.debian.org, piotr@debian.org
Subject: Re: Bug#698258: ITP: python-charade -- universal encoding detector for Python 2 and Python 3
Date: Fri, 14 Feb 2014 15:09:07 +0100
retitle 698258 RFP: python-charade -- universal encoding detector for Python 2 and Python 3
noowner 698258
thanks

Hello Etienne,

On Friday 14 February 2014 14:49:41 Etienne Millon wrote:
> According to pypi, charade has been merged into chardet.
> Is packaging charade still relevant?

Thanks for reminding me that I had to close this bug.
Yes, I talked to Piotr after discovering the merge, and
packaging charade makes no sense anymore:
closing this bug was on my TODO list, thanks for the reminder!

Piotr, I can help updating python-chardet if you agree.

Kind regards,

-- 
 Daniele Tricoli 'Eriol'
 http://mornie.org

Changed Bug title to 'RFP: python-charade -- universal encoding detector for Python 2 and Python 3' from 'ITP: python-charade -- universal encoding detector for Python 2 and Python 3' Request was from Daniele Tricoli <eriol@mornie.org> to control@bugs.debian.org. (Fri, 14 Feb 2014 14:12:18 GMT) Full text and rfc822 format available.

Removed annotation that Bug was owned by Daniele Tricoli <eriol@mornie.org>. Request was from Daniele Tricoli <eriol@mornie.org> to control@bugs.debian.org. (Fri, 14 Feb 2014 14:12:19 GMT) Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org:
Bug#698258; Package wnpp. (Fri, 14 Feb 2014 14:51:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Piotr Ożarowski <piotr@debian.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org. (Fri, 14 Feb 2014 14:51:05 GMT) Full text and rfc822 format available.

Message #61 received at 698258@bugs.debian.org (full text, mbox):

From: Piotr Ożarowski <piotr@debian.org>
To: Daniele Tricoli <eriol@mornie.org>
Cc: Etienne Millon <me@emillon.org>, 698258@bugs.debian.org
Subject: Re: Bug#698258: ITP: python-charade -- universal encoding detector for Python 2 and Python 3
Date: Fri, 14 Feb 2014 15:39:27 +0100
[Daniele Tricoli, 2014-02-14]
> On Friday 14 February 2014 14:49:41 Etienne Millon wrote:
> > According to pypi, charade has been merged into chardet.
> > Is packaging charade still relevant?
> 
> Thanks for reminding me that I had to close this bug.
> Yes, I talked to Piotr after discovering the merge, and
> packaging charade makes no sense anymore:
> closing this bug was on my TODO list, thanks for the reminder!
> 
> Piotr, I can help updating python-chardet if you agree.

https://pypi.python.org/pypi/chardet is the current one, right?

I will try to update it this weekend, feel free to add yourself to
Uploaders if you beat me to it




Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Wed Apr 23 17:45:51 2014; Machine Name: buxtehude.debian.org
