Debian Bug report logs - #849094
liblept5: Broken on s390x (+ other big endian archs)

Package: liblept5; Maintainer for liblept5 is Jeff Breidenbach <jab@debian.org>; Source for liblept5 is src:leptonlib.

Reported by: Sean Whitton <spwhitton@spwhitton.name>

Date: Thu, 22 Dec 2016 16:09:01 UTC

Severity: normal

Found in version leptonlib/1.73-6


Report forwarded to debian-bugs-dist@lists.debian.org, graham.inggs@gmail.com, mattia@mapreri.org, jim@purplerock.ca, Jeff Breidenbach <jab@debian.org>:
Bug#849094; Package liblept5. (Thu, 22 Dec 2016 16:09:04 GMT).


Acknowledgement sent to Sean Whitton <spwhitton@spwhitton.name>:
New Bug report received and forwarded. Copy sent to graham.inggs@gmail.com, mattia@mapreri.org, jim@purplerock.ca, Jeff Breidenbach <jab@debian.org>. (Thu, 22 Dec 2016 16:09:04 GMT).


Message #5 received at submit@bugs.debian.org:

From: Sean Whitton <spwhitton@spwhitton.name>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: liblept5: Broken on s390x (+ other big endian archs)
Date: Thu, 22 Dec 2016 16:05:29 +0000
Package: liblept5
Version: 1.73-6
Severity: normal
Tags: patch

Dear maintainer,

liblept looks to be broken on big endian architectures.  This was
discovered by means of the OCRmyPDF test suite.  It's failing on
s390x [1]: the broken files are emitted at the stage where OCRmyPDF
invokes liblept code, and their contents are highly suggestive of
endianness issues.

I believe the attached backported patch will fix the problem, though
I've only been able to confirm that it doesn't break building the
package.

(Many thanks to James R. Barlow, OCRmyPDF's upstream author, for
examining the broken files, and to Mattia Rizzolo for running the tests
on an s390x porterbox.)

[1] http://autopkgtest.ubuntu.com/packages/o/ocrmypdf/zesty/s390x

-- System Information:
Debian Release: stretch/sid
  APT prefers testing
  APT policy: (900, 'testing')
Architecture: i386 (i686)

Kernel: Linux 4.8.0-2-686-pae (SMP w/2 CPU cores)
Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages liblept5 depends on:
ii  libc6            2.24-8
ii  libgif7          5.1.4-0.4
ii  libjpeg62-turbo  1:1.5.1-2
ii  libopenjp2-7     2.1.2-1
ii  libpng16-16      1.6.26-6
ii  libtiff5         4.0.7-1
ii  libwebp6         0.5.1-4
ii  zlib1g           1:1.2.8.dfsg-2+b3

liblept5 recommends no packages.

liblept5 suggests no packages.

-- no debconf information

-- 
Sean Whitton
[big_endian_fix.patch (text/x-diff, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Jeff Breidenbach <jab@debian.org>:
Bug#849094; Package liblept5. (Fri, 23 Dec 2016 18:57:04 GMT).


Acknowledgement sent to Graham Inggs <ginggs@debian.org>:
Extra info received and forwarded to list. Copy sent to Jeff Breidenbach <jab@debian.org>. (Fri, 23 Dec 2016 18:57:04 GMT).


Message #10 received at 849094@bugs.debian.org:

From: Graham Inggs <ginggs@debian.org>
To: 849094@bugs.debian.org
Subject: liblept5: Broken on s390x (+ other big endian archs)
Date: Fri, 23 Dec 2016 20:55:21 +0200
I built leptonlib 1.73-6 including Sean's patch on powerpc and s390x.
I then ran ocrmypdf's test suite against it.

Test results went from:

tests/test_hocrtransform.py .
tests/test_main.py
...F......................ss.F.................................................
tests/test_pageinfo.py ....

to:

tests/test_hocrtransform.py .
tests/test_main.py
..........................ss.......FFF.........................................
tests/test_pageinfo.py ....

Tests on little-endian architectures remained successful, so it seems
to be a step in the right direction, but we aren't quite there yet.

Output of new failing tests attached.
[ocrmypdf-s390x-fail.txt (text/plain, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Jeff Breidenbach <jab@debian.org>:
Bug#849094; Package liblept5. (Sun, 25 Dec 2016 13:15:03 GMT).


Acknowledgement sent to Graham Inggs <ginggs@debian.org>:
Extra info received and forwarded to list. Copy sent to Jeff Breidenbach <jab@debian.org>. (Sun, 25 Dec 2016 13:15:03 GMT).


Message #15 received at 849094@bugs.debian.org:

From: Graham Inggs <ginggs@debian.org>
To: 849094@bugs.debian.org
Subject: liblept5: Broken on s390x (+ other big endian archs)
Date: Sat, 24 Dec 2016 21:50:52 +0200
Control: tags -1 - patch

I've just tried running ocrmypdf's test suite against the recently
released leptonlib 1.74.0 on powerpc and I get the same results I did
with 1.73 and Sean's patch, i.e. the following three tests fail:

test_autorotate[hocr]
test_autorotate[tesseract]
test_autorotate_threshold_low

Untagging the patch as it is incomplete.



Removed tag(s) patch. Request was from Graham Inggs <ginggs@debian.org> to 849094-submit@bugs.debian.org. (Sun, 25 Dec 2016 13:15:03 GMT).


Information forwarded to debian-bugs-dist@lists.debian.org, Jeff Breidenbach <jab@debian.org>:
Bug#849094; Package liblept5. (Thu, 29 Dec 2016 05:18:03 GMT).


Acknowledgement sent to James R Barlow <jim@purplerock.ca>:
Extra info received and forwarded to list. Copy sent to Jeff Breidenbach <jab@debian.org>. (Thu, 29 Dec 2016 05:18:03 GMT).


Message #22 received at 849094@bugs.debian.org:

From: James R Barlow <jim@purplerock.ca>
To: 849094@bugs.debian.org
Subject: liblept5: Broken on s390x (+ other big endian archs)
Date: Wed, 28 Dec 2016 21:15:00 -0800
I'm the ocrmypdf upstream author.

First, be aware that the test suite caches the output of OCR and autorotate
in the tests/cache folder, and the results persist between test cases and
between runs of the test suite. The cache hit/miss check is not smart
enough to pick up changes that aren't reflected in leptonica's version
number, that is, Debian changes. However, it looks to me like the test
suite is being run against a temporary folder, which should remove cache
effects. To be sure, nuke the tests/cache folder between test suite runs.
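A minimal sketch of that cache reset, assuming it is run from the top of an OCRmyPDF source tree where the cache lives in tests/cache; `clear_ocrmypdf_cache` is a hypothetical helper name, not part of OCRmyPDF.

```shell
#!/bin/sh
# Minimal sketch: clear OCRmyPDF's persisted OCR/autorotate cache so the
# next test-suite run calls tesseract/leptonica again instead of reusing
# cached results.
# Assumption: run from the top of an OCRmyPDF source tree, where the cache
# lives in tests/cache. clear_ocrmypdf_cache is a hypothetical helper name.

clear_ocrmypdf_cache() {
    # -f makes this succeed quietly when the cache is already absent.
    rm -rf -- tests/cache
}
```

Call it immediately before each run of the test suite; because `rm -f` tolerates a missing folder, it is safe to call unconditionally.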

All the failing tests relate to "check_monochrome_correlation", a function
that checks for close but not identical visual output compared to a
reference. Because of a now-fixed leptonica bug in one of the underlying
functions, I actually have a separate test that validates this helper
function, and that test passes on big endian.

The log shows that tesseract failed to properly detect page orientation and
came back with a low confidence answer. I interpret that to mean there are
endian issues in either tesseract or leptonica; the test isn't able to
distinguish.

It seems that the problem may be either a big endian issue in tesseract
alone (perhaps affecting multiple versions, since tesseract does not have
much of a test suite) or some leptonica API that tesseract invokes while
doing a page orientation check. Tesseract's test suite is very limited and
probably doesn't check for consistency here.

It looks like the patch is safe to apply and would be a net improvement,
even though it doesn't fix all of the issues my test suite finds.


You can check orientation (skipping full OCR) in tesseract 3.04.01 with:

$ tesseract -l eng -psm 0 test_image.png stdout

The output for LinnSequencer.jpg on my macOS-x64 machine is:

$ tesseract -l eng -psm 0 tests/resources/LinnSequencer.jpg stdout
Warning in pixReadMemJpeg: work-around: writing to a temp file
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 31.48
Script: Latin
Script confidence: 100.95
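Because the `-psm 0` report is line-oriented, the orientation confidence can be extracted and compared across machines with a little awk. This is a sketch assuming the tesseract 3.04 output format shown above; `orientation_confidence` and `low_orientation_confidence` are hypothetical helper names, and the threshold is whatever the caller passes, not a cutoff used by OCRmyPDF.

```shell
#!/bin/sh
# Sketch: pull the orientation confidence out of tesseract 3.04's
# "-psm 0" report (format as in the output above), read from stdin.
# orientation_confidence and low_orientation_confidence are hypothetical
# helper names, not part of tesseract or OCRmyPDF.

orientation_confidence() {
    # Print the value after "Orientation confidence:" on the first such line.
    awk -F': *' '/^Orientation confidence:/ { print $2; exit }'
}

# Exit 0 if the report on stdin has an orientation confidence below $1,
# non-zero otherwise (including when no confidence line is present).
# The threshold is the caller's choice, not OCRmyPDF's own cutoff.
low_orientation_confidence() {
    orientation_confidence |
        awk -v t="$1" 'NR == 1 { found = 1; low = ($1 < t) }
                       END { exit !(found && low) }'
}
```

For example, piping `tesseract -l eng -psm 0 tests/resources/LinnSequencer.jpg stdout` into `orientation_confidence` would be expected to print 31.48 for the macOS run above.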

From the logs, tesseract reports (orientation, confidence) = (0, 1.32) for
the same page on big endian, which means whatever data it is examining is
much noisier, i.e. probably corrupted by endian swizzling. Quite likely the
OCR output is garbage as well.
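"Endian swizzling" here means multi-byte values being read with their bytes in the opposite order. For example, the 32-bit word 0x01020304 is stored on a little-endian machine as the bytes 04 03 02 01, and a big-endian machine reading those bytes unconverted sees 0x04030201. The sketch below reports which order a host (for instance an s390x porterbox) uses by writing four known bytes and reading them back as one native 32-bit word. It assumes a POSIX `od` supporting `-A n` and `-t x4`; `host_byte_order` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report host byte order by writing the bytes 01 02 03 04 and
# reading them back as a single 4-byte hex word in native order.
# Assumes od supports -A n (no offsets) and -t x4 (4-byte hex words).
# host_byte_order is a hypothetical helper name.
host_byte_order() {
    word=$(printf '\001\002\003\004' | od -A n -t x4 | tr -d ' \n')
    case "$word" in
        01020304) echo big ;;     # e.g. s390x, powerpc
        04030201) echo little ;;  # e.g. i386, amd64
        *) echo unknown ;;
    esac
}
```

Running `host_byte_order` on the machine used for a test run is a quick way to confirm which architecture's results you are looking at.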

It might be interesting to see what the behavior differences are for
leptonica 1.73-patched, 1.74 and tesseract 3.04.01 and 4.00alpha all on big
endian. The results matrix from those combinations would probably indicate
whether to blame tesseract or leptonica.

Information forwarded to debian-bugs-dist@lists.debian.org, Jeff Breidenbach <jab@debian.org>:
Bug#849094; Package liblept5. (Tue, 03 Jan 2017 18:27:02 GMT).


Acknowledgement sent to Jeff Breidenbach <jeff@jab.org>:
Extra info received and forwarded to list. Copy sent to Jeff Breidenbach <jab@debian.org>. (Tue, 03 Jan 2017 18:27:02 GMT).


Message #27 received at 849094@bugs.debian.org:

From: Jeff Breidenbach <jeff@jab.org>
To: James R Barlow <jim@purplerock.ca>, 849094@bugs.debian.org, Stefan Weil <sw@weilnetz.de>, Dan Bloomberg <dbloomberg@google.com>
Subject: Re: Bug#849094: liblept5: Broken on s390x (+ other big endian archs)
Date: Tue, 3 Jan 2017 10:24:01 -0800
Tesseract 4 is known to not work on big endian. Stefan (on CC) is excited to
take a look if someone can give him access to a big endian machine.

There are no known endian problems with Tesseract 3 or Leptonica, but if any
are definitively found they will get immediate attention.

I am not going to apply this patch in Debian right now. Instead I will send
it upstream for consideration.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=849094

Information forwarded to debian-bugs-dist@lists.debian.org, Jeff Breidenbach <jab@debian.org>:
Bug#849094; Package liblept5. (Tue, 03 Jan 2017 18:54:02 GMT).


Acknowledgement sent to Jeff Breidenbach <jeff@jab.org>:
Extra info received and forwarded to list. Copy sent to Jeff Breidenbach <jab@debian.org>. (Tue, 03 Jan 2017 18:54:03 GMT).


Message #32 received at 849094@bugs.debian.org:

From: Jeff Breidenbach <jeff@jab.org>
To: James R Barlow <jim@purplerock.ca>, 849094@bugs.debian.org, Stefan Weil <sw@weilnetz.de>, Dan Bloomberg <dbloomberg@google.com>
Subject: Re: Bug#849094: liblept5: Broken on s390x (+ other big endian archs)
Date: Tue, 3 Jan 2017 10:51:40 -0800
I've just uploaded 1.74.1-1 to Debian, which contains something
similar to Sean's patch.

Information forwarded to debian-bugs-dist@lists.debian.org, Jeff Breidenbach <jab@debian.org>:
Bug#849094; Package liblept5. (Wed, 04 Jan 2017 07:09:04 GMT).


Acknowledgement sent to Graham Inggs <ginggs@debian.org>:
Extra info received and forwarded to list. Copy sent to Jeff Breidenbach <jab@debian.org>. (Wed, 04 Jan 2017 07:09:04 GMT).


Message #37 received at 849094@bugs.debian.org:

From: Graham Inggs <ginggs@debian.org>
To: Jeff Breidenbach <jeff@jab.org>, 849094@bugs.debian.org
Cc: James R Barlow <jim@purplerock.ca>, Stefan Weil <sw@weilnetz.de>, Dan Bloomberg <dbloomberg@google.com>
Subject: Re: Bug#849094: liblept5: Broken on s390x (+ other big endian archs)
Date: Wed, 4 Jan 2017 09:03:55 +0200
On 3 January 2017 at 20:24, Jeff Breidenbach <jeff@jab.org> wrote:
> Tesseract 4 is known to not work on big endian. Stefan (on CC) is excited to
> take a look if someone can give him access to a big endian machine.

It is possible for non-DDs to request temporary access to porterboxes,
see https://dsa.debian.org/doc/guest-account/



Information forwarded to debian-bugs-dist@lists.debian.org, Jeff Breidenbach <jab@debian.org>:
Bug#849094; Package liblept5. (Wed, 04 Jan 2017 08:15:03 GMT).


Acknowledgement sent to Stefan Weil <sw@weilnetz.de>:
Extra info received and forwarded to list. Copy sent to Jeff Breidenbach <jab@debian.org>. (Wed, 04 Jan 2017 08:15:03 GMT).


Message #42 received at 849094@bugs.debian.org:

From: Stefan Weil <sw@weilnetz.de>
To: Graham Inggs <ginggs@debian.org>, Jeff Breidenbach <jeff@jab.org>, 849094@bugs.debian.org
Cc: James R Barlow <jim@purplerock.ca>, Dan Bloomberg <dbloomberg@google.com>
Subject: Re: Bug#849094: liblept5: Broken on s390x (+ other big endian archs)
Date: Wed, 4 Jan 2017 09:13:55 +0100
On 01/04/17 08:03, Graham Inggs wrote:
> On 3 January 2017 at 20:24, Jeff Breidenbach <jeff@jab.org> wrote:
>> Tesseract 4 is known to not work on big endian. Stefan (on CC) is excited to
>> take a look if someone can give him access to a big endian machine.
>
> It is possible for non-DDs to request temporary access to porterboxes,
> see https://dsa.debian.org/doc/guest-account/
>

"People who are not yet DMs or NMs will need to find a DD who is willing 
to sponsor their request". That's what I tried to do.

Stefan




Information forwarded to debian-bugs-dist@lists.debian.org, Jeff Breidenbach <jab@debian.org>:
Bug#849094; Package liblept5. (Wed, 04 Jan 2017 19:00:04 GMT).


Acknowledgement sent to Jeff Breidenbach <jeff@jab.org>:
Extra info received and forwarded to list. Copy sent to Jeff Breidenbach <jab@debian.org>. (Wed, 04 Jan 2017 19:00:04 GMT).


Message #47 received at 849094@bugs.debian.org:

From: Jeff Breidenbach <jeff@jab.org>
To: Stefan Weil <sw@weilnetz.de>
Cc: Graham Inggs <ginggs@debian.org>, 849094@bugs.debian.org, James R Barlow <jim@purplerock.ca>, Dan Bloomberg <dbloomberg@google.com>
Subject: Re: Bug#849094: liblept5: Broken on s390x (+ other big endian archs)
Date: Wed, 4 Jan 2017 10:56:47 -0800
Sorry, I wasn't aware of the guest account thing. Probably my fault for not
reading email carefully enough. I am a Debian Developer and will sponsor
this request. Fill out the information "Information guest needs to supply
to sponsoring DD" and I will sign it.

https://dsa.debian.org/doc/guest-account/

Asking around, another option appears to be Oregon State University, which
provides access to a big endian PowerPC machine; people who work on the Go
computer language use this. I do not know which approach is easier.

http://osuosl.org/services/powerdev/request_hosting/



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Wed Jan 10 10:31:41 2018; Machine Name: buxtehude
