Debian Bug report logs - #516824
python-beautifulsoup: parser fails due to embedded javascript


Package: python-beautifulsoup; Maintainer for python-beautifulsoup is Debian Python Modules Team <python-modules-team@lists.alioth.debian.org>; Source for python-beautifulsoup is src:beautifulsoup.

Reported by: Eric Cooper <ecc@cmu.edu>

Date: Mon, 23 Feb 2009 21:09:01 UTC

Severity: important

Found in version beautifulsoup/3.1.0.1-1

Fixed in version beautifulsoup/3.2.0-1

Done: Stefano Rivera <stefanor@debian.org>

Bug is archived. No further changes may be made.

Forwarded to http://bugs.python.org/issue670664




Report forwarded to debian-bugs-dist@lists.debian.org, Decklin Foster <decklin@red-bean.com>:
Bug#516824; Package python-beautifulsoup. (Mon, 23 Feb 2009 21:09:04 GMT)

Acknowledgement sent to Eric Cooper <ecc@cmu.edu>:
New Bug report received and forwarded. Copy sent to Decklin Foster <decklin@red-bean.com>. (Mon, 23 Feb 2009 21:09:04 GMT)

Message #5 received at submit@bugs.debian.org:

From: Eric Cooper <ecc@cmu.edu>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: python-beautifulsoup: parser fails due to embedded javascript
Date: Mon, 23 Feb 2009 16:06:16 -0500
Package: python-beautifulsoup
Version: 3.1.0.1-1
Severity: important

The recent upgrade from 3.0.7 to 3.1.0 left BeautifulSoup unable to
parse HTML pages that contain particular forms of embedded
JavaScript.

Here is a small example that parses correctly with 3.0.7.

    <html>
    <head>
    <title>Not-So-Beautiful Soup</title>
    </head>
    <body>
    <script>
    function legalJS() {
	var str = '</p>';
	return 0<str.length;
    }
    </script>
    </body>
    </html>

With 3.1.0, it causes this failure:

  File "./souptest.py", line 7, in <module>
    soup = BeautifulSoup(page)
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.5/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.5/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.5/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.5/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.5/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 9, column 28
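
For anyone retesting this on a newer interpreter, here is a minimal
sketch. It assumes Python 3, where the standard library module is
html.parser rather than the Python 2.5 HTMLParser module shown in the
traceback, and it does not use BeautifulSoup at all. TagCollector and
collect are hypothetical names introduced for illustration; they are
not part of BeautifulSoup or of the original souptest.py. The sketch
feeds the same page to the parser and records which start tags are
reported and what text arrives inside the script element, so one can
check whether the script body is passed through as text or misread as
markup.

```python
from html.parser import HTMLParser

# Same page as in the report, with the script body indented by spaces.
PAGE = """<html>
<head>
<title>Not-So-Beautiful Soup</title>
</head>
<body>
<script>
function legalJS() {
    var str = '</p>';
    return 0<str.length;
}
</script>
</body>
</html>
"""


class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags and text seen inside <script>."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.script_text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Data may arrive in several chunks, so collect the pieces.
        if self._in_script:
            self.script_text.append(data)


def collect(page):
    """Parse page and return (start tags, concatenated script text)."""
    parser = TagCollector()
    parser.feed(page)
    parser.close()
    return parser.start_tags, "".join(parser.script_text)


if __name__ == "__main__":
    tags, script = collect(PAGE)
    print(tags)
    print(script)
```

If the parser treats script content as raw text, the reported start
tags should be only html, head, title, body and script, and both
'</p>' and 0<str.length should show up in the collected script text
instead of being handed to the tag handlers.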

-- System Information:
Debian Release: 5.0
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'stable'), (400, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-1-686 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages python-beautifulsoup depends on:
ii  python                        2.5.2-3    An interactive high-level object-o
ii  python-support                0.8.7      automated rebuilding support for P

python-beautifulsoup recommends no packages.

python-beautifulsoup suggests no packages.

-- no debconf information




Information forwarded to debian-bugs-dist@lists.debian.org, Decklin Foster <decklin@red-bean.com>:
Bug#516824; Package python-beautifulsoup. (Fri, 06 Mar 2009 22:03:02 GMT)

Acknowledgement sent to Gabriel Farrell <gsf@rc98.net>:
Extra info received and forwarded to list. Copy sent to Decklin Foster <decklin@red-bean.com>. (Fri, 06 Mar 2009 22:03:02 GMT)

Message #10 received at 516824@bugs.debian.org:

From: Gabriel Farrell <gsf@rc98.net>
To: 516824@bugs.debian.org
Subject: re: #516824
Date: Fri, 6 Mar 2009 17:01:04 -0500

The cause is BeautifulSoup's switch from SGMLParser to HTMLParser [1].
I've linked back to this report from the HTMLParser bug [2].

[1] http://groups.google.com/group/beautifulsoup/msg/d5a7540620538d14
[2] http://bugs.python.org/issue670664




Set Bug forwarded-to-address to 'http://bugs.python.org/issue670664'. Request was from Stefano Rivera <stefanor@debian.org> to control@bugs.debian.org. (Sun, 13 Feb 2011 20:24:06 GMT)

Added tag(s) pending. Request was from stefanor@users.alioth.debian.org to control@bugs.debian.org. (Sun, 13 Feb 2011 21:06:05 GMT)

Reply sent to Stefano Rivera <stefanor@debian.org>:
You have taken responsibility. (Mon, 14 Feb 2011 13:36:06 GMT)

Notification sent to Eric Cooper <ecc@cmu.edu>:
Bug acknowledged by developer. (Mon, 14 Feb 2011 13:36:06 GMT)

Message #19 received at 516824-close@bugs.debian.org:

From: Stefano Rivera <stefanor@debian.org>
To: 516824-close@bugs.debian.org
Subject: Bug#516824: fixed in beautifulsoup 3.2.0-1
Date: Mon, 14 Feb 2011 13:32:07 +0000
Source: beautifulsoup
Source-Version: 3.2.0-1

We believe that the bug you reported is fixed in the latest version of
beautifulsoup, which is due to be installed in the Debian FTP archive:

beautifulsoup_3.2.0-1.debian.tar.gz
  to main/b/beautifulsoup/beautifulsoup_3.2.0-1.debian.tar.gz
beautifulsoup_3.2.0-1.dsc
  to main/b/beautifulsoup/beautifulsoup_3.2.0-1.dsc
beautifulsoup_3.2.0.orig.tar.gz
  to main/b/beautifulsoup/beautifulsoup_3.2.0.orig.tar.gz
python-beautifulsoup_3.2.0-1_all.deb
  to main/b/beautifulsoup/python-beautifulsoup_3.2.0-1_all.deb



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 516824@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Stefano Rivera <stefanor@debian.org> (supplier of updated beautifulsoup package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.8
Date: Mon, 14 Feb 2011 15:15:21 +0200
Source: beautifulsoup
Binary: python-beautifulsoup
Architecture: all source
Version: 3.2.0-1
Distribution: unstable
Urgency: low
Maintainer: Debian Python Modules Team <python-modules-team@lists.alioth.debian.org>
Changed-By: Stefano Rivera <stefanor@debian.org>
Closes: 516824 530408 564160 607864 612875
Description: 
 python-beautifulsoup - error-tolerant HTML parser for Python
Changes: 
 beautifulsoup (3.2.0-1) unstable; urgency=low
 .
   * Adopting beautifulsoup for Debian Python Modules Team. (Closes: #612875)
   * New upstream version.
     - The 3.2 release reverts back to the 3.0 SGMLParser approach.
       (Closes: #564160, LP: #392968)
     - <script> blocks are correctly handled again
       (Closes: #516824, LP: #357067)
     - Upstream no longer ships a changelog. (Closes: #530408)
   * Bump standards version to 3.9.1. Moved into python section.
   * Switch to Source Format 3.0 (quilt).
   * Switch to dh_python2.
     - Use X-Python-Version.
   * debian/control:
     - Drop -XB-Python-Version. Deprecated.
     - Drop Provides, Replaces, Conflicts. Versioned package names for Python
       modules are deprecated. No supported releases have packages requiring
       them.
     - Add Homepage.
     - Add Vcs- URLs.
     - Recommend python-chardet.
   * Bump debhelper dependency and compat level to 8.
   * Use DEP5 format debian/copyright.
   * Add watch file. (Closes: #607864)
   * Don't install tests as an example.
   * debian/rules:
     - Use minimal dh 7 style.
     - Run test suite during build.
Checksums-Sha1: 
 9f6a2feaf58c1b1005b30e3835ae0fa3bde4ecf8 1990 beautifulsoup_3.2.0-1.dsc
 924eb4e43144e233e3749edadc8dc5cd8ec8a3be 31056 beautifulsoup_3.2.0.orig.tar.gz
 924186032a3c60b223e0ec0b409b4fadeb0f7f13 3670 beautifulsoup_3.2.0-1.debian.tar.gz
 4c88acf8baf403e5d4457a893f1d455a4a5d5381 35096 python-beautifulsoup_3.2.0-1_all.deb
Checksums-Sha256: 
 6cee1ac85aea015711438520d89eba61960cfd99a20248b36fb73027ce04c78f 1990 beautifulsoup_3.2.0-1.dsc
 a0ea3377a1055bf2e17594c0808414afb65e11f25ce8998f1ed3e9b871de6ff6 31056 beautifulsoup_3.2.0.orig.tar.gz
 1f3cbf4b57dfb6e54672e346e293b3fc19988e04310cc19ce269a5c858d73ed3 3670 beautifulsoup_3.2.0-1.debian.tar.gz
 f416bbc498134b7fbadadde8d105b27f06a3dc979c3973c357e960681cbb0bff 35096 python-beautifulsoup_3.2.0-1_all.deb
Files: 
 68af49baa10a97790f920c3b96fe7dcb 1990 python optional beautifulsoup_3.2.0-1.dsc
 ef1e78f7689ea61314f7bddebcfde88c 31056 python optional beautifulsoup_3.2.0.orig.tar.gz
 d83055bff64933b2d4e0c529dfcefe3a 3670 python optional beautifulsoup_3.2.0-1.debian.tar.gz
 50781e75f291dff3bd8591741d03ab52 35096 python optional python-beautifulsoup_3.2.0-1_all.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQIcBAEBAgAGBQJNWSvzAAoJEACQ/CG1zRrMyW4P/jy2uC1++pBlm2p6dB/lvHTD
0yXG++OkHkMkiB4uMQS1Tqxs3v84mgiEZu0htvs7OmQWQeV/5m9vojMCsxwTbVzX
vTkRBguCrxC1mwryICXN2lzShC0q2qgkgxTcnhEVT4x6f+3zY13GoDZ89DtjMQ5v
fLsb+fQBXC8QzVIRL0Yg+GK8HjQXo55D59ijFypQgYMmsiQh00UqLct+Ilxp8N/+
VAKgMlFJBi79F7ywbhlWivgo+o8jwpAFWQn5Z4BX0WAB0vCGex56SLU36Qf0Hvps
jtvCQ+8hMOm9Z49qfsSjVNBgdPwxCJB49cPaDFYBkyNKT/msvScGKmxdMt8amt4x
NM7u6x3iPm91YYgbTRX/529bn1wwSTlbGr/OifAZRhWy5gQmn4evdhk3bJyvOY6f
blNkHoA0+w75X9g+BkdGAFkNjQErJLv+tuhU9n+tgjQ7gB9Z6DVsu98tC1ArMWSv
O9q67JogtRQBtXR1tJ5GkX/YHG1A/UjrO6iHWn779yPxNeqMpqJKUQw3AnavGBET
p9MLb7MxPWmRYGeM7CROSdIjKrSY4Nbymj9f35AUePsaccsC1PguoxefOd6oRAr+
EIUbXmC0WhPvIak4JrqA009wzUyCTHEyV0e4+Ym5cgtlYjHut7S+3gJkUUHNM1v3
S2Pgr9NEgDkXrJhlQRQx
=1hns
-----END PGP SIGNATURE-----





Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Fri, 25 Mar 2011 07:31:14 GMT)



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sun Apr 20 17:07:59 2014; Machine Name: buxtehude.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.