Debian Bug report logs - #516824
python-beautifulsoup: parser fails due to embedded javascript


Package: python-beautifulsoup; Maintainer for python-beautifulsoup is Debian Python Modules Team <>; Source for python-beautifulsoup is src:beautifulsoup.

Reported by: Eric Cooper <>

Date: Mon, 23 Feb 2009 21:09:01 UTC

Severity: important

Found in version beautifulsoup/

Fixed in version beautifulsoup/3.2.0-1

Done: Stefano Rivera <>

Bug is archived. No further changes may be made.

Forwarded to


Report forwarded to, Decklin Foster <>:
Bug#516824; Package python-beautifulsoup. (Mon, 23 Feb 2009 21:09:04 GMT)

Acknowledgement sent to Eric Cooper <>:
New Bug report received and forwarded. Copy sent to Decklin Foster <>. (Mon, 23 Feb 2009 21:09:04 GMT)

Message #5 received:

From: Eric Cooper <>
To: Debian Bug Tracking System <>
Subject: python-beautifulsoup: parser fails due to embedded javascript
Date: Mon, 23 Feb 2009 16:06:16 -0500
Package: python-beautifulsoup
Severity: important

The recent upgrade from 3.0.7 to 3.1.0 caused BeautifulSoup to stop
being able to parse HTML pages that contain particular forms of
embedded JavaScript.

Here is a small example that parses correctly with 3.0.7.

    <title>Not-So-Beautiful Soup</title>
    function legalJS() {
        var str = '</p>';
        return 0<str.length;

With 3.1.0, it causes this failure:

Traceback (most recent call last):
  File "./", line 7, in <module>
    soup = BeautifulSoup(page)
  File "/var/lib/python-support/python2.5/", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/var/lib/python-support/python2.5/", line 1230, in __init__
  File "/var/lib/python-support/python2.5/", line 1263, in _feed
  File "/usr/lib/python2.5/", line 108, in feed
  File "/usr/lib/python2.5/", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.5/", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.5/", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.5/", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 9, column 28
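
For readers trying to follow the failure mode, here is a minimal sketch of the behaviour a parser needs for this input: everything inside a script element has to be treated as raw text, so that `'</p>'` and `0<str.length` are never read as tags. It uses the Python 3 standard library `html.parser` module, not BeautifulSoup 3.1.0 on Python 2.5, so it does not reproduce the traceback above. The `<script>` wrapper, closing brace and surrounding page markup in `PAGE` are assumptions, since the markup of the example did not survive in this log; `ScriptCollector` and `script_text` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical reconstruction of the test page. The <script> element and the
# surrounding markup are assumed; only the title and the JavaScript lines come
# from the report.
PAGE = """<html>
<head>
<title>Not-So-Beautiful Soup</title>
<script type="text/javascript">
function legalJS() {
    var str = '</p>';
    return 0<str.length;
}
</script>
</head>
<body></body>
</html>
"""


class ScriptCollector(HTMLParser):
    """Collect the raw text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_chunks = []
        self.end_tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        # Record every end tag the parser reports as real markup.
        self.end_tags.append(tag)
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate them.
        if self.in_script:
            self.script_chunks.append(data)


def script_text(page):
    """Return (script body, list of end tags reported as markup)."""
    parser = ScriptCollector()
    parser.feed(page)
    parser.close()
    return "".join(parser.script_chunks), parser.end_tags
```

Calling `script_text(PAGE)` returns the script body as one string together with the end tags the parser actually reported. If the parser handles script content as raw text, the `</p>` inside the string literal shows up in the body rather than in the list of end tags.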

-- System Information:
Debian Release: 5.0
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'stable'), (400, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-1-686 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages python-beautifulsoup depends on:
ii  python                        2.5.2-3    An interactive high-level object-o
ii  python-support                0.8.7      automated rebuilding support for P

python-beautifulsoup recommends no packages.

python-beautifulsoup suggests no packages.

-- no debconf information

Information forwarded to, Decklin Foster <>:
Bug#516824; Package python-beautifulsoup. (Fri, 06 Mar 2009 22:03:02 GMT)

Acknowledgement sent to Gabriel Farrell <>:
Extra info received and forwarded to list. Copy sent to Decklin Foster <>. (Fri, 06 Mar 2009 22:03:02 GMT)

Message #10 received:

From: Gabriel Farrell <>
Subject: re: #516824
Date: Fri, 6 Mar 2009 17:01:04 -0500
The issue is BeautifulSoup's move to HTMLParser from SGMLParser [1].
I've linked back to here from the HTMLParser bug [2].


Set Bug forwarded-to-address to ''. Request was from Stefano Rivera <> to (Sun, 13 Feb 2011 20:24:06 GMT)

Added tag(s) pending. Request was from to (Sun, 13 Feb 2011 21:06:05 GMT)

Reply sent to Stefano Rivera <>:
You have taken responsibility. (Mon, 14 Feb 2011 13:36:06 GMT)

Notification sent to Eric Cooper <>:
Bug acknowledged by developer. (Mon, 14 Feb 2011 13:36:06 GMT)

Message #19 received:

From: Stefano Rivera <>
Subject: Bug#516824: fixed in beautifulsoup 3.2.0-1
Date: Mon, 14 Feb 2011 13:32:07 +0000
Source: beautifulsoup
Source-Version: 3.2.0-1

We believe that the bug you reported is fixed in the latest version of
beautifulsoup, which is due to be installed in the Debian FTP archive:

  to main/b/beautifulsoup/beautifulsoup_3.2.0-1.debian.tar.gz
  to main/b/beautifulsoup/beautifulsoup_3.2.0-1.dsc
  to main/b/beautifulsoup/beautifulsoup_3.2.0.orig.tar.gz
  to main/b/beautifulsoup/python-beautifulsoup_3.2.0-1_all.deb

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
Stefano Rivera <> (supplier of updated beautifulsoup package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing


Format: 1.8
Date: Mon, 14 Feb 2011 15:15:21 +0200
Source: beautifulsoup
Binary: python-beautifulsoup
Architecture: all source
Version: 3.2.0-1
Distribution: unstable
Urgency: low
Maintainer: Debian Python Modules Team <>
Changed-By: Stefano Rivera <>
Closes: 516824 530408 564160 607864 612875
Description:
 python-beautifulsoup - error-tolerant HTML parser for Python
Changes:
 beautifulsoup (3.2.0-1) unstable; urgency=low
   * Adopting beautifulsoup for Debian Python Modules Team. (Closes: #612875)
   * New upstream version.
     - The 3.2 release reverts back to the 3.0 SGMLParser approach.
       (Closes: #564160, LP: #392968)
     - <script> blocks are correctly handled again
       (Closes: #516824, LP: #357067)
     - Upstream no longer ships a changelog. (Closes: #530408)
   * Bump standards version to 3.9.1. Moved into python section.
   * Switch to Source Format 3.0 (quilt).
   * Switch to dh_python2.
     - Use X-Python-Version.
   * debian/control:
     - Drop -XB-Python-Version. Deprecated.
     - Drop Provides, Replaces, Conflicts. Versioned package names for Python
       modules are deprecated. No supported releases have packages requiring
     - Add Homepage.
     - Add Vcs- URLs.
     - Recommend python-chardet.
   * Bump debhelper dependency and compat level to 8.
   * Use DEP5 format debian/copyright.
   * Add watch file. (Closes: #607864)
   * Don't install tests as an example.
   * debian/rules:
     - Use minimal dh 7 style.
     - Run test suite during build.
Checksums-Sha1:
 9f6a2feaf58c1b1005b30e3835ae0fa3bde4ecf8 1990 beautifulsoup_3.2.0-1.dsc
 924eb4e43144e233e3749edadc8dc5cd8ec8a3be 31056 beautifulsoup_3.2.0.orig.tar.gz
 924186032a3c60b223e0ec0b409b4fadeb0f7f13 3670 beautifulsoup_3.2.0-1.debian.tar.gz
 4c88acf8baf403e5d4457a893f1d455a4a5d5381 35096 python-beautifulsoup_3.2.0-1_all.deb
Checksums-Sha256:
 6cee1ac85aea015711438520d89eba61960cfd99a20248b36fb73027ce04c78f 1990 beautifulsoup_3.2.0-1.dsc
 a0ea3377a1055bf2e17594c0808414afb65e11f25ce8998f1ed3e9b871de6ff6 31056 beautifulsoup_3.2.0.orig.tar.gz
 1f3cbf4b57dfb6e54672e346e293b3fc19988e04310cc19ce269a5c858d73ed3 3670 beautifulsoup_3.2.0-1.debian.tar.gz
 f416bbc498134b7fbadadde8d105b27f06a3dc979c3973c357e960681cbb0bff 35096 python-beautifulsoup_3.2.0-1_all.deb
Files:
 68af49baa10a97790f920c3b96fe7dcb 1990 python optional beautifulsoup_3.2.0-1.dsc
 ef1e78f7689ea61314f7bddebcfde88c 31056 python optional beautifulsoup_3.2.0.orig.tar.gz
 d83055bff64933b2d4e0c529dfcefe3a 3670 python optional beautifulsoup_3.2.0-1.debian.tar.gz
 50781e75f291dff3bd8591741d03ab52 35096 python optional python-beautifulsoup_3.2.0-1_all.deb


Bug archived. Request was from Debbugs Internal Request <> to (Fri, 25 Mar 2011 07:31:14 GMT)

Debian bug tracking system administrator <>. Last modified: Sun Apr 20 17:07:59 2014; Machine Name:

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.