Code Comments
I'm trying to parse an HTML file. I want to retrieve all of the text inside a certain tag that I find with XPath. The DOM seems to make this available with the innerHTML element, but I haven't found a way to do it in Python.

> I'm trying to parse an HTML file. I want to retrieve all of the text
> inside a certain tag that I find with XPath. The DOM seems to make
> this available with the innerHTML element, but I haven't found a way
> to do it in Python.

Have you tried http://www.google.com/search?q=python+html+parser ?

HTH,
Daniel

BeautifulSoup does what I need it to. However, I was hoping to find something that would let me work with the DOM the way JavaScript works with web browsers' DOM implementations. Specifically, I'd like to be able to access the innerHTML property of a DOM element. Python's built-in HTMLParser is event-driven (SAX-style), so I don't want to use that, and the minidom doesn't appear to implement this part of the DOM.

On Wed, Apr 2, 2008 at 10:37 PM, Daniel Fetchinson <fetchinson@googlemail.com> wrote:
> Have you tried http://www.google.com/search?q=python+html+parser ?
>
> HTH,
> Daniel
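
Since minidom has no innerHTML, one possible workaround is to serialize each child of the node and join the results. This is only a rough sketch: the helper name inner_html and the sample markup are made up for illustration, and minidom is an XML parser, so it assumes well-formed input rather than real-world HTML.

```python
from xml.dom import minidom


def inner_html(node):
    """Approximate innerHTML: serialize every child of node, but not node itself."""
    return "".join(child.toxml() for child in node.childNodes)


# Hypothetical, well-formed sample markup (minidom will reject tag soup).
doc = minidom.parseString('<div id="main">Hello <b>world</b>!</div>')
div = doc.getElementsByTagName("div")[0]
print(inner_html(div))
```

Text nodes are serialized with XML escaping, so the result is markup rather than the raw text of the element.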

On 3 Apr, 06:59, Benjamin <ben...@gmail.com> wrote:
> I'm trying to parse an HTML file. I want to retrieve all of the text
> inside a certain tag that I find with XPath. The DOM seems to make
> this available with the innerHTML element, but I haven't found a way
> to do it in Python.

With libxml2dom you'd do the following:

1. Parse the file using libxml2dom.parse with html set to a true value.
2. Use the xpath method on the document to select the desired element.
3. Use the toString method on the element to get the text of the element (including start and end tags), or the textContent property to get the text between the tags.

See the Package Index page for more details:

http://www.python.org/pypi/libxml2dom

Paul
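
The same three steps (parse, select with XPath, read the markup or the text) can be sketched with the standard library's xml.etree.ElementTree instead of libxml2dom. This is a substitute technique, not the libxml2dom API: ElementTree supports only a subset of XPath and needs well-formed markup. The sample document and the helper names outer_markup, inner_markup and inner_text are assumptions made up for this example.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document.
SAMPLE = """<html><body>
<div id="content">Hello <b>bold</b> world</div>
</body></html>"""


def outer_markup(elem):
    """Element including its start and end tags (roughly toString)."""
    clone = copy.copy(elem)  # shallow copy, so the original tail is untouched
    clone.tail = None        # tostring would otherwise append the tail text
    return ET.tostring(clone, encoding="unicode")


def inner_markup(elem):
    """Markup between the tags (roughly innerHTML)."""
    # Each child's serialization already includes the text that follows it.
    return (elem.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in elem
    )


def inner_text(elem):
    """Text between the tags with markup removed (roughly textContent)."""
    return "".join(elem.itertext())


# 1. Parse the document.
root = ET.fromstring(SAMPLE)
# 2. Select the desired element with an XPath-style expression.
div = root.find(".//div[@id='content']")
# 3. Read the element as markup or as plain text.
print(outer_markup(div))
print(inner_markup(div))
print(inner_text(div))
```

For real-world HTML that is not well formed, a tolerant parser such as BeautifulSoup (mentioned earlier in the thread) or libxml2dom's HTML mode is likely the better fit.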