Code Comments

Parsing HTML?
I'm trying to parse an HTML file.  I want to retrieve all of the text
inside a certain tag that I find with XPath.  The DOM seems to make
this available with the innerHTML element, but I haven't found a way
to do it in Python.

Benjamin
04-03-08 11:29 AM


Re: Parsing HTML?
> I'm trying to parse an HTML file.  I want to retrieve all of the text
> inside a certain tag that I find with XPath.  The DOM seems to make
> this available with the innerHTML element, but I haven't found a way
> to do it in Python.

Have you tried http://www.google.com/search?q=python+html+parser ?

HTH,
Daniel

Daniel Fetchinson
04-03-08 11:29 AM


Re: Parsing HTML?
BeautifulSoup does what I need it to.  I was hoping, though, to find
something that would let me work with the DOM the way JavaScript works
with web browsers' implementations of it.  Specifically, I'd like to
access the innerHTML property of a DOM element.  Python's built-in
HTMLParser is event-driven (SAX-style), so I don't want to use that,
and minidom doesn't appear to implement this part of the DOM.
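
For anyone reading along: minidom has no innerHTML, but something close to it can be approximated by serializing a node's children. The following is only a minimal sketch, assuming the input is well-formed XML/XHTML (minidom will not parse tag-soup HTML). The helper name inner_html and the sample markup are made up for illustration.

```python
from xml.dom import minidom


def inner_html(node):
    """Serialize the children of a minidom node, roughly like innerHTML.

    Hypothetical helper: minidom itself does not provide this.
    """
    return "".join(child.toxml() for child in node.childNodes)


# Hypothetical sample markup; it must be well-formed for minidom to parse it.
SAMPLE = '<html><body><div id="main">Hello <b>world</b>!</div></body></html>'

doc = minidom.parseString(SAMPLE)
div = doc.getElementsByTagName("div")[0]
print(inner_html(div))  # prints: Hello <b>world</b>!
```

Calling div.toxml() instead would also include the enclosing div tags.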

On Wed, Apr 2, 2008 at 10:37 PM, Daniel Fetchinson
<fetchinson@googlemail.com> wrote: 
>
>  Have you tried http://www.google.com/search?q=python+html+parser ?
>
>  HTH,
>  Daniel
>

benash@gmail.com
04-03-08 11:29 AM


Re: Parsing HTML?
On 3 Apr, 06:59, Benjamin <ben...@gmail.com> wrote:
> I'm trying to parse an HTML file.  I want to retrieve all of the text
> inside a certain tag that I find with XPath.  The DOM seems to make
> this available with the innerHTML element, but I haven't found a way
> to do it in Python.

With libxml2dom you'd do the following:

1. Parse the file using libxml2dom.parse with html set to a true
   value.
2. Use the xpath method on the document to select the desired
   element.
3. Use the toString method on the element to get the text of the
   element (including start and end tags), or the textContent
   property to get the text between the tags.
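
The same three steps can be sketched with the standard library's xml.etree.ElementTree in place of libxml2dom. This is a swapped-in technique, not libxml2dom's API: ElementTree needs well-formed markup and supports only a subset of XPath. The sample document and variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; ElementTree requires well-formed markup.
SAMPLE = '<html><body><div id="main">Hello <b>world</b>!</div></body></html>'

# 1. Parse the document.
root = ET.fromstring(SAMPLE)

# 2. Select the desired element with a (limited) XPath expression.
element = root.find(".//div[@id='main']")

# 3a. Markup of the element, including its start and end tags.
outer = ET.tostring(element, encoding="unicode")

# 3b. Only the text between the tags, with nested tags dropped.
text = "".join(element.itertext())

print(outer)  # prints: <div id="main">Hello <b>world</b>!</div>
print(text)   # prints: Hello world!
```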

See the Package Index page for more details:

http://www.python.org/pypi/libxml2dom

Paul

Paul Boddie
04-03-08 01:44 PM

