

Home > Archive > PERL Miscellaneous > October 2004 > How do I parse this page?










 

Author How do I parse this page?
nntp

2004-10-26, 3:57 pm

I am trying to parse
http://www.ebay.com without success.

I view the source, and I see a lot of ?/td>. This page is unsavable.

It displays perfectly in IE, but once the source is saved/viewed, it no longer
displays right in IE.

When I use LYNX to view it, it is formatted perfectly.

My question is: how does Ebay allow any browser to display the content right
without allowing View Source or Save As?


Gregory Toomey

2004-10-26, 3:57 pm

nntp wrote:

> I am trying to parse
> http://www.ebay.com without success.


In Perl, try
http://search.cpan.org/~gaas/HTML-Parser-3.35/Parser.pm
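
As a rough illustration of the HTML::Parser suggestion above, here is a minimal sketch that collects the text of every table cell. The sample markup and the names @cells and $in_td are made up for the example; a real page would be fetched first and fed to the parser in the same way.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample markup standing in for a fetched page.
my $html = '<table><tr><td>First</td><td>Second <b>cell</b></td></tr></table>';

my @cells;        # text collected from each <td>
my $in_td = 0;    # true while the parser is inside a <td>

my $parser = HTML::Parser->new(
    api_version => 3,

    # Start tags: "tagname" is passed lower-cased.
    start_h => [
        sub {
            my ($tag) = @_;
            if ( $tag eq 'td' ) { $in_td = 1; push @cells, ''; }
        },
        'tagname',
    ],

    # End tags: leave cell-collecting mode on </td>.
    end_h => [
        sub {
            my ($tag) = @_;
            $in_td = 0 if $tag eq 'td';
        },
        'tagname',
    ],

    # Text between tags: "dtext" is the text with entities decoded.
    # Text may arrive in several chunks, so append rather than assign.
    text_h => [
        sub {
            my ($text) = @_;
            $cells[-1] .= $text if $in_td;
        },
        'dtext',
    ],
);

$parser->parse($html);
$parser->eof;    # signal end of document so buffered text is flushed

print "$_\n" for @cells;
```

Nested cells and malformed tables are not handled here; the point is only the handler-based style of the module.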

> I view the source, and I see a lot of ?/td>. This page is unsavable.


> It displays perfectly in IE, but once the source is saved/viewed, it no
> longer displays right in IE.


Maybe it uses css, or needs images to provide formatting hints.

> When I use LYNX to view it, it is formatted perfectly.
>
> My question is: how does Ebay allow any browser to display the content right
> without allowing View Source or Save As?


Please don't clutter Perl newsgroups with web server questions.

gtoomey

nntp

2004-10-26, 3:57 pm

> > I am trying to parse
>
> In Perl, try
> http://search.cpan.org/~gaas/HTML-Parser-3.35/Parser.pm
>
>
>
> Maybe it uses css, or needs images to provide formatting hints.

Have you looked at the source code of www.ebay.com?
I don't know what you mean by using images to provide formatting hints.


Toby Inkster

2004-10-26, 8:56 pm

[F'ups set to a.w.w.]

nntp wrote:

> http://www.ebay.com
> I view the source, and I see a lot of ?/td>. This page is unsavable.
> It displays perfectly in IE, but once the source is saved/viewed, it no longer
> displays right in IE. My question is: how does Ebay allow any browser to
> display the content right without allowing View Source or Save As?


IE doesn't simply show you the source when you hit the "view source"
button. Oh no. That would be too easy. It does all kinds of weird crap
first and then shows you some modified source code. I'm guessing that some
of that weird crap screws up some of the characters.

Look at the source code in a different browser and it displays fine.

Not that you should try to emulate any of that code. It's pants.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

George King

2004-10-26, 8:56 pm

"nntp" <nntp@rogers.com> wrote in message
news:_dydnarTGPNdFePcRVn-sQ@rogers.com...
>I am trying to parse
> http://www.ebay.com without success.
>
> I view the source, and I see a lot of ?/td>. This page is unsavable.
>
> It displays perfectly in IE, but once the source is saved/viewed, it no
> longer displays right in IE.
>
> When I use LYNX to view it, it is formatted perfectly.
>
> My question is: how does Ebay allow any browser to display the content right
> without allowing View Source or Save As?
>


I don't have a copy of Lynx, so I can't duplicate your problem, but...
Opera saves the file with images and IE displays it just fine from the saved
files.

Ebay.com (index.html) uses an external CSS stylesheet. It also uses a
sizeable number of external javascript files and 68 images to make up the
page I looked at.

George



Ben Morrow

2004-10-26, 8:56 pm


Quoth "nntp" <nntp@rogers.com>:
> I am trying to parse
> http://www.ebay.com without success.
>
> I view the source, and I see a lot of ?/td>. This page is unsavable.
>
> It displays perfectly in IE, but once the source is saved/viewed, it no longer
> displays right in IE.
>
> When I use LYNX to view it, it is formatted perfectly.
>
> My question is: how does Ebay allow any browser to display the content right
> without allowing View Source or Save As?


They can't. You've probably got character-set issues. Use LWP to retrieve the
page.
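
A minimal sketch of what fetching with LWP might look like, so that the bytes seen are exactly what the server sent rather than whatever a browser's "view source" shows. The helper name fetch_page and the 30-second timeout are assumptions made for this example; the LWP::UserAgent and HTTP::Response calls are standard.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# fetch_page is a hypothetical helper name, not part of LWP.
sub fetch_page {
    my ($url) = @_;

    my $ua       = LWP::UserAgent->new( timeout => 30 );
    my $response = $ua->get($url);

    die "Fetch of $url failed: " . $response->status_line . "\n"
        unless $response->is_success;

    # Print the declared content type; a charset that differs from what
    # the viewer assumes is a common source of stray "?" characters.
    my $type = $response->header('Content-Type');
    print 'Content-Type: ', ( defined $type ? $type : '(none)' ), "\n";

    # decoded_content() undoes any Content-Encoding and, for text,
    # decodes the body using the declared charset.
    my $html = $response->decoded_content;
    die "Could not decode the response body\n" unless defined $html;

    return $html;
}

# Only fetch when a URL is given on the command line.
if (@ARGV) {
    my $html = fetch_page( $ARGV[0] );
    print 'Fetched ', length($html), " characters\n";
}
```

Saved as, say, fetch.pl (a made-up file name), it could be run as `perl fetch.pl http://www.ebay.com`, and the returned string passed on to an HTML parser.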

Ben

--
I must not fear. Fear is the mind-killer. I will face my fear and
I will let it pass through me. When the fear is gone there will be
nothing. Only I will remain.
ben@morrow.me.uk Frank Herbert, 'Dune'

A. Sinan Unur

2004-10-26, 8:56 pm

"nntp" <nntp@rogers.com> wrote in
news:_dydnarTGPNdFePcRVn-sQ@rogers.com:

> I am trying to parse
> http://www.ebay.com without success.
>
> I view the source, and I see a lot of ?/td>. This page is unsavable.


That ain't true. If you have any questions on parsing HTML using
HTML::Parser, please post them here. Otherwise, this is waaay off-topic.

Sinan

Dr John Stockton

2004-10-27, 3:57 pm

JRS: In article <Xns958EB135B42A7asu1cornelledu@132.236.56.8>, dated
Tue, 26 Oct 2004 21:25:13, seen in news:comp.lang.javascript, A. Sinan
Unur <1usa@llenroc.ude.invalid> posted :
>"nntp" <nntp@rogers.com> wrote in
>news:_dydnarTGPNdFePcRVn-sQ@rogers.com:
>
>
>That ain't true. If you have any questions on parsing HTML using
>HTML::Parser, please post them here. Otherwise, this is waaay off-topic.


Please take greater, or at least better, thought before using a word
such as "here".

--
© John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4 ©
<URL:http://www.jibbering.com/faq/> JL/RC: FAQ of news:comp.lang.javascript
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.

Tad McClellan

2004-10-27, 3:57 pm

Dr John Stockton <spam@merlyn.demon.co.uk> wrote:
> JRS: In article <Xns958EB135B42A7asu1cornelledu@132.236.56.8>, dated
> Tue, 26 Oct 2004 21:25:13, seen in news:comp.lang.javascript, A. Sinan
> Unur <1usa@llenroc.ude.invalid> posted :
>
> Please take greater, or at least better, thought before using a word
> such as "here".



Please take greater, or at least better, notice of the Newsgroups
header before determining which "where" is "here".

:-)


--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas