

ASP forum archive, April 2007










 

Data mining?
Tom

2007-04-25, 7:56 am

Hi all,

I wonder if anyone can give me some help here.

I have permission from a colleague to use some data from his website.
Now I need to take the data and intermingle some of the information
with data from my database. I only have access to the HTML of the
external site; there's no XML feed or anything simple I can get it
from either. So, I was wondering if there was an easy-ish way of
parsing out the info I need from the HTML and putting it into an
array or something? This is how the HTML is formed:

<div class="primary-clear">
<p style="float:right; font-weight:bold">Tel: .</p>

<h2>xxNeed this Titlexx</h2>
<div class="data-clear">
<div>
<address>Address line 1
address line 2
Postcode
</address>
</div>
</div>
<hr />
<ul>

<li class="first">
<a href="#">more details</a>
</li>
</ul>
</div>

<div class="listing-break"></div>

<div class="primary-clear">
<p style="float:right; font-weight:bold">Tel: .</p>

<h2>xxNeed this Titlexx</h2>
<div class="data-clear">
<div>
<address>Address line 1
address line 2
Postcode
</address>
</div>
</div>
<hr />
<ul>

<li class="first">
<a href="#">more details</a>
</li>
</ul>
</div>
<div class="listing-break"></div>
<div class="primary-clear">
<p style="float:right; font-weight:bold">Tel: .</p>

<h2>xxNeed this Titlexx</h2>
<div class="data-clear">
<div>
<address>Address line 1
address line 2
Postcode
</address>
</div>
</div>
<hr />
<ul>

<li class="first">
<a href="#">more details</a>
</li>
</ul>
</div>

And so on....

Does anyone have any bright ideas? I've got as far as putting the
whole page into a string and ripping out the <head></head> and other
stuff. It's looping through those DIVs and turning them into something
I can manipulate where I'm struggling.

Thanks in advance,

Tom
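One way to do the looping Tom describes is an event-driven parser that tracks when it is inside a <div class="primary-clear"> block and collects the text of the <h2> and <address> elements there. The thread is about ASP, but the idea carries over to any language; below is a minimal sketch in Python using the standard library's html.parser. It assumes every listing looks like the sample above (one <h2> title and one <address> per block), and the names ListingParser and parse_listings are hypothetical.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect the <h2> title and <address> lines from each
    <div class="primary-clear"> block (class name taken from the sample)."""

    def __init__(self):
        super().__init__()
        self.listings = []    # one dict per primary-clear block
        self._current = None  # dict being filled, or None outside a listing
        self._div_depth = 0   # div nesting depth inside the current listing
        self._field = None    # "title" or "address" while inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self._current is None and ("class", "primary-clear") in attrs:
                self._current = {"title": "", "address": ""}
                self._div_depth = 1
            elif self._current is not None:
                self._div_depth += 1  # nested div inside the listing
        elif self._current is not None and tag in ("h2", "address"):
            self._field = "title" if tag == "h2" else "address"

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag in ("h2", "address"):
            self._field = None
        elif tag == "div":
            self._div_depth -= 1
            if self._div_depth == 0:
                # Closing tag of the listing itself: tidy up and store it.
                self._current["title"] = self._current["title"].strip()
                self._current["address"] = [
                    line.strip()
                    for line in self._current["address"].splitlines()
                    if line.strip()
                ]
                self.listings.append(self._current)
                self._current = None

    def handle_data(self, data):
        # Only keep text that appears inside <h2> or <address> of a listing.
        if self._current is not None and self._field:
            self._current[self._field] += data


def parse_listings(html):
    """Return a list of {"title": str, "address": [str, ...]} dicts."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.listings
```

Calling parse_listings on the page string gives one dict per listing, which can then be matched against rows from the local database. Counting div nesting, rather than stopping at the first </div>, matters here because each listing contains two nested divs.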

Anthony Jones

2007-04-26, 6:55 pm


"Tom" <google@tom-jordan.co.uk> wrote in message
news:1177489913.166536.19160@r30g2000prh.googlegroups.com...
> Does anyone have any bright ideas? I've got as far as putting the
> whole page into a string and ripping out the <head></head> and other
> stuff. It's looping through those DIVs and turning them into something
> I can manipulate where I'm struggling.
>
> Thanks in advance,


It looks like the HTML is XML-compliant (e.g., it uses <hr /> rather than
simply <hr>), so you might be able to get away with loading it into an XML DOM.
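The XML DOM approach can be sketched in Python with the standard library's xml.etree.ElementTree (in classic ASP the rough equivalent would be an XML DOM object such as MSXML's DOMDocument). Because the page has several top-level divs, the sketch first wraps the fragment in a single root element. It only works if the markup really is well-formed XML: an HTML entity such as &nbsp; or an unclosed <br> will make parsing fail with a ParseError. The function name parse_listings_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_listings_xml(fragment):
    """Return (title, address_lines) for each <div class="primary-clear">.

    Raises ET.ParseError if the markup is not well-formed XML.
    """
    # Several top-level <div>s are not a valid XML document on their own,
    # so wrap the fragment in a single (hypothetical) root element.
    root = ET.fromstring("<root>" + fragment + "</root>")
    listings = []
    # ElementTree supports simple [@attr='value'] predicates in findall().
    for div in root.findall(".//div[@class='primary-clear']"):
        title = (div.findtext("h2") or "").strip()
        address = div.find(".//address")
        text = "".join(address.itertext()) if address is not None else ""
        # Split the address on line breaks and drop blank lines.
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        listings.append((title, lines))
    return listings
```

If the real page is not well-formed, an event-driven HTML parser is the more forgiving choice, since it does not require every tag to be closed.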








Copyright 2008 codecomments.com