| Floobee@web.de 2006-08-26, 6:57 pm |
Hello Robin, hello all,
@ Randal - I have read your complaints, and I guess you think that I want to do something harmful.
I do not want to do anything harmful. I am working on my PhD and I need to collect some more data.
I work in the field of social research, especially in the field of online research; see
http://opensource.mit.edu/online_papers.php
http://opensource.mit.edu/
My current investigation includes some analysis of online discussions.
First off, I have to explain something: I have to grab some data out of a phpBB forum in order to do some field research. I need the data from a forum that is run by a user community, so that I can analyze the discussions.
To give an example, let us take this forum here. How can I grab all the data out of this forum, get it onto my local machine, and afterwards put it into the local database of a phpBB forum? Is this possible? Am I able to grab and harvest the data out of this forum, and how can I do that?
What I have in mind is nothing harmful, nothing bad, nothing serious or dangerous.
But the issue is that I have to get the data. So what now?
I need to get the forum messages and other data (forum topics, users) into a database.
Purpose: create a copy of the forum for text analysis. Does anyone have an approximate solution?
The data has to be retrieved through HTTP for further analysis: I need to get the data through HTTP and put it into CSV, in order to get a dump that can fill the local database of a phpBB board.
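A minimal sketch of the HTTP step, written in Python's standard library rather than the Perl LWP module named in the subject line. The function names, the User-Agent string, the contact address and the five-second pause are all assumptions made for illustration, and the pause follows the "calls to sleep" suggestion quoted further down in this thread.

```python
import time
import urllib.request


def fetch_url(url, timeout=30):
    """Download one page over HTTP and return its body as text."""
    request = urllib.request.Request(
        url,
        # Hypothetical User-Agent: identify the bot so the admin can reach you.
        headers={"User-Agent": "research-crawler (contact: you@example.org)"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_all(urls, fetch=fetch_url, delay_seconds=5.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests to spare the server."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        pages[url] = fetch(url)
    return pages
```

Called with a list of thread URLs, `fetch_all` returns a dictionary that maps each URL to its page text. The `fetch` and `sleep` parameters can be swapped out, so the loop can be tried without touching a real server.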
I need the data in an almost full and complete format. So I need all the data, such as
username
forum
thread
topic
text of the posting, and so on.
See http://www.phpbbdoctor.com/doc_tables.php for a full overview.
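The field list above could be written to CSV roughly as follows. This is only a sketch in Python: the column names are taken from the list, while the function names and the shape of the `posts` records (one dictionary per posting) are assumptions, and extracting those records from the forum HTML is not shown.

```python
import csv
import io

# Column order follows the field list in the post.
FIELDS = ["username", "forum", "thread", "topic", "text"]


def posts_to_csv(posts, out):
    """Write one CSV row per posting to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for post in posts:
        # Missing fields become empty strings instead of raising an error.
        writer.writerow({name: post.get(name, "") for name in FIELDS})


def posts_to_csv_text(posts):
    """Convenience wrapper that returns the CSV as a single string."""
    buffer = io.StringIO(newline="")
    posts_to_csv(posts, buffer)
    return buffer.getvalue()
```

The `csv` module quotes commas, quotation marks and line breaks inside the posting text, which matters here because forum postings routinely contain all three.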
How do I do that? I need some kind of grabbing tool. Can I do it with that kind of tool? How do I solve the issue of storing the data in the local MySQL database? Well, you see that this is tricky work, and I am pretty sure that I will get help here. So for any and all help I am very thankful. Many thanks in advance.
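One possible answer to the storing issue, sketched in Python. Two assumptions are worth stating loudly: SQLite (from the standard library) stands in for the MySQL database so that the example runs without a server, and the table `scraped_posts` is a hypothetical flat table, not the real phpBB schema described at the phpbbdoctor.com link above.

```python
import csv
import io
import sqlite3


def load_posts(connection, csv_file):
    """Read postings from an open CSV file and insert them into the database."""
    # Hypothetical flat table; a real phpBB database spreads this over several tables.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS scraped_posts ("
        " username TEXT, forum TEXT, thread TEXT, topic TEXT, text TEXT)"
    )
    reader = csv.DictReader(csv_file)
    rows = [
        (r["username"], r["forum"], r["thread"], r["topic"], r["text"])
        for r in reader
    ]
    # Placeholders keep posting text from being interpreted as SQL.
    connection.executemany(
        "INSERT INTO scraped_posts VALUES (?, ?, ?, ?, ?)", rows
    )
    connection.commit()
    return len(rows)
```

For pure text analysis a flat table like this may already be enough; filling a working phpBB installation would additionally require mapping the rows onto its users, topics and posts tables.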
And now Robin, Randal: please, I am willing to discuss the implications that come with my ideas and my wish, but believe me:
I could run my investigations with a browser as well. I could load the 700 threads by hand. THEY ARE ONLINE, so what is the difficulty?
EVERYTHING is online. I do not really understand the difference here, but I am open to the discussion with you.
I look forward to hearing your ideas and suggestions, and yes, after the legal (and ethical) discussions I am looking forward to a technical discussion.
jobst
- an ethno-researcher
> -----Original Message-----
> From: Robin Norwood <rnorwood@redhat.com>
> Sent: 26.08.06 16:18:07
> To: merlyn@stonehenge.com (Randal L. Schwartz)
> CC: beginners@perl.org
> Subject: Re: subroutine in LWP - in order to get 700 forum threads
> merlyn@stonehenge.com (Randal L. Schwartz) writes:
>
> Really? If I understood the OP correctly, all he wants to do is 'screen
> scrape' the (public) board in question. In other words, nothing
> significantly different from what Google does when it indexes. I don't
> really see an ethical (as opposed to legal - IANAL!) problem with that.
> Of course, I would first email the admin for permission, and make *sure*
> that such a bot is 'well behaved' - such as adding calls to sleep inside
> some of those loops. After he gets the data, he could do something
> unethical with it - like republish it. But just getting the data
> doesn't seem wrong to me.
>
> As I said above, I am not a lawyer! The above should not be taken to
> mean I think it is legal to do this. But it does sound ethical to me.
>
> -RN
>
> --
> Robin Norwood
> Red Hat, Inc.
>
> "The Sage does nothing, yet nothing remains undone."
> -Lao Tzu, Te Tao Ching
>
> --
> To unsubscribe, e-mail: beginners-unsubscribe@perl.org
> For additional commands, e-mail: beginners-help@perl.org
> <http://learn.perl.org/> <http://learn.perl.org/first-response>
>
> -----Original Message-----
> From: merlyn@stonehenge.com (Randal L. Schwartz)
> Sent: 26.08.06 16:19:44
> To: Robin Norwood <rnorwood@redhat.com>
> CC: beginners@perl.org
> Subject: Re: subroutine in LWP - in order to get 700 forum threads
>
> Robin> Really? If I understood the OP correctly, all he wants to do is 'screen
> Robin> scrape' the (public) board in question. In other words, nothing
> Robin> significantly different from what Google does when it indexes. I don't
> Robin> really see an ethical (as opposed to legal - IANAL!) problem with that.
> Robin> Of course, I would first email the admin for permission, and make *sure*
> Robin> that such a bot is 'well behaved' - such as adding calls to sleep inside
> Robin> some of those loops. After he gets the data, he could do something
> Robin> unethical with it - like republish it. But just getting the data
> Robin> doesn't seem wrong to me.
>
> It's one thing to be google, and index all the pages for public use.
>
> It's entirely another to do it for your own personal gain (knowledge
> or commerce, doesn't matter).
>
> If you can't see the difference, you need to retune your ethics.
>
> --
> Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
> <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
> See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
|