| Author |
http site text-only read possible?
|
|
| David Frank 2005-06-06, 3:58 pm |
| I currently download/extract stock quotes within my Fortran day-trading
program by wading through the HTML for the text of interest (prices).
A typical once-a-minute read may have 25 KB of pictures etc. surrounding the
300 bytes of stock price text of interest.
Is there an internet read protocol flag that can reduce this overhead?
I note that my browser (not used for this job) has a multimedia option to
kill most of the web page display, although I'm not sure the transmission of
all those pictures doesn't still take place, with Internet Explorer
just suppressing their display.
| |
| Janne Blomqvist 2005-06-06, 3:58 pm |
| David Frank wrote:
> I currently download/extract stock quotes within my Fortran day-trading
> program by wading through the HTML for the text of interest (prices).
>
> A typical once-a-minute read may have 25 KB of pictures etc. surrounding the
> 300 bytes of stock price text of interest.
> Is there an internet read protocol flag that can reduce this overhead?
Actually, at the http protocol level you have to explicitly download
all the images and so on separately; it's just that web browsers
conveniently do this automagically for you.
> I note that my browser (not used for this job) has a multimedia option to
> kill most of the web page display, although I'm not sure the transmission of
> all those pictures doesn't still take place, with Internet Explorer
> just suppressing their display.
So what are you using to download the web page then?
Anyways, if what you're using now doesn't work you can always use
something like curl ( http://curl.haxx.se/ ) or wget (
ftp://sunsite.dk/projects/wget/windows/ ). Curl even has a library
which you probably can call directly from your code.
Or if you don't want to install any extra software, IIRC windows still
has a telnet client so you could just telnet to port 80 (in case you
somehow can send commands to the windows telnet client
non-interactively, e.g. via a file specified on the command line).
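As a rough illustration of what "telnet to port 80" amounts to, here is a minimal Python sketch that builds the same plain-text request by hand and reads back the reply. The function names, the HTTP/1.0 request form, and the host/path arguments are assumptions made for this example, not anything taken from the quote site. The reply contains only the HTML of that one URL; any images it references would each need a request of their own.

```python
import socket


def build_get_request(host, path):
    """Return the bytes one would otherwise type into a telnet session."""
    lines = [
        f"GET {path} HTTP/1.0",   # request line for a single resource
        f"Host: {host}",          # needed by servers hosting several sites
        "Connection: close",      # ask the server to close when done
        "",
        "",                       # blank line terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host, path, port=80, timeout=10.0):
    """Send the request over a TCP socket and return headers plus body.

    Not called in this sketch because it needs network access.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:         # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling `fetch_raw("example.com", "/")` (a placeholder host) would return the status line, headers, and HTML as raw bytes.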
--
Janne Blomqvist
| |
| Dr Ivan D. Reid 2005-06-06, 3:58 pm |
| On Mon, 06 Jun 2005 15:19:11 GMT, David Frank <dave_frank@hotmail.com>
wrote in <PPZoe.130$VK4.122@newsread1.news.atl.earthlink.net>:
> I currently download/extract stock quotes within my Fortran day-trading
> program by wading through the HTML for the text of interest (prices).
> A typical once-a-minute read may have 25 KB of pictures etc. surrounding the
> 300 bytes of stock price text of interest.
> Is there an internet read protocol flag that can reduce this overhead?
http. It's not the download per se that brings the pictures but
the browser(?) software honouring all the <img> tags. If you know the
file-name that you want you can use straight http protocol to get just that
file (unless everything has been encoded into an image file, of course).
On UNIX systems there is a programme called wget which can do just this;
I don't know its availability on Windows (well, it's available with cygwin).
[A quick google shows versions are available, e.g.:
http://allserv.ugent.be/~bpuype/wget/ ]
I did once cobble up a shell script & awk files to do a similar
download directly using http commands, but I'd not recommend that approach
nowadays.
> I note that my browser (not used for this job) has a multimedia option to
> kill most of the web page display, although I'm not sure the transmission of
> all those pictures doesn't still take place, with Internet Explorer
> just suppressing their display.
If you switch off image display then the image files aren't fetched
until you request them -- at least that's how it used to work when I was on
slow connexions.
--
Ivan Reid, Electronic & Computer Engineering, ___ CMS Collaboration,
Brunel University. Ivan.Reid@brunel.ac.uk Room 40-1-B12, CERN
KotPT -- "for stupidity above and beyond the call of duty".
| |
| David Frank 2005-06-06, 3:58 pm |
|
"Janne Blomqvist" <foo@bar.invalid> wrote in message
news:slrnda8re0.9nn.foo@vipunen.hut.fi...
> David Frank wrote:
>
> Actually, at the http protocol level you have to explicitly download
> all the images and so on separately; it's just that web browsers
> conveniently do this automagically for you.
>
>
> So what are you using to download the web page then?
>
> Anyways, if what you're using now doesn't work you can always use
> something like curl ( http://curl.haxx.se/ ) or wget (
> ftp://sunsite.dk/projects/wget/windows/ ). Curl even has a library
> which you probably can call directly from your code.
>
> Or if you don't want to install any extra software, IIRC windows still
> has a telnet client so you could just telnet to port 80 (in case you
> somehow can send commands to the windows telnet client
> non-interactively, e.g. via a file specified on the command line).
>
>
> --
> Janne Blomqvist
My RT_VIEW program has a thread that reads the specified URL and then
suspends itself.
The program exec then processes the text from the buffer and issues a
resume-thread call 1 minute
later to get the next price quote.
See my net-read thread code at
http://home.earthlink.net/~dave_gemini/rt_view.txt
| |
| David Frank 2005-06-06, 3:58 pm |
|
"Dr Ivan D. Reid" <Ivan.Reid@brunel.ac.uk> wrote in message
news:slrnda8rnr.27s.Ivan.Reid@loki.brunel.ac.uk...
> If you switch off image display then the image files aren't fetched
> until you request them -- at least that's how it used to work when I was on
> slow connexions.
>
Here is a typical link that I try to decode in my RT_VIEW program.
http://www.finance.lycos.com/qc/sto...ols=INDEX:SPX.X
A read of this web page fills my RT_VIEW program's net read buffer with about
16 KB of superfluous HTML that I must then parse to get my price data.
| |
| Bart Vandewoestyne 2005-06-06, 3:58 pm |
| In article <slrnda8rnr.27s.Ivan.Reid@loki.brunel.ac.uk>, Dr Ivan D. Reid wrote:
>
> [...]
> On UNIX systems there is a programme called wget which can do just this;
> I don't know its availability on Windows (well, it's available with cygwin).
> [A quick google shows versions are available, e.g.:
> http://allserv.ugent.be/~bpuype/wget/ ]
For the record: wget is available for Windows, see
http://wget.sunsite.dk/#downloading
Regards,
Bart
--
"Share what you know. Learn what you don't."
| |
| Harold Stevens 2005-06-06, 3:58 pm |
| In <slrnda8rnr.27s.Ivan.Reid@loki.brunel.ac.uk> Dr Ivan D. Reid:
[Snip...]
> If you switch off image display then the image files aren't fetched
> until you request them -- at least that's how it used to work when I was on
> slow connexions.
Indeed, and the same goes for text-mode browsers like Lynx, which I use for
this very reason (and others) on dialup. Lynx is also similar to wget/curl,
which others have mentioned, in that it has an option to download/dump pages
to a local tty instead of displaying them (from the Lynx manpage):
    -dump  dumps the formatted output of the default document or one
           specified on the command line to standard output. This can be
           used in the following way:
           lynx -dump http://www.subir.com/lynx.html
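For anyone without Lynx to hand, a crude stand-in for the text dump can be sketched with Python's standard html.parser module. This is only an approximation under stated assumptions: the class and function names are made up for the example, it makes no attempt at Lynx's layout, and it simply drops markup along with the contents of script and style elements.

```python
from html.parser import HTMLParser


class TextDump(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside script/style
        self.chunks = []      # non-empty pieces of visible text

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def dump_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextDump()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Note that this only shrinks what has to be parsed after the download; the full page still crosses the wire.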
--
Regards, Weird (Harold Stevens) * IMPORTANT EMAIL INFO FOLLOWS *
Pardon any bogus email addresses (wookie) in place for spambots.
Really, it's (wyrd) at airmail, dotted with net. DO NOT SPAM IT.
Kids jumping ship? Looking to hire an old-school type? Email me.
| |
| e p chandler 2005-06-06, 3:58 pm |
| David Frank wrote:
> "Dr Ivan D. Reid" <Ivan.Reid@brunel.ac.uk> wrote in message
> news:slrnda8rnr.27s.Ivan.Reid@loki.brunel.ac.uk...
>
>
> Here is a typical link that I try to decode in my RT_VIEW program.
>
> http://www.finance.lycos.com/qc/sto...ols=INDEX:SPX.X
>
>
> Read of this web page fills my RT_VIEW program's net read buffer with about
> 16kb of superfluous html that I then must parse to get my price data.
From the link supplied in a previous posting, obtain wget.exe.
Set up a scheduled task which runs a .bat file containing:
    wget -O foo.txt url_goes_here
    awk -f foo.awk foo.txt | awk -f foo1.awk >foo1.txt
    my_fortran_program.exe .....
Notes:
The -O switch sends the downloaded file to the file "foo.txt"
foo.awk and foo1.awk are AWK scripts
The pattern in foo.awk matches all lines from a line containing the
first pattern to a line containing the second pattern. This is the HTML
table containing the data of interest.
foo1.awk grabs lines containing table data, removes extra stuff and
then concatenates the 8 numeric values into a single line of blank
separated data.
Finally a Fortran program reads and does something with the data
file....
foo.awk:
# print every line from the table's opening tag through its closing tag
/<table summary=/,/<\/table>/
foo1.awk:
# grab <td>.....</td>
/<td>.*<\/td>$/ {
    gsub(/ - /," ")   # strip -
    gsub(/,/,"")      # strip ,
    gsub(/ to /," ")  # strip to
    gsub(/\//,"")     # strip / (this also turns </td> into <td>)
    gsub(/<td>/,"")   # strip <td>
    for(i=1;i<=NF;i++) b=b " " $i # append next number
    # print # for debug
}
END{print b} # dump the vector
The OP or NG regulars are welcome to replace the AWK scripts with
Fortran programs, etc. if they so choose :-)!
A sample foo1.txt:
1192.75 1197.43 1196.02 1196.02 667 1375 1060.72 1229.11
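For readers without AWK, the same two filtering steps can be sketched in Python. This is a loose translation, not a drop-in replacement: the function names and the sample lines are invented for illustration, and unlike the AWK END block the result has no leading blank.

```python
import re

# Same test as foo1.awk: a line ending in <td>...</td>
TD_LINE = re.compile(r"<td>.*</td>$")

# Same substitutions, in the same order, as the gsub() calls in foo1.awk
REPLACEMENTS = ((" - ", " "), (",", ""), (" to ", " "), ("/", ""), ("<td>", ""))


def extract_table(lines):
    """Like foo.awk: yield lines from '<table summary=' through '</table>'."""
    inside = False
    for line in lines:
        if not inside and "<table summary=" in line:
            inside = True
        if inside:
            yield line
            if "</table>" in line:
                inside = False


def collect_values(lines):
    """Like foo1.awk: clean each <td> line and join all fields with blanks."""
    values = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not TD_LINE.search(line):
            continue
        for old, new in REPLACEMENTS:
            line = line.replace(old, new)
        values.extend(line.split())
    return " ".join(values)


# Made-up sample input; the real page layout is not reproduced here.
sample = [
    "<html>",
    '<table summary="quote">',
    "<td>1,192.75 - 1,197.43</td>",
    "<td>1196.02</td>",
    "</table>",
    "<td>999</td>",
]
result = collect_values(extract_table(sample))
```

The last sample line sits outside the table, so it is ignored in the same way the AWK range pattern would ignore it.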
Disclaimer: I take _no_ responsibility if the OP gets into trouble for
repeatedly and frequently wgetting this web page. :-).
| |
| Dr Ivan D. Reid 2005-06-06, 8:57 pm |
| On Mon, 06 Jun 2005 16:32:19 GMT, David Frank <dave_frank@hotmail.com>
wrote in <nU_oe.147$eM6.99@newsread3.news.atl.earthlink.net>:
> "Dr Ivan D. Reid" <Ivan.Reid@brunel.ac.uk> wrote in message
> news:slrnda8rnr.27s.Ivan.Reid@loki.brunel.ac.uk...
> Here is a typical link that I try to decode in my RT_VIEW program.
> http://www.finance.lycos.com/qc/sto...ols=INDEX:SPX.X
> Read of this web page fills my RT_VIEW program's net read buffer with about
> 16kb of superfluous html that I then must parse to get my price data.
Most of that seems to be javascript functions -- if there's a way of
downloading them separately and caching them to avoid net traffic then this
page certainly doesn't use it.
FWIW I get this with wget under cygwin:
$ time wget http://www.finance.lycos.com/qc/sto...ols=INDEX:SPX.X
--20:04:54--  http://www.finance.lycos.com/qc/sto...ols=INDEX:SPX.X
=> `quotes.aspx@symbols=INDEX%3ASPX.X'
Resolving www.finance.lycos.com... 209.202.214.18
Connecting to www.finance.lycos.com[209.202.214.18]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14,988 [text/html]
100%[====================================>] 14,988 19.11K/s
20:04:55 (19.11 KB/s) - `quotes.aspx@symbols=INDEX%3ASPX.X' saved [14988/14988]
real 0m2.054s
user 0m0.109s
sys 0m0.015s
$ ls -l quotes.aspx\@symbols\=INDEX%3ASPX.X
-rw-r--r-- 1 Ivan None 14988 Jun 6 20:04 quotes.aspx@symbols=INDEX%3ASPX.X
e p has given you a recipe for extracting the data you want, so I
won't try to give another way.
--
Ivan Reid, Electronic & Computer Engineering, ___ CMS Collaboration,
Brunel University. Ivan.Reid@brunel.ac.uk Room 40-1-B12, CERN
KotPT -- "for stupidity above and beyond the call of duty".
| |
| Janne Blomqvist 2005-06-06, 8:57 pm |
| In article <slrnda988i.6ia.Ivan.Reid@loki.brunel.ac.uk>, Dr Ivan D. Reid wrote:
> Most of that seems to be javascript functions -- if there's a way of
> downloading them separately and caching them to avoid net traffic then this
> page certainly doesn't use it.
Yes, IIRC it's possible to load javascript from a separate file (which
is separately downloaded using http), but in this case when the
javascript is embedded in the html file I don't think one can avoid
it.
> FWIW I get this with wget under cygwin:
> $ time wget http://www.finance.lycos.com/qc/sto...ols=INDEX:SPX.X
> --20:04:54--  http://www.finance.lycos.com/qc/sto...ols=INDEX:SPX.X
> => `quotes.aspx@symbols=INDEX%3ASPX.X'
> Resolving www.finance.lycos.com... 209.202.214.18
> Connecting to www.finance.lycos.com[209.202.214.18]:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 14,988 [text/html]
>
> 100%[====================================>] 14,988 19.11K/s
>
> 20:04:55 (19.11 KB/s) - `quotes.aspx@symbols=INDEX%3ASPX.X' saved [14988/14988]
>
> real 0m2.054s
> user 0m0.109s
> sys 0m0.015s
>
> $ ls -l quotes.aspx\@symbols\=INDEX%3ASPX.X
> -rw-r--r-- 1 Ivan None 14988 Jun 6 20:04 quotes.aspx@symbols=INDEX%3ASPX.X
>
> e p has given you a recipe for extracting the data you want, so I
> won't try to give another way.
Additionally, most webservers support compressing the content using
gzip (e.g. apache module mod_gzip or mod_deflate). wget doesn't seem
to support it, but curl does with the --compressed option.
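The same negotiation can be sketched in Python: the client sends an Accept-Encoding: gzip header and unpacks the body only if the server answers with Content-Encoding: gzip. The function names and the sample page below are assumptions for illustration, and whether the quote server actually honours the header is not something this thread establishes.

```python
import gzip
import urllib.request


def decode_body(body, content_encoding):
    """Return the decompressed body if the server gzip-encoded it."""
    if content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch_compressed(url, timeout=10.0):
    """Fetch url while advertising gzip support.

    Not called in this sketch because it needs network access.
    """
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        encoding = response.headers.get("Content-Encoding", "")
        return decode_body(response.read(), encoding)


# Offline illustration with made-up, repetitive HTML
page = b"<tr><td>1196.02</td></tr>\n" * 500
packed = gzip.compress(page)
```

Repetitive markup like the made-up `page` above compresses well, which is why this can cut the bytes transferred even when the page itself cannot be trimmed.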
--
Janne Blomqvist