Author Sockets as filenames.
Ian Stirling

2004-09-28, 3:56 am

I notice that
awk '/pattern/' /inet/tcp/...
works to read from a server that requires no interaction.

Is it possible to write on the same socket?
Something like
awk 'BEGIN{RS="\r*\n"}
/^200 /{
print "list" >FILENAME;next}
/^\.$/{
print "quit" >FILENAME
exit}
{print >"list"}' /inet/tcp/...

But this tries to open a new socket to write on.

It would be nice to be able to get rid of 'hose' (a simple network
utility that sets up the socket, connects stdin and stdout to it, then
gets out of the way) in some network scripts.

Or isn't this possible?
Ian Stirling

2004-09-28, 3:56 am

Aharon Robbins <arnold@skeeve.com> wrote:
> In article <4152bc69$0$80306$ed2619ec@ptn-nntp-reader01.plus.net>,
> Ian Stirling <root@mauve.demon.co.uk> wrote:
<snip>
>
> Where is "S" getting set?


It was a code fragment, not actual working code.
>
>
> You forgot the
>
> close("articles-to-get")


Actually not, if you're reading a list of stuff to do.

>
>
> In other words, what you want is some way to say, "hey gawk, make this
> socket be stdin and stdout" ?


Or absolutely ideally, for all files in
awk -f whatever.awk /inet/tcp/... /inet/tcp/... /home/me/file

For
print >FILENAME
to do the expected thing.

More generally, it'd be nice if everything that could be done
with ARGV/FILENAME/... to choose the upcoming filenames 'just worked'
with sockets.

It already seems (from my limited testing) to be half-way there, with
normal input processing working just fine. The only anomaly is that,
in this case, two sockets with identical names don't read and write
correctly.


> There is currently no way to do that in gawk. The external program that does
> it seems to be the simplest solution. If you're using bash, you might
> could do it externally too. Something like this (untried):
>
> gawk -f program 0<>/dev/tcp/HOST/PORT 1<>&0
>
> but I have no idea whether that will really work or not. (If it does,
> please let me know!)


On Linux at least, /dev/tcp doesn't work.

Thanks for the comments.
Off to read the source.
Kenny McCormack

2004-09-28, 3:56 am

In article <2rglp3F1aiuabU1@uni-berlin.de>,
Jürgen Kahrs <Juergen.KahrsDELETETHIS@vr-web.de> wrote:
....
>BTW: It is a good idea to use close(sock).
>But one can argue that it is not necessary
>in short examples like the one you gave above.
>The reason is that the socket (like every
>other file descriptor) is automatically
>closed by the shell when the script terminates.
>In the doc we are constantly arguing in favor of
>close(sock) to prevent you (the user) from getting
>into trouble, but if you know what you are doing,
>closing is not strictly necessary.


I use close because the above occurs in a loop and I want the server to see
each line as a separate connection. I.e., the server (which I wrote
myself) processes only one line of input (per connection).

Jürgen Kahrs

2004-09-28, 3:56 am

Ian Stirling wrote:

> And now, when it hits "print >FILENAME", it does not even try to write on
> the socket, but opens a new one.


Opening a second descriptor/socket happens
also if you use an ordinary file.

> socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
> ...
> connect(4, {sa_family=AF_INET, sin_port=htons(119), sin_addr=inet_addr("212.159.2.86")}, 16) = 0
> ...
> read(3, 0x808e278, 1024) = ? ERESTARTSYS (To be restarted)
>
> And after opening this new socket, it doesn't try to write on it.
> (unless I missed something.)


Interesting. Which OS do you use ?

My Linux 2.6.5 does it this way:

socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(4, SOL_SOCKET, SO_LINGER, {onoff=1, linger=30}, 8) = 0
bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
connect(4, {sa_family=AF_INET, sin_port=htons(119), sin_addr=inet_addr("212.159.2.86")}, 16) = 0
fstat64(4, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(4, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x401ac000
_llseek(4, 0, 0xbfffed20, SEEK_CUR) = -1 ESPIPE (Illegal seek)
ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfffeddc) = -1 EINVAL (Invalid argument)
read(3,


It waits (blocking) for input.
Both directions (read and write) are opened separately
because they are different (one is input, the other output).
The lseek() on the socket is a bit unfortunate.
What you see is the consequence of using undocumented
features. Two-directional access has to be done with "|&".
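
For readers who want to see what the documented two-way form looks like, here is a minimal sketch of the exchange from the original question rewritten with "|&". It assumes gawk is installed. fetch_list is a hypothetical helper name, and its argument is whatever gawk should open with "|&", for example a socket special file of the form /inet/tcp/0/HOST/PORT, where HOST and PORT are placeholders. The "200" greeting and the "list" and "quit" commands are taken from the original post; this has not been tried against a real news server.

```shell
# Sketch: talk to a line-based server over ONE two-way gawk connection.
# Requires gawk ("|&" and a regex RS are gawk extensions).
# $1 is the thing gawk opens with "|&": a socket special file such as
# /inet/tcp/0/HOST/PORT (HOST and PORT are placeholders) or a command.
fetch_list() {
    gawk -v service="$1" '
    BEGIN {
        RS = "\r?\n"                      # accept CRLF or bare LF line endings
        while ((service |& getline line) > 0) {
            if (line ~ /^200 /) {         # greeting seen: send the request
                print "list" |& service
            } else if (line ~ /^\.$/) {   # a lone dot ends the listing
                print "quit" |& service
                break
            } else {
                print line                # one line of the listing, to stdout
            }
        }
        close(service)                    # shut down both directions
    }'
}

# Hypothetical usage (HOST and PORT must be filled in by the reader):
#   fetch_list /inet/tcp/0/HOST/PORT > list
```

Because the same string names both the read side and the write side, gawk keeps a single connection open instead of opening a second socket for the output. A server that insists on CRLF line endings would probably also need ORS set to "\r\n".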
Aharon Robbins

2004-09-28, 3:56 am

In article <4152bc69$0$80306$ed2619ec@ptn-nntp-reader01.plus.net>,
Ian Stirling <root@mauve.demon.co.uk> wrote:
>Jürgen Kahrs <Juergen.KahrsDELETETHIS@vr-web.de> wrote:
>
>Right.
>Finger is really trivial, as are things like http, and other simple
>services that are basically a request, and then the other end sends
>an answer.
>For services that have say 10-12 states, that you need to negotiate
>between, it can get messy to implement, especially when you're
>taking into account all the errors that can result.
>
>But this can be quite easy with builtin awk.
>
>/500 server shutdown/ {quit()}
>S=="Waiting for hello" && /error/ {quit()}


Where is "S" getting set?

>S=="Read Data" && /^\.$/{S="Get Article";next}
>S=="Get Article" && {getline <"articles-to-get"; print $0;next}


You forgot the

close("articles-to-get")

>{print "fell out of loop, state = "S, $0;reset()}


In other words, what you want is some way to say, "hey gawk, make this
socket be stdin and stdout" ?

There is currently no way to do that in gawk. The external program that does
it seems to be the simplest solution. If you're using bash, you might
could do it externally too. Something like this (untried):

gawk -f program 0<>/dev/tcp/HOST/PORT 1<>&0

but I have no idea whether that will really work or not. (If it does,
please let me know!)

Arnold
Jürgen Kahrs

2004-09-28, 3:56 am

Ian Stirling wrote:

> For services that have say 10-12 states, that you need to negotiate
> between, it can get messy to implement, especially when you're
> taking into account all the errors that can result.


Yes, protocol design is not trivial.
But the problem of implementing a proper state
machine has to be solved in any language you
choose to use. An AWK solution will not look
much more messy than a C solution.
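
As an illustration of that point, a state machine along the lines of Ian's fragment can be sketched in plain awk. This is only a sketch under assumptions: client_states is a hypothetical helper name, the state names and reply codes are illustrative rather than taken from any protocol specification, and the program reads server replies on stdin and prints the commands a client would send, so the network side is left out entirely.

```shell
# Sketch of a small protocol state machine in plain awk.
# Reads server replies on stdin, prints the commands a client would send.
# State names and reply codes are illustrative assumptions.
client_states() {
    awk '
    BEGIN { S = "Waiting for hello" }          # initial state
    { sub(/\r$/, "") }                         # tolerate CRLF line endings
    /^500 / { print "quit"; exit }             # fatal reply in any state
    S == "Waiting for hello" && /^200 / {      # greeting: ask for the list
        print "list"; S = "Read Data"; next
    }
    S == "Read Data" && /^\.$/ {               # lone dot: listing finished
        print "quit"; S = "Done"; exit
    }
    S == "Read Data" { n++; next }             # count one listing line
    {                                          # anything else is unexpected
        print "unexpected reply in state " S ": " $0 > "/dev/stderr"
        exit 1
    }
    END { if (S == "Done") printf "%d lines listed\n", n }
    '
}
```

Each rule pairs a state test with a pattern, so the table of transitions is visible at a glance, and the final catch-all rule plays the role of the "fell out of loop" line in the fragment quoted earlier in the thread.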
Jürgen Kahrs

2004-09-28, 3:56 am

Ian Stirling wrote:

> Or absolutely ideally, for all files in
> awk -f whatever.awk /inet/tcp/... /inet/tcp/... /home/me/file


Now I see what your problem actually was.
/inet/tcp has a meaning inside GNU Awk;
outside (in the realm of any shell) it
has no meaning. But even inside GNU Awk
sockets must be handled differently than
files.

When I started implementing socket access
in GAWK, I actually wanted to implement it
just the way you expected it (access with
"<" and ">"). But our honorable Maintainer
convinced me to implement socket access in
a different way ("|&"). The reasoning goes
like this: GAWK remembers if a "file" was
accessed with "<" or ">". These are one-
directional accesses (either read or write,
but not both). Sockets don't fit into this
scheme, therefore Arnold told me to use the
"|&" syntax, which was prepared for two-
directional access. This really makes sense
from an implementer's perspective.

From a user's perspective, I was never happy
with the "|&" syntax because users (like you)
expect things to work in a uniform way.

> More generally, it'd be nice if everything that could be done
> with ARGV/FILENAME/... to choose the upcoming filenames 'just worked'
> with sockets.


Indeed.

> It already seems (from my limited testing) to be half-way there, with
> normal input processing working just fine. The only anomaly is that,
> in this case, two sockets with identical names don't read and write
> correctly.


Files with identical names are not much better.

> On Linux at least, /dev/tcp doesn't work.


I think Arnold forgot to mention that /dev/tcp is
a feature of the new Korn Shell (ksh93) and not
a feature of GNU Awk. Therefore, you can expect to
see a working /dev/tcp only if your shell is ksh93
compatible.
Jürgen Kahrs

2004-09-28, 3:56 am

Kenny McCormack wrote:

> I am stepping in only to point out that ">" does work with sockets (as
> long, of course, as the communication is one-way). My program contains:
>
> sock="/inet/tcp/..."
> print data > sock;close(sock)
>
> and it works just fine.


You are right, I simply forgot this.
Having been convinced to use "|&" I forced
myself to ignore the "<" and ">" method.
Well, strictly speaking, only the "|&"
way of access is documented, everything
else is never mentioned in the doc (correct
me if I am wrong) and could be called an
undocumented feature.


BTW: It is a good idea to use close(sock).
But one can argue that it is not necessary
in short examples like the one you gave above.
The reason is that the socket (like every
other file descriptor) is automatically
closed by the shell when the script terminates.
In the doc we are constantly arguing in favor of
close(sock) to prevent you (the user) from getting
into trouble, but if you know what you are doing,
closing is not strictly necessary.
Kenny McCormack

2004-09-28, 3:55 pm

In article <2rgk3dF1a9f3bU1@uni-berlin.de>,
Jürgen Kahrs <Juergen.KahrsDELETETHIS@vr-web.de> wrote:
>Ian Stirling wrote:
>
>
>Now I see what your problem actually was.
>/inet/tcp has a meaning inside GNU Awk;
>outside (in the realm of any shell) it
>has no meaning. But even inside GNU Awk
>sockets must be handled differently than
>files.
>
>When I started implementing socket access
>in GAWK, I actually wanted to implement it
>just the way you expected it (access with
>"<" and ">"). But our honorable Maintainer
>convinced me to implement socket access in
>a different way ("|&").


Editorial note: I'm not sure what the exact "state" of this discussion is,
and I don't want to get mired in it. I will just state that I think that
Ian has been arguing in circles with himself, but beyond that I don't care.

I am stepping in only to point out that ">" does work with sockets (as
long, of course, as the communication is one-way). My program contains:

sock="/inet/tcp/..."
print data > sock;close(sock)

and it works just fine.

Copyright 2008 codecomments.com