FTP in gawk
Todd Coram

2006-01-10, 3:58 am

Hello,
I am building a suite of awk scripts to support a website CMS (content
management system) that I am building. I am trying to reduce the
dependency on lots of supplemental programs, so I am doing stuff in
gawk that is probably not best done in gawk ;-)

Below is an implementation of FTP in gawk (implementing just "put" not
"get" -- this script is meant to upload a bunch of rendered HTML,
images, etc to an FTP server running on a website). As I understand
BINMODE, it wasn't supposed to be used this way, but it works (although
in practice any file containing the string in RS (BinRS) can screw it
up). In particular, I am opening the "passive TCP" connection (which
must be open before a STOR) with:

printf("") |& dataport;

before going into BINMODE. This appears to NOT send anything but opens
the connection anyway. Is this reliable, or am I exploiting behavior
that may change in future releases of gawk?

I am abusing gawk and BINMODE in other scripts too, including a base64
decode, mime extractor, etc.

For those interested (or who may find an awk-based FTP script useful),
here it is (approx 100 lines). Warning: it has only been tested on a
couple of FTP servers...

----------------------------------- cut here -----------------------------------

#!/usr/bin/gawk -f
#
# ftpput - put files via ftp
#
# Author: Todd Coram (todd at maplefish DOT com).
# Version: 0.9 1/9/2006
#
# A simple ftp file "put" transfer script. If a -d is supplied then try
# and make the directories that are part of each file's filepath.
# 'rootdir' is the remote FTP server directory to CWD to before starting
# the transfers.
#

BEGIN {
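# Need host, user, password, rootdir and at least one file (ARGV[0] is gawk).
# The argument order here matches the order the arguments are read below.
if (ARGC < 6) {
print "Usage: ftpput [-d] host user password rootdir file1 .. fileN" >"/dev/stderr";
exit 1
}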

# The record separator is set to a string that should (statistically?)
# not occur even in binary data (ha!). We append a pid so that this
# app can transfer itself (makes the separator dynamic).
#
BinRS="yummydeadbeaf" PROCINFO["pid"];

Verbose=1;
RS = ORS = "\r\n";

mkdir = (optidx = (ARGV[1] == "-d") ? 2 : 1) - 1;
host = ARGV[optidx++]
user = ARGV[optidx++]
pass = ARGV[optidx++]
rootdir = ARGV[optidx++]

Ftphost = "/inet/tcp/0/" host "/21";
expect("", "220");
if (expect("USER " user, "331|230") ~ "^331") {
expect("PASS " pass, "230");
}
expect("TYPE I", "200");
expect("CWD " rootdir, "200|250");

for (; optidx < ARGC; optidx++) {
if (mkdir) mkdirhier(ARGV[optidx]);
pasv_send(ARGV[optidx]);
}

expect("QUIT", "221");

}

function mkdirhier(file, part,pidx,pcnt,dir) {
pcnt = split(file, part, "/")
for (pidx = 1; pidx < pcnt; pidx++) {
dir = (pidx == 1) ? part[pidx] : dir "/" part[pidx];
expect("MKD " dir, "250|257|521");
}

}

function pasv_send(file, ln,oldrs,oldbinmode,addr,port,dataport,aa) {
ln = expect("PASV", "227");
# The 227 reply carries h1,h2,h3,h4,p1,p2: four address octets and two port bytes.
if (!match(ln, /([0-9]+),([0-9]+),([0-9]+),([0-9]+),([0-9]+),([0-9]+)/, aa)) {
print "Couldn't parse connection information from " ln >"/dev/stderr";
exit 1;
}
addr = aa[1] "." aa[2] "." aa[3] "." aa[4];
port = (aa[5]*256)+aa[6];
dataport = "/inet/tcp/0/" addr "/" port;
printf("") |& dataport; # force connection open

expect("STOR " file, "150");
if (Verbose) printf(" * Sending %s to %s\n", file, dataport) >"/dev/stderr";

oldrs = RS; RS = BinRS; ORS = "";
oldbinmode = BINMODE; BINMODE="rw";

while ((getline ln <file) > 0) {
print ln |& dataport; fflush(dataport);
}
close(dataport);
close(file);

RS = ORS = oldrs;
BINMODE = oldbinmode;

expect("","226");

}

function expect(query,code, line, a) {
if (query != "") {
print query |& Ftphost;
if (Verbose) printf("Sending: %s\n", query);
}
Ftphost |& getline line;
if (Verbose) printf(" Got: %s\n", line);
if (match(line,"^(" code ")",a)) {
return line;
}
printf("ERROR on (%s): Expected (%s). Got %s\n",
query, code, line) > "/dev/stderr";
exit 1;

}
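
As an illustration only, the PASV parsing done in pasv_send and the reply-code matching done in expect can be sketched in Python. The function names parse_pasv and reply_matches and the sample reply strings are assumptions made for this sketch; they are not part of the script above.

```python
import re


def parse_pasv(reply):
    """Return (host, port) taken from a 227 reply, or raise ValueError."""
    # Same pattern as the gawk script: six comma-separated decimal fields.
    m = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if m is None:
        raise ValueError("no address in PASV reply: %r" % reply)
    h1, h2, h3, h4, p_hi, p_lo = m.groups()
    host = ".".join((h1, h2, h3, h4))
    # The port is sent as two bytes, high byte first.
    port = int(p_hi) * 256 + int(p_lo)
    return host, port


def reply_matches(line, codes):
    """True if the reply line starts with one of the '|'-separated codes."""
    return re.match("(?:" + codes + ")", line) is not None


if __name__ == "__main__":
    # Hypothetical server reply, used only to exercise the parser.
    print(parse_pasv("227 Entering Passive Mode (192,168,1,10,19,137)"))
    print(reply_matches("230 Logged in", "331|230"))
```

The address fields come back as four octets and two port bytes, which is why the script computes the port as aa[5]*256 + aa[6].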

Jürgen Kahrs

2006-01-10, 3:58 am

Todd Coram wrote:

> before going into BINMODE. This appears to NOT send anything but opens
> the connection anyway. Is this reliable, or am I exploiting behavior
> that may change in future releases of gawk?


BINMODE has a meaning only on non-POSIX systems.
It is rather unlikely that its behavior will change.

Ed Morton

2006-01-10, 3:58 am

Todd Coram wrote:
> Hello,
> I am building a suite of awk scripts to support a website CMS (content
> management system) that I am building. I am trying to reduce the
> dependency on lots of supplemental programs, so I am doing stuff in
> gawk that is probably not best done in gawk ;-)


on topic: $0 = gensub(/( not)( best)/,"\\2\\1","")

off topic: save yourself a lot of headaches - just pick a shell.

Sure, it's a cute challenge to do everything you can in a given
language, but it's not exactly the most productive way to spend your
time and, when necessary, you stand a much better chance of getting
sensible answers by posting sensible questions to comp.unix.shell rather
than posting questions here about how to force gawk to jump through
flaming hoops.

Ed.

Todd Coram

2006-01-10, 3:58 am

Sigh. Sorry. I'm a newbie to this group (not a newbie to awk, shells,
scripting, unix, etc).
I have my reasons for using gawk to do this. If it's considered noise
to post stuff like this here, then I'll back off and find another
venue.

/todd


Ed Morton wrote:
> Todd Coram wrote:
>
> on topic: $0 = gensub(/( not)( best)/,"\\2\\1","")
>
> off topic: save yourself a lot of headaches - just pick a shell.
>
> Sure, it's a cute challenge to do everything you can in a given
> language, but it's not exactly the most productive way to spend your
> time and, when necessary, you stand a much better chance of getting
> sensible answers by posting sensible questions to comp.unix.shell rather
> than posting questions here about how to force gawk to jump through
> flaming hoops.
>
> Ed.


Ed Morton

2006-01-10, 3:58 am

Todd Coram wrote:
> Sigh. Sorry. I'm a newbie to this group (not a newbie to awk, shells,
> scripting, unix, etc).
> I have my reasons for using gawk to do this. If it's considered noise
> to post stuff like this here, then I'll back off and find another
> venue.


If you really need to do it this way, it's not totally outrageous to
post questions, as we do sometimes see the "lang" part of "comp.lang.awk"
being bent a bit. I'm just saying you probably won't see a lot of people
with experience in doing what you ask post answers, so any answers you do
get you should probably take with a pinch of salt.

Ed.

P.S. please don't top-post.

Andrew Schorr

2006-01-10, 6:59 pm


Todd Coram wrote:
> printf("") |& dataport;
>
> before going into BINMODE. This appears to NOT send anything but opens
> the connection anyway. Is this reliable, or am I exploiting behavior
> that may change in future releases of gawk?


Based on a quick look at the gawk code, my guess is that this should
be pretty reliable. It seems like the socket should be opened at
first reference. I could only see this failing if the printf("") was
recognized as a no-op and the whole statement was optimized away. My
gut is that it's unlikely to happen.
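
The general TCP behavior being relied on here (a connection can be opened and closed without any payload being sent) can be seen in a small Python sketch over the loopback interface. This only illustrates TCP itself; it says nothing about gawk's internals, and the function name connect_without_sending is an assumption made for the example.

```python
import socket


def connect_without_sending():
    """Open a loopback TCP connection, send nothing, return what the server read."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    conn, _ = server.accept()          # succeeds: the connection is established
    conn.settimeout(5)
    client.close()                     # close without ever calling send()
    data = conn.recv(1024)             # an empty result means EOF, no payload
    conn.close()
    server.close()
    return data


if __name__ == "__main__":
    print(repr(connect_without_sending()))
```

The server side accepts the connection and then reads end-of-file with zero bytes of data, which is the situation the FTP server's data port is in between the connect and the STOR transfer.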

> For those interested (or may find an awk based FTP script useful), here
> it is (approx 100 lines) -- Warning, it has only been tested on a
> couple of FTP servers...


Very cute.

> oldrs = RS; RS = BinRS; ORS = "";
> oldbinmode = BINMODE; BINMODE="rw";
>
> while ((getline ln <file) > 0) {
> print ln |& dataport; fflush(dataport);
> }
> close(dataport);
> close(file);
>
> RS = ORS = oldrs;
> BINMODE = oldbinmode;


I wonder if you couldn't implement this part more sensibly by using the
readfile extension. This is easiest to use with xgawk, which has the
shared library facility cleanly implemented. Check out
http://sourceforge.net/projects/xmlgawk. With xgawk, you can say
something like:

$ xgawk -lreadfile 'BEGIN { s = readfile("/boot/vmlinuz-2.6.10-0.ti.4.fc1smp"); printf "%s", s }' | diff /boot/vmlinuz-2.6.10-0.ti.4.fc1smp -

That could solve all your concerns regarding binary data (and perhaps
be more efficient).

Regards,
Andy
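
To make the binary-data concern concrete, here is a rough Python sketch of why splitting on a record separator and writing the records back with an empty output separator loses data whenever the separator occurs in the file, while reading the file as a single unit does not. The helper names and the sample payload are assumptions made for the example; the separator mirrors the BinRS idea from the script with a made-up pid.

```python
def copy_by_records(data, sep):
    """Mimic the getline/print loop: split on sep, emit records with no separator."""
    return b"".join(data.split(sep))


def copy_whole(data):
    """Mimic a readfile-style transfer: the bytes pass through untouched."""
    return bytes(data)


if __name__ == "__main__":
    sep = b"yummydeadbeaf1234"               # hypothetical BinRS value
    clean = b"\x00\x01binary\r\npayload\xff"
    unlucky = b"header" + sep + b"trailer"   # file that happens to contain sep

    print(copy_by_records(clean, sep) == clean)      # separator absent: intact
    print(copy_by_records(unlucky, sep) == unlucky)  # separator present: bytes lost
    print(copy_whole(unlucky) == unlucky)            # whole-file copy: intact
```

With the separator absent the record loop is lossless, which is why the script works in practice; the whole-file approach removes the dependency on the separator never appearing.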

Todd Coram

2006-01-14, 6:55 pm

> I wonder if you couldn't implement this part more sensibly by using the
> readfile extension.


Yes. That would certainly be more sensible. I am shooting for a pretty
spartan deployment environment, so I don't know if I can consider
xmlgawk yet... but thanks for the pointer. I'll check it out.

I'm building upon a very restricted set of binaries (Bourne shell, a few
unix utils) and gawk fits nicely. I am looking at replacing the binary
transfer portion of the gawk script with gawk spawning netcat (leaving
control of the ftp session in awk).

Thanks for the comments,

/todd
