Code Comments
Programming Forum and web based access to our favorite programming groups.

Hey everyone...

Here's a question:

I want to take an input line in awk and output the line from the start of
field 9 until the end of line EXACTLY as input. I can't do a simple loop
from field 9 until NF as that strips out the FS chars. Field 9 does not start
at the same column each time, so I can't use substr and such... I suppose I
could do an index, searching for $9 in $0, but that's error prone if the text
appears more than in just $9...

What I really want is something like a shift operator so I can shift off the
first 9 arguments, or something to give me the starting character position of
field 9.

Ideas? I know I can use a shell script to do this, but I was hoping for a more
elegant solution in awk. My awk program is complete, I just need this one
thing.

--
It's 3:30, why aren't you at work?
I.. I didn't feel like it.

In article <slrnd52i3o.f9b.drool@noudess.droolsretreat.com>,
Paul Coene <drool@noudess.droolsretreat.com> wrote:
>Hey everyone...
>
>Here's a question:
>
>I want to take an input line in awk and output the line from the start of
>field 9 until the end of line EXACTLY as input. I can't do a simple loop
>from field 9 until NF as that strips out the FS chars. Field 9 does not start
>at the same column each time, so I can't use substr and such... I suppose I
>could do an index, searching for $9 in $0, but that's error prone if the text
>appears more than in just $9...
This is a fairly common problem and it is one that is not directly
addressed by the language. I have often wished for this sort of
capability.
I'm pretty sure this will work, assuming your FS is the default
("whitespace"):
    sub(/^[ \t]*/,"")
    for (i=1; i<=8; i++)
        sub(/^[^ \t]*[ \t]*/,"")
>What I really want is something like a shift operator so I can shift off the
>first 9 arguments, or something to give me the starting character position of
>field 9.
Alas, no shift operator in AWK. I've often wondered what the original
rationale was for the "when the input line is changed, you lose your
original spacing" "feature". I've never seen any benefit to it, and
working around it has achieved FAQ status.
>Ideas? I know I can use a shell script to do this.
Really? I don't see it. But then again, I've never been a fan of these
complicated shell-only solutions, using obscure and poorly documented shell
features.
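
For anyone reading along later, here is a minimal sketch of the sub() loop idea from this post, wrapped so it can be run from a shell. It assumes the default FS (runs of spaces and tabs) and works on a copy of $0, so the fields themselves are left alone. The shell function name from_field and its argument are made up for illustration.

```shell
# from_field N: print each input line from field N onward, spacing preserved.
# Hypothetical helper; assumes the default FS (spaces and tabs).
from_field() {
    awk -v n="$1" '{
        line = $0
        sub(/^[ \t]+/, "", line)              # drop leading blanks
        for (i = 1; i < n; i++)
            sub(/^[^ \t]+[ \t]*/, "", line)   # drop one field and the blanks after it
        print line
    }'
}

printf 'a  b c   d e f g h i   j  k l\n' | from_field 9
```

With that sample line the output should start at "i" with the original spacing after it intact. A line with fewer than N fields comes out empty.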

> I'm pretty sure this will work, assuming your FS is the default
> ("whitespace"):
>
>     sub(/^[ \t]*/,"")
>     for (i=1; i<=8; i++)
>         sub(/^[^ \t]*[ \t]*/,"")
I'm not sure what you're doing above.. making fields 1-8 one field? That still
doesn't output fields 9 until NF unaltered.. Maybe we misunderstand one
another. This is what I have done, ugly but effective:
###########################################################################
# Ok, now we need to toss off all chars up to the 9th field and keep the rest
# of the input line verbatim
# (with the default FS of " " the ~ FS tests match spaces only, not tabs)
#
if (NF > 8) # should always be, but don't want infinite loop below
{
    curchar=1;
    # Skip over 8 fields and any preceding FS chars
    for (i=1;i<=8;++i)
    {
        while (substr($0,curchar,1) ~ FS)
            curchar++;
        while (substr($0,curchar,1) !~ FS)
            curchar++;
    }
    # Skip the FS chars preceding field 9
    while (substr($0,curchar,1) ~ FS)
        curchar++;
    # Print the rest of $0 from $9 on...
    print substr($0, curchar)
}
###########################################################################
--
It's 3:30, why aren't you at work?
I.. I didn't feel like it.
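
The original question also asked for "something to give me the starting character position of field 9". A rough sketch of that, using match() instead of a character-by-character scan, might look like the following. It assumes the default FS (spaces and tabs); the names field_start, fstart and want are invented for this example and are not part of awk.

```shell
# field_start N: print the 1-based offset at which field N begins on each
# input line, or 0 when the line has fewer than N fields.
# Hypothetical helper; assumes the default FS (spaces and tabs) and N >= 1.
field_start() {
    awk -v want="$1" '
    function fstart(s, n,    pos, i) {
        pos = 1
        for (i = 1; i <= n; i++) {
            if (match(substr(s, pos), /^[ \t]+/))   # blanks before field i
                pos += RLENGTH
            if (pos > length(s))
                return 0                            # ran out of line
            if (i < n) {
                match(substr(s, pos), /^[^ \t]+/)   # field i itself
                pos += RLENGTH
            }
        }
        return pos
    }
    { print fstart($0, want) }'
}

printf 'a  b c\n' | field_start 2
```

Once the offset is known and nonzero, substr($0, offset) gives the rest of the line exactly as it was read. The zero return needs checking first, since substr($0, 0) would hand back the whole line.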

In article <slrnd52k9b.fg7.drool@noudess.droolsretreat.com>,
Paul Coene <drool@noudess.droolsretreat.com> wrote:
>
>I'm not sure what you're doing above.. making fields 1-8 one field? That still
>doesn't output fields 9 until NF unaltered.. Maybe we misunderstand one
>another.

Well, if you misunderstand, then you need to go back and read more carefully.

Notes:
1) The above doesn't actually print the line - left as an exercise for
   the reader.
2) I usually post "minimalist - leave the i dotting and t crossing to the
   reader" type solutions.

By the way, what is your FS?

Paul Coene wrote:
> Hey everyone...
>
> Here's a question:
>
> I want to take an input line in awk and output the line from the start of
> field 9 until the end of line EXACTLY as input. I can't do a simple loop
> from field 9 until NF as that strips out the FS chars. Field 9 does not start
> at the same column each time, so I can't use substr and such... I suppose I
> could do an index, searching for $9 in $0, but that's error prone if the text
> appears more than in just $9...
>
> What I really want is something like a shift operator so I can shift off the
> first 9 arguments, or something to give me the starting character position of
> field 9.
>
> Ideas? I know I can use a shell script to do this, but I was hoping for a more
> elegant solution in awk. My awk program is complete, I just need this one
> thing.
This will do it in gawk:
gawk --re-interval 'sub(/^[[:space:]]*([^[:space:]]*[[:space:]]*){8}/,"")'
The "8" is obviously the number of fields you want to delete from the
start of each record.
Regards,
Ed.

Paul Coene wrote:
> I want to take an input line in awk and output the line from
> the start of field 9 until the end of line EXACTLY as input.
> I can't do a simple loop from field 9 until NF as that strips
> out the FS chars. Field 9 does not start at the same column
> each time, so I can't use substr and such...
I think this function will do what you want. Demonstrated with Cygwin,
bash shell, GNU awk:
epement@SW218-ET03 ~
$ echo 'a b c d e f g h i j k l'
a b c d e f g h i j k l
epement@SW218-ET03 ~
$ cat tail.awk
function tail(line, arg,    i) {
    # returns the tail of a line, keeping field separators
    # of varying lengths. arg is the last field to omit;
    # i is declared as an extra parameter to keep it local
    for (i=1; i<=arg; i++)
        sub($i, "", line)
    sub(/^[ \t]*/, "", line)
    return line
}
{ print tail($0,8) }
epement@SW218-ET03 ~
$ echo 'a b c d e f g h i j k l' | gawk -f tail.awk
i j k l
Hope this solution is helpful.
--
Eric Pement
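
One caveat with the tail() approach: sub($i, "", line) treats each field as a regular expression, so a field such as "a*" can match the empty string at the front of the line and leave the field in place. A possible variant, sketched here with index() and substr() so the field text is matched literally, is shown below. It assumes the default FS and that the line passed in is $0 of the current record; the shell wrapper name tail_literal is made up for the example.

```shell
# tail_literal N: print each input line with its first N fields removed,
# matching field text literally rather than as a regular expression.
# Hypothetical helper; assumes the default FS (spaces and tabs).
tail_literal() {
    awk -v n="$1" '
    function tail(line, arg,    i, p) {
        for (i = 1; i <= arg && i <= NF; i++) {
            p = index(line, $i)                  # literal search, no regex
            line = substr(line, p + length($i))  # cut through the end of field i
        }
        sub(/^[ \t]+/, "", line)                 # drop blanks before the next field
        return line
    }
    { print tail($0, n) }'
}

printf '. a* c  d   e\n' | tail_literal 2
```

Because everything ahead of field i has already been cut away except whitespace, the first literal occurrence of $i is the field itself, which is what makes the index() lookup safe here.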
Ed Morton wrote:
>
>
> Paul Coene wrote:
>
<snip>
> This will do it in gawk:
>
> gawk --re-interval 'sub(/^[[:space:]]*([^[:space:]]*[[:space:]]*){8}/,"")'
Or in a POSIX awk (e.g. /usr/xpg4/bin/awk on Solaris) where interval
expressions are enabled by default:
awk 'sub(/^[[:space:]]*([^[:space:]]*[[:space:]]*){8}/,"")'
Note that the [:space:] character class includes newlines (see
http://www.gnu.org/software/gawk/ma...har_002dclasses).
For the default FS, you could use [:blank:] if you prefer.
> The "8" is obviously the number of fields you want to delete from the
> start of each record.
>
> Regards,
>
> Ed.
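
Since interval expressions and POSIX character classes are not available in every awk, one possible workaround is to build the same pattern as a string by repeating the "field plus separator" piece N times. The sketch below does that with [ \t] for the default FS. Unlike the one-liners above it requires at least one character of whitespace after the last skipped field, so lines with too few fields are not printed at all. The name strip_fields is invented for the example.

```shell
# strip_fields N: remove the first N fields from each line, keeping the
# rest verbatim; lines with no field after the Nth produce no output.
# Hypothetical helper; assumes the default FS (spaces and tabs).
strip_fields() {
    awk -v n="$1" '
    BEGIN {
        re = "^[ \t]*"                  # optional leading blanks
        for (i = 1; i <= n; i++)
            re = re "[^ \t]+[ \t]+"     # one field and the blanks after it
    }
    {
        line = $0
        if (sub(re, "", line))          # dynamic regex built in BEGIN
            print line
    }'
}

printf 'a  b c   d e f g h i   j  k l\n' | strip_fields 8
```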