Get range of fields without losing FS chars
| Paul Coene 2005-04-06, 12:05 pm |
| Hey everyone...
Here's a question:
I want to take an input line in awk and output the line from the start of
field 9 until the end of line EXACTLY as input. I can't do a simple loop
from field 9 until NF as that strips out the FS chars. Field 9 does not start
at the same column each time, so I can't use substr and such... I suppose I
could do an index, searching for $9 in $0, but that's error prone if the text
appears more than in just $9...
What I really want is something like a shift operator so I can shift off the
first 9 arguments, or something to give me the starting character position of
field 9.
Ideas? I know I can use a shell script to do this, but I was hoping for a more
elegant solution in awk. My awk program is complete, I just need this one
thing.
--
It's 3:30, why aren't you at work?
I.. I didn't feel like it.
| |
| Kenny McCormack 2005-04-06, 12:05 pm |
| In article <slrnd52i3o.f9b.drool@noudess.droolsretreat.com>,
Paul Coene <drool@noudess.droolsretreat.com> wrote:
>Hey everyone...
>
>Here's a question:
>
>I want to take an input line in awk and output the line from the start of
>field 9 until the end of line EXACTLY as input. I can't do a simple loop
>from field 9 until NF as that strips out the FS chars. Field 9 does not start
>at the same column each time, so I can't use substr and such... I suppose I
>could do an index, searching for $9 in $0, but that's error prone if the text
>appears more than in just $9...
This is a fairly common problem and it is one that is not directly
addressed by the language. I have often wished for this sort of
capability.
I'm pretty sure this will work, assuming your FS is the default
("whitespace"):
    sub(/^[ \t]*/,"")
    for (i=1; i<=8; i++)
        sub(/^[^ \t]*[ \t]*/,"")
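Put together as a complete command, the loop above might look like the following sketch. It assumes the default whitespace FS and relies on sub() operating on $0 when no target is given. The wrapper name from_field9 and the sample line are only illustrative and do not come from the thread.

```shell
# Hypothetical wrapper: strip the first 8 whitespace-separated fields from
# each input line and print the remainder with its original spacing.
from_field9() {
    awk '{
        sub(/^[ \t]*/, "")                 # drop leading whitespace
        for (i = 1; i <= 8; i++)
            sub(/^[^ \t]+[ \t]*/, "")      # drop one field plus the separators after it
        print
    }'
}

# Sample input with uneven spacing between fields
printf 'a b  c d e f g h  i   j k\n' | from_field9
```

Because each sub() is anchored at the start of the line, the spacing after field 9 is never touched.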
>What I really want is something like a shift operator so I can shift off the
>first 9 arguments, or something to give me the starting character position of
>field 9.
Alas, no shift operator in AWK. I've often wondered what the original
rationale was for the "when the input line is changed, you lose your
original spacing" "feature". I've never seen any benefit to it, and
working around it has achieved FAQ status.
>Ideas? I know I can use a shell script to do this.
Really? I don't see it. But then again, I've never been a fan of these
complicated shell-only solutions, using obscure and poorly documented shell
features.
| |
| Paul Coene 2005-04-06, 12:05 pm |
| > I'm pretty sure this will work, assuming your FS is the default
> ("whitespace"):
>
>     sub(/^[ \t]*/,"")
>     for (i=1; i<=8; i++)
>         sub(/^[^ \t]*[ \t]*/,"")
I'm not sure what you're doing above.. making fields 1-8 one field? That still
doesn't output fields 9 until NF unaltered.. Maybe we misunderstand one
another. This is what I have done, ugly but effective:
##############################################################################
# Ok, now we need to toss off all chars up to the 9th field and keep the rest
# of the input line verbatim
#
if (NF > 8) # should always be, but don't want infinite loop below
{
    curchar=1;
    # Skip over 8 fields and any preceding FS chars
    for (i=1;i<=8;++i)
    {
        while (substr($0,curchar,1) ~ FS)
            curchar++;
        while (substr($0,curchar,1) !~ FS)
            curchar++;
    }
    # Skip the FS chars preceding field 9
    while (substr($0,curchar,1) ~ FS)
        curchar++;
    # Print the rest of $0 from $9 on...
    print substr($0, curchar)
}
##############################################################################
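The same character scan can be packaged as a function that returns the starting position of a field, which is what the original question asked for. This is only a sketch: the names fieldstart and print_from_field9 are hypothetical, and it assumes the default FS with fields separated by spaces or tabs.

```shell
# Hypothetical wrapper: print each line from the start of field 9 onward.
print_from_field9() {
    awk -v n=9 '
    # Return the 1-based character position where field n starts in s,
    # or 0 if s has fewer than n fields. pos and i are local variables.
    function fieldstart(s, n,    pos, i) {
        pos = 1
        for (i = 1; i <= n; i++) {
            # Skip the separators in front of field i
            if (match(substr(s, pos), /[^ \t]/) == 0)
                return 0
            pos += RSTART - 1
            if (i < n) {
                # Skip over field i itself
                match(substr(s, pos), /^[^ \t]+/)
                pos += RLENGTH
            }
        }
        return pos
    }
    {
        p = fieldstart($0, n)
        if (p) print substr($0, p)    # lines with fewer than n fields print nothing
    }'
}

printf 'a b  c d e f g h  i   j k\n' | print_from_field9
```

Using match() with RSTART and RLENGTH avoids the one-character-at-a-time loops, and the early return means a short line cannot run the scan past the end of the string.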
--
It's 3:30, why aren't you at work?
I.. I didn't feel like it.
| |
| Kenny McCormack 2005-04-06, 12:05 pm |
| In article <slrnd52k9b.fg7.drool@noudess.droolsretreat.com>,
Paul Coene <drool@noudess.droolsretreat.com> wrote:
>
>I'm not sure what you're doing above.. making fields 1-8 one field? That still
>doesn't output fields 9 until NF unaltered.. Maybe we misunderstand one
>another.
Well, if you misunderstand, then you need to go back and read more
carefully.
Notes:
1) The above doesn't actually print the line - left as an exercise
for the reader.
2) I usually post "minimalist - leave the i dotting and t crossing
to the reader" type solutions.
By the way, what is your FS?
| |
| Ed Morton 2005-04-06, 12:05 pm |
|
Paul Coene wrote:
> Hey everyone...
>
> Here's a question:
>
> I want to take an input line in awk and output the line from the start of
> field 9 until the end of line EXACTLY as input. I can't do a simple loop
> from field 9 until NF as that strips out the FS chars. Field 9 does not start
> at the same column each time, so I can't use substr and such... I suppose I
> could do an index, searching for $9 in $0, but that's error prone if the text
> appears more than in just $9...
>
> What I really want is something like a shift operator so I can shift off the
> first 9 arguments, or something to give me the starting character position of
> field 9.
>
> Ideas? I know I can use a shell script to do this, but I was hoping for a more
> elegant solution in awk. My awk program is complete, I just need this one
> thing.
This will do it in gawk:
    gawk --re-interval 'sub(/^[[:space:]]*([^[:space:]]*[[:space:]]*){8}/,"")'
The "8" is obviously the number of fields you want to delete from the
start of each record.
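For an awk without interval expressions, the same pattern can be written out by building the regular expression in a loop. The sketch below is an assumption-laden variant, not the command above: skip_fields and its argument are hypothetical names, it uses [ \t] instead of [[:space:]], and it leaves lines with fewer than n fields unchanged rather than emptying them.

```shell
# Hypothetical wrapper: $1 is the number of leading fields to drop (n >= 1).
skip_fields() {
    awk -v n="$1" '
    BEGIN {
        # Expanded form of the interval expression: n-1 copies of
        # "field followed by separators", then the n-th field and
        # whatever separators follow it.
        re = "^[ \t]*"
        for (i = 1; i < n; i++)
            re = re "[^ \t]+[ \t]+"
        re = re "[^ \t]+[ \t]*"
    }
    { sub(re, ""); print }'
}

printf 'a b  c d e f g h  i   j k\n' | skip_fields 8
```

Requiring at least one separator after each of the first n-1 fields keeps a single long field from being counted as several.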
Regards,
Ed.
| |
| Eric Pement 2005-04-06, 12:05 pm |
| Paul Coene wrote:
> I want to take an input line in awk and output the line from
> the start of field 9 until the end of line EXACTLY as input.
> I can't do a simple loop from field 9 until NF as that strips
> out the FS chars. Field 9 does not start at the same column
> each time, so I can't use substr and such...
I think this function will do what you want. Demonstrated with Cygwin,
bash shell, GNU awk:
epement@SW218-ET03 ~
$ echo 'a b c d e f g h i j k l'
a b c d e f g h i j k l
epement@SW218-ET03 ~
$ cat tail.awk
function tail(line, arg) {
    # returns the tail of a line, keeping field separators
    # of varying lengths. arg is the last parameter to omit
    for (i=1; i<=arg; i++)
        sub($i, "", line)
    sub(/^[ \t]*/, "", line)
    return line
}
{ print tail($0,8) }
epement@SW218-ET03 ~
$ echo 'a b c d e f g h i j k l' | gawk -f tail.awk
i j k l
Hope this solution is helpful.
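One caveat: sub() treats its first argument as a regular expression, so a field containing characters such as * or [ is not removed literally by sub($i, "", line). A possible variant, sketched here with index() and substr() under the assumption of the default FS, removes each field by position instead. The wrapper name tail_literal is hypothetical.

```shell
# Hypothetical wrapper: $1 is the number of leading fields to drop.
tail_literal() {
    awk -v n="$1" '
    # Variant of tail(): cut off fields 1..n by literal position rather
    # than by regex substitution. i and p are local variables.
    function tail(line, n,    i, p) {
        for (i = 1; i <= n && i <= NF; i++) {
            p = index(line, $i)                  # first literal occurrence of field i
            line = substr(line, p + length($i))  # keep what follows it
        }
        sub(/^[ \t]*/, "", line)                 # drop separators before the next field
        return line
    }
    { print tail($0, n) }'
}

# Fields containing regex metacharacters are handled as plain text
printf 'a.c  b*  [x  d e f g h  i.j   k\n' | tail_literal 8
```

Since the fields are removed in order, the first occurrence of each field in the remaining text is always the field itself.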
--
Eric Pement
| |
| Ed Morton 2005-04-06, 12:05 pm |
|
Ed Morton wrote:
>
>
> Paul Coene wrote:
>
<snip>
> This will do it in gawk:
>
> gawk --re-interval 'sub(/^[[:space:]]*([^[:space:]]*[[:space:]]*){8}/,"")'
Or in a POSIX awk (e.g. /usr/xpg4/bin/awk on Solaris) where interval
expressions are enabled by default:
    awk 'sub(/^[[:space:]]*([^[:space:]]*[[:space:]]*){8}/,"")'
Note that the [:space:] character class includes newlines (see
http://www.gnu.org/software/gawk/ma...har_002dclasses).
For the default FS, you could use [:blank:] if you prefer.
> The "8" is obviously the number of fields you want to delete from the
> start of each record.
>
> Regards,
>
> Ed.
| |