Author Is that problem for awk?
Kurda Yon

2008-02-07, 6:59 pm

Hi,

I have the following problem. In my text-file each line has the
following format:

field_1 field_2 ... field_n (tf. field_1a, field_2a ... field_ka)

And I need to extract field_1a, field_2a, ...and field_ka. Here I see
several subproblems which I cannot solve:
1. Different lines have different number of fields before the
(tf. ... ) block.
2. (tf. ... ) blocks also contain different number of fields.
3. There is no space between "field_ka" and ")". And I want to remove
")".

Can this problem be easily solved in awk?

Thank you.
Ed Morton

2008-02-07, 6:59 pm



On 2/7/2008 2:37 PM, Kurda Yon wrote:
> Hi,
>
> I have the following problem. In my text-file each line has the
> following format:
>
> field_1 field_2 ... field_n (tf. field_1a, field_2a ... field_ka)
>
> And I need to extract field_1a, field_2a, ...and field_ka. Here I see
> several subproblems which I cannot solve:
> 1. Different lines have different number of fields before the
> (tf. ... ) block.
> 2. (tf. ... ) blocks also contain different number of fields.
> 3. There is no space between "field_ka" and ")". And I want to remove
> ")".
>
> Can this problem be easily solved in awk?


Yes:

$ cat file
field_1 field_2 ... field_n (tf. field_1a, field_2a ... field_ka)
$ awk 'gsub(/.*\(....|\)$/,"")1' file
field_1a, field_2a ... field_ka

Regards,

Ed.
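
A slightly more explicit variant of the same idea, offered as a sketch rather than as anything posted in the thread: instead of skipping four arbitrary characters after the parenthesis, it anchors on the literal text "(tf. " and removes the closing parenthesis in a second step. The helper name extract_tf and the sample lines are made up for illustration.

```shell
# extract_tf is a hypothetical helper name; it reads lines from stdin or files.
extract_tf() {
  awk '{
    sub(/^.*\(tf\. /, "")   # delete everything through the opening "(tf. "
    sub(/\)$/, "")          # delete the closing ")" at the end of the line
    print
  }' "$@"
}

# Made-up sample lines with different numbers of fields on each side.
printf '%s\n' 'a b c (tf. x1, x2, x3)' 'd (tf. y1)' | extract_tf
```

This assumes every line contains a "(tf. " block. A line without one would pass through unchanged, apart from losing a trailing ")" if it has one.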

Luuk

2008-02-08, 3:58 am


"Kurda Yon" <kurdayon@yahoo.com> wrote in message
news:9fb18506-41f8-49b1-9a22-6f030fc1c58d@c23g2000hsa.googlegroups.com...
> Hi,
>
> I have the following problem. In my text-file each line has the
> following format:
>
> field_1 field_2 ... field_n (tf. field_1a, field_2a ... field_ka)
>
> And I need to extract field_1a, field_2a, ...and field_ka. Here I see
> several subproblems which I cannot solve:
> 1. Different lines have different number of fields before the
> (tf. ... ) block.
> 2. (tf. ... ) blocks also contain different number of fields.
> 3. There is no space between "field_ka" and ")". And I want to remove
> ")".
>
> Can this problem be easily solved in awk?
>
> Thank you.


for a start you could do this, (or even strip the '(' and the ')' )...

awk '{ $0=substr($0,index($0,"(tf")); print $0 }' file

(tf. field_1a, field_2a ... field_ka)
......


--
Luuk
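
One way the index()/substr() approach might be carried through to the end, as a sketch under the assumption that the block always starts with the five characters "(tf. " and that the ")" is the last character on the line. The helper name strip_tf and the sample input are hypothetical.

```shell
# strip_tf is a hypothetical helper name; it reads lines from stdin or files.
strip_tf() {
  awk '{
    start = index($0, "(tf. ")      # position of the block, 0 if absent
    if (start == 0) next            # skip lines that have no block
    body = substr($0, start + 5)    # text after the 5-character "(tf. "
    sub(/\)$/, "", body)            # remove the trailing ")"
    print body
  }' "$@"
}

# Made-up sample input: the second line has no block and is skipped.
printf '%s\n' 'a b (tf. x1, x2)' 'no block here' | strip_tf
```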


Luuk

2008-02-08, 7:58 am


"Ed Morton" <morton@lsupcaemnt.com> wrote in message
news:47AB6DD4.5060009@lsupcaemnt.com...
>
>
> On 2/7/2008 2:37 PM, Kurda Yon wrote:
>
> Yes:
>
> $ cat file
> field_1 field_2 ... field_n (tf. field_1a, field_2a ... field_ka)
> $ awk 'gsub(/.*\(....|\)$/,"")1' file
> field_1a, field_2a ... field_ka
>
> Regards,
>
> Ed.
>


could someone explain the '1' in "$ awk 'gsub(/.*\(....|\)$/,"")1' file" ?

awk does not seem to do anything with it...
or is it just a typo?

but awk also does not complain when i type:
$ awk 'gsub(/.*\(....|\)$/,"")g' file


--
Luuk


Ed Morton

2008-02-08, 6:59 pm

On 2/8/2008 6:50 AM, Luuk wrote:
> "Ed Morton" <morton@lsupcaemnt.com> wrote in message
> news:47AB6DD4.5060009@lsupcaemnt.com...
>
>
>
> could someone explain the '1' in "$ awk 'gsub(/.*\(....|\)$/,"")1' file" ?


It makes sure that even if the input record is empty (in which case gsub() will
return 0) the eventual condition being tested by awk is non-zero/non-null so
that printing the current record occurs even in that case.

The operator used to combine the result of the gsub() with the "1" is
string-concatenation so you can put anything after the gsub() to get a non-null
resultant string, even zero (to get the string "00") or the null string (to get
the string "0" as opposed to the number zero).

> awk does not seem to do anything with it...
> or is it just a typo?


No. Look:

$ cat file
a

c
$ awk 'sub(/./,NR)' file
1
3
$ awk 'sub(/./,NR)1' file
1

3

>
> but awk also does not complain when i type:
> $ awk 'gsub(/.*\(....|\)$/,"")g' file


Right. In that case it evaluates the unassigned variable "g" to the null string
"" which is string-concatenated with the zero result of sub() to give a non-null
"0" string:

$ awk 'sub(/./,NR)g' file
1

3

Regards,

Ed.
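
The truth values described above can be checked in isolation. This is a minimal sketch, not part of the original post: the variable n stands in for a sub()/gsub() return value of 0, and the function name truth_demo is made up.

```shell
# truth_demo is a hypothetical name; n stands in for a sub()/gsub() result of 0.
truth_demo() {
  awk 'BEGIN {
    n = 0
    print ((n)    ? "true" : "false")   # the number 0 is false
    print ((n 1)  ? "true" : "false")   # the string "01" is non-null: true
    print ((n 0)  ? "true" : "false")   # the string "00" is non-null: true
    print ((n "") ? "true" : "false")   # the string "0" is non-null: true
  }'
}

truth_demo
```

Only the first line should print "false": once anything is concatenated onto the 0, the value being tested is a non-empty string rather than a number.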

Luuk

2008-02-08, 6:59 pm

Ed Morton wrote:
> On 2/8/2008 6:50 AM, Luuk wrote:
>
> It makes sure that even if the input record is empty (in which case gsub() will
> return 0) the eventual condition being tested by awk is non-zero/non-null so
> that printing the current record occurs even in that case.
>
> The operator used to combine the result of the gsub() with the "1" is
> string-concatenation so you can put anything after the gsub() to get a non-null
> resultant string, even zero (to get the string "00") or the null string (to get
> the string "0" as opposed to the number zero).
>
>
> No. Look:
>
> $ cat file
> a
>
> c
> $ awk 'sub(/./,NR)' file
> 1
> 3
> $ awk 'sub(/./,NR)1' file
> 1
>
> 3
>
>
> Right. In that case it evaluates the unassigned variable "g" to the null string
> "" which is string-concatenated with the zero result of sub() to give a non-null
> "0" string:
>
> $ awk 'sub(/./,NR)g' file
> 1
>
> 3
>
> Regards,
>
> Ed.
>



i must have skipped that part of the man-page....

normally i would do:
awk '{ sub(/./,NR); print $0 }' file

which is indeed somewhat longer... ;-)

--
Luuk
Ed Morton

2008-02-08, 6:59 pm



On 2/8/2008 9:19 AM, Luuk wrote:
> Ed Morton wrote:
>
<snip>
>
>
>
> i must have skipped that part of the man-page....


I think it's in the fine-print at the bottom under "What???" :-).

> normally i would do:
> awk '{ sub(/./,NR); print $0 }' file
>
> which is indeed somewhat longer... ;-)
>


You could alternatively do:

awk 'BEGIN{
    while ((getline var < ARGV[1]) > 0) {
        sub(/./,++nr,var)
        print var
    }
    close(ARGV[1])
    exit
}' file

if you really need to fill up a file ;-). Just making the point for anyone else
reading this that while more text can make the script appear to be "more
readable" to non-awk-experienced procedural programmers, it's better to just get
used to and use the awk idioms than get stuck in a C-like paradigm and miss all
the benefits of the awk paradigm.

Ed.

Kenny McCormack

2008-02-08, 6:59 pm

In article <47AC80FB.20406@lsupcaemnt.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
....
>You could alternatively do:
>
>awk 'BEGIN{
> while ((getline var < ARGV[1]) > 0) {
> sub(/./,++nr,var)
> print var
> }
> close(ARGV[1])
> exit
>}' file
>
>if you really need to fill up a file ;-). Just making the point for
>anyone else reading this that while more text can make the script
>appear to be "more readable" to non-awk-experienced procedural
>programmers, it's better to just get used to and use the awk idioms
>than get stuck in a C-like paradigm and miss all the benefits of the
>awk paradigm.
>
> Ed.
>


Indeed. So true. So true.

Keep in mind that if you really want to be verbose/explicit you could
probably add some cats and greps and seds to the above command line as
well. But that would make this more on-topic in comp.unix.shell than
here.

Janis Papanagnou

2008-02-08, 6:59 pm

Luuk wrote:
> Ed Morton wrote:
>
>
>
> i must have skipped that part of the man-page....
>
> normally i would do:
> awk '{ sub(/./,NR); print $0 }' file
>
> which is indeed somewhat longer... ;-)
>


You can shorten that a bit without sacrificing the action block by

awk '{ sub(/./,NR) } 1' file

Personally I consider the concatenation of sub() and 1

awk 'sub(/./,NR) 1' file

as a hack; it's an unnecessary level of obfuscation[*]. A bit more
verbose but IMO conceptually clearer (no implicit casts) might be

awk 'sub(/./,NR) || 1' file

OTOH, the case where you just want those lines printed where you
actually substituted something, the expression

awk 'sub(/./,NR)' file

seems more natural compared to introducing a block (and maybe an
unnecessary if statement).

Janis

[*] Mixing integral expressions and "invisible" operators, having
implicit type conversions, and just for a boolean condition result.
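
To see the three always-print variants discussed above side by side, here is a small sketch that runs each of them over the same three-line input (the middle line is empty). The function name compare_variants is made up, and the newlines are turned into "|" only so that the empty output line stays visible.

```shell
# compare_variants is a hypothetical name; each program is run on "a", "", "c".
compare_variants() {
  for prog in 'sub(/./,NR) 1' 'sub(/./,NR) || 1' '{ sub(/./,NR) } 1'; do
    # tr makes the record boundaries visible, echo ends the summary line
    printf 'a\n\nc\n' | awk "$prog" | tr '\n' '|'
    echo
  done
}

compare_variants
```

All three programs should produce the same summary line, which suggests the choice between them is about readability rather than behavior.
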
Ed Morton

2008-02-08, 6:59 pm

On 2/8/2008 11:11 AM, Janis Papanagnou wrote:
> Luuk wrote:

<snip>
>
>
> You can shorten that a bit without sacrificing the action block by
>
> awk '{ sub(/./,NR) } 1' file
>
> Personally I consider the concatenation of sub() and 1
>
> awk 'sub(/./,NR) 1' file
>
> as a hack; it's an unnecessary level of obfuscation[*].


I'm all about conciseness. Within that, I'd normally choose clarity over brevity
when there is a trade-off, as there is here. BUT in this particular case I really
like favoring brevity. It's a very small script, so there isn't a lot of clutter
with other details. And to understand what awk is doing in that script, you have
to understand how awk works in terms of condition-action segments, string
concatenation, default actions, and return codes, all of which you SHOULD know
anyway before doing any awk programming.

So, I like the above because if you understand the awk paradigm already, the
script is perfectly clear and if you don't then the small amount of effort
required to learn how that small script works takes you a long way to that
understanding.

> A bit more
> verbose but IMO conceptually clearer (no implicit casts) might be
>
> awk 'sub(/./,NR) || 1' file


That seems confusing to me as I can't see any benefit to it over any other
approach so I think people would waste their time trying to figure out what the
benefit is. If you're going to do that, IMHO you'd be better off with:

awk '{sub(/./,NR);print}' file
or
awk '{sub(/./,NR)}1' file

as they're both clearer and about the same length.

Regards,

Ed

> OTOH, the case where you just want those lines printed where you
> actually substituted something, the expression
>
> awk 'sub(/./,NR)' file
>
> seems more natural compared to introducing a block (and maybe an
> unnecessary if statement).
>
> Janis
>
> [*] Mixing integral expressions and "invisible" operators, having
> implicit type conversions, and just for a boolean condition result.


Janis Papanagnou

2008-02-08, 6:59 pm

Ed Morton wrote:
> On 2/8/2008 11:11 AM, Janis Papanagnou wrote:

[snip]
>
>
> I'm all about conciseness. Within that, I'd normally choose clarity over brevity
> when there is a trade-off, as there is here. BUT in this particular case I really
> like favoring brevity. It's a very small script, so there isn't a lot of clutter
> with other details. And to understand what awk is doing in that script, you have
> to understand how awk works in terms of condition-action segments, string
> concatenation, default actions, and return codes, all of which you SHOULD know
> anyway before doing any awk programming.
>
> So, I like the above because if you understand the awk paradigm already, the
> script is perfectly clear and if you don't then the small amount of effort
> required to learn how that small script works takes you a long way to that
> understanding.


Oh, I don't think that learning a couple characters of "awk paradigm"
would be a problem, and especially not for experienced awk programmers.
The expression is concise enough to be memorized as a "programming
pattern"; and, yes, even for newbies.

You know that I have no problem understanding all those expressions.
My critique is the unnecessary complexity of unnecessarily involved
concepts in that expression; mainly the implicit conversion. Maybe I
shouldn't have put the point in my footnote, so I place it here...


It's actually the implicit complexity of

int(concat(string(sub(...)), string(1)))

that annoys me. (Others might not mind about such internals, though.)

Which is just

or(sub(...), 1)

all done within the integer domain.
> That seems confusing to me as I can't see any benefit to it over any other
> approach so I think people would waste their time trying to figure out what the
> benefit is.


Consider it as a programming pattern, as you do with the textually
more concise concatenation pattern.

> If you're going to do that, IMHO you'd be better off with:
>
> awk '{sub(/./,NR);print}' file
> or
> awk '{sub(/./,NR)}1' file
>
> as they're both clearer and about the same length.


Yes, I agree.

Janis

> Regards,
>
> Ed


[snip]
William James

2008-02-09, 3:58 am

On Feb 8, 11:29 am, Ed Morton <mor...@lsupcaemnt.com> wrote:
> On 2/8/2008 11:11 AM, Janis Papanagnou wrote:
>
> <snip>
>
>
>
>
>
>

True.
>
> I'm all about conciseness.


What arrogance! No one here cares what little Ed is
about. (Note that he implies that he cares
nothing for correctness or readability.)

> Within that, I'd normally choose clarity over brevity when there is a
> trade-off, as there is here. BUT in this particular case I really like favoring
> brevity. It's a very small script, so there isn't a lot of clutter with other
> details. And to understand what awk is doing in that script, you have to
> understand how awk works in terms of condition-action segments, string
> concatenation, default actions, and return codes, all of which you SHOULD know
> anyway before doing any awk programming.


Shockingly arrogant nonsense. One doesn't have to know
everything about a programming language before he starts
to use that language. From "The AWK Programming Language":

Chapter 1 is a tutorial on the bare minimum necessary
to get started; after reading even a few pages, you
should have enough information to begin writing useful
programs.

Note to newbies. Don't be fooled by Ed's attempts to
pass himself off as an exemplary awk programmer. He
has a record of posting buggy, untested code and code
that solves less of the problem than what has
already been posted by others.
Luuk

2008-02-09, 3:58 am

William James wrote:
>
> Shockingly arrogant nonsense. One doesn't have to know
> everything about a programming language before he starts
> to use that language. From "The AWK Programming Language":
>
> Chapter 1 is a tutorial on the bare minimum necessary
> to get started; after reading even a few pages, you
> should have enough information to begin writing useful
> programs.
>
> Note to newbies. Don't be fooled by Ed's attempts to
> pass himself off as an exemplary awk programmer. He
> has a record of posting buggy, untested code and code
> that solves less of the problem than what has
> already been posted by others.


I see no need to flame anyone in here,

If someone does not like what another person writes then please IGNORE
him/her

If someone writes something which is definitely WRONG, please send a
message why he's wrong, and give a proper solution.


--
Luuk
Kenny McCormack

2008-02-09, 7:57 am

In article <pl8085-e01.ln1@leafnode.a62-251-88-195.adsl.xs4all.nl>,
Luuk <Luuk@invalid.lan> wrote:
>William James wrote:
>
>I see no need to flame anyone in here,


Yes, it certainly looks like somebody (WJ) didn't get enough mommy love.

>If someone does not like what another person writes then please IGNORE
>him/her
>
>If someone writes something which is definitely WRONG, please send a
>message why he's wrong, and give a proper solution.


Indeed.

Ed Morton

2008-02-09, 6:58 pm



On 2/9/2008 3:34 AM, Luuk wrote:
> William James wrote:
>
>
>
> I see no need to flame anyone in here,


William shows up in this NG from time to time. He appears lucid for a period,
sometimes posting useful advice, then he suddenly posts a bizarre negative rant
about something before disappearing completely for several months. I've no idea
what causes that behavior but IMHO the best advice is to just ignore the rant as
it's harmless and when he comes back he'll be OK again for a while.

Ed.

Andrew Schorr

2008-02-09, 6:58 pm

On Feb 7, 3:37 pm, Kurda Yon <kurda...@yahoo.com> wrote:
> I have the following problem. In my text-file each line has the
> following format:
>
> field_1 field_2 ... field_n (tf. field_1a, field_2a ... field_ka)
>
> And I need to extract field_1a, field_2a, ...and field_ka.


Just for variety, here's a solution that involves using a regular expression
to grab the part of the input that you do want (as opposed to the other
solutions, which focus on discarding the part of the input that you don't want):

bash-3.1$ cat /tmp/file
field_1 field_2 ... field_n (tf. field_1a, field_2a ... field_ka)
bash-3.1$ gawk 'match($0,/\(tf\. (.*)\)$/,f) {print f[1]}' /tmp/file
field_1a, field_2a ... field_ka

Granted, this involves using a gawk extension (the 3rd array argument to
match), but this is a very powerful feature that can help in many situations.

Regards,
Andy
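
For awks without the gawk array argument, roughly the same "grab what you want" idea can be sketched with the standard two-argument match(), which sets RSTART and RLENGTH. This is an assumption-laden alternative, not something from the post above: it relies on the block starting with the five characters "(tf. " and ending with ")" at the end of the line, and the function name extract_posix is made up.

```shell
# extract_posix is a hypothetical helper name; it reads lines from stdin or files.
extract_posix() {
  awk 'match($0, /\(tf\. .*\)$/) {
    # RSTART and RLENGTH delimit the whole "(tf. ... )" match; skip the
    # 5-character "(tf. " prefix and leave out the final ")".
    print substr($0, RSTART + 5, RLENGTH - 6)
  }' "$@"
}

# Made-up sample line in the format from the question.
printf '%s\n' 'a b (tf. x1, x2)' | extract_posix
```

As with the gawk version, lines that do not match the pattern produce no output.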
