Storing just the match in a line, Was: Re: How do you do this in awk?
jeanne.petrangelo@gmail.com

2006-09-15, 6:56 pm

Jon LaBadie wrote:
> jeanne.petrangelo@gmail.com wrote:
>
> Not the cause of your syntax error, but your pattern for matching
> is flawed. I presume you mean 'a space' to ensure the start of a
> word, followed by '*' to match anything up to the '.bar'.


You are correct.

[snip]
> I think this would be more like what you wanted:
>
> / [^ ]*\.bar/


Thank you!

> However, note this will fail to properly match foo.bar at the
> beginning of the line (no space) or after a tab or punctuation
> (match too much, all the way back to a space).



Since foo.bar will never occur at the start of a line and will only
occur after a space, this is okay, but I will polish up my regexp
knowledge.

So now I have:
===
if (line ~ / [^ ]*\.bar/ ) {
    print line;
    extract(line," [^ ]*\.bar"); { print RMATCH }
}
===
... and it works! Note I had to add a semicolon after the function
call, or else I got that same parse error.

Thank you!
JP
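
For readers following Jon LaBadie's caveat about matches at the start of a line or after a tab: one possible alternative, not taken from the thread, is to drop the leading space from the pattern and exclude both spaces and tabs from the word itself. The sketch below wraps awk in a shell function; the name `bar_words` is hypothetical. It still treats punctuation as part of the word, so it does not address that part of the caveat.

```shell
# Hypothetical helper: print the first word ending in ".bar" on each input line.
bar_words() {
  awk '
    {
      # [^ \t]+ is a run of characters that are neither space nor tab,
      # so the match can begin at the start of the line or after a tab.
      # "\\." in a string constant reaches the regexp engine as \. (literal dot).
      if (match($0, "[^ \t]+\\.bar"))
        print substr($0, RSTART, RLENGTH)
    }
  '
}

# Usage: one line starting with the word, one with the word after a tab
printf 'foo.bar at line start\nsee\tbaz.bar after a tab\n' | bar_words
```

Because the pattern no longer begins with a space, the extracted text has no leading blank to trim.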

jeanne.petrangelo@gmail.com

2006-09-15, 6:56 pm

Ed Morton wrote:
> casioculture@gmail.com wrote:
>
> function extract(str,regexp)
> {
>     RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
>     return RSTART
> }
> extract($0,"word\\.[1-9]") { print RMATCH }
>


I'm hitting up against the same thing, but it's not working correctly
for me. My gawk script, running in a WinXP console window (sorry,
nothing I can do about that), has to read a file and find matches with
the pattern "foo.bar" where "foo" is unknown... all I know is the
".bar" part. These matches may exist anywhere in a line, and I want
just "foo.bar" stored to a variable. I haven't yet scripted the part
about storing the substring to the variable, as I haven't yet been able
to verify I have the substring correctly.

I copied the function (extract) exactly. After the BEGIN statement, I
have:
====
while ((getline line < filename) > 0) {
    #Look for all words that match foo.bar and store them
    if (line ~ / *\.bar/ ) {
        #print line;
        extract(line," *\\.bar") { print RMATCH }
    }
}
====
gawk tells me there's a syntax error at the curly brace at the start of
{ print RMATCH }. I know the rest of that piece is working because if
I uncomment "print line" then I do indeed see every line of the file
containing that match. And yes, I must use gawk.

Thanks,
JP
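
A likely explanation for the parse error, offered as a sketch rather than a confirmed diagnosis: `extract(...) { print RMATCH }` is a pattern-action rule, and awk accepts rules only at the top level of a program, not inside another action such as a BEGIN block or a while loop. Inside an action, the same idea is usually written as an `if` statement that tests the function's return value. The shell wrapper `find_bar_words` and the sample file are hypothetical; the pattern is the corrected one from Jon LaBadie's reply.

```shell
# Hypothetical wrapper: print every " word.bar" match found in the file "$1".
find_bar_words() {
  awk -v filename="$1" '
    # Store the matched text in RMATCH; return its start position (0 if none).
    function extract(str, regexp) {
      RMATCH = (match(str, regexp) ? substr(str, RSTART, RLENGTH) : "")
      return RSTART
    }
    BEGIN {
      while ((getline line < filename) > 0) {
        # Inside an action, call extract() in an if statement
        # instead of writing a pattern { action } rule.
        if (extract(line, " [^ ]*\\.bar"))
          print RMATCH
      }
      close(filename)
    }
  '
}

# Usage with a throwaway sample file
sample=$(mktemp)
printf 'see foo.bar here\nnothing to match\n' > "$sample"
find_bar_words "$sample"
rm -f "$sample"
```

The printed match begins with a space, because the pattern itself starts with one.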

jeanne.petrangelo@gmail.com

2006-09-18, 9:56 pm

This is relevant to the previous discussion in this thread but is
different enough that it's less confusing for me to start from scratch
in this post. The main problem is that the regular expression is not
always finding the correct pattern match.

Background:
I'm using gawk in a Win32 console window, though that may not be
relevant. I need to extract, in this case, the names of .c or .cpp
source files that may appear at random in an ASCII text file.
Fortunately only one instance of a source file name may appear in any
one line.

Provided this function:
====
# Extract a substring that matches the regular expression
function extract(str,regexp)
{
    RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
    return RSTART
}
====
... I have this snippet of code (note the source file name is always in
quotation marks in the text file):
====
if (line ~ /[^\"]*\.cp*/ ) {
    extract(line,"[^\"]+\.cp*");
    print "The source file is named " RMATCH;
}
====
... to try to get the quoted name of the source file. When the line is
as follows, the match is "foo.c", which is what I want:
RelativePath="foo.c"


... but when the name of the source file begins with a c, the match is
either:
RelativePath="c
or
RelativePath="cp

... (including all the leading whitespace) depending on whether the
source file is "coo.c" or "coo.cpp".

I do not understand why the file name starting with a "c" will change
the behavior, since the regular expression specifies the match must
have a period before the c. Help, please?

Thank you,
JP

jeanne.petrangelo@gmail.com

2006-09-18, 9:56 pm

jeanne.petrangelo@gmail.com wrote:
> This is relevant to the previous discussion in this thread but is
> different enough that it's less confusing for me to start from scratch
> in this post. The main problem is that the regular expression is not
> always finding the correct pattern match.
>
> Background:
> I'm using gawk in a Win32 console window, though that may not be
> relevant. I need to extract, in this case, the names of .c or .cpp
> source files that may appear at random in an ASCII text file.
> Fortunately only one instance of a source file name may appear in any
> one line.
>
> Provided this function:
> ====
> # Extract a substring that matches the regular expression
> function extract(str,regexp)
> {
>     RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
>     return RSTART
> }
> ====
> ... I have this snippet of code (note the source file name is always in
> quotation marks in the text file):
> ====
> if (line ~ /[^\"]*\.cp*/ ) {
>     extract(line,"[^\"]+\.cp*");
>     print "The source file is named " RMATCH;
> }
> ====
> ... to try to get the quoted name of the source file. When the line is
> as follows, the match is "foo.c", which is what I want:
> RelativePath="foo.c"
>
>
> ... but when the name of the source file begins with a c, the match is
> either:
> RelativePath="c
> or
> RelativePath="cp
>
> ... (including all the leading whitespace) depending on whether the
> source file is "coo.c" or "coo.cpp".


I'll add that if I don't call the function, but instead use the guts of
the function directly, the name of the source file is always found
correctly:
====
if (line ~ /[^\"]*\.cp*/ ) {
    match(line,/[^\"]*\.cp*/);
    temp = substr(line,RSTART,RLENGTH);
    print "The source file is named " temp;
}
====
... could this be a bug in gawk, or is there a finer point of the
language that eludes me?

Thanks,
JP
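
One plausible explanation, not confirmed anywhere in this thread: the two versions do not use the same regular expression. In the regexp constant `/[^\"]*\.cp*/` the `\.` reaches the matcher intact. In the string constant `"[^\"]+\.cp*"` the string is scanned for escape sequences first, and gawk documents that it treats an unrecognized sequence such as `\.` as a plain `.` (with a warning), which then matches any character. The leftmost match for `[^"]+.cp*` in `RelativePath="coo.c"` therefore starts at the beginning of the line and ends at the `c` after the quote. Doubling the backslash in the string form should avoid this. The sketch below assumes a hypothetical shell wrapper `source_names`, and swaps `cp*` for `c(pp)?` so the optional part is exactly `pp`; that tightening is an addition, not part of the original code.

```shell
# Hypothetical wrapper: print the .c/.cpp name found on each input line.
source_names() {
  awk '
    # Store the matched text in RMATCH; return its start position (0 if none).
    function extract(str, regexp) {
      RMATCH = (match(str, regexp) ? substr(str, RSTART, RLENGTH) : "")
      return RSTART
    }
    {
      # In a string constant, "\\." becomes \. once the string is scanned,
      # so the regexp engine sees a literal dot rather than "any character".
      if (extract($0, "[^\"]+\\.c(pp)?"))
        print RMATCH
    }
  '
}

# Usage: the two lines that misbehaved in the post above
printf '%s\n' '   RelativePath="coo.c"' '   RelativePath="coo.cpp"' | source_names
```

With the doubled backslash, nothing before the opening quote can match, since that part of the line contains no literal dot, so the match starts at the file name.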

Copyright 2008 codecomments.com