awk escape special characters
Dave

2006-09-12, 6:56 pm

Hi,

Can anyone help me with this problem?
I'm trying to use a regex in match(), but the regex comes from an array I load
from a file (via getline) and contains special characters such as ( ) [ ] . etc.
I'm trying to write a function that will escape the regex string. The goal is to
parse badly formatted data and output $1 and $3. I have an 'infile' that
has $1 and $2, so I'm using the infile's $2 to find the true position of $3 via
match().

The example below may make it clearer what I'm trying to do:

> cat infile

1234[TAB]ab(cd)e$fg

#!/bin/sh
#script.sh
echo '1234 ab(cd)e$fg 5678' | \
nawk -F" *" -v "infile=$infile" '
function escape_regex(string) {
???
return escaped_string;
}
BEGIN {
while ((getline line < infile) > 0) {
split(line, field, "[TAB]");
FLD[field[1]]=escape_regex(field[2]);
}
}
{
match($0,FLD[$1]);
RESULT=sprintf("%s\t%s", $1, substr($0,RSTART+RLENGTH));
print RESULT;

}
END{}'
exit
> script.sh

5678

Many thanks,

Dave


John DuBois

2006-09-12, 6:56 pm

In article <12gdmhii4g7uoba@corp.supernews.com>,
Dave <withheld@nospam.thanks> wrote:
>Hi,
>
>Can anyone help me with this problem?
>I'm trying to use a REGEX in match(), but REGEX comes from an array I load
>from file (getline) and contains special characters such as ()[]. etc. I'm
>trying to write a function which will escape the REGEX string: This is to
>parse a badly formatted data to output $1 and $3. I have an 'infile' that
>has $1 and $2, so I'm using the infile's $2 to find true position of $3 via
>match().
>
>Below example may be a bit clearer to show what I'm trying to do:
>
>1234[TAB]ab(cd)e$fg
>
>#!/bin/sh
>#script.sh
>echo "1234 ab(cd)e$fg 5678" | \
>nawk -F" *" -v "infile=$infile" '
>function escape_regex(string) {
> ???
> return escaped_string;
>}
>BEGIN {
> while ((getline line < infile) > 0) {
> split(line, field, "[TAB]");
> FLD[field[1]]=escape_regex(field[2]);
> }
>}
>{
> match($0,FLD[$1]);
> RESULT=printf("%s\t%s\n", $1, substr($0,RSTART+RLENGTH);
> print RESULT;
>
>}
>END{}'
>exit


Well, you could take the simple approach of replacing every character with
\character: gsub(".", "\\\\&", string). Or you could be more selective.
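
A minimal sketch of the "more selective" route, assuming a POSIX-style awk is installed as `awk` (the thread uses nawk). The names after_key and escape_regex are made up for illustration, and the loop is just one way to do it rather than a canonical recipe. It escapes only the ERE metacharacters, which also sidesteps a pitfall of escaping every character: sequences such as \t or \n have their own meaning inside a regex.

```sh
#!/bin/sh
# Sketch: print whatever follows a literal KEY inside LINE, using match()
# with an escaped copy of KEY. after_key and escape_regex are illustrative names.
after_key() {
    printf '%s\n' "$2" | KEY="$1" awk '
    # Return s with a backslash in front of every ERE metacharacter.
    function escape_regex(s,    out, i, c) {
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            # The string below lists: \ ^ $ . [ ] | ( ) * + ? { }
            if (index("\\^$.[]|()*+?{}", c) > 0)
                out = out "\\"
            out = out c
        }
        return out
    }
    # The key arrives through the environment so that awk does not
    # reinterpret any backslashes it may contain.
    BEGIN { re = escape_regex(ENVIRON["KEY"]) }
    {
        if (match($0, re))
            print substr($0, RSTART + RLENGTH)
    }'
}

after_key 'ab(cd)e$fg' '1234 ab(cd)e$fg 5678'
```

With the sample line from the original post this prints " 5678" (the separator before the last field is kept).
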
But if you want to find the location of a fixed string in another string, you
really shouldn't use match; use index() instead - something along the lines of

printf "%s\t%s\n", $1, substr($0, index($0, FLD[$1]) + length(FLD[$1]))

(where you've stored the un-escaped versions in FLD).
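
The index() suggestion above could be fleshed out roughly as follows. This is only a sketch under a few assumptions: `awk` is a POSIX-style awk, the lookup file holds tab-separated key/literal pairs as in the original post, and the function name parse_with_index plus the whitespace-trimming step are additions for illustration, not part of the original script.

```sh
#!/bin/sh
# Sketch: parse_with_index LOOKUP_FILE reads data lines on stdin. LOOKUP_FILE
# holds "key<TAB>literal" pairs, stored unescaped because index() does no
# pattern matching. The function name and the trimming step are illustrative.
parse_with_index() {
    awk -v lookup="$1" '
    BEGIN {
        # Load the literal strings, keyed by the first field.
        while ((getline line < lookup) > 0) {
            split(line, field, "\t")
            FLD[field[1]] = field[2]
        }
        close(lookup)
    }
    ($1 in FLD) {
        pos = index($0, FLD[$1])
        if (pos > 0) {
            rest = substr($0, pos + length(FLD[$1]))
            sub(/^[ \t]+/, "", rest)    # drop the separator before the last field
            printf "%s\t%s\n", $1, rest
        }
    }'
}

# Example use (file name is arbitrary):
#   printf '1234\tab(cd)e$fg\n' > lookup.txt
#   printf '%s\n' '1234 ab(cd)e$fg 5678' | parse_with_index lookup.txt
```

Because index() compares plain strings, no escaping function is needed at all on this route.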

John
--
John DuBois spcecdt@armory.com KC6QKZ/AE http://www.armory.com/~spcecdt/

Copyright 2008 codecomments.com