How to index full context and locate special strings via awk?
Greetings to all,
I am having a headache locating strings in the following text file:
10122523 condition 357 abc 9999.12.31 reader 386 CNY 1
ST 160003402 2
2006.07.02
10128945 ST condition 0.01
CNY 1 ST 9999.12.31
54678792 0 ST condition 230 9999.12.31 reader "231" cbd 1 ST 160005053
2006.11.21
10124545 ST uyr iryu condition "1231"
CNY 1 ST 9999.12.31
............
/snip
The result should look like:
10122523 386
10128945 0.01
54678792 "231"
10124545 "1231"
How can I achieve this?
The text file does not follow a regular line structure. An 8-digit
number marks the start of each group. Within each group, locate
"condition" and "reader": if "reader" exists, pick up the string that
follows it; if not, pick the string after "condition". I tried grep and
substr but had no luck. I guess I need a lookup over the full context
rather than line by line.
Any tips? Thanks a lot,
Zhou
| |
| Ed Morton 2006-12-05, 6:56 pm |
Something like this should do it if the file isn't too huge:
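awk -v RS= '
{
    # RS= (empty) is paragraph mode: with no blank lines in the file,
    # the whole input is one record and every word is a field.
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/) {
            # An 8-digit ID starts a new group; print the previous one.
            if (group)
                print group, (reader != "" ? reader : condition)
            group = $i
            reader = condition = ""
        } else if ($i ~ /^reader/) {
            reader = $(i + 1)
        } else if ($i ~ /^condition/) {
            condition = $(i + 1)
        }
}
END {
    # Print the last group; a reader value of 0 still counts as present.
    if (group) print group, (reader != "" ? reader : condition)
}' file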
Regards,
Ed.
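If the file is too large to hold as a single record, the same scan can also run line by line with awk's default record separator, because awk variables keep their values from one record to the next. The following is only a sketch of that variant, not the script posted above: it assumes group IDs are exactly eight digits as in the sample, it matches the keywords exactly rather than by prefix, and the function name extract_groups is made up for illustration.

```shell
# Hypothetical helper: reads the text on stdin, prints "ID value" per group.
extract_groups() {
    awk '
    {
        for (i = 1; i <= NF; i++) {
            if (want != "") {
                # The token after a keyword is its value, even if a line
                # break separates the two.
                if (want == "reader") { reader = $i } else { condition = $i }
                want = ""
            } else if ($i ~ /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/) {
                # A new 8-digit ID closes the previous group.
                if (group != "")
                    print group, (reader != "" ? reader : condition)
                group = $i
                reader = condition = ""
            } else if ($i == "reader" || $i == "condition") {
                # Remember which keyword is waiting for its value.
                want = $i
            }
        }
    }
    END {
        # Flush the last group, which has no following ID to trigger it.
        if (group != "")
            print group, (reader != "" ? reader : condition)
    }'
}
```

It would be called as: extract_groups < file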
| |
Thanks for the quick and perfect script. I am new to awk, so I tried
running it the #!/bin/awk way: I saved it as a file and executed it.
I am using awk in POSIX and GNU style on a Linux system, and got this error:
awk: fatal: ` RS' is not a legal variable name
Do I need to change something to fit this environment?
Thanks a lot,
Zhou
| |
| Vassilis 2006-12-05, 6:56 pm |
You should put everything between the single quotes into the file, like this:
#!/bin/awk -f
{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/) {
            # An 8-digit ID starts a new group; print the previous one.
            if (group)
                print group, (reader != "" ? reader : condition)
            group = $i
            reader = condition = ""
        } else if ($i ~ /^reader/) {
            reader = $(i + 1)
        } else if ($i ~ /^condition/) {
            condition = $(i + 1)
        }
}
END {
    if (group) print group, (reader != "" ? reader : condition)
}
and call it like this: awk -v RS= -f script.awk file
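The leading space in ` RS' hints at a likely cause. On Linux, everything after the interpreter path on a #! line is typically passed to the interpreter as one single argument, so a first line such as "#!/bin/awk -v RS= -f" (an assumption, since the thread does not show the line that was tried) would make awk treat " RS= -f" as the -v assignment. One way to avoid options on that line altogether is to set RS in a BEGIN block inside the script. A minimal sketch, reusing the script.awk file name from above and assuming 8-digit group IDs as in the sample:

```shell
# Write the script. RS is set in BEGIN, so no -v option is needed.
cat > script.awk <<'EOF'
#!/bin/awk -f
BEGIN { RS = "" }    # paragraph mode, same effect as -v RS= here
{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/) {
            # An 8-digit ID starts a new group; print the previous one.
            if (group)
                print group, (reader != "" ? reader : condition)
            group = $i
            reader = condition = ""
        } else if ($i ~ /^reader/) {
            reader = $(i + 1)
        } else if ($i ~ /^condition/) {
            condition = $(i + 1)
        }
}
END {
    if (group) print group, (reader != "" ? reader : condition)
}
EOF
chmod +x script.awk
```

It can then be run as "awk -f script.awk file", or as "./script.awk file" if awk really is installed at /bin/awk on that system.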
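Also, you could try another approach:
# Line-based: uses the default RS and takes the ID from $1, so it relies
# on each keyword sitting on the same line as its group ID.
/reader/ {
    for (i = 1; i <= NF; i++)
        if ($i == "reader")
            print $1, $(i + 1)
    next    # reader wins; skip the condition rule for this line
}
/condition/ {
    for (i = 1; i <= NF; i++)
        if ($i == "condition")
            print $1, $(i + 1)
}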