Code Comments
Hi,
I have a DOS file in this format:
Ace Construction Co., Inc.
Fax: 123-555-1212
101 Main St., Ste 23., Jensen Beach FL 34958
321-555-6523
< second record>
< third record>
....
I would like to get it into the following format:
Ace Construction Co., Inc.|Fax: 123-555-1212|101 Main St., Ste 23.,
|Jensen Beach |FL|34958|321-555-6523
Is there any way that I can do it in awk? I am using the following
program, but don't know how to split the address into <street
address>|city|state|zip. I am not a programmer and any help would be
appreciated. Thank you.
BEGIN { OFS = "|" }
{
if (NR % 4)
{gsub("^M", OFS) <removing dos line breaks>
printf("%s", $0)
}
else
printf("%s\n", $0)
}
END { }.
mbalu wrote:
> Hi,
>
> I have a DOS file in this format:
> Ace Construction Co., Inc.
> Fax: 123-555-1212
> 101 Main St., Ste 23., Jensen Beach FL 34958
> 321-555-6523
> < second record>
> < third record>
> ....
>
> I would like to get it into the following format:
> Ace Construction Co., Inc.|Fax: 123-555-1212|101 Main St., Ste 23.,
> |Jensen Beach |FL|34958|321-555-6523
* BEGIN { FS = ", +" }
* (NR-1)%4<2 { rec = rec $0 "|" ; next }
* 3==NR%4 {
* sub( /^ +/, "" )
* for (i=1; i<NF; i++)
* rec = rec $i ", "
* sub( / $/, "", rec )
* rec = rec "|"
* $0 = $NF
* match( $0, / [A-Z][A-Z] / )
* rec = rec substr($0,1,RSTART-1) "|"
* rec = rec substr($0,RSTART+1,2) "|"
* rec = rec substr($0,RSTART+4) "|"
* next
* }
* { print rec $0 ; rec = "" }
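The match()/substr() arithmetic in the address branch is easier to follow in isolation. Here is a minimal sketch, assuming the state is a two-letter uppercase code with a single space on each side; the shell function name split_city_state_zip is made up for illustration and is not part of the script above.

```sh
split_city_state_zip() {
  awk '{
      # Look for " XX " (space, two capitals, space).
      # On success RSTART is the position of the leading space.
      if (match($0, / [A-Z][A-Z] /)) {
          city  = substr($0, 1, RSTART - 1)   # everything before the leading space
          state = substr($0, RSTART + 1, 2)   # the two capitals
          zip   = substr($0, RSTART + 4)      # everything after the trailing space
          print city "|" state "|" zip
      } else {
          print $0                            # no state code found: leave the line alone
      }
  }'
}

echo 'Jensen Beach FL 34958' | split_city_state_zip
# prints: Jensen Beach|FL|34958
```

A city name that itself contains a two-letter capitalized word would match first, so real data may need a stricter pattern.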
mbalu wrote:
> Hi,
>
> I have a DOS file in this format:
> Ace Construction Co., Inc.
> Fax: 123-555-1212
> 101 Main St., Ste 23., Jensen Beach FL 34958
> 321-555-6523
> < second record>
> < third record>
> ....
>
> I would like to get it into the following format:
> Ace Construction Co., Inc.|Fax: 123-555-1212|101 Main St., Ste 23.,
> |Jensen Beach |FL|34958|321-555-6523
<snip>
This should do it in gawk for the sample you showed:
gawk '{gsub(/\r/,"")}
++num==3{$0=gensub(/^[[:blank:]]*(.*,) (.*) (.*) ([[:digit:]]*)$/,
"\\1|\\2|\\3|\\4",1) }
{ line = line sep $0; sep="|" }
!(NR%4){print line; num=0; line=""; sep=""}'
Playing with the gensub patterns should fix any problems in parsing your
real input.
Regards,
Ed.
Using standard Awk:
* 3==NR%4 { sub(/.*,/, "&|")
* $(NF-1) = "|" $(NF-1) "|" }
* 0==NR%4 {gsub(/[, ]*\| */, "|", rec)
* print rec $0; rec=""; next}
* {rec = rec $0 "|"}
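Since the input is a DOS file, each line still ends in a carriage return when awk sees it on a Unix system. Below is a rough sketch of the same standard-awk approach with the carriage return stripped first. The wrapper name join_records and the printf test data are assumptions for trying it out, not anything from the original poster's setup.

```sh
join_records() {
  awk '
    { sub(/\r$/, "") }                   # drop the trailing carriage return
    NR % 4 == 3 {                        # third line of each record: the address
        sub(/.*,/, "&|")                 # mark the end of the street part
        $(NF-1) = "|" $(NF-1) "|"        # wrap the state code in separators
    }
    NR % 4 == 0 {                        # fourth line: tidy up and print the record
        gsub(/[, ]*\| */, "|", rec)      # remove commas and spaces around each separator
        print rec $0; rec = ""; next
    }
    { rec = rec $0 "|" }                 # lines 1 to 3: append to the pending record
  '
}

# One sample record with CRLF line endings.
printf '%s\r\n' \
  'Ace Construction Co., Inc.' \
  'Fax: 123-555-1212' \
  '101 Main St., Ste 23., Jensen Beach FL 34958' \
  '321-555-6523' | join_records
# prints: Ace Construction Co., Inc.|Fax: 123-555-1212|101 Main St., Ste 23.|Jensen Beach|FL|34958|321-555-6523
```

Note that the gsub also removes the comma after "Ste 23.", which differs slightly from the format requested in the question.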