Fixed length records containing 2 different records types with fixed field widths
| Rookie Card 2005-01-07, 3:55 am |
| This is a fun one. I am beginning to receive fixed length record ASCII
files but they contain 2 different record types.
- All records are 290 characters long but have different field widths.
- The field widths for each of the 2 record types are fixed.
- One record type always has an "A" in position 73.
- I need to parse the records into 2 new comma-separated text files.
I have bought 2 books and researched and still cannot get the syntax
even close.
I need to do something like this:
(Please forgive the sloppy syntax, I'm a rookie)
gawk {If ".{75}"= A}
{ print $1, $2, $3, $4, $5}" FIELDWIDTHS="75 45 50 75 45" OFS=,
RECTA.txt;
else
{ print $1, $2, $3, $4, $5}" FIELDWIDTHS="45 50 45 75 75" OFS=,
>RECTB.txt;
REC290.txt <<<< That's my ASCII file
In other words, I want to parse REC290.txt to RECTA.txt for (the
"A" in the 73 position) record type and parse the other record type
(Which has no distinguishing characteristics) to RECTB.txt
Any help would be very gratefully appreciated.
| |
| Kenny McCormack 2005-01-07, 3:55 am |
| In article <1104881851.636834.302920@z14g2000cwz.googlegroups.com>,
Rookie Card <mis_pro@yahoo.com> wrote:
>This is a fun one. I am beginning to receive fixed length record ASCII
>files but they contain 2 different record types.
>
>- All records are 290 in length but have different field widths.
>- The field widths for each of the 2 record types is fixed
>- One record type always begins with an "A" in the 73 position.
>- I need to parse the records to 2 new comma separated text files.
I would imagine something like:
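BEGIN {
    fw[0] = "45 50 45 75 75"    # layout for all other records
    fw[1] = "75 45 50 75 45"    # layout for records with "A" in position 73
    OFS = ","
}
# Pick the layout for this record; $0 = $0 makes gawk re-split it.
{ FIELDWIDTHS = fw[substr($0, 73, 1) == "A"]; $0 = $0 }
{ ... rest of program goes here ... }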
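
For readers who want to see the whole job end to end, here is a minimal sketch of one way to finish it. It is not the code from the post above: instead of gawk's FIELDWIDTHS it cuts the fields with substr(), so it should also run on the other awks mentioned in this thread. The shell function name split_records and the awk helper slice() are made up for illustration; the widths and the output file names RECTA.txt and RECTB.txt come from the original question.

```shell
# split_records FILE
# Routes each 290-character record to RECTA.txt ("A" in position 73)
# or RECTB.txt (everything else), writing comma-separated fields.
# split_records and slice() are hypothetical names used for this sketch.
split_records() {
    awk '
    BEGIN {
        OFS = ","
        na = split("75 45 50 75 45", wa, " ")   # widths for type A records
        nb = split("45 50 45 75 75", wb, " ")   # widths for the other type
    }
    # Cut $0 into n fields of widths w[1..n] and join them with OFS.
    function slice(w, n,    i, pos, out) {
        pos = 1
        for (i = 1; i <= n; i++) {
            out = out (i > 1 ? OFS : "") substr($0, pos, w[i])
            pos += w[i]
        }
        return out
    }
    substr($0, 73, 1) == "A" { print slice(wa, na) > "RECTA.txt"; next }
                             { print slice(wb, nb) > "RECTB.txt" }
    ' "$1"
}
```

Called as `split_records REC290.txt`, this writes both output files into the current directory. Padding spaces inside each field are kept as they are, and fields that themselves contain commas are not quoted; both would need handling if the data requires it.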
| |
| Rookie Card 2005-01-10, 3:56 pm |
| William -
The files don't have leading spaces, but they do have spaces in the
middle, like this:
200409113 834736A90028
CLIENTID 00VNI112B92658
CLIE ID000VNI118S98271
200411 34129983A93065
The number of spaces is never consistent, as they pad a fixed-length field
and the data is not always the same length. The position of the
record identifier (A) is always the same, though, if you count the
spaces. (Fixed-length records with fixed field widths.)
I also tried awk '/A/' REC18.txt
and the output contained all records.
| |
| Rookie Card 2005-01-10, 3:56 pm |
| William,
What version of gawk are you using? I am using GNU gawk 3.1.3.
We used the same script and the same data.
It filtered the records perfectly for you and not for me.
Then the only difference would be the executable.
Right? I could be mistaken. I do that a lot.
Well, there is always the X factor. My problem may be between the chair
and the keyboard.
If you could tell me what version of gawk you are using, that would be very
helpful.
Gary
| |
| William James 2005-01-10, 3:56 pm |
|
Rookie Card wrote:
> William,
> What version of gawk are you using? I am using GNU gawk 3.1.3.
> We used the same script and the same data.
> It filtered the records perfectly for you and not for me.
> Then the only difference would be the executable.
> Right? I could be mistaken. I do that a lot.
>
> Well, there is always the X factor. My problem may be between the chair
> and the keyboard.
> If you could tell me what version of gawk you are using, that would be very
> helpful.
> Gary
GNU Awk 3.0.3 and Kernighan's awk and mawk.
I've converted spaces to commas and saved this in file "data":
200409113,,834736A90028
CLIENTID,00VNI112B92658
CLIE,,ID000VNI118S98271
200411,,,34129983A93065
This command line
awk "\"A\"==substr($0,18,1)" data
produces this output, using any of those awks:
200409113,,834736A90028
200411,,,34129983A93065
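
The backslash-escaped quotes in that command line depend on the shell being used. As a hedged sketch of a way around shell quoting altogether, the same test can live in a program file and be run with awk -f. The file name filter18.awk is made up for this example; the sample data and the names REC18.txt and RECA.txt are the ones used elsewhere in this thread.

```shell
# Sample data copied from the thread (no embedded spaces or commas).
printf '%s\n' \
    20040911324834736A90028 \
    CLIENTID000VNI112B92658 \
    CLIENTID000VNI118S98271 \
    20041112534129983A93065 > REC18.txt

# filter18.awk is a hypothetical file name. Because the program is read
# from a file, the shell never touches the quotes around "A".
cat > filter18.awk <<'EOF'
# A pattern with no action prints every record it matches.
substr($0, 18, 1) == "A"
EOF

awk -f filter18.awk REC18.txt > RECA.txt
```

In a POSIX shell the one-line form `awk 'substr($0,18,1) == "A"' REC18.txt` should behave the same way, since single quotes protect both the double quotes and $0 from the shell.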
| |
| William James 2005-01-10, 3:56 pm |
| Ed Morton wrote:
>Rookie Card wrote:
>
>
>
>Given your other postings where you didn't use the above syntax, I doubt
>if gawk was really the problem. Try it again using exactly the syntax above.
Rookie, it would be incredible if gawk was the problem. The older version
worked for me and the newer version almost certainly will.
| |
| Rookie Card 2005-01-10, 3:56 pm |
| Ed -
awk '{ print substr($0,18,1) }' REC18.txt
A
B
S
A
Then I try
awk '{ print "A"==substr($0,18,1) }' REC18.txt
0
0
0
0
This is with the example data: REC18.txt
20040911324834736A90028
CLIENTID000VNI112B92658
CLIENTID000VNI118S98271
20041112534129983A93065
Thanks
Gary / Rookie Card
| |
| Rookie Card 2005-01-12, 8:55 am |
| Thanks Kenny,
That points me in a different direction. I'm still fighting syntax errors but
slowly making progress.
I will post the final code when I get it figured out. If I get stuck,
which I probably will, I'll post the code and detail where it is
choking.
| |
| Rookie Card 2005-01-12, 8:55 am |
| I need to back up a bit. My client keeps changing their requirements (6
new files with 14 record types).
Let's forget about splitting the 2 record types into two files for
now.
I've got the parsing and field splitting figured out. (Thanks to John,
Janis and Jim's posts.)
I just can't seem to figure out how to filter records based on the
character at a given position in the record using awk.
Here's the example:
The only output I want is the records with an "A" in position 18.
The input file would look like this (let's call it REC18.txt):
20040911324834736A90028
CLIENTID000VNI112B92658
CLIENTID000VNI118S98271
20041112534129983A93065
The output I want would look like this (let's call it RECA.txt):
20040911324834736A90028
20041112534129983A93065
I tried:
gawk 'BEGIN { print substr($0,18) == "A"}' REC18.txt >RECA.txt
Output was:
1
And tried:
gawk ' { print substr(18,1) =="A"}' REC18.txt >RECA.txt
Output was:
0
0
0
0
0
gawk 'BEGIN { print substr($0,18,1) == "A"}' REC18.txt >RECA.txt
Output was:
1
among many other tries.
I RTFM, debugged for hours and searched newsgroups. It's got to be there.
Right?
I thank you in advance for your help.
Gary / Rookie Card
| |
| William James 2005-01-12, 8:55 am |
| >The position of the record identifier (A) is always the same, though,
>if you count the spaces.
In
200409113 834736A90028
A is the 17th character and is the 7th character in the
space-delimited field.
In
200411 34129983A93065
A is the 18th character and is the 9th character in the field.
Try this:
awk '$2 ~ /[0-9]A[0-9]/' REC18.txt
| |
| William James 2005-01-12, 8:55 am |
| Don't feel too bad, Rookie Card. We all make embarrassing mistakes.
>I ran:
>gawk "{ print "A"=substr($0,18,$0)}" REC18.txt
In the future, don't retype the posted code; copy and paste it.
That way you won't make any typing errors.
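
To illustrate how much a single pair of quotes matters here, the sketch below compares the quoted string "A" with a bare A, which awk treats as an uninitialized variable. The 0 results reported earlier in this thread are consistent with the quotes having been lost somewhere, but that is only a guess; the thread does not confirm it. The shell variable names are made up for the example.

```shell
line=20040911324834736A90028

# Quoted: "A" is a string constant, so the 18th character matches.
quoted=$(printf '%s\n' "$line" | awk '{ print substr($0, 18, 1) == "A" }')

# Unquoted: A is an uninitialized variable (an empty string), so the
# comparison with the 18th character fails.
unquoted=$(printf '%s\n' "$line" | awk '{ print substr($0, 18, 1) == A }')

echo "quoted=$quoted unquoted=$unquoted"
# prints: quoted=1 unquoted=0
```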
| |
| Rookie Card 2005-01-12, 8:55 am |
| William
Kernighan's awk worked perfectly. Your code and Ed's code all worked
fine in Kernighan's awk.
gawk "{ print "\"A\"==substr($0,18,1)}" REC18.txt
Output was:
20040911324834736A90028
20041112534129983A93065
That's what I've been looking for!
Besides the problem between the chair and the keyboard, I also had big
problems with GNU gawk version 3.1.3 for win32.
I want to thank you, Ed and Kenny. I know I have been a bit annoying.
- Also, can anyone suggest a good book on the awk language, as I will be
using it a lot this year?
Gary / Rookie Card / Annoying Lame Newbie