Collecting headers from many files
Zain

2006-07-10, 6:56 pm

Hi guys, I'm new to awk so please forgive me if I ask silly questions.
Currently I'm working on a project involving about 20 files in a
directory. The files contain data of the form:

*/ Version No: 1.1.1 */
*/ Version Date: 01012006 */
dealid dealdate dealamount ..... about 30 headers separated by spaces
001, 02022006, 1000, ..... record for each header separated by commas
002, 06022006, 2500, ......
003, 12032006, 8500, ......
usually between 7-12 records per file

*/ Version No: 2.0.1 */
*/ Version Date: 21012006 */
dealcode dealdate dealamount .....
aaa, 02022006, 1050, .....
aa, 16022006, 5100, ......
bbb, 22052006, 5750, ......

The 3rd line of every file contains the headers. I need to collect
every possible header from all 20 files into one long line in a new
file and match the records to them. Most of the headers are repeated
across files, but a few appear only in certain files. So the above would
become:

VersionNo VersionDate dealid dealcode dealdate dealamount
1.1.1, 01012006, 001, , 02022006, 1000 ...
1.1.1, 01012006, 002, , 06022006, 2500 ...
1.1.1, 01012006, 003, , 12032006, 8500 ...
2.0.1, 21012006, , aaa, 02022006, 1050 ...
2.0.1, 21012006, , aa, 16022006, 5100 ...
2.0.1, 21012006, , bbb, 22052006, 5750 ...

How do I go about doing this?
Your help is much appreciated!

Xicheng Jia

2006-07-11, 3:56 am


Zain wrote:
> Hi guys, I'm new to awk so please forgive me if I ask silly questions.
> Currently I'm working on a project involving about 20 files in a
> directory. The files contain data of the form:
>
> */ Version No: 1.1.1 */
> */ Version Date: 01012006 */
> dealid dealdate dealamount ..... about 30 headers separated by spaces
> 001, 02022006, 1000, ..... record for each header separated by commas
> 002, 06022006, 2500, ......
> 003, 12032006, 8500, ......
> usually between 7-12 records per file
>
> */ Version No: 2.0.1 */
> */ Version Date: 21012006 */
> dealcode dealdate dealamount .....
> aaa, 02022006, 1050, .....
> aa, 16022006, 5100, ......
> bbb, 22052006, 5750, ......
>
> The 3rd line of every file contains the headers. I need to collect
> every possible header from all 20 files into one long line in a new
> file and match the records to them. Most of the headers are repeated
> across files, but a few appear only in certain files. So the above would
> become:
>
> VersionNo VersionDate dealid dealcode dealdate dealamount
> 1.1.1, 01012006, 001, , 02022006, 1000 ...
> 1.1.1, 01012006, 002, , 06022006, 2500 ...
> 1.1.1, 01012006, 003, , 12032006, 8500 ...
> 2.0.1, 21012006, , aaa, 02022006, 1050 ...
> 2.0.1, 21012006, , aa, 16022006, 5100 ...
> 2.0.1, 21012006, , bbb, 22052006, 5750 ...
>
> How do I go about doing this?
> Your help is much appreciated!


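awk '
BEGIN {
FS=" *[:,*] *";   # fields end at ":", "," or "*" plus nearby spaces
OFS=", ";
printf "VersionNo VersionDate dealid dealcode dealdate dealamount\n";
}
/Version No/ { head = $3; flag = 0; next; };   # new file: start the prefix
/Version Date/ { head = head", "$3; next; };
/^dealid/ { flag = 1; next; }                  # this file has dealid, not dealcode
NF > 3 {                                       # a record line
$1 = flag ? $1", " : ", "$1;                   # leave the other id column empty
print head, $0;
}' file*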

Xicheng
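Xicheng's script hard-codes the two header layouts from the sample (the dealid/dealcode switch and the printed header line). For the more general task in the question, here is a rough sketch that reads every file twice: the first pass collects the union of the line-3 headers in order of first appearance, and the second pass prints each record's values under those names, leaving missing columns empty. The function name merge_deals is hypothetical; it assumes exactly the file layout shown in the question and a POSIX awk (gawk, nawk, or /usr/xpg4/bin/awk), since it uses functions and sub(), which old awk lacks.

```shell
# merge_deals: hypothetical helper, a sketch only.
# Assumes each file has two "*/ Version ... */" lines, a space-separated
# header line on line 3, then comma-separated records. Needs a POSIX awk.
merge_deals() {
    awk '
    function value(s) {               # "*/ Version No: 1.1.1 */" -> "1.1.1"
        sub(/^[^:]*: */, "", s)
        sub(/ *\*\/ *$/, "", s)
        return s
    }
    pass == 1 {                       # first pass: build the header union
        if (FNR == 3)
            for (i = 1; i <= NF; i++)
                if (!($i in seen)) { seen[$i] = 1; cols[++ncols] = $i }
        next
    }
    !printed {                        # second pass starts: print the header once
        line = "VersionNo VersionDate"
        for (i = 1; i <= ncols; i++) line = line " " cols[i]
        print line
        printed = 1
    }
    /Version No/   { vno = value($0); next }
    /Version Date/ { vdate = value($0); next }
    FNR == 3 {                        # this file header line: column i -> name
        for (i = 1; i <= NF; i++) hdr[i] = $i
        nh = NF
        next
    }
    NF > 0 {                          # a record: place each value under its name
        for (i = 1; i <= ncols; i++) val[cols[i]] = ""
        rec = $0
        gsub(/^ +| +$/, "", rec)
        n = split(rec, f, " *, *")
        for (i = 1; i <= n && i <= nh; i++) val[hdr[i]] = f[i]
        line = vno ", " vdate
        for (i = 1; i <= ncols; i++) line = line ", " val[cols[i]]
        print line
    }' pass=1 "$@" pass=2 "$@"
}
```

It would be run as `merge_deals /path/to/file* > merged.txt`. Because columns come out in first-seen order, dealcode lands after dealamount rather than next to dealid as in the example output; if the order matters, a fixed column list could be set up in a BEGIN block instead.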

Zain

2006-07-11, 6:56 pm

I'm also trying to add another column to the left of the table which
contains the deal name. The deal name can be found in the filename:

/home/prett/data/current/dealname.file

I've tried using the FILENAME variable in awk but it gives the whole
path; I just need the "dealname" part, with the path and the
extension cut off. I've been trying different methods all morning and
tried a lot of the suggestions in old posts here, but none of them
work. I even tried assigning the value of FILENAME to a different
variable and working with that, but to no avail.

Ed Morton

2006-07-11, 6:56 pm

Zain wrote:
> I'm also trying to add another column to the left of the table which
> contains the deal name.


Always provide context when you post. This is netnews, not a web forum.
Assume your current posting has to stand alone. In this case, I assume
you have some previous post that the above sentence relates to, since you
used the word "also", but I don't plan to go digging through the
archives to find it and so can't help you with whatever that part of
your posting relates to. See http://en.wikipedia.org/wiki/Netiquette

> The deal name can be found in the filename:
>
> /home/prett/data/current/dealname.file
>
> I've tried using the FILENAME variable in awk but it gives the whole
> path; I just need the "dealname" part, with the path and the
> extension cut off. I've been trying different methods all morning and
> tried a lot of the suggestions in old posts here, but none of them
> work. I even tried assigning the value of FILENAME to a different
> variable and working with that, but to no avail.
>


OK, so what I gather is you have a script running on a file
"/home/prett/data/current/dealname.file" and just want to print
"dealname". That's just a couple of sub()s to remove everything up to
the last "/" and everything after the first ".":

$ awk '{f=FILENAME;sub(/.*\//,"",f);sub(/\..*$/,"",f);print f}' /home/prett/data/current/dealname.file
dealname
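Since the goal is a new left-hand column holding the deal name, a minimal sketch of that step runs the same two sub() calls once per file (when FNR is 1) and prefixes every line of that file with the result. The function name prefix_dealname is hypothetical, and like the one-liner above it needs an awk with sub(), such as gawk, nawk, or /usr/xpg4/bin/awk.

```shell
# prefix_dealname: hypothetical helper name. Prints every input line
# with the file basename (up to its first dot) as a leading column.
# Needs sub(), so use gawk, nawk or /usr/xpg4/bin/awk, not old awk.
prefix_dealname() {
    awk '
    FNR == 1 {                 # new input file: derive its deal name once
        d = FILENAME
        sub(/.*\//, "", d)     # drop everything up to the last slash
        sub(/\..*$/, "", d)    # drop everything from the first dot on
    }
    { print d ", " $0 }' "$@"
}
```

For example, `prefix_dealname /home/prett/data/current/*.file` would print each line of dealname.file as `dealname, ` followed by the original line.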

Regards,

Ed

Zain

2006-07-11, 6:56 pm




> OK, so what I gather is you have a script running on a file
> "/home/prett/data/current/dealname.file" and just want to print
> "dealname". That's just a couple of sub()s to remove everything up to
> the last "/" and everything after the first ".":


Yes, that's exactly what I want to do.

> $ awk '{f=FILENAME;sub(/.*\//,"",f);sub(/\..*$/,"",f);print f}' /home/prett/data/current/dealname.file
> dealname
>
> Regards,
>
> Ed


I tried that Ed, for some reason it doesn't like the sub functions. I
keep getting syntax error and illegal statement messages.

Xicheng Jia

2006-07-11, 6:56 pm

Zain wrote:
> I'm also trying to add another column to the left of the table which
> contains the deal name. The deal name can be found in the filename:
>
> /home/prett/data/current/dealname.file
>
> I've tried using the FILENAME variable in awk but it gives the whole
> path; I just need the "dealname" part, with the path and the
> extension cut off. I've been trying different methods all morning and
> tried a lot of the suggestions in old posts here, but none of them
> work. I even tried assigning the value of FILENAME to a different
> variable and working with that, but to no avail.


awk '
BEGIN {
FS=" *[:,*] *"
OFS=", "
printf "DealName VersionNo VersionDate ..... dealdate dealamount\n"
}
/Version No/ {
n = split(FILENAME,f,"[/.]")   # split the path on "/" and "."
head = f[n-1]", "$3            # f[n-1] is "dealname" in dealname.file
flag = 0; next
}
/Version Date/ { head = head", "$3; next }
/^dealid/ { flag = 1; next }
NF > 3 {
$1 = flag ? $1", " : ", "$1
print head,$0
}' /path/to/file*

(BTW, the 'NF > 3' test that picks out the record lines is based on your sample data.)
Xicheng

Ed Morton

2006-07-11, 6:56 pm

Zain wrote:
>
>
>
>
> Yes, that's exactly what I want to do.
>
>
>
>
> I tried that Ed, for some reason it doesn't like the sub functions. I
> keep getting syntax error and illegal statement messages.
>


Maybe you're using old, broken awk on Solaris, but I'd have expected
even that to work with the above. Use gawk, nawk, or /usr/xpg4/bin/awk
if you're on Solaris. If that's not it, make sure you did a copy/paste
of my text rather than retyping it, and post exactly what you executed
and the error messages. You could try removing one statement at a time
to see if you can narrow down where the error message is coming from.

Ed.

Chris F.A. Johnson

2006-07-11, 6:56 pm

On 2006-07-11, Ed Morton wrote:
> Zain wrote:
>
> Maybe you're using old, broken awk on Solaris,


There is nothing broken about that version of awk; it's the
original awk, and it was replaced by the end of the 1980s. You
could, perhaps, call Solaris broken for supplying such an old
utility as the default.

> but I'd have expected even that to work with the above.


No. Old awk does not have the sub or gsub functions.
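For an awk without sub(), one workaround is to strip the path and extension in the shell and hand the result to awk as a name=value operand, so awk itself needs no string functions. This is a sketch that assumes a POSIX shell (ksh, bash, or /usr/xpg4/bin/sh on Solaris) for the ${...##...} and ${...%%...} expansions and an awk that accepts name=value operands, as POSIX awk does; the function name prefix_dealname_old is hypothetical.

```shell
# prefix_dealname_old: hypothetical helper name. Does the path trimming
# in the shell, so awk only has to print a variable passed as name=value.
prefix_dealname_old() {
    for f in "$@"; do
        name=${f##*/}          # drop everything up to the last slash
        name=${name%%.*}       # drop everything from the first dot on
        awk '{ print name ", " $0 }' name="$name" "$f"
    done
}
```

Running one awk per file costs a little more than a single awk over all files, but it keeps the awk program down to a plain print.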

> Use gawk, nawk, or /usr/xpg4/bin/awk if you're on Solaris. If that's
> not it, make sure you did a copy/paste of my text rather than
> retyping it, and post exactly what you executed and the error
> messages. You could try removing one statement at a time to see if
> you can narrow down where the error message is coming from.


--
Chris F.A. Johnson, author <http://cfaj.freeshell.org>
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence

Ed Morton

2006-07-11, 6:56 pm

Chris F.A. Johnson wrote:
> On 2006-07-11, Ed Morton wrote:
>
>
>
> There is nothing broken about that version of awk;


One small example:

$ gawk 'BEGIN{print sprintf("The magic number is"), 3; exit}'
The magic number is 3
$ nawk 'BEGIN{print sprintf("The magic number is"), 3; exit}'
The magic number is 3
$ /usr/xpg4/bin/awk 'BEGIN{print sprintf("The magic number is"), 3; exit}'
The magic number is 3
$ awk 'BEGIN{print sprintf("The magic number is"), 3; exit}'
The magic number is

You could, I suppose, make a case for something not being technically
"broken" if that was the designers' intent, but "broken" is a more
convenient way of expressing the condition as it relates to its
intended user than "functioning as designed but not as expected,
not as any of its peers, and not in a useful fashion", so I'll stick
with "broken".

Ed.

Zain

2006-07-12, 3:56 am

> Maybe you're using old, broken awk on Solaris, but I'd have expected
> even that to work with the above. Use gawk, nawk, or /usr/xpg4/bin/awk
> if you're on Solaris. If that's not it, make sure you did a copy/paste
> of my text rather than retyping it, and post exactly what you executed
> and the error messages. You could try removing one statement at a time
> to see if you can narrow down where the error message is coming from.
>
> Ed.


Sorry guys, I should have said that I'm only using old awk. Fortunately
I found out that there was a file containing a list of all deal names.
This made things a lot easier and more convenient for later tasks. But
thanks to everyone for their advice, and thank you to Xicheng for the
code on collating the headers and records.


Copyright 2008 codecomments.com