| Author |
Collecting headers from many files
|
|
|
| Hi guys, I'm new to awk so please forgive me if I ask silly questions.
Currently I'm working on a project involving about 20 files in a
directory. The files contain data of the form:
*/ Version No: 1.1.1 */
*/ Version Date: 01012006 */
dealid dealdate dealamount ..... about 30 headers separated by spaces
001, 02022006, 1000, ..... record for each header separated by commas
002, 06022006, 2500, ......
003, 12032006, 8500, ......
usually between 7-12 records per file
*/ Version No: 2.0.1 */
*/ Version Date: 21012006 */
dealcode dealdate dealamount .....
aaa, 02022006, 1050, .....
aa, 16022006, 5100, ......
bbb, 22052006, 5750, ......
The 3rd line of every file contains the headers. I need to collect
every possible header from all 20 files and make one long line in a new
file and match the records with them. Most of the headers are repeated
in files but a few appear only in certain files. So the above would
become:
VersionNo VersionDate dealid dealcode dealdate dealamount
1.1.1, 01012006, 001, , 02022006, 1000 ...
1.1.1, 01012006, 002, , 06022006, 2500 ...
1.1.1, 01012006, 003, , 12032006, 8500 ...
2.0.1, 21012006, , aaa, 02022006, 1050 ...
2.0.1, 21012006, , aa, 16022006, 5100 ...
2.0.1, 21012006, , bbb, 22052006, 5750 ...
How do I go about doing this?
Your help is much appreciated!
| |
| Xicheng Jia 2006-07-11, 3:56 am |
|
Zain wrote:
> Hi guys, I'm new to awk so please forgive me if I ask silly questions.
> Currently I'm working on a project involving about 20 files in a
> directory. The files contain data of the form:
>
> */ Version No: 1.1.1 */
> */ Version Date: 01012006 */
> dealid dealdate dealamount ..... about 30 headers separated by spaces
> 001, 02022006, 1000, ..... record for each header separated by commas
> 002, 06022006, 2500, ......
> 003, 12032006, 8500, ......
> usually between 7-12 records per file
>
> */ Version No: 2.0.1 */
> */ Version Date: 21012006 */
> dealcode dealdate dealamount .....
> aaa, 02022006, 1050, .....
> aa, 16022006, 5100, ......
> bbb, 22052006, 5750, ......
>
> The 3rd line of every file contains the headers. I need to collect
> every possible header from all 20 files and make one long line in a new
> file and match the records with them. Most of the headers are repeated
> in files but a few appear only in certain files. So the above would
> become:
>
> VersionNo VersionDate dealid dealcode dealdate dealamount
> 1.1.1, 01012006, 001, , 02022006, 1000 ...
> 1.1.1, 01012006, 002, , 06022006, 2500 ...
> 1.1.1, 01012006, 003, , 12032006, 8500 ...
> 2.0.1, 21012006, , aaa, 02022006, 1050 ...
> 2.0.1, 21012006, , aa, 16022006, 5100 ...
> 2.0.1, 21012006, , bbb, 22052006, 5750 ...
>
> How do I go about doing this?
> Your help is much appreciated!
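awk '
BEGIN {
    FS = " *[:,*] *"
    OFS = ", "
    print "VersionNo VersionDate dealid dealcode dealdate dealamount"
}
# Version lines: field 3 holds the value; start a new head for this file.
/Version No/   { head = $3; flag = 0; next }
/Version Date/ { head = head ", " $3; next }
# A header line starting with "dealid" means the first column is dealid.
/^dealid/      { flag = 1; next }
NF > 3 {
    # Pad the missing column: dealcode is empty if flag is set, else dealid.
    $1 = flag ? $1 ", " : ", " $1
    print head, $0
}' file*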
Xicheng
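The script above hardcodes the two header layouts from the sample. As a rough sketch of the more general task in the original question (collect every header from line 3 of each file, then line each record up under the combined header), the files can be read twice. The function name merge_deals and the pass variable are made up for illustration. The sketch assumes every file has exactly the layout shown in the question: two version lines, one space-separated header line, then comma-separated records with one value per header. It also assumes a POSIX awk (gawk, nawk or similar).

```shell
# merge_deals FILE...  (hypothetical helper name, not from the thread)
# Reads every file twice: pass 1 collects headers, pass 2 prints records.
merge_deals() {
    awk '
    # Pass 1: line 3 of each file holds the header names.
    # Remember each new name once, in first-seen order.
    pass == 1 {
        if (FNR == 3)
            for (i = 1; i <= NF; i++)
                if (!($i in seen)) { seen[$i] = 1; order[++ncols] = $i }
        next
    }
    # Pass 2, line 1: "*/ Version No: 1.1.1 */" so the value is field 4.
    FNR == 1 {
        if (!printed++) {
            line = "VersionNo VersionDate"
            for (j = 1; j <= ncols; j++) line = line " " order[j]
            print line
        }
        version = $4
        next
    }
    # Pass 2, line 2: "*/ Version Date: 01012006 */".
    FNR == 2 { vdate = $4; next }
    # Pass 2, line 3: remember which header each position has in this file.
    FNR == 3 { nhdr = split($0, col, " "); next }
    # Pass 2, records: place each value under its header, blanks elsewhere.
    NF {
        rec = $0
        sub(/^[ \t]+/, "", rec); sub(/[ \t]+$/, "", rec)
        n = split(rec, val, /[ \t]*,[ \t]*/)
        split("", row)                       # clear the previous record
        for (i = 1; i <= n && i <= nhdr; i++) row[col[i]] = val[i]
        out = version ", " vdate
        for (j = 1; j <= ncols; j++) out = out ", " row[order[j]]
        print out
    }' pass=1 "$@" pass=2 "$@"
}
```

It could be called as `merge_deals /path/to/file* > combined.txt`. Columns come out in the order the headers are first seen, so for the two sample files dealcode would land after dealamount rather than next to dealid as in the question's example.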
| |
|
| I'm also trying to add another column to the left of the table which
contains the deal name. The deal name can be found in the filename:
/home/prett/data/current/dealname.file
I've tried using the FILENAME variable in awk, but it gives the whole
path. I just need the "dealname" part, with the path and extension
stripped. I've been trying different methods all morning and tried a
lot of the suggestions in the old posts here, but none of them
work. I even tried assigning the value of FILENAME to a different
variable and working with that, but to no avail.
| |
| Ed Morton 2006-07-11, 6:56 pm |
| Zain wrote:
> I'm also trying to add another column to the left of the table which
> contains the deal name.
Always provide context when you post. This is netnews, not a web forum.
Assume your current posting has to stand alone. In this case, I assume
you have some previous post that the above sentence relates to, since you
used the word "also", but I don't plan to go digging through the
archives to find it and so can't help you with whatever that part of
your posting relates to. See http://en.wikipedia.org/wiki/Netiquette
> The deal name can be found in the filename:
>
> /home/prett/data/current/dealname.file
>
> I've tried using the FILENAME variable in awk, but it gives the whole
> path. I just need the "dealname" part, with the path and extension
> stripped. I've been trying different methods all morning and tried a
> lot of the suggestions in the old posts here, but none of them
> work. I even tried assigning the value of FILENAME to a different
> variable and working with that, but to no avail.
>
OK, so what I gather is you have a script running on a file
"/home/prett/data/current/dealname.file" and just want to print
"dealname". That's just a couple of sub()s to remove everything up to
the last "/" and everything from the first "." onward:
$ awk '{f=FILENAME;sub(/.*\//,"",f);sub(/\..*$/,"",f);print f}' \
    /home/prett/data/current/dealname.file
dealname
Regards,
Ed
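The one-liner above redoes both substitutions for every input line. As a small sketch of the same idea applied to the stated goal (a deal-name column on the left), the name can be worked out once per file and then prefixed to each line. The function name with_deal_name is hypothetical, and the sketch assumes an awk that has sub() and FNR (gawk, nawk, /usr/xpg4/bin/awk).

```shell
# with_deal_name FILE...  (hypothetical helper name, not from the thread)
# Prefixes every input line with the deal name taken from its file name.
with_deal_name() {
    awk '
    # First line of each file: derive the deal name from FILENAME.
    FNR == 1 {
        name = FILENAME
        sub(/.*\//, "", name)     # drop everything up to the last "/"
        sub(/\..*$/, "", name)    # drop the first "." and all that follows
    }
    { print name ", " $0 }' "$@"
}
```

For example, `with_deal_name /home/prett/data/current/dealname.file` would start each output line with `dealname, `.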
| |
|
|
> OK, so what I gather is you have a script running on a file
> "/home/prett/data/current/dealname.file" and just want to print
> "dealname". That's just a couple of sub()s to remove everything up to
> the last "/" and everything after the first ".":
Yes, that's exactly what I want to do.
> $ awk '{f=FILENAME;sub(/.*\//,"",f);sub(/\..*$/,"",f);print f}'
> /home/prett/data/current/dealname.file
> dealname
>
> Regards,
>
> Ed
I tried that, Ed, but for some reason it doesn't like the sub functions. I
keep getting "syntax error" and "illegal statement" messages.
| |
| Xicheng Jia 2006-07-11, 6:56 pm |
| Zain wrote:
> I'm also trying to add another column to the left of the table which
> contains the deal name. The deal name can be found in the filename:
>
> /home/prett/data/current/dealname.file
>
> I've tried using the FILENAME variable in awk, but it gives the whole
> path. I just need the "dealname" part, with the path and extension
> stripped. I've been trying different methods all morning and tried a
> lot of the suggestions in the old posts here, but none of them
> work. I even tried assigning the value of FILENAME to a different
> variable and working with that, but to no avail.
awk '
BEGIN {
    FS = " *[:,*] *"
    OFS = ", "
    printf "VersionNo VersionDate ..... dealdate dealamount\n"
}
/Version No/ {
    # Split the path on "/" and "."; the piece before the extension is the deal name.
    n = split(FILENAME, f, "[/.]")
    head = f[n-1] ", " $3
    flag = 0; next
}
/Version Date/ { head = head ", " $3; next }
/^dealid/      { flag = 1; next }
NF > 3 {
    $1 = flag ? $1 ", " : ", " $1
    print head, $0
}' /path/to/file*
(BTW. 'NF > 3' is for your sample data)
Xicheng
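To make the split() step in the script above easier to follow, here is a tiny sketch that applies the same expression to a path given as an argument. The function name name_by_split is made up for illustration, and -v assumes a POSIX awk. Note that f[n-1] is simply the second-to-last piece, so it gives the deal name only when the file name contains exactly one ".".

```shell
# name_by_split PATH  (hypothetical helper name, not from the thread)
# Prints the second-to-last piece of PATH after splitting on "/" and ".".
name_by_split() {
    awk -v path="$1" 'BEGIN {
        n = split(path, part, "[/.]")   # "[/.]" is a regex: slash or dot
        print part[n-1]
    }'
}
```

With the path from the thread, `name_by_split /home/prett/data/current/dealname.file` prints `dealname`. A name with two dots behaves differently: `name_by_split data/deal.v2.file` prints `v2`, whereas the sub() approach would give `deal`.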
| |
| Ed Morton 2006-07-11, 6:56 pm |
| Zain wrote:
>
>
>
>
> Yes, that's exactly what I want to do.
>
>
>
>
> I tried that Ed, for some reason it doesn't like the sub functions. I
> keep getting syntax error and illegal statement messages.
>
Maybe you're using old, broken awk on Solaris, but I'd have expected
even that to work with the above. Use gawk, nawk, or /usr/xpg4/bin/awk
if you're on Solaris. If that's not it, make sure you did a copy/paste
of my text rather than retyping it, and post exactly what you executed
and the error messages. You could try removing one statement at a time
to see if you can narrow down where the error message is coming from.
Ed.
| |
| Chris F.A. Johnson 2006-07-11, 6:56 pm |
| On 2006-07-11, Ed Morton wrote:
> Zain wrote:
>
> Maybe you're using old, broken awk on Solaris,
There is nothing broken about that version of awk; it's the
original awk, and it was replaced by the end of the 1980s. You
could, perhaps, call Solaris broken for supplying such an old
utility as the default.
> but I'd have expected even that to work with the above.
No. Old awk does not have the sub or gsub function.
> Use gawk, nawk, or /usr/xpg4/bin/awk if you're on Solaris. If that's
> not it, make sure you did a copy/paste of my text rather than
> retyping it, and post exactly what you executed and the error
> messages. You could try removing one statement at a time to see if
> you can narrow down where the error message is coming from.
--
Chris F.A. Johnson, author <http://cfaj.freeshell.org>
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence
| |
| Ed Morton 2006-07-11, 6:56 pm |
| Chris F.A. Johnson wrote:
> On 2006-07-11, Ed Morton wrote:
>
>
>
> There is nothing broken about that version of awk;
One small example:
$ gawk 'BEGIN{print sprintf("The magic number is"), 3; exit}'
The magic number is 3
$ nawk 'BEGIN{print sprintf("The magic number is"), 3; exit}'
The magic number is 3
$ /usr/xpg4/bin/awk 'BEGIN{print sprintf("The magic number is"), 3; exit}'
The magic number is 3
$ awk 'BEGIN{print sprintf("The magic number is"), 3; exit}'
The magic number is
You could, I suppose, make a case for something not being technically
"broken" if that was the designer's intent, but "broken" is a more
convenient way of expressing the condition as it relates to its
intended user compared to "functioning as designed but not as expected,
not as any of its peers, and not in a useful fashion", so I'll stick
with "broken".
Ed.
| |
|
| > Maybe you're using old, broken awk on Solaris, but I'd have expected
> even that to work with the above. Use gawk, nawk, or /usr/xpg4/bin/awk
> if you're on Solaris. If that's not it, make sure you did a copy/paste
> of my text rather than retyping it, and post exactly what you executed
> and the error messages. You could try removing one statement at a time
> to see if you can narrow down where the error message is coming from.
>
> Ed.
Sorry guys, I should have said that I'm only using old awk. Fortunately,
I found out that there was a file containing a list of all deal names.
This made things a lot easier and more convenient for later tasks. But
thanks to everyone for their advice, and thank you to Xicheng for the
code on collating the headers and records.
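For anyone else limited to an awk without sub() or gsub(), the same trimming can be sketched with only index() and substr(), which should be available in older awks as well. This is an untested assumption as far as the old Solaris awk is concerned; the sketch was written against a POSIX awk. The function name deal_name_plain is hypothetical, and the sketch avoids FNR by watching for a change in FILENAME instead.

```shell
# deal_name_plain FILE...  (hypothetical helper name, not from the thread)
# Prints one deal name per non-empty file, without using sub() or gsub().
deal_name_plain() {
    awk '
    # Runs once per file: on the first line where FILENAME has changed.
    FILENAME != prev {
        prev = FILENAME
        name = FILENAME
        # Repeatedly cut through the next "/" until none remain.
        while ((i = index(name, "/")) > 0)
            name = substr(name, i + 1)
        # Keep only the text before the first ".", if there is one.
        i = index(name, ".")
        if (i > 0)
            name = substr(name, 1, i - 1)
        print name
    }' "$@"
}
```

Called on /home/prett/data/current/dealname.file, this would print `dealname`.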