Permanent awk newbie
Author: Scottie Hannigan

2004-11-30, 3:56 pm

Hi,
I need to do some data conversion on some large text files (on an HP3000)
before importing them into a new system. My HP3000 allows me to shell out to
a POSIX prompt where I can use awk.

This is going to be a one-off job and as I have neither the time nor the
brains to master awk, I think I will forever be a permanent awk newbie.

So I could do with some help, please.

----------

My data file is French names and addresses in plain block capital text.
Data is in uppercase without accents.

Input :
FRANCOISE DESPRES;11 ALLEE DU CHATEAU; 85610 LA BERNARDIERE

Which should be, ideally, converted to :
Françoise Desprès;11 Allee du Château; 85610 La Bernardière
------------
So the aim of the game is to get the data

1. out of block capitals and into "Initial style", i.e. uppercased first
letter of each word and the rest of each word lowercased.
2. Then replace the unaccented words with their accented spellings, using
the correction files described below.
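
Goal 1 could be sketched roughly as follows. This is only an illustration, assuming the input is plain unaccented ASCII as described and that the awk at the POSIX prompt provides toupper() and tolower(); the function name initcap is made up for the example. It lowercases each line, then uppercases every letter that follows a non-letter (space, semicolon, hyphen, digit).

```sh
# initcap: hypothetical helper name; reads stdin, writes "Initial style" text.
initcap() {
    awk '{
        line = tolower($0)          # lowercase the whole record first
        out = ""
        start = 1                   # 1 while the next letter would begin a word
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)
            if (c ~ /[a-z]/) {
                if (start) c = toupper(c)
                start = 0
            } else {
                start = 1           # space, semicolon, hyphen, digit, ...
            }
            out = out c
        }
        print out
    }'
}

# Demo on the sample record from this post
printf '%s\n' 'FRANCOISE DESPRES;11 ALLEE DU CHATEAU; 85610 LA BERNARDIERE' | initcap
```

On a real file this would be run as something like `initcap < NameAndAd > NameAndAd1`. Note that every word is capitalised, so particles such as "du" come out as "Du"; keeping them lowercase would need a small exception list, which this sketch does not attempt.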

I have, in separate files,
1. a list of French firstnames in "Initials" style correctly spelt with
accents i.e. :
Jean
René
Françoise
Marie-Thérèse
Etc.

Which I could modify myself so as to have two fields.
First the wrong spelling, i.e. without the accents, and second, the right
spelling, i.e. with the accents:
Rene;René
Francoise;Françoise
Marie-Therese;Marie-Thérèse
Etc.

2. a list of French town names in "Initials" style correctly spelt with
accents which I could also modify so as to have two fields. First the wrong
spelling, i.e. without the accents, and second the right spelling, i.e. with
the accents.

As I see it, the best method would be to
1. Run through the entire file converting every word into "Initials"
case.
Eg.
Input : FRANCOISE DESPRES;11 ALLEE DU CHATEAU; 85610 LA BERNARDIERE
Output : Francoise Despres;11 Allee Du Chateau; 85610 La Bernardiere
(NameAndAd -> NameAndAd1)

2. Then somehow get awk to use the firstnames file to correct the names
and addresses file.
(Firstnames + NameAndAd1 -> NameAndAd2)

3. Then somehow get awk to use the townnames file to correct the names and
addresses file.
(Townnames + NameAndAd2 -> NameAndAd3)
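
Steps 2 and 3 could be sketched in one pass along these lines. It is a rough illustration under several assumptions: both correction files are in the two-field "wrong;right" form described above, they use the same character encoding as the target system, the name is the first semicolon-separated field, and the third field is a postcode followed by the town. The function name fix_accents and the file names are hypothetical. Each lookup file is loaded into an awk array, then every word of the name field and the whole town name are replaced when an exact match exists.

```sh
# fix_accents: hypothetical helper.
# usage: fix_accents FIRSTNAMES_FILE TOWNNAMES_FILE < NameAndAd1 > NameAndAd3
fix_accents() {
    awk -v names="$1" -v towns="$2" '
    BEGIN {
        FS = OFS = ";"
        # Load "wrong;right" pairs into lookup arrays
        while ((getline line < names) > 0) { split(line, p, ";"); first[p[1]] = p[2] }
        close(names)
        while ((getline line < towns) > 0) { split(line, p, ";"); town[p[1]] = p[2] }
        close(towns)
    }
    {
        # Field 1: correct each word that appears in the first names list
        n = split($1, w, " ")
        fixed = ""
        for (i = 1; i <= n; i++) {
            if (w[i] in first) w[i] = first[w[i]]
            fixed = fixed (i > 1 ? " " : "") w[i]
        }
        $1 = fixed

        # Field 3: keep the postcode, correct the town name that follows it
        if (NF >= 3 && match($3, /[0-9]+ +/)) {
            prefix = substr($3, 1, RSTART + RLENGTH - 1)
            name = substr($3, RSTART + RLENGTH)
            if (name in town) $3 = prefix town[name]
        }
        print
    }'
}

# Demo with small throwaway lookup files
tmp=$(mktemp -d)
printf '%s\n' 'Rene;René' 'Francoise;Françoise' 'Marie-Therese;Marie-Thérèse' > "$tmp/firstnames"
printf '%s\n' 'La Bernardiere;La Bernardière' > "$tmp/townnames"
printf '%s\n' 'Francoise Despres;11 Allee Du Chateau; 85610 La Bernardiere' |
    fix_accents "$tmp/firstnames" "$tmp/townnames"
rm -r "$tmp"
```

Only exact matches are replaced, so surnames and street words that are not in either list (here "Despres" and "Chateau") pass through unchanged.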
----------
Does that make sense?
All and any help appreciated.

Scottie


Copyright 2008 codecomments.com