
Author reconcile
srikanth.subramanian@gmail.com

2007-02-07, 9:57 pm

Hi,

I need help writing a simple script to do some basic
reconciliation in Unix. I have two files, file_1 and file_2:

>cat file_1

column 2
10010
20302
49890
3920
28222
....

>cat file_2

column2
20321
24693
10010
20302
49890
3932
2321
23456
...

I need to show all the records in file_1 that are not in file_2, e.g.
3920 and 28222, i.e. compare column 2 in this case. I was thinking of
loading the column contents of these files into an array and looping
over it, but there is a large number of records, so I was wondering if
there are any alternative ways to do it.

-Srikanth

Ulrich M. Schwarz

2007-02-08, 3:57 am

srikanth.subramanian@gmail.com writes:

> Hi,
>
> I need help writing a simple script to do some basic
> reconciliation in Unix. I have two files, file_1 and file_2:

[...]
> I need to show all the records in file_1 that are not in file_2, e.g.
> 3920 and 28222, i.e. compare column 2 in this case. I was thinking of
> loading the column contents of these files into an array and looping
> over it, but there is a large number of records, so I was wondering if
> there are any alternative ways to do it.


Short of sorting both files (then you can perform something along the
lines of a merge (as in mergesort) in O(1) space), I don't see how
you'd get around having file_2 in memory. (Well, you could remove all
items from the first half of file_2 from file_1 in a first iteration
and then the rest etc. etc.)
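
The sort-then-merge idea can be sketched with standard tools. This is only an illustration under assumptions: the sample rows, the made-up ids in column 1, the temporary directory, and the names keys_1, keys_2, and missing are all hypothetical, and comm(1) is used here to do the merge step. It reports the distinct column 2 values, in sorted order, rather than whole records.

```shell
# Hypothetical sample data: column 1 is a made-up record id,
# column 2 is the value being reconciled.
tmp=$(mktemp -d)
printf 'a 10010\nb 20302\nc 49890\nd 3920\ne 28222\n' > "$tmp/file_1"
printf 'p 20321\nq 24693\nr 10010\ns 20302\nt 49890\nu 3932\nv 2321\nw 23456\n' > "$tmp/file_2"

# Extract column 2 from each file and sort it. sort(1) can spill to
# temporary files, so neither input has to fit in memory at once.
awk '{ print $2 }' "$tmp/file_1" | LC_ALL=C sort -u > "$tmp/keys_1"
awk '{ print $2 }' "$tmp/file_2" | LC_ALL=C sort -u > "$tmp/keys_2"

# comm -23 walks both sorted inputs in step and keeps only the lines
# that appear in the first input but not in the second.
missing=$(LC_ALL=C comm -23 "$tmp/keys_1" "$tmp/keys_2")
printf '%s\n' "$missing"
# prints:
# 28222
# 3920
```

Both inputs must be sorted with the same collation that comm uses, which is why LC_ALL=C is set on both commands.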

Ulrich
--
The Bastard Flatmate From Hell:
"Sticking his toothbrush up my arse?
I'm *not* putting anything there that he's had in his mouth!"
http://www.informatik.uni-kiel.de/~ums/bffh1.html

Lawson Hanson

2007-02-08, 3:57 am

srikanth.subramanian@gmail.com wrote:
> Hi,
>
> I need help writing a simple script to do some basic
> reconciliation in Unix. I have two files, file_1 and file_2:
>
> column 2
> 10010
> 20302
> 49890
> 3920
> 28222
> ...
>
> column2
> 20321
> 24693
> 10010
> 20302
> 49890
> 3932
> 2321
> 23456
> ..
>
> I need to show all the records in file_1 that are not in file_2, e.g.
> 3920 and 28222, i.e. compare column 2 in this case. I was thinking of
> loading the column contents of these files into an array and looping
> over it, but there is a large number of records, so I was wondering if
> there are any alternative ways to do it.
>
> -Srikanth
>


You could try this approach using AWK's wonderful associative arrays:

# Program:
#   prog.awk
{
    if (FILENAME == "file_2")
        inTwo[$2] = 1           # remember each column 2 value seen in file_2
    else if (!($2 in inTwo))
        print                   # file_1 line whose column 2 is absent from file_2
}

The output should be a list of the lines in file_1 whose column 2 value
does not appear in file_2.

If you saved the code in a script called "prog.awk" you would run it as
follows:

awk -f prog.awk file_2 file_1

Note that it needs to read file_2 first.

Best regards,

Lawson Hanson

goedel

2007-02-08, 7:57 am

On Feb 8, 4:23 am, srikanth.subraman...@gmail.com wrote:
> Hi,
>
> I need help writing a simple script to do some basic
> reconciliation in Unix. I have two files, file_1 and file_2:
>
>
> column 2
> 10010
> 20302
> 49890
> 3920
> 28222
> ...
>
>
> column2
> 20321
> 24693
> 10010
> 20302
> 49890
> 3932
> 2321
> 23456
> ..
>
> I need to show all the records in file_1 that are not in file_2, e.g.
> 3920 and 28222, i.e. compare column 2 in this case. I was thinking of
> loading the column contents of these files into an array and looping
> over it, but there is a large number of records, so I was wondering if
> there are any alternative ways to do it.
>
> -Srikanth


# usage: awk -f script file_2 file_1
# precondition: file_2 is not empty (otherwise NR==FNR also holds for file_1)
NR==FNR{I[$2]=1;next}
!($2 in I)
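
A possible variant that drops the non-empty precondition, offered as a sketch rather than as part of the script above: it tests FILENAME against ARGV[1] instead of comparing NR with FNR. The sample rows, the made-up ids in column 1, the temporary directory, and the names seen, out_normal, and out_empty are hypothetical.

```shell
# Hypothetical sample data: column 1 is a made-up record id,
# column 2 is the value being reconciled.
tmp=$(mktemp -d)
printf 'a 10010\nb 20302\nc 49890\nd 3920\ne 28222\n' > "$tmp/file_1"
printf 'p 20321\nq 24693\nr 10010\ns 20302\nt 49890\nu 3932\nv 2321\nw 23456\n' > "$tmp/file_2"
: > "$tmp/empty"    # an empty stand-in for file_2

# While reading the first file named on the command line, record its
# column 2 values; for every later line, print it only if its column 2
# value was never recorded.
prog='FILENAME == ARGV[1] { seen[$2] = 1; next } !($2 in seen)'

out_normal=$(awk "$prog" "$tmp/file_2" "$tmp/file_1")
printf '%s\n' "$out_normal"
# prints:
# d 3920
# e 28222

# With an empty first file, FILENAME never equals ARGV[1] while file_1
# is being read, so every file_1 line is reported as missing.
out_empty=$(awk "$prog" "$tmp/empty" "$tmp/file_1")
printf '%s\n' "$out_empty"
```

This only works when the two arguments are different path strings, since the test compares file names rather than record counts.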

Regards,

Steffen Schuler

Ed Morton

2007-02-08, 7:57 am

srikanth.subramanian@gmail.com wrote:
> Hi,
>
> I need help writing a simple script to do some basic
> reconciliation in Unix. I have two files, file_1 and file_2:
>
>
>
> column 2
> 10010
> 20302
> 49890
> 3920
> 28222
> ...
>
>
>
> column2
> 20321
> 24693
> 10010
> 20302
> 49890
> 3932
> 2321
> 23456
> ..
>
> I need to show all the records in file_1 that are not in file_2, e.g.
> 3920 and 28222, i.e. compare column 2 in this case. I was thinking of
> loading the column contents of these files into an array and looping
> over it, but there is a large number of records, so I was wondering if
> there are any alternative ways to do it.


awk isn't necessarily the best tool available on UNIX for this job, so
I'm crossposting and setting followups to comp.unix.shell. In the
meantime, you need to answer these questions to get the best solutions:

1) Are the values in each record unique?
2) If number "1234" appeared twice in file_1 but only once in file_2,
would you want that reported as one of the "records in file_1 that are
not in file_2"?
3) Does the order that the records appear in file_1 have to match the
order they appear in file_2?
4) Does the order of the records being output have to match the order
that they're in one of the files?
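
To make question 2 concrete, here is a sketch of one possible answer, assuming duplicates should be matched one for one. Each file_2 occurrence cancels a single file_1 occurrence, and any surplus file_1 lines are reported. The sample rows, the made-up ids in column 1, the temporary directory, and the names count and out are hypothetical; as with NR==FNR in general, file_2 is assumed to be non-empty.

```shell
# Hypothetical sample data: 1234 appears twice in file_1 but only once
# in file_2. Column 1 is a made-up record id.
tmp=$(mktemp -d)
printf 'a 1234\nb 1234\nc 5678\n' > "$tmp/file_1"
printf 'x 1234\ny 5678\n' > "$tmp/file_2"

out=$(awk '
    NR == FNR     { count[$2]++; next }   # first file: tally each value
    count[$2] > 0 { count[$2]--; next }   # matched: use up one occurrence
                  { print }               # no occurrence left: report line
' "$tmp/file_2" "$tmp/file_1")
printf '%s\n' "$out"
# prints:
# b 1234
```

If duplicates should be ignored instead, a plain membership test on column 2 would report nothing for this input.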

Regards,

Ed.


Copyright 2008 codecomments.com