How-to use values from 2 input files
| di98mase 2007-10-16, 6:58 pm |
| Hi all,
I have the following structure:
>gawk -f myawkprogram.awk input1.txt input2.txt
Now what I don't understand is how I get the values from the input
files. Will they be read in some kind of order, i.e. input1 first and
then input2?
What I want to do is compare the data on row 1 in input1.txt with
the data on row 1 in input2.txt. How can this be done?
input1.txt
data=23
data=33
data=4
input2.txt
data=23
data=32
data=3
I want to compare the data, and if two fields are NOT equal I would
like to print out a message. But how do I first read line 1
in file 1, then line 1 in file 2, and then continue with row 2 until all
rows are compared?
How should I write myawkprogram.awk in order to achieve this?
/di98mase
| |
| Janis Papanagnou 2007-10-16, 6:58 pm |
| di98mase wrote:
> HI all,
>
> I have the following structure:
>
> Now what I dont understand is how do I get the values from the input
> files? will they be used in some kind of order ie input1 first then
> input2 then?
Your program will read the files sequentially, yes.
>
> The thing I want to do is that I want to compare data on row 1 in
> input1.txt with data on row 1 in input2.txt, how can this be done...
You know there are usually other tools that do that job better than
awk (on Unix e.g. 'diff').
>
> input1.txt
> data=23
> data=33
> data=4
>
> input2.txt
> data=23
> data=32
> data=3
>
> I want to compare the data and if two fields are NOT equal I would
> like to print out a message. But HOW shall I do to first read line 1
> in file 1 then line 1 in file 2 and then continue to row 2 until all
> rows are compared?
If the files are not too huge you can read the first file into memory.
# condition where you are in the first file; NR==FNR
NR==FNR { mem[NR] = $0; next }
# because of the 'next' you are now in the second file; NR!=FNR
mem[FNR] != $0 { print "Some message for line " FNR }
There are other possibilities to implement the task (use of 'getline'
and 'ARGV[]'), but the above might suffice for your purpose.
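A minimal sketch of the two rules above, run against the sample data from the question. The scratch directory, the variable name `result`, and the wording of the message are assumptions made for illustration only.

```sh
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf 'data=23\ndata=33\ndata=4\n' > "$dir/input1.txt"
printf 'data=23\ndata=32\ndata=3\n' > "$dir/input2.txt"

# Rule 1 remembers every line of the first file.
# Rule 2 is reached only for the second file and reports differing lines.
result=$(awk '
    NR == FNR      { mem[NR] = $0; next }
    mem[FNR] != $0 { print "line " FNR ": " mem[FNR] " != " $0 }
' "$dir/input1.txt" "$dir/input2.txt")

printf '%s\n' "$result"
rm -r "$dir"
```

With the sample data this reports lines 2 and 3 as different and stays silent about line 1.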
Janis
>
> How shall I write myawkprogram.awk in order to achieve this?
>
> /di98mase
>
| |
| Steffen Schuler 2007-10-16, 6:58 pm |
| Hi di98mase, hello netlanders,
On Tue, 16 Oct 2007 11:58:15 -0700, di98mase wrote:
<snip>
> I have the following structure:
>
<snip>
> I want to compare the data and if two fields are NOT equal I would like
> to print out a message.
<snip>
>
> How shall I write myawkprogram.awk in order to achieve this?
<snip>
this gawk-script does your job --- but 'diff' is better:
------------------------------------------------------------
# Report which line(s) exist in only one of the two files.
function printmsg(n1, n2, nth) {
    if (n1 == n2)
        printf "line %d missing in the %s file\n", n1, nth
    else
        printf "lines %d - %d missing in the %s file\n", n1, n2, nth
}
# ARGIND is a gawk extension: the index in ARGV of the current file.
ARGIND == 1 {
    a[NR] = $0
    next
}
FNR in a && a[FNR] != $0 {
    printf "difference in line %d\n", FNR
}
END {
    n = NR - FNR    # number of lines in the first file
    if (n < FNR)
        printmsg(n + 1, FNR, "first")
    else if (FNR < n)
        printmsg(FNR + 1, n, "second")
}
-------------------------------------------------------
Kind Regards,
Steffen "goedel" Schuler
| |
| di98mase 2007-10-16, 6:58 pm |
| On 16 Okt, 22:00, Steffen Schuler <schuler.stef...@googlemail.com>
wrote:
> Hi di98mase, hello netlanders,
>
> On Tue, 16 Oct 2007 11:58:15 -0700, di98mase wrote:
>
> <snip>> I have the following structure:
>
> <snip>
> <snip>
>
>
> <snip>
>
> this gawk-script does your job --- but 'diff' is better:
>
> ------------------------------------------------------------
> function printmsg(n1, n2, nth) {
> if (n1 == n2)
> printf "line %d missing in the %s file\n", n1, nth
> else
> printf "lines %d - %d missing in the %s file\n", n1, n2, nth
>
> }
>
> ARGIND == 1 {
> a[NR] = $0
> next}
>
> FNR in a && a[FNR] != $0 {
> printf "difference in line %d\n", FNR}
>
> END {
> n = NR - FNR
> if (n < FNR)
> printmsg(n + 1, FNR, "first")
> else if (FNR < n)
> printmsg(FNR + 1, n, "second")}
>
> -------------------------------------------------------
>
> Kind Regards,
>
> Steffen "goedel" Schuler
Hi again,
Well, diff could perhaps be used, but my input files
do not look like my example above: the "data" is different in file1
and file2. I use awk for processing the files, and
this is the "final" step in a series of processing steps, i.e. evaluating the
result. That is why I would prefer awk. (Also, I enjoy learning
it.)
Janis, I don't completely understand your code. The section:
# condition where you are in the first file; NR==FNR
NR==FNR { mem[NR] = $0; next }
# because of the 'next' you are now in the second file; NR!=FNR
mem[FNR] != $0 { print "Some message for line " FNR }
is a bit confusing for me. This is how I tried to use it and it seems
to work:) :
BEGIN {
}
NR==FNR {
mem[NR] = $0;
next;
}
mem[FNR] != $0 {
printf("Diff found\n");
}
END{
}
However, I am still having trouble understanding NR==FNR and your
comment "because of the 'next' you are now in the second file; NR!=FNR".
This is my understanding of the script; please tell me what I have
misunderstood:
As long as NR is equal to FNR, store $0 into the mem array. But what
will next do: read the next line in input2.txt, or what? What do you
mean by sequentially? Will awk read all lines in file1 first and then
all lines in file2, or the first row in file1, then the first row in file2,
then the second row in file1 and the second in file2, etc.?
If the latter is the case then I think I understand what next will
do for me... simply ignore all rows from file2, or?
Then the next condition, mem[FNR]: FNR is the total number of file
records, i.e. rows in file1 plus all rows in file2, is it? In my example
FNR=6 and NR=3, or?
Please help me straighten out my questions here. So simple and so few
lines, but still not clear to me.
Regards,
/di98
| |
| Janis Papanagnou 2007-10-16, 6:58 pm |
| di98mase wrote:
> On 16 Okt, 22:00, Steffen Schuler <schuler.stef...@googlemail.com>
> wrote:
>
>
>
>
> Hi again,
>
> Well, diff could perhaps be used, but my input files
> do not look like my example above: the "data" is different in file1
> and file2. I use awk for processing the files, and
> this is the "final" step in a series of processing steps, i.e. evaluating the
> result. That is why I would prefer awk. (Also, I enjoy learning
> it.)
>
> Janis, I don't completely understand your code. The section:
I add some comments below...
>
> # condition where you are in the first file; NR==FNR
> NR==FNR { mem[NR] = $0; next }
> # because of the 'next' you are now in the second file; NR!=FNR
> mem[FNR] != $0 { print "Some message for line " FNR }
>
> is a bit confusing for me. This is how I tried to use it and it seems
> to work:) :
> BEGIN {
> }
An empty BEGIN section may be omitted.
>
> NR==FNR {
> mem[NR] = $0;
> next;
> }
You don't need semicolons if you put one statement per line.
Otherwise it's exactly what I suggested.
>
> mem[FNR] != $0 {
> printf("Diff found\n");
> }
>
> END{
> }
An empty END section may also be omitted.
> However, I am still having trouble understanding NR==FNR and your
> comment "because of the 'next' you are now in the second file; NR!=FNR".
>
> This is my understanding of the script; please tell me what I have
> misunderstood:
>
> As long as NR is equal to FNR, store $0 into the mem array. But what
> will next do: read the next line in input2.txt, or what? What do you
> mean by sequentially? Will awk read all lines in file1 first and then
> all lines in file2, or the first row in file1, then the first row in file2,
> then the second row in file1 and the second in file2, etc.?
I assume that your input files input1.txt and input2.txt have the
contents that you posted originally; then the following data will
be read by the program, record by record (line by line)...
data=23
data=33
data=4
data=23
data=32
data=3
It will read the contents of the first file, then the contents of the
second file. Awk does not read the files side by side in columns; it
reads them one after the other, line by line.
>
> If the latter is the case then I think I understand what next will
> do for me... simply ignore all rows from file2, or?
'next' is a control statement that instructs awk not to consider
any subsequent condition/action pairs for the current record; awk reads the
next record (line) and starts again with the first condition/action
in the program.
While the program is reading file1, no data from file2 is considered at all,
because awk reads and processes all lines from file1 first.
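The effect of 'next' can be seen in isolation with a tiny sketch. The two-line input, the labels "rule 1" and "rule 2", and the variable name `out` are made up for illustration.

```sh
# Two records, "a" and "b". Rule 1 matches only "a" and ends with next,
# so rule 2 never sees that record; "b" falls through to rule 2.
out=$(printf 'a\nb\n' | awk '
    $0 == "a" { print "rule 1:", $0; next }   # next: skip rule 2 for this record
              { print "rule 2:", $0 }
')

printf '%s\n' "$out"
```

Without the 'next', the record "a" would be printed by both rules.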
>
> Then the next condition, mem[FNR]: FNR is the total number of file
> records, i.e. rows in file1 plus all rows in file2, is it? In my example
> FNR=6 and NR=3, or?
NR is the _total_ number of records (lines) read thus far.
FNR is the number of records (lines) read from the _current_ file.
While awk processes the first file, NR and FNR are equal.
So the condition NR==FNR guarantees that the associated action is done
for the first file only; that action stores the data (from file1) in mem
and skips the subsequent actions.
As soon as awk is processing the second file, the condition is no longer
true, the respective action is skipped, and the next action block
is executed, which compares the current line ($0) in
file2 with the memorized one in mem[].
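To watch the two counters directly, a small sketch can print NR and FNR for every record of the sample files. The scratch directory and the variable name `trace` are assumptions for illustration.

```sh
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf 'data=23\ndata=33\ndata=4\n' > "$dir/input1.txt"
printf 'data=23\ndata=32\ndata=3\n' > "$dir/input2.txt"

# Print both counters for each record; FNR restarts at 1 when the
# second file begins, while NR keeps counting.
trace=$(cd "$dir" && awk '{ print FILENAME, "NR=" NR, "FNR=" FNR, $0 }' input1.txt input2.txt)

printf '%s\n' "$trace"
rm -r "$dir"
```

For the sample data the last line printed is `input2.txt NR=6 FNR=3 data=3`, so at the end NR is 6 and FNR is 3, not the other way round.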
>
> Please help me straighten out my questions here. So simple and so few
> lines, but still not clear to me.
It may additionally help you to understand if you see the alternative
awk program (without next, but with an additional condition)...
NR==FNR { mem[NR] = $0 }
NR!=FNR && mem[FNR]!=$0 { print "Some message for line " FNR }
The first action block is triggered only for the first file and the second
action block is triggered for all subsequent files. Or, to write that
in yet another way (to make the file selection clearer)...
NR==FNR { mem[NR] = $0 }
NR!=FNR { if(mem[FNR] != $0) print "Some message for line " FNR }
Janis
>
> Regards,
>
> /di98
>
| |
|
| On Oct 17, 3:38 am, Janis Papanagnou <Janis_Papanag...@hotmail.com>
wrote:
> # condition where you are in the first file; NR==FNR
> NR==FNR { mem[NR] = $0; next }
> # because of the 'next' you are now in the second file; NR!=FNR
> mem[FNR] != $0 { print "Some message for line " FNR }
I think 'next' can be omitted,
because when processing the first file, the condition 'mem[FNR] != $0'
is always false.
My version:
NR==FNR { mem[NR] = $0 }
mem[FNR] != $0 { print "Some message for line " FNR }
xbz
| |
| William James 2007-10-19, 6:58 pm |
| On Oct 16, 1:58 pm, di98mase <di98m...@hotmail.com> wrote:
> HI all,
>
> I have the following structure:
>
> Now what I dont understand is how do I get the values from the input
> files? will they be used in some kind of order ie input1 first then
> input2 then?
>
> The thing I want to do is that I want to compare data on row 1 in
> input1.txt with data on row 1 in input2.txt, how can this be done...
>
> input1.txt
> data=23
> data=33
> data=4
>
> input2.txt
> data=23
> data=32
> data=3
>
> I want to compare the data and if two fields are NOT equal I would
> like to print out a message. But HOW shall I do to first read line 1
> in file 1 then line 1 in file 2 and then continue to row 2 until all
> rows are compared?
>
> How shall I write myawkprogram.awk in order to achieve this?
>
> /di98mase
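BEGIN {
    # Take the last file name off the argument list so that awk
    # reads only the first file as its main input.
    file2 = ARGV[ ARGC - 1 ]
    ARGC--
}
{
    # Read the corresponding line of the second file explicitly.
    getline line <file2
    if ( $0 != line )
        print "-->", line
}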
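A hedged variation on the getline approach above: it also checks the return value of getline (1 on success, 0 at end of file, -1 on error), so a second file that is too short is reported instead of being compared against a stale line. The shortened input2.txt, the message wording, and the variable name `report` are assumptions for illustration.

```sh
dir=$(mktemp -d)
printf 'data=23\ndata=33\ndata=4\n' > "$dir/input1.txt"
printf 'data=23\ndata=32\n' > "$dir/input2.txt"   # deliberately one line shorter

report=$(awk '
    BEGIN {
        file2 = ARGV[ARGC - 1]   # remember the last argument
        ARGC--                   # and keep awk from reading it as main input
    }
    {
        # Stop comparing once the second file has run out of lines.
        if ((getline line < file2) <= 0) {
            print "line " FNR ": no matching line in second file"
            next
        }
        if ($0 != line)
            print "line " FNR ": " $0 " != " line
    }
' "$dir/input1.txt" "$dir/input2.txt")

printf '%s\n' "$report"
rm -r "$dir"
```

This sketch does not detect the opposite case, where the second file has extra lines after the first one ends.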
| |