Author don't understand the reason why...
Michael Jaritz

2006-01-25, 6:56 pm

Hi,
I'm a newbie in awk - using GNU Awk 3.1.3, on Windows98SE.

# make_A_B.awk
BEGIN {
    n_A = 1000
    n_B = 150000
    deci_ascii_min = 33
    deci_ascii_max = 95
    w_length_min = 2
    w_length_max = 4
    for ( i = 1; i <= n_A; i++ ) {
        str = ""
        w_length = roll(w_length_min, w_length_max)
        for ( j = 1; j <= w_length; j++ ) {
            rand_int = roll(deci_ascii_min, deci_ascii_max)
            chr = sprintf( "%c", rand_int )
            str = str chr
        }
        print str > "A.txt"
    }
    for ( i = 1; i <= n_B; i++ ) {
        str = ""
        w_length = roll(w_length_min, w_length_max)
        for ( j = 1; j <= w_length; j++ ) {
            rand_int = roll(deci_ascii_min, deci_ascii_max)
            chr = sprintf( "%c", rand_int )
            str = str chr
        }
        print str > "B.txt"
    }
}
function roll(m, n) { return m + int(rand() * n ) }


After running "gawk -f make_A_B.awk" I have my two input files: A.txt
with 1000 lines and B.txt with 150000 lines.

# check_AB.awk
{
    if ( FILENAME == "B.txt" ) {
        if ( $0 in b_array ) {
            val = b_array[$0]
            b_array[$0] = val + 1
        }
        else {
            b_array[$0] = 1
        }
    }
    else {
        i_a++
        array_a[i_a] = $0
    }
}

END {
    for ( i = 1; i <= i_a; i++ ) {
        val_a = array_a[i]
        if ( val_a in b_array ) {
            count = sprintf("%6s", b_array[val_a]) " x in B " val_a
            print count > "A_checked_against_B.txt"
        }
        else {
            print " " val_a > "A_checked_against_B.txt"
        }
    }
    close("A_checked_against_B.txt")

    command = "sort > counted_B_sorted.txt"
    for ( i_b in b_array ) {
        val_b = b_array[i_b]
        print sprintf("%6s", val_b) " x " i_b | command
    }
}

After running "gawk -f check_AB.awk B.txt A.txt" the result in
"A_checked_against_B.txt" is OK.

The result in "counted_B_sorted.txt" is a little mysterious. The first
lines are completely empty. Why? Every line in B.txt contains a
string.


Michael
Harlan Grove

2006-01-25, 6:56 pm

Michael Jaritz wrote...
....
>I'm a newbie in awk - using GNU Awk 3.1.3, on Windows98SE.

....
>function roll(m, n) { return m + int(rand() * n ) }


Maybe an issue, maybe not, but if your intention for the roll function
is to return a random integer between m and n, you'd need to rewrite
this as

function roll(m, n) { return m + int(rand() * (n - m + 1)) }

otherwise roll could return integers between m and m + n - 1 rather
than m and n.
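
To make the difference visible, here is a minimal sketch that draws many
samples from both versions and reports the smallest and largest value
seen. The names roll_old and roll_new and the file name roll_range.awk
are only illustrative; they are not part of your scripts.

```awk
# roll_range.awk: compare the original roll() with the corrected one.
# roll_old / roll_new are hypothetical names used only for this comparison.
function roll_old(m, n) { return m + int(rand() * n) }
function roll_new(m, n) { return m + int(rand() * (n - m + 1)) }

BEGIN {
    srand(1)                          # fixed seed so runs are repeatable
    lo_old = hi_old = roll_old(2, 4)
    lo_new = hi_new = roll_new(2, 4)
    for (i = 1; i <= 10000; i++) {
        r = roll_old(2, 4)
        if (r < lo_old) lo_old = r
        if (r > hi_old) hi_old = r
        r = roll_new(2, 4)
        if (r < lo_new) lo_new = r
        if (r > hi_new) hi_new = r
    }
    print "original:  " lo_old " .. " hi_old
    print "corrected: " lo_new " .. " hi_new
}
```

Run it with "gawk -f roll_range.awk". With this many samples the
original version should show a range of 2 .. 5 for roll(2, 4), and the
corrected one 2 .. 4.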

># check_AB.awk

....
> command = "sort > counted_B_sorted.txt"
> for ( i_b in b_array ) {
> val_b = b_array[i_b]
> print sprintf("%6s", val_b) " x " i_b | command
> }

....
>The result in "counted_B_sorted.txt" is a little mysterious. The first
>lines are completely empty. Why? Every line in B.txt contains a
>string.


I suspect your problem lies in how gawk handles pipes in Windows 98.
It's not the same as how it handles them in Windows NT/2K/XP, and your
second script worked for me under Windows XP. Also, are you using the
Windows sort program or the GnuWin32 sort program? Either way, you'd be
better off rewriting this as

    for (i_b in b_array) printf("%6s x %s\n", b_array[i_b], i_b) > "bcs_temp.txt"
    close("bcs_temp.txt")
    system("sort bcs_temp.txt > counted_B_sorted.txt")
}
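
If you would rather not depend on an external sort at all, another
option might be to sort inside gawk. The following is only a sketch: it
assumes a gawk that provides the asort() extension, and demo_counts and
lines are made-up names with made-up sample data standing in for
b_array.

```awk
# sort_counts.awk: sort formatted count lines inside gawk, no external sort.
# Assumes gawk (asort() is a gawk extension, not POSIX awk).
# demo_counts holds invented sample data in place of b_array.
BEGIN {
    demo_counts["AB"] = 3
    demo_counts["#!"] = 12
    demo_counts["zz"] = 1

    # Build one formatted line per key, as the printf in the loop would.
    n = 0
    for (key in demo_counts)
        lines[++n] = sprintf("%6s x %s", demo_counts[key], key)

    n = asort(lines)                  # sorts the values, reindexes lines[1..n]
    for (i = 1; i <= n; i++)
        print lines[i]
}
```

In the real script the loop would run over b_array in the END block and
the final print would go to "counted_B_sorted.txt". The sort order is
gawk's string comparison, which may differ in detail from what either
sort program produces.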

Michael Jaritz

2006-01-26, 6:56 pm

Harlan Grove wrote:

>Michael Jaritz wrote...
>...
>...
>function roll(m, n) { return m + int(rand() * (n - m + 1)) }


Oh, you're right, thanks.

>...
>...
>
>I suspect your problem lies in how gawk handles pipes in Windows 98.
>It's not the same as how it handles them in Windows NT/2K/XP, and your
>second script worked for me under Windows XP.


The problem is the pipe, you're right again.

>Also, are you using the
>Windows sort program or the GnuWin32 sort program?


Windows sort. I have now tried GnuWin32 sort, and it has no problems
with the pipe.

>Either way, you'd be
>better off rewriting this as
>
> for (i_b in b_array) printf("%6s x %s\n", b_array[i_b], i_b) > "bcs_temp.txt"
> close("bcs_temp.txt")
> system("sort bcs_temp.txt > counted_B_sorted.txt")


This works well with both kinds of sort.
Thank you.

Michael