
Author field assignment and rebuild
Vassilis

2006-09-02, 6:56 pm


Well, I was wondering. I have some code like this:

$3 = $3 + 0
$4 = $4 + 0

This should make the specified fields numeric. Does it also rebuild all
the fields?
Does this $3 = $3 "" rebuild the fields?
We know that $1 = $1 should do that.
Does awk make out the difference? How does it?

I apologize for these obscure questions; I've been doing statistical
analysis in awk
these days and some things got blurred.

Ed Morton

2006-09-02, 6:56 pm

Vassilis wrote:
> Well, I was wondering. I have some code like this:
>
> $3 = $3 + 0
> $4 = $4 + 0
>
> This should make the specified fields numeric. Does it also rebuild all
> the fields?


No, it reconstructs $0, not the individual fields.

> Does this $3 = $3 "" rebuild the fields?


Same answer.

> We know that $1 = $1 should do that.


Same answer.

> Does awk make out the difference? How does it?


No. It doesn't.
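
A minimal sketch of the answers above, assuming a POSIX-style awk on the PATH. The sample record and the ";" output separator are made up for illustration. Whatever the right-hand side is, an assignment to a field makes awk rebuild $0 from the current fields joined by OFS, and the fields that were not assigned keep their text:

```shell
# Each run assigns to a field in a different way, then prints the rebuilt $0.
rec='a,b,03,4'   # hypothetical input record
r1=$(echo "$rec" | awk -F, -v OFS=';' '{ $3 = $3 + 0; print }')  # numeric assignment
r2=$(echo "$rec" | awk -F, -v OFS=';' '{ $3 = $3 "";  print }')  # string assignment
r3=$(echo "$rec" | awk -F, -v OFS=';' '{ $1 = $1;     print }')  # self-assignment
printf '%s\n' "$r1" "$r2" "$r3"
# prints:
#   a;b;3;4
#   a;b;03;4
#   a;b;03;4
```

All three records come out with the new separator. Only the first one changes the text of $3, because "03" + 0 is the number 3.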

> I apologize for these obscure questions; I've been doing statistical
> analysis in awk
> these days and some things got blurred.


Regards,

Ed.
Loki Harfagr

2006-09-03, 7:56 am

On Sat, 02 Sep 2006 09:25:15 -0700, Vassilis wrote:

>
> Well, I was wondering. I have some code like this:
>
> $3 = $3 + 0
> $4 = $4 + 0
>
> This should make the specified fields numeric. Does it also rebuild all
> the fields?
> Does this $3 = $3 "" rebuild the fields?
> We know that $1 = $1 should do that.
> Does awk make out the difference? How does it?


It just doesn't :-)

> I apologize for these obscure questions; I've been doing statistical
> analysis in awk
> these days and some things got blurred.


As you don't give any set of tests and samples it's hard to parse
in the dark, though from what you described I'd have the feeling it
could well be an issue with your LOCALES.
Jürgen Kahrs

2006-09-03, 7:56 am

Loki Harfagr wrote:

>
> As you don't give any set of tests and samples it's hard to parse
> in the dark, though from what you described I'd have the feeling it
> could well be an issue with your LOCALES.


Yes, that's possible. When printing numbers, one
should also look at OFMT:

OFMT The output format for numbers, "%.6g", by default.
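
A small sketch of how OFMT behaves, assuming a POSIX-style awk. The variable x and the "%.2f" format are only illustrative, and LC_ALL=C is set so that the decimal point does not depend on the locale. OFMT is used when print outputs a number, while CONVFMT is used when a number is turned into a string:

```shell
ofmt_demo=$(LC_ALL=C awk 'BEGIN {
    OFMT = "%.2f"
    x = 3.14159
    print x        # printed as a number, so OFMT applies
    print x ""     # concatenation converts with CONVFMT (default "%.6g")
    print 17       # integral values are printed as integers either way
}')
echo "$ofmt_demo"
# prints:
#   3.14
#   3.14159
#   17
```

Goal counts are integers, so neither setting should change the output of the program discussed in this thread.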
Vassilis

2006-09-03, 6:56 pm

Loki Harfagr wrote:
> As you don't give any set of tests and samples it's hard to parse
> in the dark, though from what you described I'd have the feeling it
> could well be an issue with your LOCALES.


Thanks guys.
I think I've got my answers from Ed, but for the sake of clarity this
is what my input looks like:

# 14 day
Inter,Lazio,0,2
Juventus,Genoa,1,1
Reggiana,Padova,3,0
Roma,Milan,0,0
Sampdoria,Cagliari,5,0
Napoli,Brescia,1,1
Torino,Milan,0,0
Brescia,Reggiana,1,0
Cagliari,Inter,1,1

this is my program:

BEGIN { FS = "," }

/^#\ [[:digit:]]+\ day/ { split($0, a, " "); Day = a[2] + 0; next }

!/^$/ && Day >= 1 {
Games++
$3 = $3 + 0
$4 = $4 + 0 # problem lies here
HomeGoals += $3
AwayGoals += $4
}

END {
printf "Total goals = %d\n", (HomeGoals + AwayGoals)
printf "Total home goals = %d (%.2f)\n", HomeGoals, HomeGoals / Games
printf "Total away goals = %d (%.2f)\n", AwayGoals, AwayGoals / Games
}

As Ed said, when I assign to a field, the whole record gets rebuilt.
Having done $3 = $3 + 0, I know that $3 has a numeric value.
If I do a $4 = $4 + 0, will $3 still remain numeric?
The answer seems obvious now.

Ed Morton

2006-09-04, 3:56 am

Vassilis wrote:
> Loki Harfagr wrote:
>
>
>
> Thanks guys.
> I think I've got my answers from Ed, but for the sake of clarity this
> is what my input looks like:
>
> # 14 day
> Inter,Lazio,0,2
> Juventus,Genoa,1,1
> Reggiana,Padova,3,0
> Roma,Milan,0,0
> Sampdoria,Cagliari,5,0
> Napoli,Brescia,1,1
> Torino,Milan,0,0
> Brescia,Reggiana,1,0
> Cagliari,Inter,1,1
>
> this is my program:
>
> BEGIN { FS = "," }
>
> /^#\ [[:digit:]]+\ day/ { split($0, a, " "); Day = a[2] + 0; next }
>
> !/^$/ && Day >= 1 {
> Games++
> $3 = $3 + 0
> $4 = $4 + 0 # problem lies here
> HomeGoals += $3
> AwayGoals += $4
> }
>
> END {
> printf "Total goals = %d\n", (HomeGoals + AwayGoals)
> printf "Total home goals = %d (%.2f)\n", HomeGoals, HomeGoals / Games
> printf "Total away goals = %d (%.2f)\n", AwayGoals, AwayGoals / Games
> }
>
> As Ed said, when I assign to a field, the whole record gets rebuilt.
> Having done $3 = $3 + 0, I know that $3 has a numeric value.
> If I do a $4 = $4 + 0, will $3 still remain numeric?
> The answer seems obvious now.
>


Yes, but it's not obvious why you're doing that. HomeGoals and AwayGoals
will have the same value whether you do $3 = $3 + 0 and $4 = $4 + 0 or
not. If I were writing the above, I'd do it as:

BEGIN { FS = "," }

NF > 1 { Games++; HomeGoals += $3; AwayGoals += $4 }

END {
HomeAve = (Games ? HomeGoals / Games : 0)
AwayAve = (Games ? AwayGoals / Games : 0)
printf "Total goals = %d\n", (HomeGoals + AwayGoals)
printf "Total home goals = %d (%.2f)\n", HomeGoals, HomeAve
printf "Total away goals = %d (%.2f)\n", AwayGoals, AwayAve
}
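
A quick check of the claim that the field assignments do not change the sums, using the sample rows posted earlier in the thread. This is only a sketch: the shell variable names (data, plain, forced) and the short awk variables h and a are made up here.

```shell
data='# 14 day
Inter,Lazio,0,2
Juventus,Genoa,1,1
Reggiana,Padova,3,0
Roma,Milan,0,0
Sampdoria,Cagliari,5,0
Napoli,Brescia,1,1
Torino,Milan,0,0
Brescia,Reggiana,1,0
Cagliari,Inter,1,1'

# Sum the goal columns directly; += already treats the fields as numbers.
plain=$(printf '%s\n' "$data" |
    awk -F, 'NF > 1 { h += $3; a += $4 } END { print h, a }')

# Sum them after forcing both fields to numbers first.
forced=$(printf '%s\n' "$data" |
    awk -F, 'NF > 1 { $3 = $3 + 0; $4 = $4 + 0; h += $3; a += $4 } END { print h, a }')

echo "plain:  $plain"    # plain:  12 5
echo "forced: $forced"   # forced: 12 5
```

The "# 14 day" line has no comma, so NF is 1 and the NF > 1 pattern skips it.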

Regards,

Ed.
Vassilis

2006-09-04, 7:56 am


Ed Morton wrote:
> Yes, but it's not obvious why you're doing that. HomeGoals and AwayGoals
> will have the same value whether you do $3 = $3 + 0 and $4 = $4 + 0 or
> not. If I were writing the above, I'd do it as:
>
> BEGIN { FS = "," }
>
> NF > 1 { Games++; HomeGoals += $3; AwayGoals += $4 }
>
> END {
> HomeAve = (Games ? HomeGoals / Games : 0)
> AwayAve = (Games ? AwayGoals / Games : 0)
> printf "Total goals = %d\n", (HomeGoals + AwayGoals)
> printf "Total home goals = %d (%.2f)\n", HomeGoals, HomeAve
> printf "Total away goals = %d (%.2f)\n", AwayGoals, AwayAve
> }
>
> Regards,
>
> Ed.


Thanks for your suggestions Ed.
That cleared up both my program and my understanding of how awk works.

Jürgen Kahrs

2006-09-04, 6:56 pm

Vassilis wrote:

> # 14 day
> Inter,Lazio,0,2
> Juventus,Genoa,1,1
> Reggiana,Padova,3,0
> Roma,Milan,0,0
> Sampdoria,Cagliari,5,0
> Napoli,Brescia,1,1
> Torino,Milan,0,0
> Brescia,Reggiana,1,0
> Cagliari,Inter,1,1


Aaaaahhh, Italian football matches.
Are you doing statistics only or are you trying
to predict the results of the matches ? I have
written an AWK script for predicting German
football matches. My script is doing quite well
at prediction. Only a few humans can beat the
predictions of my script, as you can see here:

http://www.kicktipp.de/orthogon/gesamtuebersicht

> $4 = $4 + 0 # problem lies here


If your script fails here in some cases, this may
be because of FS=",". If there are any invisible
characters between numbers or at the end of the line
(for example tabs, blanks, or the carriage return
that comes with CR/LF line endings), then you might be
surprised.
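
A sketch of the CR/LF case, assuming a Unix-like shell where printf can emit a carriage return. The record and the variable names are illustrative. With FS="," the carriage return stays attached to the last field. Adding 0 still gives the right number, but the field text is one character longer than it looks until the CR is stripped:

```shell
crlf_demo=$(printf 'Roma,Milan,0,2\r\n' | awk -F, '{
    before = length($4)      # counts the trailing carriage return
    sub(/\r$/, "")           # strip the CR from $0; awk re-splits the fields
    after = length($4)
    print before, after, $4 + 0
}')
echo "$crlf_demo"
# prints: 2 1 2
```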
Vassilis

2006-09-04, 9:56 pm

Jürgen Kahrs wrote:

<OT>
>
> Aaaaahhh, Italian football matches.
> Are you doing statistics only or are you trying
> to predict the results of the matches ? I have
> written an AWK script for predicting German
> football matches. My script is doing quite well
> at prediction. Only a few humans can beat the
> predictions of my script, as you can see here:
>
> http://www.kicktipp.de/orthogon/gesamtuebersicht


Not only Italian: Bundesliga, Championnat de France, Primera División,
English Premiership, all the major European football championships. Maybe
I'll catch up with the Champions League and the UEFA Cup later.

Yes. Hypothesis first, statistics then, prediction later.
I've run my hypotheses against the data. I know now that I should
strengthen or slightly change my hypotheses. As Bohr said: ``Prediction is
difficult, especially of the future'' ;)
Anyway, I'm still at the beginnings, but I'd love to get my hands on
your script.
</OT>

>
> If your script fails here in some cases, this may
> be because of FS=",". If there are any invisible
> characters between numbers or at the end of the line
> (for example tabs, blanks, or the carriage return
> that comes with CR/LF line endings), then you might be
> surprised.


My script didn't fail; I had what the French might call a
méconnaissance, a misconception about the way AWK assigns fields. Ed's
postings made those things clear.

Jürgen Kahrs

2006-09-05, 6:56 pm

Vassilis wrote:

> Yes. Hypothesis first, statistics then, prediction later.
> I've run my hypotheses against the data. I know now that I should
> strengthen or slightly change my hypotheses. As Bohr said: ``Prediction is
> difficult, especially of the future'' ;)


Yes, especially hard for the next weekend.
Are your hypotheses based on linear regression
or is it more like decision-tree algorithms ?
I tried decision-trees (C4.5) and was a bit
disappointed because results were only mediocre.

> Anyway, I'm still at the beginnings, but I'd love to get my hands on
> your script.


Well, I hesitate to post the script. My colleagues
(who take part in betting at kicktipp) may read this
posting and they could take advantage of applying my
script.

Anyway, you know how to assemble the data for
statistical evaluation and I think your background
knowledge should be good enough to re-engineer the
script with the following description.

The main idea is to keep it simple. If Bayern München
plays at home, the average number of goals they score
at home seems to be more or less constant. So, for each
team, calculate the average number of goals scored by
this team at home, and do the same for the guest team. It
may sound primitive, but guessing that each team always
scores the same number of goals when they are at home
or a guest team is a pretty good first guess.
The second step only adds a bonus to each team's
"average number of goals". Divide the number of _points_
scored by the home team (when playing at home) by the
number of points scored by the guest team (when playing
as guest). Add these quotients to the initial guess and
that's roughly how my script works.

One has to take care of division-by-zero and some
minor offset also makes sense. But these ideas should
get you started and I am sure you will soon find some
improvements to the algorithm. Good luck.
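
A rough sketch of the first step only (per-team averages), run against the sample rows from earlier in the thread. It is not the script described above. The array names, the -v variables home and guest, and the chosen fixture are all made up for illustration. The points bonus is left out because the description leaves its details open.

```shell
data='Inter,Lazio,0,2
Juventus,Genoa,1,1
Reggiana,Padova,3,0
Roma,Milan,0,0
Sampdoria,Cagliari,5,0
Napoli,Brescia,1,1
Torino,Milan,0,0
Brescia,Reggiana,1,0
Cagliari,Inter,1,1'

forecast=$(printf '%s\n' "$data" |
    LC_ALL=C awk -F, -v home=Reggiana -v guest=Inter '
NF == 4 {
    hgoals[$1] += $3; hgames[$1]++    # goals scored by the home side at home
    agoals[$2] += $4; agames[$2]++    # goals scored by the guest side away
}
END {
    # Guard against teams with no recorded games (division by zero).
    h = (hgames[home]  ? hgoals[home]  / hgames[home]  : 0)
    g = (agames[guest] ? agoals[guest] / agames[guest] : 0)
    printf "%s %.2f : %s %.2f\n", home, h, guest, g
}')
echo "$forecast"
# prints: Reggiana 3.00 : Inter 1.00
```

With one game per team the averages are of course meaningless. A real run would need a season or more of results.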
Vassilis

2006-09-05, 6:56 pm

Jürgen Kahrs wrote:
> Vassilis wrote:
>
>
> Yes, especially hard for the next weekend.
> Are your hypotheses based on linear regression
> or is it more like decision-tree algorithms ?
> I tried decision-trees (C4.5) and was a bit
> disappointed because results were only mediocre.


Well, I would say that my approach is structural(istic) (as in
Structural Anthropology by Claude Lévi-Strauss). I'm not interested in
specific teams or specific games, but rather in the relative
position (ranking) of two teams, or the position of teams in a group
(for instance, my broad guess is that teams at the top almost always
beat teams at the bottom of the ranking). After establishing these
groups, I try to refine (filter) the results to a more restrictive
case. I would never have guessed that filtering is as good an idea in
betting as it is in the shell. In other words, I search for global patterns
that hold for all games, all seasons, all championships.
OTOH, I'd like to use some linear regression, but it seems that I can't
get some things straight. I'm not alone, but my friends (being
non-programmers) are reluctant to learn AWK (yet) or, for that matter,
to study any statistics.

> Well, I hesitate to post the script. My colleagues
> (who take part in betting at kicktipp) may read this
> posting and they could take advantage of applying my
> script.


I understand; I was only half serious when I asked.

> Anyway, you know how to assemble the data for
> statistical evaluation and I think your background
> knowledge should be good enough to re-engineer the
> script with the following description.
>
> The main idea is to keep it simple. If Bayern München
> plays at home, the average number of goals they score
> at home seems to be more or less constant. So, for each
> team, calculate the average number of goals scored by
> this team at home, and do the same for the guest team. It
> may sound primitive, but guessing that each team always
> scores the same number of goals when they are at home
> or a guest team is a pretty good first guess.
> The second step only adds a bonus to each team's
> "average number of goals". Divide the number of _points_
> scored by the home team (when playing at home) by the
> number of points scored by the guest team (when playing
> as guest). Add these quotients to the initial guess and
> that's roughly how my script works.
>
> One has to take care of division-by-zero and some
> minor offset also makes sense. But these ideas should
> get you started and I am sure you will soon find some
> improvements to the algorithm. Good luck.


Nice. And as far as I can see, it pays off too.
Good luck to you too. And thanks.
