Home > Archive > AWK > November 2007 > optimization - search free value through a range
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
optimization - search free value through a range
|
|
| mobidyc 2007-10-03, 6:58 pm |
| Hello,
i am searching to optimize a code.
this code search the first free value in a list through a range of value
this is what i do for the moment:
MIN_VALUE=800
MAX_VALUE=810
#> cat all_values
100
200
800
801
803
900
#> nawk -v MIN_VALUE=$MIN_VALUE -v MAX_VALUE=$MAX_VALUE '{
if(($1 >= MIN_VALUE) && ($1 <= MAX_VALUE)) {
VAL[$1] = $1 ;
}
} END {
for (i=MIN_VALUE ; i<=MAX_VALUE ; i++) {
if (VAL[i] == "") {
print "result: " i ;
exit ;
}
}
}' all_values
result: 802
all advices appreciated.
--
Cordialement,
Mobidyc
| |
| John DuBois 2007-10-03, 6:58 pm |
| In article <fe0ffb$lno$1@talisker.lacave.net>, mobidyc <mobidyc@msn.com> wrote:
>Hello,
>i am searching to optimize a code.
>this code search the first free value in a list through a range of value
>
>this is what i do for the moment:
>MIN_VALUE=800
>MAX_VALUE=810
>
>#> cat all_values
>100
>200
>800
>801
>803
>900
>
>#> nawk -v MIN_VALUE=$MIN_VALUE -v MAX_VALUE=$MAX_VALUE '{
> if(($1 >= MIN_VALUE) && ($1 <= MAX_VALUE)) {
> VAL[$1] = $1 ;
> }
> } END {
> for (i=MIN_VALUE ; i<=MAX_VALUE ; i++) {
> if (VAL[i] == "") {
> print "result: " i ;
> exit ;
> }
> }
> }' all_values
The way you're doing this is fine for the sort of data set & task you show.
You could optimize it a tiny bit replacing
VAL[$1] = $1
with just
VAL[$1]
and
if (VAL[i] == "") {
with
if (i in VAL) {
Beyond that, the sort of optimization (if any) that is appropriate will depend
on how the job scales. Are you liable to have a great many values, with many
contiguous ranges? Are you liable to want to find many "free values" in the
same dataset?
Some functions for dealing with integer ranges are available at
ftp://ftp.armory.com/pub/lib/awk/IntsToRanges
with an example of their use at
ftp://ftp.armory.com/pub/admin/unusedids
John
--
John DuBois spcecdt@armory.com KC6QKZ/AE http://www.armory.com/~spcecdt/
| |
| John DuBois 2007-10-03, 6:58 pm |
| In article <13g7nbtg66srg80@corp.supernews.com>,
John DuBois <spcecdt@armory.com> wrote:
>In article <fe0ffb$lno$1@talisker.lacave.net>, mobidyc <mobidyc@msn.com> wrote:
....[color=darkred]
>
>The way you're doing this is fine for the sort of data set & task you show.
>You could optimize it a tiny bit replacing
> VAL[$1] = $1
>with just
> VAL[$1]
>and
> if (VAL[i] == "") {
>with
> if (i in VAL) {
Sorry, that should of course be:
if (!(i in VAL)) {
John
--
John DuBois spcecdt@armory.com KC6QKZ/AE http://www.armory.com/~spcecdt/
| |
| loki harfagr 2007-10-03, 6:58 pm |
| On Wed, 03 Oct 2007 18:19:28 +0200, mobidyc wrote:
> Hello,
> i am searching to optimize a code.
> this code search the first free value in a list through a range of value
>
> this is what i do for the moment:
> MIN_VALUE=800
> MAX_VALUE=810
>
> #> cat all_values
> 100
> 200
> 800
> 801
> 803
> 900
>
> #> nawk -v MIN_VALUE=$MIN_VALUE -v MAX_VALUE=$MAX_VALUE '{
> if(($1 >= MIN_VALUE) && ($1 <= MAX_VALUE)) {
> VAL[$1] = $1 ;
> }
> } END {
> for (i=MIN_VALUE ; i<=MAX_VALUE ; i++) {
> if (VAL[i] == "") {
> print "result: " i ;
> exit ;
> }
> }
> }' all_values
> result: 802
>
> all advices appreciated.
The actual code you're using is probably quite correct for performance,
here are just a few "adjustments" I'd suppose to be a bit easier and
faster, but being at home and no direct access to a "real" machine
that'd run nawk and not really a lot of time to go VPN et al
I'll just suppose it may be help even if not in gawk :-)
There's the code:
---------
$ awk -v MIN_VALUE=$MIN_VALUE -v MAX_VALUE=$MAX_VALUE '
BEGIN{
for(i=MIN_VALUE;i<MAX_VALUE;i++){a[i]}
}
MIN_VALUE>$1{next}
MAX_VALUE<$1{next}
{delete a[$1]}
END{
for(f in a){
print f;
exit}
}
' all_values
802
---------
It prefills an array with numbers in the selected range then
reads the file and "kills" any existent value, in the end
prints out the first that the hash algo'll give and exits.
besides being a very little bit different algorithm it
tries to think that direct escape tests are quicker than
a test in "main loop", this and the idea sequential atomic
tests may be quicker than combined tests is probable mostly
mistic as very dependent on the OS/language/libs/cc/variant/etc.
used in real life ;o)
| |
| Janis Papanagnou 2007-10-03, 6:58 pm |
| mobidyc wrote:
> Hello,
> i am searching to optimize a code.
> this code search the first free value in a list through a range of value
>
> this is what i do for the moment:
> MIN_VALUE=800
> MAX_VALUE=810
>
> #> cat all_values
> 100
> 200
> 800
> 801
> 803
> 900
>
> #> nawk -v MIN_VALUE=$MIN_VALUE -v MAX_VALUE=$MAX_VALUE '{
> if(($1 >= MIN_VALUE) && ($1 <= MAX_VALUE)) {
> VAL[$1] = $1 ;
> }
The above condition is in awk typically written as condition{action}
instead of {if(condition){action}} i.e. without the outer brackets
and without if statement
$1 >= MIN_VALUE && $1 <= MAX_VALUE { v[$1] }
> } END {
> for (i=MIN_VALUE ; i<=MAX_VALUE ; i++) {
> if (VAL[i] == "") {
The above two conditions, i<=MAX_VALUE and VAL[i]=="", can be reduced to
a single condition, i in VAL.
Depending on your requirements, either print the actual index (which would
be either the lowest position or MAX_VALUE+1 if all positions within the
range are already occupied) or perform a single final check if(i<=MAXVALUE)
after the loop if you don't want an empty position outside the range.
> print "result: " i ;
> exit ;
> }
> }
> }' all_values
> result: 802
>
> all advices appreciated.
So the resulting program would just be...
nawk -v MIN_VALUE=$MIN_VALUE -v MAX_VALUE=$MAX_VALUE '
$1 >= MIN_VALUE && $1 <= MAX_VALUE { VAL[$1] }
END {
for (i = MIN_VALUE; i in VAL; i++)
; # nothing to do here, just iterate
if (i <= MAX_VALUE) # or simply print i without if(),
print i # depending on requirements if
# all positions already occupied
}
'
> --
> Cordialement,
> Mobidyc
>
>
| |
| mobidyc 2007-10-04, 3:57 am |
| Thanks for all your advices,
i'll integrate them ;)
Regards,
Mobidyc
| |
| Karthik Gurusamy 2007-11-22, 3:58 am |
| On Oct 3, 8:19 am, "mobidyc" <mobi...@msn.com> wrote:
> Hello,
> i am searching to optimize a code.
> this code search the first free value in a list through a range of value
>
> this is what i do for the moment:
> MIN_VALUE=800
> MAX_VALUE=810
>
> #> cat all_values
> 100
> 200
> 800
> 801
> 803
> 900
The solutions given by others using array may have solved your
performance needs.
If they don't, you can consider some of the following points.
And these ideas really matter only if your data-set is huge (several
million or so entries). I'm also assuming, at the risk of making this
post OT, you can implement the algorithm in a compiled language like C
(again only reason is assuming this code is a hot-spot in your
application)
1. If your data file is sorted as the above example shows, you can
load them in a single contiguous array in memory. Exploit the sorted
nature, by using a binary search to
a) Find the smallest n in data-set such that n >= MIN_VALUE (n=800
in the example)
b) Find the largest m in data-set such that m <= MAX_VALUE
(m=803 )
Both steps a and b take O(log N) time (N = total items in data-set)
2. Now the task is to find a "hole" in the interval [n, m]. A trivial
method is to run thru' from n to m; this takes O(k) where k is the
interval size (m-n) (k=803-800=3 in the example set). But since the
job is to just find first free and we are using only integers, a
modified binary search in the interval can find one in O(log k). (Just
see the middle element..and if it is equal to the n+k/2, then there is
no free slot in the whole of left half of the interval).
Thus you can solve the problem in O(log N) + O(log k). This ignores
for initial load time of the data into memory (that is of course
O(N)). The awk array based solution may use an hash-based
implementation and may suffer from collision problems on very large
data sets.
Karthik
>
> #> nawk -v MIN_VALUE=$MIN_VALUE -v MAX_VALUE=$MAX_VALUE '{
> if(($1 >= MIN_VALUE) && ($1 <= MAX_VALUE)) {
> VAL[$1] = $1 ;
> }
> } END {
> for (i=MIN_VALUE ; i<=MAX_VALUE ; i++) {
> if (VAL[i] == "") {
> print "result: " i ;
> exit ;
> }
> }
> }' all_values
> result: 802
>
> all advices appreciated.
> --
> Cordialement,
> Mobidyc
|
|
|
|
|