Code Comments
I have a while inside a while inside a while that is very slow for
large reads. Here is the code (it is really long):
{ while read myline; do
if [[ $myline = "<tlm-meas>"* ]];then
read dayline
firstpass="${dayline##<day-time>}"
daytime="${firstpass%%</day-time>}"
linevals=$daytime$comma
read vehicletime
read meascolumn
read measvalue
read limitsflag
i=0
while [[ $i -le $meascount ]];do
firstpass="${meascolumn##<meas-column>}"
meascol="${firstpass%%</meas-column>}"
if [[ $meascol = $i ]];then
firstpass="${measvalue##<meas-value>}"
measval="${firstpass%%</meas-value>}"
linevals=$linevals$measval$comma
read meascolumn
if [[ $meascolumn = "</tlm-meas>"* ]];then
while [[ $i -lt $(($meascount-1)) ]];do
linevals=$linevals$comma
i=$(($i+1))
done
break
fi
read measvalue
read limitsflag
else
linevals=$linevals$comma
fi
i=$(($i+1))
done
fi
if [[ $myline = "</pds-datasort>"* ]];then
break
fi
if [[ $myline = "</tlm-meas>"* || $meascolumn = "</tlm-meas>"* ]];then
while [[ $i -le $(($meascount-1)) ]];do
linevals=$linevals$comma
i=$(($i+1))
done
linevals1="${linevals%,,}"
print $linevals1 >> $3
continue
fi
done } < $dspbfile
So, what this does is take data from one file that looks like this
(and this is just a partial file):
<tlm-meas>
<day-time>2008/035:23:08:09.803</day-time>
<vehicle-time> 83289.803</vehicle-time>
<meas-column>8</meas-column>
<meas-value>-25.0335</meas-value>
<limits-flag><<tlm-meas>
<day-time>2008/035:23:08:25.333</day-time>
<vehicle-time> 83305.333</vehicle-time>
<meas-column>9</meas-column>
<meas-value>0</meas-value>
<limits-flag></limits-flag>
<meas-column>11</meas-column>
<meas-value>3.22123e+09</meas-value>
<limits-flag></limits-flag>
</tlm-meas>
/limits-flag>
</tlm-meas>
</pds-datasort>
And prints it into a file that looks like this:
2008/035:23:08:09.803,,,,,,,,,-25.0335,,,
2008/035:23:08:25.333,,,,,,,,,,0,,3.22123e+09
Where the meas-column field is where the value gets put and if there
is no value for the column (they are in order), then it will just get
a comma. And there needs to be commas for each mnemonic (which I do
know how many there are) even if it has no value.
When I have only 60 samples in the first file, it runs very quickly.
When I have 274,100 samples in the first file, it takes 2-3 hours to
run.
Is there a quicker way to do this? If not, that is ok. I just can't
seem to find one. Thanks for any help.
Allyson
eskgwin@gmail.com wrote:
> [...]
Have a look at xgawk (XML extended GNU awk) to process such data.
Janis
On 4/1/2008 12:41 PM, eskgwin@gmail.com wrote:
> [...]
Shell loops are usually the wrong approach. I don't think your sample input is
quite right, as it has things in it like "<<tlm-meas>" and "/limits-flag>". It
appears that you're trying to get all the data between "<tlm-meas>" and
"</tlm-meas>" into a single line. If so, take a look at this using GNU awk on a
modified version of your input file:
$ cat file
<tlm-meas>
<day-time>2008/035:23:08:09.803</day-time>
<vehicle-time> 83289.803</vehicle-time>
<meas-column>8</meas-column>
<meas-value>-25.0335</meas-value>
</tlm-meas>
<tlm-meas>
<day-time>2008/035:23:08:25.333</day-time>
<vehicle-time> 83305.333</vehicle-time>
<meas-column>9</meas-column>
<meas-value>0</meas-value>
<limits-flag></limits-flag>
</tlm-meas>
<tlm-meas>
<meas-column>11</meas-column>
<meas-value>3.22123e+09</meas-value>
<limits-flag></limits-flag>
</tlm-meas>
$ gawk -v RS="</tlm-meas>[[:space:]]*" -F'\n' '{
        for (i=2;i<NF;i++) {
                split($i,arr,"[<> ]+")
                printf "%s=\"%s\"\n",arr[2],arr[3]
        }
        print "----"
}' file
day-time="2008/035:23:08:09.803"
vehicle-time="83289.803"
meas-column="8"
meas-value="-25.0335"
----
day-time="2008/035:23:08:25.333"
vehicle-time="83305.333"
meas-column="9"
meas-value="0"
limits-flag="/limits-flag"
----
meas-column="11"
meas-value="3.22123e+09"
limits-flag="/limits-flag"
----
and if it seems to be roughly pulling out and grouping the right information, we
could tidy it up and figure out how to deal with the missing fields for each
record.
Ed.
On Apr 1, 11:05 am, Ed Morton <mor...@lsupcaemnt.com> wrote:
> [...]
The only data I need actually is the day-time and meas-value. I need
the meas-column to figure out where to put each value in the line. The
part that seems hard is to figure out how to deal with the missing
fields and getting the values in the right places. Thanks.
Allyson
On 4/1/2008 1:18 PM, eskgwin@gmail.com wrote:
> On Apr 1, 11:05 am, Ed Morton <mor...@lsupcaemnt.com> wrote:
>
>
>
> The only data I need actually is the day-time and meas-value. I need
> the meas-column to figure out where to put each value in the line. The
> part that seems hard is to figure out how to deal with the missing
> fields and getting the values in the right places. Thanks.
>
OK, so given the input file I show above, we can do this:
gawk -v OFS="," -v RS="</tlm-meas>[[:space:]]*" -F'\n' '{
        dayTime=measColumn=measValue=""
        for (i=2;i<NF;i++) {
                split($i,arr,"[<> ]+")
                if (arr[2] == "day-time") {
                        dayTime=arr[3]
                }
                if (arr[2] == "meas-column") {
                        measColumn=arr[3]
                }
                if (arr[2] == "meas-value") {
                        measValue=arr[3]
                }
        }
        print dayTime,measColumn,measValue
}' file
2008/035:23:08:09.803,8,-25.0335
2008/035:23:08:25.333,9,0
,11,3.22123e+09
What needs to be done now?
Ed.
On Apr 1, 11:27 am, Ed Morton <mor...@lsupcaemnt.com> wrote:
> [...]
It needs to look like this:
2008/035:23:08:09.803,,,,,,,,,-25.0335,,,
2008/035:23:08:25.333,,,,,,,,,,0,,3.22123e+09
with the 8 of the meas-column being the place to put the meas-value of
-25.0335. In the second line, the 9 is the column where the 0
meas-value goes and the 11 is the column where the 3.22123e+09 goes.
Also, when I try to use gawk on my unix box:
Machine hardware: sun4u
OS version: 5.8
Processor type: sparc
Hardware: SUNW,Sun-Blade-100
I get this:
a.ksh[3]: gawk: not found
I can't even do a man on it:
No manual entry for gawk.
Is there something equivalent that I can use? Thanks.
Allyson
On 4/1/2008 1:36 PM, eskgwin@gmail.com wrote:
> On Apr 1, 11:27 am, Ed Morton <mor...@lsupcaemnt.com> wrote:
>
> It needs to look like this:
>
> 2008/035:23:08:09.803,,,,,,,,,-25.0335,,,
> 2008/035:23:08:25.333,,,,,,,,,,0,,3.22123e+09
>
> with the 8 of the meas-column being the place to put the meas-value of
> -25.0335. In the second line, the 9 is the column where the 0
> meas-value goes and the 11 is the column where the 3.22123e+09 goes.
That's not a problem but before I do any more: is my guess at your input file
format correct or should it instead be this (deleted the 2 lines immediately
before <meas-column>11</meas-column>):
<tlm-meas>
<day-time>2008/035:23:08:09.803</day-time>
<vehicle-time> 83289.803</vehicle-time>
<meas-column>8</meas-column>
<meas-value>-25.0335</meas-value>
</tlm-meas>
<tlm-meas>
<day-time>2008/035:23:08:25.333</day-time>
<vehicle-time> 83305.333</vehicle-time>
<meas-column>9</meas-column>
<meas-value>0</meas-value>
<limits-flag></limits-flag>
<meas-column>11</meas-column>
<meas-value>3.22123e+09</meas-value>
<limits-flag></limits-flag>
</tlm-meas>
or should it really be something else? There's different solutions depending on
the correct input format.
> Also, when I try to use gawk on my unix box:
>
> Machine hardware: sun4u
> OS version: 5.8
> Processor type: sparc
> Hardware: SUNW,Sun-Blade-100
>
> I get this:
> a.ksh[3]: gawk: not found
Then gawk isn't in your PATH or it may not already be installed on your machine.
> I can't even do a man on it:
> No manual entry for gawk.
>
> Is there something equivalent that I can use? Thanks.
You can use any awk that allows you to use a regular expression as its
record separator (RS) but the only awk I personally know of that supports that
is gawk. We could come up with workarounds but gawk has many, many features that
make it a good choice of awk to use so if I were you I'd download and install it
from http://www.gnu.org/software/gawk/ if you don't already have it.
Ed.
On 2008-04-01, Janis Papanagnou <Janis_Papanagnou@hotmail.com> wrote:
> eskgwin@gmail.com wrote:
<immense piece of home-grown XML code snipped>
>
> Have a look at xgawk (XML extended GNU awk) to process such data.
>
Perl has a number of XML handling packages that have a lot of use and
polishing behind them; I'd recommend going that route.
--
Christopher Mattern
NOTICE
Thank you for noticing this new notice
Your noticing it has been noted
And will be reported to the authorities
Ed Morton wrote:
> On 4/1/2008 1:36 PM, eskgwin@gmail.com wrote:
> [...]
> Then gawk isn't in your PATH or it may not already be installed on your machine.
>
> You can use any awk that allows you to use a regular expression as its
> record separator (RS) but the only awk I personally know of that supports that
> is gawk. We could come up with workarounds but gawk has many, many features that
> make it a good choice of awk to use so if I were you I'd download and install it
> from http://www.gnu.org/software/gawk/ if you don't already have it.
>
> Ed.
IMHO nawk and /usr/xpg4/bin/awk would work in this case.
But it's a good idea to download GNU awk, I would choose
http://www.sunfreeware.com/
which will guide you to the needed GNU libraries.
--
Michael Tosch @ hp : com
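For what it's worth, here is a rough sketch of one workaround that avoids the regular-expression RS altogether by reading the file line by line, so it should not need gawk. It is only a sketch under some assumptions that are not confirmed in the thread: the input looks like Ed's second guess (each tag on its own line, several meas-column/meas-value pairs allowed per <tlm-meas>), columns are numbered from 0 (inferred from the sample output, where column 8 has nine commas in front of it), and the column count is passed in by the caller. The names tlm_to_csv, inner and ncols are made up for illustration.

```shell
#!/bin/sh
# Sketch: turn each <tlm-meas> record into one CSV row.
# Usage: tlm_to_csv NCOLS INPUTFILE
# Assumptions: columns are numbered 0..NCOLS-1 and every tag is on its own line.
tlm_to_csv() {
    awk -v ncols="$1" '
    # Return the text between the opening and closing tag on one line.
    function inner(s) {
        sub(/^[^>]*>[ \t]*/, "", s)   # drop the opening tag and leading blanks
        sub(/[ \t]*<.*$/, "", s)      # drop the closing tag and anything after it
        return s
    }
    # Start of a record: forget the values of the previous record.
    /<tlm-meas>/    { split("", val); daytime = ""; inrec = 1; next }
    /<day-time>/    { daytime = inner($0); next }
    # Remember which column the next meas-value belongs to.
    /<meas-column>/ { col = inner($0) + 0; next }
    /<meas-value>/  { val[col] = inner($0); next }
    # End of a record: print the time, then one field per column.
    /<\/tlm-meas>/ && inrec {
        line = daytime
        for (c = 0; c < ncols; c++)
            line = line "," ((c in val) ? val[c] : "")
        print line
        inrec = 0
    }
    ' "$2"
}
```

With 12 columns and the corrected sample file, `tlm_to_csv 12 file` prints the two lines Allyson asked for:

2008/035:23:08:09.803,,,,,,,,,-25.0335,,,
2008/035:23:08:25.333,,,,,,,,,,0,,3.22123e+09

Since the script uses -v and a user-defined function, it would have to be run with nawk or /usr/xpg4/bin/awk on a Solaris box where plain awk is the old version (swap the awk command name accordingly). The file is read once by a single process, so the cost grows with the number of lines rather than with lines times columns spent in shell loops.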