extra help with awk
Dave

2005-08-11, 4:59 pm

I posted a message in another newsgroup, but didn't realise awk had its own.

Basically I asked how I could search for a /, split the line there, and
prefix each part with "- ", like this:

text here / text there

would become:

- Text here
- Text there.

The script i was shown is the following.

awk -F" */ *" '
{
    if (NF > 1)
        fmt = "- %s\n"
    else
        fmt = "%s\n"
    for (i = 1; i <= NF; ++i) printf fmt, $i
}
NF == 0
' input.txt > output.txt

My problem now is that some lines don't have the / in the middle; it's
at the end of the line instead, like so:

text here /
text there
text here also / text there also

So using the above script my output would look like this:

- Text here
-
Text there
- Text here also
- Text there also

Is there any way I can alter the above script to search for the /
at the end of the line as well as in the middle?

Any pointers would be appreciated, thanks


William James

2005-08-11, 4:59 pm


Dave wrote:

> Basically I asked how I could search for a /, split the line there, and
> prefix each part with "- ", like this:
>
> text here / text there
>
> would become:
>
> - Text here
> - Text there.
>
> The script i was shown is the following.
>
> awk -F" */ *" '
> {
> if (NF > 1)
> fmt = "- %s\n"
> else
> fmt = "%s\n"
> for ( i = 1; i <= NF; ++i) printf fmt, $i
> }
> NF == 0
> ' input.txt > output.txt
>
> My problem now is that some lines don't have the / in the middle; it's
> at the end of the line instead, like so:
>
> text here /
> text there
> text here also / text there also
>
> So using the above script my output would look like this:
>
> - Text here
> -
> Text there
> - Text here also
> - Text there also
>
> Is there any way I can alter the above script to search for the /
> at the end of the line as well as in the middle?


BEGIN { FS = " */ *" } # Set field-separator.
NF > 1 {
    print "- " $1
    if ( $2 )
        print "- " $2
    else
        prefix = "- "
    next
}
{ print prefix $0; prefix = "" }
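
As a quick check, the script above can be run on the three sample lines from the question. This is only a sketch: the input is fed through printf rather than a file, the variable name result is made up for the illustration, and the comments inside the awk program are added here.

```shell
# Sample lines from the question, piped straight into the script.
result=$(printf 'text here /\ntext there\ntext here also / text there also\n' |
awk '
BEGIN { FS = " */ *" }      # "/" with optional spaces on either side
NF > 1 {
    print "- " $1
    if ( $2 )
        print "- " $2       # "/" was in the middle of the line
    else
        prefix = "- "       # "/" ended the line: mark the next line
    next
}
{ print prefix $0; prefix = "" }
')
printf '%s\n' "$result"
```

On clean input like this, the empty second field after a trailing / sets prefix, so the following line receives the "- ". If anything invisible follows the /, the second field is no longer empty and the prefix is never set.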

Dave

2005-08-11, 4:59 pm


"William James" <w_a_x_man@yahoo.com> wrote in message
news:1123772862.233572.227180@z14g2000cwz.googlegroups.com...
>
> Dave wrote:
>
>
> BEGIN { FS = " */ *" } # Set field-separator.
> NF > 1 {
> print "- " $1
> if ( $2 )
> print "- " $2
> else
> prefix = "- "
> next
> }
> { print prefix $0; prefix = "" }
>


Thanks for your reply William, but when I tried this, the results were the
same as with the original script I was shown.
I still get lines with just a '- ' on them, with the text on the
line below.

I need to have:

This text here /
and this text here

And this text / and this too

to be altered like so:

- This text here
- And this text here

- And this text
- And this too

thanks in advance


William James

2005-08-11, 4:59 pm

Dave wrote:
> "William James" <w_a_x_man@yahoo.com> wrote in message
> news:1123772862.233572.227180@z14g2000cwz.googlegroups.com...
>
> Thanks for your reply William, but when I tried this, the results were the
> same as with the original script I was shown.
> I still get lines with just a '- ' on them, with the text on the
> line below.
>
> I need to have:
>
> This text here /
> and this text here
>
> And this text / and this too
>
> to be altered like so:
>
> - This text here
> - And this text here
>
> - And this text
> - And this too
>
> thanks in advance


The program works correctly on normal text files.
If your text file is tainted with invisible tabs:


BEGIN { FS = " */ *" } # Set field-separator.
# Remove all trailing whitespace.
{ sub( /[ \t]+$/, "" ) }
NF > 1 {
    print "- " $1
    if ( $2 )
        print "- " $2
    else
        prefix = "- "
    next
}
{ print prefix $0; prefix = "" }

Dave

2005-08-11, 9:59 pm


"William James" <w_a_x_man@yahoo.com> wrote in message
news:1123795980.385352.145070@g43g2000cwa.googlegroups.com...
> Dave wrote:
>
> The program works correctly on normal text files.
> If your text file is tainted with invisible tabs:
>
>
> BEGIN { FS = " */ *" } # Set field-separator.
> # Remove all trailing whitespace.
> { sub( /[ \t]+$/, "" ) }
> NF > 1 {
> print "- " $1
> if ( $2 )
> print "- " $2
> else
> prefix = "- "
> next
> }
> { print prefix $0; prefix = "" }
>


Thanks again William,

Firstly I must apologise.
Yes, your scripts work, both of them. However I am getting a strange problem
when I try to use them on files that are already there. If I create new files
and run the script, it works perfectly. When I run it on the existing files,
I get the aforementioned problem where it splits the second line up.

I have found sed one-liners to remove whitespace from the front /
end of lines, and also ones to convert newlines from DOS to UNIX and vice versa,
but nothing seems to work. Are there any other invisible characters that
could be stopping this script from working properly on these files?


William James

2005-08-11, 9:59 pm


Dave wrote:

> Yes, your scripts work, both of them. However I am getting a strange problem
> when I try to use them on files that are already there. If I create new files
> and run the script, it works perfectly. When I run it on the existing files,
> I get the aforementioned problem where it splits the second line up.
>
> I have found sed one-liners to remove whitespace from the front /
> end of lines, and also ones to convert newlines from DOS to UNIX and vice versa,
> but nothing seems to work. Are there any other invisible characters that
> could be stopping this script from working properly on these files?


If the files were created under DOS or windoze and you're using Unix,
there will be carriage returns (ASCII 13) at the ends of the lines;
if the files were created with a word processor, who knows what
they may contain. (I hate invisible characters in text files! I
always have my text editors set to insert spaces when I hit the TAB
key.)
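
One way to see whether a file contains such characters is to let awk report them. The following is a minimal sketch, not part of the original exchange: the sample text and the variable name report are invented for the illustration, and only carriage returns and tabs are checked.

```shell
# Hypothetical sample: line 1 ends in a DOS carriage return,
# line 2 ends in a tab, line 3 is clean.
report=$(printf 'text here /\r\ntext there\t\nclean line\n' |
awk '
/\r$/ { print NR ": carriage return at end of line" }
/\t/  { print NR ": tab character" }
')
printf '%s\n' "$report"
```

For a real file, the same two rules can be given a filename instead of piped input. Running od -c on the file is another option, since it prints every byte, including ones this sketch does not look for.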

The following version makes Awk eat up all control characters on
either side of the /.

# Set field-separator.
# Make it eat up characters whose ASCII codes are 1--32
# (octal 1--40).
BEGIN { FS = "[\1-\40]*/[\1-\40]*" }

NF > 1 {
    print "- " $1
    if ( $2 )
        print "- " $2
    else
        prefix = "- "
    next
}

{ print prefix $0; prefix = "" }

William James

2005-08-11, 9:59 pm


William James wrote:
> Dave wrote:
>
>
>
> If the files were created under DOS or windoze and you're using Unix,
> there will be carriage returns (ASCII 13) at the ends of the lines;
> if the files were created with a word processor, who knows what
> they may contain. (I hate invisible characters in text files! I
> always have my text editors set to insert spaces when I hit the TAB
> key.)
>
> The following version makes Awk eat up all control characters on
> either side of the /.
>
> # Set field-separator.
> # Make it eat up characters whose ASCII codes are 1--32
> # (octal 1--40).
> BEGIN { FS = "[\1-\40]*/[\1-\40]*" }
>
> NF > 1 {
> print "- " $1
> if ( $2 )
> print "- " $2
> else
> prefix = "- "
> next
> }
>
> { print prefix $0; prefix = "" }


May as well remove the carriage returns from all lines:

# Set field-separator.
# Make it eat up characters whose ASCII codes are 1--32
# (octal 1--40).
BEGIN { FS = "[\1-\40]*/[\1-\40]*" }

{ sub( /[ \t\r]+$/, "" ) }
NF > 1 {
    print "- " $1
    if ( $2 )
        print "- " $2
    else
        prefix = "- "
    next
}

{ print prefix $0; prefix = "" }
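
To illustrate, here is the script above applied to the earlier example with DOS line endings. Treat it as a sketch under assumptions: the input is generated with printf, the variable name result is made up, and LC_ALL=C is an addition made here so that the byte range in FS is interpreted predictably; it is not part of the posted script.

```shell
# Example lines from earlier in the thread, each ending in CR LF.
result=$(printf 'This text here /\r\nand this text here\r\n\r\nAnd this text / and this too\r\n' |
LC_ALL=C awk '
# "/" together with any characters 1--32 (octal 1--40) around it.
BEGIN { FS = "[\1-\40]*/[\1-\40]*" }

# Strip trailing spaces, tabs and carriage returns from every line.
{ sub( /[ \t\r]+$/, "" ) }
NF > 1 {
    print "- " $1
    if ( $2 )
        print "- " $2
    else
        prefix = "- "       # "/" ended the line: mark the next line
    next
}

{ print prefix $0; prefix = "" }
')
printf '%s\n' "$result"
```

Because sub() changes $0, awk splits the line into fields again, so a line ending in / followed only by a carriage return ends up with an empty second field. The blank line between the two examples passes through unchanged, and capitalisation is left as it was in the input.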

Dave

2005-08-12, 8:59 am


"William James" <w_a_x_man@yahoo.com> wrote in message
news:1123805374.018151.325270@g43g2000cwa.googlegroups.com...
>
> William James wrote:
>
> May as well remove the carriage returns from all lines:
>
> # Set field-separator.
> # Make it eat up characters whose ASCII codes are 1--32
> # (octal 1--40).
> BEGIN { FS = "[\1-\40]*/[\1-\40]*" }
>
> { sub( /[ \t\r]+$/, "" ) }
> NF > 1 {
> print "- " $1
> if ( $2 )
> print "- " $2
> else
> prefix = "- "
> next
> }
>
> { print prefix $0; prefix = "" }
>


William, you are a genius. I owe you a pint.. or two.
I cannot tell you how grateful I am, the amount of time this will save is
unbelievable.

Thank you.

