Author sed issue
Leslie Rhorer

2006-11-24, 7:00 pm

Hello all,

I am far from being a Unix guru, so please forgive my ignorance in
advance. I am attempting to use sed to chop up a long line into several
smaller lines based on a reliably repeating pattern, but it is not working
because of an interfering issue. The file is being created by expect. The
expect script telnets to a device whose output is VT-100 (or something
similar), so it spits out lots of escape sequences. When I run sed against
the log file, it truncates the file after the first line, and I have been
unable to get it to read past the last character of the first line. In
fact, there are only two long lines, and the information I need is all in
the second line. I believe the problem is the first line is truncated with
0D, or to be more exact, the string with which sed seems to be having the
problem is

...36 0D 00 1B 02 30...

Sed's output is truncated after the 36 and a newline inserted. This
seems to be the only instance of the problematic string, but it is in every
one of the expect log output files. How can I remove the offending
characters so I can have sed go about the business of parsing the rest of
the file for the strings in which I am interested?



Janis Papanagnou

2006-11-24, 7:00 pm



To remove characters from a stream use (see man page for details)...

tr -d '\NNN'

where NNN is the octal representation of the character. You may specify more
than one character in the argument to option -d. You can also try to specify
a character class '[:cntrl:]' if in your case you suspect control characters
to be responsible for your problem.
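
A minimal sketch of that suggestion, assuming the bytes to drop are the 0D and 00 from the original post (octal 015 and 000). The function names and the file name expect.log are made up for illustration only.

```shell
# Hypothetical helper: delete carriage returns (octal 015) and NUL bytes
# (octal 000) from standard input, leaving everything else untouched.
strip_cr_nul() {
    tr -d '\015\000'
}

# Hypothetical helper: delete every control character. Note that
# [:cntrl:] also covers tab and newline, so line breaks are lost too.
strip_all_cntrl() {
    tr -d '[:cntrl:]'
}

# Usage (expect.log stands in for the real log file name):
#   strip_cr_nul < expect.log > cleaned.log
```

The first form is the more conservative choice when the line structure of the file still matters to later steps.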

Janis



Jon LaBadie

2006-11-25, 7:00 pm



The null byte following the 0D could also be causing problems.
To get rid of control chars leaving only tabs and newlines,
you could try:

tr -d '\0[^A-^H][^K-^_]' < datafile > newfile

Those sequences are ^A = control-A, not ^ and A,
and the last one, ^_, is control-_
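
The same ranges can also be written with octal escapes, which avoids typing literal control characters. This is only a sketch: the function name is hypothetical, and the brackets are left out on the assumption of a POSIX-style tr (such as GNU tr), where brackets around a range are treated as ordinary characters and would be deleted as well.

```shell
# Hypothetical helper: delete NUL through backspace (octal 000-010) and
# vertical tab through unit separator (octal 013-037). Tab (011) and
# newline (012) are kept; DEL (177) is not touched.
strip_ctrl_keep_tab_nl() {
    tr -d '\000-\010\013-\037'
}

# Usage, mirroring the command above:
#   strip_ctrl_keep_tab_nl < datafile > newfile
```

Applied to the byte string from the original post (36 0D 00 1B 02 30), this would leave only the 36 and the 30, that is, the characters 6 and 0.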

Leslie Rhorer

2006-11-25, 10:02 pm

"Jon LaBadie" <jxlabadie@axcxmx.org> wrote in message
news:OLidncT5SfPOLvXYnZ2dnUVZ_tadnZ2d@comcast.com...
> Leslie Rhorer wrote:
>
> The null byte following the 0D could also be causing problems.
> To get rid of control chars leaving only tabs and newlines,
> you could try:
>
> tr -d '\0[^A-^H][^K-^_]' < datafile > newfile
>
> Those sequences are ^A = control-A, not ^ and A,
> and the last one, ^_, is control-_


Thanks, everyone. I was able to get tr to do the job. I never could
get the file into a state sed liked, but by using multiple pipes of tr -A
and tr -d, I was able to come up with a file which grep could filter. The
final output is a brief and clean .csv file which can easily be imported
into a spreadsheet.
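
The exact commands were not posted. A pipeline of that general shape might look like the sketch below, which assumes the records are separated by a single known character and that the wanted records match a simple pattern. The function name, the separator, and the pattern are all hypothetical.

```shell
# Hypothetical pipeline: strip control characters (keeping tab and
# newline), break the long line into one record per line, then keep
# only the records that match a pattern.
#   $1 = single separator character (assumption)
#   $2 = grep pattern for the wanted records (assumption)
clean_and_filter() {
    tr -d '\000-\010\013-\037' | tr "$1" '\n' | grep -- "$2"
}

# Usage (names are placeholders):
#   clean_and_filter ';' '^a=' < expect.log > report.csv
```

Since tr works on single characters only, this approach fits when the repeating pattern can be reduced to one separator character.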

