need to delete 1st occurrence of \n per line
I have files from US Customs that do not come in sequential order --
although each 'batch' does have a sequential number.
I need to 'reorder' the file so that it is in sequential order. I can
do this with the 'sort' command, but unfortunately my input file has \n
characters per line. Before the next set of data there are two \n --
I want to delete the first occurrence of \n on the input file.
I've tried "tr", but it deletes every \n.
I've tried "sed" (both the delete and the replace), but when I use \n
in the command line...it does nothing.
Can anyone help?
Thanks : )
| |
| Chris F.A. Johnson 2006-08-09, 7:02 pm |
On 2006-08-09, Lou wrote:
> I have files from US Customs that do not come in sequential order --
> although each 'batch' does have a sequential number.
> I need to 'reorder' the file so that it is in sequential order. I can
> do this with the 'sort' command, but unfortunately my input file has \n
> characters per line. Before the next set of data there are two \n --
> I want to delete the first occurrence of \n on the input file.
What do the lines look like, exactly?
> I've tried "tr" but it does all \n
What did you try? Give the exact command.
> I've tried "sed" (both the delete and the replace), but when I use \n
> in the command line...it does nothing.
How did you try it? What version of sed are you using?
Did you try:
sed 's//\\n//'
> Can anyone help?
If you give more information, probably.
--
Chris F.A. Johnson, author | <http://cfaj.freeshell.org>
Shell Scripting Recipes: | My code in this post, if any,
A Problem-Solution Approach | is released under the
2005, Apress | GNU General Public Licence
| |
Chris F.A. Johnson wrote:
> [...]
I have tried:
tr -d '\012' < $EDI_ROOT/data/xamsusex > $EDI_ROOT/data/xamsusey
sed 's/\n//' <$EDI_ROOT/data/testxams >$EDI_ROOT/data/testxamsx
sed '/\n/d' <$EDI_ROOT/data/testxams >$EDI_ROOT/data/testxamsx
When I tried sed 's//\\n//' (that you mentioned) I got an error:
sed: 0602-410 The first regular expression cannot be null.
Not sure what version I am on...how do I find out?
Thanks :)
| |
| Chris F.A. Johnson 2006-08-09, 7:02 pm |
On 2006-08-09, Lou wrote:
> Chris F.A. Johnson wrote:
>
> I have tried:
> tr -d '\012' < $EDI_ROOT/data/xamsusex > $EDI_ROOT/data/xamsusey
>
> sed 's/\n//' <$EDI_ROOT/data/testxams >$EDI_ROOT/data/testxamsx
> sed '/\n/d' <$EDI_ROOT/data/testxams >$EDI_ROOT/data/testxamsx
>
> When I tried sed 's//\\n//' (that you mentioned) I got an error:
> sed: 0602-410 The first regular expression cannot be null.
Sorry, my typo. Should have been:
sed 's/\\n//'
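A quick sketch of what this corrected command actually matches: in a sed regex, \\n is a literal backslash followed by the letter n, not a newline byte. The function name strip_literal_backslash_n is made up for illustration, and a POSIX sed is assumed. The command only changes text that literally contains backslash-n; ordinary line breaks pass through untouched, because sed removes each line's newline before running the script.

```shell
# Hypothetical helper name; the sed script is the one from the post above.
# In the regex, \\ matches one literal backslash, so \\n matches the two
# characters '\' and 'n', not a newline byte.
strip_literal_backslash_n() {
  sed 's/\\n//'
}

# Input whose text really contains backslash-n (printf %s does not
# interpret backslash escapes in its arguments): prints abcdef
printf '%s\n' 'abc\ndef' | strip_literal_backslash_n

# Input with a real line break: sed strips each line's newline before
# running the script, so nothing matches and both lines come out unchanged.
printf 'abc\ndef\n' | strip_literal_backslash_n
```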
> Not sure what version I am on...how do I find out?
If it's GNU sed (e.g., on Linux):
sed --version
| |
| Pascal Bourguignon 2006-08-09, 7:02 pm |
"Lou" <onewayonlytojesus@yahoo.com> writes:
> Chris F.A. Johnson wrote:
> [...]
> I have tried:
> tr -d '\012' < $EDI_ROOT/data/xamsusex > $EDI_ROOT/data/xamsusey
>
> sed 's/\n//' <$EDI_ROOT/data/testxams >$EDI_ROOT/data/testxamsx
> sed '/\n/d' <$EDI_ROOT/data/testxams >$EDI_ROOT/data/testxamsx
>
> When I tried sed 's//\\n//' (that you mentioned) I got an error:
> sed: 0602-410 The first regular expression cannot be null.
>
> Not sure what version I am on...how do I find out?
> Thanks :)
You have a bigger problem than you think.
These tools (tr, sed, etc.) all work on lines.
On unix, lines are terminated with a Line-Feed, ASCII LF, the byte 10 (decimal) or 012 (octal), aka \n
So, when you say that you want to delete the first occurrence of \n,
what you're actually saying is that you want to join the first two
lines. From a file containing:
---------------
line one
line two
line three
---------------
you're saying you want to get a file containing:
---------------
line oneline two
line three
---------------
If this is really what you want, then you can use sed to do it:
sed -e '1{' -e N -e 's/\n//' -e '}' < in > out
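The reason the plain sed 's/\n//' tried earlier did nothing is that sed reads one line at a time and drops the newline first; only N puts an embedded newline into the pattern space. A sketch wrapping the command above in a function (the name join_first_two is hypothetical, and a POSIX sed is assumed) makes the contrast visible:

```shell
# Hypothetical helper: join only lines 1 and 2, leave the rest untouched.
# On line 1, N appends the next input line to the pattern space with an
# embedded newline; s/\n// then deletes that embedded newline.
join_first_two() {
  sed -e '1{' -e 'N' -e 's/\n//' -e '}'
}

# Without N the pattern space never holds a newline, so this prints
# the three lines unchanged:
printf 'line one\nline two\nline three\n' | sed 's/\n//'

# With N on line 1 only:
printf 'line one\nline two\nline three\n' | join_first_two
# prints:
# line oneline two
# line three
```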
Now if what you're saying is that you have multiline records separated
by an empty line, you can easily convert it to single line records:
From:
---------------
field a1
field a2
field a3

field b1
field b2
field b3

field c1
field c2
field c3

---------------
you want:
---------------
field a1+field a2+field a3
field b1+field b2+field b3
field c1+field c2+field c3
---------------
you can easily do it with awk:
awk -v recsep='+' '
BEGIN{record="";sep="";}
/^$/{printf "%s\n",record;record="";sep="";next;}
{record=record sep $0;sep=recsep;next;}
END{if(record!=""){printf "%s\n",record;}}' < in > out
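For the original goal (reorder whole records), this join can be one step of a round trip: join each record onto one line, sort, then split the lines back into records. The sketch below wraps the awk program above in a function and adds the other two steps. It rests on loud assumptions: the function names are hypothetical, each record's first line is assumed to begin with its sequence number (so a plain sort -n orders the joined lines), the data is assumed never to contain '+', and some awk and sort implementations limit line length, which matters when records are several kilobytes long.

```shell
# Step 1: blank-line-separated records -> one line per record.
# This is the awk program from the post above, wrapped in a function.
join_records() {
  awk -v recsep='+' '
    BEGIN { record = ""; sep = "" }
    /^$/  { printf "%s\n", record; record = ""; sep = ""; next }
          { record = record sep $0; sep = recsep; next }
    END   { if (record != "") printf "%s\n", record }'
}

# Step 3: one line per record -> blank-line-separated records again.
# Every '+' becomes a newline, and a blank line is printed after each record.
split_records() {
  awk '{ gsub(/[+]/, "\n"); print; print "" }'
}

# Step 2 sits in the middle: sort -n orders the joined lines by their
# leading number, assumed (hypothetically) to be the sequence number.
reorder_records() {
  join_records | sort -n | split_records
}
```

For example, reorder_records < in > out.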
--
__Pascal Bourguignon__ http://www.informatimago.com/
Until real software engineering is developed, the next best practice
is to develop with a dynamic system that has extreme late binding in
all aspects. The first system to really do this in an important way
is Lisp. -- Alan Kay
| |
| William James 2006-08-09, 7:02 pm |
Pascal Bourguignon wrote:
> "Lou" <onewayonlytojesus@yahoo.com> writes:
> [...]
> If this is really what you want, then you can use sed to do it:
>
> sed -e '1{' -e N -e 's/\n//' -e '}' < in > out
awk '(ORS=ORS ? x : RS) || 1' in >out
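How this one-liner behaves, as a sketch (the wrapper name join_pairs is hypothetical): it toggles ORS between an empty string and a newline on every input line, so it joins lines in pairs (1+2, 3+4, and so on) rather than only the first two.

```shell
# Hypothetical wrapper around the one-liner above.
# ORS = (ORS ? x : RS): x is never set, so it is the empty string.
#  - line 1: ORS is "\n" (true), so ORS becomes "" and the line is printed
#    with no terminator, i.e. it is joined to the next line;
#  - line 2: ORS is "" (false), so ORS becomes RS ("\n") again.
# The assignment can yield "" (false), so "|| 1" keeps every line printed.
join_pairs() {
  awk '(ORS = ORS ? x : RS) || 1'
}

printf 'a\nb\nc\nd\n' | join_pairs
# prints:
# ab
# cd
```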
>
> Now if what you're saying is that you have multiline records separated
> by an empty line, you can easily convert it to single line records:
> [...]
> you can easily do it with awk:
>
> awk -v recsep='+' '
> BEGIN{record="";sep="";}
> /^$/{printf "%s\n",record;record="";sep="";next;}
> {record=record sep $0;sep=recsep;next;}
> END{if(record!=""){printf "%s\n",record;}}' < in > out
nawk 'BEGIN{RS="";FS="\n";OFS="+"}{$1=$1;print}' in >out
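This relies on awk's paragraph mode: with RS set to the empty string, records are separated by blank lines. The sketch below (the wrapper name paragraphs_to_lines is hypothetical) sets FS to a newline so that each line is one field; with the default whitespace splitting, "field a1" would also be split at its space and the output would read field+a1+field+a2. Some awk implementations limit record length, which matters for records several kilobytes long.

```shell
# Hypothetical wrapper around the one-liner above.
# RS = "" turns on paragraph mode: records are separated by blank lines.
# FS = "\n" makes each line of a paragraph one field, and assigning
# $1 = $1 rebuilds the record with OFS ("+") between the fields.
paragraphs_to_lines() {
  awk 'BEGIN { RS = ""; FS = "\n"; OFS = "+" } { $1 = $1; print }'
}

printf 'field a1\nfield a2\n\nfield b1\nfield b2\n' | paragraphs_to_lines
# prints:
# field a1+field a2
# field b1+field b2
```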
--
Every decision of the committee can be locally rationalized as
the right thing. We believe that the sum of these decisions,
however, has produced something greater than its parts; an
unwieldy, overweight beast, with significant costs (especially on
other than micro-codable personal Lisp engines) in compiler size
and speed, in runtime performance, in programmer overhead needed
to produce efficient programs, and in intellectual overload for a
programmer wishing to be a proficient COMMON LISP programmer.
| |
| John W. Krahn 2006-08-09, 7:02 pm |
Lou wrote:
>
> I have tried:
> tr -d '\012' < $EDI_ROOT/data/xamsusex > $EDI_ROOT/data/xamsusey
>
> sed 's/\n//' <$EDI_ROOT/data/testxams >$EDI_ROOT/data/testxamsx
> sed '/\n/d' <$EDI_ROOT/data/testxams >$EDI_ROOT/data/testxamsx
>
> When I tried sed 's//\\n//' (that you mentioned) I got an error:
> sed: 0602-410 The first regular expression cannot be null.
perl -i -pe 'm?\n? && chomp' $EDI_ROOT/data/testxams
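On a system without Perl, the same effect (join line 1 onto line 2 and leave the rest alone) can be sketched in awk; the function name join_first_line is hypothetical. Unlike perl -i it does not edit the file in place, so the output has to go to a new file.

```shell
# Hypothetical awk stand-in for the Perl one-liner: print line 1 without
# its trailing newline, and every later line unchanged.
join_first_line() {
  awk 'NR == 1 { printf "%s", $0; next } { print }'
}

printf 'line one\nline two\nline three\n' | join_first_line
# prints:
# line oneline two
# line three
```

For example: join_first_line < $EDI_ROOT/data/testxams > $EDI_ROOT/data/testxamsx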
John
--
use Perl;
program
fulfillment
| |
Pascal Bourguignon wrote:
> [...]
> you can easily do it with awk:
>
> awk -v recsep='+' '
> BEGIN{record="";sep="";}
> /^$/{printf "%s\n",record;record="";sep="";next;}
> {record=record sep $0;sep=recsep;next;}
> END{if(record!=""){printf "%s\n",record;}}' < in > out
Thanks for the information. You understand my dilemma -- that's good.
Unfortunately, when I tried this, I got an error:
awk: 0602-592 String +BEGIN{rec cannot contain a newline character. The source line is 1.
| |
Lou wrote:
> [...]
Oops...forgot some more responses:
awk '(ORS=ORS ? x :RS) || 1' in > out
This worked for combining two of the lines together, but I want it to
combine all of the lines until it finds one with two \n on it.
nawk command -- did not work because the data is longer than 10,239 bytes.
sed 's/\\n//' does nothing to the file.
sed --version did not work.
I don't have Perl.
:)
| |
| Chuck Dillon 2006-08-10, 8:01 am |
Lou wrote:
>
>
> This worked for combining two of the lines together, but I want it to
> combine all of the lines until it finds one with two \n on it.
> nawk command -- did not work because data longer than 10,239 bytes
>
So you have a text file containing sets of consecutive lines that
constitute a record, and each record contains a sequence number that you
want to sort on. The size of a record is on the order of 10k.
Your approach probably won't work because your system's text processing
functions (e.g. sort, sed...) probably won't handle the 10k line length
of your combined records.
I suggest the following strategy. Use awk to process the file and
prepend two sequence numbers on each line. The record number and the
line of the record...
12345 1 Record 12345: Joe Bloe
12345 2 Address: Wherever
12345 3 SSN: 000-00-0000
23496 1 Record 23496: Jane Doe
23496 2 Address: Somewhere
23496 3 SSN: 111-11-1111
....
Use sort to sort the file and use sed to strip the prepended keys off
to create the sorted file.
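The strategy above can be sketched as a short pipeline. Everything specific here is an assumption, not something given in the thread: the function names add_keys and sort_records are hypothetical, records are taken to be separated by one blank line, and the first run of digits on a record's first line is taken to be its sequence number. Because every line keeps its original length, this avoids the long joined lines that tripped up nawk.

```shell
# Hypothetical sketch of the strategy above. Assumed (not from the thread):
# records are separated by one blank line, and the first run of digits on a
# record's first line is its sequence number.
add_keys() {
  awk '
    BEGIN { n = 0 }      # n: line number within the current record
    {
      if (n == 0) {      # first line of a record: extract its key
        key = match($0, /[0-9]+/) ? substr($0, RSTART, RLENGTH) : 0
      }
      n++
      print key, n, $0   # "key lineno original-line"
      if ($0 == "") {    # a blank line ends the record and keeps its key
        n = 0
      }
    }'
}

# Sort on the record key, then on the line number (both numerically),
# and finally cut away the two prepended fields.
sort_records() {
  add_keys | sort -k1,1n -k2,2n | cut -d' ' -f3-
}
```

For example, sort_records < in > out. Sorting numerically on both keys keeps record 9 ahead of record 10 and keeps each record's lines in their original order.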
HTH,
-- ced
--
Chuck Dillon
Manager of Software Development, Bioinformatics
NimbleGen Systems Inc.
| |
Chuck Dillon wrote:
> [...]
> Use sort to sort the file and use sed to strip the prepended keys off
> to create the sorted file.
Hi Chuck:
Thanks for the info...I did this but apparently I'm not doing something
right with the sort command
sort +1n ......
because 11-19 are coming before 2-9
line1
line11
line12
line13...etc
line2
line3
| |
| Pascal Bourguignon 2006-08-10, 7:00 pm |
"Lou" <onewayonlytojesus@yahoo.com> writes:
>
> Thanks for the information. You understand my dilema -- that's good.
> Unfortunately when I tried this, I got an error:
> awk: 0602-592 String +BEGIN{rec cannot contain a newline character. The
> source
> line is 1.
You can add \ at the end of each line of the program; awk treats a
backslash followed by a newline as a line continuation:
awk -v recsep='+' ' \
BEGIN{record="";sep="";} \
/^$/{printf "%s\n",record;record="";sep="";next;} \
{record=record sep $0;sep=recsep;next;} \
END{if(record!=""){printf "%s\n",record;}}' < in > out
If that still doesn't work, you can put everything on one line:
awk -v recsep='+' ' BEGIN{record="";sep="";} /^$/{printf "%s\n",record;record="";sep="";next;} {record=record sep $0;sep=recsep;next;} END{if(record!=""){printf "%s\n",record;}}' < in > out
If that still doesn't work, you can put the script in a file:
cat > multi-to-single.awk <<'EOF'
BEGIN{record="";sep="";}
/^$/{printf "%s\n",record;record="";sep="";next;}
{record=record sep $0;sep=recsep;next;}
END{if(record!=""){printf "%s\n",record;}}
EOF
awk -v recsep='+' -f multi-to-single.awk < in > out
But really, what you should do is to read some tutorial about unix and
shell programming.
--
__Pascal Bourguignon__ http://www.informatimago.com/
You're always typing.
Well, let's see you ignore my
sitting on your hands.
| |
Pascal Bourguignon wrote:
> [...]
Thanks again for trying. No matter what I did I always got the same
error. I'm still trying some other options.
Once again...thanks. : )
| |
Lou wrote:
> Chuck Dillon wrote:
>
>
> Hi Chuck:
> Thanks for the info...I did this but apparently I'm not doing something
> right with the sort command
> sort +1n ......
> because 11-19 are coming before 2-9
> line1
> line11
> line12
> line13...etc
> line2
> line3
Hey!
I worked with the sequence number to retain leading zeroes and got
everything else to work.
I really appreciate the help...it made me look at the process in a
different light.
:)
| |
| Henry Townsend 2006-08-11, 7:01 pm |
Lou wrote:
>
> Hey!
> I worked with the sequence number to retain leading zeroes and got
> everything else to work.
You could do this ... or you could read the sort man page and use sort -n (numeric).
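A small demonstration of the difference between text and numeric ordering, and of why zero-padding the sequence number also works. The -k option is the POSIX way to choose a sort key; the older +1n style seen earlier in the thread may not be accepted by every modern sort. Plain POSIX sort and printf are assumed.

```shell
# Plain sort compares text, so 11 lands before 2 (the character 1 sorts before 2).
printf '1\n11\n2\n' | sort          # 1, 11, 2
# sort -n compares the leading numeric value instead.
printf '1\n11\n2\n' | sort -n       # 1, 2, 11
# Zero-padding to a fixed width makes text order agree with numeric order.
printf '%05d\n' 1 11 2 | sort       # 00001, 00002, 00011
# -k2,2n restricts the key to the second blank-separated field, compared numerically.
printf 'b 11\na 2\n' | sort -k2,2n  # a 2, b 11
```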