Code Comments

Joining lines
Is there an awk one-liner that would join two lines, only if the second line
has a space at the beginning?  i.e.:

1234
 56
789
9876
1

would look like
123456
789
9876
1


--
Randomly generated signature --
We are coming after you. God may have mercy on you, but we won't  -- John McCain


eldorado
07-29-04 08:55 PM


Re: Joining lines

eldorado wrote:

> Is there a awk oneliner that would join two lines, only if the second line
> has a space at the beginning?  ie:
>
> 1234
>  56
> 789
> 9876
> 1
>
> would look like
> 123456
> 789
> 9876
> 1
>
>

Something like this (untested):

gawk 'BEGIN{RS="";FS="\n ";OFS=""}$1=$1'
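
For comparison, here is a line-by-line sketch that does not rely on paragraph mode (RS="") or a multi-character FS. The function name join_continuations is made up for illustration, and like the one-liner above it removes only a single leading space from each continuation line.

```shell
# Hypothetical helper: append any line that starts with a space to the
# line before it, dropping that one leading space.
join_continuations() {
  awk '
    /^ / { sub(/^ /, ""); buf = buf $0; next }  # continuation line: glue onto the buffer
    NR > 1 { print buf }                         # new logical line: flush the previous one
    { buf = $0 }                                 # start a new buffer
    END { if (NR) print buf }                    # flush whatever is left
  '
}

printf '1234\n 56\n789\n9876\n1\n' | join_continuations
# prints:
# 123456
# 789
# 9876
# 1
```

Unlike the RS="" approach, this keeps blank lines in the input as they are, which may or may not be what is wanted.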

Regards,

Ed.


Ed Morton
07-29-04 08:55 PM


Re: Joining lines
In article <ceb8ia$1n6@netnews.proxy.lucent.com>,
Ed Morton  <morton@lsupcaemnt.com> wrote:
>
>
>eldorado wrote:
> 
>
>Something like this (untested):
>
>gawk 'BEGIN{RS="";FS="\n ";OFS=""}$1=$1'

Hey, that works.  Very clever!


Kenny McCormack
07-29-04 08:55 PM


Re: Joining lines
On Thu, 29 Jul 2004, Kenny McCormack wrote:

> In article <ceb8ia$1n6@netnews.proxy.lucent.com>,
> Ed Morton  <morton@lsupcaemnt.com> wrote: 
>
> Hey, that works.  Very clever!
>

Ed, it does work! Thanks.

If you have a moment would you explain what the $1=$1 line does?

--
Randomly generated signature --
The worst thing about censorship is [deleted by censorship bureau].


eldorado
07-29-04 08:55 PM


Re: Joining lines

eldorado wrote:
> On Thu, 29 Jul 2004, Kenny McCormack wrote:
>
> 
<snip> 
>
>
> Ed, it does work! Thanks.
>
> If you have a moment would you explain what the $1=$1 line does?
>

It tells awk to rebuild the record from its fields, so the resulting $0
uses what I specified as an OFS (i.e. nothing) in place of whatever it
originally had as an FS (i.e. "\n "). Because I'm using the assignment as
a condition with no action, a true result invokes the default action of
printing $0.
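
A small illustration of that rebuild, using single-line records and made-up separators (":" on input, "-" on output) rather than the ones from this thread:

```shell
# Without an assignment to a field, $0 is printed exactly as it was read.
printf 'a:b:c\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { print }'
# prints: a:b:c

# Assigning to any field makes awk rebuild $0 from the fields, joined by OFS.
printf 'a:b:c\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { $1 = $1; print }'
# prints: a-b-c

# The same assignment used as the pattern; the default action prints $0.
printf 'a:b:c\n' | awk 'BEGIN { FS = ":"; OFS = "-" } $1 = $1'
# prints: a-b-c
```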

Ed.



Ed Morton
07-29-04 08:55 PM


Re: Joining lines
In article <cebjft$6mc@netnews.proxy.lucent.com>,
Ed Morton  <morton@lsupcaemnt.com> wrote:
>
>
>eldorado wrote: 
><snip> 
>
>It tells awk to re-evaluate the fields so the resulting $0 has what I
>specified as an OFS (i.e. nothing) rather than whatever it originally
>had as an FS (i.e. "\n ") and the fact I'm doing it as a condition
>invokes the default action behavior of printing $0.

Yes, but there is something wrong here.  First of all, purists will point
out that the "$1=$1" trick - using that as a cute shorthand for "re-scan
it and then print it" - fails, in the general case, if the $1 is a null
string (since null string evaluates to false).  However, as an aside, in
real life, I often maintain that this is actually a feature, since it has
the effect of filtering blank lines out of your output (which is usually
[but, of course, not always] desirable).
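
To make that caveat concrete, here is a sketch with the default FS and OFS and invented input. Note that a first field that is numerically zero is dropped as well, not only an empty one. The `{ $1 = $1 } 1` form shown second is one common way to rebuild and print unconditionally.

```shell
# As a pattern, $1=$1 is false when $1 is empty or numerically zero,
# so the blank line and the "0 x" line are not printed.
printf 'a  b\n0 x\n\nc d\n' | awk '$1 = $1'
# prints:
# a b
# c d

# Doing the assignment in an action and using the always-true pattern 1
# prints every record, including the blank one.
printf 'a  b\n0 x\n\nc d\n' | awk '{ $1 = $1 } 1'
# prints:
# a b
# 0 x
#
# c d
```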

Anyway, back to the instant case, what do you think should be the output of
this command:

printf "\n foo" | gawk 'BEGIN{RS="";FS="\n "}{print NR,NF,$1,length($1)}'

I think that NF should be 2, and the length of the "foo field" should be 3,
but in my testing, NF always comes up 1 and the length always comes up 4.

Connecting the dots is left as an exercise for the reader...


Kenny McCormack
07-30-04 01:55 AM


Re: Joining lines

Kenny McCormack wrote:
<snip>
> Anyway, back to the instant case, what do you think should be the output of
> this command:
>
>     printf "\n foo" | gawk 'BEGIN{RS="";FS="\n "}{print NR,NF,$1,length($1)}'
>
> I think that NF should be 2, and the length of the "foo field" should be 3,
> but in my testing, NF always comes up 1 and the length always comes up 4.

Here's how I'd explain that:

RS takes precedence over FS, so by setting RS="" we're saying that a
sequence of 1 or more blank lines is the record separator. The "\n" at
the start of the printf is therefore treated as a sequence of 1 blank
line and swallowed as a record separator. That just leaves " foo", which
is 1 field of size 4.

What I find odd though is that if I add a blank line to the end of the
input string to explicitly satisfy the RS:

printf "\n foo\n\n" |
gawk 'BEGIN{RS="";FS="\n "}{print NR,NF,$2,length($2)}'

Now the "foo" field IS number 2 and its length is 3, which I find
confusing: I thought the end of input (file) was supposed to be treated
the same as the end of a record, but that's not what's happening in this
case.

Ed.


Ed Morton
07-30-04 01:55 AM


Re: Joining lines
In article <cebqbu$9hk@netnews.proxy.lucent.com>,
Ed Morton  <morton@lsupcaemnt.com> wrote:
>
>
>Kenny McCormack wrote:
><snip> 
>
>Here's how I'd explain that:
>
>RS takes precedence over FS so in this case by setting RS="" we're
>saying that a sequence of 1 or more blank lines is the record separator
>and so the "\n" at the start of the printf is being treated as a
>sequence of 1 blank line and so swalled as record separator. That just
>leaves " foo" which is 1 field with size 4.

I can't say as I buy this.  I think you are right that "RS takes
precedence over FS", but that statement could only have relevance if RS==FS
(i.e., if they are set to the same thing).  Here, they are not; I leave it
as an exercise to prove that if they are equal, then every record has
either 0 or 1 field (cannot have more).

Further, having RS="" is, as far as I can tell, though I can find no
reference for this at the moment, equivalent to RS="\n\n+" - i.e., *2* or
more adjacent newlines.  Since the string I am feeding to gawk above
contains only 1 newline character in the entire string, RS does not come
into play.

>What I find odd though is that if I add a blank line to the end of the
>input string to explicitly satisfy the RS:
>
>printf "\n foo\n\n" |
>     gawk 'BEGIN{RS="";FS="\n "}{print NR,NF,$2,length($2)}'
>
>Now the "foo" field IS number 2 and it's length is 3 which I find
>confusing since I thought the end of input (file) was supposed to get
>treated the same as the end of a record but that's not what's happening
>in this case.

I could not replicate this.  And if you can, then it is clearly a bug.
(To make that more explicit, non-presence of an RS does not make an input
string invalid.  It just means that NR is never greater than 1...)

Observe: printf "foo" | gawk '{print NR,NF,$1,length($1)}'
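
For reference, a sketch of what that command reports with the default RS, with and without a trailing newline on the input. In both cases there is exactly one record, containing one field of length 3.

```shell
# No trailing newline: the unterminated last line is still a record.
printf 'foo' | awk '{ print NR, NF, $1, length($1) }'
# prints: 1 1 foo 3

# With a trailing newline the result is the same.
printf 'foo\n' | awk '{ print NR, NF, $1, length($1) }'
# prints: 1 1 foo 3
```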


Kenny McCormack
07-30-04 08:55 AM


Re: Joining lines

Kenny McCormack wrote:
> In article <cebqbu$9hk@netnews.proxy.lucent.com>,
> Ed Morton  <morton@lsupcaemnt.com> wrote:
> 
>
>
> I can't say as I buy this.  I think you are right that "RS takes
> precedence over FS", but that statement could only have relevance if RS==FS
> (i.e., if they are set to the same thing).

It's also relevant when one is a subset of the other. That's not quite
what's happening here, but it's close - the RS specification of "" is
sucking up the "\n" that you'd like to be part of the FS.

> Here, they are not; I leave it
> as an exercise to prove that if they are equal, then every record has
> either 0 or 1 field (cannot have more).
>
> Further, having RS="" is, as far as I can tell, though I can find no
> reference for this at the moment, equivalent to RS="\n\n+" - i.e., *2* or
> more adjacent newlines.

I agree in that 2 adjacent newlines constitute one blank line but try
setting RS to "\n\n+" and see if it produces the results you expect (I
do this below and explain why they aren't exactly equivalent).

> Since the string I am feeding to gawk above
> contains only 1 newline character in the entire string, RS does not come
> into play.

I disagree. printf "X" produces an X while printf "\nX" produces a blank
line followed by an X:

$ printf "X"
X$
$ printf "\nX"

X$

According to this from the GNU awk user's guide
(http://www.gnu.org/software/gawk/ma..._mono/gawk.html):

<By a special dispensation, an empty string as the value of RS indicates
that records are separated by *one* or more blank lines.>

so that single blank line should be treated as a record separator.

Having said that, the documentation DOES also go on to say the
contradictory:

<You can achieve the same effect as RS = "" by assigning the string
"\n\n+" to RS.>

so let's try that with a couple of versions of gawk:

gawk 3.0.4 on Solaris(ksh88):

$ printf "\n foo" |gawk 'BEGIN{RS="";FS="\n ";OFS=";"}{print NR,NF,$1,$2}'
1;1; foo;
$ printf "\n foo" |gawk 'BEGIN{RS="\n\n+";FS="\n ";OFS=";"}{print NR,NF,$1,$2}'
1;2;;foo

gawk 3.1.3 on Cygwin(bash):

$ printf "\n foo" |gawk 'BEGIN{RS="";FS="\n ";OFS=";"}{print NR,NF,$1,$2}'
1;1; foo;
$ printf "\n foo" |gawk 'BEGIN{RS="\n\n+";FS="\n ";OFS=";"}{print NR,NF,$1,$2}'
1;2;;fo

So, setting RS="" vs RS="\n\n+" produces different results on both
platforms. That is explained by this further text in the manual:

<There is an important difference between RS = "" and RS = "\n\n+". In
the first case, leading newlines in the input data file are ignored
....In the second case, this special processing is not done.>

and that's why your initial newline is not being treated as you'd like.
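
The leading-newline behaviour quoted above can be seen in a small sketch with the default FS and made-up input. It only exercises RS="", since regular-expression values of RS such as "\n\n+" are an extension that not every awk supports.

```shell
# Two leading blank lines, then two paragraphs separated by blank lines.
# In paragraph mode the leading newlines do not produce an empty first record.
printf '\n\nfoo bar\n\n\nbaz\n' | awk 'BEGIN { RS = "" } { print NR ": " $1 }'
# prints:
# 1: foo
# 2: baz
```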

I've no idea why gawk 3.1.3 chooses to truncate "foo" to "fo" in the
final example above!

> 
>
>
> I could not replicate this.  And if you can, then it is clearly a bug.

I couldn't replicate it on my Cygwin distribution with gawk 3.1.3 either,
but this is gawk 3.0.4 on either ksh88 or bash on Solaris (SunOS 5.8):

$ printf "\n foo" |gawk 'BEGIN{RS="";FS="\n "}{print NR,NF,$1,length($1)}'
1 1  foo 4
$ printf "\n foo\n\n" |gawk 'BEGIN{RS="";FS="\n "}{print NR,NF,$2,length($2)}'
1 2 foo 3
$ gawk --version
GNU Awk 3.0.4

> (To make that more explicit, non-presence of an RS does not make an input
> string invalid.  It just means that NR is never greater than 1...)
>

I agree that's the intent. It looks like there are bugs in both versions
of gawk I'm using 8-(.

Ed.

> Observe: printf "foo" | gawk '{print NR,NF,$1,length($1)}'
>


Ed Morton
07-30-04 08:55 AM


Re: Joining lines
Hello,

In article <MJWdnXESBq14QpTc4p2dnA@comcast.com>, Ed Morton wrote:
> $ printf "\n foo" |gawk 'BEGIN{RS="\n\n+";FS="\n ";OFS=";"}{print NR,NF,$1,$2}'
> 1;2;;fo
...
> I've no idea why gawk 3.1.3 chooses to truncate "foo" to "fo" in the
> final example above!

That was a bug in gawk 3.1.3.  A few days ago I verified that the
current beta, which Arnold has also announced here, has this bug fixed.

Have a nice day,
Stepan

Stepan Kasal
07-30-04 01:55 PM

