Code Comments
Is there an awk one-liner that would join two lines, only if the second line has a space at the beginning? i.e.:

1234
 56
789
9876
1

would look like

123456
789
9876
1

--
Randomly generated signature --
We are coming after you. God may have mercy on you, but we won't -- John Mc Cain
eldorado wrote:
> Is there a awk oneliner that would join two lines, only if the second line
> has a space at the beginning? ie:
>
> 1234
>  56
> 789
> 9876
> 1
>
> would look like
> 123456
> 789
> 9876
> 1
>
>
Something like this (untested):
gawk 'BEGIN{RS="";FS="\n ";OFS=""}$1=$1'
Regards,
Ed.
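
Some awk implementations treat a newline as a field separator whenever RS is empty, so the paragraph-mode trick above may not carry over outside gawk. As a hedged alternative (not Ed's one-liner), the same join can be sketched line by line. The function name join_continuations is made up for illustration, and unlike the RS="" version this sketch passes blank lines through as empty output lines.

```shell
# Hypothetical helper: join each line that starts with a space onto the
# previous line, without relying on RS="" (paragraph mode).
join_continuations() {
    awk '
        # A line that does not start with a space begins a new output line,
        # so flush whatever has been collected so far.
        NR > 1 && !/^ / { print buf; buf = "" }
        # Drop the single leading space (if any) and append to the buffer.
        { sub(/^ /, ""); buf = buf $0 }
        # Flush the last collected line.
        END { if (NR) print buf }
    '
}

# The sample input from the question.
printf '1234\n 56\n789\n9876\n1\n' | join_continuations
```

With the sample input this prints 123456, 789, 9876 and 1 on separate lines.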
In article <ceb8ia$1n6@netnews.proxy.lucent.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>eldorado wrote:
>
>
>Something like this (untested):
>
>gawk 'BEGIN{RS="";FS="\n ";OFS=""}$1=$1'
Hey, that works. Very clever!
On Thu, 29 Jul 2004, Kenny McCormack wrote:
> In article <ceb8ia$1n6@netnews.proxy.lucent.com>,
> Ed Morton <morton@lsupcaemnt.com> wrote:
>
> Hey, that works. Very clever!
>
Ed, it does work! Thanks.
If you have a moment would you explain what the $1=$1 line does?
--
Randomly generated signature --
The worst thing about censorship is [deleted by censorship bereau].
eldorado wrote:
> On Thu, 29 Jul 2004, Kenny McCormack wrote:
>
> <snip>
>
> Ed, it does work! Thanks.
>
> If you have a moment would you explain what the $1=$1 line does?
>
It tells awk to re-evaluate the fields, so the resulting $0 has what I
specified as an OFS (i.e. nothing) rather than whatever it originally
had as an FS (i.e. "\n "). The fact that I'm doing it as a condition
invokes the default action behavior of printing $0.
Ed.
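
A minimal sketch of the rebuild behavior described above, using ordinary space-separated input rather than the thread's "\n " separator. The function names show_unchanged and show_rebuilt are hypothetical and exist only for this illustration.

```shell
# OFS is not applied just by reading a record: $0 prints as it was read.
show_unchanged() { awk 'BEGIN { OFS = "-" } { print }'; }

# Assigning to any field, even to itself, makes awk rebuild $0 from
# $1..$NF joined with OFS.
show_rebuilt() { awk 'BEGIN { OFS = "-" } { $1 = $1; print }'; }

printf 'a b c\n' | show_unchanged   # prints: a b c
printf 'a b c\n' | show_rebuilt     # prints: a-b-c
```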
In article <cebjft$6mc@netnews.proxy.lucent.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>eldorado wrote:
><snip>
>
>It tells awk to re-evaluate the fields so the resulting $0 has what I
>specified as an OFS (i.e. nothing) rather than whatever it originally
>had as an FS (i.e. "\n ") and the fact I'm doing it as a condition
>invokes the default action behavior of printing $0.
Yes, but there is something wrong here. First of all, purists will point
out that the "$1=$1" trick - using that as a cute shorthand for "re-scan
it and then print it" - fails, in the general case, if the $1 is a null
string (since null string evaluates to false). However, as an aside, in
real life, I often maintain that this is actually a feature, since it has
the effect of filtering blank lines out of your output (which is usually
[but, of course, not always] desirable).
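
The caveat can be sketched as follows. The function names are hypothetical; the input uses default field splitting, so an empty $1 only arises on a blank line. The sketch also includes a record whose first field is 0, which is likewise false in awk (an addition to the null-string case mentioned above).

```shell
# Pattern form: the value of the assignment decides whether the record prints.
print_if_truthy() { awk '$1 = $1'; }

# Action form: rebuild $0, then print unconditionally via the always-true
# pattern 1.
print_always() { awk '{ $1 = $1 } 1'; }

# Pattern form drops the blank line and the "0 x" line.
printf 'a  b\n\n0 x\nc\n' | print_if_truthy

# Action form keeps all four records (the blank one prints as an empty line).
printf 'a  b\n\n0 x\nc\n' | print_always
```

Whether the filtering is a feature or a bug depends on the data; the action form is the safer default when every record must survive.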
Anyway, back to the instant case, what do you think should be the output of
this command:
printf "\n foo" | gawk 'BEGIN{RS="";FS="\n "}{print NR,NF,$1,length($1)}'
I think that NF should be 2, and the length of the "foo field" should be 3,
but in my testing, NF always comes up 1 and the length always comes up 4.
Connecting the dots is left as an exercise for the reader...
Kenny McCormack wrote:
<snip>
> Anyway, back to the instant case, what do you think should be the output of
> this command:
>
> printf "\n foo" | gawk 'BEGIN{RS="";FS="\n "}{print NR,NF,$1,length($1)}'
>
> I think that NF should be 2, and the length of the "foo field" should be 3,
> but in my testing, NF always comes up 1 and the length always comes up 4.
Here's how I'd explain that:
RS takes precedence over FS so in this case by setting RS="" we're
saying that a sequence of 1 or more blank lines is the record separator
and so the "\n" at the start of the printf is being treated as a
sequence of 1 blank line and so swallowed as record separator. That just
leaves " foo" which is 1 field with size 4.
What I find odd though is that if I add a blank line to the end of the
input string to explicitly satisfy the RS:
printf "\n foo\n\n" |
gawk 'BEGIN{RS="";FS="\n "}{print NR,NF,$2,length($2)}'
Now the "foo" field IS number 2 and its length is 3 which I find
confusing since I thought the end of input (file) was supposed to get
treated the same as the end of a record but that's not what's happening
in this case.
Ed.
In article <cebqbu$9hk@netnews.proxy.lucent.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>Kenny McCormack wrote:
><snip>
>
>Here's how I'd explain that:
>
>RS takes precedence over FS so in this case by setting RS="" we're
>saying that a sequence of 1 or more blank lines is the record separator
>and so the "\n" at the start of the printf is being treated as a
>sequence of 1 blank line and so swallowed as record separator. That just
>leaves " foo" which is 1 field with size 4.
I can't say as I buy this. I think you are right that "RS takes
precedence over FS", but that statement could only have relevance if RS==FS
(i.e., if they are set to the same thing). Here, they are not; I leave it
as an exercise to prove that if they are equal, then every record has
either 0 or 1 field (cannot have more).
Further, having RS="" is, as far as I can tell, though I can find no
reference for this at the moment, equivalent to RS="\n\n+" - i.e., *2* or
more adjacent newlines. Since the string I am feeding to gawk above
contains only 1 newline character in the entire string, RS does not come
into play.
>What I find odd though is that if I add a blank line to the end of the
>input string to explicitly satisfy the RS:
>
>printf "\n foo\n\n" |
> gawk 'BEGIN{RS="";FS="\n "}{print NR,NF,$2,length($2)}'
>
>Now the "foo" field IS number 2 and its length is 3 which I find
>confusing since I thought the end of input (file) was supposed to get
>treated the same as the end of a record but that's not what's happening
>in this case.
I could not replicate this. And if you can, then it is clearly a bug.
(To make that more explicit, non-presence of an RS does not make an input
string invalid. It just means that NR is never greater than 1...)
Observe: printf "foo" | gawk '{print NR,NF,$1,length($1)}'
Kenny McCormack wrote:
> In article <cebqbu$9hk@netnews.proxy.lucent.com>,
> Ed Morton <morton@lsupcaemnt.com> wrote:
>
> I can't say as I buy this. I think you are right that "RS takes
> precedence over FS", but that statement could only have relevance if RS==FS
> (i.e., if they are set to the same thing).
It's also relevant when one is a subset of the other. That's not quite
what's happening here, but it's close - the RS specification of "" is
sucking up the "\n" that you'd like to be part of the FS.
> Here, they are not; I leave it
> as an exercise to prove that if they are equal, then every record has
> either 0 or 1 field (cannot have more).
>
> Further, having RS="" is, as far as I can tell, though I can find no
> reference for this at the moment, equivalent to RS="\n\n+" - i.e., *2* or
> more adjacent newlines.
I agree in that 2 adjacent newlines constitute one blank line but try
setting RS to "\n\n+" and see if it produces the results you expect (I
do this below and explain why they aren't exactly equivalent).
> Since the string I am feeding to gawk above
> contains only 1 newline character in the entire string, RS does not come
> into play.
I disagree. printf "X" produces an X while printf "\nX" produces a blank
line followed by an X:
$ printf "X"
X$
$ printf "\nX"

X$
According to this from the GNU awk user's guide
(http://www.gnu.org/software/gawk/ma..._mono/gawk.html):
<By a special dispensation, an empty string as the value of RS indicates
that records are separated by *one* or more blank lines.>
so that single blank line should be treated as a record separator.
Having said that, the documentation DOES also go on to say the
contradictory:
<You can achieve the same effect as RS = "" by assigning the string
"\n\n+" to RS.>
so let's try that with a couple of versions of gawk:
gawk 3.0.4 on Solaris(ksh88):
$ printf "\n foo" |gawk 'BEGIN{RS="";FS="\n ";OFS=";"}{print NR,NF,$1,$2}'
1;1; foo;
$ printf "\n foo" |gawk 'BEGIN{RS="\n\n+";FS="\n ";OFS=";"}{print NR,NF,$1,$2}'
1;2;;foo
gawk 3.1.3 on Cygwin(bash):
$ printf "\n foo" |gawk 'BEGIN{RS="";FS="\n ";OFS=";"}{print NR,NF,$1,$2}'
1;1; foo;
$ printf "\n foo" |gawk 'BEGIN{RS="\n\n+";FS="\n ";OFS=";"}{print NR,NF,$1,$2}'
1;2;;fo
So, setting RS="" vs RS="\n\n+" produces different results on both
platforms. That is explained by this further text in the manual:
<There is an important difference between RS = "" and RS = "\n\n+". In
the first case, leading newlines in the input data file are ignored
....In the second case, this special processing is not done.>
and that's why your initial newline is not being treated as you'd like.
I've no idea why gawk 3.1.3 chooses to truncate "foo" to "fo" in the
final example above!
> I could not replicate this. And if you can, then it is clearly a bug.
I couldn't replicate it on my Cygwin distribution with gawk 3.1.3 either,
but this is gawk 3.0.4 on either ksh88 or bash on Solaris (SunOS 5.8):
$ printf "\n foo" |gawk 'BEGIN{RS="";FS="\n "}{print NR,NF,$1,length($1)}'
1 1  foo 4
$ printf "\n foo\n\n" |gawk 'BEGIN{RS="";FS="\n "}{print NR,NF,$2,length($2)}'
1 2 foo 3
$ gawk --version
GNU Awk 3.0.4
> (To make that more explicit, non-presence of an RS does not make an input
> string invalid. It just means that NR is never greater than 1...)
I agree that's the intent. It looks like there are bugs in both versions
of gawk I'm using 8-(.
Ed.
> Observe: printf "foo" | gawk '{print NR,NF,$1,length($1)}'
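
The documented difference in how leading newlines are handled can be sketched with input that really does start with a blank-line run. This assumes a current gawk (a regular expression as RS is an extension to POSIX awk), and the helper name count_records is hypothetical.

```shell
# Hypothetical helper: print the number of records and the length of the
# first record, for a given RS value passed as the first argument.
# gawk processes escape sequences in -v assignments, so '\n\n+' arrives
# as two literal newlines followed by "+".
count_records() {
    gawk -v rs="$1" '
        BEGIN   { RS = rs }
        NR == 1 { len = length($0) }
        END     { print NR, len }
    '
}
```

Running printf '\n\nfoo\n\nbar\n' | count_records '' should report 2 records with a first record of length 3, because paragraph mode ignores the leading newlines. Running the same input through count_records '\n\n+' should report 3 records with a first record of length 0, because the leading separator now terminates an empty first record.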
Hello,
In article <MJWdnXESBq14QpTc4p2dnA@comcast.com>, Ed Morton wrote:
> $ printf "\n foo" |gawk 'BEGIN{RS="\n\n+";FS="\n ";OFS=";"}{print
> NR,NF,$1,$2}'
> 1;2;;fo
...
> I've no idea why gawk 3.1.3 chooses to truncate "foo" to "fo" in the
> final example above!
That was a bug in gawk 3.1.3. A few days ago I verified that the
current beta, which Arnold has also announced here, has this bug fixed.
Have a nice day,
Stepan