Hi all,
I'm a bit of an awk newbie. I'm trying to use some conditional
statements to generate printf statements to print columns on a page. As
I read the printf info, it said it didn't do a linefeed till you
explicitly put one in '\n'.
Here's an example of what I'm trying to do on an input of email
addresses. I'm trying in this section to split off the username before
the @ sign. And if there are underscores (_) in the username, I find
out if they are first and middle initials and a last name...and I want
to print out in columns the email address, the first name or first
initial, the middle initial or last name, and the last name if there
are a first name/first initial and middle initial.
I'm trying to do:
BEGIN {FS="~";}
{
    v_email = ""
    v_name = ""
    v_first_name = ""
    v_first_initial = ""
    v_middle_initial = ""
    v_last_name = ""
    if ($10 !="" && /@/){
        v_email = $10}
    else if ($11!="" && /@/){
        v_email = $11}
    else if ($12 !="" && /@/){
        v_email = $12}
    split(v_email,v_email_parts,"@")
    v_name = v_email_parts[1]
    #First, test for under score splitter
    if (match(v_name,/_/)){
        split(v_name,v_name_parts,"_")
        printf("%s\t",v_email)
        if (length(v_name_parts[1]) == 1){
            v_first_initial = v_name_parts[1]
            printf("%s\t",v_first_initial)
        }
        else {
            v_first_name = v_name_parts[1]
            printf("%s\t",v_first_name)
        }
        if (length(v_name_parts[2]) == 1){
            v_middle_initial = v_name_parts[2]
            printf("%s\t",v_middle_initial)
        }
        else {
            v_last_name = v_name_parts[2]
            printf("%s\t",v_last_name)
        }
        if (length(v_name_parts[3]) > 0){
            v_last_name = v_name_parts[3]
            printf("%s\t",v_last_name)
        }
        printf("\n")
    }
}
The file picks off the email from one of three columns...and it does
this perfectly. So, let's say the input in the v_email section is like
f_d_flintstone@bedrock.com
mick_jagger@stones.com
john_d_doe@dead.zone
I'd expect the output to be
f_d_flintstone@bedrock.com f d flintstone
mick_jagger@stones.com mick jagger
john_d_doe@dead.zone john d doe
But, this isn't the case...I get something like:
f_d_flind@bedrdck.cflintstone
This isn't a real example, since I don't want to publish real email
addresses here. But, it appears to be overwriting the first entry
(v_email) instead of tabbing over across the page till it hits the \n.
If I comment out each printf statement except for one, they all work
individually..just blows when run all together. Can someone give me a
hint as to what's going wrong...or links to good info on this? I can't
find any good examples on the newsgroups or books so far on this.
This is just part of a program I'm writing, I'll be parsing for all
kinds of things in the name, but, this is the first section I'm
tackling.
Thanks in advance!!
Chilecayenne
On 1 Sep 2004 14:25:53 -0700 in comp.lang.awk, chilecayenne@yahoo.com
(cayenne) wrote:
>HI all,
>I'm a bit of an awk newbie. I'm trying to use some conditional
>statements to generate printf statements to print columns on a page. As
>I read the printf info, it said it didn't do a linefeed till you
>explicitly put one in '\n'.
Correct.
>Here's an example of what I'm trying to do on an input of email
>addresses. I'm trying in this section to split off the username before
>the @ sign. And if there are underscores (_) in the username, I find
>out if they are first and middle initials and a last name...and I want
>to print out in columns the email address, the first name or first
>initial, the middle initial or last name, and the last name if there
>are a first name/first initial and middle initial.
...
>The file picks off the email from one of three columns...and it does
>this perfectly. So, let's say the input in the v_email section is like
>
>f_d_flintstone@bedrock.com
>mick_jagger@stones.com
>john_d_doe@dead.zone
>
>I'd expect the output to be
>
>f_d_flintstone@bedrock.com f d flintstone
>mick_jagger@stones.com mick jagger
>john_d_doe@dead.zone john d doe
Exactly what gawk produces.
>But, this isn't the case...I get something like:
>f_d_flind@bedrdck.cflintstone
Looks like your version of awk may have a problem. Need more info.
What command are you using to run awk? Which awk and version are you
using, under which shell and version, and OS and version?
--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada
Brian.Inglis@CSi.com (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
fake address use address above to reply
cayenne wrote:
> HI all,
> I'm a bit of an awk newbie. I'm trying to use some conditional
> statements to generate printf statements to print columns on a page. As
> I read the printf info, it said it didn't do a linefeed till you
> explicitly put one in '\n'.
>
> Here's an example of what I'm trying to do on an input of email
> addresses. I'm trying in this section to split off the username before
> the @ sign. And if there are underscores (_) in the username, I find
> out if they are first and middle initials and a last name...and I want
> to print out in columns the email address, the first name or first
> initial, the middle initial or last name, and the last name if there
> are a first name/first initial and middle initial.
>
> I'm trying to do:
>
> BEGIN {FS="~";}
> {
>
> v_email = ""
> v_name = ""
> v_first_name = ""
> v_first_initial = ""
> v_middle_initial = ""
> v_last_name = ""
>
> if ($10 !="" && /@/){
> v_email = $10}
> else if ($11!="" && /@/){
> v_email = $11}
> else if ($12 !="" && /@/){
> v_email = $12}
>
>
> split(v_email,v_email_parts,"@")
>
> v_name = v_email_parts[1]
>
> #First, test for under score splitter
>
> if (match(v_name,/_/)){
>
> split(v_name,v_name_parts,"_")
>
> printf("%s\t",v_email)
>
> if (length(v_name_parts[1]) == 1){
> v_first_initial = v_name_parts[1]
> printf("%s\t",v_first_initial)
> }
> else {
> v_first_name = v_name_parts[1]
> printf("%s\t",v_first_name)
> }
>
> if (length(v_name_parts[2]) == 1){
> v_middle_initial = v_name_parts[2]
> printf("%s\t",v_middle_initial)
> }
> else {
> v_last_name = v_name_parts[2]
> printf("%s\t",v_last_name)
> }
>
> if (length(v_name_parts[3]) > 0){
> v_last_name = v_name_parts[3]
> printf("%s\t",v_last_name)
> }
> printf("\n")
>
> }
> }
>
> The file picks off the email from one of three columns...and it does
> this perfectly. So, let's say the input in the v_email section is like
>
> f_d_flintstone@bedrock.com
> mick_jagger@stones.com
> john_d_doe@dead.zone
>
> I'd expect the output to be
>
> f_d_flintstone@bedrock.com f d flintstone
> mick_jagger@stones.com mick jagger
> john_d_doe@dead.zone john d doe
>
> But, this isn't the case...I get something like:
> f_d_flind@bedrdck.cflintstone
The above code shouldn't produce that given the input you show. Have you
tried getting rid of some of the printfs to narrow it down to exactly
which printf(s) cause the problem?
Two possibilities are that your actual input file either:
a) contains control characters which could cause the output to look
jumbled, or
b) contains empty lines or others which don't contain a "@", in which
case your initial tests for setting v_email would fail and you'd fall into
the "split" with v_email set to "", and I don't know what would happen
with the resultant invalid array accesses you do after that.
For "a", which I think is the most likely problem, you just need to
check your input. For "b", you should really structure your code as:
BEGIN{ ... }
/@/ { ... }
rather than just:
BEGIN{ ... }
{ ... }
to make sure you're only processing lines with an "@" symbol (presumably
email addresses).
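A minimal sketch of that pattern-guarded layout, run on made-up sample records. The data and the choice of field 3 are assumptions for illustration only (the original script looks at fields 10 to 12):

```shell
# Hypothetical sample: three "~"-separated records, only the first has an "@".
input='a~b~fred_flintstone@bedrock.com
~~
a~b~no email here'

out=$(printf '%s\n' "$input" | awk -F'~' '
/@/ {                       # the action runs only for records containing "@"
    split($3, parts, "@")   # parts[1] is the username before the "@"
    print parts[1]
}')
printf '%s\n' "$out"
```

The empty record and the one without an address never reach the action, so there is no need to guess what split would do with an empty v_email.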
An unrelated enhancement you might want to consider is to change this:
if (match(v_name,/_/)){
split(v_name,v_name_parts,"_")
to this:
num_parts = split(v_name,v_name_parts,"_")
if (num_parts > 1){
i.e. just check the value returned from split to see if there is an "_"
rather than having to call a separate "match" function first.
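As a small sketch of that idea, here is split's return value used as the only test, with two made-up usernames as input (the names and the message text are illustrative assumptions):

```shell
out=$(printf '%s\n' 'mick_jagger' 'cher' | awk '
{
    n = split($0, parts, "_")    # split returns the number of pieces it made
    if (n > 1)
        print $0 ": " n " parts"
    else
        print $0 ": no underscore"
}')
printf '%s\n' "$out"
```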
You also don't need to test for whether the last name is in the 2nd or
3rd position because you can just do:
v_last_name = v_name_parts[num_parts]
You should probably revisit the way you're assigning v_last_name anyway
since your current method would, given an input address of
"jim_bob_jones@whatever.com", set the first name to "jim" and the last
name to "jones" but completely ignore the "bob" (actually it would save
that as the last name then over-write it).
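One possible way to act on both points, sketched with hypothetical variable names (first, middle, last) and bare usernames as input. It treats the first piece as the first name or initial, the last piece as the last name, and everything in between as the middle, which is an assumption about the desired output rather than something the thread settles:

```shell
out=$(printf '%s\n' 'f_d_flintstone' 'mick_jagger' 'jim_bob_jones' | awk '
{
    n = split($0, parts, "_")
    first = parts[1]
    last  = (n > 1) ? parts[n] : ""      # last piece, wherever it falls
    middle = ""
    for (i = 2; i < n; i++)              # anything between first and last
        middle = middle (middle == "" ? "" : " ") parts[i]
    printf("%s\t%s\t%s\n", first, middle, last)
}')
printf '%s\n' "$out"
```

With this layout every row has the same three tab-separated columns, and "bob" in jim_bob_jones is kept as the middle part instead of being overwritten.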
Hope that helps,
Ed.
Brian Inglis <Brian.Inglis@SystematicSW.Invalid> wrote in message
news:<behcj0tujq3v7f6gqr062m585833228prv@4ax.com>...
> On 1 Sep 2004 14:25:53 -0700 in comp.lang.awk, chilecayenne@yahoo.com
> (cayenne) wrote:
>
> Correct.
> ...
> Exactly what gawk produces.
>
> Looks like your version of awk may have a problem. Need more info.
> What command are you using to run awk? Which awk and version are you
> using, under which shell and version, and OS and version?
Hi Brian, thank you very much for your reply!!
I'm running Gentoo Linux, with the gentoo sources kernel:
linux-2.4.20-gentoo-r5.
awk --version gives me:
GNU Awk 3.1.3
Copyright (C) 1989, 1991-2003 Free Software Foundation.
I'm a little new to the differences with awk, gawk, and nawk...but, just
to check things a little further I did a look in /bin to find that awk
is a link to gawk on my system:
/bin/awk -> gawk-3.1.3
I'm using the following to run my script:
cat white_pages.csv | awk -f phone1.awk | more
white_pages.csv is my file I'm picking off the email addresses of, and
phone1.awk is my script file. I'm just using more to scroll down the
results to look at them onscreen for now.
Thanks for any insight and suggestions you can help me with! I really
like working with awk so far...but, it is easy to stumble as you progress
to slightly more complex things.
CC
cayenne wrote:
<snip>
> I'm using the following to run my script:
> cat white_pages.csv | awk -f phone1.awk | more
This is commonly called "UUOC" (Useless Use Of Cat) since awk can take a
file name argument. Do this instead:
awk -f phone1.awk white_pages.csv | more
Regards,
Ed.
In article <2deb3d1.0409021126.2c6446fc@posting.google.com>,
chilecayenne@yahoo.com (cayenne) wrote:
> Ed Morton <morton@lsupcaemnt.com> wrote in message
> news:<DcydnafNxcKtu6rcRVn-jA@comcast.com>...
> <snip>
>
> Thanks for the reply Ed.
> I've gone through and commented out all but one of the printf's...each
> one by itself works just fine.
>
> Yeah, I know I need to clean up the code, and had thought about the
> middle name vs. middle initial...this is just a first run through as I
> started to refine it...and got stuck with the printing problem at this
> early a stage.
>
> This file is a csv from MS Excel. I'll try to check for special
> characters...maybe run a dos2unix on it...But, like I said, if I just
> do one printf, it works...each one individually works...but, if I
> start to use 2 or more of them to spit things out in columns, it mixes
> them all into one line.
>
> I'll check on the special characters tho...
> Any other suggestions greatly appreciated!!
> :-)
>
> CC
Change the command line to
awk -f phone1.awk white_pages.csv | cat -vte | more
The 'cat -vte' will tell you if there are any invisible characters and
especially if there are <CR><LF> pairs by displaying ^M for the <CR>
values and $ for the <LF>
Bob Harris
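If 'cat -vte' does show ^M at the end of each line, one possible remedy is to strip the trailing carriage return inside the awk script before using the fields. Nobody in the thread confirms that this is the cause, so treat the following as a sketch under that assumption, using a single made-up CRLF-terminated record with the address in the last field:

```shell
# Hypothetical record ending in <CR><LF>, as a DOS-format CSV export would.
out=$(printf 'x~f_d_flintstone@bedrock.com\r\n' | awk -F'~' '
{
    sub(/\r$/, "")               # drop a trailing carriage return, if present
    split($2, parts, "@")        # fields are re-split after $0 is modified
    printf("%s\t%s\n", $2, parts[1])
}')
printf '%s\n' "$out"
```

Without the sub, the carriage return would stay attached to the last field, and a terminal would move the cursor back to the start of the line before printing the following columns, which would look like the later columns overwriting the address.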
In article <2deb3d1.0409030642.1a10d901@posting.google.com>,
cayenne <chilecayenne@yahoo.com> wrote:
...
>ps. Just curious, I've seen mentioned in responses here and other
>forums where people get irritated about using cat 'too much'. Just
>curious as to why?
For some reason, newbies often write:
cat somefile | someutil ...
and this is unnecessary, and, in theory at least, wasteful. I won't go
into the various details as to why it is unnecessary and wrong (STFW), but
I will take the opportunity to say that, many, many moons ago, I saw the
following in an MSDOS manual:
type file | more
and so, as with most things that are wrong in computing, it is all Bill
Gates's fault. Note that the above is particularly bad in DOS, which
doesn't have any sort of multitasking and thus has only fake pipes.
cayenne wrote:
<snip>
> ps. Just curious, I've seen mentioned in responses here and other
> forums where people get irritated about using cat 'too much'. Just
> curious as to why?
It's not so much about getting irritated, it's more helping people learn
when they don't need to use it. The common mistake newcomers make is to
use:
cat file | some_command
when "some_command" can take a file argument and so the above could be
written as:
some_command file
or it could just read redirected input as:
some_command < file
and save an external command (cat) and a pipe.
Let's say you have a kid who puts on their shoes, then takes them off and
puts on their socks then puts their shoes back on. Wouldn't you tell them
that they don't actually need to put on their shoes the first time? After
seeing several kids do this, wouldn't you get a tad irritated and wonder
who the heck is out there telling kids that that's the right way to do
things? It's kinda like that....
Ed.
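The three forms can be compared side by side. This sketch assumes mktemp is available and uses a throwaway two-line file with made-up contents; the awk program is just a placeholder that prints the second "~"-separated field:

```shell
tmp=$(mktemp)                       # scratch file standing in for the CSV
printf 'a~b\nc~d\n' > "$tmp"

with_cat=$(cat "$tmp" | awk -F'~' '{ print $2 }')     # extra process and pipe
with_arg=$(awk -F'~' '{ print $2 }' "$tmp")           # awk opens the file itself
with_redirect=$(awk -F'~' '{ print $2 }' < "$tmp")    # shell redirects stdin

rm -f "$tmp"
printf '%s\n' "$with_cat" "$with_arg" "$with_redirect"
```

All three produce the same output; the last two simply avoid starting cat.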
Kenny McCormack wrote:
>
>
> For some reason, newbies often write:
>
> cat somefile | someutil ...
I often do (although I'm not a newbie), since I think in terms of
pipelines and this way it goes nicely from left to right.
> and this is unnecessary, and, in theory at least, wasteful. I won't go
> into the various details as to why it is unnecessary and wrong (STFW), but
On occasion, it can be better. Copying a large file from one disk to
another is often better achieved with:
cat file1 | cat > file2
(or better still dd), since you're reading and writing in parallel like
this.
-Ed
--
(You can't go wrong with psycho-rats.) (er258)(@)(eng.cam)(.ac.uk)
/d{def}def/f{/Times findfont s scalefont setfont}d/s{10}d/r{roll}d f 5/m
{moveto}d -1 r 230 350 m 0 1 179{1 index show 88 rotate 4 mul 0 rmoveto}
for /s 15 d f pop 240 420 m 0 1 3 { 4 2 1 r sub -1 r show } for showpage
In article <q5idnS1ImonjMqXcRVn-rQ@comcast.com>,
Ed Morton <morton@lsupcaemnt.com> wrote:
>
>
>E. Rosten wrote:
>
>Then presumably writing it as:
>
> cat somefile | cat | someutil
>
>is even better since it extends even further from left to right ;-).
>It's fine to think in terms of pipelines, but I can't imagine why you'd
>want to introduce commands gratuitously at the head or tail of a pipeline.
Indeed. The standard Randy Schwartz answer to "but I like to see my data
go from left to right" is:
< file someutil
which works in any Bourne-ish shell.
>
>I've never come across that situation. When you say it's better - do you
>mean faster, or more reliable, or something else? Can you go into any
>more detail on why it's better as the benefits aren't intuitively obvious.
I don't think the claim holds water in any general sense. It *might* be
true on some particular piece of hardware under some particular set of
conditions.