Author String handling
Colin

2005-04-08, 3:55 am

I'm trying to count the number of leading spaces in lines from a file using:

tfc=0
w=$0
for(i=1;i<=NF;i++) if (w[i] == ' ') tfc++ else break}

but this generates a syntax error

What am I doing wrong ??

Any help ...

Thank you

Colin
Bob Harris

2005-04-08, 3:55 am

In article <33c7bd32.0504071709.5711f3e3@posting.google.com>,
colinhay66@hotmail.com (Colin) wrote:

> I'm trying to count the number of leading spaces in lines from a file using:
>
> tfc=0
> w=$0
> for(i=1;i<=NF;i++) if (w[i] == ' ') tfc++ else break}
>
> but this generates a syntax error
>
> What am I doing wrong ??
>
> Any help ...
>
> Thank you
>
> Colin


w is not an array, so you cannot index it. Also, I'm not sure from the
code what you are trying to do, so I'll try to guess from your
text description.

awk '
{ sub(/[^ ].*/,"",$0); tfc += length($0) }
END { print tfc }
' input.file

I took $0, replaced everything from the first non-space character to the
end of the line with nothing, then used length to count the leading
spaces that remained.

Bob Harris
Ed Morton

2005-04-08, 3:55 am



Colin wrote:

> I'm trying to count the number of leading spaces in lines from a file using:
>
> tfc=0
> w=$0
> for(i=1;i<=NF;i++) if (w[i] == ' ') tfc++ else break}
>
> but this generates a syntax error
>
> What am I doing wrong ??
>
> Any help ...
>
> Thank you
>
> Colin


The answers you got to this question at comp.unix.shell were correct.
Did you have a follow-up question?

Ed
Patrick TJ McPhee

2005-04-08, 3:55 am

In article <33c7bd32.0504071709.5711f3e3@posting.google.com>,
Colin <colinhay66@hotmail.com> wrote:

% I'm trying to count the number of leading spaces in lines from a file using:
%
% tfc=0
% w=$0
% for(i=1;i<=NF;i++) if (w[i] == ' ') tfc++ else break}

You have unbalanced braces and missing semi-colons. w is a scalar, but
you treat it as an array. You use ' as a string delimiter, which it
isn't.
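
For reference, the original character-by-character loop can be made to run by addressing those points. awk has no string indexing, so substr() stands in for w[i]; the string constant uses double quotes; and the loop runs to length(w) rather than NF, since NF counts fields, not characters. This is only a sketch of one possible repair; the function name count_leading and the file name sample.txt are made up for illustration.

```shell
# Hypothetical helper: total number of leading spaces over all lines of a file.
count_leading() {
    awk '
    {
        w = $0
        # substr(w, i, 1) picks out the i-th character of the scalar w
        for (i = 1; i <= length(w); i++) {
            if (substr(w, i, 1) == " ")
                tfc++
            else
                break
        }
    }
    END { print tfc + 0 }   # + 0 prints 0 instead of an empty line when nothing was counted
    ' "$1"
}

# Usage (sample.txt is a made-up file name):
#   count_leading sample.txt
```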

You could solve this using your approach like this:

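# FS = "" makes each character its own field in gawk, mawk and nawk;
# POSIX leaves the behaviour of an empty FS unspecified.
BEGIN { FS = ""; tfc = 0 }
END { print tfc }
{ for (i = 1; i <= NF; i++) if ($i == " ") tfc++; else break }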

I would do it like this:

BEGIN { tfc = 0 }
END { print tfc }
match($0, /^ +/) { tfc += RLENGTH }

which is about 3 times faster using gawk or mawk on this machine, and about
10 times faster using nawk.

Others don't like match() and might do it like this:

BEGIN { tfc = 0 }
END { print tfc }
sub(/[^ ].*/, "") { tfc += length }

which seems to be slightly slower, but still quite a bit faster than going
at it character-by-character.
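
One behavioural difference between the match() and sub() versions may matter. With sub() used as the pattern, a line made up only of spaces contains no non-space character, so sub() returns 0 and the action is skipped, whereas match() still counts those spaces. Whether such a line has "leading" spaces is a matter of definition. A small sketch to see the difference, assuming made-up function names:

```shell
# Hypothetical wrappers around the two scripts above.
count_match() { awk 'match($0, /^ +/) { tfc += RLENGTH } END { print tfc + 0 }' "$1"; }
count_sub()   { awk 'sub(/[^ ].*/, "") { tfc += length($0) } END { print tfc + 0 }' "$1"; }

# Variant that calls sub() inside the action, so all-space lines are counted as well.
count_sub_all() { awk '{ sub(/[^ ].*/, ""); tfc += length($0) } END { print tfc + 0 }' "$1"; }
```

On a file that contains all-space lines, count_match and count_sub_all should agree, while count_sub reports a smaller total.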
--

Patrick TJ McPhee
North York Canada
ptjm@interlog.com
Loki Harfagr

2005-04-08, 3:56 pm

On Thu, 07 Apr 2005 18:09:11 -0700, Colin wrote:

> I'm trying to count the number of leading spaces in lines from a file using:
>
> tfc=0
> w=$0
> for(i=1;i<=NF;i++) if (w[i] == ' ') tfc++ else break}
>
> but this generates a syntax error
>
> What am I doing wrong ??


Now you know that :-)

> Any help ...


Then here is another way to do it, just to be exhaustive
and for the fun of playing with the toolbox :-)

$ sed 's/[^ ].*//' testfile|tr -d '\n' |wc -c

Well, it would choke on files that are not LF-terminated, of course ...
Michael Tosch

2005-04-09, 3:55 pm

In article <425690a6$0$32081$626a14ce@news.free.fr>, Loki Harfagr <loki@DarkDesign.free.fr> writes:
> On Thu, 07 Apr 2005 18:09:11 -0700, Colin wrote:
>

Is this an awk script?
>
> Now you know that :-)
>
>
> Then here is another way to do it, just to be exhaustive
> and for the fun of playing with the toolbox :-)
>
> $ sed 's/[^ ].*//' testfile|tr -d '\n' |wc -c
>


With awk this becomes:

awk '{sum+=match($0"x","[^ ]")-1}END{print sum}' testfile

and can be stripped down to print the number per line:

awk '{print match($0"x","[^ ]")-1}' testfile
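
The appended "x" appears to act as a sentinel: match() returns the position of the first non-space character, so subtracting 1 gives the number of spaces before it, and the extra "x" guarantees that there is something to match even on empty or all-space lines. A sketch comparing the result with and without it (the function names are invented for illustration):

```shell
# Per-line count with the sentinel, as in the one-liner above.
per_line_sentinel() { awk '{ print match($0 "x", "[^ ]") - 1 }' "$1"; }

# Same without the sentinel: match() returns 0 when the line has no
# non-space character, so empty and all-space lines come out as -1.
per_line_plain() { awk '{ print match($0, "[^ ]") - 1 }' "$1"; }
```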



--
Michael Tosch @ hp : com


Loki Harfagr

2005-04-09, 3:55 pm

On Sat, 09 Apr 2005 15:17:36 +0000, Michael Tosch wrote:

> Is this an awk script?


Oooops! So sorry!
I really don't know why I wrongly cross-posted here.
My only feeble beginning of an explanation is that my
conjunctivitis is getting worse and worse ...

Though no real harm done, since it gave you the opportunity
to give a good awk translation of the stuff :D)

>
> With awk this becomes:
> awk '{sum+=match($0"x","[^ ]")-1}END{print sum}' testfile
> and can be stripped down to print the number per line:
> awk '{print match($0"x","[^ ]")-1}' testfile


Thanx for this, and sorry again.

Copyright 2008 codecomments.com