no nawk on gentoo? (and nawk/bwk difference?)
Zhang Weiwu

2005-01-03, 3:55 pm

Newbie learning Gentoo. I just spent 5 hours googling around and testing,
only to find that gawk fails to support multibyte characters in length()
and substr(). This is really a shock to me, as I think these are BASIC
functions that an Asian user needs, and gawk has gone ages without
supporting them. Are there so few Asian users?!

Now I'm just happy to find that nawk supports multibyte characters:
http://www.cryst.bbk.ac.uk/CCSG/uni.../awk.html#sect3

> the *awk* command differs from the commands *oawk* and *gawk* in that
> *awk* conforms to the x/open portability guide, issue 4 (xpg4). the
> *awk* command is therefore capable of handling multibyte characters
> that occur in coded character sets defined for some native languages.



Now to the question: _esearch nawk_ returns no result. Hasn't nawk been
ported to Gentoo, or has it been ported under a different name?

Furthermore, it seems nawk comes from Bell Labs. I also happened to find
another version of awk, bwk, which is likewise by Brian Kernighan of
Bell Labs (and isn't in Portage either). I want to install the awk that
the CCSG book (mentioned above) claims can support multibyte characters.
Which one is it?

OT:
Truly, the migration to awk is very painful. Four years ago I was using
JScript (via Windows Script Files) for my text processing (mostly my
historical research data); JScript supports Unicode and regular
expressions (though perhaps not as good as awk's). Then I switched to
Linux, and for 4 years all my old Chinese text files have been a
headache. Awk could not process them, and vim even destroyed several
text files simply by opening and saving them (I guess because some rare
Chinese ideographs are not covered by vim; I do Chinese history research
in my spare time). Today, over the New Year holiday, I set aside a whole
5-hour block of time, determined to start processing those old files on
Linux, and ended up with hours of debugging and googling around only to
discover more gawk-multi-byte-incompatible problems.
Sorry. I will go on using Linux despite these problems; this is
just a useless complaint to make myself feel less bad ...
Zhang Weiwu

2005-01-03, 3:55 pm

Zhang Weiwu wrote:
> Newbie learning Gentoo. I just spent 5 hours googling around and testing,


Sorry, this message was supposed to go to the gentoo-user list. But it's
not very OT for this group, right? And I am a newbie learning awk, not Gentoo.
Jürgen Kahrs

2005-01-03, 3:55 pm

Zhang Weiwu wrote:
> Newbie learning Gentoo. I just spent 5 hours googling around and testing,
> only to find that gawk fails to support multibyte characters in length()
> and substr(). This is really a shock to me, as I think these are BASIC
> functions that an Asian user needs, and gawk has gone ages without
> supporting them. Are there so few Asian users?!


I am quite happy that finally someone dares
to ask the question. Go on, I am eagerly
awaiting comments.

> Now I'm just happy to find that nawk supports multibyte characters:
> http://www.cryst.bbk.ac.uk/CCSG/uni.../awk.html#sect3
>

Hmm, really? I don't trust this source.
For example, this source writes "begin"
instead of "BEGIN". This description is
incomplete and partly wrong.

What really counts is this one:

http://www.opengroup.org/onlinepubs...99/xcu/awk.html
> Now to the question: _esearch nawk_ returns no result. Hasn't nawk been
> ported to Gentoo, or has it been ported under a different name?


nawk is not part of the POSIX standard.
nawk is traditionally supported by many
Linux systems and all SunOS derivatives.

> around only to discover more gawk-multi-byte-incompatible problems.


What is "multi-byte-incompatible"?
You expect AWK to behave like JScript (which is a
Microsoft variant of JavaScript, as far as I know).
Jürgen Kahrs

2005-01-03, 3:55 pm

Zhang Weiwu wrote:

> Sorry, this message was supposed to go to the gentoo-user list. But it's
> not very OT for this group, right? And I am a newbie learning awk, not Gentoo.


This is definitely on-topic.
Go on asking, otherwise we would never
start solving these problems.
Zhang Weiwu

2005-01-04, 3:55 am

Jürgen Kahrs wrote:
> Zhang Weiwu wrote:
>
>
>
> I am quite happy that finally someone dares
> to ask the question. Go on, I am eagerly
> awaiting comments.


I hope the GNU people don't eat me for this question ;)
But I am not a developer who can contribute to this topic. I can only
ask questions :(

>
>
> What is "multi-byte-incompatible"?
> You expect AWK to behave like JScript (which is a
> Microsoft variant of JavaScript, as far as I know).


At least all JScript functions distinguish multibyte and single-byte
characters correctly, and there is always an option in substr(),
indexOf(), length()... to specify whether or not the string should be
treated as Unicode (although Microsoft understands Unicode as UTF-16LE).
I dislike JScript itself, but it just did what I wanted. And it deals
with rare Chinese ideographs as well.
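One detail relevant to the "rare Chinese ideographs" point: in a UTF-16 string model (which is what Microsoft's "Unicode" means here), ideographs outside the Basic Multilingual Plane, such as U+20000 in CJK Extension B, are stored as surrogate pairs, so a length counted in UTF-16 code units reports 2 for a single such character. The Python sketch below is an illustration, not JScript itself; the helpers `utf16_units` and `describe` are hypothetical. It compares code points, UTF-16 code units and UTF-8 bytes for a common and a rare ideograph.

```python
# Compare three ways of "counting characters" for CJK text.
# U+4E2D is a common ideograph in the BMP; U+20000 (CJK Extension B) lies
# outside the BMP, so UTF-16 needs a surrogate pair for it.
from typing import Tuple

def utf16_units(s: str) -> int:
    """Number of UTF-16 code units (what a UTF-16 string length reports)."""
    return len(s.encode("utf-16-le")) // 2

def describe(s: str) -> Tuple[int, int, int]:
    """Return (code points, UTF-16 code units, UTF-8 bytes)."""
    return len(s), utf16_units(s), len(s.encode("utf-8"))

common = "\u4e2d"     # 中
rare = "\U00020000"   # CJK Extension B ideograph

print(describe(common))  # (1, 1, 3)
print(describe(rare))    # (1, 2, 4)
```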

In Windows, JScript code can be put into a .wsf file and called from
the CMD command line to process text files.
Zhang Weiwu

2005-01-04, 3:55 am

Jürgen Kahrs wrote:
> Zhang Weiwu wrote:
>
>
>
> I am quite happy that finally someone dares
> to ask the question. Go on, I am eagerly
> awaiting comments.


One more question: can I avoid this problem by using another language
(in my case, Perl)? I am not sure whether Perl can deal with multibyte
text, but I prefer to tap the knowledge of this group rather than spend
another 5 hours finding out :( I have lots of files to process, and
substr/index/length will be used many times.
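The thread doesn't settle the Perl question. As a hedged illustration of the general approach in any Unicode-aware language (decode the file to characters once, then do substr/index/length on the decoded text), here is a Python sketch. The encoding name `gbk` is an assumption about the legacy files, and `awk_length`, `awk_index` and `awk_substr` are hypothetical helpers that mimic awk's 1-based conventions.

```python
# Character-based substr/index/length with awk's 1-based conventions.
# Assumption: input bytes are in a known legacy encoding (GBK here);
# decoding first makes every operation count characters, not bytes.
from typing import Optional

def awk_length(s: str) -> int:
    return len(s)

def awk_index(s: str, t: str) -> int:
    """1-based position of t in s, or 0 if absent (like awk's index())."""
    return s.find(t) + 1

def awk_substr(s: str, m: int, n: Optional[int] = None) -> str:
    """1-based substr(s, m[, n]); positions outside s are clipped."""
    start = max(m, 1)
    end = len(s) + 1 if n is None else min(m + n, len(s) + 1)
    return s[start - 1:end - 1] if end > start else ""

raw = "中华人民共和国".encode("gbk")  # stand-in for bytes read from a file
text = raw.decode("gbk")             # decode once, then work on characters
print(awk_length(text))              # 7
print(awk_index(text, "人民"))       # 3
print(awk_substr(text, 3, 2))        # 人民
```

For real files, reading with an explicit encoding (for example `open(path, encoding="gbk")`) gives decoded text directly.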
Marc Vertes

2005-01-04, 8:55 am

On Tue, 04 Jan 2005 02:27:12 +0800
Zhang Weiwu <zhangweiwu@realss.com> wrote:

> Newbie learning Gentoo. I just spent 5 hours googling around and testing,
> only to find that gawk fails to support multibyte characters in length()
> and substr(). This is really a shock to me, as I think these are BASIC
> functions that an Asian user needs, and gawk has gone ages without
> supporting them. Are there so few Asian users?!
>

You could try Tcl, which supports Unicode nicely and has strong
text-processing features, even if they differ from awk's.

--Marc

Copyright 2008 codecomments.com