Code Comments

no nawk on gentoo? (and nawk/bwk difference?)
Newbie learning Gentoo here. I just spent 5 hours googling around and testing,
only to find that gawk fails to support multibyte characters in length()
and substr(). This is a real shock to me, as I think this is BASIC
functionality that an Asian user would need, and gawk has gone ages without
supporting it. Are there really so few Asian users?!

Now I'm just happy to find that nawk supports multibyte characters:
http://www.cryst.bbk.ac.uk/CCSG/uni.../awk.html#sect3

> the *awk* command differs from the commands *oawk* and *gawk* in that
> *awk* conforms to the x/open portability guide, issue 4 (xpg4). the
> *awk* command is therefore capable of handling multibyte characters
> that occur in coded character sets defined for some native languages.


Now to the question: _esearch nawk_ returns no results. Has nawk not been
ported to Gentoo, or is it packaged under a different name?

Furthermore, it seems nawk comes from Bell Labs. I also happened to find
another version of awk, bwk, which is also by Brian Kernighan of
Bell Labs (and isn't in Portage either). I want to install whichever awk
the CCSG book (mentioned above) claims can support multibyte
characters. Which one is it?

OT:
Honestly, the migration to awk is very painful. Four years ago I was using
JScript (in Windows Script Files) for my text processing (mostly my
historical research data); JScript supports Unicode and regular
expressions (though perhaps not as good as awk's). Since I switched to
Linux 4 years ago, all my old Chinese text files have been a headache. Awk
could not process them, and vim even destroyed several text files simply by
opening and saving them (I guess because they contain rare Chinese
ideographs that vim might not cover; I do Chinese history research in my
spare time). Today, during the New Year holiday, I set aside a whole 5-hour
block, determined to start processing those old files on Linux, and ended
up with hours of debugging and googling around only to discover more
gawk-multi-byte-incompatible problems. Sorry; I will go on using Linux
despite these problems, and this is just a useless complaint to make myself
feel less bad ...

Zhang Weiwu
01-03-05 08:55 PM


Re: no nawk on gentoo? (and nawk/bwk difference?)
Zhang Weiwu wrote:
> Newbie learning Gentoo here. I just spent 5 hours googling around and testing,

Sorry, this message was supposed to go to the gentoo-user list. But it's not
very off-topic for this group, right? And I'm a newbie learning awk, not Gentoo.

Zhang Weiwu
01-03-05 08:55 PM


Re: no nawk on gentoo? (and nawk/bwk difference?)
Zhang Weiwu wrote:
> Newbie learning Gentoo here. I just spent 5 hours googling around and testing,
> only to find that gawk fails to support multibyte characters in length()
> and substr(). This is a real shock to me, as I think this is BASIC
> functionality that an Asian user would need, and gawk has gone ages without
> supporting it. Are there really so few Asian users?!

I am quite happy that finally someone dares
to ask the question. Go on, I am eagerly
awaiting comments.

> Now I'm just happy to find that nawk supports multibyte characters:
> http://www.cryst.bbk.ac.uk/CCSG/uni.../awk.html#sect3
> 

Hmm, really? I don't trust this source.
For example, this source writes "begin"
instead of "BEGIN". This description is
incomplete and partly wrong.

What really counts is this one:

http://www.opengroup.org/onlinepubs...99/xcu/awk.html

> Now to the question: _esearch nawk_ returns no results. Has nawk not been
> ported to Gentoo, or is it packaged under a different name?

nawk is not part of the POSIX standard.
nawk is traditionally supported by many
Linux systems and all SunOS derivatives.

> around only to discover more gawk-multi-byte-incompatible problems.

What is "multi-byte-incompatible"?
Do you expect AWK to behave like JScript (which is a
Microsoft variant of JavaScript, as far as I know)?

Jürgen Kahrs
01-03-05 08:55 PM


Re: no nawk on gentoo? (and nawk/bwk difference?)
Zhang Weiwu wrote:

> Sorry, this message was supposed to go to the gentoo-user list. But it's not
> very off-topic for this group, right? And I'm a newbie learning awk, not Gentoo.

This is definitely on-topic.
Go on asking, otherwise we would never
start solving these problems.

Jürgen Kahrs
01-03-05 08:55 PM


Re: no nawk on gentoo? (and nawk/bwk difference?)
Jürgen Kahrs wrote:
> Zhang Weiwu wrote:
> 
>
>
> I am quite happy that finally someone dares
> to ask the question. Go on, I am eagerly
> awaiting comments.

I hope the GNU people don't eat me alive for this question ;)
But I am not a developer who can contribute on this topic; I can only
ask questions :(
 
>
>
> What is "multi-byte-incompatible"?
> Do you expect AWK to behave like JScript (which is a
> Microsoft variant of JavaScript, as far as I know)?

At least all JScript functions distinguish multibyte and single-byte
characters correctly, and there is always an option for substr(),
indexOf(), length()... to specify whether or not the string should be
treated as Unicode (although Microsoft understands Unicode as UTF-16LE). I
dislike JScript itself, but it did just what I wanted, and it handles
rare Chinese ideographs as well.

On Windows, JScript code can be put into a .wsf file and called from the
CMD command line to process text files.

Zhang Weiwu
01-04-05 08:55 AM


Re: no nawk on gentoo? (and nawk/bwk difference?)
Jürgen Kahrs wrote:
> Zhang Weiwu wrote:
> 
>
>
> I am quite happy that finally someone dares
> to ask the question. Go on, I am eagerly
> awaiting comments.

One more question: can I sidestep this problem by using another language (in
my case, Perl)? I am not sure whether Perl can deal with multibyte characters,
but I'd rather tap the knowledge of this group than spend another 5
hours finding out :( I have lots of files to process, and
substr/index/length will be used many times.
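Whichever language is chosen, a common approach is to decode each file into characters on input and then apply character-based string functions. As an illustration in Python 3 rather than Perl, here is a sketch of awk-style 1-based length/index/substr helpers; the function names, the `process` driver, and the choice of encoding (e.g. GB18030 for older Chinese files) are assumptions, not anything from this thread.

```python
# Hypothetical awk-style helpers operating on characters rather than bytes.
# Positions are 1-based as in awk.
from typing import Optional


def awk_length(s: str) -> int:
    """Number of characters (code points), like awk's length(s)."""
    return len(s)


def awk_index(s: str, t: str) -> int:
    """1-based position of the first t in s, or 0 if t does not occur."""
    return s.find(t) + 1


def awk_substr(s: str, m: int, n: Optional[int] = None) -> str:
    """At most n characters of s starting at 1-based position m.

    Simplification: m below 1 is treated as 1 and negative n as 0; real awk
    implementations may handle these edge cases differently.
    """
    start = max(m, 1) - 1
    if n is None:
        return s[start:]
    return s[start:start + max(n, 0)]


def process(path: str, needle: str, encoding: str = "utf-8") -> None:
    """Print length, position of needle, and first two characters per line."""
    # Decoding on input means every line is already a sequence of characters.
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.rstrip("\n")
            print(awk_length(line), awk_index(line, needle), awk_substr(line, 1, 2))
```

For example, `process("notes.txt", "史", encoding="gb18030")` (a hypothetical file) would print `4 3 中文` for a line reading 中文史料, because every count is in characters rather than bytes.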

Zhang Weiwu
01-04-05 08:55 AM


Re: no nawk on gentoo? (and nawk/bwk difference?)
On Tue, 04 Jan 2005 02:27:12 +0800
Zhang Weiwu <zhangweiwu@realss.com> wrote:

> Newbie learning Gentoo here. I just spent 5 hours googling around and testing,
> only to find that gawk fails to support multibyte characters in length()
> and substr(). This is a real shock to me, as I think this is BASIC
> functionality that an Asian user would need, and gawk has gone ages without
> supporting it. Are there really so few Asian users?!
>
You could try Tcl, which supports Unicode nicely and has strong text-processing
features, even if they differ from awk's.

--Marc

Marc Vertes
01-04-05 01:55 PM


AWK archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
Forum Jump:
All times are GMT.
Copyright CodeComments.com 2004-2006
