no nawk on gentoo? (and nawk/bwk difference?)
| Zhang Weiwu 2005-01-06, 8:56 pm |
Newbie learning Gentoo here. I just spent 5 hours googling around and testing,
only to find that gawk fails to support multibyte characters in length()
and substr(). This is a real shock to me, as I think this is BASIC
functionality an Asian user might need, and gawk has gone ages without
supporting it. Are there really so few Asian users?!
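To make the complaint concrete, the difference between a byte-oriented and a character-oriented length()/substr() can be sketched in Python (used here only for illustration; it is not awk). The sketch assumes the text is UTF-8 encoded; a `bytes` object plays the role of an awk that counts bytes, and a `str` the role of one that counts characters.

```python
# Illustration only: bytes stand in for a byte-counting awk,
# str for a character-counting (multibyte-aware) one.
text = "中文字符"                  # four CJK characters
raw = text.encode("utf-8")         # each takes 3 bytes in UTF-8

print(len(raw))                    # 12  (what a byte-counting length() sees)
print(len(text))                   # 4   (what a character-counting length() sees)

# substr(s, 1, 2) taken two ways
char_slice = text[0:2]             # first two characters
byte_slice = raw[0:2]              # first two bytes: only part of one character
print(char_slice)                  # 中文

# The 2-byte slice is not valid UTF-8 by itself, so strict decoding fails.
try:
    byte_slice.decode("utf-8")
except UnicodeDecodeError as err:
    print("byte-based substr split a character:", err.reason)
```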
Now I'm happy to find that nawk supports multibyte characters:
http://www.cryst.bbk.ac.uk/CCSG/uni.../awk.html#sect3
> The *awk* command differs from the commands *oawk* and *gawk* in that
> *awk* conforms to the X/Open Portability Guide, Issue 4 (XPG4). The
> *awk* command is therefore capable of handling multibyte characters
> that occur in coded character sets defined for some native languages.
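The quoted passage ties multibyte handling to the coded character set in use, which an XPG4-style awk normally learns from the locale. As an illustrative Python sketch (GBK is an assumption, chosen only as one example of a legacy Chinese encoding), the same two characters occupy a different number of bytes in each encoding, and bytes decoded with the wrong encoding do not give the original text back.

```python
text = "中文"

for encoding in ("utf-8", "gbk"):
    raw = text.encode(encoding)
    # Decoding with the same encoding recovers the characters.
    print(f"{encoding}: {len(raw)} bytes, {len(raw.decode(encoding))} characters")
# utf-8: 6 bytes, 2 characters
# gbk: 4 bytes, 2 characters

# Reading GBK bytes as if they were UTF-8 does not recover the text.
gbk_bytes = text.encode("gbk")
misread = gbk_bytes.decode("utf-8", errors="replace")
print(misread == text)             # False
```

The character count only comes out right when the program is told the correct encoding, which is the role the locale plays for a multibyte-aware awk.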
Now to the question: _esearch nawk_ returns no results. Has nawk not been
ported to Gentoo, or is it packaged under a different name?
Furthermore, it seems nawk comes from Bell Labs. I also happened to find
another version of awk, bwk, which is likewise from Brian Kernighan at
Bell Labs (and isn't in Portage either). I want to install the awk that
the CCSG document mentioned above claims can handle multibyte characters.
Which one is it?
OT:
Truly, migrating to awk is very painful. Four years ago I was using
JScript (via Windows Script Files) for my text processing, mostly my
historical research data. JScript supports Unicode and (perhaps not as
good as awk's) regular expressions. Then I switched to Linux, and for
four years all my old Chinese text files have been a headache. Awk could
not process them, and vim even destroyed several text files simply by
opening and saving them (I guess because some rare Chinese ideographs
are not covered by vim; I do Chinese history research in my spare time).
Today, during the New Year holiday, I set aside a whole 5-hour block,
determined to start processing those old files on Linux, and ended up
with hours of debugging and googling, only to discover more gawk
multibyte incompatibilities.
Sorry; since I will keep using Linux despite these problems, this is
just a useless complaint to make myself feel less bad...
| Zhang Weiwu 2005-01-06, 8:56 pm |
Jürgen Kahrs wrote:
> Zhang Weiwu wrote:
>
> I am quite happy that finally someone dares
> to ask the question. Go on, I am eagerly
> awaiting comments.
One more question: can I avoid this problem by using another language (in
my case, Perl)? I am not sure whether Perl can deal with multibyte
characters, but I would rather tap the knowledge of this group than spend
another 5 hours finding out :( I have lots of files to process, and
substr/index/length will be used many times.
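On the Perl question: any language whose string type holds decoded characters avoids the byte-counting problem, provided each file is decoded with its actual encoding. The sketch below uses Python rather than Perl; the helpers awk_length, awk_index and awk_substr are hypothetical names that only mimic awk's 1-based length/index/substr semantics, applied to characters instead of bytes.

```python
from typing import Optional


def awk_length(s: str) -> int:
    # Number of characters, not bytes.
    return len(s)


def awk_index(s: str, t: str) -> int:
    # 1-based position of the first occurrence of t in s, or 0 if absent.
    return s.find(t) + 1


def awk_substr(s: str, m: int, n: Optional[int] = None) -> str:
    # Characters at positions m .. m+n-1 (1-based), clipped to the string,
    # or from m to the end when n is omitted.
    start = max(m, 1)
    end = len(s) + 1 if n is None else min(m + n, len(s) + 1)
    if end <= start:
        return ""
    return s[start - 1:end - 1]


line = "明史卷一百"                  # five CJK characters
print(awk_length(line))            # 5
print(awk_index(line, "卷"))        # 3
print(awk_substr(line, 3, 2))      # 卷一
```

To use these on real files, open each file with its encoding stated explicitly, for example `open(path, encoding="gbk")` if the files happen to be GBK (an assumption; use whatever encoding the files were actually saved in).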
| Zhang Weiwu 2005-01-08, 3:55 am |
Zhang Weiwu wrote:
> Newbie learning gentoo. I just spend 5 hours to google around and test
Sorry, this message was supposed to go to the gentoo-user list. But it's
not very off-topic for this group, right? And I am a newbie learning awk,
not Gentoo.
| Jason Gurtz 2005-04-08, 8:56 pm |
On 1/3/2005 13:27, Zhang Weiwu wrote:
> Newbie learning gentoo. I just spend 5 hours to google around and test
> and only to find gawk failed to support multibyte character in length()
> and substr(). This is really a shock to me, as I think this is the BASIC
> function that an Asian user might need, and gawk has been ages old not
> supporting it. Are there so few Asian users!!
In general, UTF-8 support is lacking in many areas of the Linux/GNU
environment. Perhaps Plan 9 and its awk would help you out with this
specific task?
It now runs under Xen on Linux. There's a bit of a learning curve,
but it supports Unicode very well :)
~Jason