Author Gawk FIELDWIDTHS and multibyte characters
Hermann Peifer

2008-03-16, 4:35 am

Hi,

It looks to me as though Gawk's FIELDWIDTHS extension is not aware of
multibyte characters; see my example below.

$ cat testdata
CDRegion Commune Site
SEVästsverige Hallands län Kungsbacka
SESmåland med öarna Västra Götalands länGöteborg
SEKronoberg Alvesta Stenungsund

$ file testdata
testdata: UTF-8 Unicode text

$ awk 'BEGIN{FIELDWIDTHS = "2 20 20 20"}{print $4}' testdata
Site
Kungsbacka
länGöteborg
Stenungsund

Can someone confirm?

Hermann
Jürgen Kahrs

2008-03-16, 10:01 pm

Hermann Peifer wrote:

> $ awk 'BEGIN{FIELDWIDTHS = "2 20 20 20"}{print $4}' testdata
> Site
> Kungsbacka
> länGöteborg
> Stenungsund
>
> Can someone confirm?


Yes, I just tried this example with Arnold's current
gawk-stable source tree at Savannah. The result is
the same as yours. Looks like you have found one more
of these painful multi-byte dark corners.
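
One way to see where the shift in the third record might come from is to count bytes rather than characters. The sketch below assumes that the columns in testdata were space-padded to 20 characters (the padding is not visible in the listing above) and uses a hypothetical helper, byte_len, that does not appear in the original post. In UTF-8, each of ä, å and ö occupies two bytes, so a splitter that counts bytes would use up its width of 20 before the padded column ends.

```shell
# Assumption: both columns were padded with spaces to exactly 20 characters.
region='Småland med öarna   '    # 20 characters, 2 of them two-byte
commune='Västra Götalands län'   # 20 characters, 3 of them two-byte

# Hypothetical helper: number of bytes in a string (wc -c counts bytes).
byte_len() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

echo "region:  $(byte_len "$region") bytes"    # prints: region:  22 bytes
echo "commune: $(byte_len "$commune") bytes"   # prints: commune: 23 bytes
```

Under those assumptions the two columns together take 45 bytes for 40 characters. A byte-based split would then start $4 five bytes early, and the last five bytes of the Commune column are " län", which is consistent with the "länGöteborg" seen in the output.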
Hermann Peifer

2008-03-16, 10:01 pm

Jürgen Kahrs wrote:
> Hermann Peifer wrote:
>
>
> Yes, I just tried this example with Arnold's current
> gawk-stable source tree at Savannah. The result is
> the same as yours. Looks like you have found one more
> of these painful multi-byte dark corners.


Thanks for confirming. I reported the bug to: bug-gawk@gnu.org
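
Until FIELDWIDTHS handles this case, one possible workaround is to cut the column out with substr() instead. This is only a sketch: fourth_field is a hypothetical helper name, the offsets assume the 2/20/20/20 layout from the example, and it relies on substr() counting characters rather than bytes, which should hold for gawk in a UTF-8 locale but was not checked against the version discussed in this thread.

```shell
# Hypothetical helper: print the 4th column of a 2/20/20/20 fixed-width file.
# Reads the named files, or standard input if none are given.
fourth_field() {
    awk '{
        site = substr($0, 43, 20)   # skip 2 + 20 + 20 characters, take up to 20
        sub(/ +$/, "", site)        # drop trailing padding, if any
        print site
    }' "$@"
}
```

It could be called as "fourth_field testdata". In a byte-oriented awk or in the C locale, substr() would count bytes and show the same shift as the FIELDWIDTHS example.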

Copyright 2008 codecomments.com