Gawk FIELDWIDTHS and multibyte characters
Hermann Peifer 2008-03-16, 4:35 am
Hi,
It looks to me as if Gawk's FIELDWIDTHS extension is not aware of
multibyte characters; see my example below.
$ cat testdata
CDRegion              Commune             Site
SEVästsverige         Hallands län        Kungsbacka
SESmåland med öarna   Västra Götalands länGöteborg
SEKronoberg           Alvesta             Stenungsund
$ file testdata
testdata: UTF-8 Unicode text
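Because `file` reports UTF-8, each of ä, å and ö occupies two bytes, so a column that is 20 characters wide can be more than 20 bytes long. The short Python sketch below (an illustration, not gawk code) counts both for the third data line, assuming its columns are space-padded to the 2/20/20 character widths implied by the FIELDWIDTHS setting in the awk command.

```python
# Columns of "SESmåland med öarna   Västra Götalands länGöteborg",
# assuming each column is space-padded to its width in characters.
fields = ["SE", "Småland med öarna   ", "Västra Götalands län", "Göteborg"]

for field in fields:
    chars = len(field)                   # width in characters
    nbytes = len(field.encode("utf-8"))  # width in UTF-8 bytes
    print(f"{field!r}: {chars} characters, {nbytes} bytes")
```

This prints 20 characters but 22 and 23 bytes for the two middle columns, so any splitter that measures widths in bytes will drift out of alignment on this line.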
$ awk 'BEGIN{FIELDWIDTHS = "2 20 20 20"}{print $4}' testdata
Site
Kungsbacka
länGöteborg
Stenungsund
Can someone confirm?
Hermann
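The output above is consistent with the field widths being counted in bytes rather than characters. As an illustration only, the Python sketch below uses a hypothetical helper, `split_fixed` (not part of gawk), that splits a line into fixed-width fields measuring either UTF-8 bytes or characters. Assuming the space-padded layout of the third data line, byte counting yields ` länGöteborg` as the fourth field (the reported value plus a leading space), while character counting yields `Göteborg`.

```python
def split_fixed(line, widths, by_bytes):
    """Split line into fixed-width fields (hypothetical helper).

    by_bytes=True measures each width in UTF-8 bytes, like the behaviour
    reported above; by_bytes=False measures it in characters.
    """
    data = line.encode("utf-8") if by_bytes else line
    fields = []
    pos = 0
    for width in widths:
        chunk = data[pos:pos + width]
        pos += width
        if by_bytes:
            # In this sketch, a character cut at a boundary decodes to U+FFFD.
            chunk = chunk.decode("utf-8", errors="replace")
        fields.append(chunk)
    return fields


# Third data line, assuming space padding to 20 characters per column.
line = "SESmåland med öarna   Västra Götalands länGöteborg"
widths = [2, 20, 20, 20]

print(repr(split_fixed(line, widths, by_bytes=True)[3]))   # ' länGöteborg'
print(repr(split_fixed(line, widths, by_bytes=False)[3]))  # 'Göteborg'
```

In byte mode the third field ends after "Västra Götalands", because the three two-byte characters before it use up three extra bytes of the width budget.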
Jürgen Kahrs 2008-03-16, 10:01 pm
Hermann Peifer wrote:
> $ awk 'BEGIN{FIELDWIDTHS = "2 20 20 20"}{print $4}' testdata
> Site
> Kungsbacka
> länGöteborg
> Stenungsund
>
> Can someone confirm?
Yes, I just tried this example with Arnold's current
gawk-stable source tree at Savannah. The result is
the same as yours. Looks like you have found one more
of these painful multi-byte dark corners.
Hermann Peifer 2008-03-16, 10:01 pm
Jürgen Kahrs wrote:
> Hermann Peifer wrote:
>
>
> Yes, I just tried this example with Arnold's current
> gawk-stable source tree at Savannah. The result is
> the same as yours. Looks like you have found one more
> of these painful multi-byte dark corners.
Thanks for confirming. I have reported the bug to bug-gawk@gnu.org.