[f77] wrong length of a string containing German umlauts
| Hani A. Ibrahim 2005-08-19, 6:56 pm |
| Hello,
Concerning the code below I get the wrong result if the string contains
German umlauts:
E.g. str = 'abäd' -> Length = 5
This problem appears on my Linux (SuSE 9.1) only. On Windows (Win2k) I get
the correct result. It is independent of the compiler I used (g77, g95,
ifort). So I suppose it is an OS-related problem.
Is it possible to avoid this mistake without losing portability?
--------------8<----------------------------
      integer function length(str)
      implicit none
      character*(*) str
      integer lmax, i
      lmax=len(str)
* search for the last non-blank character
      do i=lmax,1,-1
         if(str(i:i).ne.' ') then
            length=i
            goto 10
         end if
      end do
      length=lmax
   10 continue
      return
      end
--------------8<----------------------------
Best regards and thanks in advance
Hani
| |
| James Giles 2005-08-19, 6:56 pm |
| Hani A. Ibrahim wrote:
> Hello,
>
> Concerning the code below I get the wrong result if the string
> contains German umlauts:
>
> E.g. str = 'abäd' -> Length = 5
>
> This problem appears on my Linux (SuSE 9.1) only. On Windows
> (Win2k) I get the correct result. It is independent of the compiler
> I used (g77, g95, ifort). So I suppose it is a OS related problem.
It's just a guess, but I'd bet the Linux environment is using a
UNICODE encoding (like UTF-8) in which all characters
except US ASCII require more than 8 bits to represent.
Most of the time, Windows is configured to use just 8-bit
representations and limit the available characters to some
flavor of ISO 8859. For example, my present default is
"Western European (Windows)" which is 8859-1 (Latin-1)
plus the extra characters Windows has in the character
codes between 128 and 159 (inclusive).
> Is it possible to avoid this mistake without losing portability?
Well, ä is not in the Fortran standard character set, so there's
no requirement on compilers to even allow its use. KIND
attributes were originally intended to address this, but if anyone
implements a Latin-1 KIND, I've not heard of it.
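To make the encoding point concrete, here is a minimal sketch of a character counter, assuming the string really is UTF-8 (an assumption, as guessed above, not something confirmed in the thread). The name `nchar8` is hypothetical. It counts every byte except UTF-8 continuation bytes (values 128 to 191), so 'abäd' stored as five bytes would count as four characters. It uses `!` comments and `end do`, which are extensions in strict f77.

```fortran
      integer function nchar8(str)
! Count characters (not bytes) in STR, assuming UTF-8 encoding and
! ignoring trailing blanks. Bytes in the range 128..191 are UTF-8
! continuation bytes and are not counted.
      implicit none
      character*(*) str
      integer i, ic, last
! Find the last non-blank byte (0 if STR is entirely blank).
      last = 0
      do i = len(str), 1, -1
         if (last .eq. 0 .and. str(i:i) .ne. ' ') last = i
      end do
      nchar8 = 0
      do i = 1, last
         ic = ichar(str(i:i))
! Some compilers return negative ICHAR values for bytes above 127.
         if (ic .lt. 0) ic = ic + 256
         if (ic .lt. 128 .or. ic .gt. 191) nchar8 = nchar8 + 1
      end do
      return
      end
```

On a system using an 8-bit encoding such as Latin-1 this function would give wrong counts, so it is not a portable fix, only an illustration of where the extra byte comes from.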
--
J. Giles
"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare
| |
| Colin Watters 2005-08-20, 7:57 am |
| "Is it possible to avoid this mistake without losing portability?"
...Why do you think of it as a mistake? It doesn't look that way to me.
As you have already supposed (and James Giles has agreed) this is an
OS-dependent issue concerning how many bytes are required to store the
string. The answer returned by your code (and I hope, the len_trim
intrinsic) is the byte count. The way I see it, this is the important
answer, as it allows further processing to proceed correctly.
E.g. Suppose you want to create a filename, like say 'abäd.txt'. This could
be done thus:
integer ll
ll = len_trim(str)
str(ll+1:) = '.txt'
If ll was returned as 4 instead of 5, you would get 'abä.txt' instead.
--
Qolin
Email: my qname at domain
Domain: qomputing dot demon dot co dot uk
| |
| Hani A. Ibrahim 2005-08-20, 6:59 pm |
| Colin Watters wrote:
> "Is it possible to avoid this mistake without losing portability?"
>
> ...Why do you think of it as a mistake? It doesn't look that way to me.
>
> As you have already supposed (and James Giles has agreed) this is an
> OS-dependent issue concerning how many bytes are required to store the
> string. The answer returned by your code (and I hope, the len_trim
> intrinsic) is the byte count. The way I see it, this is the important
> answer, as it allows further processing to proceed correctly.
>
> E.g. Suppose you want to create a filename, like say 'abäd.txt'. This
> could be done thus:
>
> integer ll
> ll = len_trim(str)
> str(ll+1:) = '.txt'
Yes, you are right. It works the way it should. But AFAIK LEN_TRIM is not
standard FORTRAN77; anyway, the user function does the same.
Thank you Colin and James.
Hani
| |
| Richard Maine 2005-08-20, 6:59 pm |
| In article <43078d3e$0$28539$9b4e6d93@newsread2.arcor-online.net>,
"Hani A. Ibrahim" <hibr@gmx.de> wrote:
> But AFAIK LEN_TRIM is not
> standard FORTRAN77 - anyway the user function do the same.
It is standard f90, but not standard f77. As you have noted, it is easy
enough to write an f77 user function for it, given the widespread
extension of allowing underscores in names (standard in f90, but an
extension in f77), and as long as you are careful about the
possibility of the return value being zero. Zero-length substrings are
not standard f77. Nothing wrong with a return value of zero, but using
it for a substring length can cause problems.
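A minimal sketch of such a user function follows. The name `lentrm` is hypothetical, chosen to avoid the underscore and stay within six characters. Unlike the function in the original post, it returns zero for an all-blank string, which is the LEN_TRIM convention. It still relies on `implicit none`, `end do` and `!` comments, which are common extensions rather than strict f77.

```fortran
      integer function lentrm(str)
! Position of the last non-blank character of STR, or 0 when STR
! is entirely blank (the same convention as the f90 LEN_TRIM).
      implicit none
      character*(*) str
      integer i
      lentrm = 0
      do i = len(str), 1, -1
         if (str(i:i) .ne. ' ') then
            lentrm = i
            return
         end if
      end do
      return
      end
```

A caller that wants to stay within f77 rules would test the result before forming a substring, for example `if (n .gt. 0) write (*,*) str(1:n)` with `n = lentrm(str)`.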
| |
| Michael Prager 2005-08-22, 7:02 pm |
| "Hani A. Ibrahim" <hibr@gmx.de> wrote:
>Hello,
>
>Concerning the code below I get the wrong result if the string contains
>German umlauts:
>
>E.g. str = 'abäd' -> Length = 5
>
>This problem appears on my Linux (SuSE 9.1) only. On Windows (Win2k) I get
>the correct result. It is independent of the compiler I used (g77, g95,
>ifort). So I suppose it is a OS related problem.
>
>Is it possible to avoid this mistake without losing portability?
The problem has been identified by earlier posters as a
character that is not in the Fortran character set. They have
implied, but maybe not said outright, that portability when
using extended characters is never assured, because the mapping of
such characters to machine representations is highly dependent
on the system, OS, and even user settings.
The only way I can think of to avoid this particular issue in a
portable way would be to disallow the use of the umlaut and
instead require users to substitute "e" for it. So "abäd" would become
"abaed". AFAIK, that substitution is proper German. However,
if you are sorting characters or otherwise dissecting strings,
it may have its own problems....
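If the input is known to be UTF-8 (an assumption, and itself system dependent), the same substitution could be applied by the program instead of by the user. The sketch below uses a hypothetical subroutine `deuml`. It relies on the fact that each German umlaut and the sharp s occupy two bytes in UTF-8, with lead byte 195, so the two-letter ASCII spelling fits in place and the string length does not change.

```fortran
      subroutine deuml(str)
! Replace UTF-8 encoded German umlauts and sharp s in STR by their
! two-letter ASCII spellings, in place. Assumes UTF-8 input; bytes
! in any other encoding may be misinterpreted.
      implicit none
      character*(*) str
      integer i, c1, c2
      do i = 1, len(str) - 1
         c1 = ichar(str(i:i))
         c2 = ichar(str(i+1:i+1))
! Some compilers return negative ICHAR values for bytes above 127.
         if (c1 .lt. 0) c1 = c1 + 256
         if (c2 .lt. 0) c2 = c2 + 256
         if (c1 .eq. 195) then
            if (c2 .eq. 164) str(i:i+1) = 'ae'
            if (c2 .eq. 182) str(i:i+1) = 'oe'
            if (c2 .eq. 188) str(i:i+1) = 'ue'
            if (c2 .eq. 132) str(i:i+1) = 'Ae'
            if (c2 .eq. 150) str(i:i+1) = 'Oe'
            if (c2 .eq. 156) str(i:i+1) = 'Ue'
            if (c2 .eq. 159) str(i:i+1) = 'ss'
         end if
      end do
      return
      end
```

The argument must be a variable, not a literal constant, because the subroutine modifies it. With Latin-1 input, where ä is a single byte, nothing would be replaced and the string would need to grow, so this does not remove the portability problem described above.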
--
Mike Prager, NOAA, Beaufort, NC
Address spam-trapped; remove color to reply.
* Opinions expressed are personal and not represented otherwise.
* Any use of tradenames does not constitute a NOAA endorsement.