Author C String Interop
Gary Scott

2006-09-30, 7:02 pm

In Fortran passing strings to C functions using compiler extensions it
is common to concatenate with "char(0)".

retcode = c_func("This is a test" // char(0))

Some compilers also support

retcode = c_func("This is a test"C)

or something similar (although for consistency C"string" would have been
better) and I believe I saw a special "function syntax" for string
arguments.

So at F2k3 does the compiler perform this conversion for you without
need to manually concatenate? (assuming you've properly defined the
binding to the C function)

--

Gary Scott
mailto:garylscott@sbcglobal dot net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

Why are there two? God only knows.


If you want to do the impossible, don't hire an expert because he knows
it can't be done.

-- Henry Ford
Richard Maine

2006-09-30, 7:02 pm

Gary Scott <garylscott@sbcglobal.net> wrote:

> So at F2k3 does the compiler perform this conversion for you without
> need to manually concatenate? (assuming you've properly defined the
> binding to the C function)


No.

It isn't always the right thing to do. And it couldn't be done
consistently anyway. Your examples use only literal strings. Some of the
compilers you mention do this as a property of literal strings. Try the
same things with... oh... how about a substring of a variable. Is the
compiler going to do copy-in/copy-out of the substring in order to make
room for the extra character? That's about the only way I can think of
to make it work.
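
The copy-in/copy-out point can be seen from the C side. A minimal sketch, assuming a hypothetical helper named copy_with_nul (not from this thread or any library): a substring in the middle of a larger buffer has no spare byte after it, so the terminator can only go into a copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated, null-terminated copy of
   the len characters starting at start. The caller frees the result.
   Returns NULL if the allocation fails. */
char *copy_with_nul(const char *start, size_t len)
{
    char *copy = malloc(len + 1);   /* one extra byte for the terminator */
    if (copy == NULL)
        return NULL;
    memcpy(copy, start, len);
    copy[len] = '\0';
    return copy;
}
```

Calling copy_with_nul(line + 2, 3) on a ten-character buffer leaves the buffer untouched. Storing '\0' at line[5] in place would instead overwrite the character that follows the substring.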

In fact, in general, the compiler does not do conversions in argument
passing, for C interop or most anywhere else.

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
Gary Scott

2006-09-30, 7:02 pm

Richard Maine wrote:
> Gary Scott <garylscott@sbcglobal.net> wrote:
>
>
>
>
> No.
>
> It isn't always the right thing to do. And it couldn't be done
> consistently anyway. Your examples use only literal strings. Some of the
> compilers you mention do this as a property of literal strings. Try the
> same things with... oh... how about a substring of a variable. Is the
> compiler going to do copy-in/copy-out of the substring in order to make
> room for the extra character? That's about the only way I can think of
> to make it work.
>
> In fact, in general, the compiler does not do conversions in argument
> passing, for C interop or most anywhere else.
>

It seems that the most common cases of passing to C are fairly easy to
define. It would be a major convenience for the majority of cases (like
95+%) if you could specify this particular automatic conversion (i.e. in
the case of fixed length strings, assume convert as trim(<stringvar> ) //
char(0)) on passing to C.
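
For reference, here is roughly what that conversion amounts to when written out by hand on the C side. This is only a sketch under assumptions: field_to_cstr is a hypothetical name, and the input is taken to be a blank-padded, fixed-length field with no terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a blank-padded, fixed-length field into out
   as a C string, dropping trailing blanks. Returns the trimmed length,
   or (size_t)-1 if out (capacity out_size) is too small. */
size_t field_to_cstr(char *out, size_t out_size, const char *field, size_t len)
{
    while (len > 0 && field[len - 1] == ' ')
        len--;                      /* same effect as Fortran's TRIM */
    if (len + 1 > out_size)
        return (size_t)-1;
    memcpy(out, field, len);
    out[len] = '\0';                /* same effect as // char(0) */
    return len;
}
```

Note that the trim is a decision made by this helper: trailing blanks that were meant as data are lost.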

James Giles

2006-09-30, 7:02 pm

Gary Scott wrote:
> In Fortran passing strings to C functions using compiler extensions it
> is common to concatenate with "char(0)".
>
> retcode = c_func("This is a test" // char(0))

....
> So at F2k3 does the compiler perform this conversion for you without
> need to manually concatenate? (assuming you've properly defined the
> binding to the C function)


Two years or so before final approval of F2003, I recommended
that the standard should offer an intrinsic prefix operator to do that,
as well as processing escape codes. Those were the days when
people were still claiming that processing such escapes on *all*
literals would be a conforming implementation. :-(

In any case, processing such things with an operator would
handle almost all cases. The operator would create a copy
whose length was determined by how many characters were
removed (for escape handling) and by the fact that the null
was added. That copy could then be assigned into a variable
(or substring thereof) by the usual rules of character assignment,
or it could be passed to any procedures as an actual argument.
You can, of course, write such an operator yourself. In F2003,
you can even use deferred length strings instead of complicated
specification functions to get the returned length right.
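
The length rule described above (input length, minus characters removed by escape handling, plus one for the null) can be sketched in C. This is not the Fortran operator itself; c_escape_copy is a hypothetical name, and the choice to handle only \n, \t and \\ is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: copy src[0..len-1], translating the escapes
   \n, \t and \\, and append a terminating null. *out_len receives the
   number of characters before the null. The caller frees the result. */
char *c_escape_copy(const char *src, size_t len, size_t *out_len)
{
    char *dst = malloc(len + 1);    /* escapes only ever shrink the text */
    size_t i = 0, j = 0;

    if (dst == NULL)
        return NULL;
    while (i < len) {
        if (src[i] == '\\' && i + 1 < len) {
            char repl = 0;
            switch (src[i + 1]) {
            case 'n':  repl = '\n'; break;
            case 't':  repl = '\t'; break;
            case '\\': repl = '\\'; break;
            }
            if (repl != 0) {
                dst[j++] = repl;    /* two input characters become one */
                i += 2;
                continue;
            }
        }
        dst[j++] = src[i++];        /* ordinary character, or an unknown
                                       escape copied through unchanged */
    }
    dst[j] = '\0';
    if (out_len != NULL)
        *out_len = j;
    return dst;
}
```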

--
J. Giles

"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare


Gary Scott

2006-09-30, 7:02 pm

James Giles wrote:

> Gary Scott wrote:
>
>
> ...
>
>
>
> Two years or so before final approval of F2003, I recommended
> that the standard should offer an intrinsic prefix operator to do that,
> as well as processing escape codes. Those were the days when
> people were still claiming that processing such escapes on *all*
> literals would be a conforming implementation. :-(
>
> In any case, processing such things with an operator would
> handle almost all cases. The operator would create a copy
> whose length was determined by how many characters were
> removed (for escape handling) and by the fact that the null
> was added. That copy could then be assigned into a variable
> (or substring thereof) by the usual rules of character assignment,
> or it could be passed to any procedures as an actual argument.
> You can, of course, write such an operator yourself. In F2003,
> you can even use deferred length strings instead of complicated
> specification functions to get the returned length right.
>

I'd rather just do something like:

character, variable <or varying>, suffix=char(0) :: cstring

and have the compiler worry about it. (and yes "suffix=" implies a
"prefix=" too)

glen herrmannsfeldt

2006-09-30, 7:02 pm

Richard Maine wrote:

(snip regarding null terminated strings passed to C)

> It isn't always the right thing to do. And it couldn't be done
> consistently anyway. Your examples use only literal strings. Some of the
> compilers you mention do this as a property of literal strings. Try the
> same things with... oh... how about a substring of a variable. Is the
> compiler going to do copy-in/copy-out of the substring in order to make
> room for the extra character? That's about the only way I can think of
> to make it work.


C doesn't do that, so I wouldn't expect Fortran to do it.

Null termination is a convention of the C library, and supported
by the compiler for string constants, but non-null terminated strings
are allowed by the language. (*) With C's pass by value, only the
address of a string is passed, and a copy of the string is
never made automatically. Numeric function arguments will be
converted as needed if a prototype is in scope, though.

One result of passing the address of a string is that called
functions can and do modify strings. One that does is
strtok(), the string tokenizer. It returns the address to
the next token in the source string, which is null terminated
by modifying the source string. No copy is made.
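
A small sketch of that behaviour, assuming a hypothetical wrapper named count_tokens: after the call, the caller's buffer has had its delimiters overwritten with nulls.

```c
#include <string.h>

/* Split buf on spaces in place and return the number of tokens found.
   buf must be writable: strtok() overwrites each delimiter it consumes
   with '\0'. */
int count_tokens(char *buf)
{
    int count = 0;
    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " "))
        count++;
    return count;
}
```

Given a writable array holding "one two three", the array afterwards reads as just "one" to any function that stops at the first null. Passing a string literal here would be undefined behaviour, since literals must not be modified.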

> In fact, in general, the compiler does not do conversions in
> argument passing, for C interop or most anywhere else.


For strings, C compilers don't either. For pass by value
numerical arguments it might have been nice if C interop
did it, consistent with C compilers. It also might have
been nice if Fortran string constants were null terminated
for C interop convenience. It doesn't cost much.

(*) In C, strings are just character arrays, and it is
legal to move characters around and pass them as arrays.
If they are not null terminated the length should be
passed as an additional argument, or there should be
some other way for the called routine to determine
the length.
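
As an illustration of passing the length separately, here is a sketch with a hypothetical function show_counted. It relies on the standard "%.*s" conversion, which reads at most the given number of characters and so does not need a terminator in the source array.

```c
#include <stdio.h>

/* Hypothetical helper: format a counted (not null-terminated) character
   array into out as "[...]". The length travels as a separate argument.
   Returns whatever snprintf returns. */
int show_counted(char *out, size_t out_size, const char *s, int len)
{
    return snprintf(out, out_size, "[%.*s]", len, s);
}
```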

-- glen

James Giles

2006-09-30, 7:02 pm

Gary Scott wrote:
....
> I'd rather just do something like:
>
> character, variable <or varying>, suffix=char(0) :: cstring
>
> and have the compiler worry about it. (and yes "suffix=" implies a
> "prefix=" too)


Well, let's pin this down. Does this mean that the assignment
operator is responsible to add the null? What about trailing
blanks in the data? What about trailing nulls already in the data?
What about passing string expressions directly to C procedures
without first assigning them to a variable?

That is:

cstring = 'abc   ' ! three trailing blanks intended as part of the data
cstring = 'abc' // null ! a null intended already as part of the data
result = cprocedure(...'abc'...) ! where the ... are replaced by what?

The first C itself handles well and Fortran's rules don't. The second
Fortran handles well and C doesn't. The third is handled by either
language alone, but the interop between them is the problem.
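
The first two cases can be checked from the C side with a bounded search. A sketch, assuming a hypothetical helper c_visible_length that reports how many characters a null-scanning C routine would see within a buffer of known size:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the number of characters before the first
   null in data[0..len-1], or len if there is no null in that range. */
size_t c_visible_length(const char *data, size_t len)
{
    const char *nul = memchr(data, '\0', len);
    return nul != NULL ? (size_t)(nul - data) : len;
}
```

Trailing blanks survive, because a blank is just another character to C. An embedded null hides everything after it. A result equal to len means no terminator was found inside the buffer, which is the interop case where an unbounded scan would run past the end.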

In language design you have to handle all the end cases. An
operator manages that because all the end cases have been
addressed in the existing rules about expressions, procedure
references, and assignments (or, they'd better have been!).



Richard Maine

2006-09-30, 7:02 pm

Gary Scott <garylscott@sbcglobal.net> wrote:

> i.e. in
> the case of fixed length strings, assume convert as trim(<stringvar> ) //
> char(0)) on passing to C.


Yukk. No thanks. So you want to implicitly trim also? That seems pretty
"out of the blue" to me. C doesn't do anything like that. While there
are times when that might be handy, there are other times when it would
be a hidden bug because it didn't occur to someone that the compiler
would pass something other than exactly what was specified. Heck, we
have enough people already quite often asking things like how they can
read a line of text and know how long it really was instead of how long
a trimmed version of it was. And how do you define a "fixed length
string?" That would be pretty much anything that could be directly passed
to C as a string, since allocatable strings aren't interoperable.

And note that you apparently are forcing copy-in/copy-out... unless
perhaps "fixed-length string" actually means "literal string". You
certainly can't freely plop a char(0) in memory after any string (or
trimmed version thereof) that gets passed to C. That location might be
in use for something else - the following variable in the common block,
the rest of the substring (maybe you meant to exclude substrings, but if
so "fixed length" doesn't achieve that).

I personally would *FAR* prefer a rule that was simple and consistent,
which is what we have - just pass exactly whatever the user gives as an
argument. Otherwise you are going to have to start defining the exact
rules for what happens when. And you'd have to define some way to
override the conversion for cases where that wasn't intended.

In fact, that's the general direction of my preference for many things.
I prefer to build a (relatively) simple consistent base rather than a
more complicated scheme that targets particular applications.

Note that we do the same type of thing everywhere else in argument
passing. If you have a double precision dummy, you have to pass a double
precision actual and that's the user's responsibility; the compiler
doesn't automatically convert to what it assumes you probably wanted.

In short, I'd oppose anything particularly close to this string
conversion on argument passing. If you want a syntactic shorthand for
constructing a literal string with a char(0) automatically appended, I
could see that. But I'd see it as having nothing in particular to do
with argument passing; that would just be a way to construct such a
literal string for any purpose. And if you want to define functions and
operators that work on strings with that convention, that's fine. (In
fact, you could do that as a user... and you can also use the existing C
ones via interop). But I would oppose special-casing of argument passing
for the purpose.

Gary Scott

2006-09-30, 7:02 pm

James Giles wrote:

> Gary Scott wrote:
> ...
>
>


I would think that this creates a real C string entity in fortran with
dynamic length (allocated on demand if necessary), however I assume it
would be implemented as a component of a derived type (a set of
descriptor fields) and when passed to a procedure defined with a BIND(C)
string argument, Fortran would pass a reference to that component only.
If such a string were returned (assuming that C properly null
terminated it), then Fortran would need to determine the actual length of
the returned string and adjust the length of the dynamic component to
match (so a direct reference to the component in the function call might
need to be replaced by a temporary location followed by an allocation
adjustment and a value assignment). And yes, on assignment, the
application of the suffix (or prefix) is automatic. So, one of the
components of the derived type is a prefix component, one is a suffix
component, one may be a length field independent of the null termination
for convenience (among other possibilities I can think of). However, in
the Fortran code, assignment syntax without reference to a component of
the derived type would result in whatever processing is necessary to
achieve the effective representation of a string immediately followed by
<suffix> (and prefixed with <prefix> if one is defined) in the string
value component of the descriptor. I would not prevent direct access to
all components of the derived type (the internal representation); this
special case is necessary. The prefix and suffix should allow any bit
pattern and the largest practical number of character or byte-sized units.

>
> Well, let's pin this down. Does this mean that the assignment
> operator is responsible to add the null? What about trailing
> blanks in the data? What about trailing nulls already in the data?
> What about passing string expressions directly to C procedures
> without first assigning them to a variable?
>
> That is:
>
> cstring = 'abc   ' ! three trailing blanks intended as part of the data
> cstring = 'abc' // null ! a null intended already as part of the data
> result = cprocedure(...'abc'...) ! where the ... are replaced by what?
>
> The first C itself handles well and Fortran's rules don't. The second
> Fortran handles well and C doesn't. The third is handled by either
> language alone, but the interop between them is the problem.
>
> In language design you have to handle all the end cases. An
> operator manages that because all the end cases have been
> addressed in the existing rules about expressions, procedure
> references, and assignments (or, they'd better have been!).
>



Walter Spector

2006-09-30, 7:02 pm

Richard Maine wrote:
> .... C doesn't do anything like that...


If you constrain your argument to strings, you are right.
But C/C++ can do a number of 'automatic' conversions during
procedure calls, even with function prototypes. This is really
apparent with integers of various sizes (and char is an int in C).

Consider the following:

#include <stdio.h>

void mysub (char x) {
    printf ("x = %i\n", x);
}

int main () {
    long long my_ll = 4242;

    mysub (my_ll); // No diags are issued!
}


$ gcc lldemo.c
$ a
x = -110
$

The C guys will have a hard time telling me this is a 'good thing'.

Walt
James Giles

2006-09-30, 7:02 pm

Gary Scott wrote:
> James Giles wrote:
>
>
> I would think that this creates a real C string entity in fortran with
> dynamic length [...]


But, this is pretty much one of the worst ways to implement
dynamic strings. Fortran shouldn't copy the things about C
that C does badly. Fortran 2003 has deferred length strings
which are dynamic in a much cleaner and clearer way.
The only time we need null termination is when you need such
a beast for C interop.



Gary Scott

2006-09-30, 10:03 pm

James Giles wrote:
> Gary Scott wrote:
>
>
>
> But, this is pretty much one of the worst ways to implement
> dynamic strings. Fortran shouldn't copy the things about C
> that C does badly. Fortran 2003 has deferred length strings
> which are dynamic in a much cleaner and clearer way.
> The only time we need null termination is when you need such
> a beast for C interop.
>

Well, this is intended to be a feature independent of c strings. The
generic, automatically processed suffix/prefix part is key to my
interest in the feature. It just so happens that it could be used to
define an entity that looks like a c string. Regardless of whether it
is a good idea to design strings that way, c already does.

James Giles

2006-09-30, 10:03 pm

Gary Scott wrote:
....
> Well, this is intended to be a feature independent of c strings. The
> generic, automatically processed suffix/prefix part is key to my
> interest in the feature. It just so happens that it could be used to
> define an entity that looks like a c string. Regardless of whether it
> is a good idea to design strings that way, c already does.


Except that the only reason you've given to justify the
feature is the fact that it can mimic C strings. It's a
rather complex feature whose benefits are hard to
see. Fortran (2003) already has some convenient
support for dynamic strings. I wish it had support
for maintaining the active length of statically declared
strings as well - but the best way to do so is *not*
null termination (or any such use of another perfectly
reasonable character that ought to be permitted in
the data itself).



glen herrmannsfeldt

2006-09-30, 10:03 pm

Walter Spector wrote:

(snip)

> void mysub (char x) {
> printf ("x = %i\n", x);

(snip)
> long long my_ll = 4242;
> mysub (my_ll); // No diags are issued!


Java will do widening without a cast, but requires
the cast for narrowing. These conversions are
easier to do for call by value, though.

-- glen

glen herrmannsfeldt

2006-09-30, 10:04 pm

Richard Maine wrote:

(snip)

> Yukk. No thanks. So you want to implicitly trim also? That seems pretty
> "out of the blue" to me. C doesn't do anything like that.


Yes, C doesn't do anything like that.

(snip)

> I personally would *FAR* prefer a rule that was simple and consistent,
> which is what we have - just pass exactly whatever the user gives as an
> argument. Otherwise you are going to have to start defining the exact
> rules for what happens when. And you'd have to define some way to
> override the conversion for cases where that wasn't intended.


I agree.

> In fact, that's the general direction of my preference for many things.
> I prefer to build a (relatively) simple consistent base rather than a
> more complicated scheme that targets particular applications.


I think most of C is simpler than many Fortran features, especially
those starting in Fortran 90.

> Note that we do the same type of thing everywhere else in argument
> passing. If you have a double precision dummy, you have to pass a double
> precision actual and that's the user's responsibility; the compiler
> doesn't automatically convert to what it assumes you probably wanted.


Except for generic intrinsics or generic user written routines.
You can say that isn't conversion, but it does mean that the
programmer doesn't need to worry about the argument type.

With K&R C, the rules were fairly simple. For floating point,
all arithmetic was done in double precision, and function arguments
were converted to double before the call. For integers, anything
smaller than int was (and still is) converted to int for arithmetic
operations and, for K&R, for function calls. One result of this is that
there is no need for generics for floating point math functions.

ANSI C added function prototypes declaring the type for the dummy
variables. That allowed the compiler not to convert float to double
for calls to routines with float dummy variables. It also doesn't
convert char or short to int for corresponding dummy variables.
With prototypes, though, it can convert numerical types as needed,
even between fixed and floating point. For arrays (passed as pointers)
no conversion is done. (Pointers may be converted if they are different
for different types, but the data pointed to isn't.)
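
A short sketch of both halves of that rule. The function names half and sum_ints are made up for illustration.

```c
/* With this prototype in scope, an int argument such as 3 is converted
   to double before the call. */
double half(double x)
{
    return x / 2.0;
}

/* An array argument is passed as a pointer to its first element. The
   elements themselves are never converted, so they must already be int. */
long sum_ints(const int *values, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += values[i];
    return total;
}
```

Calling half(3) works because the scalar is converted at the call. Passing an array of short to sum_ints does not get the data converted; compilers typically diagnose the mismatched pointer type instead.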

> In short, I'd oppose anything particularly close to this string
> conversion on argument passing. If you want a syntactic shorthand for
> constructing a literal string with a char(0) automatically appended, I
> could see that. But I'd see it as having nothing in particular to do
> with argument passing; that would just be a way to construct such a
> literal string for any purpose. And if you want to define functions and
> operators that work on strings with that convention, that's fine. (In
> fact, you could do that as a user... and you can also use the existing C
> ones via interop). But I would oppose special-casing of argument passing
> for the purpose.


Is there any problem with always null terminating string literals?
It costs one more byte each, which isn't so much.

-- glen

glen herrmannsfeldt

2006-10-01, 4:00 am

James Giles wrote:

(snip)

> cstring = 'abc   ' ! three trailing blanks intended as part of the data
> cstring = 'abc' // null ! a null intended already as part of the data
> result = cprocedure(...'abc'...) ! where the ... are replaced by what?


> The first C itself handles well and Fortran's rules don't. The second
> Fortran handles well and C doesn't. The third is handled by either
> language alone, but the interop between them is the problem.


Assuming result is a string of size unknown when cprocedure is compiled,
I would say that neither C nor Fortran were especially good at it.
(At least for versions <99.)

For C, a pointer to a string is returned. This is either a static
variable, as many C library routines use, or dynamically allocated for
the calling routine to free.
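
The two conventions look like this in practice. This is a sketch only; greeting_static and greeting_alloc are hypothetical names, and the 64-byte static buffer is an arbitrary choice.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Style 1: return a pointer to static storage. There is nothing to
   free, but the next call overwrites the previous result. */
const char *greeting_static(const char *name)
{
    static char buf[64];
    snprintf(buf, sizeof buf, "Hello, %s", name);  /* truncates if too long */
    return buf;
}

/* Style 2: return freshly allocated storage that the caller must free.
   Returns NULL if the allocation fails. */
char *greeting_alloc(const char *name)
{
    size_t n = strlen("Hello, ") + strlen(name) + 1;
    char *buf = malloc(n);
    if (buf != NULL) {
        strcpy(buf, "Hello, ");
        strcat(buf, name);
    }
    return buf;
}
```

In both styles the result length is unknown to the caller until it scans for the null.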

CHARACTER*(*) functions have been discussed before, and I won't
say any more here.

-- glen

Walter Spector

2006-10-01, 4:00 am

glen herrmannsfeldt wrote:
> ...
> Java will do widening without a cast, but requires
> the cast for narrowing. These conversions are
> easier to do for call by value, though.


Correct. Java enforces this in expressions too. It refuses to
'silently' lose precision.

Walt
Walter Spector

2006-10-01, 4:00 am

glen herrmannsfeldt wrote:
>
> Is there any problem with always null terminating string literals?
> It costs one more byte each, which isn't so much.


The problem is not space. It is performance. O(n) algorithms vs O(1)
for a number of common operations.

Walt
James Giles

2006-10-01, 4:00 am

glen herrmannsfeldt wrote:
> James Giles wrote:
>
> (snip)
>
>
>
> Assuming result is a string of size unknown when cprocedure is
> compiled, I would say that neither C nor Fortran were especially good
> at it. (At least for versions <99.)


F95 can do it. It's rather verbose and sometimes does a lot
of redundant work. You compute the length in a specification
expression (which can invoke a specification function - which
in turn can do practically any sort of computation to discover the
appropriate length of cprocedure's result). F2003 does better:
you can just build the result as a deferred length string and
the returned length is the length of that result variable at the
time RETURN happens.



glen herrmannsfeldt

2006-10-01, 4:00 am

James Giles wrote:

> glen herrmannsfeldt wrote:

(snip)

> F95 can do it. It's rather verbose and sometimes does a lot
> of redundant work. You compute the length in a specification
> expression (which can invoke a specification function - which
> in turn can do practically any sort of computation to discover the
> appropriate length of cprocedure's result).


That sounds harder than 'especially good' should make it.

> F2003 does better:
> you can just build the result as a deferred length string and
> the returned length is the length of that result variable at the
> time RETURN happens.


I am not sure what C99 allows, and I wasn't sure about F2003,
either, so I put <99. Still, deferred length string variable
sounds like a little extra complication.

-- glen

glen herrmannsfeldt

2006-10-01, 4:00 am

Walter Spector wrote:

> glen herrmannsfeldt wrote:


> The problem is not space. It is performance. O(n) algorithms vs O(1)
> for a number of common operations.


I only asked about null terminating them, not about actually using them.

I have written enough C to know the problems of using them.

(You can also get O(n**2) algorithms instead of O(n).)
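
The quadratic case usually comes from repeated appends. A sketch with two hypothetical functions, both of which assume dst has room for times * strlen(piece) + 1 characters:

```c
#include <string.h>

/* Quadratic: each strcat() rescans dst from the start to find its end. */
void repeat_slow(char *dst, const char *piece, int times)
{
    dst[0] = '\0';
    for (int i = 0; i < times; i++)
        strcat(dst, piece);
}

/* Linear: remember where the end is and append there. */
void repeat_fast(char *dst, const char *piece, int times)
{
    size_t n = strlen(piece);
    char *end = dst;
    for (int i = 0; i < times; i++) {
        memcpy(end, piece, n);
        end += n;
    }
    *end = '\0';
}
```

Both produce the same string. The second simply keeps the length information that null termination throws away.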

Assuming that you might be calling C, is it that much harder to
add the null onto string constants? If you don't use it, it
doesn't cost more than one byte.

-- glen

Richard Maine

2006-10-01, 4:00 am

glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:

> Still, deferred length string variable
> sounds like a little extra complication.


I think I lost track of the subject somewhere along the line, because I'd
swear that you just said that having a deferred length string variable
sounds like an extra complication for a problem explicitly stated to be
one of having a deferred length string. I can't follow that, so I'll
assume I misunderstood.

James Giles

2006-10-01, 4:00 am

glen herrmannsfeldt wrote:
> James Giles wrote:
>
> (snip)
>
>
>
> That sounds harder than 'especially good' should make it.


It seems to me that there are two colloquial meanings to the
phrase 'especially good'. If used positively ("that restaurant
is especially good") it's significant praise. On the other hand,
if used negatively ("that auto mechanic is not especially good")
it's usually a polite way of expressing exceptional disfavor.
In this case, it appeared to me you were suggesting it couldn't
be done at all, or was *very* difficult.

To be sure, the way F95 does this is only "tolerably good".
Which means that it's tolerable. It works reliably and isn't
impossible to read and verify the correctness of the code.
So, in the first sense above, it isn't especially good. In the
second sense, it isn't "not especially good".

In any case, you missed the point of my earlier comment:

> result = cprocedure(...'abc'...) ! where the ... are replaced by what?
>
> [...] The third is handled by either language alone, but the interop
> between them is the problem.


Here I was pointing out that if Fortran was calling Fortran the
value of 'abc' would be passed correctly as a string of length
three. No problem. If C were calling C the value of 'abc' would
have been appended with a null automatically and the procedure
could correctly discover that the argument's length was three.
But, if Fortran is calling C the length will not necessarily be
correctly discovered by the C procedure.

Since this thread was about C interop (in a Fortran newsgroup)
that last sentence seems important. The feature suggested by
a previous article in this thread was tied to a variable declaration
(such that assignments to such a variable would automatically
append nulls). But there's not even a place for an assignment to
such a variable inside the procedure call above. Hence I was
asking how his new feature was responsive to the request of the
OP.



Brooks Moses

2006-10-01, 4:00 am

glen herrmannsfeldt wrote:
> Richard Maine wrote:
>
> Is there any problem with always null terminating string literals?
> It costs one more byte each, which isn't so much.


But what does it gain? It certainly doesn't gain any reliability;
sometimes a string literal is null-terminated, and sometimes it isn't.

Besides which, I'd rather not break the following code:

call myCfunction("Hello, " // "World!")

- Brooks


--
The "bmoses-nospam" address is valid; no unmunging needed.
Walter Spector

2006-10-01, 4:00 am

glen herrmannsfeldt wrote:
> ...
> I only asked about null terminating them, not about actually using them.


:)

Interestingly, back in the pre-Fortran 77 days, some compilers would
insert a word of zero after a Hollerith constant in a procedure
call. E.g.:

CALL PRNTOUT (12HHELLO, WORLD)

On, say, a CDC 6600 with 60 bit words (10 characters/word), the compiler
would generate 3 words instead of two. This allowed the callee to search
for the zero word and discover the length of the string.

It wasn't well-documented in the manuals. You just sorta hadta
'know about it'.

Walt
glen herrmannsfeldt

2006-10-01, 4:00 am

Brooks Moses wrote:
(snip on null terminating string literals)

> But what does it gain? It certainly doesn't gain any reliability;
> sometimes a string literal is null-terminated, and sometimes it isn't.


> Besides which, I'd rather not break the following code:


> call myCfunction("Hello, " // "World!")


I could be wrong on this, but as I understood it that would
be one string literal. That the // would be done at compile
time and one string is compiled.

-- glen

glen herrmannsfeldt

2006-10-01, 4:00 am

James Giles wrote:

(snip)

> In any case, you missed the point of my earlier comment:


> Here I was pointing out that if Fortran was calling Fortran the
> value of 'abc' would be passed correctly as a string of length
> three. No problem. If C were calling C the value of 'abc' would
> have been appended with a null automatically and the procedure
> could correctly discover that the argument's length was three.
> But, if Fortran is calling C the length will not necessarily be
> correctly discovered by the C procedure.


I am still not sure what the ... are for. I got onto the question
of return values from functions, but maybe you weren't asking that.

C doesn't have a string concatenation operator in the Fortran
sense. You can't, for example, strcat("abc","xyz").

> Since this thread was about C interop (in a Fortran newsgroup)
> that last sentence seems important. The feature suggested by
> a previous article in this thread was tied to a variable declaration
> (such that assignments to such a variable would automatically
> append nulls). But there's not even a place for an assignment to
> such a variable inside the procedure call above. Hence I was
> asking how his new feature was responsive to the request of the
> OP.


All I was asking about was that string constants would be null
terminated. It isn't that hard to do and would make many C
interop problems easy. String expressions are different, but
since you can't do anything like that in C, it doesn't seem
reasonable to ask for it in Fortran's C interop.

Now, if someone wants to call strcat() from Fortran they will
have to do it right and supply a null terminated string variable
with enough space left as the first argument. That is completely
different from expecting var // 'constant' to work right as a
subroutine argument. In C, string constants are null terminated
by the compiler. Everything else is the responsibility of the
programmer. The library helps, but that is all.
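
A minimal C sketch of what "doing it right" with strcat() might look like, assuming a caller-owned buffer of known size. The helper name append_checked and its return convention are illustrative assumptions; only strlen() and strcat() are standard library calls.

```c
#include <assert.h>
#include <string.h>

/* Append src to the C string already held in dst, but only if the
   result, including its terminating null, fits in dst_size bytes.
   Returns 1 on success and 0 if there is not enough room.
   Hypothetical helper, not a library function. */
int append_checked(char *dst, size_t dst_size, const char *src)
{
    size_t used, need;

    assert(dst != NULL && src != NULL);
    used = strlen(dst);        /* dst must already be null terminated */
    need = strlen(src) + 1;    /* +1 for the terminating null */

    if (used + need > dst_size)
        return 0;
    strcat(dst, src);
    return 1;
}
```

With char buf[8] = "abc", appending "xyz" fits (six characters plus the null), while a further "!!" is refused and buf is left unchanged.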

Consider:

char x[4]="abcd";

Note that C does not null terminate x.
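
The difference can be checked with a short C sketch. The names exact, sized and has_terminator are made up for illustration; the behaviour relied on is that an array sized exactly to the text drops the initializer's null, while an array with no stated size gets room for it.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the first n bytes of buf contain a null byte, else 0.
   Hypothetical helper built on the standard memchr(). */
int has_terminator(const char *buf, size_t n)
{
    assert(buf != NULL);
    return memchr(buf, '\0', n) != NULL;
}

/* Sized exactly to the text: legal C, but no null is stored. */
char exact[4] = "abcd";

/* Sized by the compiler: five bytes, the last one the null. */
char sized[] = "abcd";
```

Here sizeof exact is 4 and sizeof sized is 5, so passing exact to strlen() or strcat() would read past the end of the array.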

-- glen

James Giles

2006-10-01, 4:00 am

glen herrmannsfeldt wrote:
> James Giles wrote:

....
>
....
> I am still not sure what the ... are for. [...]


The ... were there to provide space for the other person to
insert syntax showing how his proposed feature would pass
the appropriately null terminated argument to a C procedure.
I didn't actually think his feature could do it. But, unless it
could, it wasn't responsive to the OP's problem.

Unless you're discussing *his* proposed feature, you're
likely to still miss the point of the example.

--
J. Giles


Richard Maine

2006-10-01, 7:03 pm

glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:

> I could be wrong on this, but as I understood it that would
> be one string literal.


It better not be... unless someone is rewriting not only the definition
of character constants, but also the parsing of expressions from
scratch.

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
Richard Maine

2006-10-01, 7:03 pm

glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:

> All I was asking about was that string constants would be null
> terminated. It isn't that hard to do


It might not seem hard to you. It seems pretty hard to even define
consistently to me. Perhaps not impossible, but not trivial either. I
think you are confusing implementation with language specification.
Sure, it isn't hard to have an implementation that tacks a zero byte
after a string constant in memory. But the language specification is
another matter entirely.

Recall that a constant isn't even something that exists in memory from a
language definition perspective. The compiler is quite likely to put a
copy in memory in order to implement some things, but that's just the
compiler's choice (even if it is a universal one for some situations).
Note, for example, that you are not allowed to have a pointer to a
constant; there's a reason for that.

And then there is the question of what the length of a string would be.
What is the proposed definition of the length of "hello"? Are you saying
that should be 6 characters instead of 5 - the string really includes
the extra character? If so, oh dear! You just broke innumerable numbers
of existing codes - likely an absolute majority. Or are you saying that
the length is still 5, but it is ok and well-defined to look one
character past the end of the string length? If so, that requires a lot
of work to redefine some very basic concepts.
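
For comparison, C literals behave like the second reading: the reported length excludes the null, yet one more byte is defined and readable. A small sketch of that distinction follows; text_length and LITERAL_STORAGE are illustrative names, and nothing here says how a Fortran processor stores its constants.

```c
#include <assert.h>
#include <string.h>

/* Length as the C library reports it: characters before the null. */
size_t text_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Storage occupied by a string literal, terminating null included.
   Only meaningful when given a literal or an array, not a pointer. */
#define LITERAL_STORAGE(lit) (sizeof(lit))
```

For "hello", text_length gives 5 and LITERAL_STORAGE gives 6. C can define both numbers because its language definition gives a literal storage; that is the part the Fortran definition would have to add.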

Again, I agree that it is easy for a compiler to tack trailing stuff on
in practice. But I vehemently disagree that it is easy to specify
consistently and simply in the language definition.

Unless, of course, one has a special new form for the purpose instead of
changing the definition of an existing form. That's a *COMPLETELY*
different matter. It has been suggested elsethread and I agree that it
would be simple. (I'm not so convinced that it is worth adding such a
feature, but at least I agree it would be simple).

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
Gary Scott

2006-10-01, 7:03 pm

Richard Maine wrote:

> glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
>
>
>
>
> It might not seem hard to you. It seems pretty hard to even define
> consistently to me. Perhaps not impossible, but not trivial either. I
> think you are confusing implementation with language specification.
> Sure, it isn't hard to have an implementation that tacks a zero byte
> after a string constant in memory. But the language specification is
> another matter entirely.
>
> Recall that a constant isn't even something that exists in memory from a
> language definition perspective. The compiler is quite likely to put a
> copy in memory in order to implement some things, but that's just the
> compiler's choice (even if it is a universal one for some situations).
> Note, for example, that you are not allowed to have a pointer to a
> constant; there's a reason for that.
>
> And then there is the question of what the length of a string would be.
> What is the proposed definition of the length of "hello"? Are you saying
> that should be 6 characters instead of 5 - the string really includes
> the extra character? If so, oh dear! You just broke innumerable numbers
> of existing codes - likely an absolute majority. Or are you saying that
> the length is still 5, but it is ok and well-defined to look one
> character past the end of the string length? If so, that requires a lot
> of work to redefine some very basic concepts.
>
> Again, I agree that it is easy for a compiler to tack trailing stuff on
> in practice. But I vehemently disagree that it is easy to specify
> consistently and simply in the language definition.
>
> Unless, of course, one has a special new form for the purpose instead of
> changing the definition of an existing form. That's a *COMPLETELY*
> different matter. It has been suggested elsethread and I agree that it
> would be simple. (I'm not so convinced that it is worth adding such a
> feature, but at least I agree it would be simple).
>

It only needs to be specified in terms of concatenation of a string with
a suffix string value. I don't want it to hard code a null value, I
want the flexibility to specify the value including the number of
characters. There are operating system features where hex FFFFFFFF is
used to terminate strings (parameter lists). I'm looking for a more
full featured capability which just so happens to be useful for defining
a c-compatible string. And I have other uses for it in terms of
database record processing (both suffix and prefix). In reality though,
a more full featured string parsing function using a defined template
would be even better than this half measure.
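
A rough C sketch of the "string plus arbitrary suffix" idea, assuming explicit lengths so that the suffix can be a single null byte, four 0xFF bytes, or anything else. The function with_suffix is hypothetical and is not a feature of any Fortran or C standard.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated buffer holding text followed by suffix.
   Lengths are explicit, so neither part needs a terminator.
   Returns NULL if allocation fails; the caller frees the result.
   Hypothetical helper for illustration only. */
unsigned char *with_suffix(const void *text, size_t text_len,
                           const void *suffix, size_t suffix_len)
{
    size_t total = text_len + suffix_len;
    unsigned char *out;

    assert(text != NULL && suffix != NULL);
    out = malloc(total ? total : 1);   /* avoid a zero-size request */
    if (out == NULL)
        return NULL;
    memcpy(out, text, text_len);
    memcpy(out + text_len, suffix, suffix_len);
    return out;
}
```

Calling with_suffix("hi", 2, "", 1) yields an ordinary C string, while passing a four-byte 0xFF array as the suffix gives the parameter-list style terminator mentioned above.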

--

Gary Scott
Brooks Moses

2006-10-02, 4:04 am

glen herrmannsfeldt wrote:
> Brooks Moses wrote:
> (snip on null terminating string literals)

I should have said "sometimes a string is null-terminated (if it comes
from a literal), and sometimes it isn't (if it comes from something
else)." Null-terminating all the literals doesn't null-terminate all
strings, and this seems likely to lead to a lot of confusion.
>
>
> I could be wrong on this, but as I understood it that would
> be one string literal. That the // would be done at compile
> time and one string is compiled.


That sounds like a distinctly unpleasant idea. I should very much
dislike using a language that was designed to produce different answers
if the compiler optimized something to occur at compile time rather than
runtime!

Regardless, if you want to assign those two literals to individual
character strings and then concatenate them, the point still holds. For
this to work "correctly", you will need to redefine the concatenate
operator to strip off the trailing null on the first operand. That way
lies either (a) madness, or (b) a feature that has essentially no
observable effects within standard-compliant Fortran because _all_
string operators ignore the suffix.
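
The problem can be sketched in C, assuming the null were part of each operand's value and concatenation did not strip it. The function name concat_keeping_nulls is made up for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Naive concatenation that copies each piece together with its
   terminating null. out must have room for
   strlen(a) + strlen(b) + 2 bytes. Returns the bytes written.
   Hypothetical helper showing the pitfall, not a recommended routine. */
size_t concat_keeping_nulls(char *out, const char *a, const char *b)
{
    size_t na, nb;

    assert(out != NULL && a != NULL && b != NULL);
    na = strlen(a) + 1;    /* first piece plus its null */
    nb = strlen(b) + 1;    /* second piece plus its null */
    memcpy(out, a, na);
    memcpy(out + na, b, nb);
    return na + nb;
}
```

For "Hello, " and "World!" this writes 15 bytes, but a C callee using strlen() on the result sees only the 7 characters of "Hello, ", because the first null sits in the middle.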

- Brooks


--
The "bmoses-nospam" address is valid; no unmunging needed.
glen herrmannsfeldt

2006-10-02, 4:04 am

Brooks Moses wrote:
> glen herrmannsfeldt wrote:


> I should have said "sometimes a string is null-terminated (if it comes
> from a literal), and sometimes it isn't (if it comes from something
> else)." Null-terminating all the literals doesn't null-terminate all
> strings, and this seems likely to lead to a lot of confusion.


I suppose it could be confusing, but it is one of the things that
C programmers have to learn. Fortran programmers doing C interop
might as well learn it, too.

> That sounds like a distinctly unpleasant idea. I should very much
> dislike using a language that was designed to produce different answers
> if the compiler optimized something to occur at compile time rather than
> runtime!


What I was suggesting for this case is that the concatenation
be done at compile time, and the resulting string literal
be null terminated. In that case, the string as seen by Fortran
would be the same either way. Neither the Fortran length nor
the C strlen() includes the null.

> Regardless, if you want to assign those two literals to individual
> character strings and then concatenate them, the point still holds. For
> this to work "correctly", you will need to redefine the concatenate
> operator to strip off the trailing null on the first operand. That way
> lies either (a) madness, or (b) a feature that has essentially no
> observable effects within standard-compliant Fortran because _all_
> string operators ignore the suffix.


As a Fortran expression the terminating null would not be counted
in the length, so it would not even be seen by standard Fortran
code. There is no string concatenation operator in C, so it doesn't
make sense to ask what C would do.

In many cases C does require constant expressions to be evaluated at
compile time. Arrays are required to be dimensioned using constants
(in C89). C89 allows constant expressions, so

int x[10+3];

is legal. (Especially useful with preprocessor substitution.)

C89 will concatenate string constants with no operator between them,
convenient for preprocessor macros and for specifying long string
constants on multiple lines without the confusion of escaped newlines.
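
Both C features can be seen in a few lines. The identifiers greeting, greeting_length and x_elements are illustrative; the array x mirrors the declaration above.

```c
#include <assert.h>
#include <string.h>

/* Adjacent string literals are joined during translation, and a
   single null terminates the combined result. */
const char greeting[] = "Hello, " "World!";

/* A constant expression used as an array bound. */
int x[10 + 3];

/* Length of the joined literal as seen by the C library. */
size_t greeting_length(void)
{
    assert(sizeof greeting == strlen(greeting) + 1);   /* one null, at the end */
    return strlen(greeting);
}

/* Number of elements in x, computed at compile time. */
size_t x_elements(void)
{
    return sizeof x / sizeof x[0];
}
```

The joined literal has 13 characters and occupies 14 bytes, and x has 13 elements. Note that this joining is a property of adjacent literals in the source text, not of an operator applied to two values.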

That might be why I thought Fortran did compile time constant
string concatenation.

-- glen

James Giles

2006-10-02, 4:04 am

glen herrmannsfeldt wrote:
.... inconsistent null termination ...
> I suppose it could be confusing, but it is one of the things that
> C programmers have to learn. Fortran programmers doing C interop
> might as well learn it, too.


But, those of us that seldom do C interop shouldn't even be
aware that the thing exists. The existing Fortran character
string mechanisms have a few weaknesses. But there's no
reason to make things worse.

> As a Fortran expression the terminating null would not be counted
> in the length, so it would not even be seen by standard Fortran
> code. [...]


Except in storage association contexts, derived types with the
SEQUENCE attribute, passing substrings as actual arguments,
and so on. What you're recommending would constitute a
noticeable change.

--
J. Giles


Elijah Cardon

2006-10-02, 4:04 am


"Gary Scott" <garylscott@sbcglobal.net> wrote in message
news:xevTg.6333$TV3.1462@newssvr21.news.prodigy.com...
> In Fortran passing strings to C functions using compiler extensions it is
> common to concatenate with "char(0)".
>
> retcode = c_func("This is a test" // char(0))
>
> Some compilers also support
>
> retcode = c_func("This is a test"C)
>
> or something similar (although for consistency C"string" would have been
> better) and I believe I saw a special "function syntax" for string
> arguments.
>
> So at F2k3 does the compiler perform this conversion for you without need
> to manually concatenate? (assuming you've properly defined the binding to
> the C function)
>

Just replying so that I can look this up later. EC


Gary Scott

2006-10-02, 8:03 am

Brooks Moses wrote:
> glen herrmannsfeldt wrote:
>
>
>
> I should have said "sometimes a string is null-terminated (if it comes
> from a literal), and sometimes it isn't (if it comes from something
> else)." Null-terminating all the literals doesn't null-terminate all
> strings, and this seems likely to lead to a lot of confusion.
>

The key here is that myCfunction have the correct prototype. If so,
then the compiler should be smart enough to get this right regardless of
whether it is done at compile time or run time. As a literal, there is no
reason for "Hello, " to contain a null terminator. It is the final
result that requires (given the correct prototype) null termination.
>
>
> That sounds like a distinctly unpleasant idea. I should very much
> dislike using a language that was designed to produce different answers
> if the compiler optimized something to occur at compile time rather than
> runtime!
>
> Regardless, if you want to assign those two literals to individual
> character strings and then concatenate them, the point still holds. For
> this to work "correctly", you will need to redefine the concatenate
> operator to strip off the trailing null on the first operand. That way
> lies either (a) madness, or (b) a feature that has essentially no
> observable effects within standard-compliant Fortran because _all_
> string operators ignore the suffix.
>
> - Brooks
>
>



--

Gary Scott
glen herrmannsfeldt

2006-10-02, 7:03 pm

James Giles <jamesgiles@worldnet.att.net> wrote:
(I wrote)

> Except in storage association contexts, derived types with the
> SEQUENCE attribute, passing substrings as actual arguments,
> and so on. What you're recommending would constitute a
> noticeable change.


Just to be sure, I only meant literal character constants.

I don't believe that applies to either the SEQUENCE attribute
or substrings. There might be people who make assumptions
about what comes after a character constant in memory, but
I don't believe the standard applies in that case.

In one post I wondered about constant expressions that
I thought would be done at compile time. None of mine
have asked about run time evaluation of expressions.

-- glen
Copyright 2009 codecomments.com