differing sizes of wchar_t (client vs server)
Henry Townsend

2005-09-25, 6:59 pm

I hope I'm missing something obvious here ...

My app reads text data over a socket. It's actually done via libcurl
(http://curl.haxx.se) but that doesn't matter here; the point is that I
am presented with a void * pointing to a block of raw data and a long
giving the number of bytes in the block, and that's it.

However, since my app comprises both server and client the client knows
the data is line-oriented text. All it needs to do is break it into
lines and print them to stdout, skipping certain lines. This is not
hard, using a bunch of <string.h> routines like strchr and strstr, plus
fputs(). I have this all working - when the text is 8-bit ASCII.
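
For reference, the 8-bit case described above might look roughly like the following minimal sketch. It uses memchr rather than strchr because the block handed over is not guaranteed to be NUL-terminated. The names print_lines and should_skip are hypothetical, and the skip rule (a leading '#') is purely an assumption, since the post does not say which lines are skipped.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical filter (an assumption): skip lines that start with '#'. */
static int should_skip(const char *line, size_t len)
{
    return len > 0 && line[0] == '#';
}

/* Split a raw block into '\n'-terminated lines and write the kept ones
 * to `out`. A final fragment without a newline is treated as a line.
 * Returns the number of lines written. */
size_t print_lines(const void *data, long nbytes, FILE *out)
{
    const char *p = data;
    const char *end;
    size_t written = 0;

    if (data == NULL || nbytes <= 0)
        return 0;
    end = p + nbytes;

    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        size_t len = nl ? (size_t)(nl - p) : (size_t)(end - p);

        if (!should_skip(p, len)) {
            fwrite(p, 1, len, out);
            fputc('\n', out);
            written++;
        }
        p = nl ? nl + 1 : end;  /* continue after the newline, or stop */
    }
    return written;
}
```

A caller would pass the pointer and byte count it was given straight through, for example print_lines(ptr, size, stdout).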

But for full generality I'd like the server to deliver text in the UCS-2
charset (Unicode). I figured handling this on the client side would be a
simple matter of transposing char to wchar_t and strlen() to wcslen(),
etc. But it turns out that wchar_t on my platform (and on many,
including Linux and Solaris) is 4 bytes wide. So I've got a stream of
16-bit characters from the server and mechanisms for handling 8- and
32-bit - but not 16-bit - character streams on the client!

Is there a common/elegant solution? I could allocate a buffer twice as
big as the incoming data and promote to 4-byte chars before operating on
it but that would be inelegant to say the least. Not to mention the
platforms where wchar_t is 2 bytes. I guess I could pass sizeof(wchar_t)
to the server and have it respond with 2- or 4-byte data based on that,
but that would mean a doubling of bandwidth consumption. What do people
usually do about this "impedance mismatch"?
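
The "promote into a wider buffer" idea mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the server is assumed to send big-endian UCS-2 with no byte-order mark, and ucs2be_to_wcs is a hypothetical helper name. Because each character is assembled from two bytes and then assigned to a wchar_t, the same code works whether wchar_t is 2 or 4 bytes wide.

```c
#include <stddef.h>
#include <wchar.h>

/* Widen big-endian UCS-2 bytes into a NUL-terminated wchar_t string.
 * Byte order is an assumption; swap the two indices for little-endian.
 * At most dstlen - 1 characters are stored. Returns the number of
 * characters stored, not counting the terminating NUL. */
size_t ucs2be_to_wcs(const unsigned char *src, size_t nbytes,
                     wchar_t *dst, size_t dstlen)
{
    size_t n = nbytes / 2;   /* a trailing odd byte is ignored */
    size_t i;

    if (dstlen == 0)
        return 0;
    if (n > dstlen - 1)
        n = dstlen - 1;

    for (i = 0; i < n; i++)
        dst[i] = (wchar_t)(((unsigned)src[2 * i] << 8) | src[2 * i + 1]);
    dst[n] = L'\0';
    return n;
}
```

After the conversion the usual <wchar.h> routines (wcschr, wcsstr, wcslen) apply to dst. The cost is the extra buffer: nbytes / 2 + 1 elements of wchar_t, which is twice the incoming size where wchar_t is 4 bytes.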

Background: the data is basically just a list of filenames. I want to
store them in a Unicode format (which is easy on the server side since
it's written in Java where UTF-16 is the native format) because Windows
pathnames technically are Unicode. But there's not a lot of I18N stuff
to deal with here - I just want to be able to store and regurgitate
strings of 16-bit characters.

Thanks,
Henry Townsend

[PS I posted this to comp.lang.java.softwaretools first but that was
just a finger fumble. I meant to come here all along so please don't
accuse me of multiposting.]
Roger Leigh

2005-09-25, 6:59 pm

Henry Townsend <henry.townsend@not.here> writes:

> However, since my app comprises both server and client the client knows
> the data is line-oriented text. All it needs to do is break it into
> lines and print them to stdout, skipping certain lines. This is not
> hard, using a bunch of <string.h> routines like strchr and strstr, plus
> fputs(). I have this all working - when the text is 8-bit ASCII.
>
> But for full generality I'd like the server to deliver text in the UCS-2
> charset (Unicode). I figured handling this on the client side would be a
> simple matter of transposing char to wchar_t and strlen() to wcslen(),
> etc. But it turns out that wchar_t on my platform (and on many,
> including Linux and Solaris) is 4 bytes wide. So I've got a stream of
> 16-bit characters from the server and mechanisms for handling 8- and
> 32-bit - but not 16-bit - character streams on the client!


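Why do you believe that UTF-16 is going to give you "full generality"?
It's a 16-bit encoding of a much larger character set (UCS): any
character outside the first 65,536 code points takes two 16-bit units
(a surrogate pair). wchar_t is 32 bits on GNU/Linux for a good reason:
one wchar_t can hold any UCS code point.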
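
To make the point concrete, here is a small sketch of decoding one code point from UTF-16 code units. The function name utf16_decode is hypothetical and the units are assumed to be already in host byte order. Note that a single code point can consume two 16-bit units, which is why "one 16-bit unit per character" does not hold in general.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from a sequence of UTF-16 code units.
 * Returns the number of units consumed (1 or 2), or 0 if the input is
 * empty or starts with an unpaired surrogate. */
size_t utf16_decode(const uint16_t *u, size_t n, uint32_t *cp)
{
    if (n == 0)
        return 0;

    /* Anything outside the surrogate range stands for itself. */
    if (u[0] < 0xD800 || u[0] > 0xDFFF) {
        *cp = u[0];
        return 1;
    }

    /* A high surrogate must be followed by a low surrogate. */
    if (u[0] <= 0xDBFF && n >= 2 && u[1] >= 0xDC00 && u[1] <= 0xDFFF) {
        *cp = 0x10000u + (((uint32_t)(u[0] - 0xD800) << 10)
                          | (uint32_t)(u[1] - 0xDC00));
        return 2;
    }

    return 0;   /* unpaired surrogate */
}
```

For example, the unit pair 0xD83D 0xDE00 decodes to the single code point U+1F600, which fits in a 32-bit wchar_t but not in one 16-bit unit.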

> Is there a common/elegant solution?


Yes. Just use UTF-8 (UCS Transformation Format 8). You can convert
it into whatever representation you like at either end, and because it
is a plain byte stream there are no byte-order or character-width
mismatches to worry about when sending data over the wire.

It's easy to use iconv(3) to transform UTF-8 <=> UTF-16 or UTF-32
(UCS-4), or in fact any other representation.
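
A minimal sketch of such a conversion with iconv(3), going from UTF-16 to UTF-8. The wrapper name utf16be_to_utf8 is hypothetical, big-endian input is an assumption, and the encoding names "UTF-16BE" and "UTF-8" are the ones commonly accepted by iconv implementations such as glibc's (the set of supported names is implementation-defined). Everything is converted in one call for brevity; real code would loop and handle E2BIG and EINVAL.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert big-endian UTF-16 bytes to UTF-8 in a single pass.
 * Returns the number of bytes written to `out`, or (size_t)-1 if the
 * converter is unavailable, the input is malformed or incomplete, or
 * the output buffer is too small. */
size_t utf16be_to_utf8(const char *in, size_t inlen,
                       char *out, size_t outlen)
{
    iconv_t cd = iconv_open("UTF-8", "UTF-16BE");   /* (to, from) */
    char *inp = (char *)in;   /* iconv takes char **; input is not modified */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outlen;
    size_t rc;

    if (cd == (iconv_t)-1)
        return (size_t)-1;

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return outlen - outleft;
}
```

On some systems iconv lives in a separate library and needs -liconv at link time. Swapping the two encoding names in iconv_open gives the reverse direction.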


Regards,
Roger

--
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.