Joe English 2007-11-26, 7:22 pm
Todd Helfter wrote:
> Ok, let me try to ask the question another way.
>
> If for instance I have a database table with a single column of
> varchar2(4000).
>
> In the ascii case, I can use a memory buffer of 4000 + 1 (for the null
> byte). In the case when I would use Unicode,
[ You mean "UCS2", not "Unicode" ]
> I must use a memory buffer of 8002 bytes: (column size + 1) *
> sizeof(utext).
>
> It seems wasteful to me, to always use a larger buffer unless it is
> necessary.
[ Side note: isn't it also wasteful to allocate a 4001-byte array
to hold a VARCHAR2(4000) value? If memory pressure is a concern,
wouldn't it be better to allocate just enough space to hold
the value in question, instead of always allocating enough
space to hold the largest possible value? ]
> I guess my question is: given a pointer to a byte array, is it
> possible to easily determine whether the contents of the array can be
> represented by the ASCII character set or not?
What is the encoding of the contents of the byte array?
If it's an ASCII-superset 8-bit encoding like ISO8859-*
or the various Microsoft code pages, then you can scan for
any bytes with the high-order bit (0x80) set. If it's a
stateful encoding like ISO2022, you can scan for the presence
of escape sequences. (SHIFT-JIS is not stateful: its non-ASCII
characters start with a byte that has the high bit set, so the
0x80 scan covers it.) For KOI8-*, GB*, BIG5, and others -- I
don't know offhand, but can probably find out with some digging.
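For the ASCII-superset 8-bit case, the high-bit scan might look something like the sketch below. The function name is_all_ascii() is made up for illustration, and it assumes you already know the length of the data in the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte in buf[0..len) is
 * 7-bit ASCII (high-order bit clear), 0 otherwise. */
int is_all_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] & 0x80) {
            return 0;               /* found a non-ASCII byte */
        }
    }
    return 1;
}
```

For example, the five bytes of "hello" pass, while the ISO8859-1 bytes 'c', 'a', 'f', 0xE9 do not, because 0xE9 has the high bit set.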
But that's the wrong question. What you want to do is
get it into UTF-8.
If it's already UTF-8, you're golden: just pass it to
Tcl_NewStringObj(). (A quick googling indicates that
UTF-8 is Oracle's preferred encoding too, so that's
probably the best way to go). If it's not, then you
can use Tcl_ExternalToUtf* to convert it first.
You could also convert to UCS-2 and use Tcl_NewUnicodeObj(),
but -- depending on what happens downstream -- that's going
to be more expensive overall, since Tcl is likely to convert
the value to UTF-8 before doing anything with it anyway.
> If Tcl_GetUnicodeFromObj() really returns a UCS-2 string then it
> should be Tcl_GetUCS2FromObj() :)
Yes, that's what it should have been called.
--Joe English