VB6 ANSI to Unicode conversion wrong with fixed-length strings in structures
| MarkJackson 2005-09-06, 7:57 am |
| Hi
Briefly:
I have a Fortran DLL that returns a structure containing fixed-length
strings to a VB6 program. Fine for English characters, but not for
returning Chinese characters (using Chinese code page on the PC). In my
debugger the strings are correct in the DLL, but in the VB there are
garbage characters at the end. I can only think that VB's implicit ANSI
to Unicode conversion has messed up. Can anyone help with this?
Details, with code:
I'm calling a DLL from VB6 with a Declare statement, passing a
structure containing fixed-length strings of length 20 characters.
Behind the scenes VB creates a copy of the structure with ANSI strings
of 20 bytes. When the DLL returns, VB converts the strings from ANSI to
Unicode and copies the data back to the original structure.
If the DLL writes one Chinese character at the start of a 20 byte
string, that takes 2 bytes (in ANSI encoding). It pads with spaces,
making 19 characters in the 20 bytes. VB implicitly converts the
strings from ANSI to Unicode when the DLL returns. Those 19 characters
convert fine, but the 20th character in the VB string is garbage. VB
seems to just interpret the next two bytes in the structure (past the
end of the string!) as a Unicode character. And I don't mean it
converts them from ANSI! If the next two bytes are both hex 0x20, you
don't get a space, you get the Unicode character 0x2020.
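(Editor's illustration, not part of the original post.) The byte arithmetic above can be sketched in Python, assuming code page 936 behaves like Python's `gbk` codec. The helper `simulate_vb6_readback` is hypothetical: it models only the symptom described here (the converted string comes up short, and each missing character is filled with a raw 16-bit value taken from the bytes that follow the field). It is not a description of VB's real implementation.

```python
import struct

FIELD_BYTES = 20  # CHARACTER*20 in the Fortran record, String * 20 in VB


def dll_field(num_chinese: int) -> bytes:
    """Bytes the DLL writes: the first num_chinese numerals, space padded."""
    numerals = "一二三四五六七八九十"[:num_chinese]
    return numerals.encode("gbk").ljust(FIELD_BYTES, b" ")


def simulate_vb6_readback(field: bytes, following: bytes) -> str:
    """Hypothetical model of the symptom, not VB's actual code.

    A correct ANSI-to-Unicode conversion of the field yields fewer than 20
    characters; each missing character is read as a raw little-endian
    UTF-16 code unit from the bytes after the field.
    """
    converted = field.decode("gbk")
    missing = FIELD_BYTES - len(converted)
    raw = struct.unpack("<%dH" % missing, following[: 2 * missing])
    return converted + "".join(chr(unit) for unit in raw)


field = dll_field(1)
result = simulate_vb6_readback(field, b"\x20\x20")
print(len(field), len(field.decode("gbk")), hex(ord(result[-1])))
```

With one Chinese character the field is still 20 bytes but only 19 characters, and under this model the 20th character is U+2020 rather than a space.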
I'm not going to explain why we're using Fortran DLLs, but there are
reasons! I think the same would happen with a C DLL but don't have time
to test it now. (I'm using Compaq Visual Fortran 6 by the way)
VB source code, in a form with a button:
Private Type testStruct
    sTestOne As String * 20
    bytArray(0 To 20) As Byte
End Type
Private Declare Function TestDllFunction Lib "test32.dll" ( _
    ByRef nTestNum As Integer, ByRef testStruct As testStruct) As Integer
Private Sub cmdTest_Click()
    Dim i As Integer
    Dim testRec As testStruct
    Dim nNumChars As Integer
    'Put some marker bytes past the string to help diagnose problem
    For i = 0 To 20
        testRec.bytArray(i) = 32 + i
    Next i
    nNumChars = 1
    i = TestDllFunction(nNumChars, testRec)
    Debug.Assert False
    'Check testRec.sTestOne here - one garbage character at the end
    'for each Chinese character returned
End Sub
<<<<
DLL source code. Fortran, but hopefully readable? ACHAR is like Chr$,
and I've hardcoded the ANSI encodings for the Chinese characters from 1
to 10
      INTEGER*2 FUNCTION TestDllFunction
     +   (nNumChars, testRec )
      IMPLICIT NONE
!     This exports the name from the DLL
!DEC$ ATTRIBUTES DLLEXPORT::TestDllFunction
!     This makes the name accessible to VB6
!DEC$ ATTRIBUTES ALIAS : "TestDllFunction" :: TestDllFunction
C ---
      INTEGER*2 nNumChars
      STRUCTURE /testStruct/
        CHARACTER*20 sTestOne
        INTEGER*1 bytArray(0:20)
      END STRUCTURE
      RECORD /testStruct/ testRec
      INTEGER*2 i
C ---
!The Chinese characters for one to ten,
!when interpreted on simplified Chinese code page (936)
      testRec%sTestOne(1:2) = ACHAR('D2'X) // ACHAR('BB'X)
      testRec%sTestOne(3:4) = ACHAR('B6'X) // ACHAR('FE'X)
      testRec%sTestOne(5:6) = ACHAR('C8'X) // ACHAR('FD'X)
      testRec%sTestOne(7:8) = ACHAR('CB'X) // ACHAR('C4'X)
      testRec%sTestOne(9:10) = ACHAR('CE'X) // ACHAR('E5'X)
      testRec%sTestOne(11:12) = ACHAR('C1'X) // ACHAR('F9'X)
      testRec%sTestOne(13:14) = ACHAR('C6'X) // ACHAR('DF'X)
      testRec%sTestOne(15:16) = ACHAR('B0'X) // ACHAR('CB'X)
      testRec%sTestOne(17:18) = ACHAR('BE'X) // ACHAR('C5'X)
      testRec%sTestOne(19:20) = ACHAR('CA'X) // ACHAR('AE'X)
!Pad any leftover with spaces
      DO i=(nNumChars*2)+1, 20
        testRec%sTestOne(i:i) = ACHAR(32)
      END DO
C ---
      TestDllFunction=1
      RETURN
      END
<<<<
| |
| Tony Proctor 2005-09-06, 7:01 pm |
| Although I can't help you with this problem Mark, I'm aware that there have
been previous threads suggesting Ansi/Unicode problems with fixed-length
strings in VB. For instance:
http://groups.google.ie/group/micro...fa7e384cf?hl=en
Tony Proctor
| |
| Norman Diamond 2005-09-06, 9:58 pm |
| This problem happens with any code page that includes more than just
single-byte characters. It happens even if the DLL is written in VC++6,
even if the DLL is built into Windows (part of the API), and even if a
later version of VB is used.
A partial workaround, if your code page only has characters of length two
bytes or less, is to specify a string length double what you're actually
going to use, and then only store strings up to half of the specified
length. This workaround will likely help when calling DLLs that you coded
yourself. It might not be good enough for Windows APIs or some of VB's own
operations (such as random access file I/O).
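(Editor's illustration, not part of the original post.) A rough Python sketch of why doubling the declared length helps, assuming a code page with at most two bytes per character (here Python's `gbk` codec stands in for code page 936) and assuming, as the original poster observed, that garbage appears only after the characters that were really converted. The names `DECLARED`, `USED` and `valid_chars` are made up for this example.

```python
def valid_chars(field: bytes, codec: str = "gbk") -> int:
    """Number of characters that really came from the DLL's ANSI bytes."""
    return len(field.decode(codec))


DECLARED = 40          # e.g. String * 40 where 20 characters are wanted
USED = DECLARED // 2   # only the first 20 characters are treated as data

# Worst case: every character needs two bytes, so 40 bytes hold 20 characters.
worst = ("一" * USED).encode("gbk")
# Typical case: a short mixed string, space padded to the full field.
mixed = "一二三 abc".encode("gbk").ljust(DECLARED, b" ")

for field in (worst, mixed):
    assert len(field) == DECLARED
    # At least USED characters are always valid, so the used half is clean.
    print(valid_chars(field), valid_chars(field) >= USED)
```

Because 40 bytes always convert to at least 20 characters, any garbage lands in the half of the string that the program ignores.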
Microsoft hinted that they might start to document part of this problem, but
I don't think they're going to fix most of it.
| |
| MarkJackson 2005-09-07, 7:58 am |
| Thanks Norman,
I thought it must be a VB bug, and would happen for anything called
through a Declare: DLL, API or whatever. I guess that's true.
I did consider doubling the length of all the strings, but that's a
reasonable amount of work in the VB and in our DLLs.
We're working round it by stripping off the garbage characters in the
VB. We call this routine below for each fixed-length string. Only the
characters that fit into the ANSI bytes are valid, everything after is
garbage.
Sub CorrectDLLString(ByRef sToCorrect As String)
    Dim nLen As Long
    Dim sWorking As String
    'Number of characters in the fixed-length string
    nLen = Len(sToCorrect)
    'Slim down the string until we have only the characters that
    'were actually returned from the DLL. Remove the garbage.
    sWorking = sToCorrect
    Do While LenB(StrConv(sWorking, vbFromUnicode)) > nLen
        sWorking = Left$(sWorking, Len(sWorking) - 1)
    Loop
    'Just in case, explicitly pad with spaces up to the full length
    sToCorrect = sWorking & Space$(nLen - Len(sWorking))
End Sub
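(Editor's illustration, not part of the original post.) The same trimming idea, sketched in Python for readers who want to step through it. It assumes Python's `gbk` codec as a stand-in for code page 936, and `correct_dll_string` is just a transliteration of the routine above, not tested against VB itself. Unmappable garbage is encoded as a one-byte `?`, roughly what an ANSI conversion would do.

```python
def correct_dll_string(s: str, codec: str = "gbk") -> str:
    """Drop trailing characters until the ANSI form fits, then re-pad."""
    n = len(s)  # characters in the fixed-length string, also the byte budget
    working = s
    # Only the characters that fit into n ANSI bytes came from the DLL.
    while len(working.encode(codec, errors="replace")) > n:
        working = working[:-1]
    # Pad with spaces back up to the full fixed length.
    return working + " " * (n - len(working))


raw = "一" + " " * 18 + "\u2020"  # 19 real characters plus one garbage character
fixed = correct_dll_string(raw)
print(repr(fixed))
```

One Chinese character plus 18 spaces already fills the 20 ANSI bytes, so the trailing U+2020 is removed and replaced by a space.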
| |
| Someone 2005-09-07, 7:02 pm |
| You may want to check these articles...
http://support.microsoft.com/defaul...kb;en-us;205277
http://support.microsoft.com/defaul...kb;en-us;199824
Especially read the part about StrPtr() in the last article.
VB auto converts between ANSI/Unicode when you use ByVal keyword, but not
when a string is passed ByRef. In ByRef case, the string is passed as is, in
Unicode format. VB UI is ANSI only, so if you print text that is not in the
current code page, you will see question marks. One way to show Unicode
chars in VB6 for testing only is by using "Microsoft Forms 2.0 Object
Library" in Project|Components. This will add some Unicode controls to your
project, but you can't redistribute this component, because it requires and
comes with MS Office. If you want UI Unicode support, you have to make your
own, require the user to get MS Office, or get a third party control...
| |
| Norman Diamond 2005-09-07, 9:57 pm |
| "Someone" <nobody@cox.net> wrote in message
news:zHHTe.11949$ct5.5111@fed1read04...
> You may want to check these articles...
>
> http://support.microsoft.com/defaul...kb;en-us;205277
I think you're right. In this article, the section which is relevant to the
present discussion is labelled
> Pass a user-defined data type that contains strings to a C function or to
> a C++ function that expects a pointer to a structure
You can jump right to it with this link:
http://support.microsoft.com/defaul...122120121120120
The sample code requires the VB program to use VarPtr(). I think it has a
chance of working because the Declare statement specifies a parameter of
type Long, which we know is really the address of the structure, but VB
doesn't know that there's a string attached so VB doesn't convert it. This
could work with a Unicode Windows API (WhateverNameW() but not
WhateverNameA()). It could work with the DLL that the original poster is
using here if the original poster codes the DLL to use Unicode.
> http://support.microsoft.com/defaul...kb;en-us;199824
>
> Especially read the part about StrPtr() in the last article.
No, the user needs to pass the address of the entire UDT, so he needs
VarPtr() not StrPtr().
> VB auto converts between ANSI/Unicode when you use ByVal keyword, but not
> when a string is passed ByRef.
I think you're now talking about calls through ordinary Declares with
correct type declarations instead of using VarPtr() or StrPtr().
Unfortunately this is nonsense. VB6 converts between Unicode and ANSI for
both ByVal and ByRef. In VB.Net a Declare statement can specify whether to
pass Unicode or convert to ANSI. In VB.Net if a conversion to ANSI involves
a UDT containing a fixed length string then it still screws up the same way
VB6 does. Again the broken conversion can be avoided if the original poster
codes the DLL to use Unicode.
| |
| Someone 2005-09-07, 9:57 pm |
| "Norman Diamond" <ndiamond@community.nospam> wrote in message
news:u9D1QVBtFHA.3628@TK2MSFTNGP14.phx.gbl...
> "Someone" <nobody@cox.net> wrote in message
> news:zHHTe.11949$ct5.5111@fed1read04...
>
> I think you're now talking about calls through ordinary Declares with
> correct type declarations instead of using VarPtr() or StrPtr().
> Unfortunately this is nonsense. VB6 converts between Unicode and ANSI for
> both ByVal and ByRef.
Sorry, you are correct, it has been some time since I used ByRef with a DLL.
| |
| MarkJackson 2005-09-08, 7:58 am |
| >It could work with the DLL that the original poster is
>using here if the original poster codes the DLL to use Unicode.
I think this would work because it bypasses VB's ANSI to Unicode
conversion, as Norman says. However it would be a pain to change the
DLLs to Unicode, because they're Fortran! I'm going to stick with my
workaround, which means I don't have to change the DLLs.
Thanks for the Microsoft article, it's very good & the techniques in it
look useful. But, being a bit picky, there's a mistake in one of the
remarks.
"The American National Standards Institute system (ANSI) and the
Unicode system (Unicode) are two systems to represent characters. ANSI
uses one byte to store characters. Unicode uses two bytes to store
characters."
The Windows "ANSI" strings aren't always one byte per character, what
about double-byte code pages?
| |
| Tony Proctor 2005-09-08, 7:58 am |
| ANSI character sets include DBCS too
Tony Proctor
| |
| Mark Yudkin 2005-09-09, 3:57 am |
| I avoid this problem by combining two techniques:
1) instead of using fixed length strings, I use the String$(len+1, 0)
function to initialize a normal string variable. Len is made large enough to
work for all DBCS characters.
2) The non-VB DLL always uses nul-terminated strings, regardless of whether
the length is fixed (even 1), and takes the value of len from 1). That DLL
also honours the len, meaning it will never write more than len chars, and
can ensure a nul terminator is present, as the storage length is len + 1.
A custom VB6 StrZToStr routine that locates the nul and uses left$ to get
the real string then completes the equation.
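(Editor's illustration, not part of the original post.) A minimal Python sketch of the nul-terminated buffer idea. It works on the ANSI bytes rather than on a VB string, uses Python's `gbk` codec as a stand-in for a DBCS code page, and borrows the name `str_z_to_str` from the routine mentioned above purely for readability; the real StrZToStr is not shown in the thread. The buffer size of 2 * MAX_CHARS + 1 is an assumption for a two-byte code page.

```python
def str_z_to_str(buffer: bytes, codec: str = "gbk") -> str:
    """Return the text before the first nul byte (the whole buffer if none)."""
    end = buffer.find(b"\x00")
    if end == -1:
        end = len(buffer)
    return buffer[:end].decode(codec)


MAX_CHARS = 10
# Zero-filled buffer: room for MAX_CHARS double-byte characters plus the nul.
buf = bytearray(2 * MAX_CHARS + 1)
data = "一二三".encode("gbk")
buf[: len(data)] = data  # what a well-behaved DLL would write
print(str_z_to_str(bytes(buf)))
```

Searching for a zero byte is safe here because trail bytes in this code page are never 0x00.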
You may also wish to consider passing the native Unicode around, if your DLL
is able to understand Unicode. See KB145727 for documentation on the VB6
side of things.
| |
| MarkJackson 2005-09-09, 7:02 pm |
| >ANSI character sets include DBCS too
Yep, so my point is that Microsoft article is wrong when it says "ANSI
uses one byte to store characters"
| |
| Someone 2005-09-09, 7:02 pm |
| You may find this TrimNull function useful...
http://vbnet.mvps.org/index.html?code/core/trimnull.htm
| |
| Tony Proctor 2005-09-09, 7:02 pm |
| Sorry Mark. I read your reply in a hurry and didn't notice that it included
a quotation. You're absolutely right, and that Microsoft article is very
wrong.
As they say over here: "I'll get my coat..."
Tony Proctor
| |
| Norman Diamond 2005-09-11, 9:57 pm |
| "Mark Yudkin" <myudkinATcompuserveDOTcom@boingboing.org> wrote in message
news:u60T84QtFHA.3720@TK2MSFTNGP14.phx.gbl...
>I avoid this problem by combining two techniques:
>
> 1) instead of using fixed length strings, I use the string$(len+1, 0)
> function to initialize a normal string function. Len is made large enough
> to work for all DBCS characters.
For an API (or other DLL) that wants a structure to contain a pointer to a
string, that is of course the right thing to do. For other APIs (and the
original poster's DLL) that cannot be done.
> 2) The non-VB DLL always uses nul-terminated strings, regardless of
> whether the length is fixed (even 1), and takes the value of len from 1).
> That DLL also honours the len, meaning it will never write more than len
> chars, and can ensure a nul terminator is present , as the storage length
> is len + 1.
And then, even if your DLL takes a structure that contains a pointer to a
string, you mess up the string the same way as VB does in the original
poster's problem. The VB program is sending at most len+1 characters
including the nul terminator. For a DBCS that will be at most 2*len+1 bytes
(the nul terminator is a single byte -- though if you prefer, you can say at
most 2*len+2 bytes and you'll still be safe). The problem is when your DLL
writes at most len chars. In C/C++, char is byte. If any character needed
more than one char then you're losing some of the data.
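(Editor's illustration, not part of the original post.) The data loss described above can be sketched in Python, assuming Python's `gbk` codec as a stand-in for a DBCS code page. `write_capped` is a hypothetical model of a DLL that counts bytes while the caller counted characters.

```python
def write_capped(text: str, max_bytes: int, codec: str = "gbk") -> bytes:
    """Model a DLL that stops after max_bytes bytes, whatever the characters."""
    return text.encode(codec)[:max_bytes]


text = "一二三四五"                      # 5 characters, 10 bytes in this code page
stored = write_capped(text, len(text))  # cap set to the character count
survivors = stored.decode("gbk", errors="ignore")
print(len(text.encode("gbk")), len(stored), survivors)
```

Only two of the five characters survive, and the fifth byte is half of a character that can no longer be decoded.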
| |
| Mark Yudkin 2005-09-13, 3:57 am |
|
"Norman Diamond" <ndiamond@community.nospam> wrote in message
news:%23F5Y8B0tFHA.4080@TK2MSFTNGP12.phx.gbl...
> "Mark Yudkin" <myudkinATcompuserveDOTcom@boingboing.org> wrote in message
> news:u60T84QtFHA.3720@TK2MSFTNGP14.phx.gbl...
>
>
> For an API (or other DLL) that wants a structure to contain a pointer to a
> string, that is of course the right thing to do. For other APIs (and the
> original poster's DLL) that cannot be done.
I had missed that he'd embedded fixed length strings into structures. For
those cases, the string*n must be used, where n = 2*len+1. The
initialization to string$(n, 0) remains though.
>
>
> And then, even if your DLL takes a structure that contains a pointer to a
> string, you mess up the string the same way as VB does in the original
> poster's problem. The VB program is sending at most len+1 characters
> including the nul terminator. For a DBCS that will be at most 2*len+1
> bytes (the nul terminator is a single byte -- though if you prefer, you
> can say at most 2*len+2 bytes and you'll still be safe). The problem is
> when your DLL writes at most len chars. In C/C++, char is byte. If any
> character needed more than one char then you're losing some of the data.
As you say, C/C++ may not write all characters, or even the terminating nul.
For this reason, the VB6 caller does it. Since the structure is declared in
both VB6 and C/C++, and the C/C++ code knows that it may only address n-1
bytes (because that is the rule you keep to), your VB6 always sees a
correctly formed nul-terminated string.
Problems only arise when the C/C++ DLL is not yours and will not guarantee
nul-termination. Such an API is going to be hard to cope with even in C/C++
clients, but the technique you would use there is the one you'd use in VB6.
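As a rough sketch of the rule described above (the DLL only ever addresses n-1 bytes, and the caller has zeroed the field beforehand), the C side might look something like this. The structure, field name, sizes and function name are all hypothetical. The code assumes a DBCS code page, and that the VB6 caller has initialized the field with string$(n, 0) before the call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN   20                   /* assumed maximum characters */
#define NAME_BYTES (2 * NAME_LEN + 1)   /* n = 2*len+1 for a DBCS code page */

/* Hypothetical structure mirroring the VB6 Type declaration. */
struct Record {
    char name[NAME_BYTES];
};

/* Hypothetical DLL-side writer: touches at most n-1 bytes of the field,
   so the final byte, zeroed by the caller, always remains a terminator.
   Returns the number of bytes copied. */
size_t write_name(struct Record *rec, const char *src)
{
    size_t n;
    assert(rec != NULL && src != NULL);
    n = strlen(src);
    if (n > NAME_BYTES - 1)
        n = NAME_BYTES - 1;   /* note: this may split a multi-byte character */
    memcpy(rec->name, src, n);
    return n;
}
```

Because write_name never writes the last byte, a field that was all zeros on entry is still nul-terminated on return, whatever the length of src. A real implementation would probably also want to avoid cutting a double-byte character in half when it truncates.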
>
| |
| Tony Proctor 2005-09-13, 7:58 am |
| The 2*len+1 is a valid max for a DBCS Norman, but not all MBCS are true
DBCS. For instance, EUC has some triple-byte sequences, and UTF-8 has some
4-byte sequences. It's a rule that doesn't work for all locale settings.
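For what it's worth, the bound can be generalized along these lines. This is only a sketch: mbcs_buffer_size is a made-up helper, and the largest character size has to come from somewhere (on Windows, I believe the MaxCharSize field filled in by GetCPInfo is the usual source, but treat that as an assumption to check for your code page).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: worst-case bytes for len_chars characters plus a
   one-byte nul terminator, given the largest character size (in bytes)
   of the code page in use. */
size_t mbcs_buffer_size(size_t len_chars, size_t max_char_bytes)
{
    assert(max_char_bytes >= 1);
    return max_char_bytes * len_chars + 1;
}

/* Sample data: U+20000 (a CJK ideograph outside the BMP) encoded in
   UTF-8, which takes 4 bytes. */
static const char sample_utf8_4byte[] = "\xF0\xA0\x80\x80";
```

For 20 characters this gives 41 bytes with a 2-byte maximum and 81 bytes with a 4-byte maximum, and the single sample character above (4 bytes plus terminator) already overflows the 3 bytes that 2*len+1 allows for len = 1.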
Tony Proctor
"Norman Diamond" <ndiamond@community.nospam> wrote in message
news:#F5Y8B0tFHA.4080@TK2MSFTNGP12.phx.gbl...
> "Mark Yudkin" <myudkinATcompuserveDOTcom@boingboing.org> wrote in message
> news:u60T84QtFHA.3720@TK2MSFTNGP14.phx.gbl...
>
>
> For an API (or other DLL) that wants a structure to contain a pointer to a
> string, that is of course the right thing to do. For other APIs (and the
> original poster's DLL) that cannot be done.
>
>
> And then, even if your DLL takes a structure that contains a pointer to a
> string, you mess up the string the same way as VB does in the original
> poster's problem. The VB program is sending at most len+1 characters
> including the nul terminator. For a DBCS that will be at most 2*len+1
> bytes (the nul terminator is a single byte -- though if you prefer, you
> can say at most 2*len+2 bytes and you'll still be safe). The problem is
> when your DLL writes at most len chars. In C/C++, char is byte. If any
> character needed more than one char then you're losing some of the data.
>
| |
| Norman Diamond 2005-09-13, 9:57 pm |
| Yes if this thread had discussed more than DBCS (i.e. MBCS) prior to my
previous posting then I would have posted a more general formula. Prior to
your posting, this thread had only been concerned with conversions between
DBCS ANSI code pages and Microsoft's version of Unicode.
By the way is there an ANSI code page number for EUC? Internet Explorer
barfs on nearly half the contents of the following web page:
http://www.rikai.com/library/kanjit...codes.euc.shtml
(I wonder what Internet Explorer would do if that page were encoded in
UTF-8.)
"Tony Proctor" <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote in message
news:OLoTFGEuFHA.2072@TK2MSFTNGP14.phx.gbl...
> The 2*len+1 is a valid max for a DBCS Norman, but not all MBCS are true
> DBCS. For instance, EUC has some triple-byte sequences, and UTF-8 has some
> 4-byte sequences. It's a rule that doesn't work for all locale settings.
>
> Tony Proctor
>
> "Norman Diamond" <ndiamond@community.nospam> wrote in message
> news:#F5Y8B0tFHA.4080@TK2MSFTNGP12.phx.gbl...
>
>
| |
| Tony Proctor 2005-09-14, 7:58 am |
| My point was Norman that many people talk of DBCS when they actually mean
MBCS, and may not be aware that some ANSI code pages are, in fact, real
MBCS. The OP may have been talking about a specific locale, but this thread
may be viewed in the future by people using other locales.
I believe Japanese EUC is code page 51932
Tony Proctor
"Norman Diamond" <ndiamond@community.nospam> wrote in message
news:eAzfdYMuFHA.3588@tk2msftngp13.phx.gbl...
> Yes if this thread had discussed more than DBCS (i.e. MBCS) prior to my
> previous posting then I would have posted a more general formula. Prior to
> your posting, this thread had only been concerned with conversions between
> DBCS ANSI code pages and Microsoft's version of Unicode.
>
> By the way is there an ANSI code page number for EUC? Internet Explorer
> barfs on nearly half the contents of the following web page:
> http://www.rikai.com/library/kanjit...codes.euc.shtml
> (I wonder what Internet Explorer would do if that page were encoded in
> UTF-8.)
>
> "Tony Proctor" <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote in message
> news:OLoTFGEuFHA.2072@TK2MSFTNGP14.phx.gbl...
>
| |
| Someone 2005-09-14, 7:02 pm |
| You may find this reference useful:
http://www.microsoft.com/globaldev/
When you move the mouse over "References" on the left menu, you could see
the list of code pages, locale ID's, keyboard layouts and other useful
information. When you select a particular code page, you will see a GIF that
shows a table showing what each character looks like.
"Tony Proctor" <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote in message
news:%230GmRxQuFHA.1136@TK2MSFTNGP12.phx.gbl...
> My point was Norman that many people talk of DBCS when they actually mean
> MBCS, and may not be aware that some ANSI code pages are, in fact, real
> MBCS. The OP may have been talking about a specific locale, but this thread
> may be viewed in the future by people using other locales.
>
> I believe Japanese EUC is code page 51932
>
> Tony Proctor
>
> "Norman Diamond" <ndiamond@community.nospam> wrote in message
> news:eAzfdYMuFHA.3588@tk2msftngp13.phx.gbl...
>
>
| |
| Norman Diamond 2005-09-15, 3:58 am |
| "Tony Proctor" <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote in message
news:%230GmRxQuFHA.1136@TK2MSFTNGP12.phx.gbl...
> My point was Norman that many people talk of DBCS when they actually mean
> MBCS, and may not be aware that some ANSI code pages are, in fact, real
> MBCS.
OK, I agree. I should not have read the original posting so literally.
| |
| Tony Proctor 2005-09-15, 7:57 am |
| Sorry, I didn't mean it to sound as though I was getting at you. I was merely
"rounding off" the thread a little for future readers. Actually, I just saw
another of your posts, in the kernel group, about the second time you'd been
reminded today that you read things too literally :-) I think you're
drinking too much coffee, Norman.
Tony Proctor
"Norman Diamond" <ndiamond@community.nospam> wrote in message
news:eDjG8eauFHA.3224@TK2MSFTNGP10.phx.gbl...
> "Tony Proctor" <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote in message
> news:%230GmRxQuFHA.1136@TK2MSFTNGP12.phx.gbl...
>
> OK, I agree. I should not have read the original posting so literally.
>
| |
| Norman Diamond 2005-09-15, 9:57 pm |
| For 40 years I've been told that I read things too literally. Of course
this is the personality trait that led me to become a computer
"Tony Proctor" <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote in message
news:Oa4qnTeuFHA.3000@TK2MSFTNGP12.phx.gbl...
> Sorry, I didn't mean it to sound as though I was getting at you. I was merely
> "rounding off" the thread a little for future readers. Actually, I just saw
> another of your posts, in the kernel group, about the second time you'd been
> reminded today that you read things too literally :-) I think you're
> drinking too much coffee, Norman.
>
> Tony Proctor
>
> "Norman Diamond" <ndiamond@community.nospam> wrote in message
> news:eDjG8eauFHA.3224@TK2MSFTNGP10.phx.gbl...
>
>