Code Comments

Still having charset problems with Tomcat 5 on Windows
Hi, I'm back trying to sort out what happens to £ (UK currency symbol)
in a JSP form running on Tomcat 5 under Windows. I have reduced the
problem to a simple example, which I enclose below. If I enter £ in
the textarea and submit the form, the £ gets prefixed with an accented
A. The A also appears in the query string in the browser's address bar
as %C2. However, if I save the source of the displayed JSP as an HTML
file, submitting the form displays only the £ (%A3) in the query
string.
Any help would be GREATLY appreciated.
TIA
Brian

Here is the JSP:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<%@page contentType="text/html;charset=UTF-8"%>
<%@page pageEncoding="UTF-8"%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<html>
<head><title>JSP Page</title></head>
<body>
<form>
£<br>
<textarea name=text1>
<c:out value="${param.text1}"/>
</textarea><br>
<input type=submit name=submit value='submit'/>
</form>
</body>
</html>

Here is the URL displayed after submitting the form:
http://localhost:8084/Test/index.js...submit=submit


bdobby@fish.co.uk
10-26-04 01:59 PM


Re: Still having charset problems with Tomcat 5 on Windows
bdobby@fish.co.uk wrote:
> Hi, I'm back trying to sort out what happens to £ (UK currency symbol)
> in  a JSP form running on Tomcat 5 under Windows. I have reduced the
> problem to a simple example, which I enclose below. If I enter £ in
> the textarea and submit the form, the £ gets prefixed with an accented
> A. The A also appears in the query string in the browser's address bar
> as %C2. However, if I save the source of the displayed JSP as an HTML
> file, submitting the form displays only the £ (%A3) in the query
> string.
> Any help would be GREATLY appreciated.
> TIA
> Brian

The accented A is the first byte (0xC2) of the pound sign's UTF-8
encoding, which takes two bytes (C2 A3) instead of one.  This is normal
behaviour for UTF-8 and nothing to worry about.  It occurs in this case
because you have set the page encoding to UTF-8 (this is sent in the HTTP
headers and so is lost when the page is saved to a file).  Try
setting the encoding to ISO-8859-1 in the two <%@page> directives and see
what happens.
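
To see the two encodings side by side, here is a minimal sketch using only
the standard Java library. The class name PoundEncodingDemo and the helper
toHex are made up for illustration; nothing here is Tomcat-specific.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PoundEncodingDemo {
    // Hypothetical helper: render bytes as space-separated uppercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Pound sign written as an escape so the source file encoding does not matter.
        String pound = "\u00A3";

        // Two bytes in UTF-8, one byte in ISO-8859-1.
        System.out.println(toHex(pound.getBytes(StandardCharsets.UTF_8)));      // C2 A3
        System.out.println(toHex(pound.getBytes(StandardCharsets.ISO_8859_1))); // A3

        // The same difference as it shows up in a query string.
        System.out.println(URLEncoder.encode(pound, "UTF-8"));      // %C2%A3
        System.out.println(URLEncoder.encode(pound, "ISO-8859-1")); // %A3
    }
}
```

The %C2%A3 form matches what the browser sends for the UTF-8 page, and the
%A3 form matches what it sends for the saved HTML file.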

There are fundamental flaws in specifying and detecting the character
set used for submitted form data so you can't always assume that the
data will be passed in the same character set that was used to deliver
the page.  The link
http://ppewww.ph.gla.ac.uk/~flavell.../form-i18n.html has some tips
on how to overcome this.

HTH
Gerard

Gerard Krupa
10-26-04 08:57 PM


Re: Still having charset problems with Tomcat 5 on Windows
Nothing wrong with your script, it's a browser (at least IE) flaw.

Look at this search query from Google ("pound", "sign", <pound sign>):

http://www.google.com/search?hl=en&...%A3&btnG=Search

The problem lies in very unstable Unicode reading for chars whose first
byte is 0.

Somehow the system gets lost with such chars when the encoding is set to
UTF-8. It cannot "get" that %A3 or such is really %00A3.
Instead the system tries to "guess" the right Unicode table.
Strangely enough, 99% of the time its guess is Korean, so it prefixes the
chars with %C2 - right in the middle of Hangul (the Korean syllabic
alphabet). You can read more about the special Korean meaning in IE (which
seems to be debugging trash left by one of the IE developers) in
comp.lang.javascript; look for the thread with the keywords
"Bizarre JS brackets bug".

The situation is not so desperate though: at least YOU know what table to
use, so drop the C2 (or whatever trash you get) and re-prefix it with 00.
Another solution would be to use character entities instead wherever
possible.






VK
10-26-04 08:57 PM


Re: Still having charset problems with Tomcat 5 on Windows
VK wrote:
> Somehow the system gets lost with such chars when the coding is set to UTF-8
> It cannot "get" that %A3 or such is really %00A3.
> Instead the system tries to "guess" the right Unicode table.
> Strangely enough 99% of its guess is Korean, so it's prefixing the chars
> with %C2 - right in the middle of Hangul (Korean syllable alphabet).
> More about the special Korean meaning in IE (which seems to be a debugging
> trash left by one of IE developers) you can read in comp.lang.javascript,
> look the thread by keywords "Bizarre JS brackets bug".

C2A3 is the correct UTF-8 encoding for pound sign (correctly passed by
the browser as specified in the page encoding) - see
http://www1.tip.nl/~t876506/utf8tbl.html.  When this is converted into a
java.lang.String, the system is probably using the default iso-latin
string encoding and performing a single-byte conversion.  I don't
believe that any 16-bit unicode matching is being performed at all.
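
The single-byte conversion can be reproduced outside the container. This is
only a sketch under the assumption stated above, that the bytes C2 A3 are
decoded as ISO-8859-1; the class QueryDecodeDemo and its helpers are
hypothetical names, not part of Tomcat.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class QueryDecodeDemo {
    // Hypothetical helper: percent-decode with a named charset,
    // turning the checked exception into an unchecked one.
    static String decodeAs(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Hypothetical helper: undo an ISO-8859-1 decode of bytes that were really UTF-8.
    static String repairLatin1AsUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "%C2%A3"; // what the browser sends for the pound sign in UTF-8

        String wrong = decodeAs(sent, "ISO-8859-1"); // two chars: U+00C2 U+00A3
        String right = decodeAs(sent, "UTF-8");      // one char: U+00A3

        System.out.println(wrong.length()); // 2
        System.out.println(right.length()); // 1
        System.out.println(repairLatin1AsUtf8(wrong).equals(right)); // true
    }
}
```

The two-character result is the accented A followed by the pound sign, which
is what the original JSP displayed. In Tomcat, the charset used to decode the
query string is a connector setting (URIEncoding) rather than something the
page directives control, so the repair step is a workaround, not a fix.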

I have performed a quick test with IE by adding the following to a form:
<input type="hidden" name="_charset_" />
This is an IE-only trick that can tell you the encoding of submitted
parameters.  This confirms that the data is being passed using UTF-8.
In fact, IE continues to encode form data in UTF-8 even if the page
encoding is changed to UTF-16.
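
On the server side the same field could be used to pick the charset for
decoding. The following is a rough sketch, not a Tomcat or servlet API: it
works on a raw, still percent-encoded query string (such as the one returned
by HttpServletRequest.getQueryString()), and the class CharsetFieldDemo, its
methods, and the ISO-8859-1 fallback are all assumptions made for
illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class CharsetFieldDemo {
    // Assumed fallback when no usable _charset_ value is present.
    static final String FALLBACK = "ISO-8859-1";

    // Return the charset named by the _charset_ field, or the fallback.
    static String declaredCharset(String rawQuery) {
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("_charset_")) {
                String name = pair.substring(eq + 1);
                if (!name.isEmpty() && isSupported(name)) {
                    return name;
                }
            }
        }
        return FALLBACK;
    }

    // Charset.isSupported throws on syntactically illegal names; treat those as unsupported.
    static boolean isSupported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Decode every name=value pair using the declared charset.
    static Map<String, String> decode(String rawQuery) {
        String charset = declaredCharset(rawQuery);
        Map<String, String> params = new LinkedHashMap<>();
        try {
            for (String pair : rawQuery.split("&")) {
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    continue; // skip pairs without a name
                }
                params.put(URLDecoder.decode(pair.substring(0, eq), charset),
                           URLDecoder.decode(pair.substring(eq + 1), charset));
            }
        } catch (UnsupportedEncodingException e) {
            // Not expected: the charset was checked with Charset.isSupported.
            throw new IllegalStateException(e);
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> utf8 = decode("text1=%C2%A3&_charset_=UTF-8");
        Map<String, String> latin1 = decode("text1=%A3");
        // Both inputs decode to the single character U+00A3.
        System.out.println(utf8.get("text1").equals(latin1.get("text1"))); // true
    }
}
```

This only helps when the browser actually fills in the field, so the
fallback still has to be a sensible guess.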

Regards,
Gerard

Gerard Krupa
10-26-04 08:57 PM


Re: Still having charset problems with Tomcat 5 on Windows
Gerard Krupa wrote:
> There are fundamental flaws in specifying and detecting the character
> set used for submitted form data so you can't always assume that the
> data will be passed in the same character set that was used to deliver
> the page.  The link
> http://ppewww.ph.gla.ac.uk/~flavell.../form-i18n.html has some tips
> on how to overcome this.

You may also want to see

https://bugzilla.mozilla.org/show_bug.cgi?id=241540


Ian Pilcher
10-26-04 08:57 PM


Re: Still having charset problems with Tomcat 5 on Windows
Thanks, Gerard. Changing the page-encoding to ISO-8859-1 did the trick.
Thanks again
Brian


bdobby@fish.co.uk
10-27-04 08:57 PM


Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.