Code Comments
Hi, I'm back trying to sort out what happens to £ (UK currency symbol)
in a JSP form running on Tomcat 5 under Windows. I have reduced the
problem to a simple example, which I enclose below. If I enter £ in
the textarea and submit the form, the £ gets prefixed with an accented
A. The A also appears in the query string in the browser's address bar
as %C2. However, if I save the source of the displayed JSP as an HTML
file, submitting the form displays only the £ (%A3) in the query
string.
Any help would be GREATLY appreciated.
TIA
Brian
Here is the JSP:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<%@page contentType="text/html;charset=UTF-8"%>
<%@page pageEncoding="UTF-8"%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<html>
<head><title>JSP Page</title></head>
<body>
<form>
£<br>
<textarea name=text1>
<c:out value="${param.text1}"/>
</textarea><br>
<input type=submit name=submit value='submit'/>
</form>
</body>
</html>
Here is the URL displayed after submitting the form:
http://localhost:8084/Test/index.js...submit=submit

bdobby@fish.co.uk wrote:
> Hi, I'm back trying to sort out what happens to £ (UK currency symbol)
> in a JSP form running on Tomcat 5 under Windows. I have reduced the
> problem to a simple example, which I enclose below. If I enter £ in
> the textarea and submit the form, the £ gets prefixed with an accented
> A. The A also appears in the query string in the browser's address bar
> as %C2. However, if I save the source of the displayed JSP as an HTML
> file, submitting the form displays only the £ (%A3) in the query
> string.
> Any help would be GREATLY appreciated.
> TIA
> Brian

The accented A is the first byte (0xC2) of the pound sign's UTF-8
encoding. In UTF-8 the pound sign takes two bytes (C2 A3) instead of
one, and the leading byte has its MSB set. This is normal behaviour for
UTF-8 and nothing to worry about. It occurs in this case because you
have made the page encoding UTF-8 (this is sent in the HTTP headers and
will not be present when the page is saved to file). Try setting the
encoding to iso-8859-1 in the two <%@page> tags and see what happens.

There are fundamental flaws in specifying and detecting the character
set used for submitted form data, so you can't always assume that the
data will be passed in the same character set that was used to deliver
the page. The link
http://ppewww.ph.gla.ac.uk/~flavell.../form-i18n.html has some tips
on how to overcome this.

HTH
Gerard
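
To make the byte-level difference concrete, here is a minimal sketch using only the standard JDK. The class name `PoundEncoding` and its helper methods are made up for illustration and do not come from the thread. It shows the pound sign (written as the escape `\u00A3` so the result does not depend on the source file's encoding) as raw bytes and as a URL-encoded query value under both charsets.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares UTF-8 and ISO-8859-1 encodings of the pound sign.
final class PoundEncoding {

    // Render bytes as upper-case hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // URL-encode a value the way a browser would for a form submitted in the given charset.
    static String encodeForQuery(String value, Charset charset) {
        try {
            return URLEncoder.encode(value, charset.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen for charsets obtained from StandardCharsets.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String pound = "\u00A3";
        System.out.println(hex(pound.getBytes(StandardCharsets.UTF_8)));          // C2 A3
        System.out.println(hex(pound.getBytes(StandardCharsets.ISO_8859_1)));     // A3
        System.out.println(encodeForQuery(pound, StandardCharsets.UTF_8));        // %C2%A3
        System.out.println(encodeForQuery(pound, StandardCharsets.ISO_8859_1));   // %A3
    }
}
```

The `%C2%A3` form matches what the browser sends when the page is served as UTF-8, and the bare `%A3` matches the saved HTML file, which no longer carries the charset from the HTTP header.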

Nothing wrong with your script, it's a browser (at least IE) flaw.
Look at this search query from Google ("pound", "sign", <pound sign>):
http://www.google.com/search?hl=en&...%A3&btnG=Search
The problem lies in very unstable Unicode reading for chars whose first
byte equals 0.
Somehow the system gets lost with such chars when the coding is set to UTF-8.
It cannot "get" that %A3 or such is really %00A3.
Instead the system tries to "guess" the right Unicode table.
Strangely enough, 99% of the time its guess is Korean, so it's prefixing the
chars with %C2 - right in the middle of Hangul (Korean syllable alphabet).
More about the special Korean meaning in IE (which seems to be debugging
trash left by one of the IE developers) you can read in comp.lang.javascript;
look for the thread with the keywords "Bizarre JS brackets bug".
The situation is not so desperate though: at least YOU know what table to
use, so drop C2 (or whatever trash you get) and re-prefix it with 00.
Another solution would be to use char-entities instead wherever
possible.

VK wrote:
> Somehow the system gets lost with such chars when the coding is set to UTF-8
> It cannot "get" that %A3 or such is really %00A3.
> Instead the system tries to "guess" the right Unicode table.
> Strangely enough 99% of its guess is Korean, so it's prefixing the chars
> with %C2 - right in the middle of Hangul (Korean syllable alphabet).
> More about the special Korean meaning in IE (which seams to be a debugging
> trash left by one of IE developers) you can read in comp.lang.javascript,
> look the thread by keywords "Bizarre JS brackets bug".

C2A3 is the correct UTF-8 encoding for the pound sign (correctly passed
by the browser as specified in the page encoding) - see
http://www1.tip.nl/~t876506/utf8tbl.html. When this is converted into a
java.lang.String, the system is probably using the default iso-latin
string encoding and performing a single-byte conversion. I don't believe
that any 16-bit Unicode matching is being performed at all.

I have performed a quick test with IE by adding the following to a form:

<input type="hidden" name="_charset_" />

This is an IE-only trick that can tell you the encoding of submitted
parameters. This confirms that the data is being passed using UTF-8. In
fact, IE continues to encode form data in UTF-8 even if the page
encoding is changed to UTF-16.

Regards,
Gerard
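
The single-byte conversion suspected above can be reproduced outside the container. The sketch below is only an illustration under that assumption: the class `PoundMojibake` and its methods are hypothetical names, and nothing here is Tomcat's actual decoding code. It takes the UTF-8 bytes of the pound sign, decodes them as ISO-8859-1 to get the stray accented A, and then reverses the mistake.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reproduces and undoes a UTF-8 / ISO-8859-1 mix-up.
final class PoundMojibake {

    // Simulate a receiver that decodes UTF-8 bytes one byte per character (ISO-8859-1).
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: recover the original bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String pound = "\u00A3";                 // pound sign
        String garbled = misdecode(pound);       // "\u00C2\u00A3": accented A, then pound
        System.out.println(garbled.length());    // 2
        System.out.println(repair(garbled).equals(pound)); // true
    }
}
```

Repairing strings after the fact is a workaround. The cleaner route is usually to have the container decode parameters with the same charset the page was served in; on Tomcat, query-string decoding is controlled by the Connector's `URIEncoding` attribute, though whether that setting is the cause in this particular setup is an assumption.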

Gerard Krupa wrote:
> There are fundamental flaws in specifying and detecting the character
> set used for submitted form data so you can't always assume that the
> data will be passed in the same character set that was used to deliver
> the page. The link
> http://ppewww.ph.gla.ac.uk/~flavell.../form-i18n.html has some tips
> on how to overcome this.

You may also want to see
https://bugzilla.mozilla.org/show_bug.cgi?id=241540

Ian Pilcher

Thanks, Gerard. Changing the page encoding to ISO-8859-1 did the trick.

Thanks again
Brian