Subject: Re: [xsl] German character set problem(Umlaute)
From: Andreas Schlegel <schlegelaw@xxxxxx>
Date: Thu, 19 Dec 2002 23:58:50 +0100
Greetings, Andreas
Andreas Schlegel wrote:
Hi,
We have the following problem with our internet application.
If the user enters a value such as "müller" in a plain HTML form, the server (Java servlets on Tomcat 4.0.3) gets "müller".
Not always. The encoding of the HTML document containing the form determines (by convention, not standard) how the form data is escaped and sent in the HTTP request to the servlet.
So if your HTML with the form contains <meta http-equiv="Content-Type" content="text/html;charset=utf-8"> and the user hasn't overridden the encoding in their browser, then the form is submitted with data encoded like m%C3%BCller because byte pair C3 BC is how ü is represented in UTF-8. If the form is iso-8859-1 encoded then you get m%FCller, because byte FC is how ü is represented in iso-8859-1.
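To make the two escapings concrete, here is a small sketch using the JDK's java.net.URLEncoder as a stand-in for what the browser does. The class name FormEscapingDemo and the helper escape are made up for illustration; a real browser's escaping can differ in minor details (for example how spaces are written), but the bytes for ü come out the same.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; not part of the original application.
public class FormEscapingDemo {

    // Percent-escape a form value using the bytes of the given charset,
    // roughly what a browser does based on the encoding of the page.
    static String escape(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String input = "m\u00FCller"; // "müller", written with an escape to stay source-encoding neutral
        System.out.println(escape(input, "UTF-8"));      // m%C3%BCller
        System.out.println(escape(input, "ISO-8859-1")); // m%FCller
    }
}
```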
In the request, there's typically no indication of what encoding was used as the basis for the %-escaping, so when converting this data to a String for access in a "parameter" of the request, Tomcat makes a guess, using iso-8859-1, last I checked -- someone correct me if they've changed it. Parameter is a heavily overloaded term; I try not to use it when talking about HTML form data.
So as long as your HTML form is iso-8859-1 encoded and the user isn't doing anything unusual, Tomcat tells you that it got a String like "m\u00FCller".
If the user enters the value in an HTML form that was generated via the TransformerFactory of the javax.xml.transform package (j2sdk1.4.0_01), the server receives the String "mÃ¼ller"!
Apparently your form is UTF-8 encoded, and the browser knows that, and is sending the data like m%C3%BCller. Tomcat doesn't know about UTF-8 being used, so it thinks C3 and BC are iso-8859-1 bytes that map to separate characters.
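The mismatch can be reproduced outside Tomcat. The sketch below uses java.net.URLDecoder to stand in for the container's parameter decoding, which is an assumption for illustration only; Tomcat's own code path is different, but the byte-to-character mapping is the same. MisdecodeDemo and decode are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; URLDecoder stands in for the container's decoding.
public class MisdecodeDemo {

    // Undo the %-escaping and interpret the resulting bytes in the given charset.
    static String decode(String escaped, String charset) {
        try {
            return URLDecoder.decode(escaped, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String wire = "m%C3%BCller"; // what a UTF-8 encoded form submits for "müller"

        String asUtf8 = decode(wire, "UTF-8");        // bytes C3 BC become one character, U+00FC
        String asLatin1 = decode(wire, "ISO-8859-1"); // bytes C3 BC become two characters, U+00C3 U+00BC

        System.out.println(asUtf8.length());   // 6
        System.out.println(asLatin1.length()); // 7
        System.out.println(asLatin1.equals("m\u00C3\u00BCller")); // true
    }
}
```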
Either change your transformation to output the HTML form as iso-8859-1, or have your servlet re-encode the String as iso-8859-1 bytes, then decode it back into a String using utf-8.
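The second option could look roughly like this. It is a minimal sketch that assumes the String really was decoded as iso-8859-1 from UTF-8 bytes; applying it to a String that was decoded correctly would damage it. ReencodeDemo and repair are hypothetical names, and java.nio.charset.StandardCharsets needs Java 7 or later (on a 1.4 JDK you would pass the charset names as Strings instead).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; in a servlet you would pass it the value returned
// for the form field by the request.
public class ReencodeDemo {

    // Recover the original text from a String that was decoded as
    // ISO-8859-1 although the submitted bytes were UTF-8.
    static String repair(String misdecoded) {
        // ISO-8859-1 maps each char U+0000..U+00FF back to the identical byte,
        // so this restores the bytes exactly as they arrived on the wire.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Now decode those bytes with the charset the browser actually used.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String received = "m\u00C3\u00BCller";  // the two-character garbage in place of ü
        String fixed = repair(received);
        System.out.println(fixed.equals("m\u00FCller")); // true
    }
}
```

This only works because iso-8859-1 is a lossless byte-to-char mapping; the same trick is not reliable if the wrong guess was a charset with unmapped bytes.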
Mike