Sunday, 4 September 2011

How to get character encoding correct on Google App Engine

Character encodings can be a primary trigger for stomach ulcers. My Swedish web applications deployed on Google App Engine have had a hard time behaving when presented with user input containing, for example, the Swedish characters Å, Ä and Ö.

So here's a recipe for treating such characters with respect.

  • Don't use ISO-8859-1 as your character encoding. Just don't.
  • Instead, tell your JSPs to use UTF-8 with something like
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

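However, this might not be enough, unfortunately. Current browsers often don't state a character encoding when they submit a form, even if one is specified in the HTML page or form. The server then has to guess, and it may fall back to a default such as ISO-8859-1 (the default named by the Servlet specification) when it decodes the request parameters.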
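
To see what goes wrong, here is a small sketch that simulates the decoding step with plain Java. It is not what App Engine runs internally: the class name FormDecodingDemo and the decode helper are made up for illustration, and java.net.URLDecoder only stands in for the container's own parameter parsing. The raw value is what a UTF-8 page would typically submit for the three Swedish characters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodingDemo {

    // Hypothetical helper: decodes one percent-encoded form value
    // using the given charset, the way a server has to before your
    // code sees the parameter.
    static String decode(String rawValue, String charset) {
        try {
            return URLDecoder.decode(rawValue, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // A-ring, A-umlaut, O-umlaut as UTF-8 bytes, percent-encoded.
        String raw = "%C3%85%C3%84%C3%96";

        String right = decode(raw, "UTF-8");
        String wrong = decode(raw, "ISO-8859-1");

        // Decoded as UTF-8 we get the three characters back.
        System.out.println(right.equals("\u00C5\u00C4\u00D6")); // prints true

        // Decoded as ISO-8859-1 every byte becomes its own character,
        // so three letters turn into six characters of garbage.
        System.out.println(wrong.length()); // prints 6
    }
}
```

The bytes on the wire are the same in both cases. Only the charset used for decoding differs, which is why the fix has to happen on the server side before the parameters are read.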

So if you aren't already using Spring, add spring-web to your application and put Spring's CharacterEncodingFilter first in your filter chain in web.xml, so that the encoding is set before anything reads the request parameters.

<filter>
   <filter-name>SetCharacterEncoding</filter-name>
   <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
   <init-param>
      <param-name>encoding</param-name>
      <param-value>UTF-8</param-value>
   </init-param>
   <init-param>
      <param-name>forceEncoding</param-name>
      <param-value>true</param-value>
   </init-param>
</filter>
<filter-mapping>
   <filter-name>SetCharacterEncoding</filter-name>
   <url-pattern>/*</url-pattern>
</filter-mapping>

That's all folks.