Wednesday, December 31, 2008

Replacing Accented Chars with their Respective HTML Codes

Hi there! It has been a long time since I last wrote in my blog, but as you know the world is going through a huge economic crisis and I had to work hard to maintain my financial stability (as most of you did, I am sure). Nevertheless, I wanted to share a piece of JavaScript code that solved my problems dealing with accented (Latin) characters in MySQL.

I have an Internet Guide, a Spanish and English site. I had a huge problem when it came to saving accented chars into a MySQL database: they were stored as 'gibberish binary language' even though I used UTF-8 (this is quite normal). Displaying them was not a problem, since with a META tag declaring charset UTF-8 the accents were shown correctly. My problems came when I had to do a text search on the database: the accents coming from PHP were interpreted differently than in the database, and moreover, INPUT TEXT fields and SELECT boxes had values that also didn't match the database content.

The solution I found easiest (I tried everything: SET NAMES UTF-8 in SQL, utf8_encode/utf8_decode, etc.; everything I found on Google was faulty and gave me wrong results) was to convert the accented chars into their HTML code representation (á becomes &aacute;). So I searched the Web for that, but none of the solutions I found actually replaced anything at all; the values were still saved as 'gibberish chars'.
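If you only need some HTML code for each accented char and don't want to maintain a table, a generic alternative (not the method used in this post) is to turn every non-ASCII character into a numeric character reference, so á becomes &#225;. A minimal sketch, assuming an ES2015+ engine for the regex `u` flag and `codePointAt`; `toNumericEntities` is a hypothetical name:

```javascript
// Hypothetical helper: replace every character above U+007F with a
// numeric HTML character reference ("á" -> "&#225;").
// Assumes ES2015+: the "u" flag makes the class match whole code points,
// so characters outside the BMP (e.g. emoji) become one reference, not two.
function toNumericEntities(txt) {
  return txt.replace(/[^\x00-\x7F]/gu, function (c) {
    return "&#" + c.codePointAt(0) + ";";
  });
}

// toNumericEntities("Guía €") -> "Gu&#237;a &#8364;"
```

Like the code in this post, it leaves plain ASCII (including &, < and >) untouched, so escape those separately if the text will be inserted into HTML.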

In the end, this is what I did before FORM submission (so that the values reach the database already converted), and it worked perfectly. You need prototype.js to use this code, but you can easily convert it to suit your needs:
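```javascript
// Requires prototype.js for Class.create().

// parejas: one pair made of the URI-encoded (UTF-8) form of a character
// and the HTML entity that should replace it.
var parejas = Class.create();

// filter: holds the list of pairs and performs the replacement.
var filter = Class.create();

parejas.prototype = {
  initialize: function(char, code) {
    this.char = char; // e.g. '%C3%A1', i.e. encodeURIComponent('á')
    this.code = code; // e.g. '&aacute;'
  }
};

filter.prototype = {
  initialize: function() {
    this.pares = new Array();

    this.pares[0] = new parejas('%E2%82%AC', "&euro;");
    this.pares[1] = new parejas('%C3%A1', "&aacute;");
    this.pares[2] = new parejas('%C3%A9', "&eacute;");
    this.pares[3] = new parejas('%C3%AD', "&iacute;");
    this.pares[4] = new parejas('%C3%B3', "&oacute;");
    this.pares[5] = new parejas('%C3%BA', "&uacute;");
    this.pares[6] = new parejas('%C3%81', "&Aacute;");
    this.pares[7] = new parejas('%C3%89', "&Eacute;");
    this.pares[8] = new parejas('%C3%8D', "&Iacute;");
    this.pares[9] = new parejas('%C3%93', "&Oacute;");
    this.pares[10] = new parejas('%C3%9A', "&Uacute;");
    this.pares[11] = new parejas('%C3%B1', "&ntilde;");
    this.pares[12] = new parejas('%C3%91', "&Ntilde;");
    this.pares[13] = new parejas('%C3%9C', "&Uuml;");
    this.pares[14] = new parejas('%C3%BC', "&uuml;");
  },

  // URI-encode the text, swap each encoded sequence for its entity,
  // then URI-decode whatever is left.
  htmlentities: function(txt) {
    var p = this.pares;
    txt = encodeURIComponent(txt);
    for (var i = 0, count = p.length; i < count; i++)
      txt = txt.replace(new RegExp(p[i].char, 'g'), p[i].code);
    return decodeURIComponent(txt);
  }
};

// Convenience wrapper: returns txt with the listed chars replaced by HTML entities.
function htmlEntities(txt) {
  var f = new filter();
  return f.htmlentities(txt);
}
```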

The parejas object holds a value pair: the URI-encoded string of a character and its HTML representation. The filter object first creates an array of parejas objects with the corresponding values to replace (the example above has Latin chars, but you can insert as many other special chars as you wish). Its htmlentities function URI-encodes the passed string, replaces each encoded sequence with its HTML entity, and finally URI-decodes the result.
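Since the post says the code can be converted to suit your needs, here is a minimal sketch of the same encode, replace, decode technique without prototype.js, using a plain array of pairs. `PAIRS` and `htmlEntitiesPlain` are hypothetical names, and the entity names are the standard HTML ones:

```javascript
// Hypothetical table: [URI-encoded UTF-8 bytes, HTML entity].
var PAIRS = [
  ["%E2%82%AC", "&euro;"],
  ["%C3%A1", "&aacute;"], ["%C3%81", "&Aacute;"],
  ["%C3%A9", "&eacute;"], ["%C3%89", "&Eacute;"],
  ["%C3%AD", "&iacute;"], ["%C3%8D", "&Iacute;"],
  ["%C3%B3", "&oacute;"], ["%C3%93", "&Oacute;"],
  ["%C3%BA", "&uacute;"], ["%C3%9A", "&Uacute;"],
  ["%C3%B1", "&ntilde;"], ["%C3%91", "&Ntilde;"],
  ["%C3%BC", "&uuml;"],   ["%C3%9C", "&Uuml;"]
];

function htmlEntitiesPlain(txt) {
  // encodeURIComponent turns each non-ASCII char into its UTF-8 bytes,
  // e.g. "á" -> "%C3%A1"; a literal "%" in the input becomes "%25".
  var encoded = encodeURIComponent(txt);
  for (var i = 0; i < PAIRS.length; i++) {
    // split/join replaces every occurrence without building a RegExp.
    encoded = encoded.split(PAIRS[i][0]).join(PAIRS[i][1]);
  }
  // Entities contain no "%", so decoding restores everything else as it was.
  return decodeURIComponent(encoded);
}

// htmlEntitiesPlain("100% útil") -> "100% &uacute;til"
```

Because every "%" left in the encoded string starts a %XX byte, a pattern such as %C3%A1 can only match a real "á", so the order of the pairs does not matter.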

How to use:
Just call the htmlEntities function, passing the string to convert, and it will return the converted value. Really easy...
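To run the conversion right before a FORM is submitted, one option is a small helper that applies a converter to every free-text field. This is a sketch under assumptions: `convertFormFields` and the form id `myForm` are hypothetical, and the converter is passed in so it can be the htmlEntities function from this post:

```javascript
// Hypothetical helper: apply convert() to each text input and textarea
// of a form (anything exposing an "elements" list of fields).
function convertFormFields(form, convert) {
  var fields = form.elements;
  for (var i = 0; i < fields.length; i++) {
    var field = fields[i];
    // Only free-text fields; a SELECT value must match one of its options.
    if (field.type === "text" || field.type === "textarea") {
      field.value = convert(field.value);
    }
  }
}

// Usage in the page (assumes prototype.js and htmlEntities are loaded):
// document.getElementById("myForm").onsubmit = function () {
//   convertFormFields(this, htmlEntities);
// };
```

SELECT boxes are left alone here; their option values would need to be written with the same entities in the page itself.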


The content of this blog is published under a Creative Commons License.