Check your PHP UTF-8 Checklist

I spent way too long this weekend on a problem that turned out to have a very simple solution. I guess it may have had a little to do with the fact that I use the CodeIgniter framework, which does so much of the hard work for you that it’s easy to get complacent.

I have been working with text files that contain multi-byte characters. I had previously ensured that my database and tables were set up for UTF-8 and that everything in CodeIgniter was correctly configured. Yet still I was getting invalid character errors on the database insert.

As the text files were of varying formats, including Excel’s Unicode CSV format, I had already ensured that reading a text file also included conversion to UTF-8. Using the script from Practical Web Ltd, I was detecting the format of each file and converting it to UTF-8 on the fly. Yet still I was getting invalid character errors on the database insert.
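A minimal sketch of that kind of detect-and-convert step, assuming the files announce their encoding with a byte-order mark (Excel’s Unicode export is typically UTF-16LE with a BOM). This is not the Practical Web Ltd script: `to_utf8` is a hypothetical helper, and the Windows-1252 fallback for files without a BOM is purely an assumption.

```php
<?php
// Hypothetical helper: convert raw file contents to UTF-8 based on the BOM.
function to_utf8(string $raw): string
{
    // UTF-8 BOM: already UTF-8, just strip the marker.
    if (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        return substr($raw, 3);
    }
    // UTF-16 little-endian BOM.
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    }
    // UTF-16 big-endian BOM.
    if (strncmp($raw, "\xFE\xFF", 2) === 0) {
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    }
    // No BOM: keep the data if it is already valid UTF-8,
    // otherwise assume Windows-1252 (an assumption, adjust to your data).
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    return mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
}
```

You would call it on the raw contents, for example `to_utf8(file_get_contents($path))`. Note that every mbstring call here names its encodings explicitly.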

I even ran through my code line by line and checked for any string manipulation I was doing with string functions that are not multi-byte safe. Yet still I was getting invalid character errors on the database insert.

If I had any decent amount of hair left, I would certainly have pulled it all out by the time I figured out what was wrong. I only discovered the answer by accident when I decided to remove the string manipulation altogether. As soon as I did that, it worked a treat. Had I discovered a bug in the multibyte string functions? No.

I had not checked the default encoding of mbstring.

So please, make sure it is on your checklist of things to do when dealing with multi-byte strings. Either set mbstring’s default (internal) encoding correctly, or religiously pass the encoding parameter to every multi-byte string function you call.
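Here is a small sketch of both options, and of what goes wrong when the encoding is not UTF-8. The variable names are just for illustration, and the ISO-8859-1 call stands in for whatever non-UTF-8 default your installation might have; the actual default depends on your PHP version and configuration.

```php
<?php
// Option 1: set the default once, early in the bootstrap, before any mb_* call.
mb_internal_encoding('UTF-8');

// "naïve" written as explicit UTF-8 bytes so the example does not depend on
// how this source file is saved: 5 characters, 6 bytes.
$word = "na\xC3\xAFve";

$bytes = strlen($word);      // counts bytes
$chars = mb_strlen($word);   // counts characters using the internal encoding

// Option 2: pass the encoding explicitly on every call.
$safe = mb_substr($word, 0, 3, 'UTF-8');

// With the wrong encoding, every byte is treated as one character,
// so the two-byte "ï" gets cut in half.
$broken = mb_substr($word, 0, 3, 'ISO-8859-1');

var_dump(mb_check_encoding($safe, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
```

A string like `$broken` is exactly the sort of thing a UTF-8 database column will reject as an invalid character.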

Even better, you could use the great checklist on nicknettleton.com (see below), which seems to cover everything.

I totally deserved the dunce hat.

Edit: Looks like the link on nicknettleton.com is no longer available (thanks @Les). A little digging around led me to the same PHP UTF-8 checklist on another site.

 

4 thoughts on “Check your PHP UTF-8 Checklist”

  1. @Les Thanks for pointing out the dead link. I have updated the post – I found the same article on another website. I hope it is of some use.

  2. Your link to nicknettleton is also broken. Yours isn’t the only one, though, as many other blogs link to the same site too.

    If you cannot resolve the url, could you remove it?

    Thanks
