Chief Architect's Blog on software development

1 Aug 2010
Big5 Messy Code / Unreadable Text Mapping to UTF8
In a recent project, I encountered a Big5 "messy code" (亂碼) issue. Unreadable text, or messy code, usually appears when text is read with the wrong encoding. For example, if you read Chinese Big5 text using the Western European (Windows) encoding, you will see "¤j¤-½X¶Ã½X" instead of "大五碼亂碼". On a web page, you can easily switch to the correct encoding (in the IE menu, go to View -> Encoding). For a non-web-based application, you need to write a custom conversion program.
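To see how this garbling arises, the short sketch below encodes the phrase as Big5 bytes and then decodes those bytes as Windows-1252. Python is used purely for illustration (it is not the language used in the project), and `big5` and `cp1252` are Python's codec names for the two encodings.

```python
# Illustration only: reproduce the "messy code" by reading Big5 bytes
# with the wrong (Western European) encoding, then undo it.
original = "大五碼亂碼"

# The bytes as a Big5 system would store them (two bytes per character).
big5_bytes = original.encode("big5")

# Reading those bytes as Windows-1252 yields one Latin character per byte.
garbled = big5_bytes.decode("cp1252")
print(garbled)

# When every byte survives the misreading, the step can be reversed.
restored = garbled.encode("cp1252").decode("big5")
print(restored)
```

The round trip works here because every byte in this phrase has a Windows-1252 character. Whether it works on real data depends on how the text was stored and read.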

The invention of Unicode has solved most messy code problems, but many legacy systems still use the old encodings to represent Chinese, Japanese, and Korean text.

Here is the scenario that I encountered. My client (actually not my client at the very beginning) had an application that stored Chinese characters as varchar (using a Windows-1252 collation) in a relatively old database (MS SQL 2000). Their existing application, written in PowerBuilder 6, read the Chinese text correctly. Unfortunately, this application ran correctly only on Windows 98, which had to be phased out because the hardware was being replaced. The original programmer told my client that upgrading the application would be very difficult and that he did not want to do the upgrade for them. There was a pressing need to find a replacement solution. My client approached me (and other software houses, I think) for advice and help. I am not competent in PowerBuilder, but I told them that I could easily use Microsoft .Net to do the conversion. They commissioned me to solve the problem.

At the beginning of the project, I tried various built-in .Net encoding methods to convert the messy code, but in vain. None of the built-in .Net encoding methods proved helpful in this project. In the end, I had to write my own conversion program. The idea is to map the unreadable text to readable Unicode, e.g. map "¤j" to "大", "¤-" to "五", and so on. A table of about 60,000 characters is sufficient for most commonly used Chinese characters.
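The mapping idea can be sketched roughly as follows. This is not the .Net program written for the project; it is a minimal Python illustration under two assumptions: that the garbled text is Big5 bytes misread as Windows-1252, and that the standard Big5 byte ranges (lead bytes 0xA1 to 0xF9) cover the data. The function names `build_mojibake_table` and `repair` are hypothetical, and the table this sketch produces will not necessarily match the size of the one used in the project.

```python
def build_mojibake_table():
    """Map each two-character Windows-1252 misreading to its Big5 character."""
    table = {}
    # Assumed standard Big5 ranges: lead 0xA1-0xF9, trail 0x40-0x7E and 0xA1-0xFE.
    trail_bytes = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
    for lead in range(0xA1, 0xFA):
        for trail in trail_bytes:
            raw = bytes([lead, trail])
            try:
                table[raw.decode("cp1252")] = raw.decode("big5")
            except UnicodeDecodeError:
                continue  # byte pair not assigned in one of the two encodings
    return table


def repair(text, table):
    """Replace garbled two-character pairs with readable characters."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in table:
            out.append(table[pair])
            i += 2
        else:
            out.append(text[i])  # leave plain ASCII and unknown characters alone
            i += 1
    return "".join(out)


table = build_mojibake_table()
print(repair("¤j", table))
```

One limitation of a table lookup like this: a genuine Western European character that happens to be followed by a matching second character would also be converted, so it suits columns known to hold only Chinese text and plain ASCII.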

Although I spent extra time completing this project, it was excellent experience for my future projects.



© 2008 Bisware Technology Limited. All Rights Reserved.