PHPWord – Round 2

This is a follow-up to the blog post about PHPWord I wrote last year. I’ve had to use it a couple of different times since then. While this library is pretty powerful, there are a few kinks that need to be worked out, especially if you are exporting a lot of text that was entered by users. I have been known to say that sometimes developing is like being a psychic: you need to build the site around how you think users will put in their information. Sometimes I’m good at it. Sometimes I fail. This time, I didn’t think something through while programming a form.

Let me back up. I was given the task to create an application where managers can review their employees and submit the review to HR, then both managers and employees sign it. The signing would use the same functionality I used previously. No sweat. The managers could export at any time after the first save. Again, no sweat. Well, I didn’t think of something. Managers wrote their reviews in Word (for grammar purposes) and then copied and pasted them into the application I built.

Wellllllll, any programmer knows that Microsoft products add a bunch of back code, especially Word. It looked fine on the front end, but when you tried to export it, the Word doc wouldn’t open. It was corrupted. Sooo what was going on? All the Word characters. Ampersands, curly quotes, etc. You name it, it corrupted it. Well, it appears I wasn’t the only one with that problem. The answer? This little function that looks for the Word characters and replaces each one with a plain equivalent. It’s an ever-growing function. The only Word symbols I couldn’t figure out how to replace are bullet points.

To use this, all you have to do is run decodeString() on the text before you pass it to setValue() in the Word TemplateProcessor. That’s it.
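As a rough sketch of that call order: the snippet below assumes PHPWord is installed through Composer and that a template named review_template.docx with a ${manager_comments} placeholder exists. The file names, the placeholder, and the trimmed-down decodeWordString() stand-in are all made up for illustration; only TemplateProcessor, setValue() and saveAs() come from PHPWord itself.

```php
<?php
// Hypothetical stand-in for the full decodeString() function from this post;
// it only maps a handful of characters to keep the example short.
function decodeWordString(string $str): string
{
    $map = array(
        "\xE2\x80\x98" => "'",   // U+2018 left single quotation mark
        "\xE2\x80\x99" => "'",   // U+2019 right single quotation mark
        "\xE2\x80\x9C" => '"',   // U+201C left double quotation mark
        "\xE2\x80\x9D" => '"',   // U+201D right double quotation mark
        "\xE2\x80\xA6" => "...", // U+2026 horizontal ellipsis
    );
    $decoded = html_entity_decode($str, ENT_QUOTES, "UTF-8");

    return str_replace(array_keys($map), array_values($map), $decoded);
}

// Text as a manager might paste it from Word (made-up sample).
$pasted = "\xE2\x80\x9CGreat year\xE2\x80\x9D overall\xE2\x80\xA6 let\xE2\x80\x99s keep going";
$clean  = decodeWordString($pasted);

// Assumed paths; adjust them to your own project.
$autoload = __DIR__ . '/vendor/autoload.php';
$template = __DIR__ . '/review_template.docx';

// Only touch PHPWord when both the library and the template are present.
if (file_exists($autoload) && file_exists($template)) {
    require $autoload;

    $processor = new \PhpOffice\PhpWord\TemplateProcessor($template);
    $processor->setValue('manager_comments', $clean); // clean first, then set
    $processor->saveAs(__DIR__ . '/review_export.docx');
}
```

The important part is simply the order: the cleaned string, not the raw pasted one, is what goes into setValue().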

Boom! You’re welcome. Merry Christmas. 🙂

private function decodeString($str)
{
    // Map Word's "smart" characters (as UTF-8 byte sequences) to plain ASCII.
    $chr_map = array(
        // Windows codepage 1252
        "\xC2\x82" => "'", // U+0082⇒U+201A single low-9 quotation mark
        "\xC2\x84" => '"', // U+0084⇒U+201E double low-9 quotation mark
        "\xC2\x8B" => "'", // U+008B⇒U+2039 single left-pointing angle quotation mark
        "\xC2\x91" => "'", // U+0091⇒U+2018 left single quotation mark
        "\xC2\x92" => "'", // U+0092⇒U+2019 right single quotation mark
        "\xC2\x93" => '"', // U+0093⇒U+201C left double quotation mark
        "\xC2\x94" => '"', // U+0094⇒U+201D right double quotation mark
        "\xC2\x9B" => "'", // U+009B⇒U+203A single right-pointing angle quotation mark

        // Regular Unicode (U+0022 quotation mark and U+0027 apostrophe need no replacement)
        "\xC2\xAB" => '"', // U+00AB left-pointing double angle quotation mark
        "\xC2\xBB" => '"', // U+00BB right-pointing double angle quotation mark
        "\xE2\x80\x98" => "'", // U+2018 left single quotation mark
        "\xE2\x80\x99" => "'", // U+2019 right single quotation mark
        "\xE2\x80\x9A" => "'", // U+201A single low-9 quotation mark
        "\xE2\x80\x9B" => "'", // U+201B single high-reversed-9 quotation mark
        "\xE2\x80\x9C" => '"', // U+201C left double quotation mark
        "\xE2\x80\x9D" => '"', // U+201D right double quotation mark
        "\xE2\x80\x9E" => '"', // U+201E double low-9 quotation mark
        "\xE2\x80\x9F" => '"', // U+201F double high-reversed-9 quotation mark
        "\xE2\x80\xB9" => "'", // U+2039 single left-pointing angle quotation mark
        "\xE2\x80\xBA" => "'", // U+203A single right-pointing angle quotation mark
        "\xE2\x80\x93" => "-", // U+2013 en dash
        "\xE2\x80\x94" => "-", // U+2014 em dash
        "\xE2\x80\xA6" => "...", // U+2026 horizontal ellipsis
    );
    $chr = array_keys($chr_map);   // for efficiency, these two arrays
    $rpl = array_values($chr_map); // could be pre-calculated once

    // Decode HTML entities first, then swap out the Word characters.
    $str = str_replace($chr, $rpl, html_entity_decode($str, ENT_QUOTES, "UTF-8"));

    return $str;
}
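The comment about pre-calculating the two arrays can also be sidestepped: PHP’s strtr() accepts the search-to-replacement map directly, so there is nothing to split into keys and values. What follows is a minimal sketch of that alternative, not the function above; the name decodeWordStringStrtr() and the shortened map are placeholders for illustration.

```php
<?php
// Sketch only: strtr() takes the search => replace map as-is.
function decodeWordStringStrtr(string $str): string
{
    // `static` means the array is set up once and reused on later calls.
    // Shortened for the example; the full map would go here.
    static $map = array(
        "\xE2\x80\x98" => "'",   // U+2018 left single quotation mark
        "\xE2\x80\x99" => "'",   // U+2019 right single quotation mark
        "\xE2\x80\x9C" => '"',   // U+201C left double quotation mark
        "\xE2\x80\x9D" => '"',   // U+201D right double quotation mark
        "\xE2\x80\x93" => "-",   // U+2013 en dash
        "\xE2\x80\x94" => "-",   // U+2014 em dash
        "\xE2\x80\xA6" => "...", // U+2026 horizontal ellipsis
    );

    // Same order as the original: decode entities, then replace characters.
    return strtr(html_entity_decode($str, ENT_QUOTES, "UTF-8"), $map);
}

// Made-up sample mixing raw Word characters and HTML entities.
$sample = "It\xE2\x80\x99s 9\xE2\x80\x935 \xE2\x80\x94 &ldquo;mostly&rdquo;";
echo decodeWordStringStrtr($sample), PHP_EOL;
```

With an array argument, strtr() tries the longest keys first and never re-scans text it has already replaced, which for a map like this one gives the same result as the str_replace() call.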

Sources:

http://stackoverflow.com/questions/7419302/converting-microsoft-word-special-characters-with-php
http://stackoverflow.com/questions/20025030/convert-all-types-of-smart-quotes-with-php