[Xerte-dev] Invalid character breaking data.xml parsing (and therefore export)

David Goodwin david at palepurple.co.uk
Tue Nov 6 11:36:01 GMT 2012


Hi,

I'm seeing an invalid character being added to a USER-FILES/xxx-xxxx-xxxxx/data.xml file, which is breaking the export of the LO.


In the PHP error log I see:

[05-Nov-2012 17:55:01 Europe/London] PHP Warning:  simplexml_load_file(): /xxxxxxx/155-elsjbh-Nottingham/template.xml:1: parser error : CData section not finished
When virtual communication isn't effective, people in /xxxxx/website_code/php/xmlInspector.php on line 26
[05-Nov-2012 17:55:01 Europe/London] PHP Warning:  simplexml_load_file(): concern for others' well being</li>^A in /xxxxxx/website_code/php/xmlInspector.php on line 26
[05-Nov-2012 17:55:01 Europe/London] PHP Warning:  simplexml_load_file():                                    ^ in /xxxxxx/website_code/php/xmlInspector.php on line 26
[05-Nov-2012 17:55:01 Europe/London] PHP Warning:  simplexml_load_file(): /xxxxxxx/USER-FILES/155-elsjbh-Nottingham/template.xml:1: parser error : PCDATA invalid Char value 1 in /xxxxxx/website_code/php/xmlInspector.php on line 26


You should see the ^A, which is the invalid character.

If I fix the newlines in the file (either data.xml or template.xml) and then run xmllint --format on it, I get the following:

foo.xml:832: parser error : CData section not finished
When virtual communication isn't effective, people
concern for others' well being</li>
                                   ^
foo.xml:832: parser error : PCDATA invalid Char value 1
concern for others' well being</li>
                                   ^
foo.xml:846: parser error : Sequence ']]>' not allowed in content
Byrne, M. and Associates (2000) <i>Virtual Teams, Virtual Management.</i>]]></te
                                                                         ^
foo.xml:846: parser error : internal error
Byrne, M. and Associates (2000) <i>Virtual Teams, Virtual Management.</i>]]></te
                                                                         ^
foo.xml:846: parser error : Extra content at the end of the document
Byrne, M. and Associates (2000) <i>Virtual Teams, Virtual Management.</i>]]></te


And upon viewing the file I see:

Three components of a well-functioning virtual team are:
<li>competence
integrity
concern for others' well being</li>^A
…..


I'm not sure what character the ^A is, but it's obviously causing problems. If I delete it and save data.xml, the file passes xmllint, the LO can be exported and everything works fine.
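
For what it's worth, the "Char value 1" in the parser output suggests the ^A is U+0001, a control character that XML 1.0 does not permit anywhere in a document. Below is a rough sketch of how the offending characters could be located and stripped. It is Python purely for illustration, not Xerte code, and the names find_invalid_chars and strip_invalid_chars are made up for this example.

```python
import re

# Anything outside the XML 1.0 "Char" production is disallowed:
# #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF
INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_chars(text):
    """Return (line, column, codepoint) tuples, 1-based, for each bad character."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in INVALID_XML_CHARS.finditer(line):
            hits.append((line_no, match.start() + 1, ord(match.group())))
    return hits


def strip_invalid_chars(text):
    """Return the text with every disallowed character removed."""
    return INVALID_XML_CHARS.sub("", text)


# The line from data.xml, with the stray control character appended.
sample = "concern for others' well being</li>\x01\n"
print(find_invalid_chars(sample))
print(repr(strip_invalid_chars(sample)))
```

Running something like this over the contents of data.xml (read as UTF-8 text) before parsing would only be a workaround; it doesn't explain where the character comes from.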


However, somehow the character is re-appearing over time, as the problem has come back.

From the export code, I can see data.xml is copied to template.xml, but what creates data.xml in the first place?
How is this ^A character returning? Is it possible the end user is pasting something into the XOT editor which contains the strange character?


(I assume somewhere some code isn't creating the XML document using a library, and is instead concatenating strings … hence the character isn't escaped/encoded correctly?)
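
To illustrate what I mean, here is a small sketch, again in Python rather than anything from the Xerte code base, using the standard library's xml.etree.ElementTree. It compares concatenating strings with building the element through a library. The sample text is made up.

```python
import xml.etree.ElementTree as ET

text = "Teams & <i>management</i>"

# Naive concatenation: the raw '&' makes the result ill-formed.
concatenated = "<text>" + text + "</text>"
try:
    ET.fromstring(concatenated)
    concat_ok = True
except ET.ParseError:
    concat_ok = False

# Building the element via the library escapes markup characters on output.
node = ET.Element("text")
node.text = text
serialised = ET.tostring(node, encoding="unicode")
round_trip = ET.fromstring(serialised).text

# Caveat: ElementTree writes a U+0001 through unchanged, so the
# serialised document still fails to parse.
node.text = "well being\x01"
try:
    ET.fromstring(ET.tostring(node, encoding="unicode"))
    control_ok = True
except ET.ParseError:
    control_ok = False

print(concat_ok, round_trip == text, control_ok)
```

So a library takes care of characters like & and <, but (with this particular library at least) a control character would still need to be filtered out before the document is written.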

Thanks
David.

Pale Purple Ltd.  (Company No: 5580814)
'Web and Mobile Application Development for Business'

http://www.palepurple.co.uk   
Office: 0845 0046746     Mobile: 07792380669 

Follow us on Twitter: @PalePurpleLtd



