[Xerte-dev] Re: FW: Further Xerte Data Loss

Julian Tenney Julian.Tenney at nottingham.ac.uk
Mon May 12 14:18:21 BST 2014


I'm pretty certain the problem is down to pasting in non-UTF-8 characters.

So how best to make sure it doesn't get through to the PHP? Old Flash isn't so good at this sort of thing. I already have event handlers looking for a large change in the size of the text, to detect pastes. I think I can do something there in the editor to strip out non-UTF-8 characters?
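Stripping in the Flash editor is one option; another is a guard on the server before the text is written to the XML file. The following is a minimal Python sketch, not Xerte code: `to_utf8_text` is a hypothetical helper, and it assumes that any input which is not valid UTF-8 came from a Windows CP-1252 source such as a paste from Word. It transcodes such input rather than stripping it.

```python
def to_utf8_text(raw: bytes) -> str:
    """Decode raw input, trying UTF-8 first (hypothetical helper).

    Assumption: bytes that are not valid UTF-8 came from a CP-1252
    (Windows) source, so they are transcoded rather than passed through.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # CP-1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined;
        # errors="replace" turns those into U+FFFD instead of failing.
        return raw.decode("cp1252", errors="replace")


# 0x92 is the CP-1252 byte for U+2019 (right single quotation mark).
print(to_utf8_text(b"it\x92s") == "it\u2019s")                  # True
print(to_utf8_text("it\u2019s".encode("utf-8")) == "it\u2019s")  # True
```

UTF-8 is tried first because CP-1252 text containing bytes above 0x7F is rarely valid UTF-8 by accident, whereas nearly any byte string decodes as CP-1252. In PHP, a comparable check could be built around `mb_check_encoding()`.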

J

From: xerte-dev-bounces at lists.nottingham.ac.uk [mailto:xerte-dev-bounces at lists.nottingham.ac.uk] On Behalf Of Dave Burnett
Sent: 12 May 2014 14:15
To: For Xerte technical developers
Subject: [Xerte-dev] Re: FW: Further Xerte Data Loss


It's a ’ (RIGHT SINGLE QUOTATION MARK<http://www.fileformat.info/info/unicode/char/2019/index.htm> - U+2019) character whose UTF-8<http://en.wikipedia.org/wiki/UTF-8> bytes have been read as CP-1252<http://en.wikipedia.org/wiki/Windows-1252>.

http://stackoverflow.com/questions/2477452/a-showing-on-page-instead-of

(and I assume the tilded version is the left single quote).
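The garbling can be reproduced in a few lines. This Python sketch (illustrative only) encodes the curly quotes as UTF-8 and then wrongly decodes those bytes as CP-1252. Since U+2018 is E2 80 98 in UTF-8 and 0x98 is SMALL TILDE in CP-1252, the left quote is indeed the "tilded" one.

```python
right = "\u2019"  # RIGHT SINGLE QUOTATION MARK
left = "\u2018"   # LEFT SINGLE QUOTATION MARK

# UTF-8 bytes misread as CP-1252.
right_garbled = right.encode("utf-8").decode("cp1252")
left_garbled = left.encode("utf-8").decode("cp1252")

assert right_garbled == "\u00e2\u20ac\u2122"  # displays as â€™
assert left_garbled == "\u00e2\u20ac\u02dc"   # displays as â€˜ (0x98 = SMALL TILDE)

# Reversing the mistake recovers the original character.
assert right_garbled.encode("cp1252").decode("utf-8") == right
```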



________________________________
From: Julian.Tenney at nottingham.ac.uk<mailto:Julian.Tenney at nottingham.ac.uk>
To: xerte-dev at lists.nottingham.ac.uk<mailto:xerte-dev at lists.nottingham.ac.uk>
Date: Mon, 12 May 2014 13:23:40 +0100
Subject: [Xerte-dev] Re: FW: Further Xerte Data Loss
This looks suspicious: that's not escaping, right? That looks to me like errant characters creeping in from somewhere. Paste from Word, etc.?

[Screenshot attached: image001.png]

-----Original Message-----
From: xerte-dev-bounces at lists.nottingham.ac.uk<mailto:xerte-dev-bounces at lists.nottingham.ac.uk> [mailto:xerte-dev-bounces at lists.nottingham.ac.uk] On Behalf Of David Goodwin
Sent: 12 May 2014 12:53
To: For Xerte technical developers
Subject: [Xerte-dev] Re: FW: Further Xerte Data Loss



On 12/05/14 12:41, Julian Tenney wrote:
> What beats me is why the file should break halfway through a tag name.


What I've seen is that the PHP (or whatever) can't read the XML file because it declares the wrong encoding, or because it somehow contains characters that are invalid.
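The encoding mismatch is easy to demonstrate. This sketch uses Python's expat-based `xml.etree.ElementTree` as a stand-in for PHP's SimpleXML (which is not shown here): a document that declares UTF-8 but contains the raw CP-1252 byte 0x92 is rejected outright, while the same text properly encoded as UTF-8 parses. The helper `parses` is hypothetical.

```python
import xml.etree.ElementTree as ET

declared_utf8 = b'<?xml version="1.0" encoding="UTF-8"?>'
bad = declared_utf8 + b"<p>it\x92s</p>"                  # CP-1252 byte for U+2019
good = declared_utf8 + "<p>it\u2019s</p>".encode("utf-8")


def parses(doc: bytes) -> bool:
    """Return True if the document is well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:  # e.g. "not well-formed (invalid token)"
        return False


print(parses(bad))   # False
print(parses(good))  # True
```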

see e.g.

http://stackoverflow.com/questions/14463573/php-simplexml-load-file-invalid-character-error

https://chrismckee.co.uk/saving-user-content-to-xml-error-contains-none-utf-8-content-aka-how-to-remove-invalid-characters-in-utf-8/
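A filter in the spirit of the second link (whose PHP code is not reproduced here) can be sketched in Python by removing every character outside the XML 1.0 `Char` production. `strip_invalid_xml_chars` is a hypothetical name. It only drops characters XML forbids, such as most control characters; it does not repair mis-decoded text like â€™.

```python
import re

# Characters permitted by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove characters that may not appear anywhere in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", text)


# NUL and vertical tab are removed; tab and the curly quote are kept.
print(strip_invalid_xml_chars("a\x00b\x0bc\tit\u2019s") == "abc\tit\u2019s")  # True
```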


David.

>
> -----Original Message-----
> From: xerte-dev-bounces at lists.nottingham.ac.uk<mailto:xerte-dev-bounces at lists.nottingham.ac.uk>
> [mailto:xerte-dev-bounces at lists.nottingham.ac.uk] On Behalf Of David
> Goodwin
> Sent: 12 May 2014 12:37
> To: For Xerte technical developers
> Subject: [Xerte-dev] Re: FW: Further Xerte Data Loss
>
> Is it possible that it's a Unicode ` character?
>
>
> I've seen a number of UTF-8-related issues with the data.xml and similar files.
>
> David.
>



--
Pale Purple Ltd

PHP Web application development and support

http://www.palepurple.co.uk
@PalePurpleLtd
07792 380669 / 0845 0046746

_______________________________________________
Xerte-dev mailing list
Xerte-dev at lists.nottingham.ac.uk<mailto:Xerte-dev at lists.nottingham.ac.uk>
http://lists.nottingham.ac.uk/mailman/listinfo/xerte-dev
