[Xerte-dev] Re: FW: Xenith/XOT Glossary regexps
Smith, John
J.J.Smith at gcu.ac.uk
Fri Oct 19 08:37:41 BST 2012
Hi Fay,
Greetings from the Czech Republic!! Yes, sorry, I got a bit carried away and didn't read the brief correctly or do sufficient testing!!
Anyway, I think I have cracked it and now have a regular expression that seems to work as desired (in my testing anyway!!). It still needs to be tested with real-life data that I don't have access to, so I'm putting it out there to see if anyone can break it. It also uses the glossary term only once in the pattern, plus 3 capture groups: the parts before and after the term (either of which may be empty, using ^ and $) and the actual term as it appears in the text (so that we maintain the case). It also handles some punctuation (this can easily be expanded upon), which means it may even be suitable for xot as well.
// function makes every glossary word found into a link
function insertGlossaryTag(node) {
    var temp = node.nodeValue;
    for (var k=0; k<glossary.length; k++) {
        var regExp = new RegExp('(^|\\s)(' + glossary[k].word + ')([\\s\\.,!?]|$)', 'gi');
        temp = temp.replace(regExp, '$1<a class="glossary" href="#" title="$2">$2</a>$3');
    }
    node.nodeValue = temp;
}
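For anyone wanting to try to break it without the Xenith page, here is a minimal sketch that applies the same pattern to a plain string rather than a DOM text node. The helper name linkTerm and the sample sentences are made up for illustration and are not part of xenith.js.

```javascript
// Hypothetical helper (not in xenith.js): same pattern as above, applied to
// a plain string so it can be run on its own, e.g. in Node or a browser console.
function linkTerm(text, word) {
    // group 1: start of string or whitespace before the term
    // group 2: the term as it appears in the text (keeps its case)
    // group 3: whitespace, listed punctuation, or end of string after the term
    var regExp = new RegExp('(^|\\s)(' + word + ')([\\s\\.,!?]|$)', 'gi');
    return text.replace(regExp, '$1<a class="glossary" href="#" title="$2">$2</a>$3');
}

console.log(linkTerm('Cat sat.', 'cat'));
// <a class="glossary" href="#" title="Cat">Cat</a> sat.

console.log(linkTerm('A category of cat?', 'cat'));
// A category of <a class="glossary" href="#" title="cat">cat</a>?

// Two terms separated by a single space: only the first is linked.
console.log(linkTerm('cat cat', 'cat'));
// <a class="glossary" href="#" title="cat">cat</a> cat
```

The last call shows one case that may need attention: the space after the first match is consumed by group 3, so it is no longer available as the leading whitespace for the second match. A lookahead for the trailing part would be one possible way round this, but that has not been tried here.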
Again, this has only been tested with the Xenith code. It has also thrown up other questions, such as:
1. Should it match the first or second part of a hyphenated word? Not a real-life example, but cat-fish for instance.
2. How should it handle plurals, if at all, such as cats? It could have 's' and 'es' etc. added alongside the punctuation group, so it would hyperlink cat with the letter s after it left unlinked. Would that be fine?
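As a rough sketch of the plural idea in question 2 (one possible answer, not a decision), an optional 's' or 'es' can be captured in its own group between the term and the punctuation and written back outside the link. The helper name linkTermWithPlural is hypothetical, and hyphenated words are deliberately left alone here since question 1 is still open.

```javascript
// Hypothetical variant (an assumption, not the code from the email above):
// adds an optional plural suffix group that is kept but not hyperlinked.
function linkTermWithPlural(text, word) {
    // group 3: optional 'es' or 's' directly after the term
    // group 4: whitespace, listed punctuation, or end of string
    var regExp = new RegExp('(^|\\s)(' + word + ')((?:es|s)?)([\\s.,!?]|$)', 'gi');
    return text.replace(regExp, '$1<a class="glossary" href="#" title="$2">$2</a>$3$4');
}

console.log(linkTermWithPlural('Two cats.', 'cat'));
// Two <a class="glossary" href="#" title="cat">cat</a>s.

// '-' is not in the trailing group, so the hyphenated word is not linked.
console.log(linkTermWithPlural('cat-fish', 'cat'));
// cat-fish
```

Adding '-' to the trailing character class would link the first part of cat-fish; linking the second part would also need '-' accepted in the leading group.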
Regards,
John Smith | Learning Technologist
Room A251, Govan Mbeki Building | School of Health & Life Sciences | Glasgow Caledonian University
Cowcaddens Road | Glasgow | G4 0BA
________________________________________
From: xerte-dev-bounces at lists.nottingham.ac.uk [xerte-dev-bounces at lists.nottingham.ac.uk] On Behalf Of Fay Cross [Fay.Cross at nottingham.ac.uk]
Sent: 17 October 2012 09:50
To: For Xerte technical developers
Subject: [Xerte-dev] Re: FW: Xenith/XOT Glossary regexps
Thanks for looking into this John. I'm just looking at the code you sent and you're right it doesn't take punctuation into account. It works when the word is at the beginning or end of a sentence, with or without a space immediately before or after but whenever there's punctuation next to it (cat. cat? etc.) it just gets replaced with cat and the punctuation is lost. Also, longer words that start with a word from the glossary get replaced with the shorter glossary word (e.g. category becomes cat).
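The two problems described above can be reproduced outside the page with a small sketch of the earlier pattern (the one quoted further down this thread), applied to a plain string. The helper name linkTermOld, the sample sentence and the definition text are made up for illustration.

```javascript
// Hypothetical helper: the earlier pattern from this thread, applied to a
// plain string instead of a DOM text node.
function linkTermOld(text, word, definition) {
    // [^\s]* swallows everything up to the next whitespace, so trailing
    // punctuation and the rest of a longer word become part of the match
    var regExp = new RegExp('\\s' + word + '[^\\s]*|^' + word + '[^\\s]*', 'gi');
    // the whole match is replaced with the bare glossary word
    return text.replace(regExp, ' <a class="glossary" href="#" title="' + definition + '">' + word + '</a> ');
}

var out = linkTermOld('A cat. A category.', 'cat', 'a small feline');
console.log(out);
// Both full stops are gone, and "category" has been replaced by a link reading "cat".
```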
-----Original Message-----
From: xerte-dev-bounces at lists.nottingham.ac.uk [mailto:xerte-dev-bounces at lists.nottingham.ac.uk] On Behalf Of Smith, John
Sent: 15 October 2012 18:48
To: xerte-dev at lists.nottingham.ac.uk
Subject: [Xerte-dev] Re: FW: Xenith/XOT Glossary regexps
Thinking about it over dinner, though, I've not taken ending punctuation into consideration - beginner's mistake!!
Will look on my return unless it's solved beforehand...
Regards
John Smith
Learning Technologist
School of Health and Life Sciences
Sent from Samsung Galaxy SII
Pat Lockley <patrick.lockley at googlemail.com> wrote:
Don't think that regexp works if the word is the first thing in a sentence
On 15 Oct 2012, at 07:10, Julian Tenney <Julian.Tenney at nottingham.ac.uk> wrote:
> Just forwarding this to the list for everyone's info: Fay can use it in the Xenith code. I'm not sure if I can integrate it into the engine, as this is (I guess) JavaScript and not the ActionScript RegExp engine (although the expressions should work in both...). I'll try...
>
> -----Original Message-----
> From: Smith, John [mailto:J.J.Smith at gcu.ac.uk]
> Sent: 14 October 2012 17:26
> To: julian.tenney at nottingham.ac.uk; Fay.Cross at nottingham.ac.uk;
> ronm at mitchellmedia.co.uk; reijnders at tor.nl
> Subject: Xenith/XOT Glossary regexps
> Importance: High
>
> Hi guys,
>
> Great to meet you all. I've been looking through the Xenith code to see where I can contribute, and also through the archives, where I came across the regexp problem for the glossary. Since I've only joined the list proper today, I'm not sure whether a reply would go through to the correct place, so I'm sending this to you all to see if it helps...
>
> Not sure whether this has been fixed yet, but it seems the problem is partly caused by \b requiring a word boundary and there being no word boundary on the very first word. Also, I seem to remember reading somewhere that \b can in some cases match international characters in the middle of words, which might not be the desired effect...
>
> I have changed the regexp to this "\sTERM[^\s]*|^TERM[^\s]*" in the xenith.js code as so:
>
> // function makes every glossary word found into a link
> function insertGlossaryTag(node) {
>     var temp = node.nodeValue;
>     for (var k=0; k<glossary.length; k++) {
>         // ** see recent emails on list about regular expression stuff **
>         //var regExp = new RegExp(" " + glossary[k].word + " ", "ig");
>
>         var regExp = new RegExp('\\s' + glossary[k].word +
>             '[^\\s]*|^' + glossary[k].word + '[^\\s]*', 'gi');
>
>         temp = temp.replace(regExp, ' <a class="glossary" href="#" title="' + glossary[k].definition + '">' + glossary[k].word + '</a> ');
>     }
>     node.nodeValue = temp;
> }
>
> and now it seems to match all the words, no matter where they are and irrespective of spaces. See attached screenshots - you can see there are no spaces before any words and only some have a space after. Probably needs further testing to go into xot though...
>
> Will start adding to the list soon...
>
> Regards,
>
> John Smith | Learning Technologist
> Room A251, Govan Mbeki Building | School of Health & Life Sciences |
> Glasgow Caledonian University Cowcaddens Road | Glasgow | G4 0BA
>
> Glasgow Caledonian University is a registered Scottish charity, number
> SC021474
>
> Winner: Times Higher Education’s Widening Participation Initiative of the Year 2009 and Herald Society’s Education Initiative of the Year 2009.
> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html
>
> Winner: Times Higher Education’s Outstanding Support for Early Career Researchers of the Year 2010, GCU as a lead with Universities Scotland partners.
> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
>
> This message and any attachment are intended solely for the addressee and may contain confidential information. If you have received this message in error, please send it back to me, and immediately delete it. Please do not use, copy or disclose the information contained in this message or in any attachment. Any views or opinions expressed by the author of this email do not necessarily reflect the views of the University of Nottingham.
>
> This message has been checked for viruses but the contents of an
> attachment may still contain software viruses which could damage your computer system:
> you are advised to perform your own checks. Email communications with
> the University of Nottingham may be monitored as permitted by UK legislation.
>
> <glossary.png>
> <glossary2.png>
> _______________________________________________
> Xerte-dev mailing list
> Xerte-dev at lists.nottingham.ac.uk
> http://lists.nottingham.ac.uk/mailman/listinfo/xerte-dev
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glossary.png
Type: image/png
Size: 27397 bytes
Desc: glossary.png
URL: <http://lists.nottingham.ac.uk/pipermail/xerte-dev/attachments/20121019/ee40b314/attachment-0001.png>