[Xerte-dev] Re: FW: Xenith/XOT Glossary regexps

Tom Reijnders reijnders at tor.nl
Fri Oct 19 09:13:02 BST 2012


John,

Yes, I think this regexp is usable in XOT as well. I'll implement it and 
test it.

With regard to your second remark, my vote would be not to do this, 
because plurals are formed differently from language to language.

Regards,

Tom

On 19-10-2012 9:37, Smith, John wrote:
> Hi Fay,
>
> Greetings from the Czech Republic!! Yes, sorry, I got a bit carried away and didn't read the brief correctly or do sufficient testing!!
>
> Anyway, I think I have cracked it and now have a regular expression that 'seems' to work as desired (in my testing anyway!!). It will need to be tested with real-life data that I don't have access to, so I'm putting it out there to see if anyone can break it. It also only uses a single glossary term and 3 capture groups to capture the before and after parts (which may be nothing - using the ^ and $) and the actual term (so that we maintain the case). It also handles some punctuation (this can easily be expanded upon). This means that it may even be suitable for xot as well.
>
>
> // function makes every glossary word found into a link
> function insertGlossaryTag(node) {
>          var temp = node.nodeValue;
>          for (var k=0; k<glossary.length; k++) {
>                  var regExp = new RegExp('(^|\\s)(' + glossary[k].word + ')([\\s\\.,!?]|$)', 'gi');
>                  temp = temp.replace(regExp, '$1<a class="glossary" href="#" title="$2">$2</a>$3');
>          }
>          node.nodeValue = temp;
> }
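
To make the behaviour of the three capture groups easier to poke at, here is a minimal sketch that applies the same pattern to plain strings instead of DOM nodes. The helper names `linkTerm` and `linkTermLookahead` and the sample strings are assumptions made up for illustration; they are not part of the Xenith code. The second helper is only one possible variant, not a proposed fix.

```javascript
// linkTerm and linkTermLookahead are hypothetical helpers for this sketch;
// they operate on plain strings rather than DOM nodes.
function linkTerm(text, word) {
    // Same pattern and replacement as in insertGlossaryTag above.
    var regExp = new RegExp('(^|\\s)(' + word + ')([\\s\\.,!?]|$)', 'gi');
    return text.replace(regExp, '$1<a class="glossary" href="#" title="$2">$2</a>$3');
}

function linkTermLookahead(text, word) {
    // Assumed variant: the trailing group becomes a lookahead, so the
    // character after the term is checked but not consumed.
    var regExp = new RegExp('(^|\\s)(' + word + ')(?=[\\s\\.,!?]|$)', 'gi');
    return text.replace(regExp, '$1<a class="glossary" href="#" title="$2">$2</a>');
}

console.log(linkTerm('Cat sat.', 'cat'));          // case of "Cat" is kept
console.log(linkTerm('category', 'cat'));          // left unchanged
console.log(linkTerm('cat cat', 'cat'));           // only the first "cat" is linked
console.log(linkTermLookahead('cat cat', 'cat'));  // both are linked
```

The 'cat cat' case appears to slip through because the first match consumes the space that the second match needs for its leading `(^|\s)` group. A term containing regexp metacharacters (for example a full stop) would also need escaping before being placed in the pattern; that is not handled here.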
>
>
> Again this has only been tested with Xenith code. It's also thrown up other questions such as:
>
> 1. Should it match the first or second part of a hyphenated word? Not a real-life example, but cat-fish for instance.
> 2. How should it handle plurals, if at all, such as cats? It could have 's' and 'es' etc. added to the punctuation group, so it would hyperlink cat but leave the trailing letter s un-hyperlinked, which would be fine?
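
Both questions can be tried out with a small sketch. `buildPattern` and `wrap` are hypothetical helpers, the suffix list is an assumption about what "adding 's' and 'es' to the punctuation group" would look like, and square brackets stand in for the `<a>` tag to keep the output readable.

```javascript
// Hypothetical helper: builds the pattern from the message above, with an
// optional list of extra suffix alternatives (an assumption for illustration).
function buildPattern(word, suffixes) {
    var tail = '[\\s\\.,!?]' + (suffixes ? '|' + suffixes.join('|') : '') + '|$';
    return new RegExp('(^|\\s)(' + word + ')(' + tail + ')', 'gi');
}

// Hypothetical helper: [..] stands in for the <a> tag.
function wrap(text, regExp) {
    return text.replace(regExp, '$1[$2]$3');
}

console.log(wrap('cat-fish', buildPattern('cat')));            // cat-fish
console.log(wrap('cat-fish', buildPattern('fish')));           // cat-fish
console.log(wrap('cats', buildPattern('cat')));                // cats
console.log(wrap('cats', buildPattern('cat', ['es', 's'])));   // [cat]s
console.log(wrap('catsup', buildPattern('cat', ['es', 's']))); // [cat]sup
```

As written, neither half of a hyphenated word is matched, because '-' is in neither the leading nor the trailing group. Allowing 's' as a suffix links the plural, but it also links the start of unrelated words such as 'catsup', and the suffix rules would be English-specific.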
>
> Regards,
>
> John Smith | Learning Technologist
> Room A251, Govan Mbeki Building | School of Health & Life Sciences | Glasgow Caledonian University
> Cowcaddens Road | Glasgow | G4 0BA
> ________________________________________
> From: xerte-dev-bounces at lists.nottingham.ac.uk [xerte-dev-bounces at lists.nottingham.ac.uk] On Behalf Of Fay Cross [Fay.Cross at nottingham.ac.uk]
> Sent: 17 October 2012 09:50
> To: For Xerte technical developers
> Subject: [Xerte-dev] Re: FW: Xenith/XOT Glossary regexps
>
> Thanks for looking into this John.  I'm just looking at the code you sent and you're right it doesn't take punctuation into account.  It works when the word is at the beginning or end of a sentence, with or without a space immediately before or after but whenever there's punctuation next to it (cat. cat? etc.) it just gets replaced with cat and the punctuation is lost.  Also, longer words that start with a word from the glossary get replaced with the shorter glossary word (e.g. category becomes cat).
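
The behaviour described above can be reproduced with a short sketch of the earlier pattern (the one from the original message quoted further down the thread). `linkTermOld` is a hypothetical helper working on plain strings, and square brackets stand in for the `<a>` tag.

```javascript
// Hypothetical helper reproducing the earlier pattern and its replacement,
// which inserts glossary[k].word itself rather than the matched text.
function linkTermOld(text, word) {
    var regExp = new RegExp('\\s' + word + '[^\\s]*|^' + word + '[^\\s]*', 'gi');
    return text.replace(regExp, ' [' + word + '] ');
}

// JSON.stringify makes the added leading and trailing spaces visible.
console.log(JSON.stringify(linkTermOld('cat.', 'cat')));      // " [cat] "
console.log(JSON.stringify(linkTermOld('Cat?', 'cat')));      // " [cat] "
console.log(JSON.stringify(linkTermOld('category', 'cat')));  // " [cat] "
```

Because `[^\s]*` swallows everything up to the next whitespace and the replacement only writes the glossary word back, the punctuation, the original case and the rest of a longer word are all lost.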
>
>
> -----Original Message-----
> From: xerte-dev-bounces at lists.nottingham.ac.uk [mailto:xerte-dev-bounces at lists.nottingham.ac.uk] On Behalf Of Smith, John
> Sent: 15 October 2012 18:48
> To: xerte-dev at lists.nottingham.ac.uk
> Subject: [Xerte-dev] Re: FW: Xenith/XOT Glossary regexps
>
> Thinking about it over dinner though, I've not taken ending punctuation into consideration - beginner's mistake!!
>
> Will look on my return unless it's solved beforehand...
>
> Regards
>
> John Smith
> Learning Technologist
> School of Health and Life Sciences
>
> Sent from Samsung Galaxy SII
>
>
>
> Pat Lockley <patrick.lockley at googlemail.com> wrote:
>
>
> Don't think that regexp works if the word is the first thing in a sentence
>
> On 15 Oct 2012, at 07:10, Julian Tenney <Julian.Tenney at nottingham.ac.uk> wrote:
>
>> Just forwarding this to the list for everyone's info: Fay can use it in the Xenith code, I'm not sure if I can integrate it into engine as this is (I guess) javascript and not the actionscript RegExp engine (although the expressions should work in both...). I'll try..
>>
>> -----Original Message-----
>> From: Smith, John [mailto:J.J.Smith at gcu.ac.uk]
>> Sent: 14 October 2012 17:26
>> To: julian.tenney at nottingham.ac.uk; Fay.Cross at nottingham.ac.uk;
>> ronm at mitchellmedia.co.uk; reijnders at tor.nl
>> Subject: Xenith/XOT Glossary regexps
>> Importance: High
>>
>> Hi guys,
>>
>> Great to meet you all. I've been looking through the xenith code to see where I can contribute. I have also been looking through the archives and came across the regexp problem for the glossary. Since I'm only on the list proper as of today, I'm not sure whether a reply will go through to the correct place, so I'm sending this to you all to see if it helps...
>>
>> Not sure whether this has been fixed yet but it seems the problem is partly caused by \b requiring a word boundary and there being no word boundary on the very first word. Also, I seem to remember reading somewhere that \b can in some cases match international characters in the middle of words, which might not be the desired effect...
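
A quick check of how the `\b` word-boundary assertion behaves in JavaScript may help here. This is only a sketch with invented sample strings. As far as I know, `\b` in JavaScript is defined in terms of ASCII word characters (`[A-Za-z0-9_]`), so it does fire at the start of a string that begins with a letter, but it also fires next to an accented letter inside a word. The ActionScript engine may differ.

```javascript
// Sample strings are invented for illustration.
var startOfText = /\bcat\b/i.test('Cat sat on the mat');  // term is the very first word
var insideLonger = /\bcat\b/i.test('category');           // term is a prefix of a longer word
var beforeAccent = /\bcaf\b/i.test('caf\u00e9');          // "café": \u00e9 is not an ASCII word character

console.log(startOfText, insideLonger, beforeAccent);     // true false true
```

The last result is the "international characters" case: `caf` is reported as a whole word inside "café" because the boundary test treats the accented letter as a non-word character.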
>>
>> I have changed the regexp to this "\sTERM[^\s]*|^TERM[^\s]*" in the xenith.js code as so:
>>
>> // function makes every glossary word found into a link
>> function insertGlossaryTag(node) {
>>         var temp = node.nodeValue;
>>         for (var k=0; k<glossary.length; k++) {
>>                 // ** see recent emails on list about regular expression stuff **
>>                 //var regExp = new RegExp(" " + glossary[k].word + " ", "ig");
>>
>>                 var regExp = new RegExp('\\s' + glossary[k].word + '[^\\s]*|^' + glossary[k].word + '[^\\s]*', 'gi');
>>
>>                 temp = temp.replace(regExp, ' <a class="glossary" href="#" title="' + glossary[k].definition + '">' + glossary[k].word + '</a> ');
>>         }
>>         node.nodeValue = temp;
>> }
>>
>> and now it seems to match all the words, no matter where they are and irrespective of spaces. See attached screenshots - you can see there are no spaces before any words and only some have a space after. Probably needs further testing to go into xot though...
>>
>> Will start adding to the list soon...
>>
>> Regards,
>>
>> John Smith | Learning Technologist
>> Room A251, Govan Mbeki Building | School of Health & Life Sciences |
>> Glasgow Caledonian University Cowcaddens Road | Glasgow | G4 0BA
>>
>> Glasgow Caledonian University is a registered Scottish charity, number
>> SC021474
>>
>> Winner: Times Higher Education's Widening Participation Initiative of the Year 2009 and Herald Society's Education Initiative of the Year 2009.
>> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html
>>
>> Winner: Times Higher Education's Outstanding Support for Early Career Researchers of the Year 2010, GCU as a lead with Universities Scotland partners.
>> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
>>
>> This message and any attachment are intended solely for the addressee and may contain confidential information. If you have received this message in error, please send it back to me, and immediately delete it.   Please do not use, copy or disclose the information contained in this message or in any attachment.  Any views or opinions expressed by the author of this email do not necessarily reflect the views of the University of Nottingham.
>>
>> This message has been checked for viruses but the contents of an
>> attachment may still contain software viruses which could damage your computer system:
>> you are advised to perform your own checks. Email communications with
>> the University of Nottingham may be monitored as permitted by UK legislation.
>>
>> <glossary.png>
>> <glossary2.png>
>> _______________________________________________
>> Xerte-dev mailing list
>> Xerte-dev at lists.nottingham.ac.uk
>> http://lists.nottingham.ac.uk/mailman/listinfo/xerte-dev
>>

-- 

Tom Reijnders
TOR Informatica
Chopinlaan 27
5242HM Rosmalen
Tel: 073 5226191
Fax: 073 5226196
