[Syrphidae] Re: Kuznetsov's Pelecocera paper = translation

Bastiaan Wakkie bwakkie at syrphidae.com
Thu Feb 28 22:47:19 GMT 2013

Hi Jens-Herman,

Everything was done on an old (10-year-old) Linux PC running Ubuntu
12.10. I used gscan2pdf (it scans from a scanner but also opens PDFs),
which gave me very good PDFs ready for OCR: it deskews and unpapers the
pages and also resizes them for better recognition. You can run OCR
directly in gscan2pdf, but I find that less useful, as I cannot edit the
text and look at the paper at the same time.

Then I use ocrfeeder, which does a nice job of analysing the text/image
blocks. I run tesseract with the Russian language, and some parts (like
the literature) in English (although that was not the best choice). I
still had to correct all the Latin text within the Russian text by hand,
though. It is possible to train tesseract to understand both Latin and
Russian, but I didn't dive into that.
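
For anyone who prefers the command line to the ocrfeeder interface, the tesseract step can be sketched roughly as below. This is only an illustration and not what was actually run for the Kuznetsov paper: the function names and file names are made up, and it assumes tesseract 3.02 or later with the "rus" and "eng" language data installed, so that both can be requested at once with "-l rus+eng". Whether that copes well enough with Latin names inside Russian text is not something I can promise.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, out_base, languages=("rus", "eng")):
    """Return the argument list for one tesseract run over a page image.

    The names here are hypothetical; only the tesseract call itself
    (image, output base, -l LANG[+LANG]) is the real command-line form.
    """
    # "-l rus+eng" asks tesseract to load both language packs together.
    return ["tesseract", str(image), str(out_base), "-l", "+".join(languages)]


def ocr_page(image, out_base, languages=("rus", "eng")):
    """Run tesseract on one image and return the path of the text it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    subprocess.run(build_tesseract_command(image, out_base, languages), check=True)
    # tesseract appends ".txt" to the output base name.
    return Path(str(out_base) + ".txt")
```

Calling ocr_page("page01.png", "page01") would then leave the recognised text in page01.txt, which can be checked against the scan by hand.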

After exporting from ocrfeeder to text, I just copy and paste it into
Google Translate and add the translation to the text.
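
If the exported text is long, it can help to paste it in pieces. The sketch below splits the text at blank lines into chunks below a size limit. The helper name and the 4000-character default are assumptions chosen for illustration, not a documented Google Translate limit, so adjust the number to whatever the page accepts.

```python
def split_for_translation(text, max_chars=4000):
    """Split text into chunks of at most max_chars, breaking at blank lines.

    max_chars is an assumed limit, not a documented one.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single paragraph longer than the limit is cut hard.
        while len(paragraph) > max_chars:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk keeps whole paragraphs together where it can, which makes it easier to put the translation back next to the matching part of the original.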

Hope that helps. If you want to try this, I can go into more detail if
you like.

My initial plan is to let all of you upload a PDF or DjVu file to my
website, run the OCR on request, then translate it on request via
Google Translate into whatever language you like (which is possible),
and add the result to a wiki-style page so that people (we) can adjust
it for future use.



> I am very impressed by the results with your OCR tools and the
> translation of the Kuznetsov work. Which software do you use and how
> can you scan the paper? I failed several times when trying to use
> OCR for a Conopidae paper.


