ROSA Linux Bugzilla – Bug 1053
[PACKAGE REQUEST] ocrfeeder
Last modified: 2013-11-18 21:28:53 MSK
Please move the OCR programs (yagl and cuneiform) from the unsupported contrib repository to the main repo.
Well, we have tesseract in main. Tesseract is developed much more actively than cuneiform, which seems to be almost dead at the moment. So moving cuneiform to main doesn't seem reasonable (meanwhile, we could add a GUI program for tesseract, though most of the existing GUI tools are quite primitive...).
And I'm not sure - what is 'yagl'? I can find only "Yet Another Generator Language. A mini-DSL for describing Rails-like generators", this is definitely not OCR:)
(In reply to comment #1)
> And I'm not sure - what is 'yagl'? I can find only "Yet Another Generator
> Language. A mini-DSL for describing Rails-like generators", this is
> definitely not OCR:)
Ah, you likely meant "yagf". This is a GUI frontend that can be used with both tesseract and cuneiform.
I've played a little with GUI frontends for tesseract. YAGF looks nice, but it segfaults and hangs too often (at least in my experiments), and it is not actively developed.
The best frontend I've found is ocrfeeder. I'll build it for Desktop Fresh (and maybe for LTS, but only for contrib).
Added ocrfeeder, a GUI frontend for tesseract and cuneiform.
ocrfeeder also provides a command-line tool that can create .odt files containing the recognized text with its formatting preserved.
1) No icon
2) Importing a page from the scanner doesn't work :-(
3) Requires unpaper! (see settings)
P.S. Yagf and cuneiform work correctly on my hardware.
Ok, I'll take a look, though I don't have a scanner near me to perform experiments.
As for cuneiform, it has been unmaintained for the last two years (see https://launchpad.net/cuneiform-linux). There don't seem to be many volunteers able to improve it or at least fix bugs.
tesseract is developed much more actively and is supported by Google. That's why we have tesseract in main and cuneiform in contrib.
As for yagf, maybe its failures were caused by the tesseract engine, I'm not sure. But it really behaved badly in my experiments. I'll try it with cuneiform out of interest. Meanwhile, with yagf I've tried to recognize "Добро пожаловать.pdf", which is present in the user's Downloads folder by default.
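One way to check whether a failure comes from the frontend or from the engine is to run tesseract directly on a scanned image, bypassing yagf and ocrfeeder. Below is a minimal sketch of that check, assuming tesseract is installed along with the Russian language data; the helper names and the file names `page.png` and `page` are hypothetical and chosen only for illustration.

```python
import shutil
import subprocess


def tesseract_cmd(image, out_base, lang="rus"):
    """Build the argument list for: tesseract IMAGE OUTBASE -l LANG."""
    return ["tesseract", image, out_base, "-l", lang]


def run_ocr(image, out_base, lang="rus"):
    """Run tesseract directly; return the text file path, or None if tesseract is missing."""
    if shutil.which("tesseract") is None:
        return None
    # check=True raises CalledProcessError if the engine itself fails
    subprocess.run(tesseract_cmd(image, out_base, lang), check=True)
    # tesseract appends ".txt" to the output base name
    return out_base + ".txt"


if __name__ == "__main__":
    # Hypothetical input: an image exported from the scanner
    print(run_ocr("page.png", "page"))
```

If the direct run produces reasonable text while the GUI does not, the problem is more likely in the frontend than in the engine.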
Now I have a sheet with complex text in my 3-in-1 DeskJet.
ocrfeeder + tesseract doesn't work
Yagf + cuneiform recognizes all lines (with errors; it's no FineReader :-)
Yagf + tesseract recognizes only one line of text :-(
Tomorrow I will try to change the scanner settings to achieve better recognition with tesseract (I read about this on the Internet).
Perhaps the best solution would be Yagf + tesseract.
Closing as obsolete; time for QA.
Well, ocrfeeder was finally pushed to contrib.
But note that we still have no OCR program in the main repository.