Bug 1053 - [PACKAGE REQUEST] ocrfeeder
Product: Desktop Bugs
Classification: ROSA Desktop
Component: Main Packages
Version: Fresh
Hardware: All Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: ROSA Linux Bugs
QA Contact: ROSA Linux Bugs
Depends on:
Reported: 2012-11-09 07:49 MSK by Vladimir Potapov
Modified: 2013-11-18 21:28 MSK
CC List: 2 users

See Also:
RPM Package:
Bad POT generating:
vladimir.potapov: qa_verified-


Description Vladimir Potapov 2012-11-09 07:49:22 MSK
Please move the OCR programs (yagl and cuneiform) from the unsupported contrib repository to the main repository.
Comment 1 Denis Silakov 2013-04-17 22:02:45 MSD
Well, we have tesseract in main. Tesseract is developed much more actively than cuneiform, which seems to be almost dead at the moment. So moving cuneiform to main doesn't seem reasonable (but meanwhile, we can add some GUI program for tesseract, though most of the existing GUI tools are quite primitive...).

And I'm not sure - what is 'yagl'? I can find only "Yet Another Generator Language. A mini-DSL for describing Rails-like generators", this is definitely not OCR:)
Comment 2 Denis Silakov 2013-04-18 10:05:16 MSD
(In reply to comment #1)

> And I'm not sure - what is 'yagl'? I can find only "Yet Another Generator
> Language. A mini-DSL for describing Rails-like generators", this is
> definitely not OCR:)

Ah, you likely meant "yagf". This is a GUI frontend that can be used with both tesseract and cuneiform.
Comment 3 Denis Silakov 2013-04-18 11:52:06 MSD
I've played a little with GUI frontends for tesseract. YAGF looks nice, but it segfaults and hangs too often (at least in my experiments), and it is not actively developed.

The best frontend I've found is ocrfeeder; I'll build it for Desktop Fresh (and maybe for LTS, but only for contrib).
Comment 4 Denis Silakov 2013-04-18 12:03:10 MSD
Added ocrfeeder - GUI frontend for tesseract and cuneiform.

ocrfeeder also provides a command-line tool that can create .odt files containing the recognized text with its formatting preserved.

Build lists:
Comment 5 Vladimir Potapov 2013-04-18 18:07:15 MSD
1) No icon
2) Importing a page from the scanner doesn't work :-(
3) Requires unpaper! (see settings)
Comment 6 Vladimir Potapov 2013-04-18 18:11:39 MSD
P.S. Yagf and cuneiform work correctly on my hardware.
Comment 7 Denis Silakov 2013-04-18 18:22:46 MSD
Ok, I'll take a look. Though I don't have a scanner near me to experiment with.

As for cuneiform, it has been unmaintained for the last two years (see https://launchpad.net/cuneiform-linux). There don't seem to be many volunteers able to improve it or even fix bugs.

tesseract is developed much more actively and is supported by Google. That's why we have tesseract in main and cuneiform in contrib.

As for yagf, maybe its failures were caused by the tesseract engine; I'm not sure. But it really behaved badly in my experiments. I'll try it with cuneiform out of interest. For the record, with yagf I tried to recognize "Добро пожаловать.pdf" ("Welcome.pdf"), which is present in the user's Downloads folder by default.
Comment 8 Vladimir Potapov 2013-04-18 19:40:20 MSD
I now have a sheet with complex text in my 3-in-1 DeskJet.
ocrfeeder + tesseract doesn't work
Yagf + cuneiform recognizes all lines (with errors; it's no FineReader :-)
Yagf + tesseract recognizes only one line of text :-(

Tomorrow I will try changing the scanner settings to get better recognition with tesseract (I read about this on the Internet).
Perhaps the best solution would be Yagf + tesseract.
Comment 9 Aleksandr Kazantcev 2013-11-08 23:45:22 MSK
Closing as obsolete; time for QA.
Comment 10 Denis Silakov 2013-11-18 21:28:53 MSK
Well, ocrfeeder was finally pushed to contrib.

But note that we still have no OCR program in the main repository.