Bug 2423 - urpmi --auto tesseract may select wrong tesseract-<lang> package
Status: RESOLVED FIXED
Product: Desktop Bugs
Classification: ROSA Desktop
Component: Main Packages
Version: Fresh
Hardware: All Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: ROSA Linux Bugs
QA Contact: ROSA Linux Bugs
URL:
Depends on:
Blocks:
Reported: 2013-07-31 12:01 MSD by Eugene Shatokhin
Modified: 2013-08-26 02:01 MSD
CC List: 3 users

See Also:
RPM Package: tesseract-3.01-2
ISO-related:
Bad POT generating:
Upstream:
vladimir.potapov: qa_verified+
alex.burmashev: published+


Description Eugene Shatokhin 2013-07-31 12:01:46 MSD
Description of problem:

On my system, 'urpmi --auto tesseract' installs tesseract-vie (Vietnamese support) as a dependency rather than the package for the system language (English).


Version-Release number of selected component (if applicable):
tesseract-3.01

How reproducible: always


Steps to Reproduce:
1. urpmi --auto tesseract
Comment 1 Denis Silakov 2013-07-31 15:49:20 MSD
Advisory:
tesseract-<lang> packages now require the appropriate locales-* packages, so urpmi can automatically choose language packs suitable for the current locale. For example, if you have a Russian locale, tesseract-rus will be installed.

In addition, tesseract has been updated to a new minor version, 3.02.02.

Build lists:
https://abf.rosalinux.ru/build_lists/1198664
https://abf.rosalinux.ru/build_lists/1198665
Comment 2 Vladimir Potapov 2013-08-05 13:02:21 MSD
The issue still reproduces; it is not fixed.
On my system:

urpmi --auto tesseract
Packages locales-fr-2.15-2-rosa2012.1.i586, libgcc1-4.7.3_2012.10-3.1-rosa2012.1.i586, liblept2-1.69-1-rosa2012.1.i586, libtesseract3-3.02.02-2-rosa2012.1.i586, libstdc++6-4.7.3_2012.10-3.1-rosa2012.1.i586, glibc-2.15-8-rosa2012.1.i586 are already installed
    http://abf-downloads.rosalinux.ru/rosa2012.1/container/1198664/i586/main/release/tesseract-3.02.02-2-rosa2012.1.i586.rpm
    http://abf-downloads.rosalinux.ru/rosa2012.1/container/1198664/i586/main/release/tesseract-frm-3.02.02-2-rosa2012.1.i586.rpm 
                                                                                                                                 

installing tesseract-3.02.02-2-rosa2012.1.i586.rpm tesseract-frm-3.02.02-2-rosa2012.1.i586.rpm from /var/cache/urpmi/rpms
warning: LOOP:
warning: removing tesseract-frm-3.02.02-2.i586 "Requires: tesseract >= 3.00" from tsort relations.
warning: removing tesseract-3.02.02-2.i586 "Requires: tesseract-language >= 3.00" from tsort relations.
Preparing...                     ###############################################################################################
      1/2: tesseract             ###############################################################################################
      2/2: tesseract-frm         ##############################################################################################
Comment 3 Vladimir Potapov 2013-08-07 09:01:31 MSD
On x86_64:
urpmi --auto tesseract
Packages locales-fr-2.15-2-rosa2012.1.x86_64, glibc-2.15-8-rosa2012.1.x86_64, lib64gif4-4.1.6-15-rosa2012.1.x86_64, lib64z1-1.2.7-3-rosa2012.1.x86_64, lib64png15-1.5.13-1-rosa2012.1.x86_64, lib64gcc1-4.7.3_2012.10-3.1-rosa2012.1.x86_64, lib64stdc++6-4.7.3_2012.10-3.1-rosa2012.1.x86_64, lib64tiff5-4.0.3-1-rosa2012.1.x86_64, lib64jpeg8-1.2.1-1-rosa2012.1.x86_64 are already installed
locales-fr is marked as manually installed; it will not be considered when detecting orphan packages
writing /var/lib/rpm/installed-through-deps.list
    ftp://mirror.yandex.ru/rosa/rosa2012.1/repository/x86_64/media/main/release/lib64lept2-1.69-1-rosa2012.1.x86_64.rpm
    http://abf-downloads.rosalinux.ru/rosa2012.1/container/1198665/x86_64/main/release/lib64tesseract3-3.02.02-2-rosa2012.1.x86_64.rpm
    http://abf-downloads.rosalinux.ru/rosa2012.1/container/1198665/x86_64/main/release/tesseract-frm-3.02.02-2-rosa2012.1.x86_64.rp

****************
QA Denied
Comment 4 Denis Silakov 2013-08-18 16:35:23 MSD
Well, my advisory was not completely correct. urpmi doesn't look at the system locale settings; instead, it checks which locales-* packages are installed (besides English) and chooses among the corresponding tesseract-* packages. So if you have locales-en, locales-ru and locales-fr installed, urpmi will choose between the Russian and French packs for tesseract, and it's impossible to predict which one '--auto' will select (this can differ from system to system). If you remove locales-fr, 'urpmi --auto tesseract' should install the Russian language pack for tesseract.

I will also check whether it is easy to teach urpmi to take the system locale settings into account.
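The selection behavior described above can be modeled roughly as follows. This is a minimal Python sketch, not urpmi's actual code (urpmi itself is written in Perl). The package-to-locale table is a hypothetical excerpt, assuming each tesseract-<lang> pack declares a single Requires on one locales-* package; the real requirements come from the ROSA spec files. It shows why, with both locales-ru and locales-fr installed, more than one candidate remains and '--auto' has nothing in this information to break the tie.

```python
# Hypothetical excerpt: tesseract-<lang> packs from this report and the
# locales-* package each one is assumed to require.
LANGPACK_REQUIRES = {
    "tesseract-rus": "locales-ru",
    "tesseract-frm": "locales-fr",
    "tesseract-vie": "locales-vi",
}


def candidate_langpacks(installed_locales, requires=LANGPACK_REQUIRES):
    """Return the tesseract-<lang> packs whose locale requirement is already
    satisfied, ignoring locales-en as described in the comment above."""
    non_english = {loc for loc in installed_locales if loc != "locales-en"}
    return sorted(pkg for pkg, loc in requires.items() if loc in non_english)


if __name__ == "__main__":
    # Russian and French installed: two candidates, so the pack chosen by
    # 'urpmi --auto' cannot be predicted from the installed locales alone.
    print(candidate_langpacks({"locales-en", "locales-ru", "locales-fr"}))
    # Only Russian besides English: a single candidate remains.
    print(candidate_langpacks({"locales-en", "locales-ru"}))
```

Removing locales-fr from the input set in this model leaves only tesseract-rus, which matches the workaround suggested in the comment.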
Comment 5 Denis Silakov 2013-08-19 11:32:28 MSD
OK, so the behavior you observe comes from the urpmi selection algorithm, whereas this bug is about fixing the tesseract packages.

I have filed a separate bug, #2509, for urpmi. As for this bug, I suggest publishing the tesseract packages in their current state; there are no more issues to fix there.

Advisory:
tesseract-<lang> packages now require the appropriate locales-* packages, so urpmi can automatically choose language packs suitable for the current locale. For example, if you have a Russian locale (the locales-ru package is installed), tesseract-rus will be installed. Note that urpmi does not take the system locale settings into account, so if you have several locales-* packages installed (besides locales-en), urpmi will still choose among the tesseract-<lang> packages corresponding to those locales.

In addition, tesseract has been updated to a new minor version, 3.02.02.

Build lists:
https://abf.rosalinux.ru/build_lists/1198664
https://abf.rosalinux.ru/build_lists/1198665
Comment 6 Vladimir Potapov 2013-08-19 17:14:16 MSD
urpmi --auto tesseract
Packages libgcc1-4.7.3_2012.10-3.1-rosa2012.1.i586, locales-ru-2.15-2-rosa2012.1.i586, liblept2-1.69-1-rosa2012.1.i586, libstdc++6-4.7.3_2012.10-3.1-rosa2012.1.i586, glibc-2.15-8-rosa2012.1.i586 are already installed
    http://abf-downloads.rosalinux.ru/rosa2012.1/container/1198664/i586/main/release/tesseract-rus-3.02.02-2-rosa2012.1.i586.rpm
    http://abf-downloads.rosalinux.ru/rosa2012.1/container/1198664/i586/main/release/libtesseract3-3.02.02-2-rosa2012.1.i586.rpm 
    http://abf-downloads.rosalinux.ru/rosa2012.1/container/1198664/i586/main/release/tesseract-3.02.02-2-rosa2012.1.i586.rpm     
                                                                                                                                 

installing tesseract-rus-3.02.02-2-rosa2012.1.i586.rpm libtesseract3-3.02.02-2-rosa2012.1.i586.rpm tesseract-3.02.02-2-rosa2012.1.i586.rpm from /var/cache/urpmi/rpms
warning: LOOP:
warning: removing tesseract-3.02.02-2.i586 "Requires: tesseract-language >= 3.00" from tsort relations.
warning: removing tesseract-rus-3.02.02-2.i586 "Requires: tesseract >= 3.00" from tsort relations.
Preparing...                     ###############################################################################################
      1/3: libtesseract3         ###############################################################################################
      2/3: tesseract-rus         ###############################################################################################
      3/3: tesseract             ###############################################################################################
[root@FRESH2012 keleg]# exit
exit
[keleg@FRESH2012 ~]$ tesseract
Error opening data file /usr/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
Comment 7 Denis Silakov 2013-08-20 12:27:12 MSD
Hm, yes, I can confirm this; I will take a look...
Comment 8 Denis Silakov 2013-08-20 12:32:11 MSD
This seems to be a known upstream issue that is fixed in git but not yet released:
https://bugs.launchpad.net/ubuntu/+source/tesseract/+bug/1189995
Comment 9 Denis Silakov 2013-08-20 15:36:27 MSD
OK, I have built tesseract from the latest SVN snapshot. The issue mentioned in the comments above is fixed there: tesseract can start and work even if eng.traineddata is missing. However, two options added in 3.02.02, '--list-langs' and '--print-parameters', still don't work without eng.traineddata. This is not a regression, though, since 3.01 had no such options at all.
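Since '--list-langs' needs eng.traineddata in this snapshot, one workaround is to inspect the tessdata directory directly and pass an installed language explicitly with '-l'. The Python below is a minimal sketch under stated assumptions: the data files live in /usr/share/tessdata (the path from the error message in Comment 6), each language is stored as <lang>.traineddata, and the helper names are hypothetical. It only builds a command line in the 3.02-style 'tesseract imagename outputbase -l lang' form and does not run tesseract.

```python
from pathlib import Path

# Assumed location, taken from the error message in Comment 6; adjust it if
# TESSDATA_PREFIX points elsewhere on your system.
TESSDATA_DIR = Path("/usr/share/tessdata")


def installed_languages(tessdata_dir=TESSDATA_DIR):
    """List language codes for which a <lang>.traineddata file exists."""
    if not tessdata_dir.is_dir():
        return []
    return sorted(p.stem for p in tessdata_dir.glob("*.traineddata"))


def tesseract_command(image, output_base, languages):
    """Build a tesseract command line that names a language explicitly,
    instead of relying on the default 'eng' data being present."""
    if not languages:
        raise ValueError("no traineddata files found")
    # Prefer English when it is installed, otherwise take the first pack.
    lang = "eng" if "eng" in languages else languages[0]
    return ["tesseract", str(image), str(output_base), "-l", lang]


if __name__ == "__main__":
    langs = installed_languages()
    print("installed:", langs)
    if langs:
        print(" ".join(tesseract_command("scan.png", "scan", langs)))
```

On the system from Comment 6, where only tesseract-rus was installed, this sketch would pick 'rus' rather than the missing 'eng'.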

The issue has been reported upstream: http://code.google.com/p/tesseract-ocr/issues/detail?id=970

I think the new tesseract has many advantages (support for more languages, improved traineddata, etc.), so it is better to update to the current SVN snapshot, even though the new options might not work.

Advisory:
tesseract-<lang> packages now require the appropriate locales-* packages, so urpmi can automatically choose language packs suitable for the current locale. For example, if you have a Russian locale (the locales-ru package is installed), tesseract-rus will be installed.

Note that urpmi does not take the system locale settings into account, so if you have several locales-* packages installed (besides locales-en), urpmi will still choose among the tesseract-<lang> packages corresponding to those locales.

In addition, tesseract has been updated to a new minor version, 3.02.02, with additional improvements from SVN (revision 866).

Build lists:
https://abf.rosalinux.ru/build_lists/1218089
https://abf.rosalinux.ru/build_lists/1218088
Comment 10 Vladimir Potapov 2013-08-21 13:45:44 MSD
tesseract-3.02.03-0.svn866.1-rosa2012.1
****************** Advisory ************************
tesseract-<lang> packages now require the appropriate locales-* packages, so urpmi can automatically choose language packs suitable for the current locale. For example, if you have a Russian locale (the locales-ru package is installed), tesseract-rus will be installed.

Note that urpmi does not take the system locale settings into account, so if you have several locales-* packages installed (besides locales-en), urpmi will still choose among the tesseract-<lang> packages corresponding to those locales.

In addition, tesseract has been updated to a new minor version, 3.02.02, with additional improvements from SVN (revision 866).
******************************************************
QA Verified