Bug 26005

Summary: aspell (as used by recoll) is not compatible with aspell-sv, missing file
Product: Mageia Reporter: Morgan Leijström <fri>
Component: RPM Packages Assignee: David GEIGER <geiger.david68210>
Status: RESOLVED INVALID QA Contact:
Severity: normal    
Priority: Normal CC: dglent, jf
Version: 7 Keywords: UPSTREAM
Target Milestone: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Source RPM: recoll-1.25.13-1.mga7.src.rpm CVE:
Status comment:

Description Morgan Leijström 2020-01-02 00:11:45 CET
Version-Release number of selected component (if applicable):
aspell 0.60.8-1.mga7
aspell-sv 0.51.0-17.mga7
recoll-full 1.25.13-1.mga7

I believe the problem is with aspell/aspell-sv; recoll only triggers and detects it, as shown below:

Description of problem & Steps to Reproduce:

1. urpmi recoll-full aspell-sv
2. Start recoll indexing
3. After a while recoll shows a popup message:

" Non-fatal indexing message: 
aspell : aspell dictionary creation command failed: /usr/bin/aspell --lang=swedish --encoding=utf-8 create master /home/morgan/.recoll/aspdict.swedish.rws One possible reason might be missing language data files for lang = swedish. Maybe try to execute the command by hand for a better diag.
   "

Testing that failing command manually:
$ LC_ALL=C /usr/bin/aspell --lang=swedish --encoding=utf-8 create master /home/morgan/.recoll/aspdict.swedish.rws
Error: The language "swedish" is not known. This is probably because: the file "/usr/lib64/aspell-0.60/swedish.dat" can not be opened for reading.

Simple attempt to workaround by making a soft link:
# cd /usr/lib64/aspell-0.60/
# ln -s sv.dat swedish.dat
=> Result:
Now aspell simply gets stuck, consuming no CPU; I have to kill it with Ctrl-C.
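
A quick way to check which language codes aspell will accept for --lang is to look for the matching .dat file in its data directories, which is what the error message above complains about. The following is only a sketch: the helper name aspell_has_lang is hypothetical, and it assumes that "aspell config data-dir" and "aspell config dict-dir" print the directories aspell searches on this system.

```shell
# Hypothetical helper: succeed if aspell has a readable language data
# file named "<lang>.dat" in its data-dir or dict-dir.
aspell_has_lang() {
    for key in data-dir dict-dir; do
        # "aspell config <key>" prints the current value of an option;
        # skip the key if aspell is missing or the query fails.
        dir=$(aspell config "$key" 2>/dev/null) || continue
        [ -n "$dir" ] && [ -r "$dir/$1.dat" ] && return 0
    done
    return 1
}

# Example use with the two names from this report:
for lang in sv swedish; do
    if aspell_has_lang "$lang"; then
        echo "$lang: language data found"
    else
        echo "$lang: no language data file"
    fi
done
```

Note also that "aspell ... create master" reads its word list from standard input, so when it is started by hand in a terminal with nothing piped in, it waits for input. That may be all the "stuck, no CPU" observation reflects.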
Comment 1 Lewis Smith 2020-01-02 22:01:00 CET
Since aspell is widely used, I think it better to stick this one on recoll at least for starters. aspell might indeed be the culprit.

Assigning to DavidG as the main committer; CC Dimitrios as the registered maintainer, for his opinion.

Source RPM: aspell-sv => recoll-1.25.13-1.mga7.src.rpm
Assignee: bugsquad => geiger.david68210
CC: (none) => dglent

Comment 2 David GEIGER 2020-01-31 09:30:41 CET
I think it would be better to file a new bug upstream:

https://opensourceprojects.eu/p/recoll1/tickets/
Comment 3 Morgan Leijström 2020-01-31 10:52:48 CET
Done now: https://opensourceprojects.eu/p/recoll1/tickets/126/

Keywords: (none) => UPSTREAM

Comment 4 Jean-Francois Dockes 2020-02-25 15:30:23 CET
Responding here instead of on opensourceprojects as the latter site has "issues".

I am not too sure where the "swedish" comes from. As far as I can see, recoll tries to guess a language name from the NLS environment.

Did you try to set aspellLanguage = sv in the configuration file ?

CC: (none) => jf

Comment 5 Morgan Leijström 2020-02-25 16:57:21 CET
It seems to guess correctly; the locale is Swedish, and that is my mother tongue.
I did not edit a file.
In Recoll menu > Preferences, " [x] (all languages) " is checked.
Comment 6 Jean-Francois Dockes 2020-02-25 17:30:13 CET
It guessed the right language, but the code used by aspell for swedish is sv not swedish, and it's also normally what's used in the locale. What does the 'locale' command output in a terminal ?

Anyway, the thing to try to fix this is to input sv in the GUI in Preferences->Index configuration->Global parameters->Aspell language, then run an index update.
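
The current value can also be checked from a terminal instead of the GUI. This is a minimal sketch, assuming the per-user configuration file is ~/.recoll/recoll.conf (the directory matches the dictionary path in the error message; the file name is an assumption here). The helper name aspell_language is hypothetical.

```shell
# Hypothetical helper: print the value of aspellLanguage from a recoll
# configuration file. If the key appears more than once, the last
# occurrence is printed; nothing is printed if the key is not set.
aspell_language() {
    sed -n 's/^[[:space:]]*aspellLanguage[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Example use (path is an assumption, see above):
conf="$HOME/.recoll/recoll.conf"
if [ -r "$conf" ]; then
    echo "aspellLanguage is: $(aspell_language "$conf")"
fi
```

An empty result would mean the parameter is unset in that file.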
Comment 7 Morgan Leijström 2020-02-26 00:35:44 CET
[morgan@svarten ~]$ locale
LANG=sv_SE.UTF-8
LC_CTYPE="sv_SE.UTF-8"
LC_NUMERIC="sv_SE.UTF-8"
LC_TIME="sv_SE.UTF-8"
LC_COLLATE="sv_SE.UTF-8"
LC_MONETARY="sv_SE.UTF-8"
LC_MESSAGES="sv_SE.UTF-8"
LC_PAPER="sv_SE.UTF-8"
LC_NAME="sv_SE.UTF-8"
LC_ADDRESS="sv_SE.UTF-8"
LC_TELEPHONE="sv_SE.UTF-8"
LC_MEASUREMENT="sv_SE.UTF-8"
LC_IDENTIFICATION="sv_SE.UTF-8"
LC_ALL=
[morgan@svarten ~]$ 


GUI Preferences->Index configuration->Global parameters->Aspell language was set to "swedish". I have now changed it to sv and started an index update. Will report back. Good night for today, and thank you for helping out here.
Comment 8 Morgan Leijström 2020-02-26 23:06:06 CET
I have noted no issues.

Now the question is how it became "swedish" instead of sv in that setting. Did I set it and forget about it, or is it a packaging bug, or...?

It is a bit confusing that in "Stemming languages" full language names are used ( english, swedish )
Comment 9 Jean-Francois Dockes 2020-02-27 08:15:27 CET
I think that you must have input "swedish" yourself :) If nothing appears in this field, recoll computes the language from the NLS environment (sv_SE.UTF-8 -> sv in your case).
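
As an illustration of that derivation (not recoll's actual code, just a sketch of the same idea with a hypothetical helper name), the language code is the part of the locale name before the first underscore, once any encoding or modifier suffix is dropped:

```shell
# Hypothetical helper: derive a language code from a locale name such
# as "sv_SE.UTF-8". Uses $LANG when no argument is given.
lang_from_locale() {
    loc=${1:-${LANG:-}}
    loc=${loc%%.*}    # drop the encoding, e.g. ".UTF-8"
    loc=${loc%%@*}    # drop a modifier, e.g. "@euro"
    printf '%s\n' "${loc%%_*}"   # keep what precedes the territory
}

lang_from_locale "sv_SE.UTF-8"   # prints: sv
```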

There are two sets of language names, specific to the two pieces of software which use them:

- The Xapian Snowball stemmers are in charge of expanding searches to term grammatical variations (floor -> floor floors flooring floored...)
- The aspell dictionary is in charge of suggesting orthographic/phonetic variations when a search returns no results

In both cases, I keep the language names specific to the tool (english, french, swedish, etc. for the stemmers vs. fr, en, sv for aspell) because trying to have a pivot list and translating to the specific terms appeared too risky and high maintenance to me.

And there are also other sets of language names for, e.g. tesseract :)
Comment 10 Morgan Leijström 2020-02-27 10:26:14 CET
Ah, operator error then...

Thank you for helping out, and for your great explanations!

Suggestion: in future version have a hint for the format to enter in that field?

Status: NEW => RESOLVED
Resolution: (none) => INVALID

Comment 11 Jean-Francois Dockes 2020-02-28 08:29:30 CET
Yep, I have improved the tooltips for the next version. These appear when you hover on the labels. Maybe I should add them to the entry zones too.
Comment 12 Morgan Leijström 2020-02-28 12:02:01 CET
Way to go !
Thank you for this great program :)
/Morgan