Posted to java-user@lucene.apache.org by Clemens Wyss DEV <cl...@mysign.ch> on 2014/06/13 14:53:08 UTC

fuzzy/case insensitive AnalyzingSuggester )

Looking for an AnalyzingSuggester which supports
- fuzziness
- case insensitivity
- small (in-memory) footprint (*)

(*) Just tried to "hand" my big IndexReader (see other post "[lucene 4.6] NPE when calling IndexReader#openIfChanged") into JaspellLookup. Got an OOM.
Is there any (Jaspell)Lookup implementation that can handle really big indexes (by swapping out part of the "lookup-table")?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


AW: fuzzy/case insensitive AnalyzingSuggester )

Posted by Clemens Wyss DEV <cl...@mysign.ch>.
I am back on this topic ;)

>Case- and diacritics insensitivity is supported out-of-the-box by the 
>analyzing suggesters, including the FuzzySuggester. 
>The logic is in the Analyzer.
So how do I force case-insensitivity?
I tried
...
	        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory</str>
	        <str name="ignoreCase=">true</str>
...
or
...
	        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingLookupFactory</str>
	        <str name="ignoreCase=">true</str>
...
to no avail


AW: fuzzy/case insensitive AnalyzingSuggester )

Posted by Clemens Wyss DEV <cl...@mysign.ch>.
Oli, 
thanks for your valuable inputs!

> Generally, we found it beneficial to not combine all functionality in a single suggester
Makes perfect sense, but it doesn't help keep the RAM load low ;) unless you go with WFSTs.

What we have done so far is build a term index based on the terms of the corresponding (data) index, i.e. an index always comes paired with its corresponding term index.


RE: fuzzy/case insensitive AnalyzingSuggester )

Posted by Oliver Christ <oc...@EBSCO.COM>.
Hi Clemens,

I haven't yet built a suggester which combines all three, and am not aware of one. I'd love to have one though ;-)

Case- and diacritics insensitivity is supported out-of-the-box by the analyzing suggesters, including the FuzzySuggester. The logic is in the Analyzer.
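
To illustrate why the insensitivity lives in the analysis step rather than in the suggester, here is a minimal plain-Java sketch. It does not use Lucene: the class FoldedLookup, its fold method and the TreeMap standing in for the automaton are all hypothetical names chosen for this example. The idea shown is only that entries are stored under an analyzed (lower-cased, diacritics-stripped) key, the typed text is analyzed the same way, and the original surface form is what gets returned.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

class FoldedLookup {
    // Analyzed key -> surface forms; a sorted map allows prefix range scans.
    private final TreeMap<String, List<String>> byKey = new TreeMap<>();

    // Stand-in for the analyzer (assumption): strip combining marks, lower-case.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    void add(String surface) {
        byKey.computeIfAbsent(fold(surface), k -> new ArrayList<>()).add(surface);
    }

    List<String> lookup(String typed) {
        String prefix = fold(typed);
        List<String> hits = new ArrayList<>();
        // Keys in [prefix, prefix + '\uffff') are the keys that start with prefix.
        for (List<String> surfaces
                : byKey.subMap(prefix, prefix + Character.MAX_VALUE).values()) {
            hits.addAll(surfaces);
        }
        return hits;
    }
}
```

With an analyzing suggester the same effect would presumably come from the Analyzer passed to it at build time and at lookup time, so the suggester itself needs no case-related switch.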

I haven't yet tried out AnalyzingInfixSuggester, and haven't investigated whether it's possible to combine that with FuzzySuggester (which also is an analyzing suggester).

Due to memory constraints, we build infix suggesters by adding each relevant substring, but use WFST suggesters with payloads as the base, to reduce RAM load at runtime. We call the analyzer in the dictionary iterator. At search time, we look up the surface form (completion) in a secondary index using the payload as a key (and for deduping).
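
The substring-plus-payload approach can be sketched in plain Java as follows. This is not our actual code and not a Lucene API: SubstringInfixIndex is a hypothetical class, a TreeMap stands in for the WFST, a list stands in for the secondary index, and "relevant substring" is assumed here to mean every suffix that starts at a word boundary.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeMap;

class SubstringInfixIndex {
    // Secondary index: payload id -> surface form shown to the user.
    private final List<String> surfaces = new ArrayList<>();
    // Stand-in for the automaton: analyzed substring -> payload ids.
    private final TreeMap<String, Set<Integer>> keys = new TreeMap<>();

    void add(String surface) {
        int id = surfaces.size();
        surfaces.add(surface);
        // "Analysis" is reduced to lower-casing in this sketch.
        String analyzed = surface.toLowerCase(Locale.ROOT);
        // Index every suffix that begins at a word boundary.
        for (int i = 0; i < analyzed.length(); i++) {
            if (i == 0 || analyzed.charAt(i - 1) == ' ') {
                keys.computeIfAbsent(analyzed.substring(i),
                        k -> new LinkedHashSet<>()).add(id);
            }
        }
    }

    List<String> lookup(String typed) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        // A set of payload ids dedupes entries reached via several substrings.
        Set<Integer> ids = new LinkedHashSet<>();
        for (Set<Integer> hit
                : keys.subMap(prefix, prefix + Character.MAX_VALUE).values()) {
            ids.addAll(hit);
        }
        List<String> out = new ArrayList<>();
        for (int id : ids) {
            out.add(surfaces.get(id)); // resolve payload in the secondary index
        }
        return out;
    }
}
```

Note that the map never needs to store the full surface form next to each substring; only the payload id is kept, which is the point of resolving completions in the secondary index.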

If FuzzySuggester supports payloads (haven't checked), you could get an infix suggester using the same approach. That will lead to large automata, and as you'd have to look up the completion in a secondary index, you'd never use the surface form returned by the automaton itself, so it's a waste of space. WFSTs are more space-efficient but don't support payloads (if I remember correctly) and there's no fuzzy WFST suggester either :(

Generally, we found it beneficial not to combine all functionality in a single suggester, but to use separate automata in a cascaded model. We first look up completions in the prefix non-fuzzy suggester. Based on several criteria, we may then consult the infix suggester, and if needed, the fuzzy suggester. The rationale is that we don't want high-ranking fuzzy or infix hits to fill up the completion list while there are good (but less popular) prefix hits. Having control over which suggester is used when, and how its specific suggestions are merged into the final result list, helps improve the user experience, at least with our use cases.
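
A minimal plain-Java sketch of such a cascade is below. CascadedSuggester is a hypothetical class, each stage is modeled as a function from the typed text to a ranked list, and the "several criteria" are reduced to a single assumption: a later stage is consulted only while the result list is not yet full. Real criteria would likely be richer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class CascadedSuggester {
    // Stages in priority order, e.g. prefix, then infix, then fuzzy.
    private final List<Function<String, List<String>>> stages;

    CascadedSuggester(List<Function<String, List<String>>> stages) {
        this.stages = stages;
    }

    List<String> suggest(String typed, int limit) {
        // Insertion-ordered set: earlier stages keep their rank, duplicates drop.
        Set<String> merged = new LinkedHashSet<>();
        for (Function<String, List<String>> stage : stages) {
            if (merged.size() >= limit) {
                break; // earlier stages already filled the list
            }
            for (String completion : stage.apply(typed)) {
                if (merged.size() >= limit) {
                    break;
                }
                merged.add(completion);
            }
        }
        return new ArrayList<>(merged);
    }
}
```

Because prefix hits are merged first, a popular fuzzy or infix hit can only take a slot that the prefix stage left empty.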

Cheers, Oli



AW: fuzzy/case insensitive AnalyzingSuggester )

Posted by Clemens Wyss DEV <cl...@mysign.ch>.
Sorry for re-asking. 
Has anyone implemented an AnalyzingSuggester which 
- is fuzzy
- is case insensitive (or must/should this be implemented by the analyzer?)
- does infix search
[- has a small memory footprint]
