Posted to solr-user@lucene.apache.org by "Lochschmied, Alexander" <Al...@vishay.com> on 2012/08/02 03:58:29 UTC

Special suggestions requirement

Is there a way to offer distinct, alphabetically sorted, fixed-length options?

I am trying to suggest part numbers, and I'm currently trying to do this with the spellchecker component.
Let's say "ABCD" was entered and we have indexed part numbers like:
ABCD00000000
ABCD20000000
ABCD21000000
ABCD22000000
...

I would always like 2 more characters to be suggested, so for "ABCD" it should suggest:
ABCD00
ABCD20
ABCD21
ABCD22
...

No smart sorting is needed, just alphabetical sorting. The problem is that, for example, 00 (or ABCD00) may currently not be suggested because it doesn't score high enough. But we really want all distinct values, starting from the smallest (up to a certain number of suggestions).

I have already looked at the custom comparator class option, but this would probably not work, as I would need more information to implement it there (at least the currently entered search term, "ABCD" in the example).

Thanks,
Alexander
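
To make the requirement concrete, here is a minimal client-side sketch in Python. It assumes the candidate part numbers are already available as a list; the function name `fixed_length_suggestions` and its `extra` and `limit` parameters are hypothetical. It returns the distinct prefixes that are exactly two characters longer than the entered text, sorted alphabetically and capped at a maximum number of suggestions:

```python
def fixed_length_suggestions(part_numbers, entered, extra=2, limit=100):
    """Distinct prefixes of length len(entered) + extra, smallest first (hypothetical helper)."""
    target = len(entered) + extra
    # A set removes duplicates: many part numbers share the same short prefix.
    prefixes = {
        pn[:target]
        for pn in part_numbers
        if pn.startswith(entered) and len(pn) >= target
    }
    # Plain alphabetical order, no scoring, then cut off at the limit.
    return sorted(prefixes)[:limit]


parts = ["ABCD00000000", "ABCD20000000", "ABCD21000000", "ABCD22000000"]
print(fixed_length_suggestions(parts, "ABCD"))
# ['ABCD00', 'ABCD20', 'ABCD21', 'ABCD22']
```

Because no scoring is involved, ABCD00 cannot be pushed out by higher-scoring matches, which is the problem described with the spellchecker.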

Re: Special suggestions requirement

Posted by "Lochschmied, Alexander" <Al...@vishay.com>.
Is there anything you cannot do with Solr? :-)
Thanks a lot, Erick! I only had to use "." instead of "?", e.g.:

...:8983/solr/terms?terms.fl=fieldname&terms.limit=100&terms.prefix=abcd&terms.regex.flag=case_insensitive&terms=true&terms.regex=abcd..

Adding terms.sort=index even lets me sort the way I need.
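
A small Python sketch that assembles the same TermsComponent request for any entered prefix. The base URL and field name are placeholders, and the helper name `terms_url` is hypothetical. Lowercasing the prefix is an assumption based on the lowercase abcd in the URL above, which suggests the field is lowercased at index time (terms.prefix is compared against the indexed terms as they are). re.escape guards against regex metacharacters in the entered text:

```python
import re
from urllib.parse import urlencode


def terms_url(base, field, entered, extra=2, limit=100):
    """Build a TermsComponent URL returning terms exactly `extra` chars longer than `entered`."""
    prefix = entered.lower()  # assumption: the field is lowercased at index time
    params = [
        ("terms", "true"),
        ("terms.fl", field),
        ("terms.prefix", prefix),
        # In a regex, "." matches exactly one character; "?" would make the preceding one optional.
        ("terms.regex", re.escape(prefix) + "." * extra),
        ("terms.regex.flag", "case_insensitive"),
        ("terms.sort", "index"),  # index (alphabetical) order instead of by frequency
        ("terms.limit", str(limit)),
    ]
    return base + "/terms?" + urlencode(params)


print(terms_url("http://localhost:8983/solr", "fieldname", "ABCD"))
```

For the next level ("ABCD42"), the same helper produces terms.regex=abcd42.., so each selection simply extends the pattern by two dots.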

Thanks,
Alexander

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Saturday, 4 August 2012 20:11
To: solr-user@lucene.apache.org
Subject: Re: Special suggestions requirement

Would it work to use TermsComponent with wildcards?
Something like terms.regex="ABCD42??"...

see: http://wiki.apache.org/solr/TermsComponent/

Best
Erick


On Fri, Aug 3, 2012 at 9:07 AM, Michael Della Bitta <mi...@appinions.com> wrote:
> I could be crazy, but it sounds to me like you need a trie, not a 
> search index: http://en.wikipedia.org/wiki/Trie
>
> But in any case, what you want to do should be achievable. It seems 
> like you need to do EdgeNgrams and facet on the results, where 
> facet.counts > 1 to exclude the actual part numbers, since each of 
> those would be distinct.
>
> I'm on the train right now, so I can't test this. :\
>
> Michael Della Bitta
>
> ------------------------------------------------
> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 
> www.appinions.com Where Influence Isn't a Game
>
>
> On Thu, Aug 2, 2012 at 9:19 PM, Lochschmied, Alexander 
> <Al...@vishay.com> wrote:
> >> Even with a prefix query, I do not get "ABCD02" or any "ABCD02..." back. BTW: EdgeNGramFilterFactory is used on the field we are getting the suggestions/spellchecks from.
> >> I think the problem is that there are a lot of different part numbers starting with "ABCD", and every part number has the same length. I showed only 4 in the example, but there might be thousands.
>>
>> Here are some full part number examples that might be in the index:
>> ABCD1100000040
>> ABCD0000000000
>> ABCD9999999999
>> ABCD1000055500
>> ...
>>
> >> I'm looking for a way to make Solr return a distinct list of fixed-length
> >> substrings of them, e.g. if "ABCD" is entered, I would need:
>> ABCD00
>> ABCD01
>> ABCD02
>> ABCD03
>> ...
>> ABCD99
>>
> >> Then, if the user chooses "ABCD42" from the suggestions, I would need:
>> ABCD4201
>> ABCD4202
>> ABCD4203
>> ...
>> ABCD4299
>>
>> and so on.
>>
> >> I would be able to do some "post processing" if needed, or adjust the schema or indexing process. But the key functionality I need from Solr is returning a distinct set of those suggestions where only the last two characters change. All available combinations of those last two characters must be considered, though. I need to show alphanumerically sorted suggestions, smallest value first.
>>
>> Thanks,
>> Alexander
>>
> >> -----Original Message-----
> >> From: Michael Della Bitta [mailto:michael.della.bitta@appinions.com]
> >> Sent: Thursday, 2 August 2012 15:02
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Special suggestions requirement
>>
> >> In this case, we're storing the overall length of each value and sorting on that first, then alphabetically.
>>
>> Also, how are your queries fashioned? If you're doing a prefix query, everything that matches it should score the same. If you're only doing a prefix query, you might need to add a term for exact matches as well to get them to show up.
>>
>> Michael Della Bitta
>>
>> ------------------------------------------------
>> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 
>> www.appinions.com Where Influence Isn't a Game
>>
>>
>> On Wed, Aug 1, 2012 at 9:58 PM, Lochschmied, Alexander <Al...@vishay.com> wrote:
> >>> Is there a way to offer distinct, alphabetically sorted, fixed-length options?
> >>>
> >>> I am trying to suggest part numbers, and I'm currently trying to do this with the spellchecker component.
> >>> Let's say "ABCD" was entered and we have indexed part numbers like:
>>> ABCD00000000
>>> ABCD20000000
>>> ABCD21000000
>>> ABCD22000000
>>> ...
>>>
> >>> I would always like 2 more characters to be suggested, so for "ABCD"
> >>> it should suggest:
>>> ABCD00
>>> ABCD20
>>> ABCD21
>>> ABCD22
>>> ...
>>>
> >>> No smart sorting is needed, just alphabetical sorting. The problem is that, for example, 00 (or ABCD00) may currently not be suggested because it doesn't score high enough. But we really want all distinct values, starting from the smallest (up to a certain number of suggestions).
> >>>
> >>> I have already looked at the custom comparator class option, but this would probably not work, as I would need more information to implement it there (at least the currently entered search term, "ABCD" in the example).
>>>
>>> Thanks,
>>> Alexander

Re: Special suggestions requirement

Posted by Erick Erickson <er...@gmail.com>.
Would it work to use TermsComponent with wildcards?
Something like terms.regex="ABCD42??"...

see: http://wiki.apache.org/solr/TermsComponent/

Best
Erick
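
terms.regex takes a regular expression, not a wildcard pattern: "?" makes the preceding character optional, while "." matches exactly one character, which is why Alexander had to swap one for the other. It also helps that the field uses EdgeNGramFilterFactory, so every leading substring of a part number is itself an indexed term, and a fixed-length pattern selects exactly the prefixes of that length. The Python sketch below simulates this on a tiny lowercased term list. Python's re behaves like Java's regex for these simple patterns; matching the whole term (fullmatch) and the n-gram size settings are assumptions, and the helper names are hypothetical:

```python
import re


def edge_ngrams(term):
    """Every leading substring of term, as a front edge n-gram filter would emit
    (assuming minGramSize=1 and a maxGramSize at least as long as the term)."""
    return [term[:n] for n in range(1, len(term) + 1)]


# Simulated, lowercased terms dictionary for a few part numbers.
parts = ["abcd4201000000", "abcd4299000000", "abcd1100000040"]
terms = sorted({gram for p in parts for gram in edge_ngrams(p)})


def regex_terms(terms, pattern):
    """Terms whose full text matches pattern (assumed whole-term matching)."""
    rx = re.compile(pattern)
    return [t for t in terms if rx.fullmatch(t)]


print(regex_terms(terms, "abcd42??"))  # ['abcd4', 'abcd42']
print(regex_terms(terms, "abcd42.."))  # ['abcd4201', 'abcd4299']
```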



Re: Special suggestions requirement

Posted by Michael Della Bitta <mi...@appinions.com>.
I could be crazy, but it sounds to me like you need a trie, not a
search index: http://en.wikipedia.org/wiki/Trie
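
For comparison, a minimal in-memory trie sketch in Python (not a Solr feature; the class and function names are hypothetical). Children are visited in sorted order, so collecting every path exactly two levels below the node for the entered text yields the distinct suggestions smallest first, and the walk can stop once the limit is reached:

```python
class TrieNode:
    """One character position; children are keyed by the next character."""

    def __init__(self):
        self.children = {}


def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())


def extensions(root, prefix, extra=2, limit=100):
    """Distinct strings exactly `extra` characters longer than prefix, in sorted order."""
    node = root
    for ch in prefix:  # walk down to the node for the entered text
        node = node.children.get(ch)
        if node is None:
            return []
    results = []

    def walk(n, path):
        if len(results) >= limit:
            return
        if len(path) == extra:
            results.append(prefix + path)
            return
        for ch in sorted(n.children):  # sorted children => alphabetical output
            walk(n.children[ch], path + ch)

    walk(node, "")
    return results


root = TrieNode()
for pn in ["ABCD1100000040", "ABCD0000000000", "ABCD9999999999", "ABCD1000055500"]:
    insert(root, pn)
print(extensions(root, "ABCD"))  # ['ABCD00', 'ABCD10', 'ABCD11', 'ABCD99']
```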

But in any case, what you want to do should be achievable. It seems
like you need to do EdgeNgrams and facet on the results, where
facet.counts > 1 to exclude the actual part numbers, since each of
those would be distinct.
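
A rough Python simulation of the faceting idea, assuming each part number is indexed into an edge-n-gram field (the function name is hypothetical). In Solr this would roughly correspond to facet=true with facet.field on that field, facet.prefix set to the entered text and facet.sort=index; here the length filter is applied on the client. The example also shows that facet.mincount=2 would drop ABCD00, which only one part number starts with, so filtering by length may be a safer way to exclude the full part numbers:

```python
from collections import Counter


def edge_ngram_facets(part_numbers, entered, extra=2, mincount=1):
    """Simulate facet counts on an edge-n-gram field, keeping one prefix length."""
    counts = Counter()
    for pn in part_numbers:
        # Each document contributes each of its leading substrings once.
        for n in range(1, len(pn) + 1):
            counts[pn[:n]] += 1
    target = len(entered) + extra
    # Like facet.prefix + facet.mincount + facet.sort=index, plus a client-side length filter.
    return [
        (term, count)
        for term, count in sorted(counts.items())
        if term.startswith(entered) and len(term) == target and count >= mincount
    ]


parts = ["ABCD2100000000", "ABCD2199999999", "ABCD0000000000"]
print(edge_ngram_facets(parts, "ABCD"))              # [('ABCD00', 1), ('ABCD21', 2)]
print(edge_ngram_facets(parts, "ABCD", mincount=2))  # [('ABCD21', 2)]
```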

I'm on the train right now, so I can't test this. :\

Michael Della Bitta


Re: Special suggestions requirement

Posted by "Lochschmied, Alexander" <Al...@vishay.com>.
Even with a prefix query, I do not get "ABCD02" or any "ABCD02..." back. BTW: EdgeNGramFilterFactory is used on the field we are getting the suggestions/spellchecks from.
I think the problem is that there are a lot of different part numbers starting with "ABCD", and every part number has the same length. I showed only 4 in the example, but there might be thousands.

Here are some full part number examples that might be in the index:
ABCD1100000040
ABCD0000000000
ABCD9999999999
ABCD1000055500
...

I'm looking for a way to make Solr return a distinct list of fixed-length substrings of them, e.g. if "ABCD" is entered, I would need:
ABCD00
ABCD01
ABCD02
ABCD03
...
ABCD99

Then, if the user chooses "ABCD42" from the suggestions, I would need:
ABCD4201
ABCD4202
ABCD4203
...
ABCD4299

and so on.

I would be able to do some "post processing" if needed, or adjust the schema or indexing process. But the key functionality I need from Solr is returning a distinct set of those suggestions where only the last two characters change. All available combinations of those last two characters must be considered, though. I need to show alphanumerically sorted suggestions, smallest value first.
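
Since adjusting the indexing process is an option, one alternative is to index only the prefixes at the lengths that are actually offered: the 4-character base and then every 2 characters. The Python sketch below generates those values per part number; the helper names are hypothetical, and the base length 4 and step 2 are taken from the examples above. Looking up the next level is then a matter of selecting indexed values that start with the current selection and are exactly two characters longer, in sorted order:

```python
def step_prefixes(part_number, base=4, step=2):
    """Prefixes of length base, base+step, ... (the full number is included
    when its length fits the step)."""
    return [part_number[:n] for n in range(base, len(part_number) + 1, step)]


def next_level(indexed_values, selected, step=2):
    """Distinct indexed values exactly `step` characters longer than the selection."""
    target = len(selected) + step
    return sorted({v for v in indexed_values if len(v) == target and v.startswith(selected)})


parts = ["ABCD4201000000", "ABCD4299000000", "ABCD1100000040"]
indexed = {p for pn in parts for p in step_prefixes(pn)}
print(next_level(indexed, "ABCD"))    # ['ABCD11', 'ABCD42']
print(next_level(indexed, "ABCD42"))  # ['ABCD4201', 'ABCD4299']
```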

Thanks,
Alexander


Re: Special suggestions requirement

Posted by Michael Della Bitta <mi...@appinions.com>.
In this case, we're storing the overall length of each value and sorting
on that first, then alphabetically.
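
The ordering described here, shortest value first and alphabetical within the same length, is a two-part sort key. A minimal Python sketch (the function name is hypothetical); in Solr the equivalent would be a sort on a stored length field followed by the value field, e.g. sort=value_length asc,value asc, where both field names are placeholders:

```python
def length_then_alpha(values):
    """Shorter values first; ties broken alphabetically."""
    return sorted(values, key=lambda v: (len(v), v))


print(length_then_alpha(["ABCD2100", "ABCD21", "ABCD", "ABCD00"]))
# ['ABCD', 'ABCD00', 'ABCD21', 'ABCD2100']
```

When all suggestions have the same length, as in the part-number case, the length part of the key is constant and this reduces to plain alphabetical order.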

Also, how are your queries fashioned? If you're doing a prefix query,
everything that matches it should score the same. If you're only doing
a prefix query, you might need to add a term for exact matches as well
to get them to show up.

Michael Della Bitta
