Posted to solr-user@lucene.apache.org by Boris Quiroz <bo...@menco.it> on 2011/10/26 22:14:27 UTC
solr break up word
Hi,
I have Solr running on a CentOS server and it works OK, but my application sometimes needs to match parts of a word. For example, searching for 'dislike' works fine, but searching for 'disl' returns zero results. Also, searching for 'disl*' returns some results (the same ones as searching for 'dislike'), but searching for 'dislike*' returns zero results too.
So, I have two questions:
1. How exactly does the asterisk work as a wildcard?
2. What can I do to properly index parts of a word? I added these lines to my schema.xml:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
But I can't get it to work. Is what I did OK, or did I get something wrong?
Thanks.
--
Boris Quiroz
boris.quiroz@menco.it
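For readers wondering about the first question, here is a minimal Python sketch of how a trailing-asterisk query behaves. It is an illustration, not Solr's actual code. It assumes that `disl*` is treated as a prefix query compared directly against the terms stored in the index, and that the wildcard term itself is not run through the analyzer. The indexed term `dislik` is hypothetical: it stands for what a stemming filter might have stored for "dislike", which the thread does not confirm. The function name `prefix_match` is also made up for this example.

```python
def prefix_match(prefix, indexed_terms):
    """Return the indexed terms that start with the given prefix, sorted."""
    return sorted(term for term in indexed_terms if term.startswith(prefix))


# Hypothetical index contents: assumes "dislike" was stored in stemmed form.
indexed_terms = {"dislik", "like", "word"}

print(prefix_match("disl", indexed_terms))     # ['dislik']
print(prefix_match("dislike", indexed_terms))  # []
```

Under those assumptions, `disl*` finds the stored term while `dislike*` finds nothing, because the prefix is longer than any matching term in the index. The admin/analysis page shows what is really stored for a given field.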
Re: solr break up word
Posted by Boris Quiroz <bo...@menco.it>.
Hi,
I solved the issue. I added the following lines to my schema.xml:
<analyzer>
  <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  ...
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  ...
</analyzer>
Then I re-indexed, and everything is working great :-)
Thanks for your help.
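To make it easier to see why this setup matches partial words, here is a rough Python simulation of the idea. It is a sketch, not Solr's implementation. It assumes the index side stores every character substring of the lowercased input between 3 and 15 characters long (mirroring minGramSize="3" and maxGramSize="15" above), and that the query side only splits on whitespace and lowercases. The function names are made up for this example, and requiring all query terms to match is an assumption as well.

```python
def char_ngrams(text, min_size=3, max_size=15):
    """Return every substring of text with a length from min_size to max_size."""
    grams = set()
    for size in range(min_size, max_size + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


def index_terms(text):
    """Index side: character n-grams, lowercased."""
    return {gram.lower() for gram in char_ngrams(text)}


def query_terms(query):
    """Query side: split on whitespace and lowercase, no n-grams."""
    return [term.lower() for term in query.split()]


def matches(query, indexed):
    """Assumption: a document matches when every query term is an indexed gram."""
    return all(term in indexed for term in query_terms(query))


indexed = index_terms("I dislike Mondays")
print(matches("disl", indexed))     # True: 'disl' is one of the stored grams
print(matches("Dislike", indexed))  # True: lowercased on both sides
print(matches("di", indexed))       # False: shorter than the minimum gram size
```

Because the grams are produced only at index time, documents indexed before the schema change have no grams stored, which is why re-indexing matters.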
On Fri, Oct 28, 2011 at 10:08 AM, Boris Quiroz <bo...@menco.it> wrote:
> Hi Erick,
>
> I'll try without the type="index" on the analyzer tag and then I'll
> re-index some files.
>
> Thanks for your answer.
--
Boris Quiroz
boris.quiroz@menco.it
Re: solr break up word
Posted by Boris Quiroz <bo...@menco.it>.
Hi Erick,
I'll try without the type="index" on the analyzer tag and then I'll
re-index some files.
Thanks for your answer.
On Thu, Oct 27, 2011 at 6:54 PM, Erick Erickson <er...@gmail.com> wrote:
> Hmmm, I'm not sure what happens when you specify
> <analyzer> (without type="index") and
> <analyzer type="query">. I have no clue which one
> is used.
>
> Look at the admin/analysis page to understand how things are
> broken up.
>
> Did you re-index after you added the ngram filter?
>
> You'll get better help if you include example queries with
> &debugQuery=on appended; it'll give us a lot more to
> work with.
>
> Best
> Erick
--
Boris Quiroz
boris.quiroz@menco.it
Re: solr break up word
Posted by Erick Erickson <er...@gmail.com>.
Hmmm, I'm not sure what happens when you specify
<analyzer> (without type="index") and
<analyzer type="query">. I have no clue which one
is used.
Look at the admin/analysis page to understand how things are
broken up.
Did you re-index after you added the ngram filter?
You'll get better help if you include example queries with
&debugQuery=on appended; it'll give us a lot more to
work with.
Best
Erick
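As an illustration of the debugQuery suggestion, here is a small Python sketch that builds such a query URL. The base URL `http://localhost:8983/solr/select` is an assumption (a common default for a local install, not something stated in this thread), and `debug_query_url` is a made-up helper name.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query):
    """Build a select URL for the query with debugQuery=on appended."""
    params = {"q": query, "debugQuery": "on"}
    return base_url + "?" + urlencode(params)


# Assumed base URL; adjust host, port and path to the actual installation.
print(debug_query_url("http://localhost:8983/solr/select", "disl*"))
```

The response to a URL like this should include a debug section showing how the query was parsed, which is the extra detail being asked for here.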