Posted to solr-user@lucene.apache.org by Boris Quiroz <bo...@menco.it> on 2011/10/26 22:14:27 UTC

solr break up word

Hi,

I have Solr running on a CentOS server and it works fine, but my application sometimes needs to match parts of a word. For example, searching for 'dislike' works, but searching for 'disl' returns zero results. Searching for 'disl*' returns some results (the same ones as 'dislike'), but searching for 'dislike*' also returns zero results.

So I have two questions:

1. How exactly does the asterisk work as a wildcard?

2. What can I do to properly index parts of a word? I added these lines to my schema.xml:

<fieldType name="text" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>

      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
</fieldType>

But I can't get it to work. Is what I did OK, or am I doing something wrong?

Thanks.

--
Boris Quiroz
boris.quiroz@menco.it


Re: solr break up word

Posted by Boris Quiroz <bo...@menco.it>.
Hi,

I solved the issue by adding the following lines to my schema.xml:

<analyzer>
  <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  ...
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  ...
</analyzer>

Then I re-indexed, and everything is working great :-)

Thanks for your help.
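
For readers who want to see the fix in context, here is a minimal sketch of how the two analyzers above might sit inside a complete fieldType. The name "text_ngram" and the explicit type="index" attribute are assumptions added for illustration, and the filters elided with "..." in the message are simply left out rather than guessed.

```xml
<!-- Sketch only: the name "text_ngram" is a placeholder, not taken from the thread -->
<fieldType name="text_ngram" class="solr.TextField" omitNorms="false">
  <!-- Index side: break the text into grams of 3 to 15 characters, then lowercase them -->
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query side: no n-grams, so a query such as disl is looked up as one lowercased term -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With these sizes the index only contains grams of 3 to 15 characters, so a query term shorter than 3 or longer than 15 characters would have nothing to match. As noted in the message, existing documents need to be re-indexed before the new analysis takes effect.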

On Fri, Oct 28, 2011 at 10:08 AM, Boris Quiroz <bo...@menco.it> wrote:
> Hi Erick,
>
> I'll try without the type="index" on the analyzer tag and then I'll
> re-index some files.
>
> Thanks for your answer.
>
> --
> Boris Quiroz
> boris.quiroz@menco.it
>



-- 
Boris Quiroz
boris.quiroz@menco.it

Re: solr break up word

Posted by Boris Quiroz <bo...@menco.it>.
Hi Erick,

I'll try without the type="index" on the analyzer tag and then I'll
re-index some files.

Thanks for your answer.

On Thu, Oct 27, 2011 at 6:54 PM, Erick Erickson <er...@gmail.com> wrote:
> Hmmm, I'm not sure what happens when you specify
> <analyzer> (without type="index") and
> <analyzer type="query">. I have no clue which one
> is used.
>
> Look at the admin/analysis page to understand how things are
> broken up.
>
> Did you re-index after you added the ngram filter?
>
> You'll get better help if you include example queries with
> &debugQuery=on appended; it'll give us a lot more to
> work with.
>
> Best
> Erick



-- 
Boris Quiroz
boris.quiroz@menco.it

Re: solr break up word

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, I'm not sure what happens when you specify
<analyzer> (without type="index") and
<analyzer type="query">. I have no clue which one
is used.

Look at the admin/analysis page to understand how things are
broken up.

Did you re-index after you added the ngram filter?

You'll get better help if you include example queries with
&debugQuery=on appended; it'll give us a lot more to
work with.

Best
Erick
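
One way to sidestep the ambiguity raised at the top of this reply is to label the index-time analyzer explicitly. The sketch below is the fieldType from the original question with only type="index" added to the first analyzer; whether that attribute changes behavior on a given Solr version is an assumption this thread does not confirm, so it is worth checking on the admin/analysis page.

```xml
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <!-- Explicitly marked as the index-time analyzer (the only change from the question) -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <!-- Query-time analyzer, unchanged: no n-gram filter, so query terms stay whole -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Any change to the index-time chain only affects documents indexed afterwards, which is why the re-index question above matters.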
