Posted to solr-user@lucene.apache.org by Michał Matulka <mi...@gowork.pl> on 2013/05/28 10:43:17 UTC

Strange behavior on text field with number-text content

Hello,

I've got the following problem. I have a text type in my schema and a field
"name" of that type.
That field contains data; for example, there is a record that has
"300letters" as its name.

Now field type definition:
<fieldType name="text" class="solr.TextField"></fieldType>

And, of course, field definition:
<field name="name" type="text" indexed="true" stored="true"/>

yes, that's all - there are no tokenizers.

And now time for my question:

Why do the following queries:

name:300

and

name:letters

return that record, but:

name:300letters

does not (0 results)?

Best regards,
Michał Matulka

Re: Strange behavior on text field with number-text content

Posted by Michał Matulka <mi...@gowork.pl>.

Thank you for your response.
I looked at the analysis page and, as far as I can see, if I put "4nSolution Inc."
into "Index" and "4nSolution" into "Query", I get the following tokens at the end:

index:
4nsolut, 4, n, solut, inc., inc
query:
4, nsolut, 4nsolut

So... 4nsolut occurs in both, and technically, if it does, then shouldn't
it be found? Now I don't know whether I misunderstand how that
tool works or whether something is broken here.

Your advice to change splitOnCaseChange to "1" fixed one query,
company_name:4nSolution, but didn't fix company_name:4nsolution.
Changing "generateNumberParts" to "0" on the query side helps with
"4nsolution", so it theoretically fixes my problem, BUT I still don't know
why the analysis page shows different results from regular searching in
the previous scenario, and I'm dying to know.
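
For reference, a minimal sketch of how the query-side WordDelimiter filter might look with both changes described above applied. The remaining attributes are copied from the query analyzer posted in this thread; whether the two changes were kept together in the final schema is an assumption.

```xml
<!-- Sketch only: query-side filter with the two changes discussed above.
     splitOnCaseChange="1" now matches the index-side value;
     generateNumberParts="0" is the change reported to help with "4nsolution".
     All other attributes are unchanged from the posted query analyzer. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="0" catenateWords="0"
        catenateNumbers="0" catenateAll="1" splitOnCaseChange="1"
        preserveOriginal="1"/>
```

The index-side filter is left as posted, so no reindex should be needed for a query-side change alone.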

On 29.05.2013 13:11, Erick Erickson wrote:
> Hmmm, there are two things you _must_ get familiar with when diagnosing
> these <G>..
>
> 1> admin/analysis. That'll show you exactly what the analysis chain does,
> and it's
>       not always obvious.
> 2> add &debug=query to your input and look at the parsed query results. For
> instance,
>       this "name:4nSolution Inc." parses as name:4nSolution defaultfield:inc.
>
> That doesn't explain why name=4nSolutions, except......
>
> your index chain has splitOnCaseChange=1 and your query bit has
> splitOnCaseChange=0
> which doesn't seem right....
>
> Best
> Erick


-- 
Best regards,
Michał Matulka
Programmer
michal.matulka@gowork.pl

GoWork.pl
ul. Zielna 39
00-108 Warszawa
www.GoWork.pl

Re: Strange behavior on text field with number-text content

Posted by Erick Erickson <er...@gmail.com>.

Hmmm, there are two things you _must_ get familiar with when diagnosing
these <G>..

1> admin/analysis. That'll show you exactly what the analysis chain does,
     and it's not always obvious.
2> add &debug=query to your request and look at the parsed query results.
     For instance, "name:4nSolution Inc." parses as
     name:4nSolution defaultfield:inc.

That doesn't explain why name:4nSolution fails to match, except......

your index chain has splitOnCaseChange=1 and your query chain has
splitOnCaseChange=0, which doesn't seem right....

Best
Erick
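
The mismatch pointed out above can be seen by putting the two filters side by side. The sketch below keeps the index-side filter as posted in this thread and changes only splitOnCaseChange on the query side; it illustrates the suggestion and is not a confirmed fix.

```xml
<!-- Index analyzer: filter exactly as posted in the thread. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
        preserveOriginal="1"/>

<!-- Query analyzer: same as posted, except splitOnCaseChange is
     changed from "0" to "1" so that both sides split "4nSolution"
     at the same case boundaries. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="1" splitOnCaseChange="1"
        preserveOriginal="1"/>
```

After a change like this, the admin/analysis page and &debug=query remain the way to check what each side actually produces.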


On Tue, May 28, 2013 at 10:31 AM, Алексей Цой <al...@gmail.com> wrote:

> solr-user-unsubscribe <so...@lucene.apache.org>

Re: Strange behavior on text field with number-text content

Posted by Алексей Цой <al...@gmail.com>.

solr-user-unsubscribe <so...@lucene.apache.org>



Re: Strange behavior on text field with number-text content

Posted by Michał Matulka <mi...@gowork.pl>.

Thanks for your responses. I must admit that after hours of trying I
made some mistakes.
So the most problematic phrase is now
"4nSolution Inc.", which cannot be found using the query:

name:4nSolution

or even

name:4nSolution Inc.

but can be found using the following queries:

name:nSolution
name:4
name:inc

Sorry for the mess, it turned out I didn't reindex after modifying
the schema, so I thought that the problem also applied to 300letters.

The cause of all of this is the WordDelimiter filter, defined as follows:

<fieldType name="text" class="solr.TextField">
       <analyzer type="index">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <!-- in this example, we will only use synonyms at query time
         <filter class="solr.SynonymFilterFactory" 
synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
         -->
         <!-- Case insensitive stop word removal.
           add enablePositionIncrements=true in both the index and query
           analyzers to leave a 'gap' for more accurate phrase queries.
         -->
         <filter class="solr.StopFilterFactory"
                 ignoreCase="true"
                 words="stopwords.txt"
                 enablePositionIncrements="true"
                 />
         <filter class="solr.WordDelimiterFilterFactory" 
generateWordParts="1" generateNumberParts="1" catenateWords="1" 
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" 
preserveOriginal="1"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.SnowballPorterFilterFactory" 
language="English" protected="protwords.txt"/>
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.SynonymFilterFactory" 
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
         <filter class="solr.StopFilterFactory"
                 ignoreCase="true"
                 words="stopwords.txt"
                 enablePositionIncrements="true"
                 />
         <filter class="solr.WordDelimiterFilterFactory" 
generateWordParts="1" generateNumberParts="1" catenateWords="0" 
catenateNumbers="0" catenateAll="1" splitOnCaseChange="0" 
preserveOriginal="1" />
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.SnowballPorterFilterFactory" 
language="English" protected="protwords.txt"/>
       </analyzer>
     </fieldType>

and I still don't know why it behaves like that - after all, the
"preserveOriginal" attribute is set to 1...

On 28.05.2013 14:21, Erick Erickson wrote:
> Hmmm, with 4.x I get much different behavior than you're
> describing, what version of Solr are you using?
>
> Besides Alex's comments, try adding &debug=query to the url and see what comes
> out from the query parser.
>
> A quick glance at the code shows that DefaultAnalyzer is used, which doesn't do
> any analysis, here's the javadoc...
>   /**
>     * Default analyzer for types that only produces 1 verbatim token...
>     * A maximum size of chars to be read must be specified
>     */
>
> so it's much like the "string" type. Which means I'm totally perplexed by your
> statement that 300 and letters return a hit. Have you perhaps changed the
> field definition and not re-indexed?
>
> The behavior you're seeing really looks like somehow WordDelimiterFilterFactory
> is getting into your analysis chain with settings that don't mash the parts back
> together, i.e. you can set up WDDF to split on letter/number transitions, index
> each and NOT index the original, but I have no explanation for how that
> could happen with the field definition you indicated....
>
> FWIW,
> Erick


-- 
Pozdrawiam,
Michał Matulka
Programista
michal.matulka@gowork.pl

<ma...@gowork.pl>
*GoWork.pl*
ul. Zielna 39
00-108 Warszawa
www.GoWork.pl <http://www.GoWork.pl>

Re: Strange behavior on text field with number-text content

Posted by Erick Erickson <er...@gmail.com>.

Hmmm, with 4.x I get much different behavior than you're
describing, what version of Solr are you using?

Besides Alex's comments, try adding &debug=query to the url and see what comes
out from the query parser.

A quick glance at the code shows that DefaultAnalyzer is used, which doesn't do
any analysis, here's the javadoc...
 /**
   * Default analyzer for types that only produces 1 verbatim token...
   * A maximum size of chars to be read must be specified
   */

so it's much like the "string" type. Which means I'm totally perplexed by your
statement that 300 and letters return a hit. Have you perhaps changed the
field definition and not re-indexed?

The behavior you're seeing really looks like somehow WordDelimiterFilterFactory
is getting into your analysis chain with settings that don't mash the parts back
together, i.e. you can set up WDDF to split on letter/number transitions, index
each and NOT index the original, but I have no explanation for how that
could happen with the field definition you indicated....

FWIW,
Erick
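
To make the scenario described above concrete, here is a hedged sketch of a field type that would be expected to show the reported symptom. The name text_split_example and the exact attribute choices are assumptions for illustration; nothing in this thread says the poster's schema looked like this.

```xml
<!-- Hypothetical field type, for illustration only. -->
<fieldType name="text_split_example" class="solr.TextField">
  <!-- Index side: split on letter/number transitions and do NOT keep
       the original token, so "300letters" should be indexed as
       "300" and "letters" only. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query side: no splitting, so a query for 300letters stays one
       token and would find nothing, while 300 and letters would each
       match. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Pasting "300letters" into both boxes of the admin/analysis page for such a type is a quick way to confirm or rule this out.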

On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch
<ar...@gmail.com> wrote:
>  What does analyzer screen say in the Web AdminUI when you try to do that?
> Also, what are the tokens stored in the field (also in Web AdminUI).
>
> I think it is very strange to have TextField without a tokenizer chain.
> Maybe you get a standard one assigned by default, but I don't know what the
> standard chain would be.
>
> Regards,
>
>   Alex.

Re: Strange behavior on text field with number-text content

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

What does the analysis screen say in the Web Admin UI when you try to do that?
Also, what are the tokens stored in the field (also visible in the Web Admin UI)?

I think it is very strange to have TextField without a tokenizer chain.
Maybe you get a standard one assigned by default, but I don't know what the
standard chain would be.

Regards,

  Alex.
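
One way to avoid relying on whatever default applies to a bare TextField is to declare the analysis chain explicitly. The sketch below is a minimal example; the choice of StandardTokenizerFactory and LowerCaseFilterFactory is an assumption made for illustration, not a recommendation from this thread.

```xml
<!-- Sketch: the same field type and field as in the original post,
     but with an explicit analyzer. A single <analyzer> element
     without a type attribute is used for both indexing and querying. -->
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name" type="text" indexed="true" stored="true"/>
```

Existing documents need to be reindexed after any change to the index-time chain; otherwise the index still holds tokens produced by the old definition.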
On 28 May 2013 04:44, "Michał Matulka" <mi...@gowork.pl> wrote:

> Hello,
>
> I've got following problem. I have a text type in my schema and a field
> "name" of that type.
> That field contains a data, there is, for example, record that has
> "300letters" as name.
>
> Now field type definition:
> <fieldType name="text" class="solr.TextField"></fieldType>
>
> And, of course, field definition:
> <field name="name" type="text" indexed="true" stored="true"/>
>
> yes, that's all - there are no tokenizers.
>
> And now time for my question:
>
> Why following queries:
>
> name:300
>
> and
>
> name:letters
>
> are returning that result, but:
>
> name:300letters
>
> is not (0 results)?
>
> Best regards,
> Michał Matulka
>