
Posted to solr-user@lucene.apache.org by Solr user <um...@yahoo.co.in> on 2010/01/28 11:25:08 UTC

index of facet fields are not same as original string

Hi,

I am new to Solr. I found that facet field values do not match the original
string in the record. For example,

the returned XML is:

<doc>
  <str name="g_number">G-EUPE</str>
</doc>
<lst name="facet_counts">
  <lst name="facet_queries" />
  <lst name="facet_fields">
    <lst name="g_number">
      <int name="gupe">1</int>
    </lst>
  </lst>
  <lst name="facet_dates" />
</lst>

Here, "G-EUPE" appears under the facet field as 'gupe': it is lowercased and
the '-' from the original string is missing. Is there any way to fix this so
that the facet value matches the original text in the record? Thanks in advance.

Regards,
uma
-- 
View this message in context: http://old.nabble.com/index-of-facet-fields-are-not-same-as-original-string-tp27353838p27353838.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: index of facet fields are not same as original string

Posted by Joe Calderon <ca...@gmail.com>.

Facets are based on the indexed version of your string, not the stored
version. You probably have an analyzer that's removing punctuation.
Most people index the same field multiple ways for different purposes:
matching, sorting, faceting, etc.

Index a copy of your field as a string type and facet on that.
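
A minimal sketch of that suggestion, as a schema.xml fragment. The field name
g_number_facet and the "text" type on the original field are assumptions made
for illustration (the thread does not say which type g_number started with);
"string" is the solr.StrField type from the example schema, which indexes the
value verbatim.

```xml
<!-- Hypothetical schema.xml fragment; g_number_facet is an assumed name. -->

<!-- Analyzed field, kept for searching (type "text" is an assumption). -->
<field name="g_number" type="text" indexed="true" stored="true"/>

<!-- Unanalyzed copy: indexed exactly as sent, so facet values keep case and '-'. -->
<field name="g_number_facet" type="string" indexed="true" stored="false"/>

<!-- Copy the raw input of g_number into the facet field at index time. -->
<copyField source="g_number" dest="g_number_facet"/>
```

With something like this in place, faceting would use
facet.field=g_number_facet while searches keep using g_number. Existing
documents need to be reindexed before the new field has any values.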

On Thu, Jan 28, 2010 at 3:12 AM, Sergey Pavlikovskiy
<pa...@gmail.com> wrote:
> Hi,
>
> probably, it's because of stemming
> if you need unstemmed text you can use 'textgen' data type for the field
>
> Sergey

Re: index of facet fields are not same as original string

Posted by Solr user <um...@yahoo.co.in>.

Hi Lance,

  I created a new fieldType whose analyzer uses solr.KeywordTokenizerFactory,
and it worked for me. Thanks for all your help.
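
The exact definition was not posted, so the following is only a sketch of what
such a field type might look like. The type name "exact_text" is a
hypothetical name; solr.KeywordTokenizerFactory emits the whole input as a
single token, so with no further filters the value is indexed unchanged.

```xml
<!-- Hypothetical fieldType; the name exact_text is an assumption. -->
<fieldType name="exact_text" class="solr.TextField">
  <analyzer>
    <!-- Emits the entire field value as one token: "G-EUPE" stays "G-EUPE". -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Point the field at the new type, then rebuild the index. -->
<field name="g_number" type="exact_text" indexed="true" stored="true"/>
```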

Regards,
Uma



-- 
View this message in context: http://old.nabble.com/index-of-facet-fields-are-not-same-as-original-string-tp27353838p27392314.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: index of facet fields are not same as original string

Posted by Lance Norskog <go...@gmail.com>.

After you change the schema.xml file, you have to rebuild the index
completely. At that point, g_number fields should not be stemmed.
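
One way to rebuild, sketched as XML update messages. This assumes the source
data can be fed again and that the messages are posted one at a time to the
update handler (typically /solr/update; the path is an assumption about the
setup). The sample document is illustrative only; a real schema will usually
also require its uniqueKey field.

```xml
<!-- Message 1: delete every document (destructive). -->
<delete><query>*:*</query></delete>

<!-- Message 2: commit so the deletion takes effect. -->
<commit/>

<!-- Message 3: re-add the documents so they pass through the new analyzer. -->
<add>
  <doc>
    <field name="g_number">G-EUPE</field>
  </doc>
</add>

<!-- Message 4: commit the re-added documents. -->
<commit/>
```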

You can examine what these text field types do.

http://wiki.apache.org/solr/FAQ#My_search_returns_too_many_.2BAC8_too_little_.2BAC8_unexpected_results.2C_how_to_debug.3F

http://www.lucidimagination.com/search/document/CDRG_ch05_5.9?q=analysis.jsp




-- 
Lance Norskog
goksron@gmail.com

Re: index of facet fields are not same as original string

Posted by Solr user <um...@yahoo.co.in>.

Hi Sergey,

In schema.xml, I have the following by default:

    <!-- A general unstemmed text field - good if one does not know the language of the field -->
    <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>


and I added the following entry to the schema.xml file:

<field name="g_number" type="textgen" indexed="true" stored="true"/>

But it didn't help. The text is still not in its original format. Correct me if
I am wrong.

Regards,
Uma


-- 
View this message in context: http://old.nabble.com/index-of-facet-fields-are-not-same-as-original-string-tp27353838p27364887.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: index of facet fields are not same as original string

Posted by Sergey Pavlikovskiy <pa...@gmail.com>.

Hi,

It's probably because of stemming.
If you need unstemmed text, you can use the 'textgen' field type for the field.

Sergey
