Posted to solr-user@lucene.apache.org by karthik <km...@gmail.com> on 2012/04/02 21:41:50 UTC

viewing the terms indexed for a specific document

Hi,

I am trying to view what terms are getting indexed for a specific field in
a specific document. How can I view this information?

I tried the luke handler & it's not showing me what I am looking for. I am
using Solr 3.1.0.

I am using index-time synonym expansion & saw that one of my synonyms was
not working. In general synonyms are working since there are many other
cases where they are working. So to debug this issue I wanted to see if the
synonym for the word is stored within the field for a given document inside
the index. Luke showed me the actual string from the document but not the
synonym.

I tested luke on a different document which gets returned while using a
synonym and I don't see the synonym term in the field "<str name="value">"
or "<str name="internal">" of the luke handler.

Any pointers on how to view the actual indexed term would be helpful.

Thanks,
Karthik

Re: viewing the terms indexed for a specific document

Posted by kmohanas <km...@gmail.com>.
I wiped out my entire index & tried indexing only 2-3 docs along with the
problematic document. I have figured out the issue now. It was caused by the
presence of these 2 char filters:

-------
<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="[,\.'?!@#%^*()_~\[\]{}\\+=`;:]" replacement=""/>
<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="[-/|]" replacement=" "/>
-------

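The synonym had some of these special characters in it. Char filters run
before the tokenizer & the SynonymFilter, so those characters were already
stripped or replaced by the time the synonym lookup happened, & the synonym
was not picked up while indexing.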
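
A rough way to see this effect outside Solr is to apply the same two patterns
in a few lines of Python. This is only a sketch: the sample text and the
synonym entry "wi-fi" are made up for illustration, and the code approximates
just the two PatternReplaceCharFilterFactory steps plus whitespace
tokenization, not the full analysis chain.

```python
import re

# The two index-time patterns, copied from the char filters quoted above.
STRIP_PATTERN = re.compile(r"[,\.'?!@#%^*()_~\[\]{}\\+=`;:]")
TO_SPACE_PATTERN = re.compile(r"[-/|]")


def apply_char_filters(text):
    """Approximate the two PatternReplaceCharFilterFactory steps."""
    text = STRIP_PATTERN.sub("", text)      # first filter: delete the characters
    return TO_SPACE_PATTERN.sub(" ", text)  # second filter: replace with a space


def tokens_after_char_filters(text):
    """Split on whitespace, roughly what WhitespaceTokenizerFactory does."""
    return apply_char_filters(text).split()


# Hypothetical synonym entry and document text, not taken from the real index.
synonym_entry = "wi-fi"
document_text = "Free wi-fi in the lobby"

tokens = tokens_after_char_filters(document_text)
print(tokens)
# The hyphenated entry never appears as a single token, so a synonym rule
# keyed on it has nothing to match against.
print(synonym_entry in tokens)
```

If the entry in synonyms.txt is written the way the text looks after the char
filters have run, the tokens line up again.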

I saw that the other docs had synonyms applied to them, so I started digging
into why this doc alone didn't have its synonyms picked up & narrowed it down
to this issue.

--
View this message in context: http://lucene.472066.n3.nabble.com/viewing-the-terms-indexed-for-a-specific-document-tp3878783p3882343.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: viewing the terms indexed for a specific document

Posted by kmohanas <km...@gmail.com>.
I tried adding &explainOther=<id> and I didn't see any proper explanation
getting returned.

I am using index-time synonyms. I got Luke 3.5.0, opened the index & pulled
up the document in question, but I still don't see the synonym present
as part of the field. I tried to pull up another document which gets returned
due to a synonym (a different synonym, though) and even for that document I
don't see the synonym term in the field.

My schema.xml definition for the field & its field type are:

-------
<field name="name" type="textEnglish" indexed="true" stored="true"/>

<fieldType name="textEnglish" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[,\.'?!@#%^*()_~\[\]{}\\+=`;:]" replacement=""/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[-/|]" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[,\.'!@#%^()_~\[\]{}\\+=`;:]" replacement=""/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[-/|]" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

-------

--
View this message in context: http://lucene.472066.n3.nabble.com/viewing-the-terms-indexed-for-a-specific-document-tp3878783p3881826.html

Re: viewing the terms indexed for a specific document

Posted by Erick Erickson <er...@gmail.com>.
If you add &explainOther=<some id>, see:
http://wiki.apache.org/solr/SolrRelevancyFAQ

you might get some hints. You can use the TermsComponent
to see if the synonyms are getting in the index, but you'll
have to have a very restricted input set (like one doc) for that
to be helpful for a specific document.
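
As a sketch of what such a TermsComponent request could look like, the
snippet below builds the URL in Python. It assumes a /terms handler is
registered in solrconfig.xml (as in the example config), a Solr instance at
localhost:8983, and the field "name" from this thread; the prefix "wi" is a
made-up value.

```python
from urllib.parse import urlencode

# Assumption: default example host and port; adjust for the real deployment.
SOLR_BASE = "http://localhost:8983/solr"


def terms_url(field, prefix, limit=20):
    """Build a TermsComponent request listing indexed terms with a prefix."""
    params = {
        "terms": "true",
        "terms.fl": field,        # field whose indexed terms are listed
        "terms.prefix": prefix,   # only terms starting with this prefix
        "terms.limit": limit,     # maximum number of terms returned
    }
    return f"{SOLR_BASE}/terms?{urlencode(params)}"


# Terms are stored in their analyzed form (lowercased, stemmed, and so on),
# so the prefix should be given the way the term looks after analysis.
print(terms_url("name", "wi"))
```

The response lists terms with their document counts across the whole index,
which is why a very small index makes the output easier to read for one
document.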

Ahhh, try getting the stand-alone Luke program, it allows
a lower-level exploration of the index, see:
http://code.google.com/p/luke/
The LukeRequestHandler is based on Luke, but Luke
itself is more flexible.

When are you putting synonyms in? Index time? Query time?
Both? Showing your schema.xml fragment for the field
in question would help diagnose the problem, as would
showing the results of attaching &debugQuery=on to the
URL.
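
For reference, a request combining debugQuery and explainOther might be built
like this. It is a sketch only: the host, the query "name:wifi" and the
document id "12345" are hypothetical, and explainOther is given as a query
that selects the document to explain.

```python
from urllib.parse import urlencode

# Assumption: default example host and port; adjust for the real deployment.
SOLR_BASE = "http://localhost:8983/solr"


def debug_url(query, other_doc_query):
    """Build a /select URL that returns debug output for the main query and
    an explanation for the document(s) matched by other_doc_query."""
    params = {
        "q": query,
        "debugQuery": "on",               # include parsed query and explain info
        "explainOther": other_doc_query,  # explain this doc relative to q
    }
    return f"{SOLR_BASE}/select?{urlencode(params)}"


# Hypothetical values: search for the synonym term and ask why the document
# with id 12345 does or does not match.
print(debug_url("name:wifi", "id:12345"))
```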

Best
Erick

On Mon, Apr 2, 2012 at 4:26 PM, karthik <km...@gmail.com> wrote:
> A few more details to this thread -
>
> When I try the analysis tab from the admin console I see that the synonym
> is kicking in & it's matching the text in the document that I am expecting
> to see as part of the results. However the actual search is not returning
> that document.
>
> Also I used the TermsComponent and tried to see how many docs match the
> synonym term & I don't see the term at all.
>
> So I am not sure how to check if this is working or not.
>
> Thanks,
> Karthik

Re: viewing the terms indexed for a specific document

Posted by karthik <km...@gmail.com>.
A few more details to this thread -

When I try the analysis tab from the admin console I see that the synonym
is kicking in & it's matching the text in the document that I am expecting
to see as part of the results. However the actual search is not returning
that document.
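
The check done in the analysis tab can usually also be requested over HTTP.
The sketch below assumes the FieldAnalysisRequestHandler is registered at
/analysis/field (as in the example solrconfig.xml) and uses a made-up host,
field value and query. Note that analysis output reflects the current schema
and synonyms.txt, not what was written to the index when the document was
last indexed, so a match here does not prove the term is in the index.

```python
from urllib.parse import urlencode

# Assumption: default example host and port; adjust for the real deployment.
SOLR_BASE = "http://localhost:8983/solr"


def field_analysis_url(field_name, field_value, query=None):
    """Build a field analysis request for one field and one sample value."""
    params = {
        "analysis.fieldname": field_name,    # field whose analyzers are applied
        "analysis.fieldvalue": field_value,  # text run through the index analyzer
    }
    if query is not None:
        params["analysis.query"] = query       # text run through the query analyzer
        params["analysis.showmatch"] = "true"  # flag index tokens matching the query
    return f"{SOLR_BASE}/analysis/field?{urlencode(params)}"


# Hypothetical sample text and query term.
print(field_analysis_url("name", "Free wi-fi", query="wifi"))
```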

Also I used the TermsComponent and tried to see how many docs match the
synonym term & I don't see the term at all.

So I am not sure how to check if this is working or not.

Thanks,
Karthik
