Posted to solr-user@lucene.apache.org by Fergus McMenemie <fe...@twig.me.uk> on 2009/09/30 08:21:53 UTC

Number of terms in a SOLR field

Hi all,

I am attempting to test some changes I made to my DIH-based
indexing process. The changes only affect the way I
describe my fields in data-config.xml; there should be no
changes to the way the data is indexed or stored.

As a QA check, I want to compare the results from
indexing the same data before and after the change. I am looking
for a way of getting counts of terms in each field. I
guess Luke etc. must allow this, but how?

Regards Fergus.

Re: Number of terms in a SOLR field

Posted by Fergus McMenemie <fe...@twig.me.uk>.
>Fergus McMenemie wrote:
>>> Fergus McMenemie wrote:
>>>> Hi all,
>>>>
>>>> I am attempting to test some changes I made to my DIH based
>>>> indexing process. The changes only affect the way I 
>>>> describe my fields in data-config.xml, there should be no
>>>> changes to the way the data is indexed or stored.
>>>>
>>>> As a QA check I was wanting to compare the results from
>>>> indexing the same data before/after the change. I was looking
>>>> for a way of getting counts of terms in each field. I 
>>>> guess Luke etc most allow this but how?
>>> Luke uses brute force approach - it traverses all terms, and counts 
>>> terms per field. This is easy to implement yourself - just get 
>>> IndexReader.terms() enumeration and traverse it.
>>>
>> Thanks Andrzej 
>> 
>> This is just a one off QA check. How do I get Luke to display
>> terms and counts?
>
>1. get Luke 0.9.9
>2. open index with Luke
>3. Look at the Overview panel, you will see the list titled "Available 
>fields and term counts per field".
>
>
Thanks,

That got me going, and I felt a little stupid after stumbling
across http://wiki.apache.org/solr/LukeRequestHandler

Regards Fergus
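
For the before/after check itself, the per-field counts reported by the
LukeRequestHandler can be compared mechanically. The following is only a
rough sketch: it assumes the handler's XML response contains a top-level
<lst name="fields"> with one <lst> per field and an <int name="distinct">
term count inside each, and the class and method names are made up for
illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class LukeDistinctCounts {
    /** Extracts field name -> distinct term count from a Luke-style XML response (assumed layout). */
    static Map<String, Integer> parse(String xml) {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList lists = doc.getElementsByTagName("lst");
            for (int i = 0; i < lists.getLength(); i++) {
                Element lst = (Element) lists.item(i);
                // Only the top-level <lst name="fields"> section is of interest.
                if (!"fields".equals(lst.getAttribute("name"))
                        || lst.getParentNode() != doc.getDocumentElement()) {
                    continue;
                }
                // Each child element is one field; look for its "distinct" entry.
                for (Node f = lst.getFirstChild(); f != null; f = f.getNextSibling()) {
                    if (!(f instanceof Element)) continue;
                    Element field = (Element) f;
                    for (Node c = field.getFirstChild(); c != null; c = c.getNextSibling()) {
                        if (c instanceof Element
                                && "distinct".equals(((Element) c).getAttribute("name"))) {
                            counts.put(field.getAttribute("name"),
                                    Integer.parseInt(c.getTextContent().trim()));
                        }
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse Luke response", e);
        }
        return counts;
    }

    /** Returns only the fields whose counts differ, as "before -> after" (null = field absent). */
    static Map<String, String> diff(Map<String, Integer> before, Map<String, Integer> after) {
        Map<String, String> changed = new TreeMap<>();
        Set<String> allFields = new TreeSet<>(before.keySet());
        allFields.addAll(after.keySet());
        for (String field : allFields) {
            Integer b = before.get(field);
            Integer a = after.get(field);
            if (!Objects.equals(b, a)) {
                changed.put(field, b + " -> " + a);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Hypothetical responses saved before and after the data-config.xml change.
        String before = "<response><lst name=\"fields\">"
                + "<lst name=\"title\"><int name=\"distinct\">10</int></lst>"
                + "<lst name=\"body\"><int name=\"distinct\">200</int></lst>"
                + "</lst></response>";
        String after = before.replace(">200<", ">205<");
        System.out.println(diff(parse(before), parse(after)));
    }
}
```

An empty result from diff() would suggest the change left the per-field
term counts untouched; it says nothing about stored values, which would
need a separate check.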

Re: Number of terms in a SOLR field

Posted by Andrzej Bialecki <ab...@getopt.org>.
Fergus McMenemie wrote:
>> Fergus McMenemie wrote:
>>> Hi all,
>>>
>>> I am attempting to test some changes I made to my DIH based
>>> indexing process. The changes only affect the way I 
>>> describe my fields in data-config.xml, there should be no
>>> changes to the way the data is indexed or stored.
>>>
>>> As a QA check I was wanting to compare the results from
>>> indexing the same data before/after the change. I was looking
>>> for a way of getting counts of terms in each field. I 
>>> guess Luke etc most allow this but how?
>> Luke uses brute force approach - it traverses all terms, and counts 
>> terms per field. This is easy to implement yourself - just get 
>> IndexReader.terms() enumeration and traverse it.
>>
> Thanks Andrzej 
> 
> This is just a one off QA check. How do I get Luke to display
> terms and counts?

1. Get Luke 0.9.9.
2. Open the index with Luke.
3. Look at the Overview panel; you will see the list titled "Available
fields and term counts per field".


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Number of terms in a SOLR field

Posted by Fergus McMenemie <fe...@twig.me.uk>.
>Fergus McMenemie wrote:
>> Hi all,
>> 
>> I am attempting to test some changes I made to my DIH based
>> indexing process. The changes only affect the way I 
>> describe my fields in data-config.xml, there should be no
>> changes to the way the data is indexed or stored.
>> 
>> As a QA check I was wanting to compare the results from
>> indexing the same data before/after the change. I was looking
>> for a way of getting counts of terms in each field. I 
>> guess Luke etc most allow this but how?
>
>Luke uses brute force approach - it traverses all terms, and counts 
>terms per field. This is easy to implement yourself - just get 
>IndexReader.terms() enumeration and traverse it.
>
Thanks Andrzej 

This is just a one-off QA check. How do I get Luke to display
terms and counts?

>
>-- 
>Best regards,
>Andrzej Bialecki 

Fergus.  
-- 

Re: Number of terms in a SOLR field

Posted by Andrzej Bialecki <ab...@getopt.org>.
Fergus McMenemie wrote:
> Hi all,
> 
> I am attempting to test some changes I made to my DIH based
> indexing process. The changes only affect the way I 
> describe my fields in data-config.xml, there should be no
> changes to the way the data is indexed or stored.
> 
> As a QA check I was wanting to compare the results from
> indexing the same data before/after the change. I was looking
> for a way of getting counts of terms in each field. I 
> guess Luke etc most allow this but how?

Luke uses a brute-force approach: it traverses all terms and counts
terms per field. This is easy to implement yourself: just get the
IndexReader.terms() enumeration and traverse it.
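
The counting step can be sketched roughly as follows. To keep the example
free of library dependencies, a plain Iterator of field names stands in
for the term enumeration (one entry per term); against a real index with
the Lucene API of this era, the loop would instead call next() on the
TermEnum returned by IndexReader.terms() and read term().field() for each
entry. The class and method names here are made up for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

class TermsPerField {
    /**
     * Counts how many terms belong to each field. The iterator is a stand-in
     * for the term enumeration: it yields the field name of each term, one
     * entry per distinct (field, text) pair.
     */
    static Map<String, Integer> countPerField(Iterator<String> termFields) {
        Map<String, Integer> counts = new TreeMap<>();
        while (termFields.hasNext()) {
            // One more term seen for this field.
            counts.merge(termFields.next(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical enumeration: three terms in "body", one in "title".
        Iterator<String> fields =
                Arrays.asList("body", "body", "body", "title").iterator();
        System.out.println(countPerField(fields)); // prints {body=3, title=1}
    }
}
```

This is a single pass over every term in the index, so it is fine for a
one-off check but slow on a very large index.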

