Posted to solr-user@lucene.apache.org by Saman Rasheed <sa...@hotmail.com> on 2017/04/26 17:14:51 UTC

counting_number_of_term_in_a_doc

Hi, I've been trying, with no luck, to figure out how to return the number of matching words in a regex term lookup.


Basically, I have a large text document indexed, and I run a regex term lookup like the following:


http://localhost:8983/solr/core1/terms?terms.fl=content&terms.regex=.*term.*&terms.limit=10000


That successfully returns all the words (up to terms.limit) that exactly match, start with, end with, or contain the word 'term'; see below:


<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6</int>
</lst>
<lst name="terms">
<lst name="content">
<int name="buttermilk">1</int>
<int name="determine">1</int>
<int name="determined">1</int>
<int name="determines">1</int>
<int name="exterminated">1</int>
<int name="indeterminable">1</int>
<int name="indeterminate">1</int>
<int name="intermediate">1</int>
<int name="intermitting">1</int>
<int name="intermixed">1</int>
<int name="term">1</int>
<int name="terminated">1</int>
<int name="terminating">1</int>
<int name="terminus">1</int>
<int name="terms">1</int>
<int name="watermelon">1</int>
</lst>
</lst>
</response>


What I need is the syntax to produce, e.g., how many times the word 'min' or 'term' occurs in that document, either as a term by itself or as part of another term.


At the moment it only tells me that each term occurs in 1 document, which may be useful later on.
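
To make the distinction concrete, here is a minimal Python sketch that parses a Terms Component response like the one above. The sample is an abbreviated copy of that response, and the helper name doc_freqs is made up for illustration. It only shows that the integers are per-term document counts: you can count the distinct matching terms from them, but not the occurrences inside the document.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Terms Component response shown above.
SAMPLE_RESPONSE = """<response>
<lst name="terms">
<lst name="content">
<int name="buttermilk">1</int>
<int name="term">1</int>
<int name="terminus">1</int>
</lst>
</lst>
</response>"""


def doc_freqs(xml_text, field):
    """Return {term: document count} for one field of a Terms response.

    doc_freqs is a hypothetical helper name, not part of Solr.
    """
    root = ET.fromstring(xml_text)
    path = "./lst[@name='terms']/lst[@name='%s']/int" % field
    return {node.get("name"): int(node.text) for node in root.findall(path)}


freqs = doc_freqs(SAMPLE_RESPONSE, "content")
# Number of distinct terms that matched the regex.
print(len(freqs), "distinct matching terms")
# Each value is the number of documents containing the term,
# so summing them does not give occurrences within one document.
print(freqs)
```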


I've been looking at the cwiki page: https://cwiki.apache.org/confluence/display/solr/The+Terms+Component


and other articles on the net with no luck.


Can you please help?


Many thanks.


Re: counting_number_of_term_in_a_doc

Posted by "alessandro.benedetti" <a....@sease.io>.
I think the closest you can get out of the box is the Term Vector Component [1].

Cheers

[1]
https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
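
As a rough sketch of how that could be used (not tested against a live Solr): request term frequencies with tv=true and tv.tf=true, then sum the tf values of the terms that match the pattern on the client side. Everything specific here is an assumption: the /tvrh handler name comes from the sample configs, the document id doc1 and the sample response body are invented for illustration, the exact response layout may differ between Solr versions, and the content field would need termVectors="true" in the schema. Python's re module stands in for the regex matching.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical request: handler name, core name and document id are assumptions.
params = {"q": "id:doc1", "tv": "true", "tv.tf": "true",
          "tv.fl": "content", "wt": "xml"}
url = "http://localhost:8983/solr/core1/tvrh?" + urlencode(params)

# Invented sample of what a term vector response might look like.
SAMPLE_TV = """<response>
<lst name="termVectors">
<lst name="doc1">
<str name="uniqueKey">doc1</str>
<lst name="content">
<lst name="determine"><int name="tf">2</int></lst>
<lst name="term"><int name="tf">3</int></lst>
<lst name="watermelon"><int name="tf">1</int></lst>
<lst name="zebra"><int name="tf">4</int></lst>
</lst>
</lst>
</lst>
</response>"""


def total_occurrences(xml_text, field, pattern):
    """Sum tf over all terms of `field` whose text fully matches `pattern`."""
    regex = re.compile(pattern)
    root = ET.fromstring(xml_text)
    path = "./lst[@name='termVectors']/lst/lst[@name='%s']/lst" % field
    total = 0
    for term in root.findall(path):
        if regex.fullmatch(term.get("name")):
            tf = term.find("int[@name='tf']")
            if tf is not None:
                total += int(tf.text)
    return total


print(url)
print(total_occurrences(SAMPLE_TV, "content", ".*term.*"))
```

Note that tf is counted per indexed term, so a token that contains 'term' twice would still contribute only once per occurrence of that token.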



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io