Posted to solr-user@lucene.apache.org by Saman Rasheed <sa...@hotmail.com> on 2017/04/26 17:14:51 UTC
counting_number_of_term_in_a_doc
Hi, I've been trying, with no luck, to figure out how to return the number of matching words in a regex term lookup.
Basically, I have a large text document indexed, and I run a regex term lookup like the following:
http://localhost:8983/solr/core1/terms?terms.fl=content&terms.regex=.*term.*&terms.limit=10000
That successfully returns all the words (up to 10000) that either match 'term' exactly or start with, end with, or contain it; see below:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6</int>
</lst>
<lst name="terms">
<lst name="content">
<int name="buttermilk">1</int>
<int name="determine">1</int>
<int name="determined">1</int>
<int name="determines">1</int>
<int name="exterminated">1</int>
<int name="indeterminable">1</int>
<int name="indeterminate">1</int>
<int name="intermediate">1</int>
<int name="intermitting">1</int>
<int name="intermixed">1</int>
<int name="term">1</int>
<int name="terminated">1</int>
<int name="terminating">1</int>
<int name="terminus">1</int>
<int name="terms">1</int>
<int name="watermelon">1</int>
</lst>
</lst>
</response>
What I need is the syntax to return how many times a word such as 'min' or 'term' occurs in that document, either as a term by itself or as part of another term.
At the moment the response only tells me that each term occurs in 1 document, which may be useful later on.
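To make the point about the '1' values concrete, here is a minimal Python sketch that parses a Terms component response of the shape shown above. The SAMPLE string is an abbreviated copy of that response, and the helper name doc_frequencies is hypothetical. It reports how many distinct terms matched and the per-term document frequency, which is not the same as the number of occurrences inside the document.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Terms component response from this post.
SAMPLE = """<response>
<lst name="terms">
<lst name="content">
<int name="buttermilk">1</int>
<int name="determine">1</int>
<int name="term">1</int>
</lst>
</lst>
</response>"""


def doc_frequencies(xml_text, field):
    """Return {term: document frequency} for one field of a Terms response.

    Hypothetical helper: the values are the counts Solr reports per term,
    i.e. the number of documents containing the term.
    """
    root = ET.fromstring(xml_text)
    freqs = {}
    for terms in root.findall("lst[@name='terms']"):
        for fld in terms.findall("lst"):
            if fld.get("name") != field:
                continue
            for entry in fld.findall("int"):
                freqs[entry.get("name")] = int(entry.text)
    return freqs


freqs = doc_frequencies(SAMPLE, "content")
print(len(freqs))  # number of distinct matching terms in the sample
print(freqs)       # per-term document frequency, not in-document counts
```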
I've been looking at the cwiki page https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
and other articles on the net, with no luck.
Can you please help?
Many thanks.
Re: counting_number_of_term_in_a_doc
Posted by "alessandro.benedetti" <a....@sease.io>.
I think the closest you can get out of the box is the Term Vector Component [1].
Cheers
[1] https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
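As a rough illustration of how the Term Vector Component could answer the original question, the Python sketch below builds a request with tv=true, tv.tf=true and tv.fl, then sums the per-document term frequencies of every term that contains a given substring. Several things here are assumptions and not taken from the thread: a handler with the component enabled is registered at /tvrh, the content field is indexed with termVectors="true", the document is selected with a hypothetical query id:doc1, and the SAMPLE response is hand-written to show the assumed XML shape, so it should be checked against what your Solr version actually returns.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: a handler with the Term Vector Component is available at /tvrh.
BASE = "http://localhost:8983/solr/core1/tvrh"


def term_vector_url(doc_query, field):
    """Build a request URL asking for term frequencies of one field."""
    params = {"q": doc_query, "tv": "true", "tv.tf": "true",
              "tv.fl": field, "wt": "xml"}
    return BASE + "?" + urlencode(params)


# Hand-written sample of the assumed response shape (not real Solr output).
SAMPLE = """<response>
<lst name="termVectors">
<lst name="doc1">
<str name="uniqueKey">doc1</str>
<lst name="content">
<lst name="determine"><int name="tf">2</int></lst>
<lst name="term"><int name="tf">3</int></lst>
<lst name="watermelon"><int name="tf">1</int></lst>
<lst name="window"><int name="tf">4</int></lst>
</lst>
</lst>
</lst>
</response>"""


def total_occurrences(xml_text, field, pattern):
    """Sum tf over all terms of `field` whose text matches `pattern`."""
    regex = re.compile(pattern)
    root = ET.fromstring(xml_text)
    total = 0
    for doc in root.findall("lst[@name='termVectors']/lst"):
        for fld in doc.findall("lst"):
            if fld.get("name") != field:
                continue
            for term in fld.findall("lst"):
                tf = term.find("int[@name='tf']")
                if tf is not None and regex.search(term.get("name")):
                    total += int(tf.text)
    return total


url = term_vector_url("id:doc1", "content")
print(url)
# determine (2) + term (3) + watermelon (1) all contain "term"
print(total_occurrences(SAMPLE, "content", "term"))  # prints 6
```

Summing tf on the client side counts each indexed token that contains the substring once per occurrence, which matches "as a term by itself or as part of another term"; it does not count a substring that appears twice inside a single token.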
-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io