Posted to solr-user@lucene.apache.org by Nitin Solanki <ni...@gmail.com> on 2015/01/23 08:56:29 UTC

Count total frequency of a word in a SOLR index

I indexed some text files in Solr as-is, applying
*StandardTokenizerFactory* and *ShingleFilterFactory* to the text_file field.

*Configuration of Schema.xml:*
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="text_file" type="textSpell" indexed="true" stored="true" required="true" multiValued="false"/>

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>

*Stored documents look like:*
[{"id":"1", "text_file": "text": "text of document"}, {"id":"2",
"text_file": "text": "text of document"} and so on ]

*Problem*: If I search for a word in a Solr index, I get a count of the
documents that contain the word, but if the word occurs several times in a
document, that document still counts only once. I need each returned
document to be counted by the number of times the searched word occurs in
the field. *Example*: I see a "numFound" value of 12, but the word "what"
occurs 20 times in total across those 12 documents. Could you help me find
where I'm going wrong, please?
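
To make the distinction concrete, here is a small Python sketch (toy data, not Solr itself) that contrasts the number of matching documents, which is what "numFound" reports, with the total number of occurrences of a word. The sample documents and the whitespace tokenization are assumptions for illustration only.

```python
# Toy illustration: document count vs. total occurrences of a word.
# Lowercasing and splitting on whitespace is a simplification; it is not
# what StandardTokenizerFactory does.

def doc_count(docs, word):
    """Number of documents that contain the word at least once."""
    return sum(1 for text in docs if word in text.lower().split())

def total_occurrences(docs, word):
    """Number of times the word occurs across all documents."""
    return sum(text.lower().split().count(word) for text in docs)

docs = [
    "what is this and what is that",
    "tell me what you want",
    "no match here",
]

print(doc_count(docs, "what"))          # 2
print(total_occurrences(docs, "what"))  # 3
```

The first number corresponds to a document count, the second to a total term frequency.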

Re: Count total frequency of a word in a SOLR index

Posted by Nitin Solanki <ni...@gmail.com>.
OK. Is there any way to use a user-defined field instead of word and freq
in the suggestion block?

On Fri, Jan 23, 2015 at 2:33 PM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> I don't think it's implemented.
> I'd suggest sending a first request to the TermsComponent, which yields
> terms by prefix; a second request can then gather the totaltermfreq values.

Re: Count total frequency of a word in a SOLR index

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
I don't think it's implemented.
I'd suggest sending a first request to the TermsComponent, which yields
terms by prefix; a second request can then gather the totaltermfreq values.
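
A rough sketch of that two-step approach in Python, building the request URLs only (nothing is sent). The /terms and /select handler paths, the collection1 core URL, and the ttf_N aliases are assumptions for illustration; terms.fl, terms.prefix, and terms.limit are TermsComponent parameters, and totaltermfreq() is the function query mentioned earlier in the thread.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/collection1"  # assumed core URL

def terms_url(field, prefix, limit=10):
    """Step 1: ask the TermsComponent for indexed terms starting with prefix."""
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.prefix": prefix,
        "terms.limit": limit,
        "wt": "json",
    }
    return BASE + "/terms?" + urlencode(params)

def ttf_url(field, terms):
    """Step 2: request totaltermfreq() for each term as an aliased pseudo-field."""
    # Terms are single-quoted so that multi-word shingles stay one argument.
    # Terms containing a single quote are not handled in this sketch.
    fl = ",".join(
        "ttf_%d:totaltermfreq(%s,'%s')" % (i, field, term)
        for i, term in enumerate(terms)
    )
    params = {"q": "*:*", "rows": 1, "fl": fl, "wt": "json"}
    return BASE + "/select?" + urlencode(params)

print(terms_url("gram", "th"))
print(ttf_url("gram", ["the", "then"]))
```

The terms returned by the first request would be passed as the list argument of the second; since totaltermfreq() is an index-wide value, a single returned row is enough to read it.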

On Fri, Jan 23, 2015 at 11:51 AM, Nitin Solanki <ni...@gmail.com>
wrote:

> Thanks Mikhail Khludnev..
> I tried this:
> http://localhost:8983/solr/collection1/spell?q=gram:%22the%22&rows=1&fl=totaltermfreq(gram,the)
> and it worked.
> I want to know more. Can we do the same thing (totaltermfreq) on
> suggestions? I tried "th" and got "the" as a suggestion. I want to
> retrieve the term frequency, not the document frequency, even in the
> suggestions. Can I do that?
>
> *Example of a suggestion:*
> <lst>
> <str name="word">the</str>
> <int name="freq">897</int>
> </lst>
> Here freq is the document frequency. I need the term frequency.

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

Re: Count total frequency of a word in a SOLR index

Posted by Nitin Solanki <ni...@gmail.com>.
Thanks Mikhail Khludnev..
I tried this:
http://localhost:8983/solr/collection1/spell?q=gram:%22the%22&rows=1&fl=totaltermfreq(gram,the)
and it worked.
I want to know more. Can we do the same thing (totaltermfreq) on
suggestions? I tried "th" and got "the" as a suggestion. I want to retrieve
the term frequency, not the document frequency, even in the suggestions.
Can I do that?

*Example of a suggestion:*
<lst>
<str name="word">the</str>
<int name="freq">897</int>
</lst>
Here freq is the document frequency. I need the term frequency.



On Fri, Jan 23, 2015 at 1:53 PM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> https://cwiki.apache.org/confluence/display/solr/Function+Queries
> totaltermfreq()
>
> Or do you need to sum the term frequency over the docs in the result set?

Re: Count total frequency of a word in a SOLR index

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
https://cwiki.apache.org/confluence/display/solr/Function+Queries
totaltermfreq()

Or do you need to sum the term frequency over the docs in the result set?
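
If the goal is the second case, a sum over the result set only, one option is to request the per-document termfreq() function as a pseudo-field and add the values up on the client. The sketch below works on a hand-written sample response; the tf alias, the ids, and the counts are made up for illustration, and a real request would need rows large enough to cover every match.

```python
import json

# Hypothetical JSON response for a request such as
#   q=text_file:what&fl=id,tf:termfreq(text_file,'what')&wt=json
sample_response = json.loads("""
{
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {"id": "1", "tf": 2},
      {"id": "2", "tf": 1},
      {"id": "3", "tf": 4}
    ]
  }
}
""")

def sum_term_freq(response, key="tf"):
    """Add up the per-document term frequencies returned in the docs list."""
    return sum(doc.get(key, 0) for doc in response["response"]["docs"])

print(sample_response["response"]["numFound"])  # 3
print(sum_term_freq(sample_response))           # 7
```

By contrast, totaltermfreq() returns the count for the whole index regardless of which documents matched the query.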


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>