Posted to solr-user@lucene.apache.org by "Ali, Saqib" <do...@gmail.com> on 2013/09/10 03:22:04 UTC

find all two word phrases that appear in more than one document

Dear Solr Ninjas,

We would like to run a query that returns the two-word phrases that appear
in more than one document. For example, take the string "Solr Ninja": since
it appears in more than one document in our Solr instance, the query should
return it. The query should find all such phrases across all the documents
in our Solr instance by looking at every pair of adjacent words (each pair
forming a phrase). The pairs should come from the documents in the Solr
index themselves, not from a list we supply.

Any ideas on how to write this query?

Thanks.

Re: find all two word phrases that appear in more than one document

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I believe one of the admin pages (Solr 4+) shows all the terms and
frequencies. You can use that even with the stock example. Try that. If that
makes sense, you can explore further.
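
For readers who want the same term listing over HTTP, here is a minimal
sketch of a TermsComponent request built in Python. It assumes a /terms
request handler is configured (as I believe it is in the stock example), and
the base URL and the field name text_shingles are hypothetical placeholders
for your own core and shingled field. terms.mincount filters on document
frequency, so a value of 2 keeps only terms found in at least two documents.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, min_docs=2):
    """Build a TermsComponent URL listing terms of `field` that occur
    in at least `min_docs` documents."""
    params = {
        "terms": "true",
        "terms.fl": field,           # field whose indexed terms are listed
        "terms.mincount": min_docs,  # minimum document frequency
        "terms.limit": -1,           # negative value: no cap on returned terms
        "terms.sort": "count",       # most frequent terms first
        "wt": "json",
    }
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


# Hypothetical core URL and field name; replace with your own.
print(terms_url("http://localhost:8983/solr/collection1", "text_shingles"))
```

On a large index an unlimited terms.limit can return a very long list, so a
positive limit may be the safer starting point.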

As to other examples, there are a couple of books. I bet Jack's book covers
this.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


Re: find all two word phrases that appear in more than one document

Posted by "Ali, Saqib" <do...@gmail.com>.
Thanks Alexandre. I looked at the wiki page for the TermsComponent. But I
am not sure if I follow. Do you have an example or some better documentation?
Thanks! :)


Re: find all two word phrases that appear in more than one document

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
The "phrases" are usually called n-grams or shingles.

You can probably use ShingleFilterFactory to create your shingles (possibly
with outputUnigrams=false) and then use TermsComponent (
http://wiki.apache.org/solr/TermsComponent) to list the results.
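
To make the idea concrete, here is a small Python sketch of what the
shingle-plus-terms approach computes. It is not Solr code: it splits on
whitespace and lowercases, which is only a stand-in for a real analyzer, and
the function names and sample documents are made up for illustration. In
Solr the equivalent would roughly be a field whose index analyzer ends with
ShingleFilterFactory (minShingleSize=2, maxShingleSize=2,
outputUnigrams=false), queried through TermsComponent with terms.mincount=2.

```python
from collections import Counter


def two_word_shingles(text):
    """Return the set of adjacent two-word combinations in one document."""
    words = text.lower().split()
    return {" ".join(pair) for pair in zip(words, words[1:])}


def shared_shingles(docs, min_docs=2):
    """Map each two-word shingle to its document frequency, keeping only
    shingles that occur in at least `min_docs` documents."""
    doc_freq = Counter()
    for text in docs:
        # A set per document, so repeats inside one document count once.
        doc_freq.update(two_word_shingles(text))
    return {s: n for s, n in doc_freq.items() if n >= min_docs}


# Made-up sample documents.
docs = [
    "Dear Solr Ninja please help",
    "Every Solr Ninja knows shingles",
    "Nothing shared here",
]
print(shared_shingles(docs))  # {'solr ninja': 2}
```

The count here is a document frequency, not a total occurrence count, which
is what "appears in more than one document" calls for.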

Regards,
   Alex.


