Posted to solr-user@lucene.apache.org by James T <co...@gmail.com> on 2009/07/20 16:35:56 UTC

Implementing related tags

Hi,

I have a specific search requirement and am looking for some help from
the community on how to achieve it using Solr:

I need to index 1 million+ documents. Each document contains (among other
fields) 3 fields representing the categories that the doc belongs to. For
example (a very simplified case to make it easier to explain):

Doc 1
  Place : NY, Paris, Tokyo
  Authors: AuthorA, AuthorB, AuthorC, AuthorD
  Tags: tagA, tagB, ballon

Doc 2
  Place : Bangkok
  Authors: AuthorD
  Tags: tagZ

So each doc can contain multiple values for each of the above fields (place,
author, tags).

Now the search requirement is that, by constraining on one of the
values, I need to search across the related values in those fields.

Example: Given the constraint "Author: AuthorD", I need to search within the
search space:
    Place: NY, Paris, Tokyo and Bangkok
    Author: AuthorA, AuthorB, AuthorC
    Tags: tagA, tagB, ballon and tagZ
(The above result is generated by the fact that every item in the result
has at least 1 doc in common with "AuthorD".)

So as I am typing "Ba", I need to get Ballon and Bangkok (these Tags and
Places have at least 1 doc that also has AuthorD).

Is such a system possible to implement using Solr?

Thanks!
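
One way to read the requirement above is as a faceting request: filter on the constraint, then ask for the values of each category field that begin with the typed prefix. The sketch below only builds the query string. The field names `place`, `authors` and `tags` and the helper name are assumptions for illustration, since the thread does not show the real schema. Prefix matching on a raw (unanalyzed) field is case-sensitive, so the typed text may need to be normalized to match how the values were indexed.

```python
from urllib.parse import urlencode

# Hypothetical field names; the thread does not show the real schema.
CATEGORY_FIELDS = ["place", "authors", "tags"]

def related_values_params(constraint_field, constraint_value, typed_prefix):
    """Build Solr query parameters that facet every category field,
    restricted to documents matching the constraint."""
    params = [
        ("q", "*:*"),                                        # match all docs...
        ("fq", f'{constraint_field}:"{constraint_value}"'),  # ...then filter
        ("rows", "0"),                                       # only facets are needed
        ("facet", "true"),
        ("facet.mincount", "1"),
        ("facet.prefix", typed_prefix),
    ]
    # One facet.field parameter per category field.
    params += [("facet.field", field) for field in CATEGORY_FIELDS]
    return urlencode(params)

query_string = related_values_params("authors", "AuthorD", "Ba")
print("/select?" + query_string)
```

Each facet field in the response would then list only the values that share at least one document with the constraint, which is the "search space" described in the question.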

Re: Implementing related tags

Posted by Bill Au <bi...@gmail.com>.
Faceting on tags will give you all the related tags, including the original
tag (tagA in your case).  You will have to filter out the original tag on
the client side if you don't want to show it.  With Solr 1.4, you will be
able to use local params to exclude the original tag from the results.  If your
tags field is analyzed, you will want to facet on a raw copy (using
copyField) of the tags.

If you want related tags that start with "ba", you can use facet.prefix:

q=tags:tagA&facet=true&facet.field=tags&facet.mincount=1&facet.prefix=ba
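
As a rough sketch of the request above, the snippet below builds the query string with Python's standard library and shows the client-side filtering step. The function names and the sample counts are made up for illustration; only the Solr parameter names come from this message. `rows=0` is an added assumption, on the basis that only the facet counts are wanted.

```python
from urllib.parse import urlencode

def related_tags_query(tag, prefix):
    """Query string for tags co-occurring with `tag` that start with `prefix`."""
    return urlencode({
        "q": f"tags:{tag}",
        "rows": 0,               # assumption: documents themselves are not needed
        "facet": "true",
        "facet.field": "tags",
        "facet.mincount": 1,
        "facet.prefix": prefix,
    })

def drop_original(tag_counts, original_tag):
    """Client-side filter: remove the constraining tag from the facet counts."""
    return {tag: count for tag, count in tag_counts.items() if tag != original_tag}

print(related_tags_query("tagA", "ba"))

# Hypothetical counts, as they might look when no facet.prefix is applied.
print(drop_original({"tagA": 2, "tagB": 1, "ball": 1, "bat": 1}, "tagA"))
```

With a prefix such as "ba" the original tag would normally not match anyway, so the client-side filter mostly matters when the prefix is empty or happens to match the constraining tag.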

Bill

On Mon, Jul 20, 2009 at 2:40 PM, Avlesh Singh <av...@gmail.com> wrote:

> If I understood your problem correctly, faceting on the "tags" field is what
> you need. Try this -
> http://localhost:8983/solr/memoir/select?fq=tag:tagA&q=%28tags%3Aba*%29&facet=true&facet.field=tags&facet.mincount=1
>
> Notice the usage of facet parameters. Locate the "facet_counts" section in
> your response. If this is what you were looking for, then
> http://wiki.apache.org/solr/SimpleFacetParameters might be a good read.
>
> Cheers
> Avlesh

Re: Implementing related tags

Posted by Avlesh Singh <av...@gmail.com>.
If I understood your problem correctly, faceting on the "tags" field is what you
need. Try this -
http://localhost:8983/solr/memoir/select?fq=tag:tagA&q=%28tags%3Aba*%29&facet=true&facet.field=tags&facet.mincount=1

Notice the usage of facet parameters. Locate the "facet_counts" section in
your response. If this is what you were looking for, then
http://wiki.apache.org/solr/SimpleFacetParameters might be a good read.
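
To make the "facet_counts" section more concrete, here is a minimal sketch of reading it on the client. It assumes the request was made with `wt=json` and that the facet values arrive in the flat alternating layout (value, count, value, count, ...); the sample response is hand-written for illustration and is not real server output.

```python
import json

def facet_pairs(response_text, field):
    """Return [(value, count), ...] for one facet field of a Solr JSON response."""
    response = json.loads(response_text)
    flat = response["facet_counts"]["facet_fields"][field]
    # The flat list alternates value, count, value, count, ...
    return list(zip(flat[0::2], flat[1::2]))

# Hand-written sample shaped like a wt=json response; not real server output.
sample = json.dumps({
    "response": {"numFound": 2, "start": 0, "docs": []},
    "facet_counts": {
        "facet_queries": {},
        "facet_fields": {"tags": ["ball", 1, "bat", 1]},
    },
})

print(facet_pairs(sample, "tags"))  # prints [('ball', 1), ('bat', 1)]
```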

Cheers
Avlesh

On Mon, Jul 20, 2009 at 11:37 PM, James T
<co...@gmail.com> wrote:

> That does not seem to work. To simplify the issue further, assume
> there is a multi-valued tag field and the number of docs is 1 million. By
> constraining on a given tag, I need to search on the related tags.
>
> So
> Doc 1:
>   tags: tagA, tagB, tagC, ball
> Doc 2:
>   tags: tagA, bat
>
> Now, constraining on "tagA" and searching for "ba*", I need something like
> http://localhost:8983/solr/memoir/select?fq=tag:tagA&q=(tags%3Aba*) and just
> return the related tags (not the docs where that tag is present).
>
> "tagA" may be present in 20K docs (of 1 million docs), but "tagA" might have
> a total of 100 other related tags (i.e. those 100 tags appeared with "tagA"
> in at least 1 doc). So the search space (by constraining on "tagA") is
> 100 and not 1 million.
>
> Hope that helps in explaining the issue better.
>
> Thanks!

Re: Implementing related tags

Posted by James T <co...@gmail.com>.
That does not seem to work. To simplify the issue further, assume
there is a multi-valued tag field and the number of docs is 1 million. By
constraining on a given tag, I need to search on the related tags.

So
Doc 1:
   tags: tagA, tagB, tagC, ball
Doc 2:
   tags: tagA, bat

Now, constraining on "tagA" and searching for "ba*", I need something like
http://localhost:8983/solr/memoir/select?fq=tag:tagA&q=(tags%3Aba*) and just
return the related tags (not the docs where that tag is present).

"tagA" may be present in 20K docs (of 1 million docs), but "tagA" might have
a total of 100 other related tags (i.e. those 100 tags appeared with "tagA"
in at least 1 doc). So the search space (by constraining on "tagA") is
100 and not 1 million.
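
To pin down the intended semantics independently of Solr, the small in-memory model below computes the related tags for the two example documents. It is only a sketch of the desired behavior, not of how Solr computes it, and the function name is made up for illustration.

```python
from collections import Counter

# The two example documents from this message.
docs = [
    {"tags": ["tagA", "tagB", "tagC", "ball"]},
    {"tags": ["tagA", "bat"]},
]

def related_tags(docs, constraint, prefix=""):
    """Count tags that share at least one document with `constraint`,
    keeping only those that start with `prefix`."""
    counts = Counter()
    for doc in docs:
        if constraint in doc["tags"]:
            # set() guards against a tag being listed twice in one doc.
            for tag in set(doc["tags"]):
                if tag != constraint and tag.startswith(prefix):
                    counts[tag] += 1
    return dict(counts)

print(sorted(related_tags(docs, "tagA", "ba")))  # prints ['ball', 'bat']
```

The result is a list of tags with document counts rather than a list of documents, which is the shape of answer being asked for here.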

Hope that helps in explaining the issue better.

Thanks!


On Mon, Jul 20, 2009 at 9:51 PM, Avlesh Singh <av...@gmail.com> wrote:

> Have a look at the MoreLikeThis component -
> http://wiki.apache.org/solr/MoreLikeThis
>
> Cheers
> Avlesh

Re: Implementing related tags

Posted by Avlesh Singh <av...@gmail.com>.
Have a look at the MoreLikeThis component -
http://wiki.apache.org/solr/MoreLikeThis

Cheers
Avlesh
