
Posted to solr-user@lucene.apache.org by fiedzia <fi...@gmail.com> on 2010/07/16 14:59:00 UTC

documents with known relevancy

I want to know whether what I am trying to achieve is doable using Solr.

I have some objects that have tags assigned. A tag is a string with a weight
attached, so a whole document that I want to index can look like this:
{
  id: 123,
  tags: {
          tag1: 0.01,
          tag2: 0.3,
          ...
          tagN: some_weight
          }
}
Now I want to store the list of tags and sort returned results by tag weight.
The list of tags can be large (up to thousands per document, though mostly
much less).
So when I query Solr for documents containing tag1, it should return all
documents containing it, sorted by the weight of this tag. Is there any way
to do that?
-- 
View this message in context: http://lucene.472066.n3.nabble.com/documents-with-known-relevancy-tp972462p972462.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: documents with known relevancy

Posted by Dennis Gearon <ge...@sbcglobal.net>.

Looks to me like a sort of way to get to 'categories', if one were interested in doing that, shudder.


Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Fri, 7/16/10, Peter Karich <pe...@yahoo.de> wrote:

> From: Peter Karich <pe...@yahoo.de>
> Subject: Re: documents with known relevancy
> To: solr-user@lucene.apache.org
> Date: Friday, July 16, 2010, 12:25 PM

Re: documents with known relevancy

Posted by Peter Karich <pe...@yahoo.de>.

I didn't look at payloads, as mentioned by Jonathan, but another
solution could be (similar to Dennis'):

create a field 'tags' and then add tag1 to it several times,
depending on the weight.
E.g. add it 10 times if the weight is 1.0,
but only 2 times if the weight is 0.2, etc.

Of course this limits the weight to 11 levels (0, 0.1, 0.2, ... and 1),
but it should work :-)
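A minimal Python sketch of the repetition idea, assuming weights lie in [0, 1] and are rounded to the nearest 0.1 as described above. The function name repeated_tag_text is hypothetical; it only builds the text that would be indexed into the single 'tags' field.

```python
def repeated_tag_text(tag_weights, levels=10):
    """Return the text for a single 'tags' field in which each tag is
    repeated in proportion to its weight (hypothetical helper).

    Weights are assumed to be in [0, 1]; with levels=10 they are rounded
    to the nearest 0.1, giving the 11 possible weights mentioned above.
    """
    tokens = []
    for tag, weight in tag_weights.items():
        clamped = min(max(weight, 0.0), 1.0)  # keep weights inside [0, 1]
        count = round(clamped * levels)       # nearest step, e.g. 0.2 -> 2
        tokens.extend([tag] * count)
    return " ".join(tokens)


# prints tag1 ten times, followed by tag2 twice
print(repeated_tag_text({"tag1": 1.0, "tag2": 0.2}))
```

Two caveats worth checking before relying on this: a weight that rounds to 0 drops the tag entirely, so the document no longer matches it; and Lucene's default similarity grows with the square root of the term frequency and also applies field-length normalization, so the score rises with the repetition count but is neither proportional to the weight nor independent of how many other tags the document has (omitting norms on the field removes the second effect).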

Regards,
Peter.



-- 
http://karussell.wordpress.com/

Re: documents with known relevancy

Posted by fiedzia <fi...@gmail.com>.

I came up with another idea, which seems to do what I want. Any comments
about better solutions or improving efficiency are welcome:

for each document, create a multivalued text field "tags" with all tags,
and one dynamic field per tag containing its weight, so we have:
{
  id: 123,
  tags: [tag1, tag2, ..., tagN],
  tag1_float: 0.1,
  tag2_float: 0.2,
  ...
  tagN_float: 0.3,
}

Then a query for tag1 and tag2 could look like this:
tags:tag1 AND tags:tag2
and the results would be sorted by the sum of tag1_float and tag2_float.
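The dynamic-field idea can be sketched in Python as below. It assumes a `*_float` dynamicField is declared in schema.xml, that tag names consist of characters valid in field names, and that the Solr version accepts a function query in the sort parameter (I believe that arrived in Solr 3.1, after this thread; on 1.4 the sum would have to be folded into the score instead). The helper names are hypothetical.

```python
from urllib.parse import urlencode


def to_solr_doc(doc_id, tag_weights):
    """Flatten {tag: weight} into a multivalued 'tags' field plus one
    '<tag>_float' dynamic field per tag (assumes a '*_float' dynamicField)."""
    doc = {"id": doc_id, "tags": sorted(tag_weights)}
    for tag, weight in tag_weights.items():
        doc[tag + "_float"] = weight
    return doc


def build_query_params(tags):
    """Require every tag and sort by the sum of the per-document weights."""
    q = " AND ".join("tags:" + t for t in tags)
    fields = [t + "_float" for t in tags]
    if len(fields) == 1:
        sort = fields[0] + " desc"  # single tag: plain field sort
    else:
        sort = "sum(" + ",".join(fields) + ") desc"  # function-query sort
    return {"q": q, "sort": sort}


print(to_solr_doc(123, {"tag1": 0.1, "tag2": 0.2}))
print(urlencode(build_query_params(["tag1", "tag2"])))
```

Because the query requires every tag, each matching document has all of the summed fields, so missing values do not come into play.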


RE: documents with known relevancy

Posted by fiedzia <fi...@gmail.com>.


Jonathan Rochkind wrote:
> 
> I've never used it, but I think this is the use case that the Solr feature
> to use Lucene 'payloads' is meant for?  
> http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
> 
This is it, thanks for this link.


RE: documents with known relevancy

Posted by Jonathan Rochkind <ro...@jhu.edu>.

> Exactly. The weight is the weight of a given tag for a specific document,
> not the weight of the field as in weighted search. So one document may have
> tag1 with a weight of 0.1, and another may have the same tag1 with a weight
> of 0.8.

I've never used it, but I think this is the use case that the Solr feature to use Lucene 'payloads' is meant for?  
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
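For the payload route, each tag's weight is stored with its token at index time. A common setup (assumed here, not given in the thread) is a whitespace tokenizer followed by solr.DelimitedPayloadTokenFilterFactory with encoder="float", which expects input of the form tag|weight. A small hypothetical Python helper to produce that field value:

```python
def payload_field_value(tag_weights, delimiter="|"):
    """Render {tag: weight} as 'tag|weight' tokens for a field analyzed with
    a whitespace tokenizer and a delimited-payload filter (assumed schema)."""
    for tag in tag_weights:
        # a tag containing the delimiter or whitespace would be split wrongly
        if delimiter in tag or any(c.isspace() for c in tag):
            raise ValueError(f"tag {tag!r} contains the delimiter or whitespace")
    return " ".join(f"{tag}{delimiter}{weight}" for tag, weight in tag_weights.items())


print(payload_field_value({"tag1": 0.01, "tag2": 0.3}))  # tag1|0.01 tag2|0.3
```

This only covers indexing. In the Solr versions current when this thread was written, using the payload in scoring needed custom query-side code; later releases added a payload() function query and a payload_score query parser.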

Re: documents with known relevancy

Posted by fiedzia <fi...@gmail.com>.


Dennis Gearon wrote:
> 
> So does this mean that each document has a different weight for the same
> tag?
> 

Exactly. The weight is the weight of a given tag for a specific document,
not the weight of the field as in weighted search. So one document may have
tag1 with a weight of 0.1, and another may have the same tag1 with a weight
of 0.8.

Re: documents with known relevancy

Posted by Dennis Gearon <ge...@sbcglobal.net>.

So does this mean that each document has a different weight for the same tag?
Dennis Gearon



--- On Fri, 7/16/10, fiedzia <fi...@gmail.com> wrote:

> From: fiedzia <fi...@gmail.com>
> Subject: Re: documents with known relevancy
> To: solr-user@lucene.apache.org
> Date: Friday, July 16, 2010, 8:06 AM

Re: documents with known relevancy

Posted by fiedzia <fi...@gmail.com>.

Peter Karich wrote:
> 
> Hi,
> 
> Why do you need the weight for the tags?
> 

The only reason to include weights is to sort results by them.
So if there are multiple documents containing a given tag,
I want them to be sorted by weight. I would also like to be able
to search by multiple tags at once (so if there were a field "tags" with
all tags, then documents with the highest sum of their weights should come
first. Sum is just an example here; if Solr can offer something similar or
more advanced, that's fine).

Peter Karich wrote:
> 
> you could index it this way:
> 
> {
>  id:     123
>  tag:    'tag1'
>  weight:  0.01
>  uniqueKey: combine(id, tag)
> }
> 
> {
>  id:     123
>  tag:    'tag2'
>  weight:  0.3
>  uniqueKey: combine(id, tag)
> }
> 
> and specify the query-time boost with the help of the weight.
> Retrieving the document content in a second request to another solrindex
> or using a db.
> 

Well, that would work for querying a single tag. Do you know a solution
for querying multiple tags?

Perhaps I can explain the problem better by presenting the obvious solution:
create a multivalued field "tags" with all tags. This makes it easy to ask
Solr for documents matching a query (which may look like this:
tags:tag1 AND tags:tag2). Then get the list of all results, retrieve the tag
weights from the database and sort the results by weight. This is obviously
inefficient, as it requires getting all documents from Solr (possibly a
large list), then fetching them again from the DB, then calculating the
weights and sorting. So I am trying to get Solr to do this processing.
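For reference, the obvious solution reduces to the following client-side ranking. This Python sketch is hypothetical: weights_by_doc stands in for the database lookup and matching_ids for the full Solr result list. It makes concrete the ordering Solr should ideally produce itself, and also why it is expensive: every hit has to be fetched before the first page can be shown.

```python
def rank_by_weight_sum(matching_ids, weights_by_doc, query_tags):
    """Sort Solr's matching ids by the sum of their weights for query_tags.

    weights_by_doc is a hypothetical stand-in for the database:
    {doc_id: {tag: weight}}. Missing weights count as 0.
    """
    def total(doc_id):
        weights = weights_by_doc.get(doc_id, {})
        return sum(weights.get(tag, 0.0) for tag in query_tags)

    # Highest total first; sorted() is stable, so ties keep Solr's order.
    return sorted(matching_ids, key=total, reverse=True)


weights = {
    123: {"tag1": 0.1, "tag2": 0.2},
    456: {"tag1": 0.8, "tag2": 0.05},
    789: {"tag1": 0.25, "tag2": 0.0},
}
print(rank_by_weight_sum([123, 456, 789], weights, ["tag1", "tag2"]))  # [456, 123, 789]
```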

Another solution I think could work (though I haven't examined it fully
yet) would be to create a single text field for tags, with the number of
occurrences of each tag matching its weight (so if tag2's weight is twice
as big as tag1's, the text contains tag1 once and tag2 twice: "tag1 tag2
tag2"), and then calculate the document score based on the number of
occurrences of the given tag in the text. From what I know about Solr this
could be done, but maybe there is a better solution.

Peter Karich wrote:
> 
> there could be a different solution using dynamic fields and index-time
> boosts but I am not sure at the moment.	
> 

Can you write more about it? Any idea is welcome.

Thanks for your help anyway.

Re: documents with known relevancy

Posted by Peter Karich <pe...@yahoo.de>.

Hi,

Why do you need the weight for the tags?

You could index it this way:

{
 id:     123
 tag:    'tag1'
 weight:  0.01
 uniqueKey: combine(id, tag)
}

{
 id:     123
 tag:    'tag2'
 weight:  0.3
 uniqueKey: combine(id, tag)
}

and specify the query-time boost with the help of the weight, retrieving
the document content in a second request to another Solr index or from a DB.
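A Python sketch of the one-entry-per-(document, tag) layout. The '<id>_<tag>' key is a hypothetical stand-in for combine(id, tag), and instead of a query-time boost this sketch simply sorts on the stored weight (sort=weight desc in Solr terms), which covers the single-tag case:

```python
def explode_by_tag(doc_id, tag_weights):
    """One index entry per (document, tag) pair; the '<id>_<tag>' uniqueKey
    is a hypothetical stand-in for combine(id, tag)."""
    return [
        {"uniqueKey": f"{doc_id}_{tag}", "id": doc_id, "tag": tag, "weight": weight}
        for tag, weight in tag_weights.items()
    ]


def rank_for_tag(entries, tag):
    """Mimic 'tag:<tag>' sorted by weight desc: parent ids, heaviest first.
    The document content would then be fetched in a second request."""
    hits = [e for e in entries if e["tag"] == tag]
    hits.sort(key=lambda e: e["weight"], reverse=True)
    return [e["id"] for e in hits]


entries = explode_by_tag(123, {"tag1": 0.01, "tag2": 0.3}) + explode_by_tag(456, {"tag1": 0.8})
print(rank_for_tag(entries, "tag1"))  # [456, 123]
```

As fiedzia points out in his reply, this does not directly handle an AND across several tags, since each entry carries only one tag.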

There could be a different solution using dynamic fields and index-time boosts, but I am not sure at the moment.

Regards,
Peter.


Re: documents with known relevancy

Posted by fiedzia <fi...@gmail.com>.

Dennis Gearon wrote:
> 
> Seems to me that you are doing externally to Solr what you could be doing
> internally. If you had ONE field as <tags> and weighted those in your SOLR
> query, that is how I am guessing it is usually done.
> 

I guess I used a confusing term for weight. The weight (the value assigned
to a given tag) is document specific and may be different for each
document; it is not the weight of a field as in weighted search.


Re: documents with known relevancy

Posted by Dennis Gearon <ge...@sbcglobal.net>.

Seems to me that you are doing externally to Solr what you could be doing internally. You could have ONE field as <tags> and weight it in your Solr query; I am guessing that is how it is usually done.
Dennis Gearon



--- On Fri, 7/16/10, fiedzia <fi...@gmail.com> wrote:

> From: fiedzia <fi...@gmail.com>
> Subject: documents with known relevancy
> To: solr-user@lucene.apache.org
> Date: Friday, July 16, 2010, 5:59 AM