Posted to solr-user@lucene.apache.org by Nishant Chandra <ni...@gmail.com> on 2012/03/23 18:44:55 UTC

Tags and Folksonomies

Suppose I have content which has title and description. Users can tag content
and search content based on tag, title and description. Tag has more
weightage.

Any inputs on how indexing and retrieval will work given there is content
and tags using Solr? Has anyone implemented search based on collaborative
tagging?

Thanks,
Nishant

Re: Tags and Folksonomies

Posted by Richard Noble <ra...@gmail.com>.
Hi

I have not actually done this yet, but will need to do something similar.
We will also be using user tagging, and ratings to influence relevancy for
the searches.

I take it that you want something like this: if a document has been tagged
8 times with the tag "tagvalue" but only 4 times with the tag "othervalue",
then you want to boost the tag "tagvalue" higher?

The route I plan to go down would be to store the tag value count against
the document, and use a (possibly custom) function to boost accordingly.

Just a theory at this point, and I am sure that there may be better ways.
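
A minimal sketch of the count-based boost described above, in Python for illustration only. The `boosted_score` helper, the `doc_tags` mapping, and the logarithmic damping are all assumptions made for this example; none of them is a Solr API. In Solr the same idea might be expressed as a function over a stored count field.

```python
import math

def boosted_score(base_score, tag_counts, tag):
    """Scale a relevance score by how often `tag` was applied to the document."""
    count = tag_counts.get(tag, 0)
    # log1p damps large counts, so 8 applications beat 4 but not by a factor of 2.
    return base_score * (1.0 + math.log1p(count))

# Hypothetical per-document tag counts, matching the 8-vs-4 example above.
doc_tags = {"tagvalue": 8, "othervalue": 4}
score_tagvalue = boosted_score(1.0, doc_tags, "tagvalue")
score_othervalue = boosted_score(1.0, doc_tags, "othervalue")
```

An untagged document keeps its base score unchanged, because log1p(0) is 0.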

Hope it helps

Richard





-- 
*nix has users, Mac has fans, Windows has victims.

Re: Tags and Folksonomies

Posted by Ravish Bhagdev <ra...@gmail.com>.
OK, yes that's true.  Although I'd expect term vectors to just increment
the term count when a tag is re-applied (if you have term vectors enabled),
increasing a boost stored as a payload with each tag, each time an existing
tag is re-applied, may be a more sensible approach if this is the case.
You'll still have to rewrite the whole record for this though, as it's not
possible to 'update' a specific field value in Solr, for efficiency reasons.

Rav

On Tue, Apr 3, 2012 at 4:50 PM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : I am not sure why you suggest Payload for ranking documents with more
> : frequent tags above those with fewer tags.  Won't the term frequency part
> : of relevancy score ensure this by default?  If you make tags a 'lowercase'
>
> Sorry, yes ... absolutely - if you use omitNorms=false on the tags
> field, and add these two docs...
>
>  { id: doc1; tags: [house, house, house, boat] }
>  { id: doc2; tags: [house, boat, car, vegas] }
>
> ...then doc1 will score higher on a query for "tags:house".
>
> my suggestion to use payloads was because sending the same value many many
> times (ie: if 100,000 users apply the tag "house" you would need to index
> that doc with the word "house" repeated 100,000 times) can be prohibitive.
>
>
> -Hoss
>

Re: Tags and Folksonomies

Posted by Chris Hostetter <ho...@fucit.org>.
: I am not sure why you suggest Payload for ranking documents with more
: frequent tags above those with fewer tags.  Won't the term frequency part of
: relevancy score ensure this by default?  If you make tags a 'lowercase'

Sorry, yes ... absolutely - if you use omitNorms=false on the tags
field, and add these two docs...

  { id: doc1; tags: [house, house, house, boat] }
  { id: doc2; tags: [house, boat, car, vegas] }

...then doc1 will score higher on a query for "tags:house".

my suggestion to use payloads was because sending the same value many many 
times (ie: if 100,000 users apply the tag "house" you would need to index 
that doc with the word "house" repeated 100,000 times) can be prohibitive.
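
To make the doc1/doc2 comparison concrete, here is a rough sketch of the term-frequency and length-norm factors in the style of Lucene's classic TF-IDF similarity (tf as the square root of the frequency, length norm as one over the square root of the field length). It is a simplification written for this example: idf, query norm, and the lossy encoding of norms are left out, and `classic_tf_score` is a made-up helper, not a Lucene or Solr call.

```python
import math

def classic_tf_score(tokens, term):
    """Return tf * lengthNorm for one term in one field (idf omitted)."""
    freq = tokens.count(term)
    if freq == 0:
        return 0.0
    tf = math.sqrt(freq)                         # grows with repeated tags
    length_norm = 1.0 / math.sqrt(len(tokens))   # penalises longer fields
    return tf * length_norm

doc1 = ["house", "house", "house", "boat"]
doc2 = ["house", "boat", "car", "vegas"]
s1 = classic_tf_score(doc1, "house")
s2 = classic_tf_score(doc2, "house")
```

Both fields have four tokens, so the length norm is the same and the difference comes only from the repeated tag. The cost mentioned above also shows up here: representing 100,000 applications of "house" this way means a 100,000-token field value.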


-Hoss

Re: Tags and Folksonomies

Posted by Ravish Bhagdev <ra...@gmail.com>.
Hi Hoss,

I am not sure why you suggest Payload for ranking documents with more
frequent tags above those with fewer tags.  Won't the term frequency part of
the relevancy score ensure this by default?  If you make tags a 'lowercase'
field (with full value tokenisation), shouldn't the frequency of tags in a
multivalued field improve the score for doc A in the scenario below?

Payloads, I thought, would be more useful when you want some tags in a
record to be weighted more than others?  Or have I missed some point, maybe.

Thanks,
Rav

On Tue, Apr 3, 2012 at 1:02 AM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : Suppose I have content which has title and description. Users can tag
> : content and search content based on tag, title and description. Tag has
> : more weightage.
> :
> : Any inputs on how indexing and retrieval will work given there is content
> : and tags using Solr? Has anyone implemented search based on collaborative
> : tagging?
>
> simple stuff would be to have your 3 fields, and search them with a
> weighted boosting -- giving more importance to the tag field.
>
> where things get more complicated is when you want docA to score
> higher for the query "boat" than docB because 100 users have tagged docA
> with "boat", but only 5 users have tagged docB with "boat"
>
> The canonical way to deal with this would be using payloads to boost the
> weight of a term -- the DelimitedPayloadTokenFilterFactory can help with
> this at index time, but off the top of my head i don't think any of the
> existing Solr QParsers will build the necessary PayloadTermQuery, so you
> might have to roll your own -- there are a few Jira issues with patches
> that you might be able to re-use or get inspired from...
>
> https://issues.apache.org/jira/browse/SOLR-1485
>
>
>
>
> -Hoss
>

Re: Tags and Folksonomies

Posted by Chris Hostetter <ho...@fucit.org>.
: Suppose I have content which has title and description. Users can tag content
: and search content based on tag, title and description. Tag has more
: weightage.
: 
: Any inputs on how indexing and retrieval will work given there is content
: and tags using Solr? Has anyone implemented search based on collaborative
: tagging?

simple stuff would be to have your 3 fields, and search them with a 
weighted boosting -- giving more importance to the tag field.
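
As a hedged illustration of the weighted-boost approach, the sketch below builds request parameters for Solr's extended dismax parser, using `qf` to weight the fields. The field names (`tags`, `title`, `description`) and the boost values are assumptions for this example and would need to match the actual schema and be tuned against real queries.

```python
from urllib.parse import urlencode, parse_qs

def build_query_params(user_query):
    """Build parameters for a search that weights tags above title and description."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Illustrative boosts only: tags count most, then title, then description.
        "qf": "tags^5 title^2 description",
    }

params = build_query_params("boat")
query_string = urlencode(params)        # suitable for appending to a /select URL
round_trip = parse_qs(query_string)     # decode again to check the encoding
```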

where things get more complicated is when you want docA to score
higher for the query "boat" than docB because 100 users have tagged docA
with "boat", but only 5 users have tagged docB with "boat"

The canonical way to deal with this would be using payloads to boost the
weight of a term -- the DelimitedPayloadTokenFilterFactory can help with
this at index time, but off the top of my head i don't think any of the
existing Solr QParsers will build the necessary PayloadTermQuery, so you
might have to roll your own -- there are a few Jira issues with patches
that you might be able to re-use or get inspired from...

https://issues.apache.org/jira/browse/SOLR-1485
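
For the index-time half of the payload approach, a small sketch of how the field value might be prepared on the client side. It assumes the tags field is analysed with a whitespace tokenizer followed by DelimitedPayloadTokenFilterFactory using the "|" delimiter and a float encoder; `to_payload_tokens` is a hypothetical helper, and tags containing spaces or the delimiter would need escaping that is not handled here.

```python
def to_payload_tokens(tag_counts, delimiter="|"):
    """Render {tag: count} as 'tag|count' tokens, one token per distinct tag."""
    return " ".join(
        f"{tag}{delimiter}{float(count)}"
        for tag, count in sorted(tag_counts.items())
    )

# 100,000 applications of "house" become one token with a payload,
# instead of the word repeated 100,000 times.
field_value = to_payload_tokens({"house": 100000, "boat": 5})
# field_value == "boat|5.0 house|100000.0"
```

This only covers indexing; as noted above, query-time scoring on the payload still needs a parser that builds a payload-aware query.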




-Hoss