Posted to solr-user@lucene.apache.org by smanad <sm...@gmail.com> on 2013/08/07 03:22:24 UTC

entity classification solr

I have the following situation when using Solr 4.3.
My documents contain "entities", for example "peanut butter". I have a list
of such entities. These are items that go together and are not to be treated
as two individual words. During indexing, I want Solr to recognize this and
treat "peanut butter" as a single entity. For example, if someone searches for

"peanut"

then documents that have the word "peanut" should rank higher than documents
that have the phrase "peanut butter". However, if someone searches for

"peanut butter"

then documents that have "peanut butter" should show up higher than ones
that have just "peanut". Is there a config setting somewhere that lets me
specify the entity list in a file, so that Solr handles the rest?

Should I be using KeepWordFilterFactory for this? 

Any pointers will be much appreciated.
Thanks, 
-Manasi



--
View this message in context: http://lucene.472066.n3.nabble.com/entity-classification-solr-tp4082923.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: entity classification solr

Posted by manju16832003 <ma...@gmail.com>.
Can you provide a sample of the document structure with the entities? What does
the document look like?
As far as I can tell, you do not need to apply any filters. If your
entities are searchable, include them in the full-text or keyword search.
Are your entities part of the document, and are they multivalued?

Do you want to keep the word combination 'peanut butter' while indexing?

Thanks






RE: entity classification solr

Posted by Markus Jelsma <ma...@openindex.io>.
Yes, you can copyField the source's contents to another field and use the KeepWordFilterFactory to keep only the words you really care about. Using (e)dismax, you can then apply a heavy boost to that field. Documents containing those special words will then rank higher when the words are queried.
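
To make the query side of this suggestion concrete, here is a minimal Python sketch of the request parameters for an edismax query that boosts the copied field. The field names `text` and `entities`, the boost of 10, and the helper name `build_edismax_params` are assumptions for illustration only; they are not from this thread. The schema side (the copyField and the keep-words file) is not shown.

```python
from urllib.parse import urlencode


def build_edismax_params(user_query, main_field="text",
                         entity_field="entities", entity_boost=10.0):
    """Build Solr request parameters for an edismax query.

    The field names and the boost value are hypothetical; substitute the
    names from your own schema. `qf` lists the fields to search, and the
    `^` suffix applies a boost to matches in that field.
    """
    return {
        "q": user_query,
        "defType": "edismax",
        "qf": f"{main_field} {entity_field}^{entity_boost:g}",
        "wt": "json",
    }


# Quote the phrase so it is passed to the query parser as one unit.
params = build_edismax_params('"peanut butter"')

# URL-encoded form, ready to append to a select handler URL.
query_string = urlencode(params)
print(query_string)
```

The boost value is something to tune against your own data. Note also that a keep-words filter matches individual tokens, so whether a two-word entity such as "peanut butter" survives as one token depends on how the copied field is tokenized.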
 