Posted to solr-user@lucene.apache.org by Frank A <fs...@gmail.com> on 2011/06/12 16:20:04 UTC

Finding Keywords/Phrases

I have a single copyField destination field that has a number of other
fields copied to it. I'm trying to "extract" a list of keywords and common
terms.  I realize it may not be 100% dynamic and that I may need to filter
manually.  So far I have tried using a CommonGrams filter.  However, what I
see is that it creates tokens for "hot" and "dog" as well as for "hot dog".
Is there any way, from within the Solr configuration, to treat the
frequency of "hot" as "hot when not followed by dog"?
For example, right now I may see term frequencies of:

hot      8
dog      6
hot dog  6

What I really want is:

hot dog  6
hot      2

Any ideas?
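
For reference, the adjustment described above can also be expressed as a
post-processing step outside Solr. The following is only a minimal sketch,
assuming the term/frequency list has already been exported from the index
into a plain dictionary; the function name adjust_counts is hypothetical and
not part of Solr. It subtracts each phrase count from the counts of the
words that make up the phrase and drops words that never occur on their own.

```python
from collections import Counter


def adjust_counts(term_counts):
    """Subtract each multi-word phrase count from its component words.

    term_counts maps a term (a single word or a space-separated phrase)
    to its frequency, e.g. as exported from the index (assumption).
    """
    adjusted = Counter(term_counts)
    for term, count in term_counts.items():
        words = term.split()
        if len(words) < 2:
            continue  # single words are only ever reduced, never a source
        for word in words:
            if word in adjusted:
                # Clamp at zero so overlapping phrases cannot go negative.
                adjusted[word] = max(adjusted[word] - count, 0)
    # Keep only terms that still occur on their own.
    return {term: count for term, count in adjusted.items() if count > 0}


raw = {"hot": 8, "dog": 6, "hot dog": 6}
result = adjust_counts(raw)
for term, count in sorted(result.items(), key=lambda kv: (-kv[1], kv[0])):
    print(term, count)
# prints:
# hot dog 6
# hot 2
```

One caveat with this approach: when phrases overlap (for example "hot dog"
and "dog stand" in the same text), the shared word is reduced once per
phrase, so the adjusted single-word counts are approximate.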

Re: Finding Keywords/Phrases

Posted by Adam Estrada <es...@gmail.com>.
Hi Frank,

I have been working on something very similar, and at this point I don't
believe (though I could be totally wrong) that a pure Solr solution is
going to do this. I would look at Mahout and try some of the machine
learning algorithms that it can run against a Lucene index. I have not
gotten any further than experimenting with it yet, but so far it looks
promising.

Adam
