You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Maryam <mk...@yahoo.com> on 2007/03/22 19:24:48 UTC

How can I index Phrases in Lucene?

Hi, 

I know how to index terms in lucene, now I wanna see
how can I index phrases like "information retreival"
in lucene and calculate the number of times that
phrase has appeared in the document. Is there any way
to do it in Lucene?

Thanks


 
____________________________________________________________________________________
It's here! Your new message!  
Get new email alerts with the free Yahoo! Toolbar.
http://tools.search.yahoo.com/toolbar/features/mail/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How can I index Phrases in Lucene?

Posted by karl wettin <ka...@gmail.com>.
23 mar 2007 kl. 04.25 skrev Ryan McKinley:

> Is there any way to find frequent phrases without knowing what you are
> looking for?

I think you are looking for association rules. Try searching for  
Levelwise-Scan.

Weka contains GPLed Java code.
Cite seer is your best friend for whitepapers. http:// 
citeseer.ist.psu.edu/cs


-- 
karl


>
> I could index "A B C D E" as "A B C", "B C D", "C D E" etc, but that
> seems kind of clunky particularly if the phrase length is large.  Is
> there any position offset magic that will surface frequent phrases
> automatically?
>
> thanks
> ryan
>
>
> On 3/22/07, Erick Erickson <er...@gmail.com> wrote:
>> Well, you don't index phrases, it's done for you. You should try
>> something like the following....
>>
>> Create a SpanNearQuery with your terms. Specify an appropriate
>> slop (probably 0 assuming you want them all next to each other).
>>
>> Now use call getSpans and count <G>... You may have to do
>> something with overlapping spans, but you'll need to experiment
>> a bit to understand it.
>>
>> Erick
>>
>> On 3/22/07, Maryam <mk...@yahoo.com> wrote:
>> >
>> > Hi,
>> >
>> > I know how to index terms in lucene, now I wanna see
>> > how can I index phrases like "information retreival"
>> > in lucene and calculate the number of times that
>> > phrase has appeared in the document. Is there any way
>> > to do it in Lucene?
>> >
>> > Thanks
>> >
>> >
>> >
>> >
>> >  
>> _____________________________________________________________________ 
>> _______________
>> > It's here! Your new message!
>> > Get new email alerts with the free Yahoo! Toolbar.
>> > http://tools.search.yahoo.com/toolbar/features/mail/
>> >
>> >  
>> ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>> >
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How can I index Phrases in Lucene?

Posted by Ryan McKinley <ry...@gmail.com>.
Is there any way to find frequent phrases without knowing what you are
looking for?

I could index "A B C D E" as "A B C", "B C D", "C D E" etc, but that
seems kind of clunky particularly if the phrase length is large.  Is
there any position offset magic that will surface frequent phrases
automatically?

thanks
ryan


On 3/22/07, Erick Erickson <er...@gmail.com> wrote:
> Well, you don't index phrases, it's done for you. You should try
> something like the following....
>
> Create a SpanNearQuery with your terms. Specify an appropriate
> slop (probably 0 assuming you want them all next to each other).
>
> Now use call getSpans and count <G>... You may have to do
> something with overlapping spans, but you'll need to experiment
> a bit to understand it.
>
> Erick
>
> On 3/22/07, Maryam <mk...@yahoo.com> wrote:
> >
> > Hi,
> >
> > I know how to index terms in lucene, now I wanna see
> > how can I index phrases like "information retreival"
> > in lucene and calculate the number of times that
> > phrase has appeared in the document. Is there any way
> > to do it in Lucene?
> >
> > Thanks
> >
> >
> >
> >
> > ____________________________________________________________________________________
> > It's here! Your new message!
> > Get new email alerts with the free Yahoo! Toolbar.
> > http://tools.search.yahoo.com/toolbar/features/mail/
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How can I index Phrases in Lucene?

Posted by Erick Erickson <er...@gmail.com>.
Well, you don't index phrases, it's done for you. You should try
something like the following....

Create a SpanNearQuery with your terms. Specify an appropriate
slop (probably 0 assuming you want them all next to each other).

Now use call getSpans and count <G>... You may have to do
something with overlapping spans, but you'll need to experiment
a bit to understand it.

Erick

On 3/22/07, Maryam <mk...@yahoo.com> wrote:
>
> Hi,
>
> I know how to index terms in lucene, now I wanna see
> how can I index phrases like "information retreival"
> in lucene and calculate the number of times that
> phrase has appeared in the document. Is there any way
> to do it in Lucene?
>
> Thanks
>
>
>
>
> ____________________________________________________________________________________
> It's here! Your new message!
> Get new email alerts with the free Yahoo! Toolbar.
> http://tools.search.yahoo.com/toolbar/features/mail/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>