Posted to java-user@lucene.apache.org by Xaida <ho...@gmail.com> on 2010/05/24 13:25:48 UTC

Applying term frequency thresholds on indexing time

Hi guys!

Is there a way to define a threshold on the terms I want to store
in the index (before they are indexed)? I need to store only the terms with
the highest frequencies. I have done it with term vectors and a cutoff ratio that
cuts off the least frequently occurring terms, but all of this, of course, works at
retrieval time, reading from the index.

I know it makes no sense to be able to calculate the frequencies of the terms
before they are stored, but I guess there could be some way to work around
it?
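
To make the idea concrete, here is a minimal sketch of one possible workaround:
a two-pass approach that counts term frequencies over the already-tokenized
text first and then drops the least frequent terms before anything is handed
to the indexer. It is plain Java and deliberately uses no Lucene API; the class
name TermFrequencyCutoff, its methods, and the keepRatio parameter are all
hypothetical names chosen for illustration, and tokenization is assumed to
have happened elsewhere.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: filters tokens by frequency
// before they are passed on to whatever does the actual indexing.
final class TermFrequencyCutoff {

    // Pass 1: count how often each term occurs in the token stream.
    static Map<String, Integer> countTerms(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Keep the top keepRatio fraction of distinct terms, ranked by count
    // (highest first). Ties are broken alphabetically so the result is
    // deterministic.
    static Set<String> mostFrequent(Map<String, Integer> counts, double keepRatio) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        int keep = (int) Math.ceil(entries.size() * keepRatio);
        keep = Math.max(0, Math.min(keep, entries.size()));
        Set<String> kept = new HashSet<>();
        for (Map.Entry<String, Integer> entry : entries.subList(0, keep)) {
            kept.add(entry.getKey());
        }
        return kept;
    }

    // Pass 2: drop every token whose term did not survive the cutoff.
    static List<String> filterTokens(List<String> tokens, Set<String> kept) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            if (kept.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }
}
```

The filtered token list would then be what gets indexed. The obvious cost is
that the text has to be read twice, and the counts here are per input rather
than corpus-wide unless all documents are counted in the first pass.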

All help appreciated!

Thank you!
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Applying-term-frequency-thresholds-on-indexing-time-tp839449p839449.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Applying term frequency thresholds on indexing time

Posted by Erick Erickson <er...@gmail.com>.
Why do you want to calculate this? This is done for
you by the indexing process and taken into account
when searching.

You're asking for a solution before defining the problem,
which makes it very hard to help.
See: http://people.apache.org/~hossman/#xyproblem

Best
Erick



Re: Applying term frequency thresholds on indexing time

Posted by Michael McCandless <lu...@mikemccandless.com>.
Also have a look at the index pruning tool:

    https://issues.apache.org/jira/browse/LUCENE-1812

Mike

