Posted to java-user@lucene.apache.org by Alex vB <ma...@avomberg.de> on 2011/02/02 20:35:19 UTC

Storing payloads without term-position and frequency

Hello everybody,

I am currently using Lucene 3.0.2 with payloads. I store extra information
about each term, such as frequencies, in the payloads, so I don't need the
frequencies and term positions that Lucene normally stores. I would like to
set f.setOmitTermFreqAndPositions(true), but then I am no longer able to
retrieve the payloads. Would it be hard to "hack" Lucene to support this?
Also, I only store one payload per term, if that information makes it easier.

Best regards
Alex
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Storing-payloads-without-term-position-and-frequency-tp2408094p2408094.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Storing payloads without term-position and frequency

Posted by Alex <al...@googlemail.com>.
Hello Grant,

I already store only the first term instance, because I index each token
of an article just once. What I want to achieve is an index for
versioned document collections like Wikipedia (see this paper:
http://www.cis.poly.edu/suel/papers/archive.pdf).

In detail: on the first level (Lucene) I create one document per
Wikipedia article, containing all distinct terms of its versions. On the
second level (payloads) I store the frequency information for each
article version and its terms. When I search, I find an article by a
term, and through that term's payload I get information about the other
versions and how often the token occurred in each (in my case, with one
instance per term, the payload position is always 1!). So I look things
up on the first level and pick only the information I need from the
second level. This way I avoid storing information several times,
because most Wikipedia versions are very similar (with respect to their
terms).
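
To make the second level concrete, here is a minimal sketch of how the
per-version frequencies of one term could be packed into a single byte
array before it is attached as a payload. It is an illustration only, not
the code used in this thread: the class name VersionFreqCodec, the layout
(a count followed by one frequency per version, in version order), and the
convention that a frequency of 0 means "term absent in that version" are
all assumptions. The variable-byte encoding follows the same idea as
Lucene's VInt (7 data bits per byte, high bit marks continuation).

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of Lucene: packs the per-version
// frequencies of one term into a compact byte[] and unpacks them again.
final class VersionFreqCodec {

    private VersionFreqCodec() {}

    /** Encodes freqs[i] = occurrences of the term in article version i. */
    static byte[] encode(int[] freqs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, freqs.length);          // number of versions first
        for (int f : freqs) {
            if (f < 0) {
                throw new IllegalArgumentException("negative frequency: " + f);
            }
            writeVInt(out, f);                 // 0 = term absent (assumed convention)
        }
        return out.toByteArray();
    }

    /** Reverses encode(): returns one frequency per article version. */
    static int[] decode(byte[] data) {
        int[] pos = {0};                       // read cursor shared with readVInt
        int n = readVInt(data, pos);
        int[] freqs = new int[n];
        for (int i = 0; i < n; i++) {
            freqs[i] = readVInt(data, pos);
        }
        return freqs;
    }

    // Variable-byte write: 7 bits per byte, low-order group first;
    // the high bit of a byte is set when more bytes follow.
    private static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Variable-byte read, advancing pos[0] past the bytes consumed.
    private static int readVInt(byte[] data, int[] pos) {
        int value = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }
}
```

With this layout, small frequencies cost one byte each, so a term that
occurs a handful of times in each of five versions needs about six bytes
of payload. A real index would probably want to exploit the similarity
between versions further (for example by storing only changes), which
this sketch does not attempt.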

This is working so far, and I just want to reduce my index size, but I
don't know how much I can save by disabling term freqs/positions.
I hope I could explain the problem a little. If not, just tell me and I
will try to explain it again. :)

Best regards
Alex

PS: I am currently looking for a bedroom in New York, Brooklyn (Park
Slope or near NYU Poly). Maybe somebody rents a room from 15 Feb until
15 April. :)

On Thursday, 03.02.2011, at 12:38 -0500, Grant Ingersoll wrote:
> Payloads only make sense in terms of specific positions in the index, so I don't think there is a way to hack Lucene for it.  You could, I suppose, just store the payload for the first instance of the term.
> 
> Also, what's the use case you are trying to solve here?  Why store term frequency as a payload when Lucene already does it (and it probably does it more efficiently)
> 
> -Grant
> 


Re: Storing payloads without term-position and frequency

Posted by Grant Ingersoll <gs...@apache.org>.
Payloads only make sense in terms of specific positions in the index, so I don't think there is a way to hack Lucene for it.  You could, I suppose, just store the payload for the first instance of the term.

Also, what's the use case you are trying to solve here?  Why store term frequency as a payload when Lucene already does it (and probably does it more efficiently)?

-Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com




Re: Storing payloads without term-position and frequency

Posted by Yuhan Zhang <yz...@onescreen.com>.
Hi Alex,

You can specify the information to be stored on a field by setting
Field.TermVector.NO:

doc.add(new Field(TEXT_FIELD_NAME, text, Field.Store.NO,
Field.Index.ANALYZED, Field.TermVector.NO));
