Posted to dev@lucene.apache.org by Shane O'Sullivan <sh...@gmail.com> on 2005/10/10 15:37:43 UTC
Adding generic payloads to a Term's posting list
Hi,
To the best of my knowledge, it is not possible to add generic data to a
Term's posting list. By this I mean information that is defined by the
search engine, not by Lucene itself.
Whereas Lucene adds some data to the posting lists, such as the term's
position within a document, there are many other useful types of
information that could be attached to a term.
Some examples would be, in XML documents, the depth of a tag in the
document, or font information, such as whether the term appeared in a
header or in the main body of text.
Are there any plans to add such functionality to the API? If not, where
would be the appropriate place to implement these changes? I presume the
TermInfosWriter and TermInfosReader would have to be altered, as well as
the classes which call them. Could this be done without having to modify
the index in such a way that standard Lucene could no longer read it?
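To make the request concrete, here is a minimal sketch of what a posting with a per-occurrence payload could look like. It is a plain Java model, not Lucene code: the class names Occurrence and PayloadPosting, the method positionsWithFlag, and the convention that the first payload byte is a "header" flag are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one term occurrence: a position plus opaque application bytes. */
final class Occurrence {
    final int position;
    final byte[] payload; // empty when the application stored nothing

    Occurrence(int position, byte[] payload) {
        this.position = position;
        this.payload = payload.clone(); // defensive copy so callers cannot mutate it
    }
}

/** Hypothetical posting: all occurrences of one term in one document. */
final class PayloadPosting {
    final int docId;
    final List<Occurrence> occurrences = new ArrayList<>();

    PayloadPosting(int docId) {
        this.docId = docId;
    }

    /** Records an occurrence; the payload bytes are optional. */
    void add(int position, byte... payload) {
        occurrences.add(new Occurrence(position, payload));
    }

    /** Returns the positions whose first payload byte equals the given flag. */
    List<Integer> positionsWithFlag(byte flag) {
        List<Integer> result = new ArrayList<>();
        for (Occurrence o : occurrences) {
            if (o.payload.length > 0 && o.payload[0] == flag) {
                result.add(o.position);
            }
        }
        return result;
    }
}
```

With this model, an indexer could call add(3, (byte) 1) for a term seen in a header and add(17) for a term in the body, and a scorer could later ask for positionsWithFlag((byte) 1). How such bytes would actually be written to and read from the index files is exactly the open question above.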
Thanks
Shane
RE: Adding generic payloads to a Term's posting list
Posted by Grant Ingersoll <gs...@syr.edu>.
From my understanding, I don't think there has been any work, except the
idea put forth by Doug and others.
Contributions are definitely welcome...
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: Adding generic payloads to a Term's posting list
Posted by Shane O'Sullivan <sh...@gmail.com>.
This is precisely what I am looking for. Does anyone know if this work is
going into Lucene 2.0?
Shane
Re: Adding generic payloads to a Term's posting list
Posted by jian chen <ch...@gmail.com>.
Hi,
I have been studying the Lucene indexing code for a bit. I am not sure I
understand the problem scope completely, but storing extra information
using TermInfosWriter may not solve the problem.
For the example of XML document tag depth, could that be a separate field?
A Lucene term is a combination of (field, termText), so depth could be a
field; even though two XML tags are the same, if their depths are
different, they are still treated as separate terms.
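One reading of this suggestion can be sketched in plain Java. This is a simplified model rather than Lucene's own classes: FieldTerm stands in for a (field, termText) pair, and the "tag_depth_N" field naming scheme and the DepthFieldIndex class are hypothetical names chosen only for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Simplified stand-in for a term: a (field, text) pair compared by value. */
final class FieldTerm {
    final String field;
    final String text;

    FieldTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof FieldTerm)) return false;
        FieldTerm t = (FieldTerm) other;
        return field.equals(t.field) && text.equals(t.text);
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, text);
    }
}

/** Counts tag occurrences, folding the tag depth into the field name. */
final class DepthFieldIndex {
    private final Map<FieldTerm, Integer> occurrences = new HashMap<>();

    // Assumed naming scheme: one field per depth, e.g. "tag_depth_3".
    private static FieldTerm termFor(String tagName, int depth) {
        return new FieldTerm("tag_depth_" + depth, tagName);
    }

    void addTag(String tagName, int depth) {
        occurrences.merge(termFor(tagName, depth), 1, Integer::sum);
    }

    int count(String tagName, int depth) {
        return occurrences.getOrDefault(termFor(tagName, depth), 0);
    }

    /** Number of distinct (field, text) terms seen so far. */
    int distinctTerms() {
        return occurrences.size();
    }
}
```

Adding the tag "title" once at depth 1 and twice at depth 3 produces two distinct terms, because the field part of the pair differs. The cost of this approach is that the number of fields grows with the number of distinct depths, and the depth is tied to the whole term rather than to an individual position.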
This is what I could think of so far.
Jian
RE: Adding generic payloads to a Term's posting list
Posted by Grant Ingersoll <gs...@syr.edu>.
http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard
See item #11 of API changes. Maybe along the lines of what you are
interested in, although I don't know if anyone has even attempted a design
of it. I would also like to see this, plus the ability to store info at
higher levels in the Index, such as Field (not on a per token basis),
Document (info about the document that spans its fields) and Index (such as
coreference information). Alas, no time...
-Grant