Posted to dev@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2007/12/13 15:50:13 UTC
IndexOutput writeVInt and others
I have been fiddling with some payload token filter helpers, such as the
NumericPayloadTokenFilter. I was in the process of adding a
TokenOffsetPayloadTokenFilter (what a mouthful) that adds the start
and end offsets as a payload to the token. Now, the easiest way to do
this is to encode the startOffset in the first 4 bytes and the
endOffset in another 4 bytes. Then it occurred to me that it might make
sense to encode them as vInts to save some bits. Naturally, there is no
point in duplicating code, so I wonder if it makes sense to make
writeVInt and the related methods available for people wanting to
encode payloads. Any thoughts on this?
-Grant
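
To make the size trade-off above concrete, here is a minimal sketch of both
encodings. OffsetPayloadSketch and its methods are hypothetical names, not
part of Lucene; the vInt loop follows the usual scheme of 7 data bits per
byte with the high bit marking a continuation, and is assumed rather than
copied from IndexOutput.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper, not part of Lucene: encodes token offsets as a payload byte[]. */
final class OffsetPayloadSketch {

    /** Fixed-width form: 4 bytes for startOffset, then 4 bytes for endOffset. */
    static byte[] encodeFixed(int startOffset, int endOffset) {
        return ByteBuffer.allocate(8).putInt(startOffset).putInt(endOffset).array();
    }

    /** vInt form: each offset takes 1 to 5 bytes depending on its magnitude. */
    static byte[] encodeVInt(int startOffset, int endOffset) {
        byte[] buf = new byte[10]; // worst case: 5 bytes per int
        int pos = writeVInt(startOffset, buf, 0);
        pos = writeVInt(endOffset, buf, pos);
        return Arrays.copyOf(buf, pos);
    }

    /** Writes value starting at buf[pos] and returns the next free position. */
    static int writeVInt(int value, byte[] buf, int pos) {
        while ((value & ~0x7F) != 0) {
            // low 7 bits, with the high bit set to signal that more bytes follow
            buf[pos++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buf[pos++] = (byte) value;
        return pos;
    }
}
```

With this sketch, offsets below 128 cost one byte each, so a token near the
start of a field needs 2 bytes instead of 8. Offsets of 2^28 or more take 5
bytes each, so the vInt form is not always smaller.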
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: IndexOutput writeVInt and others
Posted by Paul Elschot <pa...@xs4all.nl>.
I've copied the code for writing and reading VInts into SortedVIntList in
LUCENE-584, and I added a comment there referring to the original code.
A utility class on byte[] (or ByteBuffer) would be good to get rid
of that copy.
Regards,
Paul Elschot
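
As an illustration of the kind of utility class suggested above, here is a
minimal sketch on top of java.nio.ByteBuffer. VIntBufferSketch is a
hypothetical name, not an existing Lucene class, and the encoding is the
usual 7-bits-per-byte vInt scheme, assumed rather than copied from
IndexInput/IndexOutput or SortedVIntList.

```java
import java.nio.ByteBuffer;

/** Hypothetical utility, not an existing Lucene class. */
final class VIntBufferSketch {

    /** Appends value at the buffer's current position using 1 to 5 bytes. */
    static void writeVInt(ByteBuffer out, int value) {
        while ((value & ~0x7F) != 0) {
            out.put((byte) ((value & 0x7F) | 0x80)); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.put((byte) value);
    }

    /** Reads one vInt from the buffer's current position and advances past it. */
    static int readVInt(ByteBuffer in) {
        byte b = in.get();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.get();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }
}
```

A caller would write a run of values, call flip() on the buffer, and read
them back in the same order. Wrapping an existing byte[] with
ByteBuffer.wrap would cover the byte[] case without a second copy of the
logic.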
On Saturday 15 December 2007 07:51:06 Shai Erera wrote:
> Why not add IndexInput/Output methods to read/write to/from
> Input/OutputStream, ByteBuffer and/or byte[]?
> Isn't their logic general enough to make them utility classes?
>
...
Re: IndexOutput writeVInt and others
Posted by Shai Erera <se...@gmail.com>.
Why not add IndexInput/Output methods to read/write to/from
Input/OutputStream, ByteBuffer and/or byte[]?
Isn't their logic general enough to make them utility classes?
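
For the stream variant of this idea, a minimal sketch might look like the
following. VIntStreamSketch is a hypothetical name and is not an existing
Lucene API; the main difference from a buffer-based version is that the
reader has to handle the stream ending in the middle of a value.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical utility, not an existing Lucene class. */
final class VIntStreamSketch {

    /** Writes value as 1 to 5 bytes, low-order 7-bit group first. */
    static void writeVInt(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // OutputStream.write(int) keeps the low 8 bits
            value >>>= 7;
        }
        out.write(value);
    }

    /** Reads one vInt, failing if the stream ends before the final byte. */
    static int readVInt(InputStream in) throws IOException {
        int value = 0;
        for (int shift = 0; ; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("stream ended inside a vInt");
            }
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value; // high bit clear: this was the last byte
            }
        }
    }
}
```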
On Dec 14, 2007 9:55 AM, Doron Cohen <cd...@gmail.com> wrote:
> On Dec 13, 2007 6:55 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
> > Yes.
> >
> > On Dec 13, 2007, at 10:43 AM, Doron Cohen wrote:
> >
> > > Did you mean refactoring IndexInput.readVInt() and
> > > IndexOutput.writeVInt()
> > > so that they can be used for e.g. payloads?
>
>
> Huh, sorry for the stupid question, it was in the subject all along.
>
>
> > >
> > > On Dec 13, 2007 4:50 PM, Grant Ingersoll < gsingers@apache.org> wrote:
> > >
> > >> I have been fiddling w/ some payload token filter helpers, such as
> > >> the
> > >> NumericPayloadTokenFilter. I was in the process of adding a
> > >> TokenOffsetPayloadTokenFilter (what a mouthful) that adds the start
> > >> and end offset as payloads to the token. Now, the easiest way to do
> > >> this is to encode the first 4 bytes with the startOffset and
> > >> another 4
> > >> bytes as the endOffset. Then it occurs to me that it might make
> > >> sense
> > >> to encode them as vInts to save some bits. Naturally, there is no
> > >> point in duplicating code, so I wonder if it makes sense to make
> > >> these
> > >> available for people wanting to encode payloads. Any thoughts on
> > >> this?
>
>
> Seems good to me, as long as IndexInput/IndexOutput efficiency and
> readability are not hurt. Do you already have an API in mind?
>
>
> > >>
> > >> -Grant
> >
>
--
Regards,
Shai Erera
Re: IndexOutput writeVInt and others
Posted by Doron Cohen <cd...@gmail.com>.
On Dec 13, 2007 6:55 PM, Grant Ingersoll <gs...@apache.org> wrote:
> Yes.
>
> On Dec 13, 2007, at 10:43 AM, Doron Cohen wrote:
>
> > Did you mean refactoring IndexInput.readVInt() and
> > IndexOutput.writeVInt()
> > so that they can be used for e.g. payloads?
Huh, sorry for the stupid question, it was in the subject all along.
> >
> > On Dec 13, 2007 4:50 PM, Grant Ingersoll < gsingers@apache.org> wrote:
> >
> >> I have been fiddling w/ some payload token filter helpers, such as
> >> the
> >> NumericPayloadTokenFilter. I was in the process of adding a
> >> TokenOffsetPayloadTokenFilter (what a mouthful) that adds the start
> >> and end offset as payloads to the token. Now, the easiest way to do
> >> this is to encode the first 4 bytes with the startOffset and
> >> another 4
> >> bytes as the endOffset. Then it occurs to me that it might make
> >> sense
> >> to encode them as vInts to save some bits. Naturally, there is no
> >> point in duplicating code, so I wonder if it makes sense to make
> >> these
> >> available for people wanting to encode payloads. Any thoughts on
> >> this?
Seems good to me, as long as IndexInput/IndexOutput efficiency and
readability are not hurt. Do you already have an API in mind?
> >>
> >> -Grant
>
Re: IndexOutput writeVInt and others
Posted by Grant Ingersoll <gs...@apache.org>.
Yes.
On Dec 13, 2007, at 10:43 AM, Doron Cohen wrote:
> Did you mean refactoring IndexInput.readVInt() and
> IndexOutput.writeVInt()
> so that they can be used for e.g. payloads?
>
> On Dec 13, 2007 4:50 PM, Grant Ingersoll < gsingers@apache.org> wrote:
>
>> I have been fiddling w/ some payload token filter helpers, such as
>> the
>> NumericPayloadTokenFilter. I was in the process of adding a
>> TokenOffsetPayloadTokenFilter (what a mouthful) that adds the start
>> and end offset as payloads to the token. Now, the easiest way to do
>> this is to encode the first 4 bytes with the startOffset and
>> another 4
>> bytes as the endOffset. Then it occurs to me that it might make
>> sense
>> to encode them as vInts to save some bits. Naturally, there is no
>> point in duplicating code, so I wonder if it makes sense to make
>> these
>> available for people wanting to encode payloads. Any thoughts on
>> this?
>>
>> -Grant
Re: IndexOutput writeVInt and others
Posted by Doron Cohen <cd...@gmail.com>.
Did you mean refactoring IndexInput.readVInt() and IndexOutput.writeVInt()
so that they can be used for e.g. payloads?
On Dec 13, 2007 4:50 PM, Grant Ingersoll < gsingers@apache.org> wrote:
> I have been fiddling w/ some payload token filter helpers, such as the
> NumericPayloadTokenFilter. I was in the process of adding a
> TokenOffsetPayloadTokenFilter (what a mouthful) that adds the start
> and end offset as payloads to the token. Now, the easiest way to do
> this is to encode the first 4 bytes with the startOffset and another 4
> bytes as the endOffset. Then it occurs to me that it might make sense
> to encode them as vInts to save some bits. Naturally, there is no
> point in duplicating code, so I wonder if it makes sense to make these
> available for people wanting to encode payloads. Any thoughts on this?
>
> -Grant