Posted to dev@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2007/12/13 15:50:13 UTC

IndexOutput writeVInt and others

I have been fiddling w/ some payload token filter helpers, such as the  
NumericPayloadTokenFilter.  I was in the process of adding a  
TokenOffsetPayloadTokenFilter (what a mouthful) that adds the start  
and end offset as payloads to the token.  Now, the easiest way to do  
this is to encode the first 4 bytes with the startOffset and another 4  
bytes as the endOffset.  Then it occurs to me that it might make sense  
to encode them as vInts to save some bits.  Naturally, there is no  
point in duplicating code, so I wonder if it makes sense to make these  
available for people wanting to encode payloads.  Any thoughts on this?

-Grant
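A minimal sketch of the two payload layouts described in this post, assuming the vInt format Lucene documents for its index files: seven low-order bits per byte, with the high bit set when more bytes follow. The class `OffsetPayloadSketch` and its methods are hypothetical illustrations, not part of Lucene.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper: packs a token's start/end offsets into a payload byte[]. */
final class OffsetPayloadSketch {

  /** Fixed-width layout: 4 big-endian bytes for startOffset, then 4 for endOffset. */
  static byte[] encodeFixed(int startOffset, int endOffset) {
    return new byte[] {
      (byte) (startOffset >>> 24), (byte) (startOffset >>> 16),
      (byte) (startOffset >>> 8), (byte) startOffset,
      (byte) (endOffset >>> 24), (byte) (endOffset >>> 16),
      (byte) (endOffset >>> 8), (byte) endOffset
    };
  }

  /** vInt layout: both offsets back to back, each using as few bytes as it needs. */
  static byte[] encodeVInt(int startOffset, int endOffset) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(10);
    writeVInt(out, startOffset);
    writeVInt(out, endOffset);
    return out.toByteArray();
  }

  private static void writeVInt(ByteArrayOutputStream out, int i) {
    while ((i & ~0x7F) != 0) {
      out.write((i & 0x7F) | 0x80); // low 7 bits plus the "more bytes follow" flag
      i >>>= 7;
    }
    out.write(i); // last byte: high bit clear
  }

  /** Decodes {startOffset, endOffset} from a payload produced by encodeVInt. */
  static int[] decodeVInt(byte[] payload) {
    int[] result = new int[2];
    int pos = 0;
    for (int k = 0; k < 2; k++) {
      int value = 0;
      int shift = 0;
      byte b;
      do {
        b = payload[pos++];
        value |= (b & 0x7F) << shift; // low-order groups come first
        shift += 7;
      } while ((b & 0x80) != 0);
      result[k] = value;
    }
    return result;
  }
}
```

With offsets (10, 15) the vInt payload is 2 bytes instead of 8. Any offset of 2^28 or more needs 5 bytes, so how much is saved depends on how large offsets get in practice.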

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: IndexOutput writeVInt and others

Posted by Paul Elschot <pa...@xs4all.nl>.
I've copied the code for writing and reading VInts into SortedVIntList in LUCENE-584,
and I added a comment there referring to the original code.
A utility class on byte[] (or ByteBuffer) would be good to get rid
of that copy.
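A utility class of the kind suggested here could be as small as the following sketch. `VIntBytes` is a hypothetical name; the byte layout is intended to match what IndexOutput.writeVInt produces, so the same decoding logic would apply to index data and payloads alike.

```java
import java.nio.ByteBuffer;

/** Hypothetical utility: vInt coding over a byte[] or a ByteBuffer. */
final class VIntBytes {
  private VIntBytes() {}

  /** Writes i starting at buf[pos]; returns the position just past the last byte written. */
  static int writeVInt(byte[] buf, int pos, int i) {
    while ((i & ~0x7F) != 0) {
      buf[pos++] = (byte) ((i & 0x7F) | 0x80); // 7 data bits + continuation bit
      i >>>= 7;
    }
    buf[pos++] = (byte) i;
    return pos;
  }

  /** Reads a vInt starting at buf[pos]. */
  static int readVInt(byte[] buf, int pos) {
    byte b = buf[pos++];
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = buf[pos++];
      i |= (b & 0x7F) << shift;
    }
    return i;
  }

  /** Number of bytes writeVInt uses for i. */
  static int vIntLength(int i) {
    int n = 1;
    while ((i & ~0x7F) != 0) {
      n++;
      i >>>= 7;
    }
    return n;
  }

  /** ByteBuffer variant: advances the buffer's position, like relative put. */
  static void writeVInt(ByteBuffer bb, int i) {
    while ((i & ~0x7F) != 0) {
      bb.put((byte) ((i & 0x7F) | 0x80));
      i >>>= 7;
    }
    bb.put((byte) i);
  }

  /** ByteBuffer variant: advances the buffer's position, like relative get. */
  static int readVInt(ByteBuffer bb) {
    byte b = bb.get();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = bb.get();
      i |= (b & 0x7F) << shift;
    }
    return i;
  }
}
```

The byte[] readVInt in this sketch does not report how far it advanced; a caller steps past a value with vIntLength(value), which is only valid for bytes produced by writeVInt (the shortest encoding). The ByteBuffer variants avoid that bookkeeping because the buffer tracks its own position.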

Regards,
Paul Elschot


On Saturday 15 December 2007 07:51:06 Shai Erera wrote:
> Why not add IndexInput/Output methods to read/write to/from
> Input/OutputStream, ByteBuffer and/or byte[]?
> Isn't their logic general enough to make them utility classes?
> 
...


Re: IndexOutput writeVInt and others

Posted by Shai Erera <se...@gmail.com>.
Why not add IndexInput/Output methods to read/write to/from
Input/OutputStream, ByteBuffer and/or byte[]?
Isn't their logic general enough to make them utility classes?
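For the Input/OutputStream case, a sketch using the same byte layout might look like this. `VIntStreams` is a hypothetical name; the reader throws an EOFException if the stream ends in the middle of a value instead of returning a partial number.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical utility: vInt coding over plain java.io streams. */
final class VIntStreams {
  private VIntStreams() {}

  static void writeVInt(OutputStream out, int i) throws IOException {
    while ((i & ~0x7F) != 0) {
      out.write((i & 0x7F) | 0x80); // 7 data bits + continuation bit
      i >>>= 7;
    }
    out.write(i); // final byte, continuation bit clear
  }

  static int readVInt(InputStream in) throws IOException {
    int i = 0;
    for (int shift = 0; ; shift += 7) {
      int b = in.read(); // 0..255, or -1 at end of stream
      if (b < 0) {
        throw new EOFException("stream ended inside a vInt");
      }
      i |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return i;
      }
    }
  }
}
```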

On Dec 14, 2007 9:55 AM, Doron Cohen <cd...@gmail.com> wrote:

> On Dec 13, 2007 6:55 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
> > Yes.
> >
> > On Dec 13, 2007, at 10:43 AM, Doron Cohen wrote:
> >
> > > Did you mean refactoring IndexInput.readVInt() and
> > > IndexOutput.writeVInt()
> > > so that they can be used for e.g. payloads?
>
>
> Huh, sorry for the stupid question, it was in the subject all along.
>
>
> Seems good to me, as long as IndexIn/Output efficiency and
> readability is not hurt. Do you already have an API in mind?
>



-- 
Regards,

Shai Erera

Re: IndexOutput writeVInt and others

Posted by Doron Cohen <cd...@gmail.com>.
On Dec 13, 2007 6:55 PM, Grant Ingersoll <gs...@apache.org> wrote:

> Yes.
>
> On Dec 13, 2007, at 10:43 AM, Doron Cohen wrote:
>
> > Did you mean refactoring IndexInput.readVInt() and
> > IndexOutput.writeVInt()
> > so that they can be used for e.g. payloads?


Huh, sorry for the stupid question, it was in the subject all along.


Seems good to me, as long as IndexIn/Output efficiency and
readability is not hurt. Do you already have an API in mind?
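One possible shape for such an API, sketched under assumptions: write the vInt loop once in a base class that only needs writeByte/readByte, and let both index files and byte[] payloads supply those two methods. All class names here are hypothetical, and IOException is omitted for brevity, whereas anything sitting under IndexOutput/IndexInput would have to declare it.

```java
import java.util.Arrays;

/** Hypothetical base: vInt encoding written once, on top of an abstract writeByte. */
abstract class VIntSink {
  abstract void writeByte(byte b);

  final void writeVInt(int i) {
    while ((i & ~0x7F) != 0) {
      writeByte((byte) ((i & 0x7F) | 0x80)); // 7 data bits + continuation bit
      i >>>= 7;
    }
    writeByte((byte) i); // last byte: continuation bit clear
  }
}

/** Hypothetical base: vInt decoding written once, on top of an abstract readByte. */
abstract class VIntSource {
  abstract byte readByte();

  final int readVInt() {
    byte b = readByte();
    int i = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      i |= (b & 0x7F) << shift;
    }
    return i;
  }
}

/** Hypothetical byte[]-backed sink that grows as needed, e.g. for building a payload. */
final class ByteArraySink extends VIntSink {
  private byte[] bytes = new byte[8];
  private int length;

  @Override
  void writeByte(byte b) {
    if (length == bytes.length) {
      bytes = Arrays.copyOf(bytes, bytes.length * 2); // double the capacity
    }
    bytes[length++] = b;
  }

  byte[] toByteArray() {
    return Arrays.copyOf(bytes, length);
  }
}

/** Hypothetical byte[]-backed source that reads sequentially from a given offset. */
final class ByteArraySource extends VIntSource {
  private final byte[] bytes;
  private int pos;

  ByteArraySource(byte[] bytes, int offset) {
    this.bytes = bytes;
    this.pos = offset;
  }

  @Override
  byte readByte() {
    return bytes[pos++];
  }
}
```

IndexOutput.writeVInt is itself written as a loop over writeByte, so moving that loop into a shared superclass should not add per-byte work; whether it improves readability is a judgment call.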


Re: IndexOutput writeVInt and others

Posted by Grant Ingersoll <gs...@apache.org>.
Yes.

On Dec 13, 2007, at 10:43 AM, Doron Cohen wrote:

> Did you mean refactoring IndexInput.readVInt() and
> IndexOutput.writeVInt()
> so that they can be used for e.g. payloads?


Re: IndexOutput writeVInt and others

Posted by Doron Cohen <cd...@gmail.com>.
Did you mean refactoring IndexInput.readVInt() and IndexOutput.writeVInt()
so that they can be used for e.g. payloads?

On Dec 13, 2007 4:50 PM, Grant Ingersoll < gsingers@apache.org> wrote:

> I have been fiddling w/ some payload token filter helpers, such as the
> NumericPayloadTokenFilter.  I was in the process of adding a
> TokenOffsetPayloadTokenFilter (what a mouthful) that adds the start
> and end offset as payloads to the token.  Now, the easiest way to do
> this is to encode the first 4 bytes with the startOffset and another 4
> bytes as the endOffset.  Then it occurs to me that it might make sense
> to encode them as vInts to save some bits.  Naturally, there is no
> point in duplicating code, so I wonder if it makes sense to make these
> available for people wanting to encode payloads.  Any thoughts on this?
>
> -Grant