Posted to general@lucene.apache.org by Lorenzo Luengo <lo...@gmail.com> on 2011/10/04 19:22:49 UTC

Question about file format

Hi all,

I'm trying to write my own reader for Lucene files in pure Python (I 
haven't found a suitable library for Windows x64). While reading the 
docs, a question came up.

In http://lucene.apache.org/java/3_4_0/fileformats.html#String it says 
that a String is composed of a VInt and a sequence of modified UTF-8 
encoded chars. My question is: is that VInt the length of the string 
before encoding, or the number of encoded bytes?

Regards.

-- 
Lorenzo Luengo C.
Ingeniero Civil Electrónico
Cel: 98270385


Re: Question about file format

Posted by Lorenzo Luengo <lo...@gmail.com>.
Thanks Uwe!

On 04-10-2011 14:28, Uwe Schindler wrote:
> The number of encoded bytes, in recent Lucene versions (2.4 and later).
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de


-- 
Lorenzo Luengo C.
Ingeniero Civil Electrónico
Cel: 98270385


RE: Question about file format

Posted by Uwe Schindler <uw...@thetaphi.de>.
The number of encoded bytes, in recent Lucene versions (2.4 and later).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
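
For anyone writing a similar reader, here is a minimal Python sketch of the answer above: read the VInt, then read exactly that many bytes and decode them. The function names read_vint and read_string are hypothetical, not part of any Lucene library. The sketch assumes the VInt layout described in the linked file format page (seven data bits per byte, low-order group first, high bit set when more bytes follow) and assumes the bytes can be decoded as standard UTF-8; an index written before 2.4 would need different handling.

```python
import io


def read_vint(stream):
    """Read one VInt from a binary stream (hypothetical helper).

    Each byte carries 7 data bits, least significant group first.
    A set high bit means another byte follows.
    """
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a VInt")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result
        shift += 7


def read_string(stream):
    """Read a String: a VInt byte count, then that many encoded bytes."""
    length = read_vint(stream)  # number of encoded bytes, not characters
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("stream ended inside a String")
    # Assumption: the payload is plain UTF-8.
    return data.decode("utf-8")


if __name__ == "__main__":
    # "Electrónico" has 11 characters but 12 UTF-8 bytes, so the prefix is 12.
    payload = "Electrónico".encode("utf-8")
    stream = io.BytesIO(bytes([len(payload)]) + payload)
    print(read_string(stream))
```

The distinction only shows up with non-ASCII text: for a pure ASCII string the character count and the byte count are the same, so a reader that confuses the two will appear to work until it meets an accented character.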

> -----Original Message-----
> From: Lorenzo Luengo [mailto:loluengo@gmail.com]
> Sent: Tuesday, October 04, 2011 7:23 PM
> To: general@lucene.apache.org
> Subject: Question about file format
> 
> Hi all,
> 
> I'm trying to write my own reader for Lucene files in pure Python (I haven't
> found a suitable library for Windows x64). While reading the docs, a
> question came up.
> 
> In http://lucene.apache.org/java/3_4_0/fileformats.html#String it says that
> a String is composed of a VInt and a sequence of modified UTF-8 encoded
> chars. My question is: is that VInt the length of the string before
> encoding, or the number of encoded bytes?
> 
> Regards.
> 
> --
> Lorenzo Luengo C.
> Ingeniero Civil Electrónico
> Cel: 98270385