Posted to dev@lucene.apache.org by "Marvin Humphrey (JIRA)" <ji...@apache.org> on 2006/06/05 03:56:30 UTC

[jira] Updated: (LUCENE-510) IndexOutput.writeString() should write length in bytes

     [ http://issues.apache.org/jira/browse/LUCENE-510?page=all ]

Marvin Humphrey updated LUCENE-510:
-----------------------------------

    Attachment: SortExternal.java
                TestSortExternal.java

Greets,

I've ported KinoSearch's external sorting module to Java, along with its tests.  This class is the linchpin for the KinoSearch merge model, as it allows serialized postings to be dumped into a sort pool of effectively unlimited size.
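
For readers unfamiliar with the technique, here is a minimal sketch of an external sort.  It is not the attached SortExternal.java.  The class name ExternalSortSketch, the methods feed() and finish(), the maxBuffered limit, and the on-disk run format (an item count followed by writeUTF records) are all assumptions made for illustration.  Items are buffered in memory, each full buffer is sorted and spilled to a temp file as a "run", and the runs are then merged through a priority queue.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/** Hypothetical sketch of an external sort; not the attached SortExternal.java. */
final class ExternalSortSketch {
    private final int maxBuffered;                       // items kept in memory before spilling
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> runs = new ArrayList<>();   // one temp file per sorted run

    ExternalSortSketch(int maxBuffered) {
        this.maxBuffered = maxBuffered;
    }

    /** Add one item, spilling a sorted run to disk once the buffer is full. */
    void feed(String item) {
        buffer.add(item);
        if (buffer.size() >= maxBuffered) flushRun();
    }

    /** Sort the in-memory buffer and write it out as one run (assumed format). */
    private void flushRun() {
        if (buffer.isEmpty()) return;
        Collections.sort(buffer);
        try {
            Path run = Files.createTempFile("sortrun", ".bin");
            runs.add(run);
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(run)))) {
                out.writeInt(buffer.size());
                for (String s : buffer) out.writeUTF(s);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.clear();
    }

    /** Cursor over one sorted run; head is the next unread item. */
    private static final class RunReader {
        final DataInputStream in;
        int remaining;
        String head;

        RunReader(Path p) throws IOException {
            in = new DataInputStream(new BufferedInputStream(Files.newInputStream(p)));
            remaining = in.readInt();
            advance();
        }

        /** Load the next item into head; returns false when the run is exhausted. */
        boolean advance() throws IOException {
            if (remaining == 0) { head = null; return false; }
            remaining--;
            head = in.readUTF();
            return true;
        }
    }

    /** Merge all runs and return every item in sorted order. */
    List<String> finish() {
        flushRun();
        PriorityQueue<RunReader> queue =
                new PriorityQueue<>(Comparator.comparing((RunReader r) -> r.head));
        List<RunReader> readers = new ArrayList<>();
        List<String> sorted = new ArrayList<>();
        try {
            for (Path p : runs) {
                RunReader r = new RunReader(p);   // runs are never empty, so head is set
                readers.add(r);
                queue.add(r);
            }
            while (!queue.isEmpty()) {
                RunReader r = queue.poll();       // run holding the smallest head
                sorted.add(r.head);
                if (r.advance()) queue.add(r);    // re-queue under its new head
            }
            return sorted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (RunReader r : readers) {
                try { r.in.close(); } catch (IOException ignored) { }
            }
            for (Path p : runs) p.toFile().delete();
            runs.clear();
        }
    }
}
```

Feeding "pear", "apple", "fig", "kiwi", "date" into new ExternalSortSketch(2) and calling finish() yields the five strings in sorted order, merged from three runs.  The sketch returns a List for brevity; to handle a pool larger than memory, the merge step would need to hand items out one at a time instead.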

At some point, I'll submit patches implementing the KinoSearch merge model in Lucene.  I'm reasonably confident that it will more than make up for the index-time performance hit caused by using bytecounts as string headers.

Thematically, this class belongs in org.apache.lucene.util, and that's where I've put it for now.  The classes that will use it are in org.apache.lucene.index, so if it stays in util, it will have to be public.  However, it shouldn't be part of Lucene's documented public API.  The process by which Lucene's docs are generated is not clear to me, so access control advice would be appreciated.

There are a number of other areas where this code could stand review, especially considering my relatively limited experience using Java.  I'd single out exception handling and thread safety, but of course anything else is fair game.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


> IndexOutput.writeString() should write length in bytes
> ------------------------------------------------------
>
>          Key: LUCENE-510
>          URL: http://issues.apache.org/jira/browse/LUCENE-510
>      Project: Lucene - Java
>         Type: Improvement
>
>   Components: Store
>     Versions: 2.1
>     Reporter: Doug Cutting
>      Fix For: 2.1
>  Attachments: SortExternal.java, TestSortExternal.java, strings.diff
>
> We should change the format of strings written to indexes so that the length of the string is in bytes, not Java characters.  This issue has been discussed at:
> http://www.mail-archive.com/java-dev@lucene.apache.org/msg01970.html
> We must increment the file format number to indicate this change.  At least the format number in the segments file should change.
> I'm targeting this for 2.1, i.e., we shouldn't commit it to trunk until after 2.0 is released, to minimize incompatible changes between 1.9 and 2.0 (other than removal of deprecated features).
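
To illustrate the distinction the issue draws, here is a small sketch comparing a string's length in Java chars with its length in bytes.  The class StringHeaderSketch, its method names, the use of standard UTF-8, and the fixed 4-byte length header are assumptions for illustration only; none of this is Lucene's actual file format or the contents of strings.diff.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of char counts versus byte counts as string headers. */
final class StringHeaderSketch {

    /** Length in Java chars (UTF-16 code units). */
    static int charLength(String s) {
        return s.length();
    }

    /** Length in bytes after encoding as standard UTF-8 (an assumed encoding). */
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Assumed record layout: 4-byte big-endian byte count, then the UTF-8 bytes. */
    static byte[] writeWithByteCount(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + utf8.length);
        buf.putInt(utf8.length);   // header counts bytes, not chars
        buf.put(utf8);
        return buf.array();
    }

    /** With a byte-count header, a reader can skip a record without decoding it. */
    static int offsetOfNextRecord(byte[] data, int offset) {
        int byteCount = ByteBuffer.wrap(data, offset, 4).getInt();
        return offset + 4 + byteCount;
    }
}
```

For the string "na\u00efve", charLength() returns 5 and utf8ByteLength() returns 6, because the one non-ASCII character takes two bytes in UTF-8.  A header holding a char count only says how many characters to decode, while a byte count says exactly how far to seek to reach the next record.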
