Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2013/03/03 11:33:13 UTC

[jira] [Commented] (LUCENE-3918) Port index sorter to trunk APIs

    [ https://issues.apache.org/jira/browse/LUCENE-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591699#comment-13591699 ] 

Shai Erera commented on LUCENE-3918:
------------------------------------

bq. Then lets do this! (the right way, in memory during indexing before it hits the disk, not re-ordering existing on-disk segments after the fact)

This issue is about porting the previous IndexSorter implementation to the trunk API. The previous one offered a one-time sort of an index, and so does this one. That doesn't mean we shouldn't explore alternatives, but I find this a much lower-hanging fruit than LUCENE-4752, especially as no one has assigned that issue to himself yet, nor does it look like any progress has been made on it. If LUCENE-4752 eventually sees the light of day, I don't mind if we nuke IndexSorter completely (replacing it with a SortingCodec, I guess?), but until then, I think that offering users *A* way to sort their index is valuable too.

Also, it's not clear to me at the moment (but I admit I haven't thought about it much) how you can sort documents during indexing while the values to sort by may still be unknown. I.e., what if your sort-by key is a NumericDocValues field which the Codec hasn't seen yet? How should it write posting lists, stored fields etc.? Does this mean the Codec must cache the entire to-be-written segment in RAM? That would consume much more RAM than the approach in this issue ...
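
To make the concern concrete, here is a minimal sketch in plain Java of the doc ID permutation that any sorter has to compute. It is not Lucene API and not the code in the patch; the class and method names are hypothetical, and the sort key is assumed to be one long value per document. The point is that the new position of a document cannot be decided until the key of every document in the segment is known.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only: not part of Lucene and not the patch in this issue.
public class DocPermutationSketch {

    // Returns newToOld: entry i holds the old doc ID that becomes new doc ID i.
    // All sort keys must be available up front, one per document.
    static int[] newToOld(long[] sortKeys) {
        Integer[] ids = new Integer[sortKeys.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Arrays.sort on objects is stable, so documents with equal keys
        // keep their original relative order.
        Arrays.sort(ids, Comparator.comparingLong(id -> sortKeys[id]));
        int[] result = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            result[i] = ids[i];
        }
        return result;
    }

    // Inverts the permutation: entry d holds the new doc ID of old doc ID d.
    // This is the mapping needed to rewrite postings, which refer to doc IDs.
    static int[] oldToNew(int[] newToOld) {
        int[] inverse = new int[newToOld.length];
        for (int newId = 0; newId < newToOld.length; newId++) {
            inverse[newToOld[newId]] = newId;
        }
        return inverse;
    }

    public static void main(String[] args) {
        long[] keys = {30L, 10L, 20L}; // sort key of old doc 0, 1, 2
        int[] n2o = newToOld(keys);
        System.out.println("newToOld = " + Arrays.toString(n2o));
        System.out.println("oldToNew = " + Arrays.toString(oldToNew(n2o)));
    }
}
```

With an offline sort, the keys can simply be read from the existing index before anything is rewritten. With an online sort, nothing that refers to doc IDs can be written until the last document of the segment has supplied its key, which is where the RAM question comes from.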

I think that online sorting is much more powerful than a one-time sort, but there's work to do to make it happen, and to make it efficient. Until then, I think that we should proceed with this offline sorting strategy, which is better than nothing.
                
> Port index sorter to trunk APIs
> -------------------------------
>
>                 Key: LUCENE-3918
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3918
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: modules/other
>    Affects Versions: 4.0-ALPHA
>            Reporter: Robert Muir
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-3918.patch, LUCENE-3918.patch, LUCENE-3918.patch
>
>
> LUCENE-2482 added an IndexSorter to 3.x, but we need to port this
> functionality to 4.0 apis.
