You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "Dawid Weiss (Jira)" <ji...@apache.org> on 2020/09/29 07:09:00 UTC

[jira] [Updated] (LUCENE-7477) ExternalRefSorter should use OfflineSorter's actual writer for writing the input file

     [ https://issues.apache.org/jira/browse/LUCENE-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-7477:
--------------------------------
    Fix Version/s:     (was: 7.0)

> ExternalRefSorter should use OfflineSorter's actual writer for writing the input file
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7477
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>         Attachments: LUCENE-7477.patch
>
>
> Consider this constructor in ExternalRefSorter:
> {code}
>   public ExternalRefSorter(OfflineSorter sorter) throws IOException {
>     this.sorter = sorter;
>     this.input = sorter.getDirectory().createTempOutput(sorter.getTempFileNamePrefix(), "RefSorterRaw", IOContext.DEFAULT);
>     this.writer = new OfflineSorter.ByteSequencesWriter(this.input);
>   }
> {code}
> The problem with it is that the writer for the initial input file is written with the default {{OfflineSorter.ByteSequencesWriter}}, but the instance of {{OfflineSorter}} may be unable to read it if it overrides {{getReader}} to use something else than the default.
> While this works now, it should be cleaned up (I think). It'd be probably ideal to allow {{OfflineSorter}} to generate its own temporary file and just return the ByteSequencesWriter it chooses to use, so the above snippet would read:
> {code}
>   public ExternalRefSorter(OfflineSorter sorter) throws IOException {
>     this.sorter = sorter;
>     this.writer = sorter.newUnsortedPartition();
>   }
> {code}
> This could be also extended so that {{OfflineSorter}} is in charge of managing its own (sorted and unsorted) partitions. Then {{sort(String file)}} would simply become {{ByteSequenceIterator sort()}} (or even {{Stream<BytesRef> sort()}} as Stream is conveniently {{AutoCloseable}}). If we made {{OfflineSorter}} implement {{Closeable}} it could also take care of cleaning up any resources it opens in the directory we pass to it. An additional bonus would be the ability to dodge the final internal merge(1) -- if we manage sorted and unsorted partitions then there are open possibilities of returning an iterator that dynamically merges from multiple partitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org