Posted to dev@lucene.apache.org by "Earwin Burrfoot (JIRA)" <ji...@apache.org> on 2010/05/19 20:55:52 UTC

[jira] Created: (LUCENE-2471) Supporting bulk copies in Directory

Supporting bulk copies in Directory
-----------------------------------

                 Key: LUCENE-2471
                 URL: https://issues.apache.org/jira/browse/LUCENE-2471
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Earwin Burrfoot


A method can be added to IndexOutput that accepts IndexInput, and writes bytes using it as a source.
This should be used for bulk-merge cases (offhand - norms, docstores?). Some Directories can then override default impl and skip intermediate buffers (NIO, MMap, RAM?).
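
A rough sketch of the idea above, not actual Lucene code. Input, Output, RamInput, RamOutput and copyBytes are hypothetical stand-ins for IndexInput, IndexOutput and the proposed method. The base class carries a default impl that goes through an intermediate buffer, and the RAM-backed output overrides it to copy array to array when the source is also in RAM.

```java
import java.util.Arrays;

// Hypothetical stand-ins for IndexInput/IndexOutput; not Lucene's real classes.
final class BulkCopySketch {

    interface Input {
        // Reads exactly len bytes into dst, starting at dst[offset].
        void readBytes(byte[] dst, int offset, int len);
    }

    abstract static class Output {
        abstract void writeBytes(byte[] src, int offset, int len);

        // Default impl: works for any Input, at the cost of an intermediate buffer.
        void copyBytes(Input in, long numBytes) {
            byte[] buffer = new byte[4096];
            long left = numBytes;
            while (left > 0) {
                int chunk = (int) Math.min(buffer.length, left);
                in.readBytes(buffer, 0, chunk);
                writeBytes(buffer, 0, chunk);
                left -= chunk;
            }
        }
    }

    static final class RamInput implements Input {
        final byte[] data;
        int pos;

        RamInput(byte[] data) { this.data = data; }

        @Override
        public void readBytes(byte[] dst, int offset, int len) {
            System.arraycopy(data, pos, dst, offset, len);
            pos += len;
        }
    }

    static final class RamOutput extends Output {
        byte[] data = new byte[16];
        int size;

        @Override
        void writeBytes(byte[] src, int offset, int len) {
            if (size + len > data.length) {
                data = Arrays.copyOf(data, Math.max(data.length * 2, size + len));
            }
            System.arraycopy(src, offset, data, size, len);
            size += len;
        }

        // Override: when the source is also in RAM, skip the intermediate buffer.
        @Override
        void copyBytes(Input in, long numBytes) {
            if (in instanceof RamInput) {
                RamInput ram = (RamInput) in;
                int len = Math.toIntExact(numBytes);
                writeBytes(ram.data, ram.pos, len);
                ram.pos += len;
            } else {
                super.copyBytes(in, numBytes);
            }
        }

        byte[] toByteArray() { return Arrays.copyOf(data, size); }
    }
}
```

Callers only ever see copyBytes, so a directory implementation can opt into a faster path without any change on the merging side.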

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869516#action_12869516 ] 

Shai Erera commented on LUCENE-2471:
------------------------------------

Sorry, too many issues these days. I meant LUCENE-2455. I've removed the FSDir.copyTo methods and instead created Dir.copy(Dir, File, File). I still need to upload the patch with those changes.

But aside from that, I think the API you're talking about is good.



[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869524#action_12869524 ] 

Shai Erera commented on LUCENE-2471:
------------------------------------

The default impl still exists, only in the form of a single-file copy instead of an entire-directory copy. There were a couple of reasons to replace them:
# They didn't take a target name. When I'm copying the segments in addIndexes over on LUCENE-2455, I need to rename them in the process (to reflect their new segment name), and no API existed for that.
# The API was very dangerous, as it overwrote the target files, no questions asked. So you could very easily overwrite one of the segments.

You can still accomplish that by iterating over the dir yourself and copying the files that you want, only now you can do it selectively, with less risk, and rename them in the process.
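
To illustrate the caller-side iteration, here is a minimal sketch that assumes plain filesystem paths rather than Lucene's Directory. The class name, the copy/copySelected helpers and the segment file names used with them are hypothetical. Files.copy without REPLACE_EXISTING throws FileAlreadyExistsException, which gives the "never silently overwrite" behaviour.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical helper; works on plain files, not on Lucene's Directory.
final class SelectiveCopySketch {

    // Copies a single file under a new name. Because REPLACE_EXISTING is not
    // passed, an existing target makes Files.copy throw FileAlreadyExistsException.
    static void copy(Path srcDir, Path destDir, String from, String to) throws IOException {
        Files.copy(srcDir.resolve(from), destDir.resolve(to));
    }

    // The caller owns the iteration: only the listed files are copied,
    // each one renamed according to the map (source name -> target name).
    static void copySelected(Path srcDir, Path destDir, Map<String, String> renames)
            throws IOException {
        for (Map.Entry<String, String> e : renames.entrySet()) {
            copy(srcDir, destDir, e.getKey(), e.getValue());
        }
    }
}
```

For example, passing a map with the single entry "_0.nrm" -> "_5.nrm" copies just that one file under its new segment name and leaves everything else in the source untouched.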



[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869504#action_12869504 ] 

Earwin Burrfoot commented on LUCENE-2471:
-----------------------------------------

Bad link? The issue is closed already and there are no mentions of Directory in the patches.

The immediate consumer, just as I said, is all the bulk-merging code. I.e., instead of loading norms into a byte array and then writing them out, you do, roughly:
{code}
IndexInput normFile = ...;
IndexOutput newNormFile = ...;
newNormFile.write(normFile, offset, length);
{code}

I looked at FSDir and refreshed my memory. copyTo is implemented with channels and transferTo; I think the new method will look quite similar.
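
For reference, a channel-based range copy might look roughly like the sketch below. This is not the FSDir code: TransferToSketch and copyRange are made-up names, and it works on plain paths. FileChannel.transferTo may move fewer bytes than requested, so the call sits in a loop. The target is opened with CREATE_NEW, so an existing file is never overwritten.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; not the actual FSDirectory implementation.
final class TransferToSketch {

    // Copies length bytes of src, starting at offset, into a newly created dest
    // without staging them in a user-space byte[].
    static void copyRange(Path src, Path dest, long offset, long length) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dest,
                     StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW)) {
            long done = 0;
            while (done < length) {
                // transferTo returns the number of bytes actually transferred.
                long n = in.transferTo(offset + done, length - done, out);
                if (n <= 0) {
                    throw new EOFException("source ended after " + done + " bytes");
                }
                done += n;
            }
        }
    }
}
```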



[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869579#action_12869579 ] 

Earwin Burrfoot commented on LUCENE-2471:
-----------------------------------------

The only reason for keeping that iteration within Directory was to reuse the buffer. The savings are negligible, I think.

bq. Copying an entire Directory is used by Lucene code only in RAMDir when it's init'ed w/ a Directory.
So what? :) I used copying an entire Directory for backup purposes, then switched to copyTo(collection), to cherry-pick a single commit.

Still I agree with switching to single file copy+rename. Back-compat luckily went out of the window, so we can design better APIs :)
Can we do this in a separate issue from LUCENE-2455 ?



[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869522#action_12869522 ] 

Earwin Burrfoot commented on LUCENE-2471:
-----------------------------------------

Ahem. Why did you remove them? :)
The point was to have default impl on Directory and transferTo-optimized one on FSDirectory.

Ok, let's wait for your patch.



[jira] Updated: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2471:
--------------------------------

    Component/s: Store



[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869537#action_12869537 ] 

Earwin Burrfoot commented on LUCENE-2471:
-----------------------------------------

Ah. Actually there were two methods: one that copies an entire directory, and another that copies selected files.
The former is legacy :) It is only there so that back-compat-loving folk would accept the patch.



[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869558#action_12869558 ] 

Michael McCandless commented on LUCENE-2471:
--------------------------------------------

I agree: we should only expose the per-file copyTo, ie, the Directory shouldn't "own" the iteration through a collection of files; the caller can do that.



[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869586#action_12869586 ] 

Shai Erera commented on LUCENE-2471:
------------------------------------

LUCENE-2455 depends on this, and the changes are very minor. If we do this in a separate issue, it will block my progress on LUCENE-2455. Let me post a patch there today or tomorrow, and if we don't reach consensus on the change, I'll open a separate issue, or reopen LUCENE-2339?



[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870560#action_12870560 ] 

Michael McCandless commented on LUCENE-2471:
--------------------------------------------

I think this issue makes sense, separate from LUCENE-2455?  Ie this issue is for bulk copying when you have IndexInput/Output already open (I don't think LUCENE-2455 covers this?).  Whereas LUCENE-2455 is operating on file names...



[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869556#action_12869556 ] 

Shai Erera commented on LUCENE-2471:
------------------------------------

Yes, I'm aware of the two methods. The one which accepts a Collection of files is better, but it still didn't allow you to rename them in the process. And adding another Collection argument, and requiring that the two align, seemed unnecessary. So src.copy(dest, from, to) seemed to be enough.

Copying an entire Directory is used by Lucene code only in RAMDir when it's init'ed w/ a Directory. Besides that, it's not really clear when copying an entire Dir is useful. So the single file copy gives you as much flexibility as you need, and fewer chances of making crucial mistakes.



[jira] Updated: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2471:
---------------------------------------

    Fix Version/s: 3.1
                   4.0



[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869595#action_12869595 ] 

Earwin Burrfoot commented on LUCENE-2471:
-----------------------------------------

I actually suggested separating so this minor patch goes in without being blocked by your progress on LUCENE-2455 :)



[jira] Commented: (LUCENE-2471) Supporting bulk copies in Directory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869470#action_12869470 ] 

Shai Erera commented on LUCENE-2471:
------------------------------------

On LUCENE-1585 I'm already introducing a copy(Dir, File, File) which is overridden in FSDirectory to implement it using ByteBuffers (like you did on copyTo(Dir)). So which directories would benefit from that? RAM only (because NIO and MMap already use FSDir's impl)?

I'm generally +1 for adding such an API, just wondering who's the immediate consumer of it.
