You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2010/07/28 21:17:15 UTC

[jira] Created: (LUCENE-2574) Optimize copies between IndexInput and Output

Optimize copies between IndexInput and Output
---------------------------------------------

                 Key: LUCENE-2574
                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Store
            Reporter: Shai Erera
            Assignee: Shai Erera
             Fix For: 3.1, 4.0


We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.

FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.

If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.

If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Reopened: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reopened LUCENE-2574:
----------------------------------------


> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895206#action_12895206 ] 

Shai Erera commented on LUCENE-2574:
------------------------------------

bq. Better to finish drinking coffee in the morning before thinking too much...

"Better to finish drinking coffee in the morning" ... period ! :)

Ok so I'll commit this. Thanks !

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2574:
-------------------------------

    Attachment: LUCENE-2574.patch

Patch adds copyBytes to IndexInput and overrides in RAM and FS classes. Also optimizes copyBytes in RAMOutputStream. I'd appreciate if someone can review this. Changes are on 3x

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894936#action_12894936 ] 

Michael McCandless commented on LUCENE-2574:
--------------------------------------------

OK thanks Shai!  I'll stress test that, and I'll also add the seek in CSIndexInput.copyBytes.

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895199#action_12895199 ] 

Shai Erera commented on LUCENE-2574:
------------------------------------

I've copied it from readInternal:

{code}
          long start = getFilePointer();
          if(start + len > length)
            throw new IOException("read past EOF");
          base.seek(fileOffset + start);
{code}

I think it's ok. fileOffset is where this file starts and length is its length. You want to know if you have enough bytes from 'start', no?

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894858#action_12894858 ] 

Michael McCandless commented on LUCENE-2574:
--------------------------------------------

It was pretty fast to track down once I had a randomized test case showing it...

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894934#action_12894934 ] 

Shai Erera commented on LUCENE-2574:
------------------------------------

Found the problem -- the order of the flushes in FSIndexOutput.copyBytes was wrong. flush() first emptied FSIndexOutput's buffer, but flushBuffer() filled it back. I reversed the calls and the test passes.

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera resolved LUCENE-2574.
--------------------------------

    Resolution: Fixed

Committed revision 980654 (3x).
Committed revision 980656 (trunk).

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Reopened: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reopened LUCENE-2574:
----------------------------------------


This optimization causes index corruption, to at least stored fields.

I've been iterating w/ Karl Wright (see dev thread subject "busywait hang using extracting update handler on trunk"), and tracked it down to a missing seek call in the FSIndexInput.copyBytes method.

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895147#action_12895147 ] 

Shai Erera commented on LUCENE-2574:
------------------------------------

Mike, did you add the seek to CSIndexInput? I don't see it in the commit log.

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895181#action_12895181 ] 

Shai Erera commented on LUCENE-2574:
------------------------------------

Seems like it was left out. I'd like to add this fix, could you please review:

{code}
Index: lucene/src/java/org/apache/lucene/index/CompoundFileReader.java
===================================================================
--- lucene/src/java/org/apache/lucene/index/CompoundFileReader.java	(revision 982137)
+++ lucene/src/java/org/apache/lucene/index/CompoundFileReader.java	(working copy)
@@ -310,6 +310,11 @@
           // If there are more bytes left to copy, delegate the copy task to the
           // base IndexInput, in case it can do an optimized copy.
           if (numBytes > 0) {
+            long start = getFilePointer();
+            if (start + numBytes > length) {
+              throw new IOException("read past EOF");
+            }
+            base.seek(fileOffset + start);
             base.copyBytes(out, numBytes);
           }
         }
{code}

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2574:
---------------------------------------

    Attachment: LUCENE-2574.patch

Attached patch that shows that we still have a bug, here.

The patch adds deletions to the randomized test (and disables all other tests in TIW), and sets a seed that fails quickly.

I'm not sure where the bug is yet...

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2574.
----------------------------------------

    Resolution: Fixed

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893713#action_12893713 ] 

Michael McCandless commented on LUCENE-2574:
--------------------------------------------

Ahhh you're right, OK, excellent!

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894830#action_12894830 ] 

Shai Erera commented on LUCENE-2574:
------------------------------------

Good catch Mike !

Calling getChannel() on the first time positions the FileChannel properly - it is the latter times that this seek is needed for. I hope this didn't cause you too much headache :).

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893698#action_12893698 ] 

Michael McCandless commented on LUCENE-2574:
--------------------------------------------

Patch looks good Shai!

It looks like NIOFSIndexInput.copyBytes doesn't have an optimized impl (it's currently the default dir on non-Windows)?

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-2574.
----------------------------------------

    Resolution: Fixed

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895191#action_12895191 ] 

Michael McCandless commented on LUCENE-2574:
--------------------------------------------

Woops I did miss this.

Shouldn't the if statement be fileOffset + numBytes > length instead?

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894926#action_12894926 ] 

Shai Erera commented on LUCENE-2574:
------------------------------------

Weird .. I've found out that if I disable this call:

      // flush any bytes in the input's buffer.
      numBytes -= fsInput.flushBuffer(this, numBytes);

from FSIndexOutput.copyBytes, then the test passes.

But I'm not yet sure why ... I've disabled everything else (IndexInput.copyBytes). It's as if this call copies bytes that should not have been copied. I've added this change because I thought there is a bug in the current copyBytes impl: it uses FileChannel to do the optimized copy, but since SimpleFSIndexInput extends BufferedIndexInput, there might be bytes read to its buffer, and not written yet ...

I've made the following changes:
FSIndexOutput.copyBytes:
{code}
      SimpleFSIndexInput fsInput = (SimpleFSIndexInput) input;

// change start
//      // flush any bytes in the input's buffer.
//      numBytes -= fsInput.flushBuffer(this, numBytes);
      
      // flush any bytes in our buffer
      flush();
// change end

      // do the optimized copy
      FileChannel in = fsInput.file.getChannel();
{code}

and rewrote BufferedIndexInput.flushBuffer:

{code}
    int toCopy = bufferLength - bufferPosition;
    if (toCopy < numBytes) {
      // We're copying the entire content of the buffer, so update accordingly.
      out.writeBytes(buffer, bufferPosition, toCopy);
      bufferPosition = bufferLength = 0;
      bufferStart += toCopy;
    } else {
      toCopy = (int) numBytes;
      // We are asked to copy less bytes than are available in the buffer.
      out.writeBytes(buffer, bufferPosition, toCopy);
      bufferPosition += toCopy;
    }
    return toCopy;
{code}

The test now fails on some other exception (not the assert in addRawDocuments). If I remove the call to flushBuffer, it passes. I still need to understand this.

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2574:
---------------------------------------

    Attachment: LUCENE-2574.patch

Adds a seek to the input FileChannel.

The randomized testcase in TestIW fails (w/o the fix) if you run it enough times.

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893710#action_12893710 ] 

Shai Erera commented on LUCENE-2574:
------------------------------------

I think it borrows the impl from SimpleFSIndexInput, which it overrides? Their impl is the same - both use FileChannel.

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895203#action_12895203 ] 

Michael McCandless commented on LUCENE-2574:
--------------------------------------------

Duh sorry you are correct!  fileOffset is the "base" of this file within the CFS.

Better to finish drinking coffee in the morning before thinking too much...

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2574) Optimize copies between IndexInput and Output

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895214#action_12895214 ] 

Shai Erera commented on LUCENE-2574:
------------------------------------

Committed to 3x and trunk. Hopefully the last time :).

> Optimize copies between IndexInput and Output
> ---------------------------------------------
>
>                 Key: LUCENE-2574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2574.patch, LUCENE-2574.patch, LUCENE-2574.patch
>
>
> We've created an optimized copy of files from Directory to Directory. We've also optimized copyBytes recently. However, we're missing the opposite side of the copy - from IndexInput to Output. I'd like to mimic the FileChannel API by having copyTo on IndexInput and copyFrom on IndexOutput. That way, both sides can optimize the copy process, depending on the type of the IndexInput/Output that they need to copy to/from.
> FSIndexInput/Output can use FileChannel if the two are FS types. RAMInput/OutputStream can copy to/from the buffers directly, w/o going through intermediate ones. Actually, for RAMIn/Out this might be a big win, because it doesn't care about the type of IndexInput/Output given - it just needs to copy to its buffer directly.
> If we do this, I think we can consolidate all Dir.copy() impls down to one (in Directory), and rely on the In/Out ones to do the optimized copy. Plus, it will enable someone to do optimized copies between In/Out outside the scope of Directory.
> If this somehow turns out to be impossible, or won't make sense, then I'd like to optimize RAMDirectory.copy(Dir, src, dest) to not use an intermediate buffer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org