Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2011/06/14 01:58:47 UTC

[jira] [Created] (LUCENE-3201) improved compound file handling

improved compound file handling
-------------------------------

                 Key: LUCENE-3201
                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Robert Muir


Currently CompoundFileReader could use some improvements. I see the following problems:
* its CSIndexInput extends BufferedIndexInput, which just adds a second layer of buffering for directories like MMapDirectory.
* it seeks on every readInternal().
* it's not possible for a directory to override or improve the handling of compound files.

For example, if you were implementing this from scratch, you would just wrap the IndexInput (II) directly (rather than extending BufferedIndexInput),
add the compound file offset X to seek() calls, and override length(). But then you couldn't always throw "read past EOF" when you should,
since a user could read into the next file and be left unaware.
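A minimal sketch of the "wrap it directly" approach, assuming hypothetical `RandomSource`, `ArraySource` and `SliceInput` types in place of Lucene's real IndexInput classes: the sub-file's start offset is added on every seek, `length()` reports the sub-file's size, and an explicit check at the slice end is what keeps a read from running into the next file.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical random-access view of the already-open compound file (not Lucene's IndexInput). */
interface RandomSource {
  long length();

  byte readByteAt(long pos) throws IOException;
}

/** Hypothetical: the whole compound file held in a byte array. */
final class ArraySource implements RandomSource {
  private final byte[] data;

  ArraySource(byte[] data) {
    this.data = data;
  }

  @Override
  public long length() {
    return data.length;
  }

  @Override
  public byte readByteAt(long pos) throws IOException {
    // The parent can only detect the end of the *whole* compound file.
    if (pos < 0 || pos >= data.length) {
      throw new EOFException("read past EOF of compound file");
    }
    return data[(int) pos];
  }
}

/** One sub-file read straight through its parent, without an extra buffer. */
final class SliceInput {
  private final RandomSource parent;
  private final long offset; // "offset X": where the sub-file starts in the compound file
  private final long length; // size of the sub-file
  private long absPos;       // current position, absolute within the compound file

  SliceInput(RandomSource parent, long offset, long length) {
    if (offset < 0 || length < 0 || offset + length > parent.length()) {
      throw new IllegalArgumentException("slice out of bounds");
    }
    this.parent = parent;
    this.offset = offset;
    this.length = length;
    this.absPos = offset;
  }

  /** Overridden length: the sub-file's size, not the compound file's. */
  long length() {
    return length;
  }

  long getFilePointer() {
    return absPos - offset;
  }

  void seek(long pos) throws IOException {
    if (pos < 0 || pos > length) {
      throw new EOFException("seek past EOF: " + pos);
    }
    absPos = offset + pos; // the compound-file offset is added on every seek
  }

  byte readByte() throws IOException {
    // Without this check a read at the slice end would silently return the first
    // byte of the *next* sub-file, because the parent does not know where this slice ends.
    if (absPos >= offset + length) {
      throw new EOFException("read past EOF");
    }
    return parent.readByteAt(absPos++);
  }
}

final class SliceDemo {
  public static void main(String[] args) throws IOException {
    byte[] cfs = {1, 2, 3, 4, 5, 6}; // sub-files at [0, 2) and [2, 6)
    SliceInput second = new SliceInput(new ArraySource(cfs), 2, 4);
    second.seek(3);
    System.out.println(second.readByte()); // prints 6, the sub-file's last byte
    try {
      second.readByte();
    } catch (EOFException e) {
      System.out.println("caught: " + e.getMessage()); // prints caught: read past EOF
    }
  }
}
```

The per-read comparison is the price of keeping EOF detection when reads bypass a buffer: the parent alone only knows where the whole compound file ends.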

However, some directories could handle this better. For example, MMapDirectory could return an IndexInput that simply maps the 'slice' of the CFS file.
Its underlying ByteBuffer already does bounds checks, so it wouldn't need to be buffered, and it wouldn't even need to add any offsets to seek(),
since its position would just work.
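For the MMapDirectory case, plain `java.nio` already provides this 'slice' behavior. This sketch uses a heap buffer from `ByteBuffer.wrap` as a stand-in for a mapped one (a real mapping would come from `FileChannel.map(FileChannel.MapMode.READ_ONLY, position, size)`, which can map just the sub-file's range); `BufferSlices` is a hypothetical helper name. Position 0 of the slice is the first byte of the sub-file, so seeking needs no offset arithmetic, and reading past the slice's limit throws `BufferUnderflowException` by itself.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

/** Hypothetical helper: views one sub-file of a (mapped) compound-file buffer. */
final class BufferSlices {
  /** Returns a view of bytes [offset, offset + length) that shares content with {@code whole}. */
  static ByteBuffer slice(ByteBuffer whole, int offset, int length) {
    ByteBuffer dup = whole.duplicate(); // own position/limit, so the caller's buffer is untouched
    dup.limit(offset + length);         // set the limit first so the new position never exceeds it
    dup.position(offset);
    return dup.slice();                 // index 0 == first byte of the sub-file, capacity == length
  }

  public static void main(String[] args) {
    ByteBuffer cfs = ByteBuffer.wrap(new byte[] {10, 20, 30, 40, 50, 60}); // stand-in for a mapped CFS
    ByteBuffer sub = slice(cfs, 2, 3); // sub-file = {30, 40, 50}
    sub.position(1);                   // "seek(1)": no offset added
    System.out.println(sub.get());     // prints 40
    sub.position(sub.limit());         // seek to the sub-file's EOF
    try {
      sub.get();
    } catch (BufferUnderflowException e) {
      System.out.println("read past EOF"); // the buffer did the bounds check for us
    }
  }
}
```

ByteBuffer positions are `int`, so this sketch ignores sub-files larger than 2 GB.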

So I think we should refactor this so that a Directory can customize how compound files are handled. The simplest
case, with the least code change, would be to add this to Directory.java:

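{code}
  // Default: read compound files through the generic CompoundFileReader;
  // a subclass (e.g. MMapDirectory) could override this with something better.
  public Directory openCompoundInput(String filename) throws IOException {
    return new CompoundFileReader(this, filename);
  }
{code}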

Most code depends on compound files being implemented as a Directory and being transparent, so this keeps that intact, and at least a subclass could then override it...
but the 'recursion' is a little ugly. We could still label it expert/internal/experimental or whatever.
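The proposed hook is a plain template method. A sketch with hypothetical, stripped-down classes (`Dir`, `GenericCompoundReader`, `MappedCompoundReader`, `PlainDir`, `MMapLikeDir`; none of these are Lucene's real API): the base class returns the generic reader, and only a directory that can do better overrides it. Because the returned reader is itself a `Dir`, the 'recursion' shows up directly in the signature.

```java
/** Hypothetical, stripped-down stand-in for Lucene's Directory (not its real API). */
abstract class Dir {
  abstract String name();

  /** The hook: by default, compound files are read through the generic reader, itself a Dir. */
  Dir openCompoundInput(String fileName) {
    return new GenericCompoundReader(this, fileName);
  }
}

/** Hypothetical generic reader: would buffer reads and seek the parent on every refill. */
final class GenericCompoundReader extends Dir {
  private final Dir parent;
  private final String fileName;

  GenericCompoundReader(Dir parent, String fileName) {
    this.parent = parent;
    this.fileName = fileName;
  }

  @Override
  String name() {
    return "generic(" + fileName + " in " + parent.name() + ")";
  }
}

/** Hypothetical reader that would hand out slices of one mapped buffer. */
final class MappedCompoundReader extends Dir {
  private final String fileName;

  MappedCompoundReader(String fileName) {
    this.fileName = fileName;
  }

  @Override
  String name() {
    return "mapped(" + fileName + ")";
  }
}

/** Keeps the default behavior. */
class PlainDir extends Dir {
  @Override
  String name() {
    return "plain";
  }
}

/** Overrides the hook with its specialized compound handling. */
class MMapLikeDir extends Dir {
  @Override
  String name() {
    return "mmap";
  }

  @Override
  Dir openCompoundInput(String fileName) {
    return new MappedCompoundReader(fileName);
  }
}

final class HookDemo {
  public static void main(String[] args) {
    System.out.println(new PlainDir().openCompoundInput("_0.cfs").name());    // prints generic(_0.cfs in plain)
    System.out.println(new MMapLikeDir().openCompoundInput("_0.cfs").name()); // prints mapped(_0.cfs)
  }
}
```

Callers keep treating the result as an ordinary directory, which is why the change stays small.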


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049287#comment-13049287 ] 

Michael McCandless commented on LUCENE-3201:
--------------------------------------------

Patch looks great!  Incredible that this means there's no penalty at all at search time when using CFS, if you use MMapDir.

I like that the CFS reader is now under oal.store, not oal.index.

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-3201.patch
>
>


[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052156#comment-13052156 ] 

Simon Willnauer commented on LUCENE-3201:
-----------------------------------------

This seems ready to commit... I think we should get it in so I can take it further in LUCENE-3218.

Robert, is it OK with you if I commit this, or are you going to do it?

simon

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>


[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049346#comment-13049346 ] 

Robert Muir commented on LUCENE-3201:
-------------------------------------

I agree, FileSwitchDirectory should delegate openCompoundInput().

As for mapping small things, I think we should set that aside for another issue.
For this issue, I don't mind returning the DefaultCompound impl if unmapping isn't supported, but I'd really rather defer opening the can of worms of 'mapping small things' to some other issue :)


> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3201.patch
>
>


[jira] [Updated] (LUCENE-3201) improved compound file handling

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3201:
--------------------------------

    Fix Version/s:     (was: 3.3)
                   3.4

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>


[jira] [Updated] (LUCENE-3201) improved compound file handling

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3201:
--------------------------------

    Attachment: LUCENE-3201.patch

Initial patch for review. In this patch I only cut MMapDirectory over to a special CompoundFileDirectory; all other directories use the default as before (though I cleaned up some things in it).

I'm pretty sure I can easily improve SimpleFS and NIOFS too. I'll take a look at that now, but I wanted to get this up for review first.


> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-3201.patch
>
>


[jira] [Updated] (LUCENE-3201) improved compound file handling

Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3201:
--------------------------------

    Priority: Minor  (was: Blocker)

Not a blocker: it was pulled from 3.x (and fixed in trunk).
                
> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 3.5, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>


[jira] [Updated] (LUCENE-3201) improved compound file handling

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3201:
--------------------------------

    Priority: Blocker  (was: Major)

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>            Priority: Blocker
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>


[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048912#comment-13048912 ] 

Robert Muir commented on LUCENE-3201:
-------------------------------------

For this one, I think I'd prefer to wait for Uwe's refactoring of MMap in LUCENE-3200.
Then mmap is simpler, and I think we can even use the same IndexInput implementation here.

This would mean no slowdown when searching CFS.


> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>


[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049335#comment-13049335 ] 

Uwe Schindler commented on LUCENE-3201:
---------------------------------------

Hi Robert, great patch, exactly as I wished it would be when we discussed it!

Patch looks fine, one small bug:
- FileSwitchDirectory should also override openCompoundInput() from Directory and delegate to the correct underlying directory. Currently it always uses the default impl, which means double buffering. So if you, e.g., use MMapDirectory as the delegate for CFS files, those files would be opened just as before your patch. Just copy'n'paste the code from one of the other FileSwitchDirectory methods.
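FileSwitchDirectory sends each file to a primary or secondary directory based on its extension, so the fix amounts to routing `openCompoundInput()` the same way as every other per-file method. A sketch with hypothetical `CompoundOpener` and `SwitchingDir` types, where a returned String stands in for the opened compound directory:

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical: something that can open a compound file (the String stands in for the result). */
interface CompoundOpener {
  String openCompoundInput(String fileName);
}

/** Hypothetical sketch of extension-based switching, in the spirit of FileSwitchDirectory. */
final class SwitchingDir implements CompoundOpener {
  private final Set<String> primaryExtensions;
  private final CompoundOpener primary;
  private final CompoundOpener secondary;

  SwitchingDir(Set<String> primaryExtensions, CompoundOpener primary, CompoundOpener secondary) {
    this.primaryExtensions = primaryExtensions;
    this.primary = primary;
    this.secondary = secondary;
  }

  /** "_0.cfs" -> "cfs"; names without a dot have an empty extension. */
  static String extension(String fileName) {
    int dot = fileName.lastIndexOf('.');
    return dot == -1 ? "" : fileName.substring(dot + 1);
  }

  private CompoundOpener directoryFor(String fileName) {
    return primaryExtensions.contains(extension(fileName)) ? primary : secondary;
  }

  @Override
  public String openCompoundInput(String fileName) {
    // Delegate like every other per-file method; inheriting a generic default here would
    // ignore the delegate's own (e.g. mmap-based) compound handling.
    return directoryFor(fileName).openCompoundInput(fileName);
  }

  public static void main(String[] args) {
    SwitchingDir dir = new SwitchingDir(
        Collections.singleton("cfs"), name -> "mmap:" + name, name -> "niofs:" + name);
    System.out.println(dir.openCompoundInput("_0.cfs")); // prints mmap:_0.cfs
  }
}
```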

Some suggestions:
We currently map the whole compound file into the address space, read the header/contents, and unmap it again. This may add some overhead, especially if unmapping is not supported.
- We could use SimpleFSIndexInput to read the CFS contents (we only need to pass it the already-open RAF; alternatively, use Dawid's new wrapper IndexInput around a standard InputStream obtained from the RAF -> LUCENE-3202)
- Only map the header of the CFS file; the problem is that we don't know its exact size.
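Reading the table of contents through a plain stream (the first suggestion) also sidesteps the second problem: the reader consumes only as many header bytes as it needs, so the header size never has to be known in advance. The sketch below assumes a simplified, hypothetical layout (an `int` entry count, then one `long` offset and `writeUTF` name per entry, then the data) written and read with `DataOutputStream`/`DataInputStream`. It only illustrates deriving each entry's length from the next offset (or the file length); it is not Lucene's exact on-disk encoding, and `CfsToc` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical compound-file table of contents in a simplified layout (not Lucene's format). */
final class CfsToc {
  /** Where one sub-file lives inside the compound file. */
  static final class Entry {
    final long offset;
    final long length;

    Entry(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /** Reads just the header from a stream; no need to know how large the header is up front. */
  static Map<String, Entry> readToc(DataInputStream in, long fileLength) throws IOException {
    int count = in.readInt();
    String[] names = new String[count];
    long[] offsets = new long[count];
    for (int i = 0; i < count; i++) {
      offsets[i] = in.readLong();
      names[i] = in.readUTF();
    }
    Map<String, Entry> toc = new LinkedHashMap<>();
    for (int i = 0; i < count; i++) {
      // Each length is the gap to the next offset; the last entry runs to the end of the file.
      long end = (i + 1 < count) ? offsets[i + 1] : fileLength;
      toc.put(names[i], new Entry(offsets[i], end - offsets[i]));
    }
    return toc;
  }

  /** Serializes the header for the given names and offsets. */
  private static byte[] header(String[] names, long[] offsets) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(names.length);
    for (int i = 0; i < names.length; i++) {
      out.writeLong(offsets[i]);
      out.writeUTF(names[i]);
    }
    out.flush();
    return bytes.toByteArray();
  }

  /** Builds a small compound file in the same hypothetical layout, for the demo. */
  static byte[] write(String[] names, byte[][] contents) throws IOException {
    // Offsets are fixed-width longs, so a dummy header has the same size as the real one.
    long pos = header(names, new long[names.length]).length;
    long[] offsets = new long[names.length];
    for (int i = 0; i < names.length; i++) {
      offsets[i] = pos;
      pos += contents[i].length;
    }
    ByteArrayOutputStream file = new ByteArrayOutputStream();
    file.write(header(names, offsets));
    for (byte[] c : contents) {
      file.write(c);
    }
    return file.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] cfs = write(new String[] {"_0.fdt", "_0.tis"}, new byte[][] {{1, 2, 3}, {4, 5}});
    Map<String, Entry> toc = readToc(new DataInputStream(new ByteArrayInputStream(cfs)), cfs.length);
    // prints "_0.fdt offset=36 length=3" then "_0.tis offset=39 length=2"
    for (Map.Entry<String, Entry> e : toc.entrySet()) {
      System.out.println(e.getKey() + " offset=" + e.getValue().offset + " length=" + e.getValue().length);
    }
  }
}
```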

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3201.patch
>
>


[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048906#comment-13048906 ] 

Michael McCandless commented on LUCENE-3201:
--------------------------------------------

+1

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>


[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052159#comment-13052159 ] 

Robert Muir commented on LUCENE-3201:
-------------------------------------

I didn't commit because my measurements didn't show any performance improvement from the patch (this frustrated me).
Also, I didn't address Uwe's last comment...

In general, I thought this would be a good performance win, but it isn't. So we should consider it from a refactoring perspective only.


> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>

[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088168#comment-13088168 ] 

Uwe Schindler commented on LUCENE-3201:
---------------------------------------

During code review I found a problem in the MMap-specific handling regarding the number of open files:

The default CFS Reader opens one file handle for the CFS and then maps slices using CFIndexInput. On the other hand, MMap's CFS directory impl does a separate mapping for each slice. To map a slice, it opens a new file handle, mmaps the slice, and closes the file handle.

The question now is: will this file handle remain occupied until the mapping disappears? If so, we could hit TooManyOpenFiles even with CFS, as each sub-file would occupy one file handle. At the very least, the MMap-specific CFS reader should reuse the same RAF the whole time and keep it open for mapping.
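The "one RAF for all slices" approach suggested above can be sketched with plain java.nio: a single `RandomAccessFile` stays open for the lifetime of the compound reader, and every slice is mapped through its one channel. This is an illustration under assumptions, not the MMapDirectory code in the patch; `SharedHandleSlices` and `openSlice` are hypothetical names. The `FileChannel.map` Javadoc states that a mapping, once established, does not depend on the channel used to create it, and that closing the channel has no effect on the mapping's validity. What the OS keeps per mapping is platform-specific and is not measured here.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical compound-file reader that maps every slice through one shared handle. */
class SharedHandleSlices implements Closeable {
  private final RandomAccessFile raf; // opened once, kept open for all slice mappings

  SharedHandleSlices(Path cfs) throws IOException {
    this.raf = new RandomAccessFile(cfs.toFile(), "r");
  }

  /** Maps bytes [offset, offset + length) of the compound file; no new file handle is opened. */
  MappedByteBuffer openSlice(long offset, long length) throws IOException {
    // getChannel() returns the same unique channel for this RandomAccessFile on every call
    return raf.getChannel().map(FileChannel.MapMode.READ_ONLY, offset, length);
  }

  @Override
  public void close() throws IOException {
    raf.close(); // also closes the channel; existing mappings remain valid
  }

  public static void main(String[] args) throws IOException {
    Path cfs = Files.createTempFile("demo", ".cfs");
    cfs.toFile().deleteOnExit(); // best effort: some platforms refuse to delete a mapped file
    Files.write(cfs, new byte[] {1, 2, 3, 4, 10, 11, 12});
    try (SharedHandleSlices reader = new SharedHandleSlices(cfs)) {
      MappedByteBuffer a = reader.openSlice(0, 4);
      MappedByteBuffer b = reader.openSlice(4, 3);
      System.out.println(a.get(3) + " " + b.get(0)); // 4 10
    }
  }
}
```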

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>            Priority: Blocker
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>

[jira] [Updated] (LUCENE-3201) improved compound file handling

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3201:
----------------------------------

    Comment: was deleted

(was: During code review I found a problem in the MMap-specific handling regarding the number of open files:

The default CFS Reader opens one file handle for the CFS and then maps slices using CFIndexInput. On the other hand, MMap's CFS directory impl does a separate mapping for each slice. To map a slice, it opens a new file handle, mmaps the slice, and closes the file handle.

The question now is: will this file handle remain occupied until the mapping disappears? If so, we could hit TooManyOpenFiles even with CFS, as each sub-file would occupy one file handle. At the very least, the MMap-specific CFS reader should reuse the same RAF the whole time and keep it open for mapping.)

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>            Priority: Blocker
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>

[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088170#comment-13088170 ] 

Robert Muir commented on LUCENE-3201:
-------------------------------------

That's just not true... but it illustrates my point that this stuff is complicated, and I think we need to take the safe option here and back it out.

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>            Priority: Blocker
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>

[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088169#comment-13088169 ] 

Uwe Schindler commented on LUCENE-3201:
---------------------------------------

My last comment was wrong; the impl was changed before commit so that it reuses the RAF.

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>            Priority: Blocker
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>

[jira] [Updated] (LUCENE-3201) improved compound file handling

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3201:
--------------------------------

    Attachment: LUCENE-3201.patch

Here is an updated patch, including impls for SimpleFS and NIOFS, a fix for the FileSwitchDirectory issue Uwe mentioned, and also MockDirectoryWrapper and NRTCachingDirectory.

All the tests pass with Simple/NIO/MMap, but we need to benchmark. I haven't had good luck with luceneutil today.
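Unlike MMap, SimpleFS and NIOFS read through a file handle, so a compound sub-file has to add its base offset to every read and enforce its own length, as the issue description points out. A minimal sketch using `FileChannel`'s positional `read(ByteBuffer, long)`; `PositionalSlice` and its methods are hypothetical names loosely modeled on IndexInput's `seek`/`length`/`readBytes`, not the code in the patch.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical read-only view of one sub-file inside a compound file (not Lucene's class). */
class PositionalSlice {
  private final FileChannel channel; // one channel shared by every sub-file of the compound file
  private final long base;           // start of this sub-file inside the compound file
  private final long length;         // length of this sub-file
  private long pos;                  // current position, relative to the sub-file

  PositionalSlice(FileChannel channel, long base, long length) {
    this.channel = channel;
    this.base = base;
    this.length = length;
  }

  long length() {
    return length; // the sub-file's length, not the compound file's
  }

  void seek(long p) {
    pos = p; // no I/O: the base offset is added at read time
  }

  void readBytes(byte[] dst, int off, int len) throws IOException {
    // Enforce the sub-file's own EOF so a read cannot run into the next sub-file.
    if (pos + len > length) {
      throw new EOFException("read past EOF: pos=" + pos + " len=" + len + " length=" + length);
    }
    ByteBuffer bb = ByteBuffer.wrap(dst, off, len);
    while (bb.hasRemaining()) {
      // Positional read: does not move the shared channel's own position.
      int n = channel.read(bb, base + pos + (bb.position() - off));
      if (n < 0) {
        throw new EOFException("underlying compound file ended early");
      }
    }
    pos += len;
  }

  public static void main(String[] args) throws IOException {
    Path cfs = Files.createTempFile("demo", ".cfs");
    cfs.toFile().deleteOnExit();
    Files.write(cfs, new byte[] {1, 2, 3, 4, 10, 11, 12}); // sub-file B = bytes 4..6
    try (FileChannel ch = FileChannel.open(cfs, StandardOpenOption.READ)) {
      PositionalSlice b = new PositionalSlice(ch, 4, 3);
      byte[] buf = new byte[2];
      b.seek(1);
      b.readBytes(buf, 0, 2);
      System.out.println(buf[0] + " " + buf[1]); // 11 12
      try {
        b.readBytes(buf, 0, 1);
      } catch (EOFException e) {
        System.out.println("read past EOF");
      }
    }
  }
}
```

Positional reads avoid the "seeks on every readInternal" problem from the description, since no shared file pointer has to be moved; the cost is that the slice must do its own EOF check.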

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>

[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Simon Willnauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118988#comment-13118988 ] 

Simon Willnauer commented on LUCENE-3201:
-----------------------------------------

I think we can close this issue unless we plan to backport the CFS changes to 3.x? Opinions?
                
> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>            Priority: Blocker
>             Fix For: 3.5, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>

[jira] [Reopened] (LUCENE-3201) improved compound file handling

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reopened LUCENE-3201:
---------------------------------


Reopening: as with LUCENE-3218, I think we should pull this stuff back and revisit it.

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>

[jira] [Issue Comment Edited] (LUCENE-3201) improved compound file handling

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049460#comment-13049460 ] 

Uwe Schindler edited comment on LUCENE-3201 at 6/14/11 10:07 PM:
-----------------------------------------------------------------

Robert: Very nice. Small thing:

- NIOFSCompoundFileDirectory / SimpleFSCompoundFileDirectory / MMapCompoundFileDirectory are non-static inner classes but still take the parent Directory in their ctor. This is duplicated, as javac already passes the parent around (the special ParentClassName.this reference). I would remove the ctor param and use "*FSDirectory.this" to reference the outer class. I'm nitpicking because in some places the code references the parent directory without the ctor param, so it's inconsistent.

That's all for now, thanks for the hard work!

      was (Author: thetaphi):
    Robert: Very nice. Small thing:

- NIOFSCompoundFileDirectory / SimpleFSCompoundFileDirectory are non-static inner classes but still take the parent Directory in their ctor. This is duplicated, as javac already passes the parent around (the special ParentClassName.this reference). I would remove the ctor param and use "Simple/NIO-FSDirectory.this" to reference the outer class. I'm nitpicking because in some places the code references the parent directory without the ctor param, so it's inconsistent.

That's all for now, thanks for the hard work!
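Uwe's point is that a non-static inner class already carries a reference to its enclosing instance, so passing the parent Directory into the constructor duplicates it. A minimal sketch of the suggested style; `DemoFSDirectory` and `CompoundView` are hypothetical stand-ins, not Lucene's FSDirectory or CompoundFileDirectory classes.

```java
/** Hypothetical stand-in for an FSDirectory (not Lucene's class). */
class DemoFSDirectory {
  private final String path;

  DemoFSDirectory(String path) {
    this.path = path;
  }

  String path() {
    return path;
  }

  CompoundView openCompound(String name) {
    return new CompoundView(name); // javac passes "this" as the enclosing instance implicitly
  }

  /** Non-static inner class: reaches its directory via DemoFSDirectory.this, not a ctor param. */
  class CompoundView {
    private final String name;

    CompoundView(String name) { // no redundant parent-directory parameter
      this.name = name;
    }

    DemoFSDirectory parent() {
      return DemoFSDirectory.this;
    }

    String describe() {
      return DemoFSDirectory.this.path() + "/" + name;
    }
  }

  public static void main(String[] args) {
    DemoFSDirectory dir = new DemoFSDirectory("/index");
    CompoundView cfs = dir.openCompound("_0.cfs");
    System.out.println(cfs.describe());      // /index/_0.cfs
    System.out.println(cfs.parent() == dir); // true
  }
}
```

Every reference to the parent then goes through one path (`DemoFSDirectory.this`), which removes the inconsistency Uwe mentions between code using the ctor param and code using the outer instance.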
  
> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>

[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088172#comment-13088172 ] 

Uwe Schindler commented on LUCENE-3201:
---------------------------------------

I reverted my comment :-)

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>            Priority: Blocker
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>

[jira] [Resolved] (LUCENE-3201) improved compound file handling

Posted by "Simon Willnauer (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-3201.
-------------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 3.5)

I am closing this... if we feel like backporting, we can still reopen it.
                
> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>
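The issue description contrasts two ways to read one sub-file out of a compound file. The first wraps the underlying input, adds the sub-file's offset on every access, overrides length(), and still checks bounds so a reader cannot silently run into the next file. The second, which an mmap-based directory could use, slices the mapped buffer so the buffer enforces the bounds itself. Below is a minimal, hypothetical Java sketch of both ideas. `CompoundSliceSketch`, `OffsetSlice` and `mmapStyleSlice` are illustrative names, not Lucene's API, and a plain `ByteBuffer` stands in for the underlying IndexInput.

```java
import java.io.EOFException;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

/** Hypothetical sketch (not Lucene's API): reading one sub-file out of a compound file. */
public class CompoundSliceSketch {

  /** Approach 1: wrap the whole compound input, translate positions by an offset, check bounds. */
  static final class OffsetSlice {
    private final ByteBuffer whole; // stands in for the underlying compound-file input
    private final long offset;      // start of the sub-file inside the compound file
    private final long length;      // sub-file length, reported by length()
    private long pos;               // position relative to the start of the sub-file

    OffsetSlice(ByteBuffer whole, long offset, long length) {
      this.whole = whole;
      this.offset = offset;
      this.length = length;
    }

    long length() {
      return length;
    }

    void seek(long p) throws EOFException {
      if (p < 0 || p > length) throw new EOFException("seek past EOF: " + p);
      pos = p; // the compound-file offset is added when reading
    }

    byte readByte() throws EOFException {
      // Without this check a caller could silently read into the next sub-file.
      if (pos >= length) throw new EOFException("read past EOF");
      return whole.get((int) (offset + pos++));
    }
  }

  /** Approach 2: slice the buffer so it enforces the bounds itself (mmap-style). */
  static ByteBuffer mmapStyleSlice(ByteBuffer whole, int offset, int length) {
    ByteBuffer dup = whole.duplicate(); // shared content, independent position/limit
    dup.position(offset);
    dup.limit(offset + length);
    return dup.slice(); // index 0 of the slice is `offset` in the compound file
  }

  public static void main(String[] args) {
    // Toy compound file: sub-file A = {1, 2, 3}, sub-file B = {4, 5}.
    ByteBuffer cfs = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5});

    OffsetSlice b = new OffsetSlice(cfs, 3, 2);
    try {
      b.seek(1);
      System.out.println("B[1] = " + b.readByte());
      b.readByte(); // one past the end of B
    } catch (EOFException e) {
      System.out.println("B: " + e.getMessage());
    }

    ByteBuffer a = mmapStyleSlice(cfs, 0, 3);
    a.position(3); // end of A
    try {
      a.get();
    } catch (BufferUnderflowException e) {
      System.out.println("A: bounds enforced by the ByteBuffer slice");
    }
  }
}
```

In the sliced variant no offset arithmetic or explicit EOF check is needed, because position 0 of the slice already corresponds to the sub-file's start. That is the property the description attributes to an mmap-based approach.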

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



[jira] [Resolved] (LUCENE-3201) improved compound file handling

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-3201.
-------------------------------------

    Resolution: Fixed
      Assignee: Simon Willnauer

Incorporated in LUCENE-3218; I will track backporting there.

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>



[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049352#comment-13049352 ] 

Uwe Schindler commented on LUCENE-3201:
---------------------------------------

We have LUCENE-1743 for the small files can of worms.

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3201.patch
>
>



[jira] [Commented] (LUCENE-3201) improved compound file handling

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049460#comment-13049460 ] 

Uwe Schindler commented on LUCENE-3201:
---------------------------------------

Robert: Very nice. Small thing:

- NIOFSCompoundFileDirectory / SimpleFSCompoundFileDirectory are non-static inner classes but still take the parent Directory as a constructor parameter. This is redundant, because javac already passes the enclosing instance implicitly (the special ParentClassName.this reference). I would remove the constructor parameter and use "SimpleFSDirectory.this" / "NIOFSDirectory.this" to reference the outer class. I nitpick because some places already reference the parent directory without the constructor parameter, so it's inconsistent.
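To illustrate the suggestion, here is a hypothetical sketch (not the actual patch). A non-static inner class can reach its enclosing instance through the qualified `Outer.this` expression, so no separate "parent directory" constructor parameter is needed. `FSDirectorySketch`, `CompoundFileDirectorySketch`, `openCompound` and `path` are illustrative names only.

```java
/** Hypothetical sketch of the review suggestion; class names are illustrative, not Lucene's. */
public class FSDirectorySketch {
  private final String path; // stands in for the directory's own state

  public FSDirectorySketch(String path) {
    this.path = path;
  }

  /** Non-static inner class: javac passes the enclosing FSDirectorySketch implicitly. */
  public class CompoundFileDirectorySketch {
    private final String fileName;

    // No redundant "parent directory" constructor parameter.
    CompoundFileDirectorySketch(String fileName) {
      this.fileName = fileName;
    }

    String describe() {
      // Always reach the outer directory the same way: via the qualified this.
      return FSDirectorySketch.this.path + "/" + fileName;
    }

    FSDirectorySketch parent() {
      return FSDirectorySketch.this;
    }
  }

  public CompoundFileDirectorySketch openCompound(String fileName) {
    return new CompoundFileDirectorySketch(fileName); // enclosing instance is `this`
  }

  public static void main(String[] args) {
    FSDirectorySketch dir = new FSDirectorySketch("/index");
    CompoundFileDirectorySketch cfs = dir.openCompound("_0.cfs");
    System.out.println(cfs.describe());
    System.out.println(cfs.parent() == dir);
  }
}
```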

That's all for now, thanks for the hard work!

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>



[jira] [Updated] (LUCENE-3201) improved compound file handling

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3201:
---------------------------------------

    Fix Version/s:     (was: 3.4)
                   3.5

> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Simon Willnauer
>            Priority: Blocker
>             Fix For: 3.5, 4.0
>
>         Attachments: LUCENE-3201.patch, LUCENE-3201.patch
>
>



[jira] [Updated] (LUCENE-3201) improved compound file handling

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3201:
--------------------------------

    Fix Version/s: 4.0
                   3.3

Setting 3.3/4.0 as fix versions, as the changes are backwards compatible (CompoundFileReader is still package-private in 3.x).


> improved compound file handling
> -------------------------------
>
>                 Key: LUCENE-3201
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3201
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 3.3, 4.0
>
>         Attachments: LUCENE-3201.patch
>
>
