You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2006/11/02 20:03:16 UTC

[jira] Created: (LUCENE-705) CompoundFileWriter should pre-set its file length

CompoundFileWriter should pre-set its file length
-------------------------------------------------

                 Key: LUCENE-705
                 URL: http://issues.apache.org/jira/browse/LUCENE-705
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
    Affects Versions: 2.1
            Reporter: Michael McCandless
         Assigned To: Michael McCandless
            Priority: Minor


I've read that if you are writing a large file, it's best to pre-set
the size of the file in advance before you write all of its contents.
This in general minimizes fragmentation and improves IO performance
against the file in the future.

I think this makes sense (intuitively) but I haven't done any real
performance testing to verify.

Java has the java.io.File.setLength() method (since 1.2) for this.

We can easily fix CompoundFileWriter to call setLength() on the file
it's writing (and add setLength() method to IndexOutput).  The
CompoundFileWriter knows exactly how large its file will be.

Another good thing is: if you are going run out of disk space, then,
the setLength call should fail up front instead of failing when the
compound file is actually written.  This has two benefits: first, you
find out sooner that you will run out of disk space, and, second, you
don't fill up the disk down to 0 bytes left (always a frustrating
experience!).  Instead you leave what space was available
and throw an IOException.

My one hesitation here is: what if out there there exists a filesystem
that can't handle this call, and it throws an IOException on that
platform?  But this is balanced against possible easy-win improvement
in performance.

Does anyone have any feedback / thoughts / experience relevant to
this?


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-705) CompoundFileWriter should pre-set its file length

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564314#action_12564314 ] 

Otis Gospodnetic commented on LUCENE-705:
-----------------------------------------

Didn't find time to comment on this earlier.
Does this mean one will no longer be able to tell exactly how large the index really is (because some portion of some data files will actually be empty)?


> CompoundFileWriter should pre-set its file length
> -------------------------------------------------
>
>                 Key: LUCENE-705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-705
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-705.patch
>
>
> I've read that if you are writing a large file, it's best to pre-set
> the size of the file in advance before you write all of its contents.
> This in general minimizes fragmentation and improves IO performance
> against the file in the future.
> I think this makes sense (intuitively) but I haven't done any real
> performance testing to verify.
> Java has the java.io.File.setLength() method (since 1.2) for this.
> We can easily fix CompoundFileWriter to call setLength() on the file
> it's writing (and add setLength() method to IndexOutput).  The
> CompoundFileWriter knows exactly how large its file will be.
> Another good thing is: if you are going run out of disk space, then,
> the setLength call should fail up front instead of failing when the
> compound file is actually written.  This has two benefits: first, you
> find out sooner that you will run out of disk space, and, second, you
> don't fill up the disk down to 0 bytes left (always a frustrating
> experience!).  Instead you leave what space was available
> and throw an IOException.
> My one hesitation here is: what if out there there exists a filesystem
> that can't handle this call, and it throws an IOException on that
> platform?  But this is balanced against possible easy-win improvement
> in performance.
> Does anyone have any feedback / thoughts / experience relevant to
> this?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-705) CompoundFileWriter should pre-set its file length

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-705:
--------------------------------------

    Attachment: LUCENE-705.patch

Attached patch.  All tests pass on OS X 10.4, Linux 2.6.22, Windows
Server 2003.  I plan to commit in a day or two.

> CompoundFileWriter should pre-set its file length
> -------------------------------------------------
>
>                 Key: LUCENE-705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-705
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-705.patch
>
>
> I've read that if you are writing a large file, it's best to pre-set
> the size of the file in advance before you write all of its contents.
> This in general minimizes fragmentation and improves IO performance
> against the file in the future.
> I think this makes sense (intuitively) but I haven't done any real
> performance testing to verify.
> Java has the java.io.File.setLength() method (since 1.2) for this.
> We can easily fix CompoundFileWriter to call setLength() on the file
> it's writing (and add setLength() method to IndexOutput).  The
> CompoundFileWriter knows exactly how large its file will be.
> Another good thing is: if you are going run out of disk space, then,
> the setLength call should fail up front instead of failing when the
> compound file is actually written.  This has two benefits: first, you
> find out sooner that you will run out of disk space, and, second, you
> don't fill up the disk down to 0 bytes left (always a frustrating
> experience!).  Instead you leave what space was available
> and throw an IOException.
> My one hesitation here is: what if out there there exists a filesystem
> that can't handle this call, and it throws an IOException on that
> platform?  But this is balanced against possible easy-win improvement
> in performance.
> Does anyone have any feedback / thoughts / experience relevant to
> this?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-705) CompoundFileWriter should pre-set its file length

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558637#action_12558637 ] 

Michael McCandless commented on LUCENE-705:
-------------------------------------------

OK I'll test on the major platforms, and take that approach.  I'll tentatively target 2.4.

> CompoundFileWriter should pre-set its file length
> -------------------------------------------------
>
>                 Key: LUCENE-705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-705
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.4
>
>
> I've read that if you are writing a large file, it's best to pre-set
> the size of the file in advance before you write all of its contents.
> This in general minimizes fragmentation and improves IO performance
> against the file in the future.
> I think this makes sense (intuitively) but I haven't done any real
> performance testing to verify.
> Java has the java.io.File.setLength() method (since 1.2) for this.
> We can easily fix CompoundFileWriter to call setLength() on the file
> it's writing (and add setLength() method to IndexOutput).  The
> CompoundFileWriter knows exactly how large its file will be.
> Another good thing is: if you are going run out of disk space, then,
> the setLength call should fail up front instead of failing when the
> compound file is actually written.  This has two benefits: first, you
> find out sooner that you will run out of disk space, and, second, you
> don't fill up the disk down to 0 bytes left (always a frustrating
> experience!).  Instead you leave what space was available
> and throw an IOException.
> My one hesitation here is: what if out there there exists a filesystem
> that can't handle this call, and it throws an IOException on that
> platform?  But this is balanced against possible easy-win improvement
> in performance.
> Does anyone have any feedback / thoughts / experience relevant to
> this?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-705) CompoundFileWriter should pre-set its file length

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-705:
--------------------------------------

    Fix Version/s: 2.4

> CompoundFileWriter should pre-set its file length
> -------------------------------------------------
>
>                 Key: LUCENE-705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-705
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.4
>
>
> I've read that if you are writing a large file, it's best to pre-set
> the size of the file in advance before you write all of its contents.
> This in general minimizes fragmentation and improves IO performance
> against the file in the future.
> I think this makes sense (intuitively) but I haven't done any real
> performance testing to verify.
> Java has the java.io.File.setLength() method (since 1.2) for this.
> We can easily fix CompoundFileWriter to call setLength() on the file
> it's writing (and add setLength() method to IndexOutput).  The
> CompoundFileWriter knows exactly how large its file will be.
> Another good thing is: if you are going run out of disk space, then,
> the setLength call should fail up front instead of failing when the
> compound file is actually written.  This has two benefits: first, you
> find out sooner that you will run out of disk space, and, second, you
> don't fill up the disk down to 0 bytes left (always a frustrating
> experience!).  Instead you leave what space was available
> and throw an IOException.
> My one hesitation here is: what if out there there exists a filesystem
> that can't handle this call, and it throws an IOException on that
> platform?  But this is balanced against possible easy-win improvement
> in performance.
> Does anyone have any feedback / thoughts / experience relevant to
> this?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-705) CompoundFileWriter should pre-set its file length

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558406#action_12558406 ] 

Grant Ingersoll commented on LUCENE-705:
----------------------------------------

This seems reasonable, although I am not an expert in low-level file system calls like this.  I guess for me the thing would be to find out if the major filesystems support it (Windows, OSX, Linux) and then perhaps we can deal w/ others from there as they arise (i.e. those that don't support it)

> CompoundFileWriter should pre-set its file length
> -------------------------------------------------
>
>                 Key: LUCENE-705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-705
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>
> I've read that if you are writing a large file, it's best to pre-set
> the size of the file in advance before you write all of its contents.
> This in general minimizes fragmentation and improves IO performance
> against the file in the future.
> I think this makes sense (intuitively) but I haven't done any real
> performance testing to verify.
> Java has the java.io.File.setLength() method (since 1.2) for this.
> We can easily fix CompoundFileWriter to call setLength() on the file
> it's writing (and add setLength() method to IndexOutput).  The
> CompoundFileWriter knows exactly how large its file will be.
> Another good thing is: if you are going run out of disk space, then,
> the setLength call should fail up front instead of failing when the
> compound file is actually written.  This has two benefits: first, you
> find out sooner that you will run out of disk space, and, second, you
> don't fill up the disk down to 0 bytes left (always a frustrating
> experience!).  Instead you leave what space was available
> and throw an IOException.
> My one hesitation here is: what if out there there exists a filesystem
> that can't handle this call, and it throws an IOException on that
> platform?  But this is balanced against possible easy-win improvement
> in performance.
> Does anyone have any feedback / thoughts / experience relevant to
> this?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-705) CompoundFileWriter should pre-set its file length

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-705.
---------------------------------------

    Resolution: Fixed

I just committed this.

> CompoundFileWriter should pre-set its file length
> -------------------------------------------------
>
>                 Key: LUCENE-705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-705
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-705.patch
>
>
> I've read that if you are writing a large file, it's best to pre-set
> the size of the file in advance before you write all of its contents.
> This in general minimizes fragmentation and improves IO performance
> against the file in the future.
> I think this makes sense (intuitively) but I haven't done any real
> performance testing to verify.
> Java has the java.io.File.setLength() method (since 1.2) for this.
> We can easily fix CompoundFileWriter to call setLength() on the file
> it's writing (and add setLength() method to IndexOutput).  The
> CompoundFileWriter knows exactly how large its file will be.
> Another good thing is: if you are going run out of disk space, then,
> the setLength call should fail up front instead of failing when the
> compound file is actually written.  This has two benefits: first, you
> find out sooner that you will run out of disk space, and, second, you
> don't fill up the disk down to 0 bytes left (always a frustrating
> experience!).  Instead you leave what space was available
> and throw an IOException.
> My one hesitation here is: what if out there there exists a filesystem
> that can't handle this call, and it throws an IOException on that
> platform?  But this is balanced against possible easy-win improvement
> in performance.
> Does anyone have any feedback / thoughts / experience relevant to
> this?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-705) CompoundFileWriter should pre-set its file length

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564355#action_12564355 ] 

Michael McCandless commented on LUCENE-705:
-------------------------------------------

{quote}
Does this mean one will no longer be able to tell exactly how large the index really is (because some portion of some data files will actually be empty)?
{quote}

Only while the CFS is being built. After it's done being built, then
it is "fully occupied" (no portion are empty).


> CompoundFileWriter should pre-set its file length
> -------------------------------------------------
>
>                 Key: LUCENE-705
>                 URL: https://issues.apache.org/jira/browse/LUCENE-705
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-705.patch
>
>
> I've read that if you are writing a large file, it's best to pre-set
> the size of the file in advance before you write all of its contents.
> This in general minimizes fragmentation and improves IO performance
> against the file in the future.
> I think this makes sense (intuitively) but I haven't done any real
> performance testing to verify.
> Java has the java.io.File.setLength() method (since 1.2) for this.
> We can easily fix CompoundFileWriter to call setLength() on the file
> it's writing (and add setLength() method to IndexOutput).  The
> CompoundFileWriter knows exactly how large its file will be.
> Another good thing is: if you are going run out of disk space, then,
> the setLength call should fail up front instead of failing when the
> compound file is actually written.  This has two benefits: first, you
> find out sooner that you will run out of disk space, and, second, you
> don't fill up the disk down to 0 bytes left (always a frustrating
> experience!).  Instead you leave what space was available
> and throw an IOException.
> My one hesitation here is: what if out there there exists a filesystem
> that can't handle this call, and it throws an IOException on that
> platform?  But this is balanced against possible easy-win improvement
> in performance.
> Does anyone have any feedback / thoughts / experience relevant to
> this?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org