Posted to dev@lucene.apache.org by "Simon Willnauer (JIRA)" <ji...@apache.org> on 2012/04/26 12:01:17 UTC

[jira] [Created] (LUCENE-4022) Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available

Simon Willnauer created LUCENE-4022:
---------------------------------------

             Summary: Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available
                 Key: LUCENE-4022
                 URL: https://issues.apache.org/jira/browse/LUCENE-4022
             Project: Lucene - Java
          Issue Type: Bug
          Components: modules/spellchecker
    Affects Versions: 3.6, 4.0
            Reporter: Simon Willnauer
             Fix For: 4.0, 3.6.1


The Sorter we use for offline sorting seems to use the MIN_BUFFER_SIZE as an upper bound even if there is more memory available. See this snippet:
{code}
long half = free/2;
if (half >= ABSOLUTE_MIN_SORT_BUFFER_SIZE) { 
  return new BufferSize(Math.min(MIN_BUFFER_SIZE_MB * MB, half));
}
      
// by max mem (heap will grow)
half = (max - total) / 2;
return new BufferSize(Math.min(MIN_BUFFER_SIZE_MB * MB, half));
{code}

We should use Math.max instead of min here.
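
To make the effect concrete, here is a minimal sketch comparing the two calls. The class name BufferSizeDemo and the 32 MB value for MIN_BUFFER_SIZE_MB are assumptions made for illustration; only the Math.min/Math.max expressions come from the snippet above.

```java
// Illustrative only: the class name and constant values are assumptions,
// not taken from the issue.
final class BufferSizeDemo {
  static final long MB = 1024L * 1024L;
  static final long MIN_BUFFER_SIZE_MB = 32; // assumed value

  /** Behaviour reported in the issue: the "minimum" acts as a cap. */
  static long withMin(long half) {
    return Math.min(MIN_BUFFER_SIZE_MB * MB, half);
  }

  /** Proposed one-word change: the "minimum" acts as a floor. */
  static long withMax(long half) {
    return Math.max(MIN_BUFFER_SIZE_MB * MB, half);
  }

  public static void main(String[] args) {
    long half = 512 * MB; // pretend half of the free heap is 512 MB
    System.out.println(withMin(half) / MB + " MB"); // 32 MB
    System.out.println(withMax(half) / MB + " MB"); // 512 MB
  }
}
```

With Math.min the result never exceeds MIN_BUFFER_SIZE_MB, however much heap is free. With Math.max it never drops below it, even when less than that is actually free.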


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Assigned] (LUCENE-4022) Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer reassigned LUCENE-4022:
---------------------------------------

    Assignee: Simon Willnauer
    



[jira] [Commented] (LUCENE-4022) Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262502#comment-13262502 ] 

Dawid Weiss commented on LUCENE-4022:
-------------------------------------

It's a bug. I don't know whether it was a regression introduced when we talked about how to estimate "half available heap" or whether it was there even before then, but it should be Math.max().

Should we check for max array size overflows (for folks with super-large heaps)?
                



[jira] [Commented] (LUCENE-4022) Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271194#comment-13271194 ] 

Dawid Weiss commented on LUCENE-4022:
-------------------------------------

Looks good to me, Simon. How did you come up with the 10x factor though? Is it something off the top of your head?
                



[jira] [Commented] (LUCENE-4022) Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271312#comment-13271312 ] 

Simon Willnauer commented on LUCENE-4022:
-----------------------------------------

> How did you come up with the 10x factor though? Is it something off the top of your head?

I wanted to detect a significantly bigger "unallocated" heap and force the heap to grow when that makes sense, so a factor of 10 seemed to be a good start. This automatic sizing should be a conservative default that gives you reasonable performance. First and foremost it should make sure your system is stable and doesn't run into OOM etc. The factor might seem somewhat arbitrary. I will add a changes entry and commit this. It seems Robert wants to roll a 3.6.1 soonish ;)
                



[jira] [Updated] (LUCENE-4022) Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4022:
------------------------------------

    Attachment: LUCENE-4022.patch

Here is a patch with a slightly changed algorithm. It still takes free/2 as the base buffer size, but it checks whether it is reasonable to grow the heap when the total available memory is 10x larger than the free memory or when the free memory is smaller than MIN_BUFFER_SIZE_MB. On small heaps, like on mobile phones where you only have up to 3MB, this falls back to half of the free memory or the ABSOLUTE_MIN_SORT_BUFFER_SIZE.

The actual buffer size is bounded by Integer.MAX_VALUE.
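
The heuristic described above could look roughly like the following. This is a sketch of one reading of the description, not the attached patch: the class and method names, the constant values, and the exact form of the two conditions are assumptions.

```java
// A sketch of the sizing heuristic as described in prose; not the committed patch.
// Class name, method name and constant values are assumptions.
final class AutoBufferSizeSketch {
  static final long MB = 1024L * 1024L;
  static final long MIN_BUFFER_SIZE_MB = 32;                // assumed value
  static final long ABSOLUTE_MIN_SORT_BUFFER_SIZE = MB / 2; // assumed value

  /** All arguments are byte counts, as reported by java.lang.Runtime. */
  static int bufferSize(long max, long total, long free) {
    long minBufferBytes = MIN_BUFFER_SIZE_MB * MB;
    long totalAvailable = max - total + free; // free now plus what the heap may still grow by
    long size = free / 2;                     // base size: half of the free heap

    // Consider growing the heap if free memory is small or much more could be claimed.
    if (free < minBufferBytes || totalAvailable > 10 * free) {
      if (totalAvailable / 2 > minBufferBytes) {
        size = totalAvailable / 2;            // enough room: let the heap grow
      } else {
        // small heap: stay conservative, but never go below the absolute minimum
        size = Math.max(ABSOLUTE_MIN_SORT_BUFFER_SIZE, size);
      }
    }
    // The result is an int, so clamp it to Integer.MAX_VALUE.
    return (int) Math.min(Integer.MAX_VALUE, size);
  }

  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    System.out.println(bufferSize(rt.maxMemory(), rt.totalMemory(), rt.freeMemory()) + " bytes");
  }
}
```

Taking the three memory figures as parameters, rather than reading Runtime inside the method, keeps the sketch easy to try with hypothetical heap sizes such as a 3MB phone heap or a 64GB server heap.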
                



[jira] [Resolved] (LUCENE-4022) Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-4022.
-------------------------------------

    Resolution: Fixed

Committed to the 3.6 branch and trunk.
                



[jira] [Updated] (LUCENE-4022) Offline Sorter wrongly uses MIN_BUFFER_SIZE if there is more memory available

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4022:
------------------------------------

    Lucene Fields: New,Patch Available  (was: New)
    
