Posted to dev@lucene.apache.org by "Jason Rutherglen (JIRA)" <ji...@apache.org> on 2009/03/27 00:38:50 UTC

[jira] Created: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

PooledSegmentReader, pools SegmentReader underlying byte arrays
---------------------------------------------------------------

                 Key: LUCENE-1574
                 URL: https://issues.apache.org/jira/browse/LUCENE-1574
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/*
    Affects Versions: 2.4.1
            Reporter: Jason Rutherglen
            Priority: Minor
             Fix For: 2.9


PooledSegmentReader pools the underlying byte arrays of deleted docs and norms for realtime search.  It is designed for use with IndexReader.clone, which can create many copies of these byte arrays; for a given segment the copies are all the same length.  When pooled, the arrays can be reused, which could save on memory.

Do we want to benchmark the memory usage of PooledSegmentReader against plain GC?  Often GC is enough for these smaller objects.
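
To make the pooling idea concrete, here is a minimal sketch of a per-segment pool of equal-length byte arrays. It is not the code from the patch: the class name SegmentByteArrayPool, the maxSpares cap, and the copyOf/release methods are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-segment pool; every array it hands out has the same length. */
class SegmentByteArrayPool {
    private final int length;     // fixed for a given segment
    private final int maxSpares;  // assumed cap on idle arrays kept for reuse
    private final Deque<byte[]> spares = new ArrayDeque<byte[]>();

    SegmentByteArrayPool(int length, int maxSpares) {
        this.length = length;
        this.maxSpares = maxSpares;
    }

    /** Returns a copy of src, reusing a spare array when one is available. */
    synchronized byte[] copyOf(byte[] src) {
        if (src.length != length) {
            throw new IllegalArgumentException("expected length " + length);
        }
        byte[] dst = spares.poll();
        if (dst == null) {
            dst = new byte[length];  // pool is empty, fall back to allocation
        }
        System.arraycopy(src, 0, dst, 0, length);
        return dst;
    }

    /** Hands an array back once no reader references it any more. */
    synchronized void release(byte[] array) {
        if (array.length == length && spares.size() < maxSpares) {
            spares.push(array);
        }
    }

    synchronized int spareCount() {
        return spares.size();
    }
}
```

A cloned reader would call copyOf when it needs its own copy of the deleted docs or norms, and release when it is closed. Whether this beats simply letting GC reclaim the arrays is exactly what the benchmark question above would have to settle.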

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1574:
---------------------------------------

    Fix Version/s:     (was: 2.9)

Moving out.

> PooledSegmentReader, pools SegmentReader underlying byte arrays
> ---------------------------------------------------------------
>
>                 Key: LUCENE-1574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1574
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> PooledSegmentReader pools the underlying byte arrays of deleted docs and norms for realtime search.  It is designed for use with IndexReader.clone which can create many copies of byte arrays, which are of the same length for a given segment.  When pooled they can be reused which could save on memory.  
> Do we want to benchmark the memory usage comparison of PooledSegmentReader vs GC?  Many times GC is enough for these smaller objects.



[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695115#action_12695115 ] 

Michael McCandless commented on LUCENE-1574:
--------------------------------------------

Presumably it wouldn't save on memory (the pool would sometimes be holding onto spares for future reuse), but it could save on time, right?

Or, maybe instead we could spend our effort making a simple transactional data structure for holding deletes/norms (I think there's already an issue on this -- maybe it's LUCENE-1526).




[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

Posted by "John Wang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737950#action_12737950 ] 

John Wang commented on LUCENE-1574:
-----------------------------------

Re: Zoie and deleted docs:
That is no longer true: Zoie now uses a bloom filter over an int hash set from fastutil, for exactly the perf reason Jason pointed out.
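
One way to read "a bloom filter over an int hash set" is sketched below. This is not Zoie's code: the class name BloomFilteredDeletes and the two hash functions are assumptions, and java.util.HashSet and java.util.BitSet stand in for the fastutil collections so the example needs only the JDK.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical deleted-docs set: a small bloom filter in front of an exact set. */
class BloomFilteredDeletes {
    private final int numBits;
    private final BitSet bits;
    private final Set<Integer> deleted = new HashSet<Integer>();  // stand-in for a fastutil int set

    BloomFilteredDeletes(int numBits) {
        this.numBits = numBits;
        this.bits = new BitSet(numBits);
    }

    // Two cheap multiplicative hashes; the constants are arbitrary odd numbers.
    private int slot1(int docId) {
        return ((docId * 0x9E3779B1) >>> 1) % numBits;
    }

    private int slot2(int docId) {
        return ((docId * 0x85EBCA6B) >>> 1) % numBits;
    }

    void delete(int docId) {
        deleted.add(docId);
        bits.set(slot1(docId));
        bits.set(slot2(docId));
    }

    boolean isDeleted(int docId) {
        // A clear bit proves the doc was never deleted, so most lookups stop here.
        if (!bits.get(slot1(docId)) || !bits.get(slot2(docId))) {
            return false;
        }
        // Both bits set may be a false positive; the exact set decides.
        return deleted.contains(docId);
    }
}
```

The point of the layering is that, when deletes are sparse, most isDeleted calls are answered by two bit tests and never touch the hash set.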




[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776321#action_12776321 ] 

Jason Rutherglen commented on LUCENE-1574:
------------------------------------------

I suppose that, since we're on Java 1.5, ConcurrentLinkedQueue can be used.
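
A minimal sketch of what a ConcurrentLinkedQueue-backed pool might look like follows. The class name LockFreeByteArrayPool, the maxSpares cap, and the separate AtomicInteger counter are assumptions for illustration, not part of any posted patch.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical lock-free pool of equal-length byte arrays. */
class LockFreeByteArrayPool {
    private final int length;
    private final int maxSpares;
    private final ConcurrentLinkedQueue<byte[]> spares = new ConcurrentLinkedQueue<byte[]>();
    // ConcurrentLinkedQueue.size() walks the whole queue, so the count is tracked separately.
    private final AtomicInteger spareCount = new AtomicInteger();

    LockFreeByteArrayPool(int length, int maxSpares) {
        this.length = length;
        this.maxSpares = maxSpares;
    }

    /** Returns a spare if one exists, else a new array. Contents of a spare are stale. */
    byte[] acquire() {
        byte[] array = spares.poll();
        if (array == null) {
            return new byte[length];
        }
        spareCount.decrementAndGet();
        return array;
    }

    /** Keeps the array for reuse unless the pool already holds maxSpares arrays. */
    void release(byte[] array) {
        if (array.length != length) {
            return;
        }
        if (spareCount.incrementAndGet() <= maxSpares) {
            spares.offer(array);
        } else {
            spareCount.decrementAndGet();  // over the cap: drop the array and let GC take it
        }
    }
}
```

Because the counter is incremented before offer and decremented after poll, the queue never holds more than maxSpares arrays, and no thread ever blocks. A caller must overwrite the whole array after acquire, since a reused spare still contains its previous contents.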




[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695130#action_12695130 ] 

Jason Rutherglen commented on LUCENE-1574:
------------------------------------------

True, the pool would hold onto spares, but they would expire.
It's mostly useful for the large on-disk segments: their byte
arrays (for BitVectors) are large, and because those segments
hold more docs they get hit with deletes more often, so the
arrays would be reused fairly often.
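
The expiry of spares could be sketched roughly as follows. The class name ExpiringSparePool, the maxIdleMillis setting, and passing the clock in as nowMillis are all assumptions made so the example is self-contained and deterministic; it assumes callers pass non-decreasing timestamps.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical pool whose idle arrays are dropped after maxIdleMillis. */
class ExpiringSparePool {
    private static final class Spare {
        final byte[] array;
        final long releasedAtMillis;

        Spare(byte[] array, long releasedAtMillis) {
            this.array = array;
            this.releasedAtMillis = releasedAtMillis;
        }
    }

    private final int length;
    private final long maxIdleMillis;
    private final Deque<Spare> spares = new ArrayDeque<Spare>();  // oldest release first

    ExpiringSparePool(int length, long maxIdleMillis) {
        this.length = length;
        this.maxIdleMillis = maxIdleMillis;
    }

    synchronized byte[] acquire(long nowMillis) {
        evictExpired(nowMillis);
        // Take the most recently released spare so that older ones can age out.
        Spare spare = spares.pollLast();
        return spare != null ? spare.array : new byte[length];
    }

    synchronized void release(byte[] array, long nowMillis) {
        if (array.length == length) {
            spares.addLast(new Spare(array, nowMillis));
        }
    }

    synchronized int spareCount(long nowMillis) {
        evictExpired(nowMillis);
        return spares.size();
    }

    private void evictExpired(long nowMillis) {
        Spare oldest;
        while ((oldest = spares.peekFirst()) != null
                && nowMillis - oldest.releasedAtMillis >= maxIdleMillis) {
            spares.pollFirst();  // unreferenced from here on, so GC can reclaim it
        }
    }
}
```

With this shape, a segment that sees frequent deletes keeps recycling the same few arrays, while a segment that goes quiet gives its spares back to the GC after the idle period.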

I'm not knowledgeable enough to say whether the transactional
data structure will be fast enough. We had been using
http://fastutil.dsi.unimi.it/docs/it/unimi/dsi/fastutil/ints/IntRBTreeSet.html
in Zoie for deleted docs and it's way slow. Binary search of an
int array is faster, albeit not fast enough. The
multi-dimensional array approach isn't fast enough (for
searching) either, as we found when we implemented it in Bobo.
It's implemented in Bobo because we have a multi-value field
cache (which is quite large because for each doc we're storing
potentially 64 or more values in an in-place bitset) and a single
massive array kills the GC. In some cases it is faster than a
single large array because of the way Java (or the OS?)
transfers memory around through the CPU cache.
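
For readers unfamiliar with the two alternatives mentioned here, the sketch below shows a sorted int array queried by binary search and a "multi-dimensional" array that splits one large array into fixed-size chunks. Neither class is taken from Zoie or Bobo; the names SortedIntDeletes and ChunkedByteArray and the 64K chunk size are assumptions.

```java
import java.util.Arrays;

/** Hypothetical deleted-docs lookup using binary search over a sorted int array. */
class SortedIntDeletes {
    private final int[] sortedDocIds;

    SortedIntDeletes(int[] docIds) {
        sortedDocIds = docIds.clone();
        Arrays.sort(sortedDocIds);
    }

    boolean isDeleted(int docId) {
        // O(log n) per lookup, versus O(1) for a bit or byte per document.
        return Arrays.binarySearch(sortedDocIds, docId) >= 0;
    }
}

/** Hypothetical chunked array: many small arrays instead of one massive one. */
class ChunkedByteArray {
    private static final int CHUNK_BITS = 16;              // 64K entries per chunk (assumed)
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final byte[][] chunks;
    private final int length;

    ChunkedByteArray(int length) {
        this.length = length;
        int numChunks = (length + CHUNK_SIZE - 1) >>> CHUNK_BITS;
        chunks = new byte[numChunks][];
        for (int i = 0; i < numChunks; i++) {
            int remaining = length - (i << CHUNK_BITS);
            chunks[i] = new byte[Math.min(CHUNK_SIZE, remaining)];
        }
    }

    byte get(int index) {
        // Two array dereferences per access instead of one.
        return chunks[index >>> CHUNK_BITS][index & CHUNK_MASK];
    }

    void set(int index, byte value) {
        chunks[index >>> CHUNK_BITS][index & CHUNK_MASK] = value;
    }

    int length() {
        return length;
    }
}
```

The chunked layout avoids asking the allocator for one huge contiguous block, at the price of an extra indirection on every read, which is one plausible reason it was not fast enough for searching.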




[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776576#action_12776576 ] 

Jason Rutherglen commented on LUCENE-1574:
------------------------------------------

A likely optimization for this patch is to pool only if the doc count is above a threshold; 100,000 seems like a good number.  Also, pooling will be optional.
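
As a small illustration of that policy, a sketch follows. The class name PoolingPolicy and its fields are assumptions; only the 100,000 figure and the fact that pooling is optional come from the comment above.

```java
/** Hypothetical policy deciding whether a segment's byte arrays should be pooled. */
class PoolingPolicy {
    static final int DEFAULT_MIN_DOC_COUNT = 100000;  // threshold suggested in the comment

    private final boolean enabled;   // pooling is optional
    private final int minDocCount;

    PoolingPolicy(boolean enabled, int minDocCount) {
        this.enabled = enabled;
        this.minDocCount = minDocCount;
    }

    /** Pools only when enabled and the segment's doc count is strictly above the threshold. */
    boolean shouldPool(int docCount) {
        return enabled && docCount > minDocCount;
    }
}
```

Small segments then always go through plain allocation and GC, which fits the earlier observation that GC is often enough for the smaller arrays.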




[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776299#action_12776299 ] 

Jason Rutherglen commented on LUCENE-1574:
------------------------------------------

Yonik,

Do you recommend using the method in SimpleStringInterner for lockless pooling?

