You are viewing a plain text version of this content. The canonical link for it is here.

Posted to solr-dev@lucene.apache.org by "Noble Paul (JIRA)" <ji...@apache.org> on 2008/07/30 08:13:31 UTC

[jira] Created: (SOLR-667) Alternate LRUCache implementation

Alternate LRUCache implementation
---------------------------------

                 Key: SOLR-667
                 URL: https://issues.apache.org/jira/browse/SOLR-667
             Project: Solr
          Issue Type: New Feature
          Components: search
    Affects Versions: 1.3
            Reporter: Noble Paul


The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618773#action_12618773 ] 

Noble Paul commented on SOLR-667:
---------------------------------

Fuad. You cannot trivialize concurrent programming so easily. Whatever
we have commented are from our experience (and wisdom)  . There is a
price to pay it. Java could have easily eliminated the
java.util.concurrent package by using 'volatile' everywhere and no
need of AtomicInteger,AtomimcLong etc. So they are there for a reason

BTW. Using TreeSet is not 'heavy' . It is the right tool for right
purpose. If you need a sorted set that is best

On Thu, Jul 31, 2008 at 9:58 PM, Fuad Efendi (JIRA) <ji...@apache.org> wrote:

-- 
--Noble Paul

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644321#action_12644321 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

Thanks Noble, we had a mid-air implementation collision :-)
I'm doing some quick performance testing the version I wrote now...I'll try it against your version after and then we can go from there.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: SOLR-667.patch

borrowed some ideas from yonik's impl.
Probably this is a good enough first cut.  

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment:     (was: SOLR-667.patch)

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644656#action_12644656 ] 

Shalin Shekhar Mangar commented on SOLR-667:
--------------------------------------------

Here's the performance test from the patch on a more recent machine -- Intel Quad Core, RHEL 64-bit, Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_11-b03, mixed mode):

{code}
time=1456 impl=LRUCache nThreads= 1 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
time=1041 impl=FastLRUCache nThreads= 1 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
time=3256 impl=LRUCache nThreads= 2 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
time=754 impl=FastLRUCache nThreads= 2 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
time=1234 impl=LRUCache nThreads= 1 size=100000 maxKey=120000 gets=2000000 hitRatio=0.835225
time=1564 impl=FastLRUCache nThreads= 1 size=100000 maxKey=120000 gets=2000000 hitRatio=0.751506
time=3728 impl=LRUCache nThreads= 2 size=100000 maxKey=120000 gets=2000000 hitRatio=0.835006
time=1384 impl=FastLRUCache nThreads= 2 size=100000 maxKey=120000 gets=2000000 hitRatio=0.798109
time=1357 impl=LRUCache nThreads= 1 size=100000 maxKey=200000 gets=2000000 hitRatio=0.523398
time=1894 impl=FastLRUCache nThreads= 1 size=100000 maxKey=200000 gets=2000000 hitRatio=0.3831675
time=4556 impl=LRUCache nThreads= 2 size=100000 maxKey=200000 gets=2000000 hitRatio=0.512785
time=1514 impl=FastLRUCache nThreads= 2 size=100000 maxKey=200000 gets=2000000 hitRatio=0.4682115
time=1614 impl=LRUCache nThreads= 1 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.1445725
time=1837 impl=FastLRUCache nThreads= 1 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10041049999999996
time=4710 impl=LRUCache nThreads= 2 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10963999999999996
time=1816 impl=FastLRUCache nThreads= 2 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.11144399999999999
time=339 impl=LRUCache nThreads= 1 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=292 impl=FastLRUCache nThreads= 1 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=2511 impl=LRUCache nThreads= 2 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=351 impl=FastLRUCache nThreads= 2 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=383 impl=LRUCache nThreads= 1 size=1000 maxKey=1200 gets=2000000 hitRatio=0.833648
time=580 impl=FastLRUCache nThreads= 1 size=1000 maxKey=1200 gets=2000000 hitRatio=0.7404839999999999
time=2716 impl=LRUCache nThreads= 2 size=1000 maxKey=1200 gets=2000000 hitRatio=0.8337875
time=805 impl=FastLRUCache nThreads= 2 size=1000 maxKey=1200 gets=2000000 hitRatio=0.79799
time=570 impl=LRUCache nThreads= 1 size=1000 maxKey=2000 gets=2000000 hitRatio=0.5003285
time=794 impl=FastLRUCache nThreads= 1 size=1000 maxKey=2000 gets=2000000 hitRatio=0.3516785
time=3676 impl=LRUCache nThreads= 2 size=1000 maxKey=2000 gets=2000000 hitRatio=0.49959549999999997
time=1685 impl=FastLRUCache nThreads= 2 size=1000 maxKey=2000 gets=2000000 hitRatio=0.436728
time=712 impl=LRUCache nThreads= 1 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10054949999999996
time=1022 impl=FastLRUCache nThreads= 1 size=1000 maxKey=10000 gets=2000000 hitRatio=0.05416350000000003
time=4395 impl=LRUCache nThreads= 2 size=1000 maxKey=10000 gets=2000000 hitRatio=0.100526
time=2562 impl=FastLRUCache nThreads= 2 size=1000 maxKey=10000 gets=2000000 hitRatio=0.08556600000000003
{code}

With more number of threads this time (4 and 16):
{code}
time=1794 impl=LRUCache nThreads= 4 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
time=594 impl=FastLRUCache nThreads= 4 size=100000 maxKey=100000 gets=2000000 hitRatio=0.9816075
time=1737 impl=LRUCache nThreads= 16 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981607
time=602 impl=FastLRUCache nThreads= 16 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981602
time=2387 impl=LRUCache nThreads= 4 size=100000 maxKey=120000 gets=2000000 hitRatio=0.830956
time=866 impl=FastLRUCache nThreads= 4 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8892465
time=1793 impl=LRUCache nThreads= 16 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8274485
time=706 impl=FastLRUCache nThreads= 16 size=100000 maxKey=120000 gets=2000000 hitRatio=0.9586865
time=2233 impl=LRUCache nThreads= 4 size=100000 maxKey=200000 gets=2000000 hitRatio=0.5025255
time=1228 impl=FastLRUCache nThreads= 4 size=100000 maxKey=200000 gets=2000000 hitRatio=0.654153
time=1905 impl=LRUCache nThreads= 16 size=100000 maxKey=200000 gets=2000000 hitRatio=0.500583
time=883 impl=FastLRUCache nThreads= 16 size=100000 maxKey=200000 gets=2000000 hitRatio=0.9067965
time=5336 impl=LRUCache nThreads= 4 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10182199999999997
time=1780 impl=FastLRUCache nThreads= 4 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.25870800000000005
time=2911 impl=LRUCache nThreads= 16 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10132300000000005
time=1941 impl=FastLRUCache nThreads= 16 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.508488
time=687 impl=LRUCache nThreads= 4 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=421 impl=FastLRUCache nThreads= 4 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=782 impl=LRUCache nThreads= 16 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=452 impl=FastLRUCache nThreads= 16 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=813 impl=LRUCache nThreads= 4 size=1000 maxKey=1200 gets=2000000 hitRatio=0.8333735
time=678 impl=FastLRUCache nThreads= 4 size=1000 maxKey=1200 gets=2000000 hitRatio=0.9988885
time=794 impl=LRUCache nThreads= 16 size=1000 maxKey=1200 gets=2000000 hitRatio=0.8331635
time=503 impl=FastLRUCache nThreads= 16 size=1000 maxKey=1200 gets=2000000 hitRatio=0.977526
time=1554 impl=LRUCache nThreads= 4 size=1000 maxKey=2000 gets=2000000 hitRatio=0.500093
time=928 impl=FastLRUCache nThreads= 4 size=1000 maxKey=2000 gets=2000000 hitRatio=0.802332
time=1102 impl=LRUCache nThreads= 16 size=1000 maxKey=2000 gets=2000000 hitRatio=0.5002759999999999
time=566 impl=FastLRUCache nThreads= 16 size=1000 maxKey=2000 gets=2000000 hitRatio=0.954131
time=1543 impl=LRUCache nThreads= 4 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10062899999999997
time=1039 impl=FastLRUCache nThreads= 4 size=1000 maxKey=10000 gets=2000000 hitRatio=0.7582409999999999
time=1372 impl=LRUCache nThreads= 16 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10031000000000001
time=604 impl=FastLRUCache nThreads= 16 size=1000 maxKey=10000 gets=2000000 hitRatio=0.935282
{code}

Now with 8 and 32 threads:
{code}
time=2109 impl=LRUCache nThreads= 8 size=100000 maxKey=100000 gets=2000000 hitRatio=0.9816075
time=608 impl=FastLRUCache nThreads= 8 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981606
time=1502 impl=LRUCache nThreads= 32 size=100000 maxKey=100000 gets=2000000 hitRatio=0.9816045
time=648 impl=FastLRUCache nThreads= 32 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981592
time=3876 impl=LRUCache nThreads= 8 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8267995
time=748 impl=FastLRUCache nThreads= 8 size=100000 maxKey=120000 gets=2000000 hitRatio=0.915961
time=2176 impl=LRUCache nThreads= 32 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8271935
time=694 impl=FastLRUCache nThreads= 32 size=100000 maxKey=120000 gets=2000000 hitRatio=0.9652565
time=2038 impl=LRUCache nThreads= 8 size=100000 maxKey=200000 gets=2000000 hitRatio=0.5005305
time=1088 impl=FastLRUCache nThreads= 8 size=100000 maxKey=200000 gets=2000000 hitRatio=0.789179
time=2147 impl=LRUCache nThreads= 32 size=100000 maxKey=200000 gets=2000000 hitRatio=0.4997505
time=884 impl=FastLRUCache nThreads= 32 size=100000 maxKey=200000 gets=2000000 hitRatio=0.926915
time=2343 impl=LRUCache nThreads= 8 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10397699999999999
time=2207 impl=FastLRUCache nThreads= 8 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.34063
time=3440 impl=LRUCache nThreads= 32 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10123850000000001
time=2087 impl=FastLRUCache nThreads= 32 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.5367375
time=909 impl=LRUCache nThreads= 8 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=443 impl=FastLRUCache nThreads= 8 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=682 impl=LRUCache nThreads= 32 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=447 impl=FastLRUCache nThreads= 32 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=1189 impl=LRUCache nThreads= 8 size=1000 maxKey=1200 gets=2000000 hitRatio=0.832726
time=605 impl=FastLRUCache nThreads= 8 size=1000 maxKey=1200 gets=2000000 hitRatio=0.919104
time=1463 impl=LRUCache nThreads= 32 size=1000 maxKey=1200 gets=2000000 hitRatio=0.8337005
time=489 impl=FastLRUCache nThreads= 32 size=1000 maxKey=1200 gets=2000000 hitRatio=0.9845845
time=1256 impl=LRUCache nThreads= 8 size=1000 maxKey=2000 gets=2000000 hitRatio=0.500149
time=678 impl=FastLRUCache nThreads= 8 size=1000 maxKey=2000 gets=2000000 hitRatio=0.907774
time=1013 impl=LRUCache nThreads= 32 size=1000 maxKey=2000 gets=2000000 hitRatio=0.49962399999999996
time=503 impl=FastLRUCache nThreads= 32 size=1000 maxKey=2000 gets=2000000 hitRatio=0.976796
time=1504 impl=LRUCache nThreads= 8 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10030550000000005
time=754 impl=FastLRUCache nThreads= 8 size=1000 maxKey=10000 gets=2000000 hitRatio=0.9151345
time=1245 impl=LRUCache nThreads= 32 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10028899999999996
time=499 impl=FastLRUCache nThreads= 32 size=1000 maxKey=10000 gets=2000000 hitRatio=0.978823
{code}

When the number of threads are increased, FastLRUCache is true to its name :)

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644942#action_12644942 ] 

Shalin Shekhar Mangar commented on SOLR-667:
--------------------------------------------

I ran it again with a new thread for each cleanup. Time taken for markAndSweep is printed:
{code}
time=1550 impl=LRUCache nThreads= 1 size=100000 maxKey=120000 gets=2000000 hitRatio=0.835225
MarkAndSweepTime = 138
MarkAndSweepTime = 35
MarkAndSweepTime = 36
MarkAndSweepTime = 39
MarkAndSweepTime = 41
MarkAndSweepTime = 42
MarkAndSweepTime = 42
MarkAndSweepTime = 44
MarkAndSweepTime = 43
MarkAndSweepTime = 43
MarkAndSweepTime = 44
MarkAndSweepTime = 43
MarkAndSweepTime = 43
MarkAndSweepTime = 44
MarkAndSweepTime = 43
MarkAndSweepTime = 44
MarkAndSweepTime = 44
MarkAndSweepTime = 43
time=1378 impl=FastLRUCache nThreads= 1 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8130459999999999

time=3942 impl=LRUCache nThreads= 2 size=100000 maxKey=120000 gets=2000000 hitRatio=0.835045
MarkAndSweepTime = 58
MarkAndSweepTime = 165
MarkAndSweepTime = 32
MarkAndSweepTime = 34
MarkAndSweepTime = 37
MarkAndSweepTime = 37
MarkAndSweepTime = 46
MarkAndSweepTime = 40
MarkAndSweepTime = 61
MarkAndSweepTime = 53
MarkAndSweepTime = 51
MarkAndSweepTime = 44
MarkAndSweepTime = 47
MarkAndSweepTime = 47
MarkAndSweepTime = 48
MarkAndSweepTime = 48
MarkAndSweepTime = 46
time=1062 impl=FastLRUCache nThreads= 2 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8560415
{code}

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618715#action_12618715 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

bq. Gets are free

Not entirely... there are a few memory barriers that make things thread safe, so I wouldn't call it "free" (since this branched off of another issue where some had the idea that one could get away without any sort of locks or memory barriers).

It's a good approach in genera, and should scale better with many CPUs under very high lookup load.  But I'm not sure that it should use a separate cleaner thread... and if it does, I don't think it should be scheduled.

After we got those details worked out, then we'd need a SolrCache implementation that uses it.  Given where we are in the release cycle (and that the cache contention issue is only affecting 1 person that I've seen), I think this should want until after 1.3

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643535#action_12643535 ] 

Noble Paul commented on SOLR-667:
---------------------------------

bq.This leads to a few response times that are over 1 second, but the average is way down closer to 20ms.
yeah you are right.
there is one more feature in ConcurrentLRUCache whcih enables cleanups to be done in a new thread .FastLRUCache is not using it yet .I'll give a patch soon . This will take care of that 1 sec delay.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635230#action_12635230 ] 

Noble Paul commented on SOLR-667:
---------------------------------

My implementation just uses a Map<K,V> internally. If we can get a Map implemenatation that is faster than  ConcurrentHashMap (and concurrent) we can replace it (after seeing the performance). 





> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618826#action_12618826 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

If you don't need the synchronized block, then an atomic variable for "inserts" (for example) would be a big win.
But if you have the synchronized block anyway, it's probably faster to just expand it's scope if the operations to be done are simple.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-667:
---------------------------------------

    Fix Version/s: 1.4

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643988#action_12643988 ] 

Todd Feak commented on SOLR-667:
--------------------------------

I'll hook up profiling and see if my GC overhead has changed, or is seeing big peaks. Should be later today.

My document cache is only 4000 for that test, but only about 3300 documents are in it. I wanted to focus on the overhead of getting things out of LRUCache, as in production we get >97% hit rates. I verified it's at 100% hit rate (warming fills it).

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: SOLR-667.patch

added a new boolean attribute {{newCleanThread}} . Default is set to false

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644554#action_12644554 ] 

Noble Paul commented on SOLR-667:
---------------------------------

{code}
CacheEntry[] eset = new CacheEntry[sz];
int eSize = 0;
{code}

Isn't it too expensive to create a potentially huge array every time we do a clean? (too much work for GC) .May be we do not even need it if the first loop is enough. Moreover this one thing has added more code.

I didn't use lucene PriorityQueue because if somebody wished to lift the code they can easily do so if there is no other dependency. When I posted the requirement in google collections there was a lot of interest in such  a component. Can TreeSet do the trick?



> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650055#action_12650055 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

OK, committed latest patch after some minor logic changes (including changing CacheEntry from an inner class to a static inner class, which solved the compilation errors).

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643968#action_12643968 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

Todd, what's the used size of your Document cache?
I'll review this latest incarnation of ConcurrentLRUCache to see if there's anything that might cause this.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643536#action_12643536 ] 

Shalin Shekhar Mangar commented on SOLR-667:
--------------------------------------------

bq. For apples to apples load tests, this more then doubled my overall throughput. 
Very good to hear that!

bq. Is this the cleanup involved when hitting a high-water mark?
Yes, a cleanup is attempted when the size crosses the high watermark ('size' config parameter). It is done in two stages. In the first stage, least recently used items are evicted. If, after the first stage, the cache size is still greater than 'acceptableSize' config parameter (default is 0.95*maxSize), the second stage takes over. The second stage is more intensive and tries to bring down the cache size to the 'minSize' config parameter (default is 0.9*maxSize).

Note that the cleanup is done in the same thread which calls a put on the cache, hence the 'pulsing' that you are seeing. The cache implementation supports using a separate cleanup thread too, however it is not used currently. We still need to evaluate the best way to use it and how much it can help.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (SOLR-667) Alternate LRUCache implementation

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar resolved SOLR-667.
----------------------------------------

    Resolution: Fixed

Committed revision 708656.

Thanks Fuad, Noble and Yonik!

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644540#action_12644540 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

Latest patch:
 -  fixes an off-by-one (it was possible to go below minSize) 
 - fixes tests to not expect a specific number of evictions.
 - makes FastLRUCache the default for filterCache in the example schema and main test schema.

I'll commit soon if there are no objections.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: SOLR-667.patch

yonik's suggestions implemented

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Closed: (SOLR-667) Alternate LRUCache implementation

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll closed SOLR-667.
--------------------------------


> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644889#action_12644889 ] 

Noble Paul commented on SOLR-667:
---------------------------------

There is another number which we have ignored. If the cleanup is done in a separate thread , FastLRUCache consistently outperforms the legacy one. (shalin forgot to put the numbers). For a very large cache size , the cleanup takes ~200-300 ms. Which means a request can end up paying a huge price. 

We need to add a new  'newCleanupThread' option to FastLRUCache (it is there in my old patch). I guess with that we can make FastLRUcache the default with newCleanupThread=true. 

bq.Is there an easy way to get this patched into 1.3.0? 

If you apply yonik's latest patch on trunk you get two extra files . You can straightaway copy those two files to 1.3 and use it.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618072#action_12618072 ] 

noble.paul edited comment on SOLR-667 at 7/29/08 11:22 PM:
-----------------------------------------------------------

A POC implementation based on _ConcurrentHashMap_

* Gets are free
* Puts are free till it touches the high water mark . It is expensive (very expensive) after the high watermark .
* To lighten the load on put an extra thread is employed to do a concurrent mark and sweep 
* there is a high-water-mark and a low-water-mark . The extra thread cleans anything if low-water-mark is crossed. Put must removes if level crosses high-water-mark

      was (Author: noble.paul):
    A POC implementation based on _ConcurrentHashMap_
Another one this is LRU. uses ConcurrentHashMap again

* Gets are free
* Puts are free till it touches the high water mark . It is expensive (very expensive) after the high watermark .
* To lighten the load on put an extra thread is employed to do a concurrent mark and sweep 
* there is a high-water-mark and a low-water-mark . The extra thread cleans anything if low-water-mark is crossed. Put must removes if level crosses high-water-mark
  
> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643738#action_12643738 ] 

Noble Paul commented on SOLR-667:
---------------------------------

bq.Could it be associated with the overhead of maintaining the least recently used entries?
The overhead is ~= 0. It just has to increment an AtomicLong everytime you do a get() .I suspect the 'pulsing' may be because of GC pauses. enable GC logging and you will know

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644093#action_12644093 ] 

Todd Feak commented on SOLR-667:
--------------------------------

I ran with a profiler and I'm not seeing any bursts of garbage collection. It's at a steady ~2%, with no major collections occurring (which is great!). However, the use of the profiler also slows things down about 10-20 % which seems to be enough that the pulsing goes away. I believe the pulsing may be some sort of limit of the testing I'm doing on a limited local environment. I'll include the patch in my current Solr app and do a more accurate comparison with our production level load tests to see if it still exists. Though it may be a while before I can provide feedback from that one, as getting machines allocated for the heavy load testing can take a bit.

As I said before, this is a huge improvement. Thanks for all the work on this one.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: SOLR-667.patch

made CacheEntry non static 

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Re: [jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

right Mark, Single proc machines are very rare in production
nowadays.Even laptops come w/ multiple cores


On Mon, Nov 3, 2008 at 2:13 AM, Mark Miller <ma...@gmail.com> wrote:
> Single processor pentium 4? Shouldn't someone benchmark this with tech from
> this century:) you know some comps are not room size now right?
>
> Not being very serious...
>
> - Mark
>
>
> On Nov 2, 2008, at 10:34 AM, "Yonik Seeley (JIRA)" <ji...@apache.org> wrote:
>
>>
>>   [
>> https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644594#action_12644594 ]
>>
>> Yonik Seeley commented on SOLR-667:
>> -----------------------------------
>>
>> bq. Isn't it too expensive to create a potentially huge array every time
>> we do a clean? (too much work for GC)
>>
>> That's what benchmarking is for :-)
>>
>> It's a single short-lived allocation that allows us to  greatly reduce the
>> number of elements we need to evaluate on successive passes.  Inserts into a
>> TreeSet may have a higher GC cost given it's an allocation per insert.
>>
>>> Alternate LRUCache implementation
>>> ---------------------------------
>>>
>>>               Key: SOLR-667
>>>               URL: https://issues.apache.org/jira/browse/SOLR-667
>>>           Project: Solr
>>>        Issue Type: New Feature
>>>        Components: search
>>>  Affects Versions: 1.3
>>>          Reporter: Noble Paul
>>>          Assignee: Yonik Seeley
>>>           Fix For: 1.4
>>>
>>>       Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java,
>>> ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch,
>>> SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch,
>>> SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch,
>>> SOLR-667.patch, SOLR-667.patch
>>>
>>>
>>> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_
>>> which has _get()_ also synchronized. This can cause severe bottlenecks for
>>> faceted search. Any alternate implementation which can be faster/better must
>>> be considered.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>



-- 
--Noble Paul

Re: [jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by Mark Miller <ma...@gmail.com>.

Single processor pentium 4? Shouldn't someone benchmark this with tech  
from this century:) you know some comps are not room size now right?

Not being very serious...

- Mark


On Nov 2, 2008, at 10:34 AM, "Yonik Seeley (JIRA)" <ji...@apache.org>  
wrote:

>
>    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644594#action_12644594 
>  ]
>
> Yonik Seeley commented on SOLR-667:
> -----------------------------------
>
> bq. Isn't it too expensive to create a potentially huge array every  
> time we do a clean? (too much work for GC)
>
> That's what benchmarking is for :-)
>
> It's a single short-lived allocation that allows us to  greatly  
> reduce the number of elements we need to evaluate on successive  
> passes.  Inserts into a TreeSet may have a higher GC cost given it's  
> an allocation per insert.
>
>> Alternate LRUCache implementation
>> ---------------------------------
>>
>>                Key: SOLR-667
>>                URL: https://issues.apache.org/jira/browse/SOLR-667
>>            Project: Solr
>>         Issue Type: New Feature
>>         Components: search
>>   Affects Versions: 1.3
>>           Reporter: Noble Paul
>>           Assignee: Yonik Seeley
>>            Fix For: 1.4
>>
>>        Attachments: ConcurrentLRUCache.java,  
>> ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667- 
>> alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch,  
>> SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch,  
>> SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch,  
>> SOLR-667.patch
>>
>>
>> The only available SolrCache i.e LRUCache is based on  
>> _LinkedHashMap_ which has _get()_ also synchronized. This can cause  
>> severe bottlenecks for faceted search. Any alternate implementation  
>> which can be faster/better must be considered.
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644594#action_12644594 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

bq. Isn't it too expensive to create a potentially huge array every time we do a clean? (too much work for GC) 

That's what benchmarking is for :-)

It's a single short-lived allocation that allows us to  greatly reduce the number of elements we need to evaluate on successive passes.  Inserts into a TreeSet may have a higher GC cost given it's an allocation per insert.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644195#action_12644195 ] 

Noble Paul commented on SOLR-667:
---------------------------------

bq.The current code reversed this logic,

I missed the point . I guess this post made it clear . As you mentioned it should be a 3 step cleanup

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644389#action_12644389 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

Nobe, your latest patch contains code like this:
{code}
    if (!markAndSweepLock.tryLock()) return;
    long oldestItem = this.oldestItem;
    [...]
    markAndSweepLock.unlock();
    this.oldestItem = oldestItem;
{code}

oldestItem isn't volatile (and doesn't need to be if accessed correctly).
The lock will also serve as a read barrier, so the first part is OK.
The unlock will serve as a write barrier, but the set of oldestItem comes after it (not OK... another thread may not see the value).
Changing the order (the write to oldestItem before the unlock) will ensure that the next thread that crosses a read barrier will see the new value.


> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649982#action_12649982 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

Does this have a thread leak?  Where is FastLRUCache.destroy() ever called?
It's called from the finalizer (yuck), *but* that finalizer will never be called because the cleaning thread references the cache (the definition of liveness).  Issues with having the cache deal with the thread lifecycle is why I previously recommended exploring the use of an Executor that the user passes in.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650103#action_12650103 ] 

Noble Paul commented on SOLR-667:
---------------------------------

* Should we keep the cleanupThread= true default for FastLRUCache?


> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627598#action_12627598 ] 

Noble Paul commented on SOLR-667:
---------------------------------

The patch contains a version which use the java 5 features (ConcurrentHashMap) .It is better than using a separate Lock

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: SOLR-667.patch

* cleanup thread does wait() and get notified when needed
* ConcurrentLRUCache is generified
* A new interface added to ConcurrentLRUCache called EvictionListener .This gets callback for each entry that is evicted
* FastLRUcache has a new configuration 'cleanupThread' . default is set to 'false' . I believe it should be true by default

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645302#action_12645302 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

I just committed a fix to the setting of acceptableSize - it was always being set to maxSize, which would normally cause the cleaning routine to return after the first phase (and thus be called more often than normal).

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618812#action_12618812 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

bq. I only noticed some extremely stupid code where SOLR uses _double_synchronization and AtomicLong inside:

A simple typo I think... a remnant from way back when changing what object was being synchronized on.  That's why I like explicit synchronization rather than adding it to a method signature (easier to miss).   I just fixed this to be
{code}
  public Object put(Object key, Object value) {
    synchronized (map) {
      if (state == State.LIVE) {
        stats.inserts.incrementAndGet();
      }

      // increment local inserts regardless of state???
      // it does make it more consistent with the current size...
      inserts++;
      return map.put(key,value);
    }
  }
{code}

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-667:
------------------------------

    Attachment: SOLR-667-updates.patch

Some minor updates:
 - fix thread saftey issue in finalizer (background thread may never see stop being set)
 - fix tracking of size in put() to only increment if oldValue != null
 - optimize cleaning check in put() since it will be called for every put until the size is back down.


> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-667:
------------------------------

    Attachment: SOLR-667-alternate.patch

OK, here's the implementation based on my previous pseudo code, along with a very quick performance test.
The test uses random keys over a slightly bigger range than the table.
It also uses an upper water mark 10% higher than the lower water mark, and an acceptable water mark half way inbetween.  I haven't experimented to see what the best acceptable water mark is for either impl.  If anyone wants to do a more realistic test with real queries, they are welcome to it.

I did 4 runs and took the lowest number for each sub-test.  Java6 -server, WinXP, P4.  Times in milliseconds.
{code}
    doPerfTest(2000000, 100000, 200000); // big cache
noble=17063  yonik=9157
    doPerfTest(2000000, 100000, 120000);  // smaller key space increases distance between 
oldest, newest and makes the first passes less effective.
noble=8391  yonik=5812
    doPerfTest(6000000, 1000, 2000);    // small cache, smaller hit rate
noble=17578  yonik=12515
    doPerfTest(6000000, 1000, 1200);    // small cache, bigger hit rate
noble=11500  yonik=8219
{code}


> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639791#action_12639791 ] 

Shalin Shekhar Mangar commented on SOLR-667:
--------------------------------------------

Yonik -- do you think this is good enough to go in now? Probably some users can try it out and report their experiences if we commit it early. I can take this up if you want.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650035#action_12650035 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

Thanks Noble, looks like that solution should work.

Funny thing with the latest patch though - I get compile errors with "ant test" from the command line:
{code}
compile-common:
    [mkdir] Created dir: f:\code\solr\build\common
    [javac] Compiling 39 source files to f:\code\solr\build\common
    [javac] f:\code\solr\src\java\org\apache\solr\common\util\ConcurrentLRUCache.java:201: generic array creation
    [javac]       CacheEntry[] eset = new CacheEntry[sz];
    [javac]                           ^
    [javac] f:\code\solr\src\java\org\apache\solr\common\util\ConcurrentLRUCache.java:379: non-static class org.apache.solr.common.util.ConcurrentLRUCache.Cache
Entry cannot be referenced from a static context
    [javac]       return ((CacheEntry)b).lastAccessedCopy < ((CacheEntry)a).lastAccessedCopy;
   [...]
{code}

It looks like the compiler thinks that the method is static.  IntelliJ doesn't flag any errors, and I can't see anything wrong after a quick glance at the code.  Does "ant test" from the command line work for you?


> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627598#action_12627598 ] 

noble.paul edited comment on SOLR-667 at 9/2/08 1:00 AM:
---------------------------------------------------------

The patch contains a implementation which uses the java 5 features (ConcurrentHashMap) .It is better than using a separate Lock

      was (Author: noble.paul):
    The patch contains a version which use the java 5 features (ConcurrentHashMap) .It is better than using a separate Lock
  
> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: SOLR-667.patch

run cleanup in new thread

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635108#action_12635108 ] 

Noble Paul commented on SOLR-667:
---------------------------------

Looks good. This has a lot in common with my approach. The doClean() is done far more efficiently in your implementation . I can improve mine with your cleanup code (if you think it is fine)

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: SOLR-667.patch

Full SolrCache implementation 
not tested 

the initialization parameters are same us the current LRUCache. 

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Fuad Efendi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635221#action_12635221 ] 

Fuad Efendi commented on SOLR-667:
----------------------------------

Paul, Yonik,  thanks for your efforts; BTW 'Concurrent'HashMap uses spinloops for 'safe' updates in order to avoid synchronization (instead of giving up CPU cycles); there are always cases when it is not faster that simple HashMap with synchronization.

LingPipe uses different approach, see last comment at SOLR-665.

Also, why are you in-a-loop with LRU? LFU is logically better.

+1 and thanks for sharing.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-667:
------------------------------

    Attachment: SOLR-667-alternate.patch

Added some minor changes, making sure that minLimit >= 1 and limit >minLimit (needed for rounding with small cache sizes).
Also added test code for LRUCache vs FastLRUCache.
It appears that LRUCache is faster (at least on my single proc PC) when the hit ratio is low, and FastLRUCache is faster when the hit ratio is high.
Should FastLRUCache be made the default in the example schema for the filterCache?

{code}
time=2937 impl=LRUCache nThreads= 1 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
time=2266 impl=FastLRUCache nThreads= 1 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
time=3594 impl=LRUCache nThreads= 2 size=100000 maxKey=100000 gets=2000000 hitRatio=0.9816075
time=1484 impl=FastLRUCache nThreads= 2 size=100000 maxKey=100000 gets=2000000 hitRatio=0.981608
time=3203 impl=LRUCache nThreads= 1 size=100000 maxKey=120000 gets=2000000 hitRatio=0.835225
time=4593 impl=FastLRUCache nThreads= 1 size=100000 maxKey=120000 gets=2000000 hitRatio=0.751506
time=3781 impl=LRUCache nThreads= 2 size=100000 maxKey=120000 gets=2000000 hitRatio=0.834685
time=2656 impl=FastLRUCache nThreads= 2 size=100000 maxKey=120000 gets=2000000 hitRatio=0.8232835000000001
time=3234 impl=LRUCache nThreads= 1 size=100000 maxKey=200000 gets=2000000 hitRatio=0.523398
time=5047 impl=FastLRUCache nThreads= 1 size=100000 maxKey=200000 gets=2000000 hitRatio=0.3831675
time=4125 impl=LRUCache nThreads= 2 size=100000 maxKey=200000 gets=2000000 hitRatio=0.511871
time=3969 impl=FastLRUCache nThreads= 2 size=100000 maxKey=200000 gets=2000000 hitRatio=0.6665975
time=3390 impl=LRUCache nThreads= 1 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.1445725
time=5687 impl=FastLRUCache nThreads= 1 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10041049999999996
time=4750 impl=LRUCache nThreads= 2 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.10340150000000004
time=6875 impl=FastLRUCache nThreads= 2 size=100000 maxKey=1000000 gets=2000000 hitRatio=0.22233749999999997
time=1343 impl=LRUCache nThreads= 1 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=860 impl=FastLRUCache nThreads= 1 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=1547 impl=LRUCache nThreads= 2 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=703 impl=FastLRUCache nThreads= 2 size=1000 maxKey=1000 gets=2000000 hitRatio=0.9998065
time=1610 impl=LRUCache nThreads= 1 size=1000 maxKey=1200 gets=2000000 hitRatio=0.833648
time=2406 impl=FastLRUCache nThreads= 1 size=1000 maxKey=1200 gets=2000000 hitRatio=0.7404839999999999
time=2078 impl=LRUCache nThreads= 2 size=1000 maxKey=1200 gets=2000000 hitRatio=0.8334255
time=859 impl=FastLRUCache nThreads= 2 size=1000 maxKey=1200 gets=2000000 hitRatio=0.998974
time=1922 impl=LRUCache nThreads= 1 size=1000 maxKey=2000 gets=2000000 hitRatio=0.5003285
time=2875 impl=FastLRUCache nThreads= 1 size=1000 maxKey=2000 gets=2000000 hitRatio=0.3516785
time=2422 impl=LRUCache nThreads= 2 size=1000 maxKey=2000 gets=2000000 hitRatio=0.5002055000000001
time=1203 impl=FastLRUCache nThreads= 2 size=1000 maxKey=2000 gets=2000000 hitRatio=0.821195
time=2297 impl=LRUCache nThreads= 1 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10054949999999996
time=2969 impl=FastLRUCache nThreads= 1 size=1000 maxKey=10000 gets=2000000 hitRatio=0.05416350000000003
time=3078 impl=LRUCache nThreads= 2 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10003499999999999
time=3000 impl=FastLRUCache nThreads= 2 size=1000 maxKey=10000 gets=2000000 hitRatio=0.10475299999999999
{code}

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: SOLR-667.patch

creating an array[map.size()] for every markAndSweep() is expensive. Iterator should be better

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644097#action_12644097 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

I'll take a crack at updating the patch such that too many entries aren't removed.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644738#action_12644738 ] 

Todd Feak commented on SOLR-667:
--------------------------------

Is there an *easy* way to get this patched into 1.3.0?

Right now, I think I have to grab 7 patches and apply them in order. Will that give me the correct content? Is there an easier way to do this from the repository?

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647657#action_12647657 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

The ability to use a separate cleanup thread is interesting... but I'm not sure that having the ability to spin off a new thread for *each* cleanup is something one would ever want to do.  The cleanup thread logic should probably be fixed too (no sleeping and polling... it should wait until notified that a cleanup is needed)

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650104#action_12650104 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

I think the default should remain cleanupThread=false.  It's simpler behavior, and for normal cache sizes, the pause an individual request may see is less than what one would see from a GC pause.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-667:
------------------------------

    Attachment: ConcurrentLRUCache.java

Here is a prototype of an idea I've had for a while for an efficient concurrent LRU cache.
It is completely untested... consider it more "code as design".  It *should* feature faster cleaning - O( n ) when it works well.

In addition to low and high water marks, it adds the concept of an "acceptable" water mark.  A cleaning phase will try to go to the low water mark, but will be considered successful if it hits the acceptable water mark.

This is coupled with a multi-pass cleaning phase.  From the comments:
{code}
    // if we want to keep at least 1000 entries, then timestamps of
    // current through current-1000 are guaranteed not to be the oldest!
    // Also, if we want to remove 500 entries, then
    // oldestEntry through oldestEntry+500 are guaranteed to be
    // removed.
{code}

The oldestEntry and newestEntry in the set of entries currently being considered is recorded for each phase.  Entries that are new enough such that they are guaranteed to be kept are immediately removed from consideration, and entries that are old enough such that they are guaranteed to be removed are immediately removed (no sorting necessary).  After 2 phases of this (configurable) and we still haven't removed enough entries, a priority queue is used to find the oldest entries out of those remaining.

There are undoubtedly some other tricks we can use, but this was the best I could come up with for now.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: SOLR-667.patch

with testcase

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-667:
---------------------------------------

    Attachment: SOLR-667.patch

# Added comments in the code
# Fixed a few concurrency issues

I'll commit this shortly.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644594#action_12644594 ] 

yseeley@gmail.com edited comment on SOLR-667 at 11/2/08 7:39 AM:
------------------------------------------------------------

bq. Isn't it too expensive to create a potentially huge array every time we do a clean? (too much work for GC) 

That's what benchmarking is for :-)

It's a single short-lived allocation that allows us to  greatly reduce the number of elements we need to evaluate on successive passes.  Inserts into a TreeSet may have a higher GC cost given it's an allocation per insert.

bq. May be we do not even need it if the first loop is enough.

Right... although in my testing, it seemed like the first loop was rarely sufficient (although the second often was).

      was (Author: yseeley@gmail.com):
    bq. Isn't it too expensive to create a potentially huge array every time we do a clean? (too much work for GC) 

That's what benchmarking is for :-)

It's a single short-lived allocation that allows us to  greatly reduce the number of elements we need to evaluate on successive passes.  Inserts into a TreeSet may have a higher GC cost given it's an allocation per insert.
  
> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647798#action_12647798 ] 

Noble Paul commented on SOLR-667:
---------------------------------

OK that is a good idea. But this is an important functinality. 

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635159#action_12635159 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

bq. I can improve mine with your cleanup code (if you think it is fine)

+1

I'd also include the manual tracking of size() that mine did... the ConcurrentHashMap.size() doesn't look fast.

Another thing to think about : pass an optional Executor in the constructor instead of creating a cleaning thread... and if it's null, it means "do it in the foreground".  That would add flexibility and the ability to avoid one thread per cache if desired.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: SOLR-667.patch

this is the right patch

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647587#action_12647587 ] 

Shalin Shekhar Mangar commented on SOLR-667:
--------------------------------------------

Yonik, what do you think about using a new cleanup thread?

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649342#action_12649342 ] 

Shalin Shekhar Mangar commented on SOLR-667:
--------------------------------------------

Yonik, not trying to be pushy but can this patch be committed? :)

I want to create a build for internal use out of trunk with this feature.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Fuad Efendi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618805#action_12618805 ] 

Fuad Efendi commented on SOLR-667:
----------------------------------

Paul, 


I have never ever suggested to use 'volatile'  'to avoid synchronization' for concurrent programming. I only noticed some extremely stupid code where SOLR uses _double_synchronization and AtomicLong inside:

{code}
  public synchronized Object put(Object key, Object value) {
    if (state == State.LIVE) {
      stats.inserts.incrementAndGet();
    }

    synchronized (map) {
      // increment local inserts regardless of state???
      // it does make it more consistent with the current size...
      inserts++;
      return map.put(key,value);
    }
  }
{code}

Each tool has an area of applicability, and even ConcurrentHashMap just slightly intersects with SOLR needs; SOLR does not need 'consistent view at a point in time' on cached objects.

'volatile' is part of Java Specs, and implemented differently by different vendors. I use volatile (instead of more expensive AtomicLong) only and only to prevent JVM HotSpot Optimizer from some _not-applicable_ staff...

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618708#action_12618708 ] 

Noble Paul commented on SOLR-667:
---------------------------------

If somebody can review the implementation we can add another cache implementation to Solr.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644023#action_12644023 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

The markAndSweep logic in the current code didn't replicate the logic I gave in my example ConcurrentLRUCache, and it can remove a lot more than it should.

Specifically, my algorithm broke the results into 3 groups:
- those documents that are guaranteed to be in the top group (most recently accessed)
- those documents guaranteed to be in the bottom group (immediately discard)
- those documents where you can't tell.

The current code reversed this logic, assuming that one can remove everything that is not in the top group.  This isn't valid though, as lastAccess isn't uniform (and thus the size of the top group could be as small as 1).

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644730#action_12644730 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

That benchmark really isn't valid for a high number of threads though: notice the difference in hitRatio.
If you have many threads quickly adding items and only one thread at a time removing items, the FastLRUCache goes over it's target size and thus increases it's hitRatio, making it artificially faster.

This isn't a concern for it's use in Solr though, since the generation of a cache value will be much slower than clearing the cache.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: ConcurrentLRUCache.java

A POC implementation based on _ConcurrentHashMap_
Another one this is LRU. uses ConcurrentHashMap again

* Gets are free
* Puts are free till it touches the high water mark . It is expensive (very expensive) after the high watermark .
* To lighten the load on put an extra thread is employed to do a concurrent mark and sweep 
* there is a high-water-mark and a low-water-mark . The extra thread cleans anything if low-water-mark is crossed. Put must removes if level crosses high-water-mark

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: SOLR-667.patch

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644216#action_12644216 ] 

Shalin Shekhar Mangar commented on SOLR-667:
--------------------------------------------

I forgot to add license headers to the three source files for this issue. I'll hold off adding them lest it breaks patches that you guys are working on.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: SOLR-667.patch

name change and some refactoring

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: ConcurrentLRUCache.java

my findings on a simple perf test with no contention (single thread)

The code is there in the _main()_

cache size 1 million

with Hashmap
* time taken  for 1 million inserts = 2019ms
* time taken for 1 million gets = 625
* time taken  for cleanup  = 345ms

with ConcurrenthashHashMap

* time taken  for 1 million inserts  = 2437(roughly 20% slower than hashmap but small in absolute numbers) 
* time taken for 1 million gets  = 393ms (actually faster than simple HashMap )
* time taken  for cleanup = 298ms (actually faster)

other observations 
The extra thread may not be be necessary . The unlucky put() may take around .25 secs to .5secs for a cache size of 1 million .
If we keep the value of (highHaterMark -lowWaterMark) value very high cleanups will be infrequent





> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Fuad Efendi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618750#action_12618750 ] 

Fuad Efendi commented on SOLR-667:
----------------------------------

bq. ...safety, where nothing bad ever happens to an object. 
When _SOLR_ adds object to cache or remove it from cache it does not change it, it manipulates with internal arrays of pointers to objects (which are probably atomic, but I don't know such JVM & GC internals in-depth...)

Looks heavy with TreeSet...


> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (SOLR-667) Alternate LRUCache implementation

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar reassigned SOLR-667:
------------------------------------------

    Assignee: Shalin Shekhar Mangar

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley reassigned SOLR-667:
---------------------------------

    Assignee: Yonik Seeley  (was: Shalin Shekhar Mangar)

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-667:
----------------------------

    Attachment: SOLR-667.patch

 Yonik, Nice catch . There was a thread leak.

I hope this patch fixes that. The cleanup thread now holds a WeakReference to the cache 
The close() of Solrcache ensures that it is destroyed.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Yonik Seeley
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-alternate.patch, SOLR-667-alternate.patch, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643984#action_12643984 ] 

Yonik Seeley commented on SOLR-667:
-----------------------------------

Todd: have you verified that the hit rate stays at 100% (no new inserts, no new evictions, etc)?  If so, it might just be GC as Noble suggests.  Bigger heaps often increase GC pauses.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Antony Bowesman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627592#action_12627592 ] 

Antony Bowesman commented on SOLR-667:
--------------------------------------

I have also been considering a concurrent LRU cache for my own application and seeing this isse made me think about it again.  Wouldn't one option be to use a ReentrantReadWriteLock to synchronise the map rather than complete synchronisation on the map for both readers and writers.  Although that does not give a free get() it would at least allow concurrent get and still be able to use the LinkedHashMap and would not require the extra thread.  Not sure if SOLR is java 1.5, but if not you could still use Doug Lea's concurrent package for pre 1.5 Java.


> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643604#action_12643604 ] 

Todd Feak commented on SOLR-667:
--------------------------------

I'm not sure if that helped out. I haven't run a profiler yet ,but I think the "pulsing" (for lack of a better term) is caused by something else.

Here's why... I used the new FastLRUCache *only* for my Document cache in my latest test. The document cache size is large enough to hold all of the documents in the test set, helping focus on cache behavior. The warming query is enough to get all documents into the cache on startup. So, the cache is essentially as full as it's gonna get. 100% hit rate, and not growing. Yet, it still exhibits this pulsing. Could it be associated with the overhead of maintaining the least recently used entries?

The introduction of the background thread didn't address this. It also didn't appear to speed things up, in fact it dropped overall throughput a bit, though still better then they old LRUCache.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Todd Feak (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643520#action_12643520 ] 

Todd Feak commented on SOLR-667:
--------------------------------

Huge thanks on this one. This was one of the bottlenecks I've seen previously. For apples to apples load tests, this more then doubled my overall throughput.

I do notice a sort of "pulsing" in the responses. It appears that everything flies along, but on occasion everything piles up for a second, then starts going again. This leads to a few response times that are over 1 second, but the average is way down closer to 20ms. Is this the cleanup involved when hitting a high-water mark?

Overall, it's a huge improvement.

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644008#action_12644008 ] 

Shalin Shekhar Mangar commented on SOLR-667:
--------------------------------------------

Committed revision 709188.

Thanks Yonik!

> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 1.4
>
>         Attachments: ConcurrentLRUCache.java, ConcurrentLRUCache.java, ConcurrentLRUCache.java, SOLR-667-updates.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch, SOLR-667.patch
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618768#action_12618768 ] 

Noble Paul commented on SOLR-667:
---------------------------------

bq.It's a good approach in genera, and should scale better with many CPUs under very high lookup load. But I'm not sure that it should use a separate cleaner thread... and if it does, I don't think it should be scheduled.

Thanks for the comments. I agree with you . I devised this approach because some user reported that heavy cache lookups are slowing things down for him. The cost benefit analysis is yet to be studied. Separate cleaner thread is optional in the implementation . I am yet to study the cost of sorting over a hundred thousand entries.  Do you recommend that the cleaner thread just keep running forever? That is fine, so there should be a sleep for the thread? 

BTW is the executorservice very expensive?

I did not provide a SolrCache implementation because that is going to be drown the approach in details. 

bq. I think this should want until after 1.3

True. This is not marked for 1.3. But this can definitely live as a patch and anyone who needs it would benefit from it


> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-667) Alternate LRUCache implementation

Posted by "Fuad Efendi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618824#action_12618824 ] 

Fuad Efendi commented on SOLR-667:
----------------------------------

Thanks Yonik, I even guess that in some cases synchronization is faster than sun.misc.Unsafe.compareAndSwapLong(this, valueOffset, expect, update);

{code}
    public final long incrementAndGet() {
        for (;;) {
            long current = get();
            long next = current + 1;
            if (compareAndSet(current, next))
                return next;
        }
    }
{code}

- extremal level of safety with some level of concurrency... Do we need exact value for 'stats.inserts' (if it is not synchronized)? 

It can be 'long' inside synchronized block...


> Alternate LRUCache implementation
> ---------------------------------
>
>                 Key: SOLR-667
>                 URL: https://issues.apache.org/jira/browse/SOLR-667
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>         Attachments: ConcurrentLRUCache.java
>
>
> The only available SolrCache i.e LRUCache is based on _LinkedHashMap_ which has _get()_ also synchronized. This can cause severe bottlenecks for faceted search. Any alternate implementation which can be faster/better must be considered. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.