You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2010/11/25 18:30:14 UTC

[jira] Created: (LUCENE-2779) Use ReadWriteLock in RAMDirectory

Use ReadWriteLock in RAMDirectory
---------------------------------

                 Key: LUCENE-2779
                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Store
            Reporter: Shai Erera
            Assignee: Shai Erera
            Priority: Minor
             Fix For: 3.1, 4.0


RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. So I think ReadWriteLock can be useful.

Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...

I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964556#action_12964556 ] 

Earwin Burrfoot commented on LUCENE-2779:
-----------------------------------------

I'm happy with CHM.
The only thing bothering me is RAMDir.listAll() method. It feels broken both in your directory and the one I posted.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779.patch
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964920#action_12964920 ] 

Earwin Burrfoot commented on LUCENE-2779:
-----------------------------------------

I don't believe cloning the keySet() will make it more "snapshotty". There's no way to get a completely consistent snapshot of some concurrent datastructure without locking it completely, or using a variant of copy-on-write approach.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965301#action_12965301 ] 

Robert Muir commented on LUCENE-2779:
-------------------------------------

Hi:

please don't post sun/oracle proprietary source code on apache jira issues!

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964575#action_12964575 ] 

Yonik Seeley commented on LUCENE-2779:
--------------------------------------

bq. I checked CHM iterators and they are safe w.r.t the map being modified after the iterator was obtained.

Yes, but that wasn't the problem.  The iterator for ketSet "guarantees to traverse elements as they existed upon construction of the iterator".
The real problem is that the String[] is created based on the size of the keySet, and then later, an iterator is created over the keySet.  If an addition to the set is done between those two events, the iterator will traverse more elements than there is room for - hence the AIOB exception.

 bq. About remove I'll have to look in the code (not near the comp now) - but I had a feeling that it is safe too, b/c the file is first removed,

The createOutput issue is that get() was used (not remove) so multiple threads could overwrite the same file and all of them subtract the size.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779.patch
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965293#action_12965293 ] 

Shai Erera commented on LUCENE-2779:
------------------------------------

Ok I can change the code to: new ArrayList(map.keySet()).toArray(). How's that sound?

I use IBM JVM and I'll check the KeySet.toArray later when I'm in front of the comp. But I think we should have a resilient code and the one above seems right to me. What do you think?

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964978#action_12964978 ] 

Uwe Schindler commented on LUCENE-2779:
---------------------------------------

Reverted 3.x commit in revision: 1040320

(this change not even had a CHANGES.txt entry!) Please try to resolve this in a correct way!

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964987#action_12964987 ] 

Uwe Schindler commented on LUCENE-2779:
---------------------------------------

bq. Maybe we should commit it to 4.0 only? Doesn't look like a really important patch, that just has to be backported. 

+1; And please add a CHANGES.txt entry for trunk.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965240#action_12965240 ] 

Yonik Seeley commented on LUCENE-2779:
--------------------------------------

OK, a few points on the latest patch:
- cloning the map does not change the "lie" (i.e. it's still not a point-in-time snapshot)... the constructor for the new set must iterate over the items also, so consistency is not increased.  You've just changed where the iteration happens.  You can see this by trying out my test program, and making the following change:
{code}
      Object[] vals = map.keySet().toArray(new Integer[0]);
      Object[] vals = new HashSet<String>(map.keySet()).toArray(new Integer[0]);
{code}
- toArray() or toArray(T[]) should be safe to call on ConcurrentHashMap.keySet().  It works fine on my JVM (Oracle Java6)
  What JVM are you using?


> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-2779:
---------------------------------

    Attachment: TestCHM.java

Note: there is an important change of semantics to listAll() - it can now "lie".
Previously, it always gave an accurate listing of files as they existed at some point in time.  Now, you can get back a list of files that never really existed together at any point in time (i.e. the lie).

I'm OK with these new semantics because we have the same limitation already on FSDirectory.  This issue manifested in LUCENE-2585, so we should keep in mind that we could now also see this with RAMDirectory.

Here's a test program that demonstrates the issue by having a single writer adding 5,-5 to a map before removing the old 4,-4, etc.  At any point in time, there will always be a pair of numbers.  The single reader looks for this and spits out an error if not found.

output from my box:
iterations=1000000 errors=4120 minLen=0 maxLen=10

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965245#action_12965245 ] 

Uwe Schindler edited comment on LUCENE-2779 at 11/30/10 11:53 AM:
------------------------------------------------------------------

Yonik: You are right. Sun Java 5 creates an ArrayList and then uses the Iterator and adds each key to the ArrayList. After that it calls toArray on the ArrayList. This is safe, as the iterator is documented to be safe. Java 6 is safe as AbstractCollection.toArray() is oficially documented to behave correctly.

If Shai wants to keep this code, he should at least use ArrayList instead of HashSet for the keys, as cloning is much faster then. But as ArrayList(Collection) uses toArray() in its ctor it may also be broken.

Maybe Shai uses IBM JRocket? Or Harmony (Harmony is currently broken, HARMONY-6681)?

      was (Author: thetaphi):
    Yonik: You are right. In Sun Java 5 (src.zip, ConcurrentHashMap.java, line 1207) you can see how the toArray is implemented for KeySet. It creates an ArrayList and then uses the Iterator and adds each key to the ArrayList. After that it calls toArray on the ArrayList. This is safe, as the iterator is documented to be safe.

If Shai wants to keep this code, he should at least use ArrayList instead of HashSet for the keys, as cloning is much faster then.

Maybe Shai uses IBM JRocket? Or Harmony?
  
> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964882#action_12964882 ] 

Shai Erera commented on LUCENE-2779:
------------------------------------

I think we can still avoid this if we clone the keySet(). Since listAll I'd not a frequently called method, and since I believe RAMDirs do not hold hundreds of files, I believe It won't be expensive. What do you think?

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2779:
-------------------------------

    Description: 
RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap

Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...

I'll post a patch shortly.

  was:
RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. So I think ReadWriteLock can be useful.

Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...

I'll post a patch shortly.

        Summary: Use ConcurrentHashMap in RAMDirectory  (was: Use ReadWriteLock in RAMDirectory)

Changed title and description to better match the issue's topic.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779.patch
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965376#action_12965376 ] 

Michael McCandless commented on LUCENE-2779:
--------------------------------------------

bq. But I still don't think we should do this, its a bug outside Lucene in a seldom used JVM! 

Actually I think [sadly] it is our responsibility to sidestep JRE bugs when we can.  We want to maximize Lucene's portability.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964570#action_12964570 ] 

Shai Erera commented on LUCENE-2779:
------------------------------------

I checked CHM iterators and they are safe w.r.t the map being modified after the iterator was obtained. It's easily testable: I added 3 items to CHM and called keySet.iterator(). I then called next() once, removed an item, called next() again and added an item. The changes were no visible to the iterator and it returned 3 items only (those that I added first). So I think we're safe on that.

About remove I'll have to look in the code (not near the comp now) - but I had a feeling that it is safe too, b/c the file is first removed, which is a blocking 'write' operation and only if it isn't null then the AtomicLong is updated. But I'll Check the code again, and maybe even add a concurrent test case.

Thanks for the review. If in the end this change will smell bad, I'll close the issue - introducing concurrency bugs is the last thing we need :).

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779.patch
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2779:
----------------------------------

    Attachment: LUCENE-2779-backwardsfix.patch

If we really want to break backwards compatibility, here would be the fix for 3.x backwards branch.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shay Banon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966376#action_12966376 ] 

Shay Banon commented on LUCENE-2779:
------------------------------------

  If the assumption still stands that an IndexInput will not be opened on a "writing" / unclosed IndexOutput, then RAMFile can also be improved when it comes to concurrency. The RAMOutputStream can maintain its own list of buffers (simple array list, no need to sync), and only when it gets closed, initialize the respective RAMFile with the list. This means most of the synchronize aspects of RAMFile can be removed. Also, on RAMFile, lastModified can be made volatile, and remove the sync on its respective methods.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965205#action_12965205 ] 

Shai Erera commented on LUCENE-2779:
------------------------------------

bq. toArray() with no parameters failed on CHM?

I don't use toArray(). I use toArray(new String[0]) because I don't want to cast the returned array to String[]. According to the javadocs:

_Returns an array containing all of the elements in this collection; the runtime type of the returned array is that of the specified array. If the collection fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the runtime type of the specified array and the size of this collection._

This method guarantees the right array size will be allocated.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ReadWriteLock in RAMDirectory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964512#action_12964512 ] 

Earwin Burrfoot commented on LUCENE-2779:
-----------------------------------------

I mean, even if aquiring locks costed you one second each, you're not going to notice that, as in your case it is a startup-only cost.

Look at this Directory impl I used for a while - https://gist.github.com/715617 - it uses synchronizedMap() over HashMap.
I had around 100qps with it on 4/8-way boxes AND pretty frequent updates.
I benchmarked a switch to RWLock and it yielded zero benefits.

> Use ReadWriteLock in RAMDirectory
> ---------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. So I think ReadWriteLock can be useful.
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ReadWriteLock in RAMDirectory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935813#action_12935813 ] 

Earwin Burrfoot commented on LUCENE-2779:
-----------------------------------------

Using RWLock is needless.
synchronized performs pretty nice on modern JVMs
to at least some get contention on RAMDir you need to index+reopen gazillion times a second - not a likely scenario. Usual searching doesn't hit any of these locks
that's premature optimization, imo

> Use ReadWriteLock in RAMDirectory
> ---------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. So I think ReadWriteLock can be useful.
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ReadWriteLock in RAMDirectory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964515#action_12964515 ] 

Earwin Burrfoot commented on LUCENE-2779:
-----------------------------------------

My primary point was that you're not going to have contention in that place at all. Take these 100queries/s, turn them into 100reopens+commits/s, multiply by ten, and then _maybe_ you start noticing _something_.

> Use ReadWriteLock in RAMDirectory
> ---------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. So I think ReadWriteLock can be useful.
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965303#action_12965303 ] 

Uwe Schindler commented on LUCENE-2779:
---------------------------------------

The only performant variant of that code that works around all these bugs is the code snippet in Java SE6 JavaDocs, which is published by Sun as "works-as-if-implemented-as" in their JavaDocs: 

{code}
Set<String> set = #####.keySet();
List<String> list = new ArrayList<String>(set.size());
for (String s : set) list.add(e);
return list.toArray(new String[0]);
{code}

But I still don't think we should do this, its a bug outside Lucene in a seldom used JVM!

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965086#action_12965086 ] 

Shai Erera commented on LUCENE-2779:
------------------------------------

I didn't add a CHANGES entry because I consider it an internal change. And as a user, I wouldn't know what to do w/ an entry that says "RAMDirectory now uses CHM".

bq. I don't believe cloning the keySet() will make it more "snapshotty".

Cloning the keySet() will be exactly the 'snapshotty' behavior we're looking for. Before I made the change, you could call listAll(), lock RAMDir, return the array and before/after that files could be added/removed. W/ the clone, we'll get the same behavior - files can be added/removed before the clone, clone would reflect those changes, whatever happens after the clone is invisible to the iterator - hence why I consider it snapshotty.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964990#action_12964990 ] 

Yonik Seeley commented on LUCENE-2779:
--------------------------------------

In Shai's defense, a CHANGES entry for 4.0 is pretty questionable.  It's not going to be a noticeable enough speedup to call out as an end-user feature.  Any super-expert level people extending RAMDirectory and accessing the Map directly will get a compile error (a good thing) and instantly know what's changed.  IMO, this falls into the category of "little internal cleanup" and people can consult the SVN logs or mailing lists if they wish to know every single one of them ;-)

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965303#action_12965303 ] 

Uwe Schindler edited comment on LUCENE-2779 at 11/30/10 12:42 PM:
------------------------------------------------------------------

The only performant variant of that code that works around all these bugs is the code snippet in Java SE6 JavaDocs, which is published by Sun as "works-as-if-implemented-as" in their JavaDocs: 

{code}
Set<String> set = #####.keySet();
List<String> list = new ArrayList<String>(set.size());
for (String s : set) list.add(s);
return list.toArray(new String[0]);
{code}

But I still don't think we should do this, its a bug outside Lucene in a seldom used JVM!

      was (Author: thetaphi):
    The only performant variant of that code that works around all these bugs is the code snippet in Java SE6 JavaDocs, which is published by Sun as "works-as-if-implemented-as" in their JavaDocs: 

{code}
Set<String> set = #####.keySet();
List<String> list = new ArrayList<String>(set.size());
for (String s : set) list.add(e);
return list.toArray(new String[0]);
{code}

But I still don't think we should do this, its a bug outside Lucene in a seldom used JVM!
  
> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965396#action_12965396 ] 

Uwe Schindler commented on LUCENE-2779:
---------------------------------------

Code looks good, I would only chnage the comment to simply say, that:
{code}
return fileMap.keySet().toArray(new String[0]);
{code}
only works correct in Suns JVM. The thing with new ArrayList(Collection) or any other thing is implemented different in every JVM (even new HashSet(Collection) can be broken if it uses toArray() for some weird reason). But just to note, according to the JDK documentation the above code should work - period. So we should only document that and say, that this is broken.


> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2779:
-------------------------------

    Attachment: LUCENE-2779.patch

Patch w/ the latest code + a typo fix. I will commit it later today.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966457#action_12966457 ] 

Michael McCandless commented on LUCENE-2779:
--------------------------------------------

bq. If the assumption still stands that an IndexInput will not be opened on a "writing" / unclosed IndexOutput, then RAMFile can also be improved when it comes to concurrency. 

This assumption does still stand, and our unit tests now assert this -- MockDirWrapper throws an exception if we ever try to open an II when an IO is still open against that file.

That said... we are considering relaxing this, because RT search needs to be able to access the doc stores with an II even as IO is appending to it.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964982#action_12964982 ] 

Earwin Burrfoot commented on LUCENE-2779:
-----------------------------------------

Maybe we should commit it to 4.0 only? Doesn't look like a really important patch, that just _has_ to be backported.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2779:
-------------------------------

    Attachment: LUCENE-2779.patch

Attached patch combines the changes to RAMDir and the changes to MockRAMDir under backwards.

I also made the map final, as Uwe suggested, but it means that on close() it cannot be nullified, so instead I clear()-ed it. I think it achieves the same goal - clearing the references to RAMFiles.

Also, what do you know, I've hit an AIOB exception thrown from listAll() when it called toArray() :). So I cloned the set of keys first, which protects against it. After the set is cloned, it's not affected by any changes to the map, and therefore toArray() works safely, and returns some point in time snapshot of the map. The "point in time" is not necessarily the one that existed when you called listAll(), but the cloned set becomes the "point in time" snapshot. I think it's ok.

I've hit it when running backwards tests (from TestIndexWriterExceptions), but not from core. Perhaps it was just a threading issue.

If you're ok w/ that, I'll make the same changes to trunk as well (making the map final and cloning the set).

All tests pass (this time, including backwards :)).

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ReadWriteLock in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964516#action_12964516 ] 

Shai Erera commented on LUCENE-2779:
------------------------------------

I've changed RAMDir to use ConcurrentHashMap and everything works (still need to run contrib and Solr tests) so far, w/o any synchronization (neither synchronize nor RWLock). While I don't expect any major perf gains from this, I think it simplifies the code, a bit. I'll post a patch shortly.

> Use ReadWriteLock in RAMDirectory
> ---------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. So I think ReadWriteLock can be useful.
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera resolved LUCENE-2779.
--------------------------------

    Resolution: Fixed

Committed revision 1041019 (3x).
Committed revision 1041039 (trunk).

Thanks all for the review !

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965359#action_12965359 ] 

Shai Erera commented on LUCENE-2779:
------------------------------------

I did hit an AIOOBE though, and I use IBM's JDK, which its CHM apparently uses AbstractCollection's impl of toArray(Object[]). The question is, why should we rely on toArray() implemented one way or the other in which JDK? If we think the best impl would be to allocate an ArrayList and add all the elements to it, then why not do this explicitly? Remember that before this change, listAll() would do almost exactly that - it iterated on the keySet() and added to items to an array, only then it could rely on size().

Apparently, doing new ArrayList(fileMap.keySet()) is not safe either, as internally it allocates an array and calls the set's toArray. So I ended up writing the following code and comment:

{code}
    // NOTE: due to different implementations of different JDKs, it's not safe
    // to do either:
    // 1. return fileMap.keySet().toArray(new String[0])
    // 2. return new ArrayList<String>(fileMap.keySet()).toArray(new String[0])
    // as both can result in ArrayIndexOutOfBoundException, in case the keySet()
    // is modified while this method is executed.
    // Therefore we must clone the set in the following manner, never relying on
    // keySet() size (even though it's used, ArrayList grows as needed.
    Set<String> fileNames = fileMap.keySet();
    List<String> names = new ArrayList<String>(fileNames.size());
    for (String name : fileNames) names.add(name);
    return names.toArray(new String[names.size()]);
{code}

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera resolved LUCENE-2779.
--------------------------------

    Resolution: Fixed

Committed revision 1040138 (trunk).
Committed revision 1040145 (3x).

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779.patch, LUCENE-2779.patch
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ReadWriteLock in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964500#action_12964500 ] 

Shai Erera commented on LUCENE-2779:
------------------------------------

In my app, I need to server ~50 qps on a single JVM. And that JVM is dedicated entirely for reading the index (so the 'WriteLock' will in fact never be required. I don't think this will be a huge performance gain, but as one smart guy once said on the mailing list, if we keep adding 1 to 1, eventually it adds up to a big gain.

So I do think RWLock is in place. Also, I think CocurrentHashMap nearly perfectly matches here, except for the listAll() case where I still need to figure out if something will go wrong in case the map is modified in the middle of iteration.

> Use ReadWriteLock in RAMDirectory
> ---------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. So I think ReadWriteLock can be useful.
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964576#action_12964576 ] 

Yonik Seeley commented on LUCENE-2779:
--------------------------------------

Anyway, you should be able to replace all the code in listAll with
{code}
  return (String[])fileMap.keySet().toArray();
{code}

The toArray() code of collection is smart and can handle the number of elements changing.

And for createOutput, replacing get() with remove() should be sufficient.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779.patch
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965300#action_12965300 ] 

Uwe Schindler commented on LUCENE-2779:
---------------------------------------

Earwin: I wanted to say the same. 

JRockit and Harmony are broken, Sun JVM is correct. So remove the code and only use toArray(). We cannot work around bugs in non-oficially-supported JVMs (IBM wants to go away from JRocket, too).

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2779:
-------------------------------

    Attachment: LUCENE-2779.patch

Fixed createOutput and listAll as Yonik suggested. Thanks Yonik - it looks safer now :).

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779.patch, LUCENE-2779.patch
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965235#action_12965235 ] 

Uwe Schindler commented on LUCENE-2779:
---------------------------------------

I am fine, there is only a typo copied from my patch in the RuntimeException :-)

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Reopened: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reopened LUCENE-2779:
-----------------------------------


This commit broke backwards compatibility in 3.x.

Please revert 3.x, we are no longer backwards compatible, because we changed type of a *protected* field!

(and: is it so complicated to do *ant test-backwards* before committing such a change?)

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965234#action_12965234 ] 

Shai Erera commented on LUCENE-2779:
------------------------------------

I plan to commit this tomorrow, so if you have comments, please share them.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965303#action_12965303 ] 

Uwe Schindler edited comment on LUCENE-2779 at 11/30/10 12:43 PM:
------------------------------------------------------------------

The only performant variant of that code that works around all these bugs is the code snippet in Java SE6 JavaDocs, which is published by Sun as "works-as-if-implemented-as" in their JavaDocs: 

{code}
Set<String> set = #####.keySet();
List<String> list = new ArrayList<String>(set.size());
for (String s : set) list.add(s);
return list.toArray(new String[0]);
{code}

But I still don't think we should do this, its a bug outside Lucene in a seldom used JVM! Sun Java 5 and Sun Java 6 work 100% correct and never throw exceptions (by using techniques like above).

      was (Author: thetaphi):
    The only performant variant of that code that works around all these bugs is the code snippet in Java SE6 JavaDocs, which is published by Sun as "works-as-if-implemented-as" in their JavaDocs: 

{code}
Set<String> set = #####.keySet();
List<String> list = new ArrayList<String>(set.size());
for (String s : set) list.add(s);
return list.toArray(new String[0]);
{code}

But I still don't think we should do this, its a bug outside Lucene in a seldom used JVM!
  
> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964565#action_12964565 ] 

Yonik Seeley commented on LUCENE-2779:
--------------------------------------

The generic problem with optimizing something is the risk of introducing bugs (so hopefully we're optimizing where it counts).

At a quick glance, I see:
- a race in listAll() (as earwin indicated) that can cause an AIOB exception
- a race in createOutput that can lead to an incorrect directory size (multiple threads can subtract the size of the same file being overwritten).  This probably isn't an issue with how Lucene uses RAMDirectory.

Changing how concurrency works is rarely easy ;-)

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779.patch
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965194#action_12965194 ] 

Earwin Burrfoot commented on LUCENE-2779:
-----------------------------------------

bq. Cloning the keySet() will be exactly the 'snapshotty' behavior we're looking for. Before I made the change, you could call listAll(), lock RAMDir, return the array and before/after that files could be added/removed. W/ the clone, we'll get the same behavior - files can be added/removed before the clone, clone would reflect those changes, whatever happens after the clone is invisible to the iterator - hence why I consider it snapshotty.
There are still weird cases, when file B was added after deleting A, but you see both in listAll(). These - remain, so it's not a "point in time" it's more like a "span in time".
Whatever happened after toArray was invisible to array too, so the behaviour hasn't changed.

bq. Also, what do you know, I've hit an AIOB exception thrown from listAll() when it called toArray() :)
But _this_ fact is really interesting. toArray() with no parameters failed on CHM? Cloning has a meaning now :)

bq. Earwin, I did not *just* backport it. ........
Didn't mean to offend anyone, sorry if I did.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965296#action_12965296 ] 

Earwin Burrfoot commented on LUCENE-2779:
-----------------------------------------

Quoting Sun JDK 1.6:

{code}
public ArrayList(Collection<? extends E> c) {
  elementData = c.toArray();
  size = elementData.length;
  // c.toArray might (incorrectly) not return Object[] (see 6260652)
  if (elementData.getClass() != Object[].class)
    elementData = Arrays.copyOf(elementData, size, Object[].class);
}
{code}

It calls toArray() on collection provided. You might as well skip wrapping with ArrayList and use toArray directly :D

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2779) Use ReadWriteLock in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2779:
-------------------------------

    Attachment: LUCENE-2779.patch

Patch replaces HashMap w/ ConcurrentHashMap and removes synchronized blocks. All tests pass.

> Use ReadWriteLock in RAMDirectory
> ---------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779.patch
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. So I think ReadWriteLock can be useful.
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ReadWriteLock in RAMDirectory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964511#action_12964511 ] 

Earwin Burrfoot commented on LUCENE-2779:
-----------------------------------------

If you don't write, you don't care about locking in RAMDir at all. Your reader opens files once, and then never ever does it again.
Believe me, this is so pointless.

> Use ReadWriteLock in RAMDirectory
> ---------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. So I think ReadWriteLock can be useful.
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965380#action_12965380 ] 

Earwin Burrfoot commented on LUCENE-2779:
-----------------------------------------

bq. So I ended up writing the following code and comment
Looks good.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ReadWriteLock in RAMDirectory

Posted by "Simon Willnauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964514#action_12964514 ] 

Simon Willnauer commented on LUCENE-2779:
-----------------------------------------

Shai, I actually think Earwin is right with his claim that this is unneeded / pointless really. Performance on modern JVMs is very good for both RWLock and synchronized blocks and to make a big difference heavy contention is needed anyway. I would not expect any difference if you are on a Java 6 JVM at all even if you'd have heavy contention. I have looked into this too a while ago and came to the same conclusion as earwin, there seem to be no real gain in refactoring this to use RWLocks instead fo sync blocks. 

> Use ReadWriteLock in RAMDirectory
> ---------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. So I think ReadWriteLock can be useful.
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965245#action_12965245 ] 

Uwe Schindler commented on LUCENE-2779:
---------------------------------------

Yonik: You are right. In Sun Java 5 (src.zip, ConcurrentHashMap.java, line 1207) you can see how the toArray is implemented for KeySet. It creates an ArrayList and then uses the Iterator and adds each key to the ArrayList. After that it calls toArray on the ArrayList. This is safe, as the iterator is documented to be safe.

If Shai wants to keep this code, he should at least use ArrayList instead of HashSet for the keys, as cloning is much faster then.

Maybe Shai uses IBM JRocket? Or Harmony?

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ReadWriteLock in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935812#action_12935812 ] 

Shai Erera commented on LUCENE-2779:
------------------------------------

Perhaps we can eliminate synchronization entirely by using ConcurrentHashMap - I'll check that too.

> Use ReadWriteLock in RAMDirectory
> ---------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. So I think ReadWriteLock can be useful.
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ReadWriteLock in RAMDirectory

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935816#action_12935816 ] 

Earwin Burrfoot commented on LUCENE-2779:
-----------------------------------------

In fact, uncontended synchronized block might perform faster than RWLock.

> Use ReadWriteLock in RAMDirectory
> ---------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. So I think ReadWriteLock can be useful.
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965390#action_12965390 ] 

Yonik Seeley commented on LUCENE-2779:
--------------------------------------

bq. The question is, why should we rely on toArray() implemented one way or the other in which JDK?

Both Harmony and IBM's implementations are broken - it's not an issue of a different but still valid implementation.

bq. Actually I think [sadly] it is our responsibility to sidestep JRE bugs when we can. We want to maximize Lucene's portability.

+1 (within reason)
In this case, it's relatively straightforward to do so, and it's not in an inner-loop.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965130#action_12965130 ] 

Shai Erera commented on LUCENE-2779:
------------------------------------

bq. Doesn't look like a really important patch, that just has to be backported. 

Earwin, I did not *just* backport it. This time I deviated from my usual work habit, which is changing 3x before trunk, because I did want to cause any mergeprops troubles. But in real life, I work on and improve 3x, and back/forward port to trunk -- at least until trunk/4.0 release will be anywhere near sight. So for all intent and purposes, this improvement should have gone into 3x, as far as I'm concerned. That that we have an issue w/ our backwards tests layer (sent a separate email about it) is unrelated to the issue.

As a general practice though, I do not neglect 3x branch -- if something that I do can fit there, I will put it there, whether it's a major performance improvement, or a minor house cleaning.

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2779) Use ConcurrentHashMap in RAMDirectory

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964993#action_12964993 ] 

Uwe Schindler commented on LUCENE-2779:
---------------------------------------

OK, but in 3.x its clearly a backwards break. So if we change it there, we have to put it into backwards break section and apply the attached patch to backwards/

> Use ConcurrentHashMap in RAMDirectory
> -------------------------------------
>
>                 Key: LUCENE-2779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2779
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2779-backwardsfix.patch, LUCENE-2779.patch, LUCENE-2779.patch, TestCHM.java
>
>
> RAMDirectory synchronizes on its instance in many places to protect access to map of RAMFiles, in addition to updating the sizeInBytes member. In many places the sync is done for 'read' purposes, while only in few places we need 'write' access. This looks like a perfect use case for ConcurrentHashMap
> Also, syncing around sizeInBytes is unnecessary IMO, since it's an AtomicLong ...
> I'll post a patch shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org