Posted to user@accumulo.apache.org by Ariel Valentin <ar...@arielvalentin.com> on 2014/02/12 10:47:47 UTC

Synchronized Access to ZooCache Causing Threads to Block

I have run into a problem related to ACCUMULO-1833, which appears to have
addressed the issue for MultiTableBatchWriter; however, I am seeing the same
issue on the scanner side:

"http-/192.168.220.196:8080-35" daemon prio=10 tid=0x00007f3108038000 nid=0x538a waiting for monitor entry [0x00007f31287d1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
    - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for org.apache.accumulo.fate.zookeeper.ZooCache)
    at org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
    at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
    at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
    at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
    at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
    at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)

I have not spent enough time reasoning about the code to understand all of
the nuances, but I would like to know whether there are any mitigating
strategies for this thread contention. For example, would creating a cache
entry for each member of the ZooKeeper ensemble relieve the strain? Would
using multiple classloaders help? Or is my only option to spawn multiple
JVMs?

Thanks,
Ariel Valentin
e-mail: ariel@arielvalentin.com
website: http://blog.arielvalentin.com
skype: ariel.s.valentin
twitter: arielvalentin
linkedin: http://www.linkedin.com/profile/view?id=8996534
---------------------------------------
*simplicity *communication
*feedback *courage *respect
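
A dump like the one above can also be summarized from inside the running JVM. The following is a minimal sketch and not something taken from this thread: it uses the standard java.lang.management API to count threads in the BLOCKED state, grouped by the lock each one is waiting for. The names BlockedThreadTally and blockedByLock are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

final class BlockedThreadTally {

    /** Counts threads currently in state BLOCKED, keyed by the lock they wait for. */
    static Map<String, Integer> blockedByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        // false, false: skip locked-monitor and locked-synchronizer details
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                // Lock name is the monitor's class name plus its identity hash
                String lock = String.valueOf(info.getLockName());
                counts.merge(lock, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        blockedByLock().forEach((lock, n) -> System.out.println(n + " blocked on " + lock));
    }
}
```

If most request threads pile up under a single lock name, that points at the kind of contention reported here. The lock name describes the monitor object itself, so a class-level monitor should show up as a java.lang.Class entry rather than under the name of the class being locked.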

Re: Synchronized Access to ZooCache Causing Threads to Block

Posted by Josh Elser <jo...@gmail.com>.

Sick! Thanks for sharing -- feedback is always welcome and appreciated.

On 2/12/14, 8:44 PM, Ariel Valentin wrote:
> Josh,
>
> We experimented with 1.5.1 today; our load test numbers seem to indicate a 10x performance improvement over 1.5.0 on a single JVM. We are running additional experiments over the next few days to see what happens when we move to multiple JVMs. Stay tuned.
>
> Thanks,
> Ariel
> ---
> Sent from my mobile device. Please excuse any errors.
>
>> On Feb 12, 2014, at 6:01 PM, Josh Elser <jo...@gmail.com> wrote:
>>
>> Also, for completeness: I filed ACCUMULO-2362 to work on concurrent accesses to the same instance in the same JVM.
>>
>> Also, I misspoke earlier: much of the lock contention comes out of the Tables class, not from the Instance. ZooCache keeps a static map of instance to ZooCache which are used by a wide breadth of API calls.
>>
>>> On 2/12/14, 3:58 PM, Josh Elser wrote:
>>> ACCUMULO-1833 was merged into 1.5.1-SNAPSHOT a long time ago. I probably
>>> never cleaned up the branch after I finished the ticket.
>>>
>>> I believe John Vines started looking at using Curator, but I think he
>>> decided in the end that there weren't significant gains to be had by
>>> using it. I'm sure he commented on the ticket he had for it.
>>>
>>>> On 2/12/14, 3:56 PM, Ariel Valentin wrote:
>>>> Is the 1833 branch going to be part of 1.5.1?
>>>> I recall reading somewhere that there was interest in using Curator to
>>>> simplify working with ZooKeeper. Is that still part of the release
>>>> roadmap?
>>>>
>>>> Thanks,
>>>> Ariel
>>>> ---
>>>> Sent from my mobile device. Please excuse any errors.
>>>>
>>>>> On Feb 12, 2014, at 3:13 PM, Josh Elser <jo...@gmail.com> wrote:
>>>>>
>>>>> Great, that helps. Thanks for the info, Ariel!
>>>>>
>>>>> I think this might be an area we want to revisit in later versions of
>>>>> Accumulo to make the client API implementations a little more robust
>>>>> and supportive of concurrent usage.
>>>>>
>>>>>> On 2/12/14, 3:10 PM, Ariel Valentin wrote:
>>>>>> Josh,
>>>>>>
>>>>>> The symptom is that we hit a point where a single server seems
>>>>>> "unresponsive", but we do not see anything unusual going on in that
>>>>>> machine and it seems idle: no heavy CPU, no I/O wait, low load average.
>>>>>> However, when we add additional instances of the JVM, our capacity seems
>>>>>> to increase linearly.
>>>>>>
>>>>>> Based on thread dumps and profiler stats it appears that under "heavy"
>>>>>> load most of our threads are blocked trying to access ZooCache.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 12, 2014 at 1:41 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>>>>
>>>>>>     Didn't mean to ask about the subject matter, but how you were using
>>>>>>     the API. Are you actually seeing contention on ZooCache?
>>>>>>
>>>>>>
>>>>>>     On 2/12/14, 1:19 PM, Ariel Valentin wrote:
>>>>>>
>>>>>>         Sorry, but I am not at liberty to be specific about our business
>>>>>>         problem.
>>>>>>
>>>>>>         Typical usage is multiple clients writing data to tables; the
>>>>>>         clients also scan those tables to avoid duplicate entries.
>>>>>>
>>>>>>
>>>>>>         On Wed, Feb 12, 2014 at 10:59 AM, Josh Elser <josh.elser@gmail.com> wrote:
>>>>>>
>>>>>>              Also, I forgot this part before:
>>>>>>
>>>>>>              The ZooCache instance that's used *typically* comes from the
>>>>>>              Instance object that your Connector was created from. In other
>>>>>>              words, if you create multiple Instances (ZooKeeperInstance,
>>>>>>              usually), you can get multiple ZooCaches, which means that
>>>>>>              concurrent calls to methods off of those objects should not
>>>>>>              block one another (createScanner off of connector1 from
>>>>>>              instance1 should not block createScanner off of connector2
>>>>>>              from instance2).
>>>>>>
>>>>>>              That should be something quick you can play with if you so
>>>>>>              desire.
>>>>>>
>>>>>>
>>>>>>              On 2/12/14, 9:57 AM, Josh Elser wrote:
>>>>>>
>>>>>>                  Yep, you'll likely also block on BatchScanner, anything in
>>>>>>                  TableOperations, and a host of other things.
>>>>>>
>>>>>>                  For scanners, there's likely a standing recommendation to
>>>>>>                  amortize the use of those objects (if you want to look up 5
>>>>>>                  ranges, don't make 5 scanners).
>>>>>>
>>>>>>                  Creating a cache per member of the ensemble would likely
>>>>>>                  require some kind of Paxos implementation to provide
>>>>>>                  consistency, which is highly undesirable.
>>>>>>
>>>>>>                  One thing I'm curious about is the impact of removing
>>>>>>                  ZooCache altogether from things like the client API and
>>>>>>                  seeing what happens. I don't have a good way to measure
>>>>>>                  that impact off the top of my head, though.
>>>>>>
>>>>>>                  Anyways, is this causing you problems in your usage of the
>>>>>>                  API? Could you elaborate a bit more on the specifics?
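
The recommendation quoted above, to reuse one scanner for several ranges, follows from the stack trace: every createScanner call passes through the contended table lookup. The sketch below is a toy model of that effect and does not use the Accumulo API; Client, Handle, createHandle and AmortizeDemo are hypothetical names, and the shared lock merely stands in for the lookup seen in the dump. In Accumulo itself the batched form would presumably be a BatchScanner handed all of the ranges at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical client whose factory method passes through one shared lock. */
final class Client {
    private static final Object SHARED_LOCK = new Object();
    final AtomicInteger lockTrips = new AtomicInteger();

    Handle createHandle(String table) {
        synchronized (SHARED_LOCK) { // stands in for the contended name-to-id lookup
            lockTrips.incrementAndGet();
            return new Handle(table);
        }
    }
}

/** Hypothetical handle that can serve any number of ranges once created. */
final class Handle {
    private final String table;

    Handle(String table) {
        this.table = table;
    }

    List<String> lookup(List<String> ranges) {
        List<String> out = new ArrayList<>();
        for (String r : ranges) {
            out.add(table + ":" + r);
        }
        return out;
    }
}

final class AmortizeDemo {
    /** One handle per range: one trip through the lock for every range. */
    static List<String> perRange(Client c, String table, List<String> ranges) {
        List<String> out = new ArrayList<>();
        for (String r : ranges) {
            out.addAll(c.createHandle(table).lookup(Arrays.asList(r)));
        }
        return out;
    }

    /** One handle for all ranges: a single trip through the lock. */
    static List<String> batched(Client c, String table, List<String> ranges) {
        return c.createHandle(table).lookup(ranges);
    }
}
```

Both methods return the same results; the difference is only in how often the shared lock is taken, which is what matters when many request threads are doing this at once.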

Re: Synchronized Access to ZooCache Causing Threads to Block

Posted by Ariel Valentin <ar...@arielvalentin.com>.

Josh,

We experimented with 1.5.1 today; our load test numbers seem to indicate a 10x performance improvement over 1.5.0 on a single JVM. We are running additional experiments over the next few days to see what happens when we move to multiple JVMs. Stay tuned.

Thanks,
Ariel
---
Sent from my mobile device. Please excuse any errors.

Re: Synchronized Access to ZooCache Causing Threads to Block

Posted by Josh Elser <jo...@gmail.com>.

Also, for completeness: I filed ACCUMULO-2362 to work on concurrent 
accesses to the same instance in the same JVM.

Also, I misspoke earlier: much of the lock contention comes out of the
Tables class, not from the Instance. ZooCache keeps a static map from
instance to ZooCache, and those caches are used by a wide range of API calls.
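
The thread dump shows callers waiting on the monitor of the ZooCache class itself, which is what a static synchronized method, or a block synchronized on the class, around a static map would produce. The sketch below is not Accumulo's code and is not the change made under ACCUMULO-2362. It is a hypothetical illustration (ClusterCache, CacheRegistry, getLocked and getConcurrent are invented names) contrasting that pattern with a ConcurrentHashMap-based lookup that takes no class-wide lock on the read path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for a per-cluster cache object; not Accumulo's ZooCache. */
final class ClusterCache {
    final String key;

    ClusterCache(String key) {
        this.key = key;
    }
}

final class CacheRegistry {
    private static final Map<String, ClusterCache> LOCKED = new HashMap<>();
    private static final ConcurrentMap<String, ClusterCache> CONCURRENT = new ConcurrentHashMap<>();

    /** Every caller serializes on the CacheRegistry class monitor, even on a hit. */
    static synchronized ClusterCache getLocked(String key) {
        ClusterCache c = LOCKED.get(key);
        if (c == null) {
            c = new ClusterCache(key);
            LOCKED.put(key, c);
        }
        return c;
    }

    /** Hits use a plain get(), which does not lock; creation is atomic per key. */
    static ClusterCache getConcurrent(String key) {
        ClusterCache c = CONCURRENT.get(key);
        if (c != null) {
            return c;
        }
        return CONCURRENT.computeIfAbsent(key, ClusterCache::new);
    }
}
```

With the second accessor, threads asking for an already-created cache no longer queue behind one another, while two threads racing to create the same key still end up sharing a single instance.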

On 2/12/14, 3:58 PM, Josh Elser wrote:
> ACCUMULO-1833 was merged into 1.5.1-SNAPSHOT a long time ago. I probably
> never cleaned up the branch after I finished the ticket.
>
> I believe John Vines started looking at using Curator, but I think he
> decided in the end that there wasn't significant gains to be had by
> using it. I'm sure he commented on the ticket he had for it.
>
> On 2/12/14, 3:56 PM, Ariel Valentin wrote:
>> Is the 1833 branch going to be part of 1.5.1?
>> I recall reading somewhere that there was interest in using Curator to
>> ameliorate working with zookeeper. Is that still part of the release
>> roadmap?
>>
>> Thanks,
>> Ariel
>> ---
>> Sent from my mobile device. Please excuse any errors.
>>
>>> On Feb 12, 2014, at 3:13 PM, Josh Elser <jo...@gmail.com> wrote:
>>>
>>> Great, that helps. Thanks for the info, Ariel!
>>>
>>> I think this might be an area we want to revisit in later versions of
>>> Accumulo to make the client API implementations a little more robust
>>> and supportive of concurrent usage.
>>>
>>>> On 2/12/14, 3:10 PM, Ariel Valentin wrote:
>>>> Josh,
>>>>
>>>> The symptom is that we hit a point where a single server seems
>>>> "unresponsive" but we do not see anything unusual going on in that
>>>> machine and it seems idol. No heavy CPU, no I/O wait, low load average;
>>>> however when we add additional instances of the JVM our capacity seems
>>>> to increase linearly.
>>>>
>>>> Based on thread dumps and profiler stats it appears that under "heavy"
>>>> load most of our threads are blocked trying to access ZooCache.
>>>>
>>>>
>>>> Ariel Valentin
>>>> e-mail: ariel@arielvalentin.com <ma...@arielvalentin.com>
>>>> website: http://blog.arielvalentin.com
>>>> skype: ariel.s.valentin
>>>> twitter: arielvalentin
>>>> linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>> ---------------------------------------
>>>> *simplicity *communication
>>>> *feedback *courage *respect
>>>>
>>>>
>>>> On Wed, Feb 12, 2014 at 1:41 PM, Josh Elser <josh.elser@gmail.com
>>>> <ma...@gmail.com>> wrote:
>>>>
>>>>     Didn't mean to ask about the subject matter, but how you were using
>>>>     the API. Are you actually seeing contention on ZooCache?
>>>>
>>>>
>>>>     On 2/12/14, 1:19 PM, Ariel Valentin wrote:
>>>>
>>>>         Sorry but I am not at liberty to be specific about our business
>>>>         problem.
>>>>
>>>>         Typical usage is multiple clients writing data to tables, which
>>>>         scan to
>>>>         avoid duplicate entries.
>>>>
>>>>         Ariel Valentin
>>>>         e-mail: ariel@arielvalentin.com
>>>> <ma...@arielvalentin.com>
>>>>         <mailto:ariel@arielvalentin.__com
>>>> <ma...@arielvalentin.com>>
>>>>         website: http://blog.arielvalentin.com
>>>>         skype: ariel.s.valentin
>>>>         twitter: arielvalentin
>>>>         linkedin: http://www.linkedin.com/__profile/view?id=8996534
>>>>         <http://www.linkedin.com/profile/view?id=8996534>
>>>>         ------------------------------__---------
>>>>         *simplicity *communication
>>>>         *feedback *courage *respect
>>>>
>>>>
>>>>         On Wed, Feb 12, 2014 at 10:59 AM, Josh Elser
>>>>         <josh.elser@gmail.com <ma...@gmail.com>
>>>>         <mailto:josh.elser@gmail.com <ma...@gmail.com>>>
>>>> wrote:
>>>>
>>>>              Also, I forgot this part before:
>>>>
>>>>              The ZooCache instance that's used *typically* comes
>>>> from the
>>>>              Instance object that your Connector was created from.
>>>> In other
>>>>              words, if you create multiple Instances
>>>> (ZooKeeperInstance,
>>>>              usually), you can get multiple ZooCaches which means that
>>>>         concurrent
>>>>              calls to methods off of those objects should not block one
>>>>         another
>>>>              (createScanner off of connector1 from instance1 should not
>>>>         block
>>>>              createScanner off of connector2 from instance2).
>>>>
>>>>              That should be something quick you can play with if you so
>>>>         desire.
>>>>
>>>>
>>>>              On 2/12/14, 9:57 AM, Josh Elser wrote:
>>>>
>>>>                  Yep, you'll likely also block on BatchScanner,
>>>> anything in
>>>>                  TableOperations, and a host of other things.
>>>>
>>>>                  For scanners, there's likely a standing
>>>> recommendation to
>>>>                  amortize the
>>>>                  use of those objects (if you want to look up 5 range,
>>>>         don't make 5
>>>>                  scanners).
>>>>
>>>>                  Creating a cache per member in the work would likely
>>>>         require
>>>>                  some kind
>>>>                  of paxos implementation to provide consistency
>>>> which is
>>>>         highly
>>>>                  undesirable.
>>>>
>>>>                  One thing I'm curious about is the impact of removing
>>>>         ZooCache
>>>>                  altogether from things like the client api and see
>>>> what
>>>>         happens.
>>>>                  I don't
>>>>                  have a good way to measure that impact off the top of
>>>>         my head
>>>>                  though.
>>>>
>>>>                  Anyways, is this causing you problems in your usage of
>>>>         the api?
>>>>                  Could
>>>>                  you elaborate a bit more on the specifics?
>>>>
>>>>                  On Feb 12, 2014 4:48 AM, "Ariel Valentin"
>>>>                  <ariel@arielvalentin.com
>>>>         <ma...@arielvalentin.com>
>>>>         <mailto:ariel@arielvalentin.__com
>>>> <ma...@arielvalentin.com>>
>>>>                  <mailto:ariel@arielvalentin.
>>>>         <ma...@arielvalentin.>____com
>>>>
>>>>                  <mailto:ariel@arielvalentin.__com
>>>>         <ma...@arielvalentin.com>>>> wrote:
>>>>
>>>>                       I have run into a problem related to ACCUMULO-1833, which
>>>>                       appears to have addressed the issue for MutliTableBatchWriter;
>>>>                       however I am seeing this issue on the scanner side also:
>>>>
>>>>                       394750-"http-/192.168.220.196:8080-35" daemon prio=10
>>>>                       tid=0x00007f3108038000 nid=0x538a waiting for monitor entry
>>>>                       [0x00007f31287d1000]
>>>>
>>>>                       394878:   java.lang.Thread.State: BLOCKED (on object monitor)
>>>>
>>>>                       394933- at
>>>>                       org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
>>>>
>>>>                       395012- - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for
>>>>                       org.apache.accumulo.fate.zookeeper.ZooCache)
>>>>
>>>>                       395120- at
>>>>                       org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
>>>>
>>>>                       395196- at
>>>>                       org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
>>>>
>>>>                       395267- at
>>>>                       org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>>>>
>>>>                       395346- at
>>>>                       org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>>>>
>>>>                       395421- at
>>>>                       org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>>>>
>>>>                       395510- at
>>>>                       org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
>>>>
>>>>                       I have not spent enough time reasoning about the code to
>>>>                       understand all of the nuances but I am interested in knowing
>>>>                       if there are any mitigating strategies for dealing with this
>>>>                       thread contention e.g. would creating a cache entry for each
>>>>                       member of the Zookeeper ensemble help relieve the strain? use
>>>>                       multiple classloaders? or is my only option to spawn multiple
>>>>                       JVMs?
>>>>
>>>>                       Thanks,
>>>>
>>>>                       Ariel Valentin
>>>>                       e-mail: ariel@arielvalentin.com
>>>>                       website: http://blog.arielvalentin.com
>>>>                       skype: ariel.s.valentin
>>>>                       twitter: arielvalentin
>>>>                       linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>>                       ---------------------------------------
>>>>                       *simplicity *communication
>>>>                       *feedback *courage *respect
>>>>
>>>>
>>>>

Re: Synchronized Access to ZooCache Causing Threads to Block

Posted by Josh Elser <jo...@gmail.com>.

ACCUMULO-1833 was merged into 1.5.1-SNAPSHOT a long time ago. I probably 
never cleaned up the branch after I finished the ticket.

I believe John Vines started looking at using Curator, but I think he 
decided in the end that there weren't significant gains to be had by 
using it. I'm sure he commented on the ticket he had for it.

On 2/12/14, 3:56 PM, Ariel Valentin wrote:
> Is the 1833 branch going to be part of 1.5.1?
> I recall reading somewhere that there was interest in using Curator to ameliorate working with zookeeper. Is that still part of the release roadmap?
>
> Thanks,
> Ariel
> ---
> Sent from my mobile device. Please excuse any errors.
>
>> On Feb 12, 2014, at 3:13 PM, Josh Elser <jo...@gmail.com> wrote:
>>
>> Great, that helps. Thanks for the info, Ariel!
>>
>> I think this might be an area we want to revisit in later versions of Accumulo to make the client API implementations a little more robust and supportive of concurrent usage.
>>
>>> On 2/12/14, 3:10 PM, Ariel Valentin wrote:
>>> Josh,
>>>
>>> The symptom is that we hit a point where a single server seems
>>> "unresponsive" but we do not see anything unusual going on in that
>>> machine and it seems idle. No heavy CPU, no I/O wait, low load average;
>>> however when we add additional instances of the JVM our capacity seems
>>> to increase linearly.
>>>
>>> Based on thread dumps and profiler stats it appears that under "heavy"
>>> load most of our threads are blocked trying to access ZooCache.
>>>
>>>
>>> Ariel Valentin
>>> e-mail: ariel@arielvalentin.com
>>> website: http://blog.arielvalentin.com
>>> skype: ariel.s.valentin
>>> twitter: arielvalentin
>>> linkedin: http://www.linkedin.com/profile/view?id=8996534
>>> ---------------------------------------
>>> *simplicity *communication
>>> *feedback *courage *respect
>>>
>>>
>>> On Wed, Feb 12, 2014 at 1:41 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>
>>>     Didn't mean to ask about the subject matter, but how you were using
>>>     the API. Are you actually seeing contention on ZooCache?
>>>
>>>
>>>     On 2/12/14, 1:19 PM, Ariel Valentin wrote:
>>>
>>>         Sorry but I am not at liberty to be specific about our business
>>>         problem.
>>>
>>>         Typical usage is multiple clients writing data to tables, which
>>>         scan to
>>>         avoid duplicate entries.
>>>
>>>         Ariel Valentin
>>>         e-mail: ariel@arielvalentin.com
>>>         website: http://blog.arielvalentin.com
>>>         skype: ariel.s.valentin
>>>         twitter: arielvalentin
>>>         linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>         ---------------------------------------
>>>         *simplicity *communication
>>>         *feedback *courage *respect
>>>
>>>
>>>         On Wed, Feb 12, 2014 at 10:59 AM, Josh Elser
>>>         <josh.elser@gmail.com> wrote:
>>>
>>>              Also, I forgot this part before:
>>>
>>>              The ZooCache instance that's used *typically* comes from the
>>>              Instance object that your Connector was created from. In other
>>>              words, if you create multiple Instances (ZooKeeperInstance,
>>>              usually), you can get multiple ZooCaches which means that
>>>              concurrent calls to methods off of those objects should not
>>>              block one another (createScanner off of connector1 from
>>>              instance1 should not block createScanner off of connector2
>>>              from instance2).
>>>
>>>              That should be something quick you can play with if you so
>>>              desire.
>>>
>>>
>>>              On 2/12/14, 9:57 AM, Josh Elser wrote:
>>>
>>>                  Yep, you'll likely also block on BatchScanner, anything in
>>>                  TableOperations, and a host of other things.
>>>
>>>                  For scanners, there's likely a standing recommendation to
>>>                  amortize the use of those objects (if you want to look up
>>>                  5 range, don't make 5 scanners).
>>>
>>>                  Creating a cache per member in the work would likely
>>>                  require some kind of paxos implementation to provide
>>>                  consistency which is highly undesirable.
>>>
>>>                  One thing I'm curious about is the impact of removing
>>>                  ZooCache altogether from things like the client api and
>>>                  see what happens. I don't have a good way to measure that
>>>                  impact off the top of my head though.
>>>
>>>                  Anyways, is this causing you problems in your usage of
>>>                  the api? Could you elaborate a bit more on the specifics?
>>>
>>>                  On Feb 12, 2014 4:48 AM, "Ariel Valentin"
>>>                  <ariel@arielvalentin.com> wrote:
>>>
>>>                       I have run into a problem related to ACCUMULO-1833, which
>>>                       appears to have addressed the issue for MutliTableBatchWriter;
>>>                       however I am seeing this issue on the scanner side also:
>>>
>>>                       394750-"http-/192.168.220.196:8080-35" daemon prio=10
>>>                       tid=0x00007f3108038000 nid=0x538a waiting for monitor entry
>>>                       [0x00007f31287d1000]
>>>
>>>                       394878:   java.lang.Thread.State: BLOCKED (on object monitor)
>>>
>>>                       394933- at
>>>                       org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
>>>
>>>                       395012- - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for
>>>                       org.apache.accumulo.fate.zookeeper.ZooCache)
>>>
>>>                       395120- at
>>>                       org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
>>>
>>>                       395196- at
>>>                       org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
>>>
>>>                       395267- at
>>>                       org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>>>
>>>                       395346- at
>>>                       org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>>>
>>>                       395421- at
>>>                       org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>>>
>>>                       395510- at
>>>                       org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
>>>
>>>                       I have not spent enough time reasoning about the code to
>>>                       understand all of the nuances but I am interested in knowing
>>>                       if there are any mitigating strategies for dealing with this
>>>                       thread contention e.g. would creating a cache entry for each
>>>                       member of the Zookeeper ensemble help relieve the strain? use
>>>                       multiple classloaders? or is my only option to spawn multiple
>>>                       JVMs?
>>>
>>>                       Thanks,
>>>
>>>                       Ariel Valentin
>>>                       e-mail: ariel@arielvalentin.com
>>>                       website: http://blog.arielvalentin.com
>>>                       skype: ariel.s.valentin
>>>                       twitter: arielvalentin
>>>                       linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>                       ---------------------------------------
>>>                       *simplicity *communication
>>>                       *feedback *courage *respect
>>>
>>>
>>>

Re: Synchronized Access to ZooCache Causing Threads to Block

Posted by Ariel Valentin <ar...@arielvalentin.com>.

Is the 1833 branch going to be part of 1.5.1? 
I recall reading somewhere that there was interest in using Curator to simplify working with ZooKeeper. Is that still part of the release roadmap?

Thanks,
Ariel
---
Sent from my mobile device. Please excuse any errors.

> On Feb 12, 2014, at 3:13 PM, Josh Elser <jo...@gmail.com> wrote:
> 
> Great, that helps. Thanks for the info, Ariel!
> 
> I think this might be an area we want to revisit in later versions of Accumulo to make the client API implementations a little more robust and supportive of concurrent usage.
> 
>> On 2/12/14, 3:10 PM, Ariel Valentin wrote:
>> Josh,
>> 
>> The symptom is that we hit a point where a single server seems
>> "unresponsive" but we do not see anything unusual going on in that
>> machine and it seems idle. No heavy CPU, no I/O wait, low load average;
>> however when we add additional instances of the JVM our capacity seems
>> to increase linearly.
>> 
>> Based on thread dumps and profiler stats it appears that under "heavy"
>> load most of our threads are blocked trying to access ZooCache.
>> 
>> 
>> Ariel Valentin
>> e-mail: ariel@arielvalentin.com
>> website: http://blog.arielvalentin.com
>> skype: ariel.s.valentin
>> twitter: arielvalentin
>> linkedin: http://www.linkedin.com/profile/view?id=8996534
>> ---------------------------------------
>> *simplicity *communication
>> *feedback *courage *respect
>> 
>> 
>> On Wed, Feb 12, 2014 at 1:41 PM, Josh Elser <josh.elser@gmail.com> wrote:
>> 
>>    Didn't mean to ask about the subject matter, but how you were using
>>    the API. Are you actually seeing contention on ZooCache?
>> 
>> 
>>    On 2/12/14, 1:19 PM, Ariel Valentin wrote:
>> 
>>        Sorry but I am not at liberty to be specific about our business
>>        problem.
>> 
>>        Typical usage is multiple clients writing data to tables, which
>>        scan to
>>        avoid duplicate entries.
>> 
>>        Ariel Valentin
>>        e-mail: ariel@arielvalentin.com
>>        website: http://blog.arielvalentin.com
>>        skype: ariel.s.valentin
>>        twitter: arielvalentin
>>        linkedin: http://www.linkedin.com/profile/view?id=8996534
>>        ---------------------------------------
>>        *simplicity *communication
>>        *feedback *courage *respect
>> 
>> 
>>        On Wed, Feb 12, 2014 at 10:59 AM, Josh Elser
>>        <josh.elser@gmail.com> wrote:
>>
>>             Also, I forgot this part before:
>>
>>             The ZooCache instance that's used *typically* comes from the
>>             Instance object that your Connector was created from. In other
>>             words, if you create multiple Instances (ZooKeeperInstance,
>>             usually), you can get multiple ZooCaches which means that
>>             concurrent calls to methods off of those objects should not
>>             block one another (createScanner off of connector1 from
>>             instance1 should not block createScanner off of connector2
>>             from instance2).
>>
>>             That should be something quick you can play with if you so
>>             desire.
>>
>>
>>             On 2/12/14, 9:57 AM, Josh Elser wrote:
>>
>>                 Yep, you'll likely also block on BatchScanner, anything in
>>                 TableOperations, and a host of other things.
>>
>>                 For scanners, there's likely a standing recommendation to
>>                 amortize the use of those objects (if you want to look up
>>                 5 range, don't make 5 scanners).
>>
>>                 Creating a cache per member in the work would likely
>>                 require some kind of paxos implementation to provide
>>                 consistency which is highly undesirable.
>>
>>                 One thing I'm curious about is the impact of removing
>>                 ZooCache altogether from things like the client api and
>>                 see what happens. I don't have a good way to measure that
>>                 impact off the top of my head though.
>>
>>                 Anyways, is this causing you problems in your usage of
>>                 the api? Could you elaborate a bit more on the specifics?
>>
>>                 On Feb 12, 2014 4:48 AM, "Ariel Valentin"
>>                 <ariel@arielvalentin.com> wrote:
>>
>>                      I have run into a problem related to ACCUMULO-1833, which
>>                      appears to have addressed the issue for MutliTableBatchWriter;
>>                      however I am seeing this issue on the scanner side also:
>>
>>                      394750-"http-/192.168.220.196:8080-35" daemon prio=10
>>                      tid=0x00007f3108038000 nid=0x538a waiting for monitor entry
>>                      [0x00007f31287d1000]
>>
>>                      394878:   java.lang.Thread.State: BLOCKED (on object monitor)
>>
>>                      394933- at
>>                      org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
>>
>>                      395012- - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for
>>                      org.apache.accumulo.fate.zookeeper.ZooCache)
>>
>>                      395120- at
>>                      org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
>>
>>                      395196- at
>>                      org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
>>
>>                      395267- at
>>                      org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>>
>>                      395346- at
>>                      org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>>
>>                      395421- at
>>                      org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>>
>>                      395510- at
>>                      org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
>>
>>                      I have not spent enough time reasoning about the code to
>>                      understand all of the nuances but I am interested in knowing
>>                      if there are any mitigating strategies for dealing with this
>>                      thread contention e.g. would creating a cache entry for each
>>                      member of the Zookeeper ensemble help relieve the strain? use
>>                      multiple classloaders? or is my only option to spawn multiple
>>                      JVMs?
>>
>>                      Thanks,
>>
>>                      Ariel Valentin
>>                      e-mail: ariel@arielvalentin.com
>>                      website: http://blog.arielvalentin.com
>>                      skype: ariel.s.valentin
>>                      twitter: arielvalentin
>>                      linkedin: http://www.linkedin.com/profile/view?id=8996534
>>                      ---------------------------------------
>>                      *simplicity *communication
>>                      *feedback *courage *respect
>> 
>> 
>>

Re: Synchronized Access to ZooCache Causing Threads to Block

Posted by Josh Elser <jo...@gmail.com>.

Great, that helps. Thanks for the info, Ariel!

I think this might be an area we want to revisit in later versions of 
Accumulo to make the client API implementations a little more robust and 
supportive of concurrent usage.
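
As an illustration of what "more supportive of concurrent usage" could look
like, here is a minimal sketch of a keyed instance lookup that does not rely
on a class-level lock. It assumes Java 8 or later. The class name
KeyedInstances and the factory argument are hypothetical; this is not
Accumulo's ZooCache code and not the change made for ACCUMULO-1833.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical keyed-singleton holder; a stand-in, not Accumulo's ZooCache. */
final class KeyedInstances<V> {
    private final ConcurrentMap<String, V> instances = new ConcurrentHashMap<>();
    private final Function<String, V> factory;

    KeyedInstances(Function<String, V> factory) {
        this.factory = factory;
    }

    /** Returns the single instance for the key, creating it on first use. */
    V get(String key) {
        // Fast path: a plain read, so callers asking for an existing
        // instance do not queue behind one shared monitor.
        V existing = instances.get(key);
        if (existing != null) {
            return existing;
        }
        // Slow path: the factory runs at most once per key, even when
        // several threads race to create the same instance.
        return instances.computeIfAbsent(key, factory);
    }
}
```

A caller would build one holder, for example
new KeyedInstances<>(k -> new Object()), and call get("zk1:2181") from any
number of threads. Whether a change of this shape is safe for a real cache
depends on how its instances are initialized and shared.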

On 2/12/14, 3:10 PM, Ariel Valentin wrote:
> Josh,
>
> The symptom is that we hit a point where a single server seems
> "unresponsive" but we do not see anything unusual going on in that
> machine and it seems idle. No heavy CPU, no I/O wait, low load average;
> however when we add additional instances of the JVM our capacity seems
> to increase linearly.
>
> Based on thread dumps and profiler stats it appears that under "heavy"
> load most of our threads are blocked trying to access ZooCache.
>
>
> Ariel Valentin
> e-mail: ariel@arielvalentin.com
> website: http://blog.arielvalentin.com
> skype: ariel.s.valentin
> twitter: arielvalentin
> linkedin: http://www.linkedin.com/profile/view?id=8996534
> ---------------------------------------
> *simplicity *communication
> *feedback *courage *respect
>
>
> On Wed, Feb 12, 2014 at 1:41 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>     Didn't mean to ask about the subject matter, but how you were using
>     the API. Are you actually seeing contention on ZooCache?
>
>
>     On 2/12/14, 1:19 PM, Ariel Valentin wrote:
>
>         Sorry but I am not at liberty to be specific about our business
>         problem.
>
>         Typical usage is multiple clients writing data to tables, which
>         scan to
>         avoid duplicate entries.
>
>         Ariel Valentin
>         e-mail: ariel@arielvalentin.com
>         website: http://blog.arielvalentin.com
>         skype: ariel.s.valentin
>         twitter: arielvalentin
>         linkedin: http://www.linkedin.com/profile/view?id=8996534
>         ---------------------------------------
>         *simplicity *communication
>         *feedback *courage *respect
>
>
>         On Wed, Feb 12, 2014 at 10:59 AM, Josh Elser
>         <josh.elser@gmail.com> wrote:
>
>              Also, I forgot this part before:
>
>              The ZooCache instance that's used *typically* comes from the
>              Instance object that your Connector was created from. In other
>              words, if you create multiple Instances (ZooKeeperInstance,
>              usually), you can get multiple ZooCaches which means that
>              concurrent calls to methods off of those objects should not
>              block one another (createScanner off of connector1 from
>              instance1 should not block createScanner off of connector2
>              from instance2).
>
>              That should be something quick you can play with if you so
>              desire.
>
>
>              On 2/12/14, 9:57 AM, Josh Elser wrote:
>
>                  Yep, you'll likely also block on BatchScanner, anything in
>                  TableOperations, and a host of other things.
>
>                  For scanners, there's likely a standing recommendation to
>                  amortize the use of those objects (if you want to look up
>                  5 range, don't make 5 scanners).
>
>                  Creating a cache per member in the work would likely
>                  require some kind of paxos implementation to provide
>                  consistency which is highly undesirable.
>
>                  One thing I'm curious about is the impact of removing
>                  ZooCache altogether from things like the client api and
>                  see what happens. I don't have a good way to measure that
>                  impact off the top of my head though.
>
>                  Anyways, is this causing you problems in your usage of
>                  the api? Could you elaborate a bit more on the specifics?
>
>                  On Feb 12, 2014 4:48 AM, "Ariel Valentin"
>                  <ariel@arielvalentin.com> wrote:
>
>                       I have run into a problem related to ACCUMULO-1833, which
>                       appears to have addressed the issue for MutliTableBatchWriter;
>                       however I am seeing this issue on the scanner side also:
>
>                       394750-"http-/192.168.220.196:8080-35" daemon prio=10
>                       tid=0x00007f3108038000 nid=0x538a waiting for monitor entry
>                       [0x00007f31287d1000]
>
>                       394878:   java.lang.Thread.State: BLOCKED (on object monitor)
>
>                       394933- at
>                       org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
>
>                       395012- - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for
>                       org.apache.accumulo.fate.zookeeper.ZooCache)
>
>                       395120- at
>                       org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
>
>                       395196- at
>                       org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
>
>                       395267- at
>                       org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>
>                       395346- at
>                       org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>
>                       395421- at
>                       org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>
>                       395510- at
>                       org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
>
>                       I have not spent enough time reasoning about the code to
>                       understand all of the nuances but I am interested in knowing
>                       if there are any mitigating strategies for dealing with this
>                       thread contention e.g. would creating a cache entry for each
>                       member of the Zookeeper ensemble help relieve the strain? use
>                       multiple classloaders? or is my only option to spawn multiple
>                       JVMs?
>
>                       Thanks,
>
>                       Ariel Valentin
>                       e-mail: ariel@arielvalentin.com
>                       website: http://blog.arielvalentin.com
>                       skype: ariel.s.valentin
>                       twitter: arielvalentin
>                       linkedin: http://www.linkedin.com/profile/view?id=8996534
>                       ---------------------------------------
>                       *simplicity *communication
>                       *feedback *courage *respect
>
>
>

Re: Synchronized Access to ZooCache Causing Threads to Block

Posted by Ariel Valentin <ar...@arielvalentin.com>.

Josh,

The symptom is that we hit a point where a single server seems
"unresponsive" but we do not see anything unusual going on in that machine
and it seems idle. No heavy CPU, no I/O wait, low load average; however
when we add additional instances of the JVM our capacity seems to increase
linearly.

Based on thread dumps and profiler stats it appears that under "heavy" load
most of our threads are blocked trying to access ZooCache.
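
For anyone who wants to check for the same symptom from inside a running
JVM, the following is a minimal sketch that counts BLOCKED threads per
contended monitor using the standard java.lang.management API. The class
name BlockedThreadReport is hypothetical, and this is only an approximation
of what a jstack thread dump or a profiler would show.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: summarizes which monitors threads are blocked on. */
final class BlockedThreadReport {

    /** Maps each contended lock (e.g. "java.lang.Class@1a2b3c") to its number of BLOCKED threads. */
    static Map<String, Integer> blockedByLock() {
        Map<String, Integer> counts = new TreeMap<>();
        // Stack traces are enough here; owned monitors/synchronizers are not needed.
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            // Entries can be null for threads that exited during the dump.
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                String lock = info.getLockName();
                counts.merge(lock == null ? "unknown" : lock, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        blockedByLock().forEach((lock, n) -> System.out.println(n + " blocked on " + lock));
    }
}
```

If most request threads show up against a single java.lang.Class lock, as
in the thread dump at the start of this thread, that points at a static
synchronized method as the bottleneck.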


Ariel Valentin
e-mail: ariel@arielvalentin.com
website: http://blog.arielvalentin.com
skype: ariel.s.valentin
twitter: arielvalentin
linkedin: http://www.linkedin.com/profile/view?id=8996534
---------------------------------------
*simplicity *communication
*feedback *courage *respect


On Wed, Feb 12, 2014 at 1:41 PM, Josh Elser <jo...@gmail.com> wrote:

> Didn't mean to ask about the subject matter, but how you were using the
> API. Are you actually seeing contention on ZooCache?
>
>
> On 2/12/14, 1:19 PM, Ariel Valentin wrote:
>
>> Sorry but I am not at liberty to be specific about our business problem.
>>
>> Typical usage is multiple clients writing data to tables, which scan to
>> avoid duplicate entries.
>>
>> Ariel Valentin
>> e-mail: ariel@arielvalentin.com
>> website: http://blog.arielvalentin.com
>> skype: ariel.s.valentin
>> twitter: arielvalentin
>> linkedin: http://www.linkedin.com/profile/view?id=8996534
>> ---------------------------------------
>> *simplicity *communication
>> *feedback *courage *respect
>>
>>

Re: Synchronized Access to ZooCache Causing Threads to Block

Posted by Josh Elser <jo...@gmail.com>.

Didn't mean to ask about the subject matter, but how you were using the 
API. Are you actually seeing contention on ZooCache?

On 2/12/14, 1:19 PM, Ariel Valentin wrote:
> Sorry but I am not at liberty to be specific about our business problem.
>
> Typical usage is multiple clients writing data to tables, which scan to
> avoid duplicate entries.
>
> Ariel Valentin
> e-mail: ariel@arielvalentin.com <ma...@arielvalentin.com>
> website: http://blog.arielvalentin.com
> skype: ariel.s.valentin
> twitter: arielvalentin
> linkedin: http://www.linkedin.com/profile/view?id=8996534
> ---------------------------------------
> *simplicity *communication
> *feedback *courage *respect

Re: Synchronized Access to ZooCache Causing Threads to Block

Posted by Ariel Valentin <ar...@arielvalentin.com>.

It may be an issue with our table design. We have two tables; one of the
tables contains related entities that need to be purged before updating the
parent entity.

Ariel Valentin
e-mail: ariel@arielvalentin.com
website: http://blog.arielvalentin.com
skype: ariel.s.valentin
twitter: arielvalentin
linkedin: http://www.linkedin.com/profile/view?id=8996534
---------------------------------------
*simplicity *communication
*feedback *courage *respect


On Wed, Feb 12, 2014 at 1:42 PM, William Slacum <
wilhelm.von.cloud@accumulo.net> wrote:

> FWIW you can probably avoid the scan by making your insert idempotent
> aside from the timestamp and let versioning handle deduplication.

Re: Synchronized Access to ZooCache Causing Threads to Block

Posted by William Slacum <wi...@accumulo.net>.

FWIW you can probably avoid the scan by making your insert idempotent aside
from the timestamp and let versioning handle deduplication.
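
To make the idea concrete, here is a toy, JDK-only model of what "let versioning handle deduplication" means. It is not Accumulo code: the class VersionedTableSketch and its methods are hypothetical names, and it assumes the table keeps only the newest version of each (row, family, qualifier) cell, which is what a versioning iterator limited to one version would give you. Under that assumption, re-writing the same cell with a newer timestamp simply replaces the old entry, so no scan is needed before the write.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a table that keeps one version per cell. NOT Accumulo's implementation.
class VersionedTableSketch {
    // Newest timestamp and value seen so far for one cell.
    private static final class Cell {
        final long timestamp;
        final String value;

        Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Cell coordinates (row, family, qualifier) mapped to the newest version.
    private final Map<String, Cell> newest = new HashMap<>();

    private static String key(String row, String family, String qualifier) {
        // NUL-separated so that ("a", "bc") and ("ab", "c") cannot collide.
        return row + '\0' + family + '\0' + qualifier;
    }

    // Blind write: no read-before-write. An older timestamp never overwrites a newer one.
    void put(String row, String family, String qualifier, long timestamp, String value) {
        String k = key(row, family, qualifier);
        Cell current = newest.get(k);
        if (current == null || timestamp >= current.timestamp) {
            newest.put(k, new Cell(timestamp, value));
        }
    }

    // Returns the newest value for the cell, or null if it was never written.
    String get(String row, String family, String qualifier) {
        Cell c = newest.get(key(row, family, qualifier));
        return c == null ? null : c.value;
    }

    int cellCount() {
        return newest.size();
    }
}
```

Writing the same row, family, qualifier and value three times with different timestamps leaves a single cell in this model, which is the behaviour the suggestion relies on. Whether it fits depends on the keys being fully determined by the data being inserted.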


On Wed, Feb 12, 2014 at 1:19 PM, Ariel Valentin <ar...@arielvalentin.com> wrote:

> Sorry but I am not at liberty to be specific about our business problem.
>
> Typical usage is multiple clients writing data to tables, which scan to
> avoid duplicate entries.
>
> Ariel Valentin
> e-mail: ariel@arielvalentin.com
>
> website: http://blog.arielvalentin.com
> skype: ariel.s.valentin
> twitter: arielvalentin
> linkedin: http://www.linkedin.com/profile/view?id=8996534
> ---------------------------------------
> *simplicity *communication
> *feedback *courage *respect

Re: Synchronized Access to ZooCache Causing Threads to Block

Posted by Ariel Valentin <ar...@arielvalentin.com>.

Sorry but I am not at liberty to be specific about our business problem.

Typical usage is multiple clients writing data to tables, which scan to
avoid duplicate entries.

Ariel Valentin
e-mail: ariel@arielvalentin.com
website: http://blog.arielvalentin.com
skype: ariel.s.valentin
twitter: arielvalentin
linkedin: http://www.linkedin.com/profile/view?id=8996534
---------------------------------------
*simplicity *communication
*feedback *courage *respect


On Wed, Feb 12, 2014 at 10:59 AM, Josh Elser <jo...@gmail.com> wrote:

> Also, I forgot this part before:
>
> The ZooCache instance that's used *typically* comes from the Instance
> object that your Connector was created from. In other words, if you create
> multiple Instances (ZooKeeperInstance, usually), you can get multiple
> ZooCaches which means that concurrent calls to methods off of those objects
> should not block one another (createScanner off of connector1 from
> instance1 should not block createScanner off of connector2 from instance2).
>
> That should be something quick you can play with if you so desire.

Re: Synchronized Access to ZooCache Causing Threads to Block

Posted by Josh Elser <jo...@gmail.com>.

Also, I forgot this part before:

The ZooCache instance that's used *typically* comes from the Instance 
object that your Connector was created from. In other words, if you 
create multiple Instances (ZooKeeperInstance, usually), you can get 
multiple ZooCaches which means that concurrent calls to methods off of 
those objects should not block one another (createScanner off of 
connector1 from instance1 should not block createScanner off of 
connector2 from instance2).

That should be something quick you can play with if you so desire.
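
If it helps to picture why separate Instances should not contend, here is a small JDK-only model. It is only a sketch: InstanceCacheSketch and its methods are made-up names, not Accumulo's ZooCache, and it assumes each cache guards its lookups with its own monitor. The helper holds the monitor of one cache and checks whether a lookup on another cache can still complete.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for a per-Instance cache. NOT Accumulo's ZooCache.
class InstanceCacheSketch {
    private final Map<String, String> nameToId = new ConcurrentHashMap<>();

    InstanceCacheSketch(Map<String, String> tables) {
        nameToId.putAll(tables);
    }

    // The lookup locks this object only, so two caches never share a monitor.
    synchronized String getTableId(String tableName) {
        return nameToId.get(tableName);
    }

    // Returns true if a lookup on `target` finishes within timeoutMillis while
    // the calling thread holds the monitor of `held`. timeoutMillis must be > 0,
    // because Thread.join(0) waits forever.
    static boolean lookupCompletesWhileHolding(InstanceCacheSketch held,
                                               InstanceCacheSketch target,
                                               long timeoutMillis) {
        Thread worker = new Thread(() -> target.getTableId("t1"));
        worker.setDaemon(true); // never keep the JVM alive for a stuck worker
        synchronized (held) {
            worker.start();
            try {
                worker.join(timeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            return !worker.isAlive();
        }
    }
}
```

With two distinct caches the helper should return true, since the worker never needs the held monitor. Passing the same cache for both arguments returns false, because the worker sits in the BLOCKED state until the monitor is released, which is the state the thread dump shows.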

On 2/12/14, 9:57 AM, Josh Elser wrote:
> Yep, you'll likely also block on BatchScanner, anything in
> TableOperations, and a host of other things.
>
> For scanners, there's likely a standing recommendation to amortize the
> use of those objects (if you want to look up 5 ranges, don't make 5
> scanners).
>
> Creating a cache per member of the ensemble would likely require some kind
> of Paxos implementation to provide consistency, which is highly undesirable.
>
> One thing I'm curious about is the impact of removing ZooCache
> altogether from things like the client api and see what happens. I don't
> have a good way to measure that impact off the top of my head though.
>
> Anyways, is this causing you problems in your usage of the api? Could
> you elaborate a bit more on the specifics?

Re: Synchronized Access to ZooCache Causing Threads to Block

Posted by Josh Elser <jo...@gmail.com>.

Yep, you'll likely also block on BatchScanner, anything in TableOperations,
and a host of other things.

For scanners, there's likely a standing recommendation to amortize the use
of those objects (if you want to look up 5 ranges, don't make 5 scanners).
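
A rough way to see why amortizing helps, going by the stack trace where createScanner ends up in the lock-guarded table-name-to-id lookup: the toy model below just counts how many times that shared lookup runs. Everything here is hypothetical (ScannerAmortizationSketch, ScannerHandle, resolveTableId); it is JDK-only and does not call Accumulo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: counts how often the shared, lock-guarded lookup is entered.
class ScannerAmortizationSketch {
    static final AtomicInteger lockedLookups = new AtomicInteger();

    // Stand-in for the name-to-id resolution seen in the stack trace.
    // A static synchronized method locks on the class, so every caller queues here.
    private static synchronized String resolveTableId(String table) {
        lockedLookups.incrementAndGet();
        return "id-of-" + table;
    }

    // Hypothetical scanner handle that can serve any number of ranges.
    static final class ScannerHandle {
        final String tableId;
        final List<String> ranges = new ArrayList<>();

        ScannerHandle(String tableId) {
            this.tableId = tableId;
        }
    }

    // Creating a scanner is the only step that touches the shared lock.
    static ScannerHandle createScanner(String table) {
        return new ScannerHandle(resolveTableId(table));
    }

    // One scanner per range: one locked lookup per range.
    static int lookupsWithScannerPerRange(String table, List<String> ranges) {
        int before = lockedLookups.get();
        for (String range : ranges) {
            createScanner(table).ranges.add(range);
        }
        return lockedLookups.get() - before;
    }

    // One scanner for all ranges: a single locked lookup.
    static int lookupsWithSharedScanner(String table, List<String> ranges) {
        int before = lockedLookups.get();
        ScannerHandle scanner = createScanner(table);
        scanner.ranges.addAll(ranges);
        return lockedLookups.get() - before;
    }
}
```

For five ranges the per-range variant enters the locked section five times and the shared variant once. In the real client the analogous move would be a single scanner that accepts several ranges rather than one scanner per range.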

Creating a cache per member of the ensemble would likely require some kind of
Paxos implementation to provide consistency, which is highly undesirable.

One thing I'm curious about is the impact of removing ZooCache altogether
from things like the client api and see what happens. I don't have a good
way to measure that impact off the top of my head though.

Anyways, is this causing you problems in your usage of the api? Could you
elaborate a bit more on the specifics?

On Feb 12, 2014 4:48 AM, "Ariel Valentin" <ar...@arielvalentin.com> wrote:

> I have run into a problem related to ACCUMULO-1833, which appears to have
> addressed the issue for MultiTableBatchWriter; however I am seeing this
> issue on the scanner side also:
>
> 394750-"http-/192.168.220.196:8080-35" daemon prio=10
> tid=0x00007f3108038000 nid=0x538a waiting for monitor entry
> [0x00007f31287d1000]
>
> 394878:   java.lang.Thread.State: BLOCKED (on object monitor)
>
> 394933- at
> org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
>
> 395012- - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for
> org.apache.accumulo.fate.zookeeper.ZooCache)
>
> 395120- at
> org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
>
> 395196- at
> org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
>
> 395267- at
> org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>
> 395346- at
> org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>
> 395421- at
> org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>
> 395510- at
> org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
>
> I have not spent enough time reasoning about the code to understand all of
> the nuances but I am interested in knowing if there are any mitigating
> strategies for dealing with this thread contention e.g. would creating a
> cache entry for each member of the Zookeeper ensemble help relieve the
> strain? use multiple classloaders? or is my only option to spawn multiple
> JVMs?
>
> Thanks,
> Ariel Valentin
> e-mail: ariel@arielvalentin.com
> website: http://blog.arielvalentin.com
> skype: ariel.s.valentin
> twitter: arielvalentin
> linkedin: http://www.linkedin.com/profile/view?id=8996534
> ---------------------------------------
> *simplicity *communication
> *feedback *courage *respect
>