You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2009/04/08 20:10:12 UTC

[jira] Created: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

ZooKeeper: use native threads to avoid GC stalls (JNI integration)
------------------------------------------------------------------

Key: HBASE-1316
URL: https://issues.apache.org/jira/browse/HBASE-1316
Project: Hadoop HBase
Issue Type: Improvement
Affects Versions: 0.20.0
Reporter: Andrew Purtell

>From Joey Echeverria up on hbase-users@:

We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses especially since our environment was the worst case for garbage collection, millions of tiny, short lived objects. I suspect HBase sees similar work loads frequently, if it's not constantly. With anything shorter than a 30 second session time out, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.

We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down.

I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe reassigned HBASE-1316:
----------------------------------

    Assignee: Nitay Joffe

> ZooKeeper: use native threads to avoid GC stalls (JNI integration)
> ------------------------------------------------------------------
>
>                 Key: HBASE-1316
>                 URL: https://issues.apache.org/jira/browse/HBASE-1316
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>         Attachments: zk_wrapper.tar.gz
>
>
> From Joey Echeverria up on hbase-users@:
> We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses especially since our environment was the worst case for garbage collection, millions of tiny, short lived objects. I suspect HBase sees similar work loads frequently, if it's not constantly. With anything shorter than a 30 second session time out, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.
> We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down.
> I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727184#action_12727184 ] 

stack commented on HBASE-1316:
------------------------------

Can we fall back to pure-java implementation if c binary can't be found?

> ZooKeeper: use native threads to avoid GC stalls (JNI integration)
> ------------------------------------------------------------------
>
>                 Key: HBASE-1316
>                 URL: https://issues.apache.org/jira/browse/HBASE-1316
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>         Attachments: zk_wrapper.tar.gz
>
>
> From Joey Echeverria up on hbase-users@:
> We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses especially since our environment was the worst case for garbage collection, millions of tiny, short lived objects. I suspect HBase sees similar work loads frequently, if it's not constantly. With anything shorter than a 30 second session time out, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.
> We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down.
> I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727185#action_12727185 ] 

Andrew Purtell commented on HBASE-1316:
---------------------------------------

@stack: That is what Hadoop does if native compression bits do not exist for the platform or were not installed. If Nitay follows that example, this stuff should work the same way. 

> ZooKeeper: use native threads to avoid GC stalls (JNI integration)
> ------------------------------------------------------------------
>
>                 Key: HBASE-1316
>                 URL: https://issues.apache.org/jira/browse/HBASE-1316
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>         Attachments: zk_wrapper.tar.gz
>
>
> From Joey Echeverria up on hbase-users@:
> We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses especially since our environment was the worst case for garbage collection, millions of tiny, short lived objects. I suspect HBase sees similar work loads frequently, if it's not constantly. With anything shorter than a 30 second session time out, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.
> We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down.
> I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726758#action_12726758 ] 

Nitay Joffe commented on HBASE-1316:
------------------------------------

I've started looking into this. I have a minimal JNI binding to ZooKeeper based off of Joey's work. It doesn't use swig, as I think that adds an unnecessary dependency.

The question, as Joey mentions, is where do we want to put the gap between JNI ZooKeeper and Java ZooKeeper?
On one hand, we can just have the JNI binding handle ephemeral nodes to reduce session expired events from Java GC starving hearbeats. On the other hand we can try making the JNI binding handle and not have a Java ZooKeeper handle at all, but that might be ugly with the watcher events going back and forth between C <=> Java.

What do you guys think?

> ZooKeeper: use native threads to avoid GC stalls (JNI integration)
> ------------------------------------------------------------------
>
>                 Key: HBASE-1316
>                 URL: https://issues.apache.org/jira/browse/HBASE-1316
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>         Attachments: zk_wrapper.tar.gz
>
>
> From Joey Echeverria up on hbase-users@:
> We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses especially since our environment was the worst case for garbage collection, millions of tiny, short lived objects. I suspect HBase sees similar work loads frequently, if it's not constantly. With anything shorter than a 30 second session time out, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.
> We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down.
> I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726759#action_12726759 ] 

Andrew Purtell commented on HBASE-1316:
---------------------------------------

The trouble here is ephemeral nodes expiring due to dropped heartbeats. I think this issue should be about solving that problem only. The rest almost does not matter -- java land is blocked anyway, watcher events will queue up. Also, is calling up into java land from JNI while the VM is in a GC cycle safe? It must be. Then I presume if you tried to create an object in the C thread the create would block somehow on an os level mutex until it is safe to create objects again. Would that not defeat the purpose of the C thread in the first place?  

> ZooKeeper: use native threads to avoid GC stalls (JNI integration)
> ------------------------------------------------------------------
>
>                 Key: HBASE-1316
>                 URL: https://issues.apache.org/jira/browse/HBASE-1316
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>         Attachments: zk_wrapper.tar.gz
>
>
> From Joey Echeverria up on hbase-users@:
> We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses especially since our environment was the worst case for garbage collection, millions of tiny, short lived objects. I suspect HBase sees similar work loads frequently, if it's not constantly. With anything shorter than a 30 second session time out, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.
> We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down.
> I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726763#action_12726763 ] 

Nitay Joffe commented on HBASE-1316:
------------------------------------

Good points Andrew. I'll look into this further (still learning JNI), but I suspect your intuitions are correct.
I'll stick to just handling ephemeral nodes in the JNI.

We also need to decide how we want to bundle the JNI library, as we will now have platform specific things in our code base. I'll look into how Hadoop does this with things like libhdfs. If anyone has pointers, please let me know.

> ZooKeeper: use native threads to avoid GC stalls (JNI integration)
> ------------------------------------------------------------------
>
>                 Key: HBASE-1316
>                 URL: https://issues.apache.org/jira/browse/HBASE-1316
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>         Attachments: zk_wrapper.tar.gz
>
>
> From Joey Echeverria up on hbase-users@:
> We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses especially since our environment was the worst case for garbage collection, millions of tiny, short lived objects. I suspect HBase sees similar work loads frequently, if it's not constantly. With anything shorter than a 30 second session time out, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.
> We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down.
> I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727198#action_12727198 ] 

Nitay Joffe commented on HBASE-1316:
------------------------------------

Sounds good, I'll look at the compression bits and work off that. Thanks boys.

> ZooKeeper: use native threads to avoid GC stalls (JNI integration)
> ------------------------------------------------------------------
>
>                 Key: HBASE-1316
>                 URL: https://issues.apache.org/jira/browse/HBASE-1316
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>         Attachments: zk_wrapper.tar.gz
>
>
> From Joey Echeverria up on hbase-users@:
> We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses especially since our environment was the worst case for garbage collection, millions of tiny, short lived objects. I suspect HBase sees similar work loads frequently, if it's not constantly. With anything shorter than a 30 second session time out, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.
> We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down.
> I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

Posted by "Joey Echeverria (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726832#action_12726832 ] 

Joey Echeverria commented on HBASE-1316:
----------------------------------------

We specifically avoided having any callbacks cross the C/Java boundary This was simple in our use case where the only thing we needed to monitor after creating an ephemeral node was whether or ZK session had expired. We also had a very simple recovery mechanism, we immediately kill the process that got disconnected and the shell script that launched us will relaunch. This proved far easier than trying to re-establish a connection to ZK in the running process.

> ZooKeeper: use native threads to avoid GC stalls (JNI integration)
> ------------------------------------------------------------------
>
>                 Key: HBASE-1316
>                 URL: https://issues.apache.org/jira/browse/HBASE-1316
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>         Attachments: zk_wrapper.tar.gz
>
>
> From Joey Echeverria up on hbase-users@:
> We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses especially since our environment was the worst case for garbage collection, millions of tiny, short lived objects. I suspect HBase sees similar work loads frequently, if it's not constantly. With anything shorter than a 30 second session time out, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.
> We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down.
> I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726764#action_12726764 ] 

Andrew Purtell commented on HBASE-1316:
---------------------------------------

How native compression bits are hooked up is a better example than libhdfs. IIRC, building libhdfs is a manual extra step. 

> ZooKeeper: use native threads to avoid GC stalls (JNI integration)
> ------------------------------------------------------------------
>
>                 Key: HBASE-1316
>                 URL: https://issues.apache.org/jira/browse/HBASE-1316
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>         Attachments: zk_wrapper.tar.gz
>
>
> From Joey Echeverria up on hbase-users@:
> We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses especially since our environment was the worst case for garbage collection, millions of tiny, short lived objects. I suspect HBase sees similar work loads frequently, if it's not constantly. With anything shorter than a 30 second session time out, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.
> We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down.
> I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

Posted by "Joey Echeverria (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joey Echeverria updated HBASE-1316:
-----------------------------------

    Attachment: zk_wrapper.tar.gz

I'm uploading a tarball with the basic wrapper we used to get around GC pauses. The C code maintains it's own session with zk which is independent of any java sessions. If you want to do anything other than create ephemeral nodes, more functions need to be wrapped or a combination of C and Java is needed.

I could try and build something more comprehensive, potentially even a full implementation of the Java API which completely wraps the C, but I wanted to show you guys what I had at this stage. Let me know if you have any questions.

> ZooKeeper: use native threads to avoid GC stalls (JNI integration)
> ------------------------------------------------------------------
>
>                 Key: HBASE-1316
>                 URL: https://issues.apache.org/jira/browse/HBASE-1316
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>         Attachments: zk_wrapper.tar.gz
>
>
> From Joey Echeverria up on hbase-users@:
> We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses especially since our environment was the worst case for garbage collection, millions of tiny, short lived objects. I suspect HBase sees similar work loads frequently, if it's not constantly. With anything shorter than a 30 second session time out, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.
> We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down.
> I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Assigned: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

Posted by "Berk D. Demir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Berk D. Demir reassigned HBASE-1316:
------------------------------------

    Assignee: Berk D. Demir  (was: Nitay Joffe)

> ZooKeeper: use native threads to avoid GC stalls (JNI integration)
> ------------------------------------------------------------------
>
>                 Key: HBASE-1316
>                 URL: https://issues.apache.org/jira/browse/HBASE-1316
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>            Assignee: Berk D. Demir
>         Attachments: zk_wrapper.tar.gz
>
>
> From Joey Echeverria up on hbase-users@:
> We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses especially since our environment was the worst case for garbage collection, millions of tiny, short lived objects. I suspect HBase sees similar work loads frequently, if it's not constantly. With anything shorter than a 30 second session time out, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.
> We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down.
> I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697129#action_12697129 ] 

Andrew Purtell commented on HBASE-1316:
---------------------------------------

Subsequent discussion on the list includes affirmative votes from several individuals. 

> ZooKeeper: use native threads to avoid GC stalls (JNI integration)
> ------------------------------------------------------------------
>
>                 Key: HBASE-1316
>                 URL: https://issues.apache.org/jira/browse/HBASE-1316
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Andrew Purtell
>
> From Joey Echeverria up on hbase-users@:
> We've used zookeeper in a write-heavy project we've been working on and experienced issues similar to what you described. After several days of debugging, we discovered that our issue was garbage collection. There was no way to guarantee we wouldn't have long pauses especially since our environment was the worst case for garbage collection, millions of tiny, short lived objects. I suspect HBase sees similar work loads frequently, if it's not constantly. With anything shorter than a 30 second session time out, we got session expiration events extremely frequently. We needed to use 60 seconds for any real confidence that an ephemeral node disappearing meant something was unavailable.
> We really wanted quick recovery so we ended up writing a light-weight wrapper around the C API and used swig to auto-generate a JNI interface. It's not perfect, but since we switched to this method we've never seen a session expiration event and ephemeral nodes only disappear when there are network issues or a machine/process goes down.
> I don't know if it's worth doing the same kind of thing for HBase as it adds some "unnecessary" native code, but it's a solution that I found works.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.