Posted to dev@storm.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2014/06/27 20:46:25 UTC

[jira] [Created] (STORM-376) Add compression to data stored in ZK

Robert Joseph Evans created STORM-376:
-----------------------------------------

             Summary: Add compression to data stored in ZK
                 Key: STORM-376
                 URL: https://issues.apache.org/jira/browse/STORM-376
             Project: Apache Storm (Incubating)
          Issue Type: Improvement
            Reporter: Robert Joseph Evans
            Assignee: Robert Joseph Evans


If you run ZooKeeper with -Dzookeeper.forceSync=no, the ZooKeeper disk is no longer the bottleneck for scaling Storm.  For us, on a Gigabit Ethernet scale test cluster, the bottleneck becomes the aggregate reads by all of the supervisors and workers trying to download the compiled topology assignments.

To reduce this load we took two approaches.  First, we compressed the data being stored in ZooKeeper (this JIRA), which has the added benefit of increasing the size of the topology you can store in ZK.  Second, we used the ZK version number to see if the data had changed, and so avoid downloading it again needlessly (STORM-375).
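
As a rough illustration of the two approaches (a minimal sketch, not the actual patch): the Java below gzips a payload before it would be written to a znode, and keeps a small cache keyed on a version number so an unchanged payload is not fetched again. The class name ZkCompressionSketch, the VersionedCache helper, and the choice of gzip are all assumptions made for this example; nothing here talks to a real ZooKeeper. With the real client, the version would presumably come from the znode's Stat (Stat.getVersion()).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not Storm's actual code.
final class ZkCompressionSketch {

    // Gzip a serialized payload before it is stored in a znode.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return bos.toByteArray();
    }

    // Reverse of compress(): gunzip bytes read back from a znode.
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Hypothetical client-side cache: remembers the last version downloaded
    // so the (larger) data read can be skipped when nothing has changed.
    static final class VersionedCache {
        private int cachedVersion = -1; // -1 means nothing cached yet
        private byte[] cachedData;

        // True when the payload must be downloaded (again).
        boolean isStale(int remoteVersion) {
            return cachedData == null || remoteVersion != cachedVersion;
        }

        void store(int version, byte[] data) {
            cachedVersion = version;
            cachedData = data;
        }

        byte[] data() {
            return cachedData;
        }
    }
}
```

A reader would first ask for the current version, call isStale(version), and only download and decompress() the payload when that returns true. How much compression saves depends entirely on the payload; the sketch makes no claim about the ratios seen on the test cluster.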

With these changes we were able to scale to a simulated 1965 nodes (5 supervisors running on each of 393 real nodes, with each supervisor configured to have 10 slots).  We also filled the cluster with 131 topologies of 100 workers each.  (We are going to 200 topologies and may try to scale the cluster even larger, but it takes forever to launch topologies once the cluster is under load.  We may try to address that shortly too.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)