Posted to commits@cassandra.apache.org by "Piotr Kołaczkowski (JIRA)" <ji...@apache.org> on 2013/10/29 20:49:25 UTC
[jira] [Created] (CASSANDRA-6268) Poor performance of Hadoop if any DC is using VNodes
Piotr Kołaczkowski created CASSANDRA-6268:
---------------------------------------------
Summary: Poor performance of Hadoop if any DC is using VNodes
Key: CASSANDRA-6268
URL: https://issues.apache.org/jira/browse/CASSANDRA-6268
Project: Cassandra
Issue Type: Improvement
Components: Hadoop
Reporter: Piotr Kołaczkowski
Assignee: Piotr Kołaczkowski
Attachments: 0001-DSP-2572-Adds-ability-to-set-target-DCs-where-a-Hado.patch
Some customers are complaining about the huge number of splits in Hadoop caused by vnodes. Disabling vnodes only in the Hadoop DC does not fix it, because splits are generated from the results of describe_ring, which still returns a huge number of ranges when any other DC uses vnodes.
The proposed fix:
- allows specifying the DCs in which the Hadoop job should run
- merges the consecutive ranges before generating Hadoop splits, so we don't have artificial range splitting caused by vnodes in the other DCs
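
The two steps above can be illustrated with a minimal Python sketch. It is not the attached patch: the TokenRange type, the endpoint-to-DC map, and both function names are hypothetical, and the sketch assumes the ranges arrive in ring order and ignores the wrap-around range.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Set


class TokenRange(NamedTuple):
    """Hypothetical stand-in for one range returned by describe_ring."""
    start: int
    end: int
    endpoints: FrozenSet[str]  # replicas that own this range


def restrict_to_dcs(ranges: Iterable[TokenRange],
                    endpoint_dc: Dict[str, str],
                    target_dcs: Set[str]) -> List[TokenRange]:
    """Keep only the replicas located in the DCs the job should run in."""
    return [
        TokenRange(r.start, r.end,
                   frozenset(e for e in r.endpoints
                             if endpoint_dc.get(e) in target_dcs))
        for r in ranges
    ]


def merge_consecutive(ranges: Iterable[TokenRange]) -> List[TokenRange]:
    """Merge adjacent ranges that share the same replica set.

    Assumes the input is sorted in ring order; the wrap-around
    range is not treated specially in this sketch.
    """
    merged: List[TokenRange] = []
    for r in ranges:
        prev = merged[-1] if merged else None
        if prev and prev.end == r.start and prev.endpoints == r.endpoints:
            # Extend the previous range instead of emitting a new one.
            merged[-1] = TokenRange(prev.start, r.end, prev.endpoints)
        else:
            merged.append(r)
    return merged
```

Once the replicas outside the target DCs are dropped, ranges that differed only because of vnodes in the other DCs end up with identical replica sets and collapse into one range, so fewer splits would be generated.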
--
This message was sent by Atlassian JIRA
(v6.1#6144)