Posted to user@cassandra.apache.org by Paulo Ricardo Motta Gomes <pa...@chaordicsystems.com> on 2014/05/14 00:51:57 UTC

Cassandra Hadoop job fails if any node is DOWN

Hello,

One of the nodes in our Analytics DC is dead, but ColumnFamilyInputFormat
(CFIF) still assigns Hadoop input splits to it. This leads to many failed
tasks and, consequently, a failed job.

* Tasks fail with: java.lang.RuntimeException:
org.apache.thrift.transport.TTransportException: Failed to open a transport
to XX.75:9160. (obviously, the node is dead)

* Job fails with: Job Failed: # of failed Map Tasks exceeded allowed limit.
FailedCount: 1. LastFailedTask: task_201404180250_4207_m_000079

We use RF=2 and CL=LOCAL_ONE for Hadoop jobs, on C* 1.2.16. Is this
expected behavior?
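
For context, the job is wired up roughly like the sketch below (the
keyspace, column family, seed address, and partitioner are placeholders,
not our real settings; the ConfigHelper calls are the stock 1.2 API):

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class AnalyticsJobSetup
    {
        public static Job configure() throws Exception
        {
            Job job = new Job(new Configuration(), "analytics-job");
            job.setInputFormatClass(ColumnFamilyInputFormat.class);

            Configuration conf = job.getConfiguration();
            ConfigHelper.setInputInitialAddress(conf, "10.0.0.1"); // placeholder seed
            ConfigHelper.setInputRpcPort(conf, "9160");            // thrift port
            ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");
            ConfigHelper.setInputColumnFamily(conf, "my_keyspace", "my_cf");

            // read every column of each row
            SlicePredicate predicate = new SlicePredicate().setSlice_range(
                new SliceRange(ByteBufferUtil.EMPTY_BYTE_BUFFER,
                               ByteBufferUtil.EMPTY_BYTE_BUFFER,
                               false, Integer.MAX_VALUE));
            ConfigHelper.setInputSlicePredicate(conf, predicate);

            // LOCAL_ONE: one live replica in the local DC should be
            // enough for each read
            ConfigHelper.setReadConsistencyLevel(conf, "LOCAL_ONE");
            return job;
        }
    }

With CL=LOCAL_ONE and RF=2, I'd expect a split to succeed as long as at
least one of its two replicas in the local DC is up.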

I checked the CFIF code, and it always assigns input splits to all ring
nodes, regardless of whether each node is dead or alive. Am I missing
something here? Our current workaround is to patch CFIF to blacklist the
dead node, but this is not a very automatic procedure.
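
To illustrate, our patch boils down to something like the helper below,
which filters each token range's replica list before the split is created
(the class and the property name cassandra.input.blacklisted.nodes are
our own invention, not stock C* API):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import org.apache.hadoop.conf.Configuration;

    public final class EndpointBlacklist
    {
        // Hypothetical job property; not part of stock ConfigHelper.
        public static final String BLACKLIST_PROPERTY =
            "cassandra.input.blacklisted.nodes";

        private final Set<String> blacklisted;

        public EndpointBlacklist(Configuration conf)
        {
            blacklisted = new HashSet<String>(Arrays.asList(
                conf.getStrings(BLACKLIST_PROPERTY, new String[0])));
        }

        // Filters the replica endpoints returned by describe_ring for one
        // token range, so the split is only assigned to live nodes.
        public List<String> liveEndpoints(List<String> endpoints)
        {
            List<String> live = new ArrayList<String>(endpoints.size());
            for (String endpoint : endpoints)
                if (!blacklisted.contains(endpoint))
                    live.add(endpoint);

            if (live.isEmpty())
                throw new IllegalStateException(
                    "All replicas for this range are blacklisted: " + endpoints);
            return live;
        }
    }

We call liveEndpoints() from getSplits() on the endpoint list returned
for each range; with RF=2 the surviving replica still holds the data, so
the job loses locality on those splits but keeps working. The blacklist
still has to be maintained by hand, though, which is why this doesn't
feel like the right long-term answer.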

Cheers,

-- 
Paulo Motta

Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200