Posted to user@cassandra.apache.org by Shamim <sr...@yandex.ru> on 2013/04/22 19:48:24 UTC

Cassandra + Hadoop - 2 task attempts with millions of rows

Hello all,
  recently we upgraded our cluster (6 nodes) from Cassandra 1.1.6 to 1.2.1. The cluster is evenly partitioned (Murmur3Partitioner). We are using Pig to parse the data and compute aggregates.

When we submit a job through Pig, what I consistently see is that while most tasks are assigned 20-25k rows each (Map input records), exactly 2 of them (always 2) get more than 2 million rows. These 2 tasks reach 100% and then hang for a long time. We also frequently get killed tasks (about 2%) with a TimeoutException.

We increased rpc_timeout to 60000 and set cassandra.input.split.size=1024, but neither helped.
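
For reference, this is roughly how the job is wired up. It is only a sketch: the script name and the keyspace/column family names (aggregate.pig, MyKeyspace, MyCF) are placeholders, and we push the split size into the job configuration with Pig's set statement:

    -- aggregate.pig (sketch; keyspace and column family names are placeholders)
    set cassandra.input.split.size 1024;  -- target number of rows per input split
    rows = LOAD 'cassandra://MyKeyspace/MyCF'
           USING org.apache.cassandra.hadoop.pig.CassandraStorage()
           AS (key, columns: bag {column: tuple (name, value)});
    grouped = GROUP rows ALL;
    total = FOREACH grouped GENERATE COUNT(rows);
    DUMP total;

If the split size were honored, 97,000,000 rows at ~1024 rows per split would mean roughly 95,000 map tasks of about 1k rows each, rather than the 20-25k rows per task we actually see.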

We have roughly 97 million rows in the cluster. Why are we seeing this behavior? Do you have any suggestions or clues for troubleshooting this issue? Any help would be much appreciated. Thanks in advance.

--
Best regards
  Shamim A.

Re: Cassandra + Hadoop - 2 task attempts with millions of rows

Posted by aaron morton <aa...@thelastpickle.com>.
>> Our cluster is evenly partitioned (Murmur3Partitioner)
Murmur3Partitioner is only available in 1.2, and changing partitioners is not supported. Did you change from RandomPartitioner when you were on 1.1?

Are you using virtual nodes in your 1.2 cluster?
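
If you are not sure, it is worth checking what each node actually has configured. Something like the following (the yaml path assumes the packaged default location, adjust for your install):

    # on each node: which partitioner and how many tokens are configured
    grep -E '^partitioner|^num_tokens' /etc/cassandra/cassandra.yaml

    # token ownership as the cluster sees it (with vnodes you get one line per token)
    nodetool ring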

>> We have roughly 97 million rows in the cluster. Why are we seeing this behavior? Do you have any suggestions or clues for troubleshooting this issue?
Can you make some of the logs from the tasks available?

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/04/2013, at 5:50 AM, Shamim <sr...@yandex.ru> wrote:

> We are using Hadoop 1.0.3 and Pig 0.11.1.
> 
> -- 
> Best regards
>   Shamim A.
> 
> 22.04.2013, 21:48, "Shamim" <sr...@yandex.ru>:
>> Hello all,
>>   recently we upgraded our cluster (6 nodes) from Cassandra 1.1.6 to 1.2.1. The cluster is evenly partitioned (Murmur3Partitioner). We are using Pig to parse the data and compute aggregates.
>> 
>> When we submit a job through Pig, what I consistently see is that while most tasks are assigned 20-25k rows each (Map input records), exactly 2 of them (always 2) get more than 2 million rows. These 2 tasks reach 100% and then hang for a long time. We also frequently get killed tasks (about 2%) with a TimeoutException.
>> 
>> We increased rpc_timeout to 60000 and set cassandra.input.split.size=1024, but neither helped.
>> 
>> We have roughly 97 million rows in the cluster. Why are we seeing this behavior? Do you have any suggestions or clues for troubleshooting this issue? Any help would be much appreciated. Thanks in advance.
>> 
>> --
>> Best regards
>>   Shamim A.


Re: Cassandra + Hadoop - 2 task attempts with millions of rows

Posted by Shamim <sr...@yandex.ru>.
We are using Hadoop 1.0.3 and Pig 0.11.1.

-- 
Best regards
  Shamim A.

22.04.2013, 21:48, "Shamim" <sr...@yandex.ru>:
> Hello all,
>   recently we upgraded our cluster (6 nodes) from Cassandra 1.1.6 to 1.2.1. The cluster is evenly partitioned (Murmur3Partitioner). We are using Pig to parse the data and compute aggregates.
>
> When we submit a job through Pig, what I consistently see is that while most tasks are assigned 20-25k rows each (Map input records), exactly 2 of them (always 2) get more than 2 million rows. These 2 tasks reach 100% and then hang for a long time. We also frequently get killed tasks (about 2%) with a TimeoutException.
>
> We increased rpc_timeout to 60000 and set cassandra.input.split.size=1024, but neither helped.
>
> We have roughly 97 million rows in the cluster. Why are we seeing this behavior? Do you have any suggestions or clues for troubleshooting this issue? Any help would be much appreciated. Thanks in advance.
>
> --
> Best regards
>   Shamim A.