Posted to user@cassandra.apache.org by Patrik Modesto <pa...@gmail.com> on 2012/03/06 09:32:59 UTC

Re: newer Cassandra + Hadoop = TimedOutException()

Hi,

I was recently trying the Hadoop job + cassandra-all 0.8.10 again, and the
timeouts I get are not because Cassandra can't handle the
requests. I've noticed there are several tasks that show progress of
several thousand percent. It seems they are looping over their range of
keys. I've run the job with debug enabled and the ranges look OK, see
http://pastebin.com/stVsFzLM

Another difference between cassandra-all 0.8.7 and 0.8.10 is the
number of mappers the job creates:
0.8.7: 4680
0.8.10: 595

Task       Complete
task_201202281457_2027_m_000041	9076.81%
task_201202281457_2027_m_000073	9639.04%
task_201202281457_2027_m_000105	10538.60%
task_201202281457_2027_m_000108	9364.17%

None of this happens with cassandra-all 0.8.7.

Regards,
P.



On Tue, Feb 28, 2012 at 12:29, Patrik Modesto <pa...@gmail.com> wrote:
> I'll alter these settings and will let you know.
>
> Regards,
> P.
>
> On Tue, Feb 28, 2012 at 09:23, aaron morton <aa...@thelastpickle.com> wrote:
>> Have you tried lowering the batch size and increasing the timeout? Even
>> just to get it to work.
>>
>> If you get a TimedOutException it means CL number of servers did not respond
>> in time.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
>>
>> Hi aaron,
>>
>> this is our current settings:
>>
>>      <property>
>>          <name>cassandra.range.batch.size</name>
>>          <value>1024</value>
>>      </property>
>>
>>      <property>
>>          <name>cassandra.input.split.size</name>
>>          <value>16384</value>
>>      </property>
>>
>> rpc_timeout_in_ms: 30000
>>
>> Regards,
>> P.
>>
>> On Mon, Feb 27, 2012 at 21:54, aaron morton <aa...@thelastpickle.com> wrote:
>>
>> What settings do you have for cassandra.range.batch.size
>>
>> and rpc_timeout_in_ms  ? Have you tried reducing the first and/or increasing
>>
>> the second ?
>>
>>
>> Cheers
>>
>>
>> -----------------
>>
>> Aaron Morton
>>
>> Freelance Developer
>>
>> @aaronmorton
>>
>> http://www.thelastpickle.com
>>
>>
>> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
>>
>>
>> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo <ed...@gmail.com>
>>
>> wrote:
>>
>>
>> Did you see the notes here?
>>
>>
>>
>> I'm not sure what you mean by the notes?
>>
>>
>> I'm using the mapred.* settings suggested there:
>>
>>
>>     <property>
>>
>>         <name>mapred.max.tracker.failures</name>
>>
>>         <value>20</value>
>>
>>     </property>
>>
>>     <property>
>>
>>         <name>mapred.map.max.attempts</name>
>>
>>         <value>20</value>
>>
>>     </property>
>>
>>     <property>
>>
>>         <name>mapred.reduce.max.attempts</name>
>>
>>         <value>20</value>
>>
>>     </property>
>>
>>
>> But I still see the timeouts that I didn't see with cassandra-all 0.8.7.
>>
>>
>> P.
>>
>>
>> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
>>
>>
>>
>>

Re: newer Cassandra + Hadoop = TimedOutException()

Posted by Patrik Modesto <pa...@gmail.com>.
I've tried cassandra-all 0.8.10 with the rpc_endpoints == "0.0.0.0"
bug fixed, but the result is the same: there are still tasks over
1000%. The only change is that there are real host names instead of
0.0.0.0 in the debug output.

Reconfiguring the whole cluster is not possible, so I can't test with
"rpc_address" commented out.

Regards,
P.



Re: newer Cassandra + Hadoop = TimedOutException()

Posted by Florent Lefillâtre <fl...@gmail.com>.
I remember a bug in the ColumnFamilyInputFormat class in 0.8.10.
It compared rpc_endpoints == "0.0.0.0" instead of
rpc_endpoint.equals("0.0.0.0"); maybe it can help you.
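For readers who don't write Java daily: `==` on two `String`s compares object references, not contents, so a check written as `rpc_endpoints == "0.0.0.0"` is effectively always false for strings constructed at runtime (such as values arriving over Thrift). A minimal illustration of the bug class; the variable name is illustrative, not the actual Cassandra code:

```java
public class StringCompare {
    public static void main(String[] args) {
        // Simulate an rpc endpoint as it would arrive from a Thrift call:
        // a String constructed at runtime, not an interned literal.
        String rpcEndpoint = new String("0.0.0.0");

        // Reference comparison: the objects differ, so this is false.
        System.out.println(rpcEndpoint == "0.0.0.0");       // false

        // Content comparison: the characters match, so this is true.
        System.out.println(rpcEndpoint.equals("0.0.0.0"));  // true
    }
}
```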


Re: newer Cassandra + Hadoop = TimedOutException()

Posted by Florent Lefillâtre <fl...@gmail.com>.
Sorry, I had misunderstood.
For me, the problem comes from the change in the ColumnFamilyInputFormat
class between 0.8.7 and 0.8.10, where the splits are created (0.8.7 uses
endpoints and 0.8.10 uses rpc_endpoints).
With your config the split computation fails, so Hadoop doesn't run a map
task on approximately 16384 rows (your cassandra.input.split.size) but on
all the rows of a node (certainly far more than 16384). However, Hadoop
estimates the task progress against 16384 inputs, which is why you see
something like 9076.81%.

If you can't change the rpc_address configuration, I don't know how you
can solve your problem :/, sorry.
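Florent's arithmetic can be sketched as follows: a Hadoop record reader typically reports progress as rows read divided by the rows the split was estimated to contain, so a reader that iterates a whole node's rows against a 16384-row estimate reports thousands of percent. This is a hedged sketch of the mechanism, not the actual ColumnFamilyRecordReader code, and the row count is chosen only to reproduce the figure above:

```java
public class ProgressSketch {
    // Progress as a Hadoop-style fraction: rows actually read over the
    // number of rows the split was estimated to contain.
    static float progress(long rowsRead, long estimatedRows) {
        return (float) rowsRead / estimatedRows;
    }

    public static void main(String[] args) {
        long estimated = 16_384;       // cassandra.input.split.size
        long rowsRead  = 1_487_143;    // a whole node's rows, not one split

        // Displayed as a percentage this is roughly 9076%, matching the
        // task list in the original report.
        System.out.printf("%.2f%%%n", progress(rowsRead, estimated) * 100);
    }
}
```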


Re: newer Cassandra + Hadoop = TimedOutException()

Posted by Patrik Modesto <pa...@gmail.com>.
Hi Florent,

I don't change the server version; it is Cassandra 0.8.10. I just
change the version of cassandra-all in the pom.xml of the mapreduce
job.

I have 'rpc_address: 0.0.0.0' in cassandra.yaml because I want
Cassandra to bind RPC to all interfaces.

Regards,
P.


Re: newer Cassandra + Hadoop = TimedOutException()

Posted by Florent Lefillâtre <fl...@gmail.com>.
Hi, I had the same problem on Hadoop 0.20.2 and Cassandra 1.0.5.
In my case the split of the token range failed.
I commented out the line 'rpc_address: 0.0.0.0' in cassandra.yaml.
Maybe check whether any configuration changed between 0.8.7 and 0.8.10.
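For reference, the workaround described here amounts to the following change in cassandra.yaml. This is a sketch of the edit, under the assumption (worth verifying on a test node, as behaviour can vary by version) that an unset rpc_address makes the node advertise an address derived from its hostname instead of the unusable wildcard 0.0.0.0:

```yaml
# Before: RPC bound to all interfaces; 0.0.0.0 is then what ends up in
# the rpc_endpoints that describe_ring reports to Hadoop clients.
# rpc_address: 0.0.0.0

# After: leave rpc_address commented out so the node falls back to an
# address based on its hostname.
```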



Re: newer Cassandra + Hadoop = TimedOutException()

Posted by Patrik Modesto <pa...@gmail.com>.
I changed rpc_endpoint to endpoints and now the splits are computed
correctly, so it's a bug in the Cassandra-to-Hadoop interface. I suspect
it has something to do with the wide rows with tens of thousands of
columns that we have, because the unpatched getSubSplits() works with
the small test data we have for development.
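The change described (iterate over `range.endpoints` instead of `range.rpc_endpoints` when the advertised RPC address is unusable) can be sketched roughly like this. The types are simplified stand-ins for the Thrift `TokenRange` structure, and the method is an assumption about the shape of the fix, not the real ColumnFamilyInputFormat code:

```java
import java.util.ArrayList;
import java.util.List;

public class SubSplitHosts {
    // Simplified stand-in for the Thrift TokenRange structure.
    static class TokenRange {
        List<String> endpoints = new ArrayList<>();     // gossip addresses
        List<String> rpcEndpoints = new ArrayList<>();  // advertised RPC addresses
    }

    // Pick the hosts to contact for describe_splits: prefer the rpc
    // endpoint, but fall back to the gossip endpoint when the node
    // advertises the unusable wildcard address 0.0.0.0.
    static List<String> hostsToQuery(TokenRange range) {
        List<String> hosts = new ArrayList<>();
        for (int i = 0; i < range.endpoints.size(); i++) {
            String rpc = range.rpcEndpoints.get(i);
            if (rpc == null || rpc.equals("0.0.0.0")) {
                hosts.add(range.endpoints.get(i));
            } else {
                hosts.add(rpc);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        TokenRange r = new TokenRange();
        r.endpoints.add("cass-node-1.example.com");  // hypothetical host name
        r.rpcEndpoints.add("0.0.0.0");
        System.out.println(hostsToQuery(r));  // [cass-node-1.example.com]
    }
}
```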

Regards,
P.


On Wed, Mar 7, 2012 at 11:02, Florent Lefillâtre <fl...@gmail.com> wrote:
> If you want to try a test: in the CFIF.getSubSplits(String, String,
> TokenRange, Configuration) method, replace the loop over
> 'range.rpc_endpoints' with the same loop over 'range.endpoints'.
> This method splits the token range of each node with the describe_splits
> method, but I think there is something wrong when the Cassandra
> connection is created on host '0.0.0.0'.
>
>
>
>
> On 7 March 2012 09:07, Patrik Modesto <pa...@gmail.com> wrote:
>
>> You're right, I wasn't looking in the right logs. Unfortunately I'd
>> need to restart the Hadoop TaskTracker with log level DEBUG and that is
>> not possible at the moment. A pity it happens only in production with
>> terabytes of data, not in the test...
>>
>> Regards,
>> P.
>>
>> On Tue, Mar 6, 2012 at 14:31, Florent Lefillâtre <fl...@gmail.com>
>> wrote:
>> > CFRR.getProgress() is called by the child mapper tasks on each
>> > TaskTracker node, so the log must appear in
>> > ${hadoop_log_dir}/attempt_201202081707_0001_m_000000_0/syslog (or
>> > something like this) on the TaskTrackers, not in the client job log.
>> > Are you sure you are looking at the right log file? I ask because in
>> > your first mail you linked the client job log.
>> > And maybe you can log the size of each split in CFIF.
>> >
>> >
>> >
>> >
>> > On 6 March 2012 13:09, Patrik Modesto <pa...@gmail.com> wrote:
>> >
>> >> I've added a debug message in CFRR.getProgress() and I can't find
>> >> it in the debug output. It seems getProgress() has not been called
>> >> at all.
>> >>
>> >> Regards,
>> >> P.
>> >>
>> >> On Tue, Mar 6, 2012 at 09:49, Jeremy Hanna <je...@gmail.com>
>> >> wrote:
>> >> > you may be running into this -
>> >> > https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure
>> >> > if it
>> >> > really affects the execution of the job itself though.
>> >> >>>>
>> >> >>>> wrote:
>> >> >>>>
>> >> >>>>
>> >> >>>> Did you see the notes here?
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> I'm not sure what you mean by the notes.
>> >> >>>>
>> >> >>>>
>> >> >>>> I'm using the mapred.* settings suggested there:
>> >> >>>>
>> >> >>>>
>> >> >>>>     <property>
>> >> >>>>
>> >> >>>>         <name>mapred.max.tracker.failures</name>
>> >> >>>>
>> >> >>>>         <value>20</value>
>> >> >>>>
>> >> >>>>     </property>
>> >> >>>>
>> >> >>>>     <property>
>> >> >>>>
>> >> >>>>         <name>mapred.map.max.attempts</name>
>> >> >>>>
>> >> >>>>         <value>20</value>
>> >> >>>>
>> >> >>>>     </property>
>> >> >>>>
>> >> >>>>     <property>
>> >> >>>>
>> >> >>>>         <name>mapred.reduce.max.attempts</name>
>> >> >>>>
>> >> >>>>         <value>20</value>
>> >> >>>>
>> >> >>>>     </property>
>> >> >>>>
>> >> >>>>
>> >> >>>> But I still see the timeouts that I didn't have with cassandra-all
>> >> >>>> 0.8.7.
>> >> >>>>
>> >> >>>>
>> >> >>>> P.
>> >> >>>>
>> >> >>>>
>> >> >>>> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >
>> >
>> >
>
>

Re: newer Cassandra + Hadoop = TimedOutException()

Posted by Florent Lefillâtre <fl...@gmail.com>.
If you want to try a test: in the CFIF.getSubSplits(String, String,
TokenRange, Configuration) method, replace the loop over
'range.rpc_endpoints' with the same loop over 'range.endpoints'.
This method splits each node's token range with the describe_splits
method, but I think something goes wrong when the Cassandra connection is
created on host '0.0.0.0'.
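A simplified sketch of that idea (this is not the actual CFIF code; the class, helper method, and sample addresses below are hypothetical): prefer each rpc_endpoint, but fall back to the gossip endpoint at the same index when the rpc_endpoint is the unconnectable '0.0.0.0' bind-all address.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EndpointPicker {
    // Hypothetical helper: pick one connectable address per replica.
    // TokenRange.rpc_endpoints can contain "0.0.0.0" when rpc_address is
    // bound to all interfaces; in that case fall back to the gossip
    // endpoint at the same index, which is what looping over
    // range.endpoints instead of range.rpc_endpoints amounts to.
    static List<String> usableEndpoints(List<String> rpcEndpoints,
                                        List<String> endpoints) {
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < endpoints.size(); i++) {
            String rpc = i < rpcEndpoints.size() ? rpcEndpoints.get(i) : null;
            if (rpc == null || "0.0.0.0".equals(rpc)) {
                result.add(endpoints.get(i)); // gossip address is routable
            } else {
                result.add(rpc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rpc = Arrays.asList("0.0.0.0", "10.0.0.2");
        List<String> gossip = Arrays.asList("10.0.0.1", "10.0.0.2");
        System.out.println(usableEndpoints(rpc, gossip));
    }
}
```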




Le 7 mars 2012 09:07, Patrik Modesto <pa...@gmail.com> a écrit :

> You're right, I wasn't looking in the right logs. Unfortunately I'd
> need to restart the Hadoop tasktracker with log level DEBUG and that is
> not possible at the moment. A pity it happens only in production with
> terabytes of data, not in testing...
>
> Regards,
> P.
>
> On Tue, Mar 6, 2012 at 14:31, Florent Lefillâtre <fl...@gmail.com>
> wrote:
> > CFRR.getProgress() is called by the child mapper tasks on each
> > TaskTracker node, so the log message should appear in
> > ${hadoop_log_dir}/attempt_201202081707_0001_m_000000_0/syslog (or
> > something like that) on the TaskTrackers, not in the client job logs.
> > Are you sure you're looking at the right log file? I ask because in
> > your first mail you linked the client job log.
> > And maybe you could log the size of each split in CFIF.
> >
> >
> >
> >
> > Le 6 mars 2012 13:09, Patrik Modesto <pa...@gmail.com> a écrit
> :
> >
> >> I've added a debug message in CFRR.getProgress() and I can't find
> >> it in the debug output. It seems getProgress() has not been
> >> called at all.
> >>
> >> Regards,
> >> P.
> >>
> >> On Tue, Mar 6, 2012 at 09:49, Jeremy Hanna <je...@gmail.com>
> >> wrote:
> >> > you may be running into this -
> >> > https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure
> if it
> >> > really affects the execution of the job itself though.
> >> >
> >> > On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> I was recently trying a Hadoop job + cassandra-all 0.8.10 again and the
> >> >> timeouts I get are not because Cassandra can't handle the
> >> >> requests. I've noticed there are several tasks that show progress of
> >> >> several thousand percent. It seems they are looping over their range
> >> >> of keys. I've run the job with debug enabled and the ranges look ok, see
> >> >> http://pastebin.com/stVsFzLM
> >> >>
> >> >> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
> >> >> number of mappers the job creates:
> >> >> 0.8.7: 4680
> >> >> 0.8.10: 595
> >> >>
> >> >> Task       Complete
> >> >> task_201202281457_2027_m_000041       9076.81%
> >> >> task_201202281457_2027_m_000073       9639.04%
> >> >> task_201202281457_2027_m_000105       10538.60%
> >> >> task_201202281457_2027_m_000108       9364.17%
> >> >>
> >> >> None of this happens with cassandra-all 0.8.7.
> >> >>
> >> >> Regards,
> >> >> P.
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto
> >> >> <pa...@gmail.com> wrote:
> >> >>> I'll alter these settings and will let you know.
> >> >>>
> >> >>> Regards,
> >> >>> P.
> >> >>>
> >> >>> On Tue, Feb 28, 2012 at 09:23, aaron morton <
> aaron@thelastpickle.com>
> >> >>> wrote:
> >> >>>> Have you tried lowering the  batch size and increasing the time
> out?
> >> >>>> Even
> >> >>>> just to get it to work.
> >> >>>>
> >> >>>> If you get a TimedOutException it means CL number of servers did
> not
> >> >>>> respond
> >> >>>> in time.
> >> >>>>
> >> >>>> Cheers
> >> >>>>
> >> >>>> -----------------
> >> >>>> Aaron Morton
> >> >>>> Freelance Developer
> >> >>>> @aaronmorton
> >> >>>> http://www.thelastpickle.com
> >> >>>>
> >> >>>> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
> >> >>>>
> >> >>>> Hi aaron,
> >> >>>>
> >> >>>> this is our current settings:
> >> >>>>
> >> >>>>      <property>
> >> >>>>          <name>cassandra.range.batch.size</name>
> >> >>>>          <value>1024</value>
> >> >>>>      </property>
> >> >>>>
> >> >>>>      <property>
> >> >>>>          <name>cassandra.input.split.size</name>
> >> >>>>          <value>16384</value>
> >> >>>>      </property>
> >> >>>>
> >> >>>> rpc_timeout_in_ms: 30000
> >> >>>>
> >> >>>> Regards,
> >> >>>> P.
> >> >>>>
> >> >>>> On Mon, Feb 27, 2012 at 21:54, aaron morton <
> aaron@thelastpickle.com>
> >> >>>> wrote:
> >> >>>>
> >> >>>> What settings do you have for cassandra.range.batch.size
> >> >>>>
> >> >>>> and rpc_timeout_in_ms  ? Have you tried reducing the first and/or
> >> >>>> increasing
> >> >>>>
> >> >>>> the second ?
> >> >>>>
> >> >>>>
> >> >>>> Cheers
> >> >>>>
> >> >>>>
> >> >>>> -----------------
> >> >>>>
> >> >>>> Aaron Morton
> >> >>>>
> >> >>>> Freelance Developer
> >> >>>>
> >> >>>> @aaronmorton
> >> >>>>
> >> >>>> http://www.thelastpickle.com
> >> >>>>
> >> >>>>
> >> >>>> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
> >> >>>>
> >> >>>>
> >> >>>> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo
> >> >>>> <ed...@gmail.com>
> >> >>>>
> >> >>>> wrote:
> >> >>>>
> >> >>>>
> >> >>>> Did you see the notes here?
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> I'm not sure what you mean by the notes.
> >> >>>>
> >> >>>>
> >> >>>> I'm using the mapred.* settings suggested there:
> >> >>>>
> >> >>>>
> >> >>>>     <property>
> >> >>>>
> >> >>>>         <name>mapred.max.tracker.failures</name>
> >> >>>>
> >> >>>>         <value>20</value>
> >> >>>>
> >> >>>>     </property>
> >> >>>>
> >> >>>>     <property>
> >> >>>>
> >> >>>>         <name>mapred.map.max.attempts</name>
> >> >>>>
> >> >>>>         <value>20</value>
> >> >>>>
> >> >>>>     </property>
> >> >>>>
> >> >>>>     <property>
> >> >>>>
> >> >>>>         <name>mapred.reduce.max.attempts</name>
> >> >>>>
> >> >>>>         <value>20</value>
> >> >>>>
> >> >>>>     </property>
> >> >>>>
> >> >>>>
> >> >>>> But I still see the timeouts that I didn't have with cassandra-all
> >> >>>> 0.8.7.
> >> >>>>
> >> >>>>
> >> >>>> P.
> >> >>>>
> >> >>>>
> >> >>>> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >
> >
> >
>

Re: newer Cassandra + Hadoop = TimedOutException()

Posted by Patrik Modesto <pa...@gmail.com>.
You're right, I wasn't looking in the right logs. Unfortunately I'd
need to restart the Hadoop tasktracker with log level DEBUG and that is
not possible at the moment. A pity it happens only in production with
terabytes of data, not in testing...

Regards,
P.

On Tue, Mar 6, 2012 at 14:31, Florent Lefillâtre <fl...@gmail.com> wrote:
> CFRR.getProgress() is called by the child mapper tasks on each TaskTracker
> node, so the log message should appear in
> ${hadoop_log_dir}/attempt_201202081707_0001_m_000000_0/syslog (or something
> like that) on the TaskTrackers, not in the client job logs.
> Are you sure you're looking at the right log file? I ask because in your
> first mail you linked the client job log.
> And maybe you could log the size of each split in CFIF.
>
>
>
>
> Le 6 mars 2012 13:09, Patrik Modesto <pa...@gmail.com> a écrit :
>
>> I've added a debug message in CFRR.getProgress() and I can't find
>> it in the debug output. It seems getProgress() has not been
>> called at all.
>>
>> Regards,
>> P.
>>
>> On Tue, Mar 6, 2012 at 09:49, Jeremy Hanna <je...@gmail.com>
>> wrote:
>> > you may be running into this -
>> > https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it
>> > really affects the execution of the job itself though.
>> >
>> > On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote:
>> >
>> >> Hi,
>> >>
>> >> I was recently trying a Hadoop job + cassandra-all 0.8.10 again and the
>> >> timeouts I get are not because Cassandra can't handle the
>> >> requests. I've noticed there are several tasks that show progress of
>> >> several thousand percent. It seems they are looping over their range of
>> >> keys. I've run the job with debug enabled and the ranges look ok, see
>> >> http://pastebin.com/stVsFzLM
>> >>
>> >> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
>> >> number of mappers the job creates:
>> >> 0.8.7: 4680
>> >> 0.8.10: 595
>> >>
>> >> Task       Complete
>> >> task_201202281457_2027_m_000041       9076.81%
>> >> task_201202281457_2027_m_000073       9639.04%
>> >> task_201202281457_2027_m_000105       10538.60%
>> >> task_201202281457_2027_m_000108       9364.17%
>> >>
>> >> None of this happens with cassandra-all 0.8.7.
>> >>
>> >> Regards,
>> >> P.
>> >>
>> >>
>> >>
>> >> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto
>> >> <pa...@gmail.com> wrote:
>> >>> I'll alter these settings and will let you know.
>> >>>
>> >>> Regards,
>> >>> P.
>> >>>
>> >>> On Tue, Feb 28, 2012 at 09:23, aaron morton <aa...@thelastpickle.com>
>> >>> wrote:
>> >>>> Have you tried lowering the  batch size and increasing the time out?
>> >>>> Even
>> >>>> just to get it to work.
>> >>>>
>> >>>> If you get a TimedOutException it means CL number of servers did not
>> >>>> respond
>> >>>> in time.
>> >>>>
>> >>>> Cheers
>> >>>>
>> >>>> -----------------
>> >>>> Aaron Morton
>> >>>> Freelance Developer
>> >>>> @aaronmorton
>> >>>> http://www.thelastpickle.com
>> >>>>
>> >>>> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
>> >>>>
>> >>>> Hi aaron,
>> >>>>
>> >>>> this is our current settings:
>> >>>>
>> >>>>      <property>
>> >>>>          <name>cassandra.range.batch.size</name>
>> >>>>          <value>1024</value>
>> >>>>      </property>
>> >>>>
>> >>>>      <property>
>> >>>>          <name>cassandra.input.split.size</name>
>> >>>>          <value>16384</value>
>> >>>>      </property>
>> >>>>
>> >>>> rpc_timeout_in_ms: 30000
>> >>>>
>> >>>> Regards,
>> >>>> P.
>> >>>>
>> >>>> On Mon, Feb 27, 2012 at 21:54, aaron morton <aa...@thelastpickle.com>
>> >>>> wrote:
>> >>>>
>> >>>> What settings do you have for cassandra.range.batch.size
>> >>>>
>> >>>> and rpc_timeout_in_ms  ? Have you tried reducing the first and/or
>> >>>> increasing
>> >>>>
>> >>>> the second ?
>> >>>>
>> >>>>
>> >>>> Cheers
>> >>>>
>> >>>>
>> >>>> -----------------
>> >>>>
>> >>>> Aaron Morton
>> >>>>
>> >>>> Freelance Developer
>> >>>>
>> >>>> @aaronmorton
>> >>>>
>> >>>> http://www.thelastpickle.com
>> >>>>
>> >>>>
>> >>>> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
>> >>>>
>> >>>>
>> >>>> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo
>> >>>> <ed...@gmail.com>
>> >>>>
>> >>>> wrote:
>> >>>>
>> >>>>
>> >>>> Did you see the notes here?
>> >>>>
>> >>>>
>> >>>>
>> >>>> I'm not sure what you mean by the notes.
>> >>>>
>> >>>>
>> >>>> I'm using the mapred.* settings suggested there:
>> >>>>
>> >>>>
>> >>>>     <property>
>> >>>>
>> >>>>         <name>mapred.max.tracker.failures</name>
>> >>>>
>> >>>>         <value>20</value>
>> >>>>
>> >>>>     </property>
>> >>>>
>> >>>>     <property>
>> >>>>
>> >>>>         <name>mapred.map.max.attempts</name>
>> >>>>
>> >>>>         <value>20</value>
>> >>>>
>> >>>>     </property>
>> >>>>
>> >>>>     <property>
>> >>>>
>> >>>>         <name>mapred.reduce.max.attempts</name>
>> >>>>
>> >>>>         <value>20</value>
>> >>>>
>> >>>>     </property>
>> >>>>
>> >>>>
>> >>>> But I still see the timeouts that I didn't have with cassandra-all 0.8.7.
>> >>>>
>> >>>>
>> >>>> P.
>> >>>>
>> >>>>
>> >>>> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >
>
>

Re: newer Cassandra + Hadoop = TimedOutException()

Posted by Florent Lefillâtre <fl...@gmail.com>.
CFRR.getProgress() is called by the child mapper tasks on each TaskTracker
node, so the log message should appear in
${hadoop_log_dir}/attempt_201202081707_0001_m_000000_0/syslog (or
something like that) on the TaskTrackers, not in the client job logs.
Are you sure you're looking at the right log file? I ask because in your
first mail you linked the client job log.
And maybe you could log the size of each split in CFIF.




Le 6 mars 2012 13:09, Patrik Modesto <pa...@gmail.com> a écrit :

> I've added a debug message in CFRR.getProgress() and I can't find
> it in the debug output. It seems getProgress() has not been
> called at all.
>
> Regards,
> P.
>
> On Tue, Mar 6, 2012 at 09:49, Jeremy Hanna <je...@gmail.com>
> wrote:
> > you may be running into this -
> https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it
> really affects the execution of the job itself though.
> >
> > On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote:
> >
> >> Hi,
> >>
> >> I was recently trying a Hadoop job + cassandra-all 0.8.10 again and the
> >> timeouts I get are not because Cassandra can't handle the
> >> requests. I've noticed there are several tasks that show progress of
> >> several thousand percent. It seems they are looping over their range of
> >> keys. I've run the job with debug enabled and the ranges look ok, see
> >> http://pastebin.com/stVsFzLM
> >>
> >> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
> >> number of mappers the job creates:
> >> 0.8.7: 4680
> >> 0.8.10: 595
> >>
> >> Task       Complete
> >> task_201202281457_2027_m_000041       9076.81%
> >> task_201202281457_2027_m_000073       9639.04%
> >> task_201202281457_2027_m_000105       10538.60%
> >> task_201202281457_2027_m_000108       9364.17%
> >>
> >> None of this happens with cassandra-all 0.8.7.
> >>
> >> Regards,
> >> P.
> >>
> >>
> >>
> >> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto <pa...@gmail.com>
> wrote:
> >>> I'll alter these settings and will let you know.
> >>>
> >>> Regards,
> >>> P.
> >>>
> >>> On Tue, Feb 28, 2012 at 09:23, aaron morton <aa...@thelastpickle.com>
> wrote:
> >>>> Have you tried lowering the  batch size and increasing the time out?
> Even
> >>>> just to get it to work.
> >>>>
> >>>> If you get a TimedOutException it means CL number of servers did not
> respond
> >>>> in time.
> >>>>
> >>>> Cheers
> >>>>
> >>>> -----------------
> >>>> Aaron Morton
> >>>> Freelance Developer
> >>>> @aaronmorton
> >>>> http://www.thelastpickle.com
> >>>>
> >>>> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
> >>>>
> >>>> Hi aaron,
> >>>>
> >>>> this is our current settings:
> >>>>
> >>>>      <property>
> >>>>          <name>cassandra.range.batch.size</name>
> >>>>          <value>1024</value>
> >>>>      </property>
> >>>>
> >>>>      <property>
> >>>>          <name>cassandra.input.split.size</name>
> >>>>          <value>16384</value>
> >>>>      </property>
> >>>>
> >>>> rpc_timeout_in_ms: 30000
> >>>>
> >>>> Regards,
> >>>> P.
> >>>>
> >>>> On Mon, Feb 27, 2012 at 21:54, aaron morton <aa...@thelastpickle.com>
> wrote:
> >>>>
> >>>> What settings do you have for cassandra.range.batch.size
> >>>>
> >>>> and rpc_timeout_in_ms  ? Have you tried reducing the first and/or
> increasing
> >>>>
> >>>> the second ?
> >>>>
> >>>>
> >>>> Cheers
> >>>>
> >>>>
> >>>> -----------------
> >>>>
> >>>> Aaron Morton
> >>>>
> >>>> Freelance Developer
> >>>>
> >>>> @aaronmorton
> >>>>
> >>>> http://www.thelastpickle.com
> >>>>
> >>>>
> >>>> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
> >>>>
> >>>>
> >>>> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo <edlinuxguru@gmail.com
> >
> >>>>
> >>>> wrote:
> >>>>
> >>>>
> >>>> Did you see the notes here?
> >>>>
> >>>>
> >>>>
> >>>> I'm not sure what you mean by the notes.
> >>>>
> >>>>
> >>>> I'm using the mapred.* settings suggested there:
> >>>>
> >>>>
> >>>>     <property>
> >>>>
> >>>>         <name>mapred.max.tracker.failures</name>
> >>>>
> >>>>         <value>20</value>
> >>>>
> >>>>     </property>
> >>>>
> >>>>     <property>
> >>>>
> >>>>         <name>mapred.map.max.attempts</name>
> >>>>
> >>>>         <value>20</value>
> >>>>
> >>>>     </property>
> >>>>
> >>>>     <property>
> >>>>
> >>>>         <name>mapred.reduce.max.attempts</name>
> >>>>
> >>>>         <value>20</value>
> >>>>
> >>>>     </property>
> >>>>
> >>>>
> >>>> But I still see the timeouts that I didn't have with cassandra-all 0.8.7.
> >>>>
> >>>>
> >>>> P.
> >>>>
> >>>>
> >>>> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
> >>>>
> >>>>
> >>>>
> >>>>
> >
>

Re: newer Cassandra + Hadoop = TimedOutException()

Posted by Patrik Modesto <pa...@gmail.com>.
I've added a debug message in CFRR.getProgress() and I can't find
it in the debug output. It seems getProgress() has not been
called at all.
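For context, a hedged sketch (hypothetical, not the actual CFRR code) of how a record reader typically reports progress, and why a too-low row estimate for a split pushes the reported figure past 100% unless it is clamped:

```java
public class ProgressSketch {
    // Hypothetical progress computation: rows read so far divided by the
    // estimated rows in the split. If the estimate is too low (or the row
    // iterator never terminates), the raw ratio exceeds 1.0 -- i.e. the
    // thousands-of-percent figures seen in the task list. Clamping keeps
    // the report sane but does not fix the underlying iteration problem.
    static float progress(long rowsRead, long estimatedRows) {
        if (estimatedRows <= 0) {
            return 0f;
        }
        return Math.min(1.0f, (float) rowsRead / estimatedRows);
    }

    public static void main(String[] args) {
        // 150000 rows read against an estimate of 16384 would report ~915%
        // unclamped; here it is capped at 1.0.
        System.out.println(progress(150000L, 16384L));
    }
}
```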

Regards,
P.

On Tue, Mar 6, 2012 at 09:49, Jeremy Hanna <je...@gmail.com> wrote:
> you may be running into this - https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it really affects the execution of the job itself though.
>
> On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote:
>
>> Hi,
>>
>> I was recently trying a Hadoop job + cassandra-all 0.8.10 again and the
>> timeouts I get are not because Cassandra can't handle the
>> requests. I've noticed there are several tasks that show progress of
>> several thousand percent. It seems they are looping over their range of
>> keys. I've run the job with debug enabled and the ranges look ok, see
>> http://pastebin.com/stVsFzLM
>>
>> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
>> number of mappers the job creates:
>> 0.8.7: 4680
>> 0.8.10: 595
>>
>> Task       Complete
>> task_201202281457_2027_m_000041       9076.81%
>> task_201202281457_2027_m_000073       9639.04%
>> task_201202281457_2027_m_000105       10538.60%
>> task_201202281457_2027_m_000108       9364.17%
>>
>> None of this happens with cassandra-all 0.8.7.
>>
>> Regards,
>> P.
>>
>>
>>
>> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto <pa...@gmail.com> wrote:
>>> I'll alter these settings and will let you know.
>>>
>>> Regards,
>>> P.
>>>
>>> On Tue, Feb 28, 2012 at 09:23, aaron morton <aa...@thelastpickle.com> wrote:
>>>> Have you tried lowering the  batch size and increasing the time out? Even
>>>> just to get it to work.
>>>>
>>>> If you get a TimedOutException it means CL number of servers did not respond
>>>> in time.
>>>>
>>>> Cheers
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
>>>>
>>>> Hi aaron,
>>>>
>>>> this is our current settings:
>>>>
>>>>      <property>
>>>>          <name>cassandra.range.batch.size</name>
>>>>          <value>1024</value>
>>>>      </property>
>>>>
>>>>      <property>
>>>>          <name>cassandra.input.split.size</name>
>>>>          <value>16384</value>
>>>>      </property>
>>>>
>>>> rpc_timeout_in_ms: 30000
>>>>
>>>> Regards,
>>>> P.
>>>>
>>>> On Mon, Feb 27, 2012 at 21:54, aaron morton <aa...@thelastpickle.com> wrote:
>>>>
>>>> What settings do you have for cassandra.range.batch.size
>>>>
>>>> and rpc_timeout_in_ms  ? Have you tried reducing the first and/or increasing
>>>>
>>>> the second ?
>>>>
>>>>
>>>> Cheers
>>>>
>>>>
>>>> -----------------
>>>>
>>>> Aaron Morton
>>>>
>>>> Freelance Developer
>>>>
>>>> @aaronmorton
>>>>
>>>> http://www.thelastpickle.com
>>>>
>>>>
>>>> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
>>>>
>>>>
>>>> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo <ed...@gmail.com>
>>>>
>>>> wrote:
>>>>
>>>>
>>>> Did you see the notes here?
>>>>
>>>>
>>>>
>>>> I'm not sure what you mean by the notes.
>>>>
>>>>
>>>> I'm using the mapred.* settings suggested there:
>>>>
>>>>
>>>>     <property>
>>>>
>>>>         <name>mapred.max.tracker.failures</name>
>>>>
>>>>         <value>20</value>
>>>>
>>>>     </property>
>>>>
>>>>     <property>
>>>>
>>>>         <name>mapred.map.max.attempts</name>
>>>>
>>>>         <value>20</value>
>>>>
>>>>     </property>
>>>>
>>>>     <property>
>>>>
>>>>         <name>mapred.reduce.max.attempts</name>
>>>>
>>>>         <value>20</value>
>>>>
>>>>     </property>
>>>>
>>>>
>>>> But I still see the timeouts that I didn't have with cassandra-all 0.8.7.
>>>>
>>>>
>>>> P.
>>>>
>>>>
>>>> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
>>>>
>>>>
>>>>
>>>>
>

Re: newer Cassandra + Hadoop = TimedOutException()

Posted by Jeremy Hanna <je...@gmail.com>.
you may be running into this - https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it really affects the execution of the job itself though.

On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote:

> Hi,
> 
> I was recently trying a Hadoop job + cassandra-all 0.8.10 again and the
> timeouts I get are not because Cassandra can't handle the
> requests. I've noticed there are several tasks that show progress of
> several thousand percent. It seems they are looping over their range of
> keys. I've run the job with debug enabled and the ranges look ok, see
> http://pastebin.com/stVsFzLM
> 
> Another difference between cassandra-all 0.8.7 and 0.8.10 is the
> number of mappers the job creates:
> 0.8.7: 4680
> 0.8.10: 595
> 
> Task       Complete
> task_201202281457_2027_m_000041	9076.81%
> task_201202281457_2027_m_000073	9639.04%
> task_201202281457_2027_m_000105	10538.60%
> task_201202281457_2027_m_000108	9364.17%
> 
> None of this happens with cassandra-all 0.8.7.
> 
> Regards,
> P.
> 
> 
> 
> On Tue, Feb 28, 2012 at 12:29, Patrik Modesto <pa...@gmail.com> wrote:
>> I'll alter these settings and will let you know.
>> 
>> Regards,
>> P.
>> 
>> On Tue, Feb 28, 2012 at 09:23, aaron morton <aa...@thelastpickle.com> wrote:
>>> Have you tried lowering the  batch size and increasing the time out? Even
>>> just to get it to work.
>>> 
>>> If you get a TimedOutException it means CL number of servers did not respond
>>> in time.
>>> 
>>> Cheers
>>> 
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 28/02/2012, at 8:18 PM, Patrik Modesto wrote:
>>> 
>>> Hi aaron,
>>> 
>>> this is our current settings:
>>> 
>>>      <property>
>>>          <name>cassandra.range.batch.size</name>
>>>          <value>1024</value>
>>>      </property>
>>> 
>>>      <property>
>>>          <name>cassandra.input.split.size</name>
>>>          <value>16384</value>
>>>      </property>
>>> 
>>> rpc_timeout_in_ms: 30000
>>> 
>>> Regards,
>>> P.
>>> 
>>> On Mon, Feb 27, 2012 at 21:54, aaron morton <aa...@thelastpickle.com> wrote:
>>> 
>>> What settings do you have for cassandra.range.batch.size
>>> 
>>> and rpc_timeout_in_ms  ? Have you tried reducing the first and/or increasing
>>> 
>>> the second ?
>>> 
>>> 
>>> Cheers
>>> 
>>> 
>>> -----------------
>>> 
>>> Aaron Morton
>>> 
>>> Freelance Developer
>>> 
>>> @aaronmorton
>>> 
>>> http://www.thelastpickle.com
>>> 
>>> 
>>> On 27/02/2012, at 8:02 PM, Patrik Modesto wrote:
>>> 
>>> 
>>> On Sun, Feb 26, 2012 at 04:25, Edward Capriolo <ed...@gmail.com>
>>> 
>>> wrote:
>>> 
>>> 
>>> Did you see the notes here?
>>> 
>>> 
>>> 
>>> I'm not sure what you mean by the notes.
>>> 
>>> 
>>> I'm using the mapred.* settings suggested there:
>>> 
>>> 
>>>     <property>
>>> 
>>>         <name>mapred.max.tracker.failures</name>
>>> 
>>>         <value>20</value>
>>> 
>>>     </property>
>>> 
>>>     <property>
>>> 
>>>         <name>mapred.map.max.attempts</name>
>>> 
>>>         <value>20</value>
>>> 
>>>     </property>
>>> 
>>>     <property>
>>> 
>>>         <name>mapred.reduce.max.attempts</name>
>>> 
>>>         <value>20</value>
>>> 
>>>     </property>
>>> 
>>> 
>>> But I still see the timeouts that I didn't have with cassandra-all 0.8.7.
>>> 
>>> 
>>> P.
>>> 
>>> 
>>> http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
>>> 
>>> 
>>> 
>>>