Posted to commits@cassandra.apache.org by "Jeremy Hanna (JIRA)" <ji...@apache.org> on 2011/07/17 02:07:59 UTC
[jira] [Created] (CASSANDRA-2905) Add retry logic to ColumnFamilyRecordReader
Add retry logic to ColumnFamilyRecordReader
-------------------------------------------
Key: CASSANDRA-2905
URL: https://issues.apache.org/jira/browse/CASSANDRA-2905
Project: Cassandra
Issue Type: Improvement
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
The built-in ColumnFamilyRecordReader would benefit from retry logic for the case where hasNext times out. Used in addition to setting rpc_timeout_in_ms, retries would let timeouts surface as task failures less often, leaving fewer blacklisted task trackers (blacklisting is the result of task errors, including this timeout).
{code}
java.lang.RuntimeException: TimedOutException()
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:264)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:279)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:176)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:135)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: TimedOutException()
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:242)
    ... 17 more
{code}
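The retry idea in this description could be sketched roughly as below. This is only an illustration, not the eventual patch: SketchTimedOutException is a local stand-in for the Thrift TimedOutException seen in the trace, and RowFetch, RetrySketch and callWithRetry are hypothetical names that do not exist in Cassandra.

```java
// Hypothetical stand-in for org.apache.cassandra.thrift.TimedOutException,
// declared locally so the sketch compiles without Cassandra on the classpath.
class SketchTimedOutException extends Exception {
    private static final long serialVersionUID = 1L;
}

// Hypothetical callback representing one fetch of the next batch of rows.
interface RowFetch<T> {
    T fetch() throws SketchTimedOutException;
}

final class RetrySketch {
    private RetrySketch() {}

    // Makes at most (1 + maxRetries) attempts, retrying only on a timeout.
    // Once the attempts are used up, the timeout is rethrown wrapped in a
    // RuntimeException, matching the shape of the trace in the report.
    static <T> T callWithRetry(RowFetch<T> fetch, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        int retriesUsed = 0;
        while (true) {
            try {
                return fetch.fetch();
            } catch (SketchTimedOutException e) {
                if (retriesUsed >= maxRetries) {
                    throw new RuntimeException(e);
                }
                retriesUsed++;
            }
        }
    }
}
```

With maxRetries set to 0 the first timeout is fatal to the task, as it is in the trace; any larger value absorbs that many timeouts before the task fails.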
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2905) Add retry logic to ColumnFamilyRecordReader
Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeremy Hanna updated CASSANDRA-2905:
------------------------------------
Priority: Minor (was: Major)
> Add retry logic to ColumnFamilyRecordReader
> -------------------------------------------
>
> Key: CASSANDRA-2905
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2905
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jeremy Hanna
> Assignee: Jeremy Hanna
> Priority: Minor
> Labels: hadoop
[jira] [Commented] (CASSANDRA-2905) Add retry logic to ColumnFamilyRecordReader
Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067315#comment-13067315 ]
Brandon Williams commented on CASSANDRA-2905:
---------------------------------------------
Since we don't know how many times we should retry, or how long to wait between retries, we should expose these tunables to the job configuration.
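A minimal sketch of what exposing these two tunables might look like. A plain Map stands in for the job configuration so the example is self-contained; in a real job, Hadoop's Configuration.getInt(name, defaultValue) and getLong(name, defaultValue) would play this role. Both property names and the RetrySettings class are hypothetical, and the zero defaults are an assumption.

```java
import java.util.Map;

// Sketch of reading retry tunables from job configuration.
// Both property names below are hypothetical, not existing options.
final class RetrySettings {
    static final String MAX_RETRIES_KEY = "example.input.retry.max";
    static final String RETRY_WAIT_MS_KEY = "example.input.retry.wait.ms";

    final int maxRetries;
    final long retryWaitMs;

    RetrySettings(int maxRetries, long retryWaitMs) {
        if (maxRetries < 0 || retryWaitMs < 0) {
            throw new IllegalArgumentException("retry settings must be non-negative");
        }
        this.maxRetries = maxRetries;
        this.retryWaitMs = retryWaitMs;
    }

    // Absent keys fall back to zero retries and zero wait (assumed defaults).
    static RetrySettings fromJobConf(Map<String, String> conf) {
        int maxRetries = Integer.parseInt(conf.getOrDefault(MAX_RETRIES_KEY, "0"));
        long retryWaitMs = Long.parseLong(conf.getOrDefault(RETRY_WAIT_MS_KEY, "0"));
        return new RetrySettings(maxRetries, retryWaitMs);
    }
}
```

Keeping both values in the job configuration lets each job pick its own trade-off between waiting longer and failing fast, without changes to the reader.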
[jira] [Commented] (CASSANDRA-2905) Add retry logic to ColumnFamilyRecordReader
Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067323#comment-13067323 ]
Jeremy Hanna commented on CASSANDRA-2905:
-----------------------------------------
Good point. I'll put that in there with the default just being current behavior to begin with.
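One way to read "the default just being current behavior" is sketched below: with zero retries and zero wait, the loop makes exactly one attempt and never sleeps, so a timeout fails the task as it does today. SketchTimeout, DefaultPreservingRetry and the injected sleeper are hypothetical names used only for illustration; a real reader would presumably sleep with Thread.sleep.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical unchecked stand-in for a timed-out read.
class SketchTimeout extends RuntimeException {
    private static final long serialVersionUID = 1L;
}

final class DefaultPreservingRetry {
    // Assumed defaults: one attempt, no waiting.
    static final int DEFAULT_MAX_RETRIES = 0;
    static final long DEFAULT_WAIT_MS = 0L;

    private DefaultPreservingRetry() {}

    // Runs the attempt, waiting waitMs before each retry. The sleeper is
    // passed in so callers (and tests) decide how the wait is performed.
    static <T> T run(Supplier<T> attempt, int maxRetries, long waitMs, LongConsumer sleeper) {
        if (maxRetries < 0 || waitMs < 0) {
            throw new IllegalArgumentException("retry settings must be non-negative");
        }
        SketchTimeout last = null;
        for (int i = 0; i <= maxRetries; i++) {
            if (i > 0 && waitMs > 0) {
                sleeper.accept(waitMs);
            }
            try {
                return attempt.get();
            } catch (SketchTimeout e) {
                last = e; // remember the failure and retry if attempts remain
            }
        }
        throw last; // non-null here: the loop body ran at least once
    }
}
```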
[jira] [Updated] (CASSANDRA-2905) Add retry logic to ColumnFamilyRecordReader
Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeremy Hanna updated CASSANDRA-2905:
------------------------------------
Description:
The built-in ColumnFamilyRecordReader would benefit from retry logic for the case where hasNext times out. Used in addition to setting rpc_timeout_in_ms, retries would let timeouts surface as task failures less often, leaving fewer blacklisted task trackers (blacklisting is the result of task errors, including this timeout).
{quote}
java.lang.RuntimeException: TimedOutException()
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:264)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:279)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:176)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:135)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: TimedOutException()
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:242)
    ... 17 more
{quote}
was:
The built-in ColumnFamilyRecordReader would benefit from retry logic for the case where hasNext times out. Used in addition to setting rpc_timeout_in_ms, retries would let timeouts surface as task failures less often, leaving fewer blacklisted task trackers (blacklisting is the result of task errors, including this timeout).
{code}
java.lang.RuntimeException: TimedOutException()
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:264)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:279)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:176)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:135)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: TimedOutException()
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:242)
    ... 17 more
{code}