Posted to user@spark.apache.org by ericjohnston1989 <er...@gmail.com> on 2014/04/18 23:32:35 UTC

Calliope Frame size larger than max length

Hey all,

I'm working with Calliope to run jobs on a Cassandra cluster in standalone
mode. On some larger jobs I run into the following error:

java.lang.RuntimeException: Frame size (20667866) larger than max length (15728640)!
	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:665)
	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:322)
	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:289)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.nextKeyValue(CqlPagingRecordReader.java:205)
	at com.tuplejump.calliope.cql3.Cql3CassandraRDD$$anon$1.hasNext(Cql3CassandraRDD.scala:73)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:724)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:720)
	at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
	at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
	at org.apache.spark.scheduler.Task.run(Task.scala:53)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:46)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:45)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)


The max frame size (15728640) is 15 MB, which is the default frame size
Cassandra uses. Has anyone seen this before? Are there common workarounds?
I'd much rather not poke around changing Cassandra settings, but I can
change Spark settings as much as I like.

My program itself is extremely simple since I'm testing. I'm just calling
count() on the RDD I created with CasBuilder, roughly as sketched below.
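
For reference, the test program is essentially the following minimal
sketch. The SparkContext setup, the keyspace/table names, and the Calliope
builder and RDD calls are illustrative (following Calliope's documented
CQL3 style) and may not match my exact code:

import java.nio.ByteBuffer
import org.apache.spark.SparkContext
import com.tuplejump.calliope.CasBuilder
import com.tuplejump.calliope.Implicits._  // adds cql3Cassandra to SparkContext

object FrameSizeTest {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://master:7077", "calliope-count-test")

    // Point Calliope at a CQL3 column family ("test_ks"/"test_cf" are placeholders).
    val cas = CasBuilder.cql3.withColumnFamily("test_ks", "test_cf")

    // Rows come back as (key columns, value columns) maps of raw bytes.
    val rdd = sc.cql3Cassandra[Map[String, ByteBuffer], Map[String, ByteBuffer]](cas)

    // count() forces a full scan, which is where the frame-size error appears.
    println(rdd.count())
  }
}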

Thanks,

Eric

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Calliope-Frame-size-larger-than-max-length-tp4469.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Calliope Frame size larger than max length

Posted by Rohit Rai <ro...@tuplejump.com>.
Hello Eric,

This happens when the data fetched from Cassandra for a single split is
larger than the maximum frame size allowed by Thrift (yes, it still uses
Thrift underneath, until the next release, when we will switch to the
native CQL protocol).

Generally, when using Cassandra with Spark/Hadoop, we set the Cassandra
frame size to 32 MB or larger, depending on the data model and row sizes.
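
For reference, the setting in question in this generation of Cassandra is
thrift_framed_transport_size_in_mb in cassandra.yaml; its default of 15
corresponds to the 15728640-byte limit in the error. If you do decide to
touch the server config, a minimal change would look like:

# cassandra.yaml -- raise the Thrift frame limit for Spark/Hadoop reads
thrift_framed_transport_size_in_mb: 32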

If you don't want to touch the Cassandra configuration, you will have to
reduce the page size in use instead; the default is 1000 CQL rows per page.
Going by the sizes in the error message, the failed fetch of 1000 rows came
to 20667866 bytes against a 15728640-byte limit, so roughly 760 rows per
page would just fit; I would suggest setting the page size to 700 or less
to leave headroom for row-size variance.

This can be done with the pageSize method on the CasBuilder:

cqlCas.pageSize(700)
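
In context, that looks roughly like the following sketch (as in the sketch
above, the keyspace/table names and the chained builder calls are
illustrative and may differ by Calliope version):

// Cap each fetch at 700 CQL rows so a page stays under the 15 MB Thrift frame.
val cqlCas = CasBuilder.cql3
  .withColumnFamily("test_ks", "test_cf")  // placeholder keyspace/table
  .pageSize(700)

val rdd = sc.cql3Cassandra[Map[String, ByteBuffer], Map[String, ByteBuffer]](cqlCas)
println(rdd.count())  // the same count() that previously hit the frame limit

The trade-off is more round trips to Cassandra per split, so there is no
benefit in setting the page size lower than the frame limit requires.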


Cheers,
Rohit



*Founder & CEO, **Tuplejump, Inc.*
____________________________
www.tuplejump.com
*The Data Engineering Platform*

