Posted to user@spark.apache.org by innowireless TaeYun Kim <ta...@innowireless.co.kr> on 2014/07/25 14:34:17 UTC

Strange exception on coalesce()

Hi,
I'm using Spark 1.0.0.

On a filter() - map() - coalesce() - saveAsTextFile() sequence, the following
exception is thrown.

Exception in thread "main" java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:313)
    at scala.None$.get(Option.scala:311)
    at org.apache.spark.rdd.PartitionCoalescer.setupGroups(CoalescedRDD.scala:270)
    at org.apache.spark.rdd.PartitionCoalescer.run(CoalescedRDD.scala:337)
    at org.apache.spark.rdd.CoalescedRDD.getPartitions(CoalescedRDD.scala:83)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1086)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:788)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:674)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:593)
    at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1068)
    at org.apache.spark.api.java.JavaRDDLike$class.saveAsTextFile(JavaRDDLike.scala:436)
    at org.apache.spark.api.java.JavaRDD.saveAsTextFile(JavaRDD.scala:29)

The partition count of the original RDD is 306.

When the argument of coalesce() is one of 59, 60, 61, 62, or 63, the exception
above is thrown.

But when the argument is one of 50, 55, 58, 64, 65, 80, or 100, the exception is
not thrown. (I haven't tried other values; I expect they would be fine.)
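
For reference, the sequence is essentially the following. This is a minimal
Java sketch, not my actual job: the class name, the input/output paths, and
the filter/map bodies are placeholders.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class CoalesceRepro {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("coalesce-repro");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Input that yields 306 partitions in my case (path is a placeholder).
        JavaRDD<String> lines = sc.textFile("hdfs:///path/to/input");

        JavaRDD<String> result = lines
            // Placeholder predicate; the real filter logic doesn't matter here.
            .filter(new Function<String, Boolean>() {
                public Boolean call(String s) { return !s.isEmpty(); }
            })
            // Placeholder mapping; the real transform doesn't matter either.
            .map(new Function<String, String>() {
                public String call(String s) { return s.trim(); }
            })
            // 59..63 throw the exception; 50, 55, 58, 64, 65, 80, 100 do not.
            .coalesce(60);

        result.saveAsTextFile("hdfs:///path/to/output");
        sc.stop();
    }
}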

Is there a magic number for the argument of coalesce()?

Thanks.



RE: Strange exception on coalesce()

Posted by innowireless TaeYun Kim <ta...@innowireless.co.kr>.
Thank you. It works.
(I've applied the change from that commit to my local 1.0.0 source tree.)

-----Original Message-----
From: Sean Owen [mailto:sowen@cloudera.com] 
Sent: Friday, July 25, 2014 11:47 PM
To: user@spark.apache.org
Subject: Re: Strange exception on coalesce()

I'm pretty sure this was already fixed last week in SPARK-2414:
https://github.com/apache/spark/commit/7c23c0dc3ed721c95690fc49f435d9de6952523c

On Fri, Jul 25, 2014 at 1:34 PM, innowireless TaeYun Kim <ta...@innowireless.co.kr> wrote:
> Hi,
> I'm using Spark 1.0.0.
>
> On a filter() - map() - coalesce() - saveAsTextFile() sequence, the
> following exception is thrown.
> [...]

Re: Strange exception on coalesce()

Posted by Sean Owen <so...@cloudera.com>.
I'm pretty sure this was already fixed last week in SPARK-2414:
https://github.com/apache/spark/commit/7c23c0dc3ed721c95690fc49f435d9de6952523c
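
(A possible workaround for anyone who can't patch or upgrade, untested
against this exact case: coalesce(n, true) inserts a shuffle, and the
shuffled parent reports no preferred locations, so the locality-aware
grouping in PartitionCoalescer.setupGroups, where the None.get above
originates, should be skipped. A sketch:

import org.apache.spark.api.java.JavaRDD;

class CoalesceWorkaround {
    // Untested workaround sketch: shuffle = true makes coalesce() insert a
    // shuffle stage; the shuffled parent has no preferred locations, so
    // PartitionCoalescer's locality-aware grouping (the source of the
    // None.get) is bypassed. Costs a full shuffle of the data.
    static <T> JavaRDD<T> coalesceWithShuffle(JavaRDD<T> rdd, int numPartitions) {
        return rdd.coalesce(numPartitions, true);
    }
}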

On Fri, Jul 25, 2014 at 1:34 PM, innowireless TaeYun Kim
<ta...@innowireless.co.kr> wrote:
> Hi,
> I'm using Spark 1.0.0.
>
> On a filter() - map() - coalesce() - saveAsTextFile() sequence, the following
> exception is thrown.
> [...]
