Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/11/09 05:30:00 UTC

[jira] [Comment Edited] (SPARK-25976) Allow rdd.reduce on empty rdd by returning an Option[T]

    [ https://issues.apache.org/jira/browse/SPARK-25976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680889#comment-16680889 ] 

Hyukjin Kwon edited comment on SPARK-25976 at 11/9/18 5:29 AM:
---------------------------------------------------------------

Can you describe the expected input and output?

Scala itself does not allow this:

{code}
scala> Seq[Int]().reduce(_ + _)
java.lang.UnsupportedOperationException: empty.reduceLeft
  at scala.collection.LinearSeqOptimized$class.reduceLeft(LinearSeqOptimized.scala:137)
  at scala.collection.immutable.List.reduceLeft(List.scala:84)
  at scala.collection.TraversableOnce$class.reduce(TraversableOnce.scala:208)
  at scala.collection.AbstractTraversable.reduce(Traversable.scala:104)
  ... 49 elided
{code}

I don't think we should fix this.
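For what it's worth, Scala's collections already expose {{reduceOption}} for exactly this case, so a caller who wants an {{Option[T]}} back can get one today without any API change (shown here on plain collections, not on an RDD):

{code}
scala> Seq[Int]().reduceOption(_ + _)
res0: Option[Int] = None

scala> Seq(1, 2, 3).reduceOption(_ + _)
res1: Option[Int] = Some(6)
{code}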


was (Author: hyukjin.kwon):
Can you describe the expected input and output?

Scala itself does not allow this:

{code}
scala> Seq().reduce(_ + _)
<console>:12: error: missing parameter type for expanded function ((x$1: <error>, x$2) => x$1.$plus(x$2))
       Seq().reduce(_ + _)
                    ^
<console>:12: error: missing parameter type for expanded function ((x$1: <error>, x$2: <error>) => x$1.$plus(x$2))
       Seq().reduce(_ + _)
                        ^
{code}

I don't think we should fix this.

> Allow rdd.reduce on empty rdd by returning an Option[T]
> -------------------------------------------------------
>
>                 Key: SPARK-25976
>                 URL: https://issues.apache.org/jira/browse/SPARK-25976
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.2
>            Reporter: Yuval Yaari
>            Priority: Minor
>
> It is sometimes useful to let the user decide what value to return when reducing over an empty RDD.
> Currently, if there is no data to reduce, an UnsupportedOperationException is thrown.
> Although the user can catch that exception, that seems like a shaky solution, since an UnsupportedOperationException might be thrown from a different location.
> Instead, we can overload the reduce method by adding a new method:
> reduce(f: (T, T) => T, defaultIfEmpty: () => T): T
> The existing reduce API will not be affected, as it will simply delegate to the new method with a default that throws an UnsupportedOperationException.
>  
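The proposed overload could be sketched as follows; this is only an illustration of the signature from the description above, using a plain Seq to stand in for an RDD (it is not Spark API):

{code}
// Sketch of the proposed overload; Seq stands in for RDD here.
def reduce[T](data: Seq[T])(f: (T, T) => T, defaultIfEmpty: () => T): T =
  if (data.isEmpty) defaultIfEmpty() else data.reduce(f)

reduce(Seq(1, 2, 3))(_ + _, () => 0)   // 6
reduce(Seq.empty[Int])(_ + _, () => 0) // 0
{code}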



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
