Posted to issues@spark.apache.org by "peay (JIRA)" <ji...@apache.org> on 2017/07/27 17:00:09 UTC

[jira] [Created] (SPARK-21550) approxQuantiles throws "next on empty iterator" on empty data

peay created SPARK-21550:
----------------------------

             Summary: approxQuantiles throws "next on empty iterator" on empty data
                 Key: SPARK-21550
                 URL: https://issues.apache.org/jira/browse/SPARK-21550
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: peay


The documentation for {{approxQuantile}} says:
{code}
null and NaN values will be removed from the numerical column before calculation. If
the dataframe is empty or the column only contains null or NaN, an empty array is returned.
{code}
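For reference, the documented contract can be written as a small pure-Python sketch. The function name {{documented_quantiles}} is hypothetical. It computes exact nearest-rank quantiles on a local list instead of using Spark's approximate {{QuantileSummaries}}, because only the null/NaN filtering and the empty-result behavior matter here.

```python
import math


def documented_quantiles(values, probabilities):
    """Hypothetical reference for the documented contract (not Spark's algorithm).

    Drops None/NaN, returns [] if nothing is left, otherwise returns exact
    nearest-rank quantiles as floats.
    """
    # Remove null (None) and NaN entries, as the documentation describes.
    cleaned = sorted(v for v in values if v is not None and not math.isnan(v))
    # Empty input, or input containing only null/NaN, yields an empty list.
    if not cleaned:
        return []
    n = len(cleaned)
    result = []
    for p in probabilities:
        # Nearest-rank quantile: the smallest value whose 1-based rank is >= p * n.
        rank = max(1, math.ceil(p * n))
        result.append(float(cleaned[rank - 1]))
    return result
```

Under this contract, an empty input such as the filtered DataFrame in the example below should produce an empty list rather than an exception.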

However, this small PySpark example, in which the filter removes every row so that the DataFrame is empty,
{code}
from pyspark.sql.functions import col
sql_context.range(10).filter(col("id") == 42).approxQuantile("id", [0.99], 0.001)
{code}

throws the following instead of returning an empty array:

{code}
Py4JJavaError: An error occurred while calling o3493.approxQuantile.
: java.util.NoSuchElementException: next on empty iterator
	at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
	at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
	at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
	at scala.collection.IterableLike$class.head(IterableLike.scala:107)
	at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$head(ArrayOps.scala:186)
	at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOptimized.scala:126)
	at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.last(TraversableLike.scala:431)
	at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$last(ArrayOps.scala:186)
	at scala.collection.IndexedSeqOptimized$class.last(IndexedSeqOptimized.scala:132)
	at scala.collection.mutable.ArrayOps$ofRef.last(ArrayOps.scala:186)
	at org.apache.spark.sql.catalyst.util.QuantileSummaries.query(QuantileSummaries.scala:207)
	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply$mcDD$sp(StatFunctions.scala:92)
	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply(StatFunctions.scala:92)
	at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$multipleApproxQuantiles$1$$anonfun$apply$1.apply(StatFunctions.scala:92)
{code}
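Judging from the trace, the exception comes from {{QuantileSummaries.query}} calling {{last}} on an empty array. Until this is fixed, one possible caller-side workaround is to check for empty input before calling {{approxQuantile}}. This is a minimal sketch, not a fix in Spark itself: {{safe_approx_quantile}} is a hypothetical helper that relies only on the PySpark {{DataFrame}} methods {{dropna}}, {{head}} and {{approxQuantile}}.

```python
def safe_approx_quantile(df, col_name, probabilities, relative_error):
    """Hypothetical workaround helper (not part of PySpark).

    `df` is expected to be a pyspark.sql.DataFrame. Returns [] when the column
    has no non-missing values, instead of letting approxQuantile fail with
    "next on empty iterator".
    """
    # Drop rows whose target column is missing, mirroring the documented filtering.
    cleaned = df.dropna(subset=[col_name])
    # head(1) returns an empty list when no rows are left.
    if not cleaned.head(1):
        return []
    # Non-empty input: delegate to the real approxQuantile.
    return cleaned.approxQuantile(col_name, probabilities, relative_error)
```

For the example above, {{safe_approx_quantile(sql_context.range(10).filter(col("id") == 42), "id", [0.99], 0.001)}} would return {{[]}}. The extra {{head(1)}} call launches a small Spark job but avoids counting the whole column just to detect emptiness.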



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)