Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2016/03/15 00:14:33 UTC
[jira] [Updated] (SPARK-13531) Some DataFrame joins stopped working with UnsupportedOperationException: No size estimation available for objects

     [ https://issues.apache.org/jira/browse/SPARK-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-13531:
-------------------------------------
    Target Version/s: 2.0.0

> Some DataFrame joins stopped working with UnsupportedOperationException: No size estimation available for objects
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13531
>                 URL: https://issues.apache.org/jira/browse/SPARK-13531
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: koert kuipers
>            Priority: Minor
>
> this is using spark 2.0.0-SNAPSHOT
> dataframe df1:
> schema:
> {noformat}StructType(StructField(x,IntegerType,true)){noformat}
> explain:
> {noformat}== Physical Plan ==
> MapPartitions <function1>, obj#135: object, [if (input[0, object].isNullAt) null else input[0, object].get AS x#128]
> +- MapPartitions <function1>, createexternalrow(if (isnull(x#9)) null else x#9), [input[0, object] AS obj#135]
>    +- WholeStageCodegen
>       :  +- Project [_1#8 AS x#9]
>       :     +- Scan ExistingRDD[_1#8]{noformat}
> show:
> {noformat}+---+
> |  x|
> +---+
> |  2|
> |  3|
> +---+{noformat}
> dataframe df2:
> schema:
> {noformat}StructType(StructField(x,IntegerType,true), StructField(y,StringType,true)){noformat}
> explain:
> {noformat}== Physical Plan ==
> MapPartitions <function1>, createexternalrow(x#2, if (isnull(y#3)) null else y#3.toString), [if (input[0, object].isNullAt) null else input[0, object].get AS x#130,if (input[0, object].isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, object].get, true) AS y#131]
> +- WholeStageCodegen
>    :  +- Project [_1#0 AS x#2,_2#1 AS y#3]
>    :     +- Scan ExistingRDD[_1#0,_2#1]{noformat}
> show:
> {noformat}+---+---+
> |  x|  y|
> +---+---+
> |  1|  1|
> |  2|  2|
> |  3|  3|
> +---+---+{noformat}
> i run:
> {noformat}df1.join(df2, Seq("x")).show{noformat}
> i get:
> {noformat}java.lang.UnsupportedOperationException: No size estimation available for objects.
> at org.apache.spark.sql.types.ObjectType.defaultSize(ObjectType.scala:41)
> at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
> at org.apache.spark.sql.catalyst.plans.logical.UnaryNode$$anonfun$6.apply(LogicalPlan.scala:323)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
> at scala.collection.immutable.List.map(List.scala:285)
> at org.apache.spark.sql.catalyst.plans.logical.UnaryNode.statistics(LogicalPlan.scala:323)
> at org.apache.spark.sql.execution.SparkStrategies$CanBroadcast$.unapply(SparkStrategies.scala:87){noformat}
> not sure what changed; this ran about a week ago without issues (in our internal unit tests). it is fully reproducible. however, when i tried to minimize the issue i could not reproduce it by just creating data frames in the repl with the same contents, so it probably has something to do with the way these are created (from Row objects and StructTypes).
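
For reference, a minimal sketch of the construction path the description mentions (DataFrames built from Row objects and StructTypes, then joined on "x"). This is an assumption about how the frames were created, not the reporter's actual code, and it is not confirmed to trigger the exception: the plans above contain MapPartitions nodes with an object-typed attribute (obj#135) that this sketch alone does not introduce. The object name JoinRepro, the helper fromRows, the local[1] master and the app name are all hypothetical. It uses SparkSession from released 2.x; on an older build, SQLContext.createDataFrame(rowRDD, schema) is the equivalent call.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical reproduction sketch; names and session settings are assumptions.
object JoinRepro {
  // Local session for illustration only.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("SPARK-13531-sketch")
    .getOrCreate()

  // Build a DataFrame from explicit Row objects plus a StructType schema.
  private def fromRows(rows: Seq[Row], schema: StructType): DataFrame =
    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

  // Same schema and contents as df1 in the description.
  lazy val df1: DataFrame = fromRows(
    Seq(Row(2), Row(3)),
    StructType(Seq(StructField("x", IntegerType, nullable = true))))

  // Same schema and contents as df2 in the description.
  lazy val df2: DataFrame = fromRows(
    Seq(Row(1, "1"), Row(2, "2"), Row(3, "3")),
    StructType(Seq(
      StructField("x", IntegerType, nullable = true),
      StructField("y", StringType, nullable = true))))

  // The join from the description, using "x" as the join column.
  lazy val joined: DataFrame = df1.join(df2, Seq("x"))

  def main(args: Array[String]): Unit = {
    joined.explain() // compare against the physical plans quoted above
    joined.show()
    spark.stop()
  }
}
```

If the plan printed by explain() does not contain the MapPartitions / createexternalrow nodes shown in the description, the frames in the failing tests were likely produced by an additional step that this sketch does not cover.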


