Posted to user@spark.apache.org by Alex Nastetsky <al...@verve.com> on 2019/02/13 22:32:31 UTC
"where" clause able to access fields not in its schema
I don't know if this is a bug or a feature, but it's a bit counter-intuitive when reading code.
The "b" DataFrame does not have the field "bar" in its schema, but it can still filter on that field.
scala> val a = sc.parallelize(Seq((1,10),(2,20))).toDF("foo","bar")
a: org.apache.spark.sql.DataFrame = [foo: int, bar: int]
scala> a.show
+---+---+
|foo|bar|
+---+---+
| 1| 10|
| 2| 20|
+---+---+
scala> val b = a.select($"foo")
b: org.apache.spark.sql.DataFrame = [foo: int]
scala> b.schema
res3: org.apache.spark.sql.types.StructType = StructType(StructField(foo,IntegerType,false))
scala> b.select($"bar").show
org.apache.spark.sql.AnalysisException: cannot resolve '`bar`' given input columns: [foo];;
[...snip...]
scala> b.where($"bar" === 20).show
+---+
|foo|
+---+
| 2|
+---+
Re: "where" clause able to access fields not in its schema
Posted by Yeikel <em...@yeikel.com>.
It seems that we are using the function incorrectly. If the column is referenced through the DataFrame itself with b("bar"), instead of the unbound $"bar", the expected error is raised:
val a = Seq((1,10),(2,20)).toDF("foo","bar")
val b = a.select($"foo")
val c = b.where(b("bar") === 20)
c.show
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot resolve column name "bar" among (foo);
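A minimal sketch of the contrast (assuming Spark 2.x+ on the classpath and a local SparkSession; the object and app names here are illustrative, not from the thread):

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

object WhereResolution {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("where-resolution").getOrCreate()
    import spark.implicits._

    val a = Seq((1, 10), (2, 20)).toDF("foo", "bar")
    val b = a.select($"foo")

    // Unbound $"bar": resolved by the analyzer, which can reach through
    // b's projection to the child plan, so this succeeds
    b.where($"bar" === 20).show()

    // b("bar"): resolved eagerly against b's own schema [foo], so it
    // fails up front with an AnalysisException
    try {
      b.where(b("bar") === 20).show()
    } catch {
      case e: AnalysisException => println(s"As expected: ${e.getMessage}")
    }

    spark.stop()
  }
}
```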
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: "where" clause able to access fields not in its schema
Posted by Vadim Semenov <va...@datadoghq.com>.
Yeah, the filter gets placed in front of the select during analysis: the analyzer resolves the missing `bar` attribute by widening the child projection to include it, applying the Filter there, and then re-projecting only `foo`.
scala> b.where($"bar" === 20).explain(true)
== Parsed Logical Plan ==
'Filter ('bar = 20)
+- AnalysisBarrier
      +- Project [foo#6]
         +- Project [_1#3 AS foo#6, _2#4 AS bar#7]
            +- SerializeFromObject [assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#3, assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2 AS _2#4]
               +- ExternalRDD [obj#2]

== Analyzed Logical Plan ==
foo: int
Project [foo#6]
+- Filter (bar#7 = 20)
   +- Project [foo#6, bar#7]
      +- Project [_1#3 AS foo#6, _2#4 AS bar#7]
         +- SerializeFromObject [assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#3, assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2 AS _2#4]
            +- ExternalRDD [obj#2]

== Optimized Logical Plan ==
Project [_1#3 AS foo#6]
+- Filter (_2#4 = 20)
   +- SerializeFromObject [assertnotnull(input[0, scala.Tuple2, true])._1 AS _1#3, assertnotnull(input[0, scala.Tuple2, true])._2 AS _2#4]
      +- ExternalRDD [obj#2]

== Physical Plan ==
*(1) Project [_1#3 AS foo#6]
+- *(1) Filter (_2#4 = 20)
   +- *(1) SerializeFromObject [assertnotnull(input[0, scala.Tuple2, true])._1 AS _1#3, assertnotnull(input[0, scala.Tuple2, true])._2 AS _2#4]
      +- Scan ExternalRDDScan[obj#2]
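Reading the analyzed plan above, b.where($"bar" === 20) ends up equivalent to filtering before projecting. A sketch (same local-session assumption; object name illustrative) that writes the ordering out explicitly and checks both forms agree:

```scala
import org.apache.spark.sql.SparkSession

object FilterOrdering {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("filter-ordering").getOrCreate()
    import spark.implicits._

    val a = Seq((1, 10), (2, 20)).toDF("foo", "bar")

    // The explicit ordering the analyzer effectively rewrites to: filter, then project
    val explicit = a.where($"bar" === 20).select($"foo")

    // The surprising form from the thread: project away "bar", then filter on it
    val surprising = a.select($"foo").where($"bar" === 20)

    // Both yield the same rows; compare the optimized plans with explain(true)
    assert(explicit.collect().sameElements(surprising.collect()))

    spark.stop()
  }
}
```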
On Wed, Feb 13, 2019 at 8:04 PM Yeikel <em...@yeikel.com> wrote:
> This is indeed strange. To add to the question: I can see that if I use a
> filter with a lambda I get an exception (as expected), so I am not sure what
> the difference is between the where clause and filter:
>
> b.filter(s => {
>   val bar: String = s.getAs("bar")
>   bar.equals("20")
> }).show
>
> java.lang.IllegalArgumentException: Field "bar" does not exist.
Re: "where" clause able to access fields not in its schema
Posted by Yeikel <em...@yeikel.com>.
This is indeed strange. To add to the question: I can see that if I use a
filter with a lambda I get an exception (as expected), so I am not sure what
the difference is between the where clause and filter:
b.filter(s => {
  val bar: String = s.getAs("bar")
  bar.equals("20")
}).show
java.lang.IllegalArgumentException: Field "bar" does not exist.
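The difference comes down to where resolution happens: where (or filter) with a Column expression is resolved by the analyzer, which can reach through to the child plan, while filter with a Scala lambda runs against each Row after the projection, when "bar" is already gone. A sketch of the contrast (local-session assumption; object name illustrative):

```scala
import org.apache.spark.sql.SparkSession

object LambdaVsColumn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("lambda-vs-column").getOrCreate()
    import spark.implicits._

    val b = Seq((1, 10), (2, 20)).toDF("foo", "bar").select($"foo")

    // Column expression: analyzed exactly like where, resolves "bar"
    // through the child plan, so this succeeds
    b.filter($"bar" === 20).show()

    // Lambda: evaluated per projected Row, whose schema is only [foo],
    // so the getAs lookup fails at runtime
    try {
      b.filter(r => r.getAs[Int]("bar") == 20).show()
    } catch {
      case e: Exception => println(s"As expected: ${e.getMessage}")
    }

    spark.stop()
  }
}
```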