You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Alex Nastetsky <al...@verve.com> on 2019/02/13 22:32:31 UTC

"where" clause able to access fields not in its schema

I don't know if this is a bug or a feature, but it's a bit counter-intuitive when reading code.

The "b" dataframe does not have field "bar" in its schema, but is still able to filter on that field.

scala> val a = sc.parallelize(Seq((1,10),(2,20))).toDF("foo","bar")
a: org.apache.spark.sql.DataFrame = [foo: int, bar: int]

scala> a.show
+---+---+
|foo|bar|
+---+---+
|  1| 10|
|  2| 20|
+---+---+

scala> val b = a.select($"foo")
b: org.apache.spark.sql.DataFrame = [foo: int]

scala> b.schema
res3: org.apache.spark.sql.types.StructType = StructType(StructField(foo,IntegerType,false))

scala> b.select($"bar").show
org.apache.spark.sql.AnalysisException: cannot resolve '`bar`' given input columns: [foo];;
[...snip...]

scala> b.where($"bar" === 20).show
+---+
|foo|
+---+
|  2|
+---+


Re: "where" clause able to access fields not in its schema

Posted by Yeikel <em...@yeikel.com>.
It seems that we are using the function incorrectly. 


    val a = Seq((1,10),(2,20)).toDF("foo","bar")

    val b = a.select($"foo")

    val c = b.where(b("bar") === 20)

    c.show


  Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot
resolve column name "bar" among   (foo);






--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: "where" clause able to access fields not in its schema

Posted by Vadim Semenov <va...@datadoghq.com>.
Yeah, the filter gets infront of the select after analyzing

scala> b.where($"bar" === 20).explain(true)
== Parsed Logical Plan ==
'Filter ('bar = 20)
+- AnalysisBarrier
      +- Project [foo#6]
         +- Project [_1#3 AS foo#6, _2#4 AS bar#7]
            +- SerializeFromObject [assertnotnull(assertnotnull(input[0,
scala.Tuple2, true]))._1 AS _1#3, assertnotnull(assertnotnull(input[0,
scala.Tuple2, true]))._2 AS _2#4]
               +- ExternalRDD [obj#2]

== Analyzed Logical Plan ==
foo: int
Project [foo#6]
+- Filter (bar#7 = 20)
   +- Project [foo#6, bar#7]
      +- Project [_1#3 AS foo#6, _2#4 AS bar#7]
         +- SerializeFromObject [assertnotnull(assertnotnull(input[0,
scala.Tuple2, true]))._1 AS _1#3, assertnotnull(assertnotnull(input[0,
scala.Tuple2, true]))._2 AS _2#4]
            +- ExternalRDD [obj#2]

== Optimized Logical Plan ==
Project [_1#3 AS foo#6]
+- Filter (_2#4 = 20)
   +- SerializeFromObject [assertnotnull(input[0, scala.Tuple2, true])._1
AS _1#3, assertnotnull(input[0, scala.Tuple2, true])._2 AS _2#4]
      +- ExternalRDD [obj#2]

== Physical Plan ==
*(1) Project [_1#3 AS foo#6]
+- *(1) Filter (_2#4 = 20)
   +- *(1) SerializeFromObject [assertnotnull(input[0, scala.Tuple2,
true])._1 AS _1#3, assertnotnull(input[0, scala.Tuple2, true])._2 AS _2#4]
      +- Scan ExternalRDDScan[obj#2]

On Wed, Feb 13, 2019 at 8:04 PM Yeikel <em...@yeikel.com> wrote:

> This is indeed strange. To add to the question , I can see that if I use a
> filter I get an exception (as expected) , so I am not sure what's the
> difference between the  where clause and filter :
>
>
> b.filter(s=> {
> val bar : String = s.getAs("bar")
>
> bar.equals("20")
> }).show
>
> * java.lang.IllegalArgumentException: Field "bar" does not exist.*
>
>
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>

-- 
Sent from my iPhone

Re: "where" clause able to access fields not in its schema

Posted by Yeikel <em...@yeikel.com>.
This is indeed strange. To add to the question , I can see that if I use a
filter I get an exception (as expected) , so I am not sure what's the
difference between the  where clause and filter : 


b.filter(s=> {
val bar : String = s.getAs("bar")

bar.equals("20")
}).show

* java.lang.IllegalArgumentException: Field "bar" does not exist.*





--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org