Posted to issues@spark.apache.org by "Liwei Lin (JIRA)" <ji...@apache.org> on 2016/07/05 03:31:11 UTC
[jira] [Commented] (SPARK-16371) IS NOT NULL clause gives false for nested not empty column
[ https://issues.apache.org/jira/browse/SPARK-16371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361946#comment-15361946 ]
Liwei Lin commented on SPARK-16371:
-----------------------------------
Hi [~maver1ck], I cannot reproduce this issue (see the code below):
{code}
object SPARK_16371 extends App {
  import org.apache.spark.sql.SparkSession

  case class Parent(a: Child)
  case class Child(b: Long)

  val spark = SparkSession.builder().master("local").getOrCreate()
  import spark.implicits._

  // write
  spark.range(1000000).map(num => Parent(Child(num))).write.mode("overwrite").parquet("1m_parquet")

  // read
  // ---
  // Parquet form:
  //   message spark_schema {
  //     optional group a {
  //       optional int64 b;
  //     }
  //   }
  //
  // Catalyst form:
  //   StructType(StructField(a,StructType(StructField(b,LongType,true)),true))
  // ---
  println(spark.read.parquet("1m_parquet").where("a.b is not null").count())
  // console prints 1000000
}
{code}
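For reference, the Parquet/Catalyst forms quoted in the comments above can be dumped directly; a minimal sketch, reusing the "1m_parquet" path from my snippet:
{code}
// Inspect the schema Spark infers from the Parquet files, including the
// nullability flags that the IS NOT NULL filter depends on.
val df = spark.read.parquet("1m_parquet")
df.printSchema()
// root
//  |-- a: struct (nullable = true)
//  |    |-- b: long (nullable = true)
println(df.schema)
// StructType(StructField(a,StructType(StructField(b,LongType,true)),true))
{code}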
> IS NOT NULL clause gives false for nested not empty column
> ----------------------------------------------------------
>
> Key: SPARK-16371
> URL: https://issues.apache.org/jira/browse/SPARK-16371
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Maciej Bryński
> Priority: Critical
>
> I have a df where column1 is a struct type and there are 1M rows.
> (sample data from https://issues.apache.org/jira/browse/SPARK-16320)
> {code}
> df.where("column1 is not null").count()
> {code}
> gives:
> 1M in Spark 1.6
> *0* in Spark 2.0
> Is there a change in IS NOT NULL behaviour in Spark 2.0?
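One more note on my repro above: my query filters on the nested field a.b, while the reported query filters on the struct column itself, so the two predicates are not identical tests. A minimal sketch contrasting them (untested on 2.0; column names follow my snippet):
{code}
val df = spark.read.parquet("1m_parquet")
// Field-level check, as in my repro above:
println(df.where("a.b is not null").count())
// Struct-level check, matching the shape of the reported query
// ("column1 is not null" on a struct column):
println(df.where("a is not null").count())
{code}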
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org