Posted to issues@spark.apache.org by "Gheorghe Gheorghe (JIRA)" <ji...@apache.org> on 2017/07/12 15:58:00 UTC

[jira] [Updated] (SPARK-21390) Dataset filter api inconsistency

     [ https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gheorghe Gheorghe updated SPARK-21390:
--------------------------------------
    Description: 
Hello everybody, 

I've encountered a strange situation with the spark-shell.
When I run the code below in my IDE, the second test case prints the expected count of 1. However, when I run the same code in the spark-shell, the second test case returns a count of 0.
I've made sure that I'm running Scala 2.11.8 and Spark 2.0.1 in both my IDE and the spark-shell.


{code:scala}
  import org.apache.spark.sql.Dataset
  // In an IDE this also needs a SparkSession and import spark.implicits._
  // for .toDS; the spark-shell provides both automatically.

  case class SomeClass(field1: String, field2: String)

  val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))

  // Test 1: filter a Dataset[SomeClass] directly against the condition
  val filterMe1: Dataset[SomeClass] = Seq(SomeClass("00", "01")).toDS

  println("Works fine! " + filterMe1.filter(filterCondition.contains(_)).count)

  // Test 2: rebuild SomeClass from another case class inside the filter lambda
  case class OtherClass(field1: String, field2: String)

  val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS

  println("Fail, count should return 1: " +
    filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
{code}
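
As a quick sanity check (my own sketch, not part of the original report), one can run the same membership test in plain Scala, without Spark, to see whether ordinary case-class equality is at fault:

{code:scala}
  // Sketch: run the contains check on the driver, outside any Spark job.
  // If this prints true in the spark-shell as well, plain collection
  // equality is not the culprit and the divergence happens inside
  // Spark's execution of the typed filter.
  val probe = SomeClass("00", "01")
  println(filterCondition.contains(probe))  // expected: true
{code}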

Note that if I first map the rows to SomeClass and then filter, it prints 1 as expected:
{code:scala}
  println(filterMe2.map(x => SomeClass(x.field1, x.field2)).filter(filterCondition.contains(_)).count)
{code}
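
That workaround suggests the comparison only misbehaves when SomeClass is constructed inside the filter lambda. If the cause is how REPL-defined case classes compare inside the serialized closure, comparing standard-library tuples instead may sidestep it; this is a sketch of mine under that unverified assumption, not a confirmed fix:

{code:scala}
  // Sketch (unverified assumption): compare plain tuples instead of
  // REPL-defined case classes inside the closure.
  val keys: Seq[(String, String)] = filterCondition.map(c => (c.field1, c.field2))
  println(filterMe2.filter(x => keys.contains((x.field1, x.field2))).count)  // expected: 1
{code}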

Is this a bug? I can see that this filter function is marked as experimental: https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)
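
For completeness, the membership test can also be expressed without the experimental typed filter at all, e.g. as a left-semi join against the condition. Again a sketch of mine using only standard Dataset APIs, not something from the original report:

{code:scala}
  // Sketch: a left-semi join keeps the rows of filterMe2 whose
  // (field1, field2) pair appears in the filter condition, with no
  // user closure involved.
  val condDS = filterCondition.toDS
  println(filterMe2.join(condDS, Seq("field1", "field2"), "leftsemi").count)  // expected: 1
{code}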


> Dataset filter api inconsistency
> --------------------------------
>
>                 Key: SPARK-21390
>                 URL: https://issues.apache.org/jira/browse/SPARK-21390
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Gheorghe Gheorghe
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org