Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2019/02/13 06:10:00 UTC
[jira] [Updated] (SPARK-26865) DataSourceV2Strategy should push normalized filters
[ https://issues.apache.org/jira/browse/SPARK-26865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-26865:
----------------------------------
Summary: DataSourceV2Strategy should push normalized filters (was: SupportsPushDownFilters should push normalized filters)
> DataSourceV2Strategy should push normalized filters
> ---------------------------------------------------
>
> Key: SPARK-26865
> URL: https://issues.apache.org/jira/browse/SPARK-26865
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Wenchen Fan
> Priority: Major
>
> Although `SupportsPushDownFilters` was designed the same way as the DSv1 API, reusing the same `Filter` classes, DSv1 and DSv2 pass different filters.
> {code}
> /**
> * Pushes down filters, and returns filters that need to be evaluated after scanning.
> */
> Filter[] pushFilters(Filter[] filters);
> {code}
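> The contract above can be sketched without Spark, using hypothetical simplified stand-ins (these are not Spark's real classes): a source keeps the filters it can evaluate and hands the rest back for Spark to apply after scanning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the pushFilters contract, with a simplified
// stand-in filter type instead of org.apache.spark.sql.sources.Filter.
public class PushFiltersSketch {
    record GreaterThan(String attribute, long value) {}

    private final List<GreaterThan> pushed = new ArrayList<>();
    private final Set<String> supportedColumns;

    PushFiltersSketch(Set<String> supportedColumns) {
        this.supportedColumns = supportedColumns;
    }

    // Mirrors Filter[] pushFilters(Filter[] filters): accept what the source
    // can evaluate, and return the remainder as post-scan filters.
    List<GreaterThan> pushFilters(List<GreaterThan> filters) {
        List<GreaterThan> postScan = new ArrayList<>();
        for (GreaterThan f : filters) {
            if (supportedColumns.contains(f.attribute())) {
                pushed.add(f);
            } else {
                postScan.add(f);
            }
        }
        return postScan;
    }

    List<GreaterThan> pushedFilters() {
        return pushed;
    }

    public static void main(String[] args) {
        PushFiltersSketch builder = new PushFiltersSketch(Set.of("id"));
        List<GreaterThan> rest = builder.pushFilters(List.of(
            new GreaterThan("id", 5), new GreaterThan("other", 1)));
        System.out.println(builder.pushedFilters().size() + " pushed, "
            + rest.size() + " post-scan");
    }
}
```

> Note that the `supportedColumns` check here is case-sensitive, which is exactly why the casing of the attribute names Spark passes in matters.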
> Specifically, DSv2 doesn't guarantee that filter expressions match the underlying schema in terms of case-sensitivity.
> {code}
> buildReaderWithPartitionValues(..., filters: Seq[Filter], ...)
> - IsNotNull(ID)
> DataSourceV2Strategy.pushFilters
> - IsNotNull(id)
> {code}
> Steps to reproduce:
> {code}
> spark.range(10).write.orc("/tmp/o1")
> spark.read.schema("ID long").orc("/tmp/o1").filter("id > 5").show
> java.util.NoSuchElementException: key not found: id
> at scala.collection.immutable.Map$Map1.apply(Map.scala:114)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createBuilder(OrcFilters.scala:263)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildSearchArgument(OrcFilters.scala:153)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$1(OrcFilters.scala:99)
> at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
> at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
> at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:39)
> at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
> at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
> at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:98)
> at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:87)
> at org.apache.spark.sql.execution.datasources.v2.orc.OrcScanBuilder.pushFilters(OrcScanBuilder.scala:50)
> {code}
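> The failure above can be sketched in isolation with hypothetical stand-ins (not Spark's real classes): a case-sensitive type lookup, as in `OrcFilters.createBuilder`, throws when the pushed filter keeps the condition's casing (`id`) instead of the read schema's (`ID`), and normalizing the attribute name first, as this issue proposes `DataSourceV2Strategy` should do, avoids the error.

```java
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the IsNotNull filter and a case-sensitive
// schema lookup; simplified sketch, not Spark's actual implementation.
public class NormalizeFilterSketch {
    record IsNotNull(String attribute) {}

    // Read schema as the user declared it: spark.read.schema("ID long")
    static final Map<String, String> READ_SCHEMA = Map.of("ID", "long");

    // Case-sensitive lookup, like dataTypeMap(attribute) in
    // OrcFilters.createBuilder: fails with "key not found: id".
    static String createBuilder(IsNotNull f) {
        String type = READ_SCHEMA.get(f.attribute());
        if (type == null) {
            throw new NoSuchElementException("key not found: " + f.attribute());
        }
        return f.attribute() + " IS NOT NULL: " + type;
    }

    // What normalizing before pushdown would do: rewrite the attribute
    // to the read schema's casing via a case-insensitive match.
    static IsNotNull normalize(IsNotNull f) {
        return READ_SCHEMA.keySet().stream()
            .filter(k -> k.equalsIgnoreCase(f.attribute()))
            .findFirst().map(IsNotNull::new).orElse(f);
    }

    public static void main(String[] args) {
        IsNotNull pushed = new IsNotNull("id"); // un-normalized, as pushed today
        try {
            createBuilder(pushed);
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage()); // key not found: id
        }
        System.out.println(createBuilder(normalize(pushed))); // ID IS NOT NULL: long
    }
}
```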
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org