Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2016/09/23 06:24:20 UTC
[jira] [Comment Edited] (SPARK-17636) Parquet filter push down doesn't handle struct fields
[ https://issues.apache.org/jira/browse/SPARK-17636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515555#comment-15515555 ]
Hyukjin Kwon edited comment on SPARK-17636 at 9/23/16 6:23 AM:
---------------------------------------------------------------
Confirmation from a committer: https://github.com/apache/spark/pull/14067#issuecomment-230784030
Confirmation from me: https://github.com/apache/spark/pull/14067#issuecomment-230739992
More specifically, the filter is not pushed down to data sources in https://github.com/apache/spark/blob/de7df7defc99e04fefd990974151a701f64b75b4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L417-L489
because the expression is a {{GetStructField}} rather than an {{Attribute}}, so it does not match.
I did a quick scan and could not find any related JIRAs.
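To make the mismatch concrete, here is a minimal, self-contained sketch in plain Scala. The case classes are toy stand-ins that only borrow Catalyst's names, and {{ToyPushdown.translate}} is a hypothetical helper, not the actual code in {{DataSourceStrategy}}. It assumes the translation only recognizes a bare attribute on the left-hand side of the comparison.

```scala
// Toy stand-ins for Catalyst expressions. These are NOT Spark's classes;
// they only mirror the names used in this issue for illustration.
sealed trait Expr
case class Attribute(name: String) extends Expr
case class GetStructField(child: Expr, fieldName: String) extends Expr
case class Literal(value: Long) extends Expr
case class GreaterThan(left: Expr, right: Expr) extends Expr

object ToyPushdown {
  // Hypothetical translation step: a predicate becomes a pushed filter
  // only when its left-hand side is a plain Attribute.
  def translate(predicate: Expr): Option[String] = predicate match {
    case GreaterThan(Attribute(name), Literal(v)) =>
      Some(s"GreaterThan($name,$v)")
    // Anything else, including GetStructField(...), falls through,
    // so no filter is handed to the data source.
    case _ => None
  }
}
```

Under these assumptions, {{GreaterThan(Attribute("sale_id"), Literal(4))}} translates to a pushed filter, while {{GreaterThan(GetStructField(Attribute("day_timestamp"), "timestamp"), Literal(4))}} yields {{None}}, which is consistent with the two plans quoted in the issue description.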
was (Author: hyukjin.kwon):
Confirmation from committer - https://github.com/apache/spark/pull/14067#issuecomment-230784030 (confirmation from committer)
Confirmation from me - https://github.com/apache/spark/pull/14067#issuecomment-230739992
More specifically, the filter is not pushed down to data sources in https://github.com/apache/spark/blob/de7df7defc99e04fefd990974151a701f64b75b4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L417-L489
because the expression is a {{GetStructField}} rather than an {{Attribute}}, so it does not match.
I did a quick scan and could not find any related JIRAs.
> Parquet filter push down doesn't handle struct fields
> -----------------------------------------------------
>
> Key: SPARK-17636
> URL: https://issues.apache.org/jira/browse/SPARK-17636
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 1.6.2
> Reporter: Mitesh
> Priority: Minor
>
> There's a *PushedFilters* entry for a simple numeric field, but not for a numeric field inside a struct. Not sure whether this is a Spark limitation caused by Parquet or purely a Spark limitation.
> {quote}
> scala> hc.read.parquet("s3a://some/parquet/file").select("day_timestamp", "sale_id")
> res5: org.apache.spark.sql.DataFrame = [day_timestamp: struct<timestamp:bigint,timezone:string>, sale_id: bigint]
> scala> res5.filter("sale_id > 4").queryExecution.executedPlan
> res9: org.apache.spark.sql.execution.SparkPlan =
> Filter[23814] [args=(sale_id#86324L > 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
> +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: s3a://some/parquet/file, PushedFilters: [GreaterThan(sale_id,4)]
> scala> res5.filter("day_timestamp.timestamp > 4").queryExecution.executedPlan
> res10: org.apache.spark.sql.execution.SparkPlan =
> Filter[23815] [args=(day_timestamp#86302.timestamp > 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
> +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: s3a://some/parquet/file
> {quote}