Posted to reviews@spark.apache.org by andreweduffy <gi...@git.apache.org> on 2016/07/12 19:48:27 UTC

[GitHub] spark pull request #14159: [PARQUET] Fix for Parquet filter pushdown

GitHub user andreweduffy opened a pull request:

    https://github.com/apache/spark/pull/14159

    [PARQUET] Fix for Parquet filter pushdown

    ## What changes were proposed in this pull request?
    
    Fix Parquet filter pushdown so that filters reach all the way down to the file level.
    
    The previously used deprecated constructor defaults to null metadata, which
    prevents pushdown from reaching the Parquet level.
    
    ## How was this patch tested?
    
    Verified by inspecting the output of collect calls in the Spark shell. Before the patch, warnings about CorruptStatistics were printed, which prevented filters from being pushed down to individual Parquet files. With the patch, the metadata in each file is used for pushdown.
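
To illustrate why missing metadata defeats pushdown, here is a minimal sketch in Python. It is a toy model only, not Spark or Parquet code: `RowGroup`, `make_row_group`, and `scan_greater_than` are hypothetical names invented for this example. It assumes each row group may carry min/max statistics, and that a reader without them has to read every group.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RowGroup:
    """Toy stand-in for a row group: values plus optional (min, max) statistics."""
    values: List[int]
    stats: Optional[Tuple[int, int]]  # None models unavailable ("null") metadata


def make_row_group(values: List[int], with_stats: bool = True) -> RowGroup:
    """Build a row group, optionally recording its min/max statistics."""
    stats = (min(values), max(values)) if with_stats else None
    return RowGroup(values, stats)


def scan_greater_than(row_groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return the values > threshold and the number of row groups actually read."""
    matched: List[int] = []
    groups_read = 0
    for rg in row_groups:
        # With statistics, a group whose max <= threshold cannot contain a match.
        if rg.stats is not None and rg.stats[1] <= threshold:
            continue
        # Without statistics the group cannot be ruled out, so it must be read.
        groups_read += 1
        matched.extend(v for v in rg.values if v > threshold)
    return matched, groups_read


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
with_meta = [make_row_group(v) for v in data]
without_meta = [make_row_group(v, with_stats=False) for v in data]

rows_a, read_a = scan_greater_than(with_meta, 6)
rows_b, read_b = scan_greater_than(without_meta, 6)
print(rows_a, read_a)  # same rows, fewer groups read
print(rows_b, read_b)
```

Both scans return the same rows. Only the number of row groups read differs, which is why a lost pushdown shows up as extra work rather than as wrong results.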
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andreweduffy/spark bugfix/pushdown

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14159.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14159
    
----
commit f825ad709cdc3c89d0cc7e41d0410998e6cc7541
Author: Andrew Duffy <ad...@palantir.com>
Date:   2016-07-12T19:41:22Z

    Fix for Parquet filter pushdown
    
    Use of previous deprecated constructor defaults to null metadata, which
    prevents pushdown from reaching the Parquet level.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14159: [SQL][PARQUET] Fix for Vectorized Parquet filter pushdow...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/14159
  
    https://github.com/apache/spark/pull/14160 solves the same thing.




[GitHub] spark issue #14159: [PARQUET] Fix for Parquet filter pushdown

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14159
  
    Can one of the admins verify this patch?




[GitHub] spark issue #14159: [SQL][PARQUET] Fix for Vectorized Parquet filter pushdow...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14159
  
    **[Test build #3211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3211/consoleFull)** for PR 14159 at commit [`e64251a`](https://github.com/apache/spark/commit/e64251ad4bb3e38db0dc349084059c66db5c10e7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14159: [SQL][PARQUET] Fix for Vectorized Parquet filter pushdow...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14159
  
    **[Test build #3211 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3211/consoleFull)** for PR 14159 at commit [`e64251a`](https://github.com/apache/spark/commit/e64251ad4bb3e38db0dc349084059c66db5c10e7).




[GitHub] spark issue #14159: [SQL][PARQUET] Fix for Vectorized Parquet filter pushdow...

Posted by andreweduffy <gi...@git.apache.org>.
Github user andreweduffy commented on the issue:

    https://github.com/apache/spark/pull/14159
  
    Yep, looks like the other one was closed by the committer. I saw Sean commented that this might need to be tested against 2.2. Is that going to be necessary?




[GitHub] spark issue #14159: [SQL][PARQUET] Fix for Vectorized Parquet filter pushdow...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/14159
  
    Ok :)... Could you close then?




[GitHub] spark issue #14159: [SQL][PARQUET] Fix for Vectorized Parquet filter pushdow...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/14159
  
    @andreweduffy not much. I'll trigger a test.
    
    @liancheng could you take a look at this?




[GitHub] spark issue #14159: [SQL][PARQUET] Fix for Vectorized Parquet filter pushdow...

Posted by andreweduffy <gi...@git.apache.org>.
Github user andreweduffy commented on the issue:

    https://github.com/apache/spark/pull/14159
  
    Actually, it appears that PR #14450, opened after this one, fixes this. Should be safe to close now.




[GitHub] spark issue #14159: [SQL][PARQUET] Fix for Vectorized Parquet filter pushdow...

Posted by andreweduffy <gi...@git.apache.org>.
Github user andreweduffy commented on the issue:

    https://github.com/apache/spark/pull/14159
  
    @hvanhovell Anything new on this front?




[GitHub] spark issue #14159: [SQL][PARQUET] Fix for Vectorized Parquet filter pushdow...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/14159
  
    Could you open a JIRA or add the existing JIRA to the PR?




[GitHub] spark issue #14159: [SQL][PARQUET] Fix for Vectorized Parquet filter pushdow...

Posted by andreweduffy <gi...@git.apache.org>.
Github user andreweduffy commented on the issue:

    https://github.com/apache/spark/pull/14159
  
    bump?




[GitHub] spark pull request #14159: [SQL][PARQUET] Fix for Vectorized Parquet filter ...

Posted by andreweduffy <gi...@git.apache.org>.
Github user andreweduffy closed the pull request at:

    https://github.com/apache/spark/pull/14159




[GitHub] spark issue #14159: [SQL][PARQUET] Fix for Vectorized Parquet filter pushdow...

Posted by andreweduffy <gi...@git.apache.org>.
Github user andreweduffy commented on the issue:

    https://github.com/apache/spark/pull/14159
  
    Yep, closing now

