Posted to reviews@spark.apache.org by RussellSpitzer <gi...@git.apache.org> on 2016/01/08 02:39:28 UTC

[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

GitHub user RussellSpitzer opened a pull request:

    https://github.com/apache/spark/pull/10655

    SPARK-12639 SQL Improve Explain for Datasources with Handled Predicates

    SPARK-11661 pushes all predicates down to the underlying Datasources
    regardless of whether the source can handle them or not. This makes the
    explain output slightly confusing, as it always lists all filters
    whether or not the underlying source can actually use them. Instead,
    we should list only those filters which are expressly handled by the
    underlying source.
    
    All predicates are pushed down so there really isn't any value in listing them.
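
A minimal sketch of the idea in this description, assuming simplified stand-ins for the data source filter classes. `Filter`, `EqualTo`, `GreaterThan`, and `ExplainMetadataSketch` here are hypothetical illustrations, not Spark's real classes: only filters the source reports as handled end up in the explain metadata.

```scala
// Hypothetical stand-ins for data source filters; NOT the real Spark classes.
sealed trait Filter
final case class EqualTo(attribute: String, value: Any) extends Filter
final case class GreaterThan(attribute: String, value: Any) extends Filter

object ExplainMetadataSketch {
  // Filters that were pushed down and that the source did not report as unhandled.
  def handledFilters(pushed: Seq[Filter], unhandled: Set[Filter]): Seq[Filter] =
    pushed.filterNot(unhandled.contains)

  // Metadata shown by explain: an entry is added only when something is handled.
  def metadata(pushed: Seq[Filter], unhandled: Set[Filter]): Map[String, String] = {
    val handled = handledFilters(pushed, unhandled)
    if (handled.nonEmpty) Map("HandledFilters" -> handled.mkString("[", ", ", "]"))
    else Map.empty
  }
}
```

With `EqualTo("a", 1)` and `GreaterThan("b", 2)` pushed and only the latter reported as unhandled, the sketch lists just the equality filter.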

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/RussellSpitzer/spark SPARK-12639

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10655.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10655
    
----
commit 42cc0e76148c39d84a5635058698262862a60e6f
Author: Russell Spitzer <ru...@gmail.com>
Date:   2016-01-08T01:33:59Z

    SPARK-12639: Improve Explain for Datasources with Handled Predicates
    
    SPARK-11661 Makes all predicates pushed down to underlying Datasources
    regardless of whether the source can handle them or not. This makes the
    explain command slightly confusing as it will always list all filters
    whether or not the underlying source can actually use them. Instead
    now we should only list those filters which are expressly handled by the
    underlying source.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-170604593
  
    ok to test


[GitHub] spark issue #10655: [SPARK-12639][SQL] Improve Explain for Datasources with ...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on the issue:

    https://github.com/apache/spark/pull/10655
  
    We fixed this in a different PR: https://github.com/apache/spark/pull/11317


[GitHub] spark issue #10655: [SPARK-12639][SQL] Improve Explain for Datasources with ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10655
  
    **[Test build #65267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65267/consoleFull)** for PR 10655 at commit [`5a0daf6`](https://github.com/apache/spark/commit/5a0daf6590a711f376494419d5419ed2a2b7b26d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-169897099
  
    Can you include a before/after comparison in the pull request description?



[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-172139084
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10655#discussion_r49150950
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -321,8 +321,8 @@ private[sql] object DataSourceStrategy extends Strategy with Logging {
         val metadata: Map[String, String] = {
           val pairs = ArrayBuffer.empty[(String, String)]
     
    -      if (pushedFilters.nonEmpty) {
    -        pairs += (PUSHED_FILTERS -> pushedFilters.mkString("[", ", ", "]"))
    +      if (handledPredicates.nonEmpty) {
    +        pairs += (HANDLED_FILTERS -> handledPredicates.mkString("[", ", ", "]"))
    --- End diff --
    
    Should we also keep pushed filters? For some data sources like ORC, a pushed filter will be evaluated at a coarse-grained level instead of on every row. I think it is better to keep that information.
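
To illustrate what coarse-grained evaluation means here, a simplified sketch follows. It assumes a made-up `Stripe` model with min/max statistics and a single `value > threshold` predicate; it is not how the ORC reader is actually implemented. The pushed filter can skip whole stripes, but rows in the surviving stripes still have to be filtered again.

```scala
// Hypothetical model of a stripe (row group) carrying min/max statistics.
final case class Stripe(min: Int, max: Int, rows: Seq[Int])

object CoarseFilterSketch {
  // Coarse-grained step: keep a stripe if its statistics say that some row
  // *might* satisfy `value > threshold`. No individual row is inspected.
  def pruneStripes(stripes: Seq[Stripe], threshold: Int): Seq[Stripe] =
    stripes.filter(_.max > threshold)

  // Fine-grained step: the predicate is re-applied to every surviving row,
  // because a kept stripe can still contain rows that do not match.
  def scan(stripes: Seq[Stripe], threshold: Int): Seq[Int] =
    pruneStripes(stripes, threshold).flatMap(_.rows).filter(_ > threshold)
}
```

In this model the pushed filter is useful even though it is not "handled" in the per-row sense, which is the information the comment above asks to keep visible.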


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-169862654
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10655#discussion_r49151802
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -114,6 +114,7 @@ private[sql] object PhysicalRDD {
       // Metadata keys
       val INPUT_PATHS = "InputPaths"
       val PUSHED_FILTERS = "PushedFilters"
    +  val HANDLED_FILTERS = "HandledFilters"
    --- End diff --
    
    `HandledFilters` here means filters that will be applied to every row inside the data source, right? Is there a better name?


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-217465276
  
    ping @RussellSpitzer


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-172371851
  
    @RussellSpitzer Thank you for the comment. I totally agree with you.
    
    As mentioned before, my only concern is that for ORC/Parquet, we will not be able to see pushed filters in the explain output after the current change in this PR. As a user of Parquet/ORC, I do want to see that a filter has been pushed down even if this filter will not be applied to every row.
    
    How about we go with Asterisks for now? You may need to keep the expression that maps to a data source filter (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L536) to generate the string.


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-170611622
  
    **[Test build #49155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49155/consoleFull)** for PR 10655 at commit [`42cc0e7`](https://github.com/apache/spark/commit/42cc0e76148c39d84a5635058698262862a60e6f).


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-172140783
  
    @RussellSpitzer Thanks for the change. I have thought about it again. My only concern is that `unhandledPredicates` actually contains filters that can be handled by the data source. For example, filters in ORC are applied at a coarser-grained level. Also, any data source that does not override the `unhandledFilters` method will cause `HandledFilters` to show nothing, which can cause confusion.
    
    So, how about we still show `PUSHED_FILTERS`? But, we can add a special character (maybe `*`) to filters that belong to `unhandledPredicates` to indicate these filters may not be applied to every row?
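
The marker proposal above could look roughly like the following sketch. It assumes filters are represented by their string form, and `PushedFiltersSketch.render` is a hypothetical helper, not code from this PR: every pushed filter is still listed, and those in the unhandled set get a leading `*`.

```scala
object PushedFiltersSketch {
  // Renders all pushed filters; a leading '*' marks a filter that belongs to
  // the unhandled set, i.e. one that may not be applied to every row.
  def render(pushed: Seq[String], unhandled: Set[String]): String =
    pushed
      .map(f => if (unhandled.contains(f)) s"*$f" else f)
      .mkString("[", ", ", "]")
}
```

This keeps the pushed-filter information that Parquet/ORC users want while still distinguishing the filters the source evaluates completely.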


[GitHub] spark pull request #10655: [SPARK-12639][SQL] Improve Explain for Datasource...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer closed the pull request at:

    https://github.com/apache/spark/pull/10655


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-170195637
  
    Thanks @RussellSpitzer.
    
    I will let @yhuai review and merge this. One question, do you know why the filter is "if (isnull(acc#2)) null else CASE 1000 WHEN 1 THEN acc#2 WHEN 0 THEN NOT acc#2 ELSE false"? Seems so complicated for "acc = 1000"


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-170613562
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49155/
    Test FAILed.


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-172136871
  
    @yhuai I removed the PushedFilters and added the other examples. We could re-add the "PushedFilters" if you like. I wasn't sure if you still wanted that. I'm still not sure if it's very valuable info, since everything is `Pushed` if I understood correctly.


[GitHub] spark issue #10655: SPARK-12639 SQL Improve Explain for Datasources with Han...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the issue:

    https://github.com/apache/spark/pull/10655
  
    test this please


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-174386527
  
    no problem! Thank you :)


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-172139086
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49502/
    Test FAILed.


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-174113049
  
    Haven't forgotten this; will have a new PR soon :)


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10655#discussion_r49341368
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -321,8 +321,8 @@ private[sql] object DataSourceStrategy extends Strategy with Logging {
         val metadata: Map[String, String] = {
           val pairs = ArrayBuffer.empty[(String, String)]
     
    -      if (pushedFilters.nonEmpty) {
    -        pairs += (PUSHED_FILTERS -> pushedFilters.mkString("[", ", ", "]"))
    +      if (handledPredicates.nonEmpty) {
    +        pairs += (HANDLED_FILTERS -> handledPredicates.mkString("[", ", ", "]"))
    --- End diff --
    
    ah sorry. I think I understand the change now. `handledPredicates` contains all filters that are pushed to the data source except those filters returned by the unhandledFilters method. I think the change is good and `HandledFilters` is a proper name.
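
The relationship described above, and the earlier concern about sources that never override `unhandledFilters`, can be sketched as follows. `RelationSketch` and the two example relations are hypothetical, filters are plain strings, and the default is assumed to return its input unchanged, as the earlier comment in this thread implies.

```scala
// Hypothetical stand-in for a relation; filters are plain strings for brevity.
trait RelationSketch {
  // Assumed default: every pushed filter is reported back as unhandled.
  def unhandledFilters(filters: Seq[String]): Seq[String] = filters
}

// A source that keeps the default.
object DefaultRelation extends RelationSketch

// A source that declares it fully evaluates equality filters.
object EqualityAwareRelation extends RelationSketch {
  override def unhandledFilters(filters: Seq[String]): Seq[String] =
    filters.filterNot(_.startsWith("EqualTo"))
}

object HandledFiltersSketch {
  // handled = pushed filters minus whatever unhandledFilters returns.
  def handled(relation: RelationSketch, pushed: Seq[String]): Seq[String] = {
    val unhandled = relation.unhandledFilters(pushed).toSet
    pushed.filterNot(unhandled.contains)
  }
}
```

Under these assumptions a source using the default yields an empty handled list, so a `HandledFilters` entry would show nothing for it.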


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-170613543
  
    **[Test build #49155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49155/consoleFull)** for PR 10655 at commit [`42cc0e7`](https://github.com/apache/spark/commit/42cc0e76148c39d84a5635058698262862a60e6f).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-172145152
  
    I personally think the ambiguous `PUSHED_FILTERS` is more confusing. When we see a predicate there we have no idea whether or not it is a valid filter for the source at all. In the C* case, for example, this could contain clauses which have no way of actually being pushed down to the source.
    
    In my mind there are 3 categories of predicates:
    * Those which cannot be pushed to the source at all
    * Those which can be pushed to the source but may have false positives
    * Those which can be pushed to the source and filter completely
    
    Currently we can only tell whether a predicate is in one of the first two categories or in the third. This leaves us awkwardly stating that the source has had a predicate `Pushed` to it even when that is impossible. I like just stating the third category because that's the only thing we can truly be sure of given the current code. It would be better if the underlying source were able to classify all filters into the above categories.
    
    So to me it is more confusing to say something is `Pushed` when it technically can't be than to say something is not `Pushed` when it might be. But YMMV.
    
    If you want to just go with asterisks that's fine with me too, just wanted to make my argument :D
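
The three categories above, and the limit of what can currently be observed, can be written down as a small sketch. All names here are hypothetical and filters are plain strings: given only the set of unhandled filters, a filter is either known to be in the third category or ambiguous between the first two.

```scala
// Hypothetical names for the three categories listed above.
sealed trait PushdownCategory
case object NotPushable extends PushdownCategory
case object PushedWithFalsePositives extends PushdownCategory
case object FullyFiltered extends PushdownCategory

object CategorySketch {
  // Returns the categories a filter could belong to when the only available
  // information is whether the source reported it as unhandled.
  def possibleCategories(filter: String, unhandled: Set[String]): Set[PushdownCategory] =
    if (unhandled.contains(filter)) Set(NotPushable, PushedWithFalsePositives)
    else Set(FullyFiltered)
}
```

A source-side API that returned one `PushdownCategory` per filter would remove the ambiguity, which is the improvement the comment wishes for.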


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10655#discussion_r49153042
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -114,6 +114,7 @@ private[sql] object PhysicalRDD {
       // Metadata keys
       val INPUT_PATHS = "InputPaths"
       val PUSHED_FILTERS = "PushedFilters"
    +  val HANDLED_FILTERS = "HandledFilters"
    --- End diff --
    
    How about `FilteredAtSource` 


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10655#discussion_r49342098
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -114,6 +114,7 @@ private[sql] object PhysicalRDD {
       // Metadata keys
       val INPUT_PATHS = "InputPaths"
       val PUSHED_FILTERS = "PushedFilters"
    +  val HANDLED_FILTERS = "HandledFilters"
    --- End diff --
    
    sgtm


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-172139079
  
    **[Test build #49502 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49502/consoleFull)** for PR 10655 at commit [`5a0daf6`](https://github.com/apache/spark/commit/5a0daf6590a711f376494419d5419ed2a2b7b26d).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #10655: [SPARK-12639][SQL] Improve Explain for Datasources with ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/10655
  
    **[Test build #65267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65267/consoleFull)** for PR 10655 at commit [`5a0daf6`](https://github.com/apache/spark/commit/5a0daf6590a711f376494419d5419ed2a2b7b26d).


[GitHub] spark issue #10655: [SPARK-12639][SQL] Improve Explain for Datasources with ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10655
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65267/
    Test FAILed.


[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-212184245
  
    We should probably correct the title to follow the `[SPARK-XXXX][SQL]` convention used by the other pull requests (this is described in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).




[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-217483240
  
    Sorry, I forgot about this. I'll clean it up tomorrow and get it ready.




[GitHub] spark issue #10655: SPARK-12639 SQL Improve Explain for Datasources with Han...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10655
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-170064440
  
    @rxin Added. Basically, I think the current "PushedFilters" list isn't very valuable if everything is listed there, so instead we should list only those filters that the source can actually do something with. If there is a possibility that a source might do something with a filter (Bloom-filter-like behavior), we should have a third category, but currently there is no way of knowing (I think).
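
The distinction drawn in the comment above can be sketched as follows. This is a minimal illustration, not the code in the pull request: `Filter`, `EqualTo`, `GreaterThan`, and `EqualityOnlySource` are stand-in types invented here so the example runs without Spark, and the source is assumed to handle only equality predicates. In Spark itself the corresponding hook is `BaseRelation.unhandledFilters`.

```scala
// Stand-in types for illustration only; these are not Spark's classes.
sealed trait Filter
final case class EqualTo(attribute: String, value: Any) extends Filter
final case class GreaterThan(attribute: String, value: Any) extends Filter

// Hypothetical source that can only evaluate equality predicates itself.
object EqualityOnlySource {
  // Returns the filters the source cannot evaluate on its own.
  def unhandledFilters(filters: Seq[Filter]): Seq[Filter] =
    filters.filterNot(_.isInstanceOf[EqualTo])
}

object HandledFiltersSketch {
  // Every filter is pushed; the handled ones are those the source does not hand back.
  def handled(pushed: Seq[Filter]): Seq[Filter] = {
    val unhandled = EqualityOnlySource.unhandledFilters(pushed).toSet
    pushed.filterNot(unhandled.contains)
  }

  def main(args: Array[String]): Unit = {
    val pushed = Seq(EqualTo("a", 1), GreaterThan("b", 2))
    // Only the equality predicate is reported as handled.
    println(handled(pushed).mkString("[", ", ", "]"))
  }
}
```

Under these assumptions, listing only the result of `handled` tells the reader which predicates the source takes responsibility for, while the full pushed list would look the same for every source.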




[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by RussellSpitzer <gi...@git.apache.org>.
Github user RussellSpitzer commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10655#discussion_r49152876
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -321,8 +321,8 @@ private[sql] object DataSourceStrategy extends Strategy with Logging {
         val metadata: Map[String, String] = {
           val pairs = ArrayBuffer.empty[(String, String)]
     
    -      if (pushedFilters.nonEmpty) {
    -        pairs += (PUSHED_FILTERS -> pushedFilters.mkString("[", ", ", "]"))
    +      if (handledPredicates.nonEmpty) {
    +        pairs += (HANDLED_FILTERS -> handledPredicates.mkString("[", ", ", "]"))
    --- End diff --
    
    I thought SPARK-11661 meant all filters are pushed down regardless, so I wondered if that was redundant?
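
For readers without the surrounding file, the effect of the changed lines in the diff above can be reproduced in isolation. This is a rough sketch rather than the actual `DataSourceStrategy` code: the `ExplainMetadataSketch` object and its `metadata` helper are hypothetical, and predicates are represented as plain strings for simplicity. The `HandledFilters` key is the one introduced in this pull request.

```scala
import scala.collection.mutable.ArrayBuffer

object ExplainMetadataSketch {
  // Key name taken from the pull request under review.
  val HANDLED_FILTERS = "HandledFilters"

  // Builds the metadata map for a scan node; an empty predicate list adds no entry.
  def metadata(handledPredicates: Seq[String]): Map[String, String] = {
    val pairs = ArrayBuffer.empty[(String, String)]
    if (handledPredicates.nonEmpty) {
      pairs += (HANDLED_FILTERS -> handledPredicates.mkString("[", ", ", "]"))
    }
    pairs.toMap
  }

  def main(args: Array[String]): Unit = {
    println(metadata(Seq("EqualTo(a,1)")))
    println(metadata(Seq.empty))
  }
}
```

With this shape, a source that handles nothing produces no filter entry at all, so the presence of the key is itself informative.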




[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10655#discussion_r49341415
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
    @@ -114,6 +114,7 @@ private[sql] object PhysicalRDD {
       // Metadata keys
       val INPUT_PATHS = "InputPaths"
       val PUSHED_FILTERS = "PushedFilters"
    +  val HANDLED_FILTERS = "HandledFilters"
    --- End diff --
    
    How about we just delete `PUSHED_FILTERS`, since it is not used? I think `HandledFilters` is a better name.




[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-170195683
  
    OK I think I figured out why. "acc" is a boolean column.





[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-170613555
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #10655: [SPARK-12639][SQL] Improve Explain for Datasources with ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/10655
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-170604576
  
    @RussellSpitzer Thank you for the PR! The change looks good. Can you also try ORC and Parquet tables and attach the before/after explain output to the PR description?




[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10655#issuecomment-172137996
  
    **[Test build #49502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49502/consoleFull)** for PR 10655 at commit [`5a0daf6`](https://github.com/apache/spark/commit/5a0daf6590a711f376494419d5419ed2a2b7b26d).

