You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by huaxingao <gi...@git.apache.org> on 2016/02/01 19:52:06 UTC

[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

GitHub user huaxingao opened a pull request:

    https://github.com/apache/spark/pull/11005

    [SPARK-12506][SPARK-12126][SQL]use CatalystScan for JDBCRelation

    As suggested <a href="https://issues.apache.org/jira/browse/SPARK-9182?focusedCommentId=15031526&amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15031526" class="external-link" rel="nofollow">here</a>, I will change JDBCRelation to implement CatalystScan, and then directly access Catalyst expressions in JDBCRDD.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark-12126

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11005.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11005
    
----
commit 0bdfea86fc0671bf636bbed3174baa417de6367e
Author: Huaxin Gao <hu...@us.ibm.com>
Date:   2016-01-31T04:01:12Z

    [SPARK-12506][SPARK-12126][SQL]use CatalystScan for JDBCRelation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11005#issuecomment-178420517
  
    As @maropu said, for me I also agree with him but some may think differently as mentioned [here](https://github.com/apache/spark/pull/10750#issuecomment-175400704).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #11005: [SPARK-12506][SPARK-12126][SQL]use CatalystScan for JDBC...

Posted by kyprifog <gi...@git.apache.org>.
Github user kyprifog commented on the issue:

    https://github.com/apache/spark/pull/11005
  
    @HyukjinKwon Thanks this is what I was looking for.  Glad the work is still being continued.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #11005: [SPARK-12506][SPARK-12126][SQL]use CatalystScan for JDBC...

Posted by kyprifog <gi...@git.apache.org>.
Github user kyprifog commented on the issue:

    https://github.com/apache/spark/pull/11005
  
    @rxin Why was this pull request closed?  Can you direct me to a new one that could have replaced it? I was trying to track down the modification of push downs for basic operations like "limit".


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11005#issuecomment-178126949
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11005#issuecomment-179135955
  
    I agree that `CatalystScan` is used to support arithmetic operations in datasources though, this current `CatalystScan` trait only processes filter expressions. If we make this kind of interface changes, it'd be better to consider other requirements discussed in jira; otherwise, we get stuck with the same issue every time we add new features in datasources.
    
    Anyway, the idea to share codes in `SQLBuilder` and `JDBCRDD` is good to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #11005: [SPARK-12506][SPARK-12126][SQL]use CatalystScan for JDBC...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/11005
  
    Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. We can also continue the discussion on the JIRA ticket.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/11005#issuecomment-178978949
  
    Is `PrunedFilteredScan` and `CatalystScan` conflicting? If not, can we keep original `JDBCRelation` and let it implement `CatalystScan` too? In order not to duplicate current codes, I am wondering if `SQLBuilder` can help converting filtering expressions to SQL string.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/11005#issuecomment-178416552
  
    Great work though, I think that we should consider lots of things to refactor these kinds of datasource push-down codes because related tickets (SPARK-12449, 12686, ...) describes non-filter logical operators push downs. Also, this pr could break binary compatibility...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #11005: [SPARK-12506][SPARK-12126][SQL]use CatalystScan f...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11005


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #11005: [SPARK-12506][SPARK-12126][SQL]use CatalystScan for JDBC...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/11005
  
    Technical reason: It's kind of risky to rely on `CatalystScan` and completely replace the interface. I think I already see some tests were disabled here. Also, there look potential better suggestions above.
    
    Practical reason: there are too many pending PRs as you see. If the author is not responsive and the PR is inactive to review comments, we better leave them closed for now - seems it's already stuck in few technical reasons. The author is welcome to reopen and other contributors are welcome to take over.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11005#discussion_r51493048
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala ---
    @@ -455,31 +458,31 @@ class JDBCSuite extends SparkFunSuite
         assert(DerbyColumns === Seq(""""abc"""", """"key""""))
       }
     
    -  test("compile filters") {
    -    val compileFilter = PrivateMethod[Option[String]]('compileFilter)
    -    def doCompileFilter(f: Filter): String = JDBCRDD invokePrivate compileFilter(f) getOrElse("")
    -    assert(doCompileFilter(EqualTo("col0", 3)) === "col0 = 3")
    -    assert(doCompileFilter(Not(EqualTo("col1", "abc"))) === "(NOT (col1 = 'abc'))")
    -    assert(doCompileFilter(And(EqualTo("col0", 0), EqualTo("col1", "def")))
    -      === "(col0 = 0) AND (col1 = 'def')")
    -    assert(doCompileFilter(Or(EqualTo("col0", 2), EqualTo("col1", "ghi")))
    -      === "(col0 = 2) OR (col1 = 'ghi')")
    -    assert(doCompileFilter(LessThan("col0", 5)) === "col0 < 5")
    -    assert(doCompileFilter(LessThan("col3",
    -      Timestamp.valueOf("1995-11-21 00:00:00.0"))) === "col3 < '1995-11-21 00:00:00.0'")
    -    assert(doCompileFilter(LessThan("col4", Date.valueOf("1983-08-04"))) === "col4 < '1983-08-04'")
    -    assert(doCompileFilter(LessThanOrEqual("col0", 5)) === "col0 <= 5")
    -    assert(doCompileFilter(GreaterThan("col0", 3)) === "col0 > 3")
    -    assert(doCompileFilter(GreaterThanOrEqual("col0", 3)) === "col0 >= 3")
    -    assert(doCompileFilter(In("col1", Array("jkl"))) === "col1 IN ('jkl')")
    -    assert(doCompileFilter(Not(In("col1", Array("mno", "pqr"))))
    -      === "(NOT (col1 IN ('mno', 'pqr')))")
    -    assert(doCompileFilter(IsNull("col1")) === "col1 IS NULL")
    -    assert(doCompileFilter(IsNotNull("col1")) === "col1 IS NOT NULL")
    -    assert(doCompileFilter(And(EqualNullSafe("col0", "abc"), EqualTo("col1", "def")))
    -      === "((NOT (col0 != 'abc' OR col0 IS NULL OR 'abc' IS NULL) "
    -        + "OR (col0 IS NULL AND 'abc' IS NULL))) AND (col1 = 'def')")
    -  }
    +//  test("compile filters") {
    --- End diff --
    
    It looks you forgot to uncomment those..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/11005#issuecomment-178233988
  
    cc @liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #11005: [SPARK-12506][SPARK-12126][SQL]use CatalystScan for JDBC...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/11005
  
    BTW, datasource v2 is in progress too to allow more push downs (see [SPARK-22386](https://issues.apache.org/jira/browse/SPARK-22386)). You might want to take a look


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12506][SPARK-12126][SQL]use CatalystSca...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11005#discussion_r51531271
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala ---
    @@ -20,14 +20,15 @@ package org.apache.spark.sql.execution.datasources.jdbc
     import java.sql.{Connection, Date, ResultSet, ResultSetMetaData, SQLException, Timestamp}
     import java.util.Properties
     
    +import org.apache.spark.sql.catalyst.expressions._
    --- End diff --
    
    Fix an import order.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org