You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by rekhajoshm <gi...@git.apache.org> on 2016/04/05 23:49:25 UTC

[GitHub] spark pull request: [SPARK-14410] [SQL] : SessionCatalog needs to ...

GitHub user rekhajoshm opened a pull request:

    https://github.com/apache/spark/pull/12183

    [SPARK-14410] [SQL] : SessionCatalog needs to check database/table/function existence

    ## What changes were proposed in this pull request?
    
    1.Check for existence of Function in catalog, instead of try-catch
    2. Check for existence of database before create/alter/drop database, though ignoreIfExists/ignoreIfNotExists become futile
    3.Rename getFunction to getMetadataOfFunction
    
    ## How was this patch tested?
    
    TestSuite
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rekhajoshm/spark SPARK-14410

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12183
    
----
commit e3677c9fa9697e0d34f9df52442085a6a481c9e9
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-05-05T23:10:08Z

    Merge pull request #1 from apache/master
    
    Pulling functionality from apache spark

commit 106fd8eee8f6a6f7c67cfc64f57c1161f76d8f75
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-05-08T21:49:09Z

    Merge pull request #2 from apache/master
    
    pull latest from apache spark

commit 0be142d6becba7c09c6eba0b8ea1efe83d649e8c
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-06-22T00:08:08Z

    Merge pull request #3 from apache/master
    
    Pulling functionality from apache spark

commit 6c6ee12fd733e3f9902e10faf92ccb78211245e3
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-09-17T01:03:09Z

    Merge pull request #4 from apache/master
    
    Pulling functionality from apache spark

commit b123c601e459d1ad17511fd91dd304032154882a
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-11-25T18:50:32Z

    Merge pull request #5 from apache/master
    
    pull request from apache/master

commit c73c32aadd6066e631956923725a48d98a18777e
Author: Rekha Joshi <re...@gmail.com>
Date:   2016-03-18T19:13:51Z

    Merge pull request #6 from apache/master
    
    pull latest from apache spark

commit 7dbf7320057978526635bed09dabc8cf8657a28a
Author: Rekha Joshi <re...@gmail.com>
Date:   2016-04-05T20:26:40Z

    Merge pull request #8 from apache/master
    
    pull latest from apache spark

commit 59c60a6d2d2d59f1a7656fd197442934fbc06a11
Author: Joshi <re...@gmail.com>
Date:   2016-04-05T21:37:45Z

    SessionCatalog function/database existence checks, metadata

commit e040216efc19f111371883989f8df5b34dc032db
Author: Joshi <re...@gmail.com>
Date:   2016-04-05T21:40:13Z

    SessionCatalog function/database existence checks, metadata

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14410] [SQL] : SessionCatalog needs to ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12183#issuecomment-206136314
  
    @rekhajoshm I've opened #12198 to illustrate what I mean. I don't think the changes in this patch are correct so I would recommend that we close this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14410] [SQL] : SessionCatalog needs to ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12183#issuecomment-205999728
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55025/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14410] [SQL] : SessionCatalog needs to ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/12183#issuecomment-206033483
  
    @rekhajoshm I think there is some misunderstanding on what the issue is. The JIRA is saying all the create/drop methods in `SessionCatalog` should take in a flag `ignoreIf[Not]Exists`. I believe actually create/drop functions are the only places where we need to do that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14410] [SQL] : SessionCatalog needs to ...

Posted by rekhajoshm <gi...@git.apache.org>.
Github user rekhajoshm commented on the pull request:

    https://github.com/apache/spark/pull/12183#issuecomment-206017437
  
    @yhuai - will do, one question is., as database calls had ignoreIfExists/ignoreIfNotExists flag, now with adding existence check this boolean loses meaning.Do we still pass that flag? If not, more changes across in clients.Please let me know.thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14410] [SQL] : SessionCatalog needs to ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12183#discussion_r58627907
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -84,15 +84,21 @@ class SessionCatalog(
       // ----------------------------------------------------------------------------
     
       def createDatabase(dbDefinition: CatalogDatabase, ignoreIfExists: Boolean): Unit = {
    -    externalCatalog.createDatabase(dbDefinition, ignoreIfExists)
    +    if (!databaseExists(dbDefinition.name)) {
    +      externalCatalog.createDatabase(dbDefinition, ignoreIfExists)
    +    }
    --- End diff --
    
    For createDatabase, we should throw an exception if db already exists and ignoreIfExists is false, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14410] [SQL] : SessionCatalog needs to ...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12183#issuecomment-206004627
  
    @rekhajoshm Thank you for working on this! I think there are other methods in SessionCatalog that handle an existing table/db/function that do not check the existence of the table/db/function (e.g. dropFunction). Can you go through methods provided by SessionCatalog and make changes accordingly?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14410] [SQL] : SessionCatalog needs to ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12183#discussion_r58631403
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -474,15 +479,8 @@ class SessionCatalog(
           // This function exists in the FunctionRegistry.
           true
         } else {
    -      // Need to check if this function exists in the metastore.
    -      try {
    -        // TODO: It's better to ask external catalog if this function exists.
    -        // So, we can avoid of having this hacky try/catch block.
    -        getFunction(name) != null
    -      } catch {
    -        case _: NoSuchFunctionException => false
    -        case _: AnalysisException => false // HiveExternalCatalog wraps all exceptions with it.
    -      }
    +      val db = name.database.getOrElse(currentDb)
    +      externalCatalog.listFunctions(db,name.unquotedString).size > 0
    --- End diff --
    
    No, the right thing to do here is to implement `functionExists` in the external catalog and `HiveClientImpl`. That's what the TODO meant.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14410] [SQL] : SessionCatalog needs to ...

Posted by rekhajoshm <gi...@git.apache.org>.
Github user rekhajoshm closed the pull request at:

    https://github.com/apache/spark/pull/12183


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14410] [SQL] : SessionCatalog needs to ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12183#discussion_r58631042
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -84,15 +84,21 @@ class SessionCatalog(
       // ----------------------------------------------------------------------------
     
       def createDatabase(dbDefinition: CatalogDatabase, ignoreIfExists: Boolean): Unit = {
    -    externalCatalog.createDatabase(dbDefinition, ignoreIfExists)
    +    if (!databaseExists(dbDefinition.name)) {
    +      externalCatalog.createDatabase(dbDefinition, ignoreIfExists)
    +    }
       }
     
       def dropDatabase(db: String, ignoreIfNotExists: Boolean, cascade: Boolean): Unit = {
    -    externalCatalog.dropDatabase(db, ignoreIfNotExists, cascade)
    +    if (databaseExists(db)) {
    +      externalCatalog.dropDatabase(db, ignoreIfNotExists, cascade)
    +    }
    --- End diff --
    
    This is now wrong. We used to pass the flag `ignoreIfNotExists` to the external catalog, which will handle the exception throwing. Now we'll ignore this flag if the database doesn't exist.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14410] [SQL] : SessionCatalog needs to ...

Posted by rekhajoshm <gi...@git.apache.org>.
Github user rekhajoshm commented on the pull request:

    https://github.com/apache/spark/pull/12183#issuecomment-206171895
  
    ok thanks @andrewor14 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org