Posted to reviews@spark.apache.org by felixcheung <gi...@git.apache.org> on 2017/03/30 16:28:07 UTC

[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

GitHub user felixcheung opened a pull request:

    https://github.com/apache/spark/pull/17483

    [SPARK-20159][SPARKR][SQL] Support all catalog API in R

    ## What changes were proposed in this pull request?
    
    Add a set of catalog APIs in R
    
    ## How was this patch tested?
    
    manual tests, unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/felixcheung/spark rcatalog

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17483
    
----
commit e4808b656172e5d2994e0159ac4c0e326de1cb8a
Author: Felix Cheung <fe...@hotmail.com>
Date:   2017-03-30T16:25:17Z

    Move catalog-base method into a new file, add catalog APIs, tests

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75451/
    Test PASSed.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75417/
    Test FAILed.




[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r109045414
  
    --- Diff: R/pkg/R/catalog.R ---
    @@ -0,0 +1,478 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# catalog.R: SparkSession catalog functions
    +
    +#' Create an external table
    +#'
    +#' Creates an external table based on the dataset in a data source and
    +#' returns a SparkDataFrame associated with the external table.
    +#'
    +#' The data source is specified by the \code{source} and a set of options (...).
    +#' If \code{source} is not specified, the default data source configured by
    +#' "spark.sql.sources.default" will be used.
    +#'
    +#' @param tableName the name of the table.
    +#' @param path the path of files to load.
    +#' @param source the name of the external data source.
    +#' @param schema the schema of the data, required by certain data sources.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return A SparkDataFrame.
    +#' @rdname createExternalTable
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sparkR.session()
    +#' df <- createExternalTable("myjson", path = "path/to/json", source = "json", schema)
    +#' }
    +#' @name createExternalTable
    +#' @method createExternalTable default
    +#' @note createExternalTable since 1.4.0
    +createExternalTable.default <- function(tableName, path = NULL, source = NULL, schema = NULL, ...) {
    +  sparkSession <- getSparkSession()
    +  options <- varargsToStrEnv(...)
    +  if (!is.null(path)) {
    +    options[["path"]] <- path
    +  }
    +  catalog <- callJMethod(sparkSession, "catalog")
    +  if (is.null(schema)) {
    +    sdf <- callJMethod(catalog, "createExternalTable", tableName, source, options)
    +  } else {
    +    sdf <- callJMethod(catalog, "createExternalTable", tableName, source, schema$jobj, options)
    +  }
    +  dataFrame(sdf)
    +}
    +
    +createExternalTable <- function(x, ...) {
    --- End diff --
    
    Just FYI, `createExternalTable` is deprecated. See the PR: https://github.com/apache/spark/pull/16528
    
    Let me make the corresponding changes in SQLContext too. 




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75422 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75422/testReport)** for PR 17483 at commit [`5093891`](https://github.com/apache/spark/commit/5093891e5a8fc0f299ebb4303ddb488e86f87221).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r109325537
  
    --- Diff: R/pkg/R/catalog.R ---
    @@ -0,0 +1,478 @@
    [...]
    +createExternalTable <- function(x, ...) {
    --- End diff --
    
    > If you drop a managed table, both data and metadata will be deleted; if you drop an external table, only the metadata is deleted. An external table is a way to protect data against accidental drop commands.
    
    Thus, it is a pretty important concept. The table could be either a Hive table or a Spark-native one.
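    
    For illustration, a minimal SparkR sketch of the difference (table names and the path are made up, and /tmp/ext is assumed to already hold parquet data):
    
    ```
    # Managed table: DROP TABLE removes both the metadata and the data files.
    df <- createDataFrame(faithful)
    saveAsTable(df, "managed_tbl")
    sql("DROP TABLE managed_tbl")    # data files are deleted too
    
    # External table: DROP TABLE removes only the metadata;
    # the files under the path are left untouched.
    ext <- createExternalTable("external_tbl", path = "/tmp/ext", source = "parquet")
    sql("DROP TABLE external_tbl")   # /tmp/ext still contains the data
    ```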




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75447/testReport)** for PR 17483 at commit [`3c66930`](https://github.com/apache/spark/commit/3c6693035b023d7c9c9e2caa3014247c130eb037).




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75393/testReport)** for PR 17483 at commit [`e4808b6`](https://github.com/apache/spark/commit/e4808b656172e5d2994e0159ac4c0e326de1cb8a).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r109082276
  
    --- Diff: R/pkg/R/catalog.R ---
    @@ -0,0 +1,478 @@
    [...]
    +createExternalTable <- function(x, ...) {
    --- End diff --
    
    Yes, I'm aware. I'm not sure we need to make changes to SQLContext - schema is required for the json source, but in Scala it's fine to use createTable instead.
    
    It makes sense to add it in R here though, because:
    - there is no createTable method in R
    - createTable sounds too generic and too much like an existing R method, so I wasn't sure it was a good idea to add it in R
    - since 2.0, createExternalTable is decoupled from SQLContext and SparkSession - it takes neither as a parameter and is invoked on the catalog
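    
    For example, a hypothetical usage sketch of the R API added here (the file path and field names are assumptions):
    
    ```
    sparkR.session()
    
    # json data carries no schema on disk, so one must be supplied explicitly.
    schema <- structType(structField("name", "string"),
                         structField("age", "integer"))
    people <- createExternalTable("people", path = "/tmp/people.json",
                                  source = "json", schema = schema)
    head(people)
    ```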





[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75422 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75422/testReport)** for PR 17483 at commit [`5093891`](https://github.com/apache/spark/commit/5093891e5a8fc0f299ebb4303ddb488e86f87221).




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75418 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75418/testReport)** for PR 17483 at commit [`28195b9`](https://github.com/apache/spark/commit/28195b98bf71c36b47e3e191b24715806f8aed6e).




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75447/
    Test FAILed.




[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r109290611
  
    --- Diff: R/pkg/R/catalog.R ---
    @@ -0,0 +1,478 @@
    [...]
    +createExternalTable <- function(x, ...) {
    --- End diff --
    
    Right, I was just concerned that with `data.table`, `read.table`, etc., table == data.frame in R, as opposed to a `hive table` or `managed table`, which could be fairly confusing.
    Anyway, I think I'll follow up with a PR for `createTable`. As of now `path` is optional for `createExternalTable`; even though that's potentially misleading, it does work.





[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r109278631
  
    --- Diff: R/pkg/R/utils.R ---
    @@ -846,6 +846,24 @@ captureJVMException <- function(e, method) {
         # Extract the first message of JVM exception.
         first <- strsplit(msg[2], "\r?\n\tat")[[1]][1]
         stop(paste0(rmsg, "analysis error - ", first), call. = FALSE)
    +  } else
    +    if (any(grep("org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: ", stacktrace))) {
    --- End diff --
    
    Yes. I knew it. See the JIRA: https://issues.apache.org/jira/browse/SPARK-19952. @hvanhovell plans to remove it in 2.3.0
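    
    For reference, a self-contained sketch of the translation pattern the diff adds (the function name, arguments, and message wording are assumptions, not the actual utils.R code):
    
    ```
    translateNoSuchDatabase <- function(stacktrace, rmsg) {
      marker <- "org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: "
      if (any(grep(marker, stacktrace, fixed = TRUE))) {
        msg <- strsplit(stacktrace, marker, fixed = TRUE)[[1]]
        # Keep only the first line of the JVM message, dropping the Java frames.
        first <- strsplit(msg[2], "\r?\n\tat")[[1]][1]
        stop(paste0(rmsg, "no such database - ", first), call. = FALSE)
      }
    }
    ```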




[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r109290624
  
    --- Diff: R/pkg/R/utils.R ---
    @@ -846,6 +846,24 @@ captureJVMException <- function(e, method) {
         # Extract the first message of JVM exception.
         first <- strsplit(msg[2], "\r?\n\tat")[[1]][1]
         stop(paste0(rmsg, "analysis error - ", first), call. = FALSE)
    +  } else
    +    if (any(grep("org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: ", stacktrace))) {
    --- End diff --
    
    ok, thanks




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    cc @gatorsmile for any SQL specific inputs
    
    @felixcheung I will take a look at this later today. Meanwhile, in the PR description can you note down the new functions being added in this change?




[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17483




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75393/
    Test FAILed.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75452/
    Test PASSed.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75451/testReport)** for PR 17483 at commit [`2aea0cb`](https://github.com/apache/spark/commit/2aea0cb0f9b55acf741788aa72a573853560f3d9).




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    @gatorsmile I added a line about recoverPartitions - I think we should also be clearer in the other language bindings?
    
    Also opened https://issues.apache.org/jira/browse/SPARK-20188





[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75417/testReport)** for PR 17483 at commit [`5ab5834`](https://github.com/apache/spark/commit/5ab583443d60f6bf1d85608552962b46ac088633).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r109276366
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -645,16 +645,17 @@ test_that("test tableNames and tables", {
       df <- read.json(jsonPath)
       createOrReplaceTempView(df, "table1")
       expect_equal(length(tableNames()), 1)
    -  tables <- tables()
    +  tables <- listTables()
    --- End diff --
    
    Right, there are some differences in the output (most notably, catalog.listTables returns a Dataset<Table>, but I'm converting that into a DataFrame anyway), and I thought list* would be more consistent with other methods like listColumns(), listDatabases().
    
    ```
    > head(tables("default"))
      database tableName isTemporary
    1  default      json       FALSE
    Warning message:
    'tables' is deprecated.
    Use 'listTables' instead.
    See help("Deprecated")
    > head(listTables("default"))
      name database description tableType isTemporary
    1 json  default        <NA>  EXTERNAL       FALSE
    ```
    
    If you think it makes sense, we could make `tables` an alias of `listTables` - it would call slightly different code on the Scala side, and the output has new columns and one renamed column.
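    
    If we went that route, a minimal sketch of the alias (the deprecation message is an assumption):
    
    ```
    # Hypothetical: keep tables() working but route it through listTables().
    tables <- function(databaseName = NULL) {
      .Deprecated("listTables")
      listTables(databaseName)
    }
    ```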





[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75422/
    Test PASSed.




[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r109278537
  
    --- Diff: R/pkg/R/catalog.R ---
    @@ -0,0 +1,478 @@
    [...]
    +createExternalTable <- function(x, ...) {
    --- End diff --
    
    `createExternalTable` is misleading now, because the table could be `managed` if users did not provide a value for `path`. Thus, we decided to rename it.
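    
    On the R side, the renamed entry point could look roughly like this (a hypothetical sketch; the real signature may differ):
    
    ```
    # Sketch of the rename: same behavior, clearer name
    # (without a path the table is managed, with one it is external).
    createTable <- function(tableName, path = NULL, source = NULL,
                            schema = NULL, ...) {
      createExternalTable(tableName, path = path, source = source,
                          schema = schema, ...)
    }
    ```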




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75393/testReport)** for PR 17483 at commit [`e4808b6`](https://github.com/apache/spark/commit/e4808b656172e5d2994e0159ac4c0e326de1cb8a).




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75420/testReport)** for PR 17483 at commit [`9c768ae`](https://github.com/apache/spark/commit/9c768ae983f8fbeed11b7c308ca7f5662f88d809).




[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r109290616
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -645,16 +645,17 @@ test_that("test tableNames and tables", {
       df <- read.json(jsonPath)
       createOrReplaceTempView(df, "table1")
       expect_equal(length(tableNames()), 1)
    -  tables <- tables()
    +  tables <- listTables()
    --- End diff --
    
    changed.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75452/testReport)** for PR 17483 at commit [`aff13a8`](https://github.com/apache/spark/commit/aff13a860282375a650f6323987c73364ed439cd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75452/testReport)** for PR 17483 at commit [`aff13a8`](https://github.com/apache/spark/commit/aff13a860282375a650f6323987c73364ed439cd).




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75451/testReport)** for PR 17483 at commit [`2aea0cb`](https://github.com/apache/spark/commit/2aea0cb0f9b55acf741788aa72a573853560f3d9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75420/
    Test FAILed.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Updated the PR description!




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    merged to master. thanks for the review!




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75447/testReport)** for PR 17483 at commit [`3c66930`](https://github.com/apache/spark/commit/3c6693035b023d7c9c9e2caa3014247c130eb037).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75420/testReport)** for PR 17483 at commit [`9c768ae`](https://github.com/apache/spark/commit/9c768ae983f8fbeed11b7c308ca7f5662f88d809).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75418 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75418/testReport)** for PR 17483 at commit [`28195b9`](https://github.com/apache/spark/commit/28195b98bf71c36b47e3e191b24715806f8aed6e).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r108974586
  
    --- Diff: R/pkg/R/catalog.R ---
    @@ -0,0 +1,478 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# catalog.R: SparkSession catalog functions
    +
    +#' Create an external table
    +#'
    +#' Creates an external table based on the dataset in a data source and
    +#' returns a SparkDataFrame associated with the external table.
    +#'
    +#' The data source is specified by the \code{source} and a set of options(...).
    +#' If \code{source} is not specified, the default data source configured by
    +#' "spark.sql.sources.default" will be used.
    +#'
    +#' @param tableName a name of the table.
    +#' @param path the path of files to load.
    +#' @param source the name of external data source.
    +#' @param schema the schema of the data for certain data source.
    --- End diff --
    
    this is added to createExternalTable (it won't work with the json source otherwise);
    everything else is simply moved as-is from SQLContext.R
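
    As a hedged illustration of why the schema parameter matters (not taken
    from the patch; the table name, path, and fields below are hypothetical):

        sparkR.session()
        # an explicit schema spares the json source from having to infer one
        schema <- structType(structField("name", "string"),
                             structField("age", "integer"))
        df <- createExternalTable("people", path = "path/to/json",
                                  source = "json", schema = schema)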




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    **[Test build #75417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75417/testReport)** for PR 17483 at commit [`5ab5834`](https://github.com/apache/spark/commit/5ab583443d60f6bf1d85608552962b46ac088633).




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75418/
    Test FAILed.




[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r109258117
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -2977,6 +2981,51 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
       expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
     })
     
    +test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", {
    +  expect_equal(currentDatabase(), "default")
    +  expect_error(setCurrentDatabase("default"), NA)
    +  expect_error(setCurrentDatabase("foo"),
    +               "Error in setCurrentDatabase : analysis error - Database 'foo' does not exist")
    +  dbs <- collect(listDatabases())
    +  expect_equal(names(dbs), c("name", "description", "locationUri"))
    +  expect_equal(dbs[[1]], "default")
    +})
    +
    +test_that("catalog APIs, listTables, listColumns, listFunctions", {
    +  tb <- listTables()
    +  count <- count(suppressWarnings(tables()))
    +  expect_equal(nrow(tb), count)
    +  expect_equal(colnames(tb), c("name", "database", "description", "tableType", "isTemporary"))
    +
    +  createOrReplaceTempView(as.DataFrame(cars), "cars")
    +
    +  tb <- listTables()
    +  expect_equal(nrow(tb), count + 1)
    +  tbs <- collect(tb)
    +  expect_true(nrow(tbs[tbs$name == "cars", ]) > 0)
    +  expect_error(listTables("bar"),
    +               "Error in listTables : no such database - Database 'bar' not found")
    +
    +  c <- listColumns("cars")
    +  expect_equal(nrow(c), 2)
    +  expect_equal(colnames(c),
    +               c("name", "description", "dataType", "nullable", "isPartition", "isBucket"))
    +  expect_equal(collect(c)[[1]][[1]], "speed")
    +  expect_error(listColumns("foo", "default"),
    +       "Error in listColumns : analysis error - Table 'foo' does not exist in database 'default'")
    +
    +  dropTempView("cars")
    +
    +  f <- listFunctions()
    +  expect_true(nrow(f) >= 200) # 250
    +  expect_equal(colnames(f),
    +               c("name", "database", "description", "className", "isTemporary"))
    +  expect_equal(take(orderBy(f, "className"), 1)$className,
    +               "org.apache.spark.sql.catalyst.expressions.Abs")
    +  expect_error(listFunctions("foo_db"),
    +               "Error in listFunctions : analysis error - Database 'foo_db' does not exist")
    +})
    --- End diff --
    
    We dont have tests for `recoverPartitions` `refreshByPath` and `refreshTable` ? 




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r109258347
  
    --- Diff: R/pkg/R/catalog.R ---
    @@ -0,0 +1,478 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# catalog.R: SparkSession catalog functions
    +
    +#' Create an external table
    +#'
    +#' Creates an external table based on the dataset in a data source and
    +#' returns a SparkDataFrame associated with the external table.
    +#'
    +#' The data source is specified by the \code{source} and a set of options(...).
    +#' If \code{source} is not specified, the default data source configured by
    +#' "spark.sql.sources.default" will be used.
    +#'
    +#' @param tableName a name of the table.
    +#' @param path the path of files to load.
    +#' @param source the name of external data source.
    +#' @param schema the schema of the data for certain data source.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return A SparkDataFrame.
    +#' @rdname createExternalTable
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sparkR.session()
    +#' df <- createExternalTable("myjson", path="path/to/json", source="json", schema)
    +#' }
    +#' @name createExternalTable
    +#' @method createExternalTable default
    +#' @note createExternalTable since 1.4.0
    +createExternalTable.default <- function(tableName, path = NULL, source = NULL, schema = NULL, ...) {
    +  sparkSession <- getSparkSession()
    +  options <- varargsToStrEnv(...)
    +  if (!is.null(path)) {
    +    options[["path"]] <- path
    +  }
    +  catalog <- callJMethod(sparkSession, "catalog")
    +  if (is.null(schema)) {
    +    sdf <- callJMethod(catalog, "createExternalTable", tableName, source, options)
    +  } else {
    +    sdf <- callJMethod(catalog, "createExternalTable", tableName, source, schema$jobj, options)
    +  }
    +  dataFrame(sdf)
    +}
    +
    +createExternalTable <- function(x, ...) {
    --- End diff --
    
    I agree that `createTable` sounds very general, but I don't think it's used by base R or any popular R package?
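
    One hedged way to spot-check that from an R session (an illustration
    only, not part of the patch):

        # does any loaded namespace already export a `createTable`?
        any(vapply(loadedNamespaces(),
                   function(ns) "createTable" %in% getNamespaceExports(ns),
                   logical(1)))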




[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    btw, @gatorsmile it looks like `listColumns` should throw `NoSuchTableException` and/or `NoSuchDatabaseException` instead of `AnalysisException` ([here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L148))
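
    A hedged sketch of how the current behavior surfaces in R (the table
    name is hypothetical; an active SparkSession is assumed):

        # capture the message raised for a missing table
        msg <- tryCatch(
          listColumns("no_such_table", "default"),
          error = function(e) conditionMessage(e)
        )
        # msg currently reports an "analysis error" rather than a
        # missing-table condition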





[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r109257813
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -645,16 +645,17 @@ test_that("test tableNames and tables", {
       df <- read.json(jsonPath)
       createOrReplaceTempView(df, "table1")
       expect_equal(length(tableNames()), 1)
    -  tables <- tables()
    +  tables <- listTables()
    --- End diff --
    
    is `tables()` deprecated now?
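
    For context, a hedged side-by-side of the two calls as the updated test
    uses them; the suppressWarnings() elsewhere in the tests suggests the
    old API now warns:

        new_tb <- listTables()                # catalog API from this PR
        old_tb <- suppressWarnings(tables())  # SQLContext-era listing
        collect(new_tb)                       # bring the listing to a local data.frame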




[GitHub] spark pull request #17483: [SPARK-20159][SPARKR][SQL] Support all catalog AP...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17483#discussion_r109291089
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -2977,6 +2981,51 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
       expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt"))
     })
     
    +test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", {
    +  expect_equal(currentDatabase(), "default")
    +  expect_error(setCurrentDatabase("default"), NA)
    +  expect_error(setCurrentDatabase("foo"),
    +               "Error in setCurrentDatabase : analysis error - Database 'foo' does not exist")
    +  dbs <- collect(listDatabases())
    +  expect_equal(names(dbs), c("name", "description", "locationUri"))
    +  expect_equal(dbs[[1]], "default")
    +})
    +
    +test_that("catalog APIs, listTables, listColumns, listFunctions", {
    +  tb <- listTables()
    +  count <- count(suppressWarnings(tables()))
    +  expect_equal(nrow(tb), count)
    +  expect_equal(colnames(tb), c("name", "database", "description", "tableType", "isTemporary"))
    +
    +  createOrReplaceTempView(as.DataFrame(cars), "cars")
    +
    +  tb <- listTables()
    +  expect_equal(nrow(tb), count + 1)
    +  tbs <- collect(tb)
    +  expect_true(nrow(tbs[tbs$name == "cars", ]) > 0)
    +  expect_error(listTables("bar"),
    +               "Error in listTables : no such database - Database 'bar' not found")
    +
    +  c <- listColumns("cars")
    +  expect_equal(nrow(c), 2)
    +  expect_equal(colnames(c),
    +               c("name", "description", "dataType", "nullable", "isPartition", "isBucket"))
    +  expect_equal(collect(c)[[1]][[1]], "speed")
    +  expect_error(listColumns("foo", "default"),
    +       "Error in listColumns : analysis error - Table 'foo' does not exist in database 'default'")
    +
    +  dropTempView("cars")
    +
    +  f <- listFunctions()
    +  expect_true(nrow(f) >= 200) # 250
    +  expect_equal(colnames(f),
    +               c("name", "database", "description", "className", "isTemporary"))
    +  expect_equal(take(orderBy(f, "className"), 1)$className,
    +               "org.apache.spark.sql.catalyst.expressions.Abs")
    +  expect_error(listFunctions("foo_db"),
    +               "Error in listFunctions : analysis error - Database 'foo_db' does not exist")
    +})
    --- End diff --
    
    sharp eyes :) I was planning to add tests.
    
    I tested these manually, but the steps are more involved, and since these are only thin wrappers in R, I think we should defer to the Scala tests.
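
    For reference, a hedged sketch of the kind of manual steps described;
    the table name and warehouse path are hypothetical:

        sql("CREATE TABLE t1 (a INT, b INT) USING parquet PARTITIONED BY (b)")
        sql("INSERT INTO t1 VALUES (1, 1)")
        recoverPartitions("t1")              # re-sync partitions from the file system
        refreshTable("t1")                   # invalidate cached metadata for the table
        refreshByPath("spark-warehouse/t1")  # invalidate cache for everything under a path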





[GitHub] spark issue #17483: [SPARK-20159][SPARKR][SQL] Support all catalog API in R

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17483
  
    Merged build finished. Test FAILed.

