Posted to commits@arrow.apache.org by np...@apache.org on 2022/10/13 13:18:55 UTC

[arrow] branch master updated: MINOR: [R][Docs] Add note about conversion from JSON types to Arrow types (#13871)

This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 66e8ba5a1e MINOR: [R][Docs] Add note about conversion from JSON types to Arrow types (#13871)
66e8ba5a1e is described below

commit 66e8ba5a1e07eaee19f040aa4df5a840614ed790
Author: eitsupi <50...@users.noreply.github.com>
AuthorDate: Thu Oct 13 22:18:49 2022 +0900

    MINOR: [R][Docs] Add note about conversion from JSON types to Arrow types (#13871)
    
    Add note about conversion from JSON types to Arrow types.
    These documents were copied from `docs/source/python/json.rst` with modifications.
    
    Also, show the data frame in the example to make it easier to understand how the conversion is performed.
    
    Authored-by: SHIMA Tatsuya <ts...@gmail.com>
    Signed-off-by: Neal Richardson <ne...@gmail.com>
---
 r/R/json.R               | 16 ++++++++++++++--
 r/man/read_json_arrow.Rd | 18 ++++++++++++++++--
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/r/R/json.R b/r/R/json.R
index 2b1f4916cb..c4061f066b 100644
--- a/r/R/json.R
+++ b/r/R/json.R
@@ -21,7 +21,19 @@
 #' data frame or Arrow Table.
 #'
 #' If passed a path, will detect and handle compression from the file extension
-#' (e.g. `.json.gz`). Accepts explicit or implicit nulls.
+#' (e.g. `.json.gz`).
+#'
+#' If `schema` is not provided, Arrow data types are inferred from the data:
+#' - JSON null values convert to the [null()] type, but can fall back to any other type.
+#' - JSON booleans convert to [boolean()].
+#' - JSON numbers convert to [int64()], falling back to [float64()] if a non-integer is encountered.
+#' - JSON strings of the kind "YYYY-MM-DD" and "YYYY-MM-DD hh:mm:ss" convert to [`timestamp(unit = "s")`][timestamp()],
+#'   falling back to [utf8()] if a conversion error occurs.
+#' - JSON arrays convert to a [list_of()] type, and inference proceeds recursively on the JSON arrays' values.
+#' - Nested JSON objects convert to a [struct()] type, and inference proceeds recursively on the JSON objects' values.
+#'
+#' When `as_data_frame = TRUE`, Arrow types are further converted to R types.
+#' See `vignette("arrow", package = "arrow")` for details.
 #'
 #' @inheritParams read_delim_arrow
 #' @param schema [Schema] that describes the table.
@@ -37,7 +49,7 @@
 #'     { "hello": 3.25, "world": null }
 #'     { "hello": 0.0, "world": true, "yo": null }
 #'   ', tf, useBytes = TRUE)
-#' df <- read_json_arrow(tf)
+#' read_json_arrow(tf)
 read_json_arrow <- function(file,
                             col_select = NULL,
                             as_data_frame = TRUE,
diff --git a/r/man/read_json_arrow.Rd b/r/man/read_json_arrow.Rd
index 2ad600725f..cc821c3301 100644
--- a/r/man/read_json_arrow.Rd
+++ b/r/man/read_json_arrow.Rd
@@ -41,7 +41,21 @@ data frame or Arrow Table.
 }
 \details{
 If passed a path, will detect and handle compression from the file extension
-(e.g. \code{.json.gz}). Accepts explicit or implicit nulls.
+(e.g. \code{.json.gz}).
+
+If \code{schema} is not provided, Arrow data types are inferred from the data:
+\itemize{
+\item JSON null values convert to the \code{\link[=null]{null()}} type, but can fall back to any other type.
+\item JSON booleans convert to \code{\link[=boolean]{boolean()}}.
+\item JSON numbers convert to \code{\link[=int64]{int64()}}, falling back to \code{\link[=float64]{float64()}} if a non-integer is encountered.
+\item JSON strings of the kind "YYYY-MM-DD" and "YYYY-MM-DD hh:mm:ss" convert to \code{\link[=timestamp]{timestamp(unit = "s")}},
+falling back to \code{\link[=utf8]{utf8()}} if a conversion error occurs.
+\item JSON arrays convert to a \code{\link[=list_of]{list_of()}} type, and inference proceeds recursively on the JSON arrays' values.
+\item Nested JSON objects convert to a \code{\link[=struct]{struct()}} type, and inference proceeds recursively on the JSON objects' values.
+}
+
+When \code{as_data_frame = TRUE}, Arrow types are further converted to R types.
+See \code{vignette("arrow", package = "arrow")} for details.
 }
 \examples{
 \dontshow{if (arrow_with_json()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
@@ -52,6 +66,6 @@ writeLines('
     { "hello": 3.25, "world": null }
     { "hello": 0.0, "world": true, "yo": null }
   ', tf, useBytes = TRUE)
-df <- read_json_arrow(tf)
+read_json_arrow(tf)
 \dontshow{\}) # examplesIf}
 }
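
The inference rules added to the documentation above can be checked by reading a small file with as_data_frame = FALSE and inspecting the schema of the resulting Table. The following is a minimal sketch, not part of the commit: the field names and values are made up for illustration, and inferred_type() is a hypothetical helper rather than an arrow function. It assumes an arrow build with JSON support (arrow_with_json() returns TRUE).

```r
library(arrow)

# Hypothetical sample data: field names and values are made up for illustration.
tf <- tempfile(fileext = ".json")
writeLines(c(
  '{ "n": 1, "x": 1, "ok": true, "when": "2022-10-13 13:18:55", "tags": ["a", "b"], "pt": { "lat": 1.5 } }',
  '{ "n": 2, "x": 2.5, "ok": false, "when": "2022-10-14 00:00:00", "tags": ["c"], "pt": { "lat": 2.0 } }'
), tf)

# Keep the result as an Arrow Table so the inferred Arrow types stay visible.
tab <- read_json_arrow(tf, as_data_frame = FALSE)
print(tab$schema)

# Hypothetical helper (not part of arrow): one column's inferred type as a string.
inferred_type <- function(table, name) {
  table$schema$GetFieldByName(name)$type$ToString()
}
inferred_type(tab, "n")  # only integers were seen in this column
inferred_type(tab, "x")  # a non-integer (2.5) appears in this column

# With the default as_data_frame = TRUE, the Arrow types are converted
# to R types and a data frame is returned instead of a Table.
df <- read_json_arrow(tf)
print(df)

unlink(tf)
```

Printing the schema rather than the data frame is a deliberate choice here: once the Table is converted to R types, the distinction between int64() and float64(), or between list_of() and struct(), is harder to see.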