Posted to commits@arrow.apache.org by th...@apache.org on 2022/08/16 10:24:46 UTC

[arrow] branch master updated: MINOR: [R] remove duplication about hive-style file paths

This is an automated email from the ASF dual-hosted git repository.

thisisnic pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new d880d7517a MINOR: [R] remove duplication about hive-style file paths
d880d7517a is described below

commit d880d7517a33f2ac8ff259cad711bc210fd570c5
Author: François Michonneau <fr...@gmail.com>
AuthorDate: Tue Aug 16 11:24:32 2022 +0100

    MINOR: [R] remove duplication about hive-style file paths
    
    The datasets vignette explains self-describing (Hive-style) file paths twice.
    
    This PR removes the second explanation and adds the link to the Hive project where the style is first mentioned.
    
    Another small detail is that the months in the dataset (at least in the S3 bucket) use a single digit (e.g., `1` for January) while in the section removed by this PR they are listed with 2 digits (`01` for January).
    
    Closes #13844 from fmichonneau/rm-hive-duplication
    
    Authored-by: François Michonneau <fr...@gmail.com>
    Signed-off-by: Nic Crane <th...@gmail.com>
---
 r/vignettes/dataset.Rmd | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)
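
To illustrate the self-describing paths discussed in the commit message, and the single-digit versus two-digit month detail, here is a minimal sketch in plain Python. It is not how arrow detects partitions; `parse_hive_path` is a hypothetical helper written only for this note, and the two paths are the ones quoted in the diff below.

```python
from pathlib import PurePosixPath


def parse_hive_path(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a Hive-style path.

    Hypothetical helper for illustration only; not part of arrow.
    """
    partitions = {}
    for segment in PurePosixPath(path).parts[:-1]:  # skip the file name
        key, sep, value = segment.partition("=")
        if sep and key:  # keep only segments shaped like key=value
            partitions[key] = value
    return partitions


single = parse_hive_path("year=2009/month=1/data.parquet")
padded = parse_hive_path("year=2009/month=01/data.parquet")

print(single)  # {'year': '2009', 'month': '1'}
print(padded)  # {'year': '2009', 'month': '01'}

# Both spellings name the same month once converted to an integer,
# but they are different directory names on disk.
print(int(single["month"]) == int(padded["month"]))  # True
print(single["month"] == padded["month"])            # False
```

The partition names and values come from the path alone, which is why no column names need to be supplied for this layout. The last two lines show why the documented paths should match the directory names actually used in the dataset.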

diff --git a/r/vignettes/dataset.Rmd b/r/vignettes/dataset.Rmd
index 1a969f979c..0890d36ff4 100644
--- a/r/vignettes/dataset.Rmd
+++ b/r/vignettes/dataset.Rmd
@@ -126,7 +126,7 @@ For more information on the usage of these parameters, see `?read_delim_arrow()`
 
 `open_dataset()` was able to automatically infer column values for `year` and `month`
 --which are not present in the data files--based on the directory structure. The 
-Hive-style partitioning structure is self-describing, with file paths like
+[Hive](https://hive.apache.org/)-style partitioning structure is self-describing, with file paths like
 
 ```
 year=2009/month=1/data.parquet
@@ -185,20 +185,6 @@ month: int32
 ")
 ```
 
-The other form of partitioning currently supported is [Hive](https://hive.apache.org/)-style,
-in which the partition variable names are included in the path segments.
-If you had saved your files in paths like:
-
-```
-year=2009/month=01/data.parquet
-year=2009/month=02/data.parquet
-...
-```
-
-you would not have had to provide the names in `partitioning`;
-you could have just called `ds <- open_dataset("nyc-taxi")` and the partitions
-would have been detected automatically.
-
 ## Querying the dataset
 
 Up to this point, you haven't loaded any data. You've walked directories to find