Posted to commits@arrow.apache.org by np...@apache.org on 2022/05/12 21:04:49 UTC

[arrow-site] branch asf-site updated: Backfill R news for 8.0.0 release (#214)

This is an automated email from the ASF dual-hosted git repository.

npr pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new f3e8fe1793 Backfill R news for 8.0.0 release (#214)
f3e8fe1793 is described below

commit f3e8fe179397b59f7375560dd69a970d7872e67c
Author: Neal Richardson <ne...@gmail.com>
AuthorDate: Thu May 12 17:04:45 2022 -0400

    Backfill R news for 8.0.0 release (#214)
    
    https://github.com/apache/arrow/commit/526fa070c82c0e1c6d26a4c1d06a591b37c05011 apparently did not make it into the release tag
---
 docs/r/news/index.html | 108 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 96 insertions(+), 12 deletions(-)

diff --git a/docs/r/news/index.html b/docs/r/news/index.html
index 545c27d780..bd4fc2effc 100644
--- a/docs/r/news/index.html
+++ b/docs/r/news/index.html
@@ -128,27 +128,111 @@
     </div>
 
     <div class="section level2">
-<h2 class="page-header" data-toc-text="7.0.0.9000" id="arrow-7009000">arrow 7.0.0.9000<a class="anchor" aria-label="anchor" href="#arrow-7009000"></a></h2>
+<h2 class="page-header" data-toc-text="8.0.0" id="arrow-800">arrow 8.0.0<small>2022-05-09</small><a class="anchor" aria-label="anchor" href="#arrow-800"></a></h2>
+<div class="section level3">
+<h3 id="enhancements-to-dplyr-and-datasets-8-0-0">Enhancements to dplyr and datasets<a class="anchor" aria-label="anchor" href="#enhancements-to-dplyr-and-datasets-8-0-0"></a></h3>
 <ul><li>
-<code><a href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code>’s readr-style type <code>T</code> is now mapped to <code>timestamp(unit = "ns")</code> instead of <code>timestamp(unit = "s")</code>.</li>
+<code><a href="../reference/open_dataset.html">open_dataset()</a></code>:<ul><li>correctly supports the <code>skip</code> argument for skipping header rows in CSV datasets.</li>
+<li>can take a list of datasets with differing schemas and attempt to unify the schemas to produce a <code>UnionDataset</code>.</li>
+</ul></li>
+<li>Arrow <a href="https://dplyr.tidyverse.org" class="external-link">dplyr</a> queries:<ul><li>are supported on <code>RecordBatchReader</code>. This allows, for example, results from DuckDB to be streamed back into Arrow rather than materialized before continuing the pipeline.</li>
+<li>no longer need to materialize the entire result table before writing to a dataset if the query contains aggregations or joins.</li>
+<li>support <code><a href="https://dplyr.tidyverse.org/reference/rename.html" class="external-link">dplyr::rename_with()</a></code>.</li>
+<li>
+<code><a href="https://dplyr.tidyverse.org/reference/count.html" class="external-link">dplyr::count()</a></code> returns an ungrouped dataframe.</li>
+</ul></li>
+<li>
+<code>write_dataset</code> has more options for controlling row group and file sizes when writing partitioned datasets, such as <code>max_open_files</code>, <code>max_rows_per_file</code>, <code>min_rows_per_group</code>, and <code>max_rows_per_group</code>.</li>
 <li>
-<code>lubridate</code>:<ul><li>component extraction functions: <code>tz()</code> (timezone), <code>semester()</code> (semester), <code>dst()</code> (daylight savings time indicator), <code><a href="https://rdrr.io/r/base/date.html" class="external-link">date()</a></code> (extract date), <code>epiyear()</code> (epiyear), improvements to <code>month()</code>, which now works with integer inputs.</li>
-<li>Added <code>make_date()</code> &amp; <code>make_datetime()</code> + <code><a href="https://rdrr.io/r/base/ISOdatetime.html" class="external-link">ISOdatetime()</a></code> &amp; <code><a href="https://rdrr.io/r/base/ISOdatetime.html" class="external-link">ISOdate()</a></code> to create date-times from numeric representations.</li>
-<li>Added <code>decimal_date()</code> and <code>date_decimal()</code>
+<code>write_csv_arrow</code> accepts a <code>Dataset</code> or an Arrow dplyr query.</li>
+<li>Joining one or more datasets while <code>option(use_threads = FALSE)</code> no longer crashes R. That option is set by default on Windows.</li>
+<li>
+<code>dplyr</code> joins support the <code>suffix</code> argument to handle overlap in column names.</li>
+<li>Filtering a Parquet dataset with <code><a href="https://rdrr.io/r/base/NA.html" class="external-link">is.na()</a></code> no longer misses any rows.</li>
+<li>
+<code><a href="../reference/map_batches.html">map_batches()</a></code> correctly accepts <code>Dataset</code> objects.</li>
+</ul></div>
+<div class="section level3">
+<h3 id="enhancements-to-date-and-time-support-8-0-0">Enhancements to date and time support<a class="anchor" aria-label="anchor" href="#enhancements-to-date-and-time-support-8-0-0"></a></h3>
+<ul><li>
+<code><a href="../reference/read_delim_arrow.html">read_csv_arrow()</a></code>’s readr-style type <code>T</code> is mapped to <code>timestamp(unit = "ns")</code> instead of <code>timestamp(unit = "s")</code>.</li>
+<li>For Arrow dplyr queries, added additional <a href="https://lubridate.tidyverse.org" class="external-link">lubridate</a> features and fixes:<ul><li>New component extraction functions:<ul><li>
+<code><a href="https://lubridate.tidyverse.org/reference/tz.html" class="external-link">lubridate::tz()</a></code> (timezone),</li>
+<li>
+<code><a href="https://lubridate.tidyverse.org/reference/quarter.html" class="external-link">lubridate::semester()</a></code>,</li>
+<li>
+<code><a href="https://lubridate.tidyverse.org/reference/dst.html" class="external-link">lubridate::dst()</a></code> (daylight savings time boolean),</li>
+<li>
+<code><a href="https://lubridate.tidyverse.org/reference/date.html" class="external-link">lubridate::date()</a></code>,</li>
+<li>
+<code><a href="https://lubridate.tidyverse.org/reference/year.html" class="external-link">lubridate::epiyear()</a></code> (year according to epidemiological week calendar),</li>
+</ul></li>
+<li>
+<code><a href="https://lubridate.tidyverse.org/reference/month.html" class="external-link">lubridate::month()</a></code> works with integer inputs.</li>
+<li>
+<code><a href="https://lubridate.tidyverse.org/reference/make_datetime.html" class="external-link">lubridate::make_date()</a></code> &amp; <code><a href="https://lubridate.tidyverse.org/reference/make_datetime.html" class="external-link">lubridate::make_datetime()</a></code> + <code>lubridate::ISOdatetime()</code> &amp; <code>lubridate::ISOdate()</code> to create date-times from numeric representations.</li>
+<li>
+<code><a href="https://lubridate.tidyverse.org/reference/decimal_date.html" class="external-link">lubridate::decimal_date()</a></code> and <code><a href="https://lubridate.tidyverse.org/reference/date_decimal.html" class="external-link">lubridate::date_decimal()</a></code>
+</li>
+<li>
+<code><a href="https://lubridate.tidyverse.org/reference/make_difftime.html" class="external-link">lubridate::make_difftime()</a></code> (duration constructor)</li>
+<li>
+<code><a href="https://lubridate.tidyverse.org/reference/duration.html" class="external-link">?lubridate::duration</a></code> helper functions, such as <code>dyears()</code>, <code>dhours()</code>, <code>dseconds()</code>.</li>
+<li><code><a href="https://lubridate.tidyverse.org/reference/leap_year.html" class="external-link">lubridate::leap_year()</a></code></li>
+<li>
+<code><a href="https://lubridate.tidyverse.org/reference/as_date.html" class="external-link">lubridate::as_date()</a></code> and <code><a href="https://lubridate.tidyverse.org/reference/as_date.html" class="external-link">lubridate::as_datetime()</a></code>
 </li>
-<li>Added <code>make_difftime()</code> (duration constructor)</li>
-<li>Added duration helper functions: <code>dyears()</code>, <code>dmonths()</code>, <code>dweeks()</code>, <code>ddays()</code>, <code>dhours()</code>, <code>dminutes()</code>, <code>dseconds()</code>, <code>dmilliseconds()</code>, <code>dmicroseconds()</code>, <code>dnanoseconds()</code>.</li>
 </ul></li>
-<li>date-time functionality:<ul><li>Added <code>as_date()</code> and <code>as_datetime()</code>
+<li>Also for Arrow dplyr queries, added support and fixes for base date and time functions:<ul><li>
+<code><a href="https://rdrr.io/r/base/difftime.html" class="external-link">base::difftime</a></code> and <code><a href="https://rdrr.io/r/base/difftime.html" class="external-link">base::as.difftime()</a></code>
 </li>
-<li>Added <code>difftime</code> and <code><a href="https://rdrr.io/r/base/difftime.html" class="external-link">as.difftime()</a></code>
+<li>
+<code><a href="https://rdrr.io/r/base/as.Date.html" class="external-link">base::as.Date()</a></code> to convert to date</li>
+<li>Arrow timestamp and date arrays support <code><a href="https://rdrr.io/r/base/format.html" class="external-link">base::format()</a></code>
 </li>
-<li>Added <code><a href="https://rdrr.io/r/base/as.Date.html" class="external-link">as.Date()</a></code> to convert to date</li>
+<li>
+<code><a href="https://rdrr.io/r/base/strptime.html" class="external-link">strptime()</a></code> returns <code>NA</code> instead of erroring in case of format mismatch, just like <code><a href="https://rdrr.io/r/base/strptime.html" class="external-link">base::strptime()</a></code>.</li>
+</ul></li>
+<li>Timezone operations are supported on Windows if the <a href="https://cran.r-project.org/web/packages/tzdb/index.html" class="external-link">tzdb package</a> is also installed.</li>
+</ul></div>
+<div class="section level3">
+<h3 id="extensibility-8-0-0">Extensibility<a class="anchor" aria-label="anchor" href="#extensibility-8-0-0"></a></h3>
+<ul><li>Added S3 generic conversion functions such as <code><a href="../reference/as_arrow_array.html">as_arrow_array()</a></code> and <code><a href="../reference/as_arrow_table.html">as_arrow_table()</a></code> for main Arrow objects. This includes Arrow tables, record batches, arrays, chunked arrays, record batch readers, schemas, and data types. This allows other packages to define custom conversions from their types to Arrow objects, including extension arrays.</li>
+<li>Custom <a href="https://arrow.apache.org/docs/format/Columnar.html#extension-types" class="external-link">extension types and arrays</a> can be created and registered, allowing other packages to define their own array types. Extension arrays wrap regular Arrow array types and provide customized behavior and/or storage. See description and an example with <code><a href="../reference/new_extension_type.html">?new_extension_type</a></code>.</li>
+<li>Implemented a generic extension type and as_arrow_array() methods for all objects where<br><code><a href="https://vctrs.r-lib.org/reference/vec_assert.html" class="external-link">vctrs::vec_is()</a></code> returns TRUE (i.e., any object that can be used as a column in a <code><a href="https://tibble.tidyverse.org/reference/tibble.html" class="external-link">tibble::tibble()</a></code>), provided that the underlying <code><a href="https://vctrs.r-lib.org/reference/vec_data.html" class [...]
+</ul></div>
+<div class="section level3">
+<h3 id="concatenation-support-8-0-0">Concatenation Support<a class="anchor" aria-label="anchor" href="#concatenation-support-8-0-0"></a></h3>
+<p>Arrow arrays and tables can be easily concatenated:</p>
+<ul><li>Arrays can be concatenated with <code><a href="../reference/concat_arrays.html">concat_arrays()</a></code> or, if zero-copy is desired and chunking is acceptable, using <code>ChunkedArray$create()</code>.</li>
+<li>ChunkedArrays can be concatenated with <code><a href="https://rdrr.io/r/base/c.html" class="external-link">c()</a></code>.</li>
+<li>RecordBatches and Tables support <code><a href="https://rdrr.io/r/base/cbind.html" class="external-link">cbind()</a></code>.</li>
+<li>Tables support <code><a href="https://rdrr.io/r/base/cbind.html" class="external-link">rbind()</a></code>. <code><a href="../reference/concat_tables.html">concat_tables()</a></code> is also provided to concatenate tables while unifying schemas.</li>
+</ul></div>
+<div class="section level3">
+<h3 id="other-improvements-and-fixes-8-0-0">Other improvements and fixes<a class="anchor" aria-label="anchor" href="#other-improvements-and-fixes-8-0-0"></a></h3>
+<ul><li>Dictionary arrays support using ALTREP when converting to R factors.</li>
+<li>Math group generics are implemented for ArrowDatum. This means you can use base functions like <code><a href="https://rdrr.io/r/base/MathFun.html" class="external-link">sqrt()</a></code>, <code><a href="https://rdrr.io/r/base/Log.html" class="external-link">log()</a></code>, and <code><a href="https://rdrr.io/r/base/Log.html" class="external-link">exp()</a></code> with Arrow arrays and scalars.</li>
+<li>
+<code>read_*</code> and <code>write_*</code> functions support R Connection objects for reading and writing files.</li>
+<li>Parquet improvements:<ul><li>Parquet writer supports Duration type columns.</li>
+<li>The dataset Parquet reader consumes less memory.</li>
 </ul></li>
 <li>
-<code><a href="https://rdrr.io/r/stats/median.html" class="external-link">median()</a></code> and <code><a href="https://rdrr.io/r/stats/quantile.html" class="external-link">quantile()</a></code> will warn once about approximate calculations regardless of interactivity.</li>
-<li>Removed Solaris workarounds, libarrow is now required.</li>
+<code><a href="https://rdrr.io/r/stats/median.html" class="external-link">median()</a></code> and <code><a href="https://rdrr.io/r/stats/quantile.html" class="external-link">quantile()</a></code> will warn only once about approximate calculations regardless of interactivity.</li>
+<li>
+<code>Array$cast()</code> can cast StructArrays into another struct type with the same field names and structure (or a subset of fields) but different field types.</li>
+<li>Removed special handling for Solaris.</li>
+<li>The CSV writer is much faster when writing string columns.</li>
+<li>Fixed an issue where <code><a href="../reference/io_thread_count.html">set_io_thread_count()</a></code> would set the CPU count instead of the IO thread count.</li>
+<li>
+<code>RandomAccessFile</code> has a <code>$ReadMetadata()</code> method that provides useful metadata provided by the filesystem.</li>
+<li>
+<code>grepl</code> binding returns <code>FALSE</code> for <code>NA</code> inputs (previously it returned <code>NA</code>), to match the behavior of <code><a href="https://rdrr.io/r/base/grep.html" class="external-link">base::grepl()</a></code>.</li>
+<li>
+<code><a href="../reference/create_package_with_all_dependencies.html">create_package_with_all_dependencies()</a></code> works on Windows and macOS, instead of only on Linux.</li>
 </ul></div>
+</div>
     <div class="section level2">
 <h2 class="page-header" data-toc-text="7.0.0" id="arrow-700">arrow 7.0.0<small>2022-02-10</small><a class="anchor" aria-label="anchor" href="#arrow-700"></a></h2>
 <div class="section level3">