Posted to commits@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/29 02:40:37 UTC

[GitHub] [arrow-site] liyafan82 commented on a change in pull request #127: Blog post for 5.0.0 release

liyafan82 commented on a change in pull request #127:
URL: https://github.com/apache/arrow-site/pull/127#discussion_r678780589



##########
File path: _posts/2021-07-20-5.0.0-release.md
##########
@@ -0,0 +1,264 @@
+---
+layout: post
+title: "Apache Arrow 5.0.0 Release"
+date: "2020-07-16 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+The Apache Arrow team is pleased to announce the 5.0.0 release. This covers
+3 months of development work and includes **XX commits** from
+[**XX distinct contributors**][1] in 2 repositories. See the Install Page to
+learn how to get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the complete changelogs for the [`apache/arrow`][2] and
+[`apache/arrow-rs`][3] repositories.
+
+## Community
+
+Since the 4.0.0 release, Daniël Heres, Kazuaki Ishizaki, Dominik Moritz, and Weston Pace
+have been invited as committers to Arrow,
+and Benjamin Kietzman and David Li have joined the Project Management Committee
+(PMC). Thank you for all of your contributions!
+
+## Columnar Format Notes
+
+Official IANA Media types (MIME types) have been registered for Apache
+Arrow IPC protocol data, both [stream]({{ site.baseurl }}/docs/format/Columnar.html#ipc-streaming-format)
+and [file]({{ site.baseurl }}/docs/format/Columnar.html#ipc-file-format) variants:
+
+* https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.stream
+* https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.file
+
+We recommend `.arrow` as the IPC file format file extension and `.arrows` for
+the IPC streaming format file extension.
+
+## Arrow Flight RPC notes
+
+The Go implementation now supports custom metadata and middleware, and has
+been added to integration testing.
+
+In Python, some operations can now be interrupted via Control-C.
+
+## C++ notes
+
+`MakeArrayFromScalar` now works for fixed-size binary types (ARROW-13321).
+
+### Compute layer
+
+The following [compute functions]({{site.baseurl}}/docs/cpp/compute.html)
+were added:
+
+* aggregations: `index`
+
+* scalar arithmetic and math functions: `abs`, `abs_checked`, `acos`,
+  `acos_checked`, `asin`, `asin_checked`, `atan`, `atan2`, `ceil`, `cos`,
+  `cos_checked`, `floor`, `ln`, `ln_checked`, `log10`, `log10_checked`,
+  `log1p`, `log1p_checked`, `log2`, `log2_checked`, `negate`, `negate_checked`,
+  `sign`, `sin`, `sin_checked`, `tan`, `tan_checked`, `trunc`
+
+* scalar bitwise functions: `bit_wise_and`, `bit_wise_not`, `bit_wise_or`,
+  `bit_wise_xor`, `shift_left`, `shift_left_checked`, `shift_right`,
+  `shift_right_checked`
+
+* scalar string functions: `ascii_center`, `ascii_lpad`, `ascii_reverse`,
+  `ascii_rpad`, `binary_join`, `binary_join_element_wise`,
+  `binary_replace_slice`, `count_substring`, `count_substring_regex`,
+  `ends_with`, `find_substring`, `find_substring_regex`, `match_like`,
+  `split_pattern_regex`, `starts_with`, `utf8_center`, `utf8_lpad`,
+  `utf8_replace_slice`, `utf8_rpad`, `utf8_reverse`, `utf8_slice_codepoints`
+
+* scalar temporal functions: `day`, `day_of_week`, `day_of_year`,
+  `iso_calendar`, `iso_week`, `iso_year`, `hour`, `microsecond`, `millisecond`,
+  `minute`, `month`, `nanosecond`, `quarter`, `second`, `subsecond`, `year`
+
+* other scalar functions: `case_when`, `coalesce`, `if_else`, `is_finite`,
+  `is_inf`, `is_nan`, `max_element_wise`, `min_element_wise`, `make_struct`
+
+* vector functions: `replace_with_mask`
+
+Duplicates are now allowed in `SetLookupOptions::value_set` (ARROW-12554).
+
+Decimal types are now supported by some basic arithmetic functions (ARROW-12074).
+
+The `take` function now supports dense unions (ARROW-13005).
+
+It is now possible to cast between dictionary types with different index
+types (ARROW-11673).
+
+Sorting is now implemented for boolean input (ARROW-12016).
+
+### CSV
+
+The streaming CSV reader can now take some advantage of multiple threads (ARROW-11889).
+
+The CSV reader tries to make its errors more informative by adding the
+row number when it is known, i.e. when parallel reading is disabled (ARROW-12675).
+
+A new option `ReaderOptions::skip_rows_after_names` allows skipping a number
+of rows _after_ reading the column names (as opposed to
+`ReaderOptions::skip_rows`).
+
+Quoted strings can now be treated as always non-null (ARROW-10115).
+
+### Dataset layer
+
+The asynchronous scanner introduced in 4.0.0 has been improved: truly
+asynchronous readers are now implemented for the CSV, Parquet, and IPC file
+formats, and file-level parallelism has been added.  This mode is controlled by
+a `use_async` flag that can be passed to methods that scan a dataset.  Setting
+this flag to True can significantly improve performance on filesystems that
+have high latency or benefit from parallel reads (e.g. S3).
+
+A CountRows method has been added to count rows matching a predicate; where
+possible, this will use metadata in files instead of reading the data itself.
+
+CSV datasets can now be written, and when reading a CSV dataset, explicit types can
+now be specified for a subset of columns while allowing the rest to still be inferred. 
+
+### IO and Filesystem layer
+
+The I/O thread pool size can now be adjusted at runtime (ARROW-12760).
+The default size remains 8 threads.
+
+Streams can now have auxiliary metadata, depending on the backend.  This
+has been implemented for the S3 filesystem, where a few metadata keys
+such as `Content-Type` and `ACL` are supported (ARROW-11161, ARROW-12719).
+
+The HadoopFileSystem implementation now implements the FileSystem abstraction
+more faithfully (ARROW-12790).
+
+### Parquet
+
+The new LZ4_RAW compression scheme was implemented (PARQUET-1998).
+Unlike the legacy LZ4 compression scheme, it is defined unambiguously
+and should provide better portability once other Parquet implementations
+catch up.
+
+## Go notes
+
+
+### Flight 
+
+* Flight Client and Server now support Custom Metadata through the functions `flight.NewClientWithMiddleware` and `flight.NewServerWithMiddleware`. Functions
+`flight.NewFlightClient`, `flight.NewFlightServer`, `flight.CreateServerBearerTokenAuthInterceptors` have been deprecated in favor of using the new middleware. [#10633](https://github.com/apache/arrow/pull/10633)
+* Flight Client `AuthHandler` no longer overwrites existing outgoing metadata; new metadata is now appended instead [#10297](https://github.com/apache/arrow/pull/10297)
+* Flight AppMetadata field is now exposed both for Reading and Writing via `flight.Reader#LatestAppMetadata()` and `flight.Writer#WriteWithAppMetadata` functions [#10142](https://github.com/apache/arrow/pull/10142)
+
+### Other enhancements 
+
+* [Map](https://github.com/apache/arrow/pull/10106) and [Extension](https://github.com/apache/arrow/pull/10203) Datatypes are now implemented for Arrow Arrays 
+* [Schema package](https://github.com/apache/arrow/pull/10071) and first part of [Encoding package](https://github.com/apache/arrow/pull/10379) added for Golang Parquet Implementation
+
+## Java notes
+
+Highlighted improvements and fixes:
+
+* Improved support for extension types using a complex storage type, e.g. struct, map or union. These can now extend
+the `ExtensionTypeVector` base class.
+* Union vectors now extend `AbstractContainerVector` to be consistent with other vectors.
+* Guava dependency updated to 30.1.1
+* Memory leak fixed if an exception occurs when reading IPC messages from a channel. [#10423](https://github.com/apache/arrow/pull/10423)
+* Flight error metadata is now propagated to the client. [#10370](https://github.com/apache/arrow/pull/10370)
+* JDBC adapter now preserves nullability. [#10285](https://github.com/apache/arrow/pull/10285)
+

Review comment:
       ```suggestion
   * The memory rounding policy is respected when allocating vector buffers. This helps save memory. [#13147](https://github.com/apache/arrow/pull/10576)
   ```
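
To illustrate why the suggested note matters, here is a minimal sketch in plain Python of how two rounding policies can turn the same requested size into different allocation sizes. It is not Arrow's Java API: the function names and the 1024-byte segment size are hypothetical, and it assumes a rounding policy is simply a function from requested bytes to allocated bytes.

```python
# Hypothetical helpers for illustration only; these are not Arrow functions.

def round_to_power_of_two(requested: int) -> int:
    """Round a positive request up to the next power of two."""
    if requested < 1:
        raise ValueError("requested must be positive")
    return 1 << (requested - 1).bit_length()


def round_to_segment(requested: int, segment: int = 1024) -> int:
    """Round a positive request up to the next multiple of `segment`."""
    if requested < 1 or segment < 1:
        raise ValueError("requested and segment must be positive")
    # Ceiling division without floating point.
    return -(-requested // segment) * segment


def wasted_bytes(requested: int, policy) -> int:
    """Bytes allocated beyond what was requested under `policy`."""
    return policy(requested) - requested


if __name__ == "__main__":
    for size in (1024, 4097):
        print(size,
              wasted_bytes(size, round_to_power_of_two),
              wasted_bytes(size, round_to_segment))
```

Under these assumptions, a buffer allocation that ignores the configured policy and always rounds to a power of two could over-allocate noticeably for sizes just above a power of two, which is presumably the saving the note refers to.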



