Posted to commits@arrow.apache.org by ko...@apache.org on 2023/04/05 07:14:40 UTC

[arrow-site] branch main updated: Add Hugging Face Datasets to powered_by.md (#341)

This is an automated email from the ASF dual-hosted git repository.

kou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


The following commit(s) were added to refs/heads/main by this push:
     new 4768c3c0c07 Add Hugging Face Datasets to powered_by.md (#341)
4768c3c0c07 is described below

commit 4768c3c0c07103155759f29652b73a3b290dfa3d
Author: Christopher Akiki <ch...@gmail.com>
AuthorDate: Wed Apr 5 09:14:34 2023 +0200

    Add Hugging Face Datasets to powered_by.md (#341)
    
    This adds HF `datasets` to the list of projects using Arrow.
    (https://huggingface.co/docs/datasets/about_arrow)
---
 powered_by.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/powered_by.md b/powered_by.md
index ec368907654..fad37c2a676 100644
--- a/powered_by.md
+++ b/powered_by.md
@@ -123,7 +123,11 @@ short description of your use case.
 * **[HASH][39]:** HASH is an open-core platform for building, running, and learning
   from simulations, with an in-browser IDE. HASH Engine uses Apache Arrow to power
   the datastore for simulation state during computation, enabling zero-copy data
-  transfer between simulation logic written across Rust, JavaScript, and Python.
+* **[Hugging Face Datasets][47]:** A machine learning datasets library and hub
+  for accessing, processing and sharing datasets for audio, computer vision, 
+  natural language processing, and tabular tasks. Dataset objects are wrappers around 
+  Arrow Tables and memory-mapped from disk to support out-of-core parallel processing 
+  for machine learning workflows.
 * **[InAccel][29]:** A machine learning acceleration framework which leverages
   FPGAs-as-a-service. InAccel supports dataframes backed by Apache Arrow to
   serve as input for our implemented ML algorithms. Those dataframes can be
@@ -248,3 +252,4 @@ short description of your use case.
 [44]: https://clickhouse.com/docs/en/interfaces/formats/#data-format-arrow
 [45]: https://unum.cloud/ukv/
 [46]: https://github.com/GrepTimeTeam/greptimedb/
+[47]: https://github.com/huggingface/datasets
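
For context on the mechanism the new entry describes: Hugging Face `datasets` backs each Dataset with an Arrow table on disk and memory-maps it, so data is paged in on demand instead of being loaded into RAM. Below is a minimal pyarrow-only sketch of that memory-mapping pattern, not code from the commit or the `datasets` codebase; the file name "dataset.arrow" and the toy table are illustrative assumptions.

    import pyarrow as pa
    import pyarrow.ipc as ipc

    # Write a small table to an Arrow IPC file on disk.
    table = pa.table({"text": ["hello", "world"], "label": [0, 1]})
    with pa.OSFile("dataset.arrow", "wb") as sink:
        with ipc.new_file(sink, table.schema) as writer:
            writer.write_table(table)

    # Re-open the file memory-mapped: record batches are read as zero-copy
    # views over the mapping, which is what enables out-of-core processing.
    with pa.memory_map("dataset.arrow", "r") as source:
        mapped = ipc.open_file(source).read_all()
        print(mapped.num_rows, mapped.column("text")[0])

This only illustrates the underlying Arrow IPC/memory-map facility; the documentation linked in the commit message (https://huggingface.co/docs/datasets/about_arrow) describes how Dataset objects layer on top of such tables.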