Posted to commits@arrow.apache.org by pa...@apache.org on 2023/06/22 19:58:13 UTC

[arrow-nanoarrow] branch main updated: feat(extensions/nanoarrow_device): Draft DeviceArray interface (#205)

This is an automated email from the ASF dual-hosted git repository.

paleolimbot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-nanoarrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 086793a  feat(extensions/nanoarrow_device): Draft DeviceArray interface (#205)
086793a is described below

commit 086793ae024c217f985583a55ce25bfa502407b4
Author: Dewey Dunnington <de...@dunnington.ca>
AuthorDate: Thu Jun 22 16:58:07 2023 -0300

    feat(extensions/nanoarrow_device): Draft DeviceArray interface (#205)
    
    After:
    
    - https://github.com/zeroshade/arrow-non-cpu/tree/main
    - https://lists.apache.org/thread/o2hsw7o1gm3qgw5z51rmz6zqxh0p7bvk
    - https://github.com/apache/arrow/pull/34972
    
    Still very much in draft form; however, it *does* implement arbitrary
    ArrowArray copy to/from `ARROW_DEVICE_METAL`, `ARROW_DEVICE_CUDA`,
    `ARROW_DEVICE_CUDA_HOST`, and `ARROW_DEVICE_CPU`.
    
    The nanoarrow_device extension as drafted here serves a similar purpose
    to nanoarrow: a means by which to create and consume the C ABI with the
    intention of shipping those structures to other libraries to do
    transformations, and potentially retrieving them again after the
    computation is complete. Perhaps another way to put it is that nanoarrow
    is designed to help at the edges: it can create and consume. Similarly,
    the nanoarrow_device extension is designed to help at the edges: it can
    copy/move arrays to and from CPU-land.
    
    With this PR, you can currently do something like:
    
    ```c
    struct ArrowDevice* gpu = ArrowDeviceMetalDefaultDevice();
    // Alternatively, ArrowDeviceCuda(ARROW_DEVICE_CUDA, 0)
    // or  ArrowDeviceCuda(ARROW_DEVICE_CUDA_HOST, 0)
    struct ArrowDevice* cpu = ArrowDeviceCpu();
    struct ArrowArray array;
    struct ArrowDeviceArray device_array;
    struct ArrowDeviceArrayView device_array_view;
    
    // Build a CPU array
    ASSERT_EQ(ArrowArrayInitFromType(&array, NANOARROW_TYPE_STRING), NANOARROW_OK);
    ASSERT_EQ(ArrowArrayStartAppending(&array), NANOARROW_OK);
    ASSERT_EQ(ArrowArrayAppendString(&array, ArrowCharView("abc")), NANOARROW_OK);
    ASSERT_EQ(ArrowArrayAppendString(&array, ArrowCharView("defg")), NANOARROW_OK);
    ASSERT_EQ(ArrowArrayAppendNull(&array, 1), NANOARROW_OK);
    ASSERT_EQ(ArrowArrayFinishBuildingDefault(&array, nullptr), NANOARROW_OK);
    
    // Convert to a DeviceArray, still on the CPU
    ArrowDeviceArrayInit(&device_array, cpu);
    ArrowArrayMove(&array, &device_array.array);
    
    // Parse contents into a view that can be copied to another device
    ArrowDeviceArrayViewInit(&device_array_view);
    ArrowArrayViewInitFromType(&device_array_view.array_view, string_type);
    ASSERT_EQ(ArrowDeviceArrayViewSetArray(&device_array_view, &device_array, nullptr),
              NANOARROW_OK);
    
    // Try to zero-copy move to another device or copy if that is not possible. Zero-copy move
    // is implemented for ARROW_DEVICE_METAL and ARROW_DEVICE_CUDA_HOST for the
    // gpu -> cpu case.
    struct ArrowDeviceArray device_array2;
    device_array2.array.release = nullptr;
    ASSERT_EQ(
        ArrowDeviceArrayTryMove(&device_array, &device_array_view, gpu, &device_array2),
        NANOARROW_OK);
    ```
    
    In concrete terms, that means we need to know enough about a device to
    (1) copy and/or move an arbitrary `ArrowArray`/`ArrowSchema` pair to a
    device from the CPU and (2) copy/move an arbitrary
    `ArrowDeviceArray`/`ArrowSchema` pair back to the CPU. The three types
    of copying I support (and maybe there could be fewer, or more may be
    needed) are:
    
    - `ArrowDeviceBufferInit()`: Make a non-owning buffer into an owning
    buffer on a device. The entry point if you want to take a slice of an
    `ArrowArrayView` and ship it to a device.
    - `ArrowDeviceBufferMove()`: Move an existing (owning) buffer to a
    device. For devices like the CPU, this is a true zero-copy move; for
    shared memory this can also sometimes be zero copy (e.g., Apple Metal ->
    CPU) but might also involve a copy.
    - `ArrowDeviceBufferCopy()`: Copy a section of a buffer into a
    preallocated section of another buffer. I'm envisioning this to be
    necessary when copying a String, Binary, or List array: we need the
    first and last values of the offsets buffer in order to know what
    portion of the data buffer to copy. It seems unnecessary to copy 4
    bytes of a buffer into an owning variant covered by the first bullet,
    but 🤷 .
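    The offsets arithmetic behind that last bullet can be sketched in
    isolation. This is a standalone illustration of why the offsets buffer
    is needed, not nanoarrow code; the offset values correspond to the
    "abc"/"defg"/null array built in the example above:
    
    ```c
    #include <stdint.h>
    #include <stdio.h>
    
    int main(void) {
      /* Offsets buffer for a string array holding "abc", "defg", null:
         element i occupies data bytes [offsets[i], offsets[i + 1]). */
      int32_t offsets[] = {0, 3, 7, 7};
    
      /* Suppose we want to copy the slice of elements 1..2 ("defg", null).
         The first and last offsets of the slice bound the portion of the
         data buffer that must be copied to the device. */
      int64_t slice_offset = 1;
      int64_t slice_length = 2;
      int32_t data_begin = offsets[slice_offset];
      int32_t data_end = offsets[slice_offset + slice_length];
    
      printf("copy data buffer bytes [%d, %d)\n", data_begin, data_end);
      return 0;
    }
    ```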
    
    This PR currently provides support for the CPU device, Apple Metal,
    CUDA, and CUDA_HOST (i.e., CPU memory that has been registered with
    CUDA, which CUDA copies under the hood).
    
    ---------
    
    Co-authored-by: Keith Kraus <ke...@gmail.com>
---
 .github/workflows/build-and-test-device.yaml       | 127 ++++++
 .github/workflows/bundle.yaml                      |   8 +
 ci/scripts/build-docs.sh                           |   7 +-
 ci/scripts/coverage.sh                             |  14 +
 docs/source/conf.py                                |   7 +-
 docs/source/index.rst                              |   4 +-
 docs/source/{ => reference}/c.rst                  |   0
 docs/source/{ => reference}/cpp.rst                |   0
 docs/source/{ipc.rst => reference/device.rst}      |  25 +-
 docs/source/{ => reference}/index.rst              |   9 +-
 docs/source/{ => reference}/ipc.rst                |   0
 docs/source/{index.rst => reference/r.rst}         |  14 +-
 extensions/nanoarrow_device/.gitignore             |  18 +
 extensions/nanoarrow_device/CMakeLists.txt         | 221 ++++++++++
 extensions/nanoarrow_device/CMakePresets.json      |  26 ++
 .../nanoarrow_device/CMakeUserPresets.json.example |  29 ++
 extensions/nanoarrow_device/README.md              |  68 +++
 extensions/nanoarrow_device/src/apidoc/.gitignore  |  18 +
 extensions/nanoarrow_device/src/apidoc/Doxyfile    | 407 +++++++++++++++++
 .../src/nanoarrow/nanoarrow_device.c               | 491 +++++++++++++++++++++
 .../src/nanoarrow/nanoarrow_device.h               | 334 ++++++++++++++
 .../src/nanoarrow/nanoarrow_device.hpp             | 123 ++++++
 .../src/nanoarrow/nanoarrow_device_cuda.c          | 403 +++++++++++++++++
 .../src/nanoarrow/nanoarrow_device_cuda.h          |  52 +++
 .../src/nanoarrow/nanoarrow_device_cuda_test.cc    | 230 ++++++++++
 .../src/nanoarrow/nanoarrow_device_hpp_test.cc     |  75 ++++
 .../src/nanoarrow/nanoarrow_device_metal.cc        | 389 ++++++++++++++++
 .../src/nanoarrow/nanoarrow_device_metal.h         |  85 ++++
 .../src/nanoarrow/nanoarrow_device_metal_test.cc   | 272 ++++++++++++
 .../src/nanoarrow/nanoarrow_device_test.cc         | 110 +++++
 src/nanoarrow/array.c                              |  13 +
 src/nanoarrow/array_inline.h                       |   2 +
 src/nanoarrow/nanoarrow.h                          |  10 +
 33 files changed, 3560 insertions(+), 31 deletions(-)

diff --git a/.github/workflows/build-and-test-device.yaml b/.github/workflows/build-and-test-device.yaml
new file mode 100644
index 0000000..dfb6e0a
--- /dev/null
+++ b/.github/workflows/build-and-test-device.yaml
@@ -0,0 +1,127 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: test-c-device
+
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
+    paths:
+      - 'CMakeLists.txt'
+      - '.github/workflows/build-and-test-device.yaml'
+      - 'src/nanoarrow/**'
+      - 'extensions/nanoarrow_device/**'
+
+jobs:
+  test-c-device:
+
+    runs-on: ubuntu-latest
+
+    name: ${{ matrix.config.label }}
+
+    strategy:
+      fail-fast: false
+      matrix:
+        config:
+          - {label: default-build}
+          - {label: namespaced-build, cmake_args: "-DNANOARROW_NAMESPACE=SomeUserNamespace"}
+          - {label: bundled-build, cmake_args: "-DNANOARROW_DEVICE_BUNDLE=ON"}
+
+    env:
+      SUBDIR: 'extensions/nanoarrow_device'
+      NANOARROW_ARROW_TESTING_DIR: '${{ github.workspace }}/arrow-testing'
+
+    steps:
+      - name: Checkout repo
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+
+      - name: Checkout arrow-testing
+        uses: actions/checkout@v3
+        with:
+          repository: apache/arrow-testing
+          fetch-depth: 0
+          path: arrow-testing
+
+      - name: Install dependencies
+        run: |
+          sudo apt-get update
+          sudo apt install -y -V ca-certificates lsb-release wget cmake valgrind
+          wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+          sudo apt-get install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+          sudo apt-get update
+          sudo apt-get install -y -V libarrow-dev
+          rm apache-arrow-apt-*.deb
+
+      - name: Build
+        run: |
+          cd $SUBDIR
+          export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`pwd`/dist/lib
+          sudo ldconfig
+          mkdir build
+          cd build
+          cmake .. -DCMAKE_BUILD_TYPE=Debug -DNANOARROW_DEVICE_BUILD_TESTS=ON ${{ matrix.config.cmake_args }}
+          cmake --build .
+
+      - name: Check for non-namespaced symbols in namespaced build
+        if: matrix.config.label == 'namespaced-build'
+        run: |
+          cd $SUBDIR
+
+          # Dump all symbols
+          nm --extern-only build/libnanoarrow_device.a
+
+          # Check for non-namespaced ones
+          ARROW_SYMBOLS=`nm --extern-only build/libnanoarrow_device.a | grep "T Arrow" || true`
+          if [ -z "$ARROW_SYMBOLS" ]; then
+            exit 0
+          fi
+
+          echo "Found the following non-namespaced extern symbols:"
+          echo $ARROW_SYMBOLS
+          exit 1
+
+      - name: Run tests
+        run: |
+          cd $SUBDIR
+
+          export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`pwd`/dist/lib
+          sudo ldconfig
+          cd build
+          ctest -T test --output-on-failure .
+
+      - name: Run tests with valgrind
+        if: matrix.config.label == 'default-build' || matrix.config.label == 'default-noatomics'
+        run: |
+          cd $SUBDIR
+
+          export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`pwd`/dist/lib
+          sudo ldconfig
+          cd build
+          ctest -T memcheck .
+
+      - name: Upload memcheck results
+        if: failure() && matrix.config.label == 'default-build'
+        uses: actions/upload-artifact@main
+        with:
+          name: nanoarrow-device-memcheck
+          path: extensions/nanoarrow_device/build/Testing/Temporary/MemoryChecker.*.log
diff --git a/.github/workflows/bundle.yaml b/.github/workflows/bundle.yaml
index 205ddbd..3084730 100644
--- a/.github/workflows/bundle.yaml
+++ b/.github/workflows/bundle.yaml
@@ -57,6 +57,14 @@ jobs:
           cmake --build .
           cmake --install . --prefix=../../../nanoarrow-latest
 
+      - name: Bundle nanoarrow_device
+        run: |
+          cd extensions/nanoarrow_device
+          mkdir build && cd build
+          cmake .. -DNANOARROW_DEVICE_BUNDLE=ON
+          cmake --build .
+          cmake --install . --prefix=../../../nanoarrow-latest
+
       - name: Compress bundle
         run: |
           zip nanoarrow-latest.zip $(find nanoarrow-latest -type f)
diff --git a/ci/scripts/build-docs.sh b/ci/scripts/build-docs.sh
index f39d992..5f12d08 100755
--- a/ci/scripts/build-docs.sh
+++ b/ci/scripts/build-docs.sh
@@ -59,18 +59,21 @@ main() {
    rm -rf docs/_build
    mkdir -p docs/_build
 
-   # Run doxygen
    show_header "Run Doxygen for C library"
    pushd src/apidoc
    doxygen
    popd
 
-   # Run doxygen
    show_header "Run Doxygen for IPC extension"
    pushd extensions/nanoarrow_ipc/src/apidoc
    doxygen
    popd
 
+   show_header "Run Doxygen for device extension"
+   pushd extensions/nanoarrow_device/src/apidoc
+   doxygen
+   popd
+
    pushd docs
 
    show_header "Build Sphinx project"
diff --git a/ci/scripts/coverage.sh b/ci/scripts/coverage.sh
index 1a2c78b..c5bf6bc 100755
--- a/ci/scripts/coverage.sh
+++ b/ci/scripts/coverage.sh
@@ -85,6 +85,20 @@ function main() {
 
     pushd "${SANDBOX_DIR}"
 
+    # Build + run tests with gcov for device extension
+    show_header "Build + test nanoarrow_device"
+    mkdir "${SANDBOX_DIR}/nanoarrow_device"
+    pushd "${SANDBOX_DIR}/nanoarrow_device"
+
+    cmake "${TARGET_NANOARROW_DIR}/extensions/nanoarrow_device" \
+        -DNANOARROW_DEVICE_BUILD_TESTS=ON -DNANOARROW_DEVICE_CODE_COVERAGE=ON
+    cmake --build .
+    ctest .
+
+    popd
+
+    pushd "${SANDBOX_DIR}"
+
     # Generate coverage.info file for both cmake projects using lcov
     show_header "Calculate CMake project coverage"
     lcov --capture --directory . \
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 9ef7e63..a55629a 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -52,7 +52,8 @@ extensions = [
 # Breathe configuration
 breathe_projects = {
     "nanoarrow_c": "../../src/apidoc/xml",
-    "nanoarrow_ipc": "../../extensions/nanoarrow_ipc/src/apidoc/xml"
+    "nanoarrow_ipc": "../../extensions/nanoarrow_ipc/src/apidoc/xml",
+    "nanoarrow_device": "../../extensions/nanoarrow_device/src/apidoc/xml"
 }
 breathe_default_project = "nanoarrow_c"
 
@@ -75,9 +76,7 @@ html_theme = 'pydata_sphinx_theme'
 html_theme_options = {
     "show_toc_level": 2,
     "use_edit_page_button": True,
-    "external_links": [
-      {"name": "R Package", "url": "r/index.html"},
-  ],
+    "external_links": [],
 }
 
 html_context = {
diff --git a/docs/source/index.rst b/docs/source/index.rst
index fa31569..84c1d1f 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -24,6 +24,4 @@ Contents
    :maxdepth: 2
 
    Getting Started <getting-started>
-   C API Reference <c>
-   C++ API Reference <cpp>
-   IPC Extension Reference <ipc>
+   API Reference <reference/index>
diff --git a/docs/source/c.rst b/docs/source/reference/c.rst
similarity index 100%
rename from docs/source/c.rst
rename to docs/source/reference/c.rst
diff --git a/docs/source/cpp.rst b/docs/source/reference/cpp.rst
similarity index 100%
rename from docs/source/cpp.rst
rename to docs/source/reference/cpp.rst
diff --git a/docs/source/ipc.rst b/docs/source/reference/device.rst
similarity index 66%
copy from docs/source/ipc.rst
copy to docs/source/reference/device.rst
index fb009c1..6ec6fa5 100644
--- a/docs/source/ipc.rst
+++ b/docs/source/reference/device.rst
@@ -15,19 +15,32 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-IPC Extension Reference
-=======================
+Device Extension Reference
+==========================
 
 C API
 ------------------------
 
-.. doxygengroup:: nanoarrow_ipc
-   :project: nanoarrow_ipc
+.. doxygengroup:: nanoarrow_device
+   :project: nanoarrow_device
    :members:
 
 C++ Helpers
 ------------------------
 
-.. doxygengroup:: nanoarrow_ipc_hpp-unique
-   :project: nanoarrow_ipc
+.. doxygengroup:: nanoarrow_device_hpp-unique
+   :project: nanoarrow_device
    :members:
+
+Arrow C Device Interface
+------------------------
+
+.. doxygengroup:: nanoarrow_device-arrow-cdata
+   :project: nanoarrow_device
+   :members:
+   :undoc-members:
+
+.. doxygengroup:: arrow-device-types
+   :project: nanoarrow_device
+   :members:
+   :undoc-members:
diff --git a/docs/source/index.rst b/docs/source/reference/index.rst
similarity index 91%
copy from docs/source/index.rst
copy to docs/source/reference/index.rst
index fa31569..56c4b50 100644
--- a/docs/source/index.rst
+++ b/docs/source/reference/index.rst
@@ -15,15 +15,14 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-.. include:: README_generated.rst
-
-Contents
---------
+API Reference
+=============
 
 .. toctree::
    :maxdepth: 2
 
-   Getting Started <getting-started>
+   R API Reference <r>
    C API Reference <c>
    C++ API Reference <cpp>
    IPC Extension Reference <ipc>
+   Device Extension Reference <device>
diff --git a/docs/source/ipc.rst b/docs/source/reference/ipc.rst
similarity index 100%
rename from docs/source/ipc.rst
rename to docs/source/reference/ipc.rst
diff --git a/docs/source/index.rst b/docs/source/reference/r.rst
similarity index 79%
copy from docs/source/index.rst
copy to docs/source/reference/r.rst
index fa31569..69becb1 100644
--- a/docs/source/index.rst
+++ b/docs/source/reference/r.rst
@@ -15,15 +15,7 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-.. include:: README_generated.rst
+R API Reference
+==================
 
-Contents
---------
-
-.. toctree::
-   :maxdepth: 2
-
-   Getting Started <getting-started>
-   C API Reference <c>
-   C++ API Reference <cpp>
-   IPC Extension Reference <ipc>
+See `R Documentation <../r/reference/index.html>`__.
diff --git a/extensions/nanoarrow_device/.gitignore b/extensions/nanoarrow_device/.gitignore
new file mode 100644
index 0000000..ed6ccba
--- /dev/null
+++ b/extensions/nanoarrow_device/.gitignore
@@ -0,0 +1,18 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+dist/
diff --git a/extensions/nanoarrow_device/CMakeLists.txt b/extensions/nanoarrow_device/CMakeLists.txt
new file mode 100644
index 0000000..336b75e
--- /dev/null
+++ b/extensions/nanoarrow_device/CMakeLists.txt
@@ -0,0 +1,221 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Building using CMake version: ${CMAKE_VERSION}")
+cmake_minimum_required(VERSION 3.14)
+include(FetchContent)
+
+if(NOT DEFINED CMAKE_C_STANDARD)
+  set(CMAKE_C_STANDARD 11)
+endif()
+
+project(nanoarrow_device)
+
+option(NANOARROW_DEVICE_BUILD_TESTS "Build tests" OFF)
+option(NANOARROW_DEVICE_BUNDLE "Create bundled nanoarrow_device.h and nanoarrow_device.c" OFF)
+option(NANOARROW_DEVICE_WITH_METAL "Build Apple metal extension" OFF)
+option(NANOARROW_DEVICE_WITH_CUDA "Build CUDA extension" OFF)
+
+
+option(NANOARROW_DEVICE_CODE_COVERAGE "Enable coverage reporting" OFF)
+add_library(device_coverage_config INTERFACE)
+
+if (NANOARROW_DEVICE_BUILD_TESTS OR NOT NANOARROW_DEVICE_BUNDLE)
+  # Add the nanoarrow dependency. nanoarrow is not linked into the
+  # nanoarrow_device library (the caller must link this themselves);
+  # however, we need nanoarrow.h to build nanoarrow_device.c.
+  FetchContent_Declare(
+    nanoarrow
+    SOURCE_DIR ${CMAKE_CURRENT_LIST_DIR}/../..)
+
+  # Don't install nanoarrow because of this configuration
+  FetchContent_GetProperties(nanoarrow)
+  if(NOT nanoarrow_POPULATED)
+    FetchContent_Populate(nanoarrow)
+    add_subdirectory(${nanoarrow_SOURCE_DIR} ${nanoarrow_BINARY_DIR} EXCLUDE_FROM_ALL)
+  endif()
+endif()
+
+if (NANOARROW_DEVICE_BUNDLE)
+  # The CMake build step is creating nanoarrow_device.c and nanoarrow_device.h;
+  # the CMake install step is copying them to a specific location
+  file(MAKE_DIRECTORY ${CMAKE_BINARY_DIR}/amalgamation)
+  file(MAKE_DIRECTORY ${CMAKE_BINARY_DIR}/amalgamation/nanoarrow)
+
+  # nanoarrow_device.h is currently standalone
+  set(NANOARROW_DEVICE_H_TEMP ${CMAKE_BINARY_DIR}/amalgamation/nanoarrow/nanoarrow_device.h)
+  file(READ src/nanoarrow/nanoarrow_device.h SRC_FILE_CONTENTS)
+  file(WRITE ${NANOARROW_DEVICE_H_TEMP} "${SRC_FILE_CONTENTS}")
+
+  # nanoarrow_device.c is currently standalone
+  set(NANOARROW_DEVICE_C_TEMP ${CMAKE_BINARY_DIR}/amalgamation/nanoarrow/nanoarrow_device.c)
+  file(READ src/nanoarrow/nanoarrow_device.c SRC_FILE_CONTENTS)
+  file(WRITE ${NANOARROW_DEVICE_C_TEMP} "${SRC_FILE_CONTENTS}")
+
+  # Add a library that the tests can link against (but don't install it)
+  if(NANOARROW_DEVICE_BUILD_TESTS)
+    add_library(nanoarrow_device ${NANOARROW_DEVICE_C_TEMP})
+
+    target_include_directories(nanoarrow_device PUBLIC
+      $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/src>
+      $<BUILD_INTERFACE:${nanoarrow_SOURCE_DIR}/src/nanoarrow>
+      $<BUILD_INTERFACE:${nanoarrow_BINARY_DIR}/generated>
+      $<BUILD_INTERFACE:${NANOARROW_DEVICE_FLATCC_INCLUDE_DIR}>)
+  endif()
+
+  # Install the amalgamated header and sources
+  install(FILES
+    ${NANOARROW_DEVICE_H_TEMP}
+    ${NANOARROW_DEVICE_C_TEMP}
+    DESTINATION ".")
+else()
+  # This is a normal CMake build that builds + installs some includes and a static lib
+  if (NANOARROW_DEVICE_WITH_METAL)
+    if (NOT EXISTS "${CMAKE_BINARY_DIR}/metal-cpp")
+      message(STATUS "Fetching metal-cpp")
+      file(DOWNLOAD
+        "https://developer.apple.com/metal/cpp/files/metal-cpp_macOS12_iOS15.zip"
+        "${CMAKE_BINARY_DIR}/metal-cpp.zip")
+      file(ARCHIVE_EXTRACT INPUT ${CMAKE_BINARY_DIR}/metal-cpp.zip DESTINATION ${CMAKE_BINARY_DIR})
+    endif()
+
+    if(NOT DEFINED CMAKE_CXX_STANDARD)
+      set(CMAKE_CXX_STANDARD 17)
+    endif()
+    set(CMAKE_CXX_STANDARD_REQUIRED ON)
+
+    find_library(METAL_LIBRARY Metal REQUIRED)
+    message(STATUS "Metal framework found at '${METAL_LIBRARY}'")
+
+    find_library(FOUNDATION_LIBRARY Foundation REQUIRED)
+    message(STATUS "Foundation framework found at '${FOUNDATION_LIBRARY}'")
+
+    find_library(QUARTZ_CORE_LIBRARY QuartzCore REQUIRED)
+    message(STATUS "CoreFoundation framework found at '${QUARTZ_CORE_LIBRARY}'")
+
+    set(NANOARROW_DEVICE_SOURCES_METAL src/nanoarrow/nanoarrow_device_metal.cc)
+    set(NANOARROW_DEVICE_INCLUDE_METAL ${CMAKE_BINARY_DIR}/metal-cpp)
+    set(NANOARROW_DEVICE_LIBS_METAL ${METAL_LIBRARY} ${FOUNDATION_LIBRARY} ${QUARTZ_CORE_LIBRARY})
+    set(NANOARROW_DEVICE_DEFS_METAL "NANOARROW_DEVICE_WITH_METAL")
+  endif()
+
+  if (NANOARROW_DEVICE_WITH_CUDA)
+    find_package(CUDAToolkit REQUIRED)
+    set(NANOARROW_DEVICE_SOURCES_CUDA src/nanoarrow/nanoarrow_device_cuda.c)
+    set(NANOARROW_DEVICE_LIBS_CUDA CUDA::cudart_static)
+    set(NANOARROW_DEVICE_DEFS_CUDA "NANOARROW_DEVICE_WITH_CUDA")
+  endif()
+
+  add_library(nanoarrow_device
+    src/nanoarrow/nanoarrow_device.c
+    ${NANOARROW_DEVICE_SOURCES_METAL}
+    ${NANOARROW_DEVICE_SOURCES_CUDA})
+
+  target_include_directories(nanoarrow_device PUBLIC
+        $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/src/nanoarrow>
+        $<BUILD_INTERFACE:${nanoarrow_SOURCE_DIR}/src/nanoarrow>
+        $<BUILD_INTERFACE:${nanoarrow_BINARY_DIR}/generated>
+        $<BUILD_INTERFACE:${NANOARROW_DEVICE_INCLUDE_METAL}>
+        $<INSTALL_INTERFACE:include>)
+
+  target_compile_definitions(nanoarrow_device PRIVATE
+    ${NANOARROW_DEVICE_DEFS_METAL}
+    ${NANOARROW_DEVICE_DEFS_CUDA})
+  target_link_libraries(nanoarrow_device PUBLIC
+    ${NANOARROW_DEVICE_LIBS_METAL}
+    ${NANOARROW_DEVICE_LIBS_CUDA})
+
+  install(TARGETS nanoarrow_device DESTINATION lib)
+  install(
+    FILES
+      src/nanoarrow/nanoarrow_device.h
+    DESTINATION include/nanoarrow)
+
+endif()
+
+
+
+if (NANOARROW_DEVICE_BUILD_TESTS)
+  set(MEMORYCHECK_COMMAND_OPTIONS "--leak-check=full --suppressions=${CMAKE_CURRENT_LIST_DIR}/../../valgrind.supp --error-exitcode=1")
+  include(CTest)
+  include(FetchContent)
+
+  if(NOT DEFINED CMAKE_CXX_STANDARD)
+    set(CMAKE_CXX_STANDARD 11)
+  endif()
+  set(CMAKE_CXX_STANDARD_REQUIRED ON)
+
+  # Warning about timestamps of downloaded files
+  if (${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.23")
+      cmake_policy(SET CMP0135 NEW)
+  endif()
+
+  # Use an old version of googletest if we have to to support gcc 4.8
+  if(NOT CMAKE_CXX_COMPILER_ID STREQUAL "GNU" OR
+    CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL "5.0.0")
+    FetchContent_Declare(
+        googletest
+        URL https://github.com/google/googletest/archive/release-1.11.0.zip
+        URL_HASH SHA256=353571c2440176ded91c2de6d6cd88ddd41401d14692ec1f99e35d013feda55a
+    )
+  else()
+    FetchContent_Declare(
+        googletest
+        URL https://github.com/google/googletest/archive/release-1.10.0.zip
+        URL_HASH SHA256=94c634d499558a76fa649edb13721dce6e98fb1e7018dfaeba3cd7a083945e91
+    )
+  endif()
+
+  FetchContent_MakeAvailable(googletest)
+
+  enable_testing()
+  add_executable(nanoarrow_device_test src/nanoarrow/nanoarrow_device_test.cc)
+  add_executable(nanoarrow_device_hpp_test src/nanoarrow/nanoarrow_device_hpp_test.cc)
+
+  if(NANOARROW_DEVICE_CODE_COVERAGE)
+    target_compile_options(device_coverage_config INTERFACE -O0 -g --coverage)
+    target_link_options(device_coverage_config INTERFACE --coverage)
+    target_link_libraries(nanoarrow_device PRIVATE device_coverage_config)
+  endif()
+
+  target_link_libraries(
+    nanoarrow_device_test
+    nanoarrow_device nanoarrow gtest_main device_coverage_config)
+  target_link_libraries(
+    nanoarrow_device_hpp_test
+    nanoarrow_device nanoarrow gtest_main device_coverage_config)
+
+  include(GoogleTest)
+  gtest_discover_tests(nanoarrow_device_test)
+  gtest_discover_tests(nanoarrow_device_hpp_test)
+
+  if (NANOARROW_DEVICE_WITH_METAL)
+    add_executable(nanoarrow_device_metal_test src/nanoarrow/nanoarrow_device_metal_test.cc)
+    target_link_libraries(
+      nanoarrow_device_metal_test
+      nanoarrow_device nanoarrow gtest_main device_coverage_config)
+    gtest_discover_tests(nanoarrow_device_metal_test)
+  endif()
+
+  if (NANOARROW_DEVICE_WITH_CUDA)
+    add_executable(nanoarrow_device_cuda_test src/nanoarrow/nanoarrow_device_cuda_test.cc)
+    target_link_libraries(
+      nanoarrow_device_cuda_test
+      nanoarrow_device nanoarrow gtest_main device_coverage_config)
+    gtest_discover_tests(nanoarrow_device_cuda_test)
+  endif()
+endif()
diff --git a/extensions/nanoarrow_device/CMakePresets.json b/extensions/nanoarrow_device/CMakePresets.json
new file mode 100644
index 0000000..f1a8ddd
--- /dev/null
+++ b/extensions/nanoarrow_device/CMakePresets.json
@@ -0,0 +1,26 @@
+{
+    "version": 3,
+    "cmakeMinimumRequired": {
+        "major": 3,
+        "minor": 21,
+        "patch": 0
+    },
+    "configurePresets": [
+        {
+            "name": "default",
+            "displayName": "Default Config",
+            "cacheVariables": {}
+        },
+        {
+            "name": "default-with-tests",
+            "inherits": [
+                "default"
+            ],
+            "displayName": "Default with tests",
+            "cacheVariables": {
+                "CMAKE_BUILD_TYPE": "Debug",
+                "NANOARROW_DEVICE_BUILD_TESTS": "ON"
+            }
+        }
+    ]
+}
diff --git a/extensions/nanoarrow_device/CMakeUserPresets.json.example b/extensions/nanoarrow_device/CMakeUserPresets.json.example
new file mode 100644
index 0000000..8cf26f0
--- /dev/null
+++ b/extensions/nanoarrow_device/CMakeUserPresets.json.example
@@ -0,0 +1,29 @@
+{
+    "version": 3,
+    "cmakeMinimumRequired": {
+      "major": 3,
+      "minor": 21,
+      "patch": 0
+    },
+    "configurePresets": [
+        {
+          "name": "user-local",
+          "inherits": ["default-with-tests"],
+          "displayName": "(user) local build",
+          "cacheVariables": {
+
+          }
+        }
+    ],
+    "testPresets": [
+      {
+          "name": "user-test-preset",
+          "description": "",
+          "displayName": "(user) test preset)",
+          "configurePreset": "user-local",
+          "environment": {
+            "CTEST_OUTPUT_ON_FAILURE": "1"
+          }
+      }
+    ]
+}
diff --git a/extensions/nanoarrow_device/README.md b/extensions/nanoarrow_device/README.md
new file mode 100644
index 0000000..e4fd6f6
--- /dev/null
+++ b/extensions/nanoarrow_device/README.md
@@ -0,0 +1,68 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# nanoarrow device extension
+
+This extension provides a similar set of tools as the core nanoarrow C API
+extended to the
+[Arrow C Device](https://arrow.apache.org/docs/dev/format/CDeviceDataInterface.html)
+interfaces in the Arrow specification.
+
+Currently, this extension provides an implementation for CUDA devices
+and an implementation for the default Apple Metal device on macOS/M1.
+These implementations are preliminary/experimental and are under active
+development.
+
+## Example
+
+```c
+struct ArrowDevice* gpu = ArrowDeviceMetalDefaultDevice();
+// Alternatively, ArrowDeviceCuda(ARROW_DEVICE_CUDA, 0)
+// or  ArrowDeviceCuda(ARROW_DEVICE_CUDA_HOST, 0)
+struct ArrowDevice* cpu = ArrowDeviceCpu();
+struct ArrowArray array;
+struct ArrowDeviceArray device_array;
+struct ArrowDeviceArrayView device_array_view;
+
+// Build a CPU array
+ASSERT_EQ(ArrowArrayInitFromType(&array, NANOARROW_TYPE_STRING), NANOARROW_OK);
+ASSERT_EQ(ArrowArrayStartAppending(&array), NANOARROW_OK);
+ASSERT_EQ(ArrowArrayAppendString(&array, ArrowCharView("abc")), NANOARROW_OK);
+ASSERT_EQ(ArrowArrayAppendString(&array, ArrowCharView("defg")), NANOARROW_OK);
+ASSERT_EQ(ArrowArrayAppendNull(&array, 1), NANOARROW_OK);
+ASSERT_EQ(ArrowArrayFinishBuildingDefault(&array, nullptr), NANOARROW_OK);
+
+// Convert to a DeviceArray, still on the CPU
+ASSERT_EQ(ArrowDeviceArrayInit(cpu, &device_array, &array), NANOARROW_OK);
+
+// Parse contents into a view that can be copied to another device
+ArrowDeviceArrayViewInit(&device_array_view);
+ArrowArrayViewInitFromType(&device_array_view.array_view, NANOARROW_TYPE_STRING);
+ASSERT_EQ(ArrowDeviceArrayViewSetArray(&device_array_view, &device_array, nullptr),
+          NANOARROW_OK);
+
+// Copy to another device. For some devices, ArrowDeviceArrayMoveToDevice() is
+// possible without an explicit copy (although this sometimes triggers an implicit
+// copy by the driver).
+struct ArrowDeviceArray device_array2;
+device_array2.array.release = nullptr;
+ASSERT_EQ(
+    ArrowDeviceArrayViewCopy(&device_array, &device_array_view, gpu, &device_array2),
+    NANOARROW_OK);
+```
diff --git a/extensions/nanoarrow_device/src/apidoc/.gitignore b/extensions/nanoarrow_device/src/apidoc/.gitignore
new file mode 100644
index 0000000..8e6b490
--- /dev/null
+++ b/extensions/nanoarrow_device/src/apidoc/.gitignore
@@ -0,0 +1,18 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+xml
diff --git a/extensions/nanoarrow_device/src/apidoc/Doxyfile b/extensions/nanoarrow_device/src/apidoc/Doxyfile
new file mode 100644
index 0000000..8cafd19
--- /dev/null
+++ b/extensions/nanoarrow_device/src/apidoc/Doxyfile
@@ -0,0 +1,407 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+# Doxyfile 1.9.4
+
+#---------------------------------------------------------------------------
+# Project related configuration options
+#---------------------------------------------------------------------------
+DOXYFILE_ENCODING      = UTF-8
+PROJECT_NAME           = "nanoarrow_device"
+PROJECT_NUMBER         =
+PROJECT_BRIEF          =
+PROJECT_LOGO           =
+OUTPUT_DIRECTORY       =
+CREATE_SUBDIRS         = NO
+CREATE_SUBDIRS_LEVEL   = 8
+ALLOW_UNICODE_NAMES    = NO
+OUTPUT_LANGUAGE        = English
+BRIEF_MEMBER_DESC      = YES
+REPEAT_BRIEF           = YES
+ABBREVIATE_BRIEF       = "The $name class" \
+                         "The $name widget" \
+                         "The $name file" \
+                         is \
+                         provides \
+                         specifies \
+                         contains \
+                         represents \
+                         a \
+                         an \
+                         the
+ALWAYS_DETAILED_SEC    = NO
+INLINE_INHERITED_MEMB  = NO
+FULL_PATH_NAMES        = YES
+STRIP_FROM_PATH        =
+STRIP_FROM_INC_PATH    =
+SHORT_NAMES            = NO
+JAVADOC_AUTOBRIEF      = NO
+JAVADOC_BANNER         = NO
+QT_AUTOBRIEF           = NO
+MULTILINE_CPP_IS_BRIEF = NO
+PYTHON_DOCSTRING       = YES
+INHERIT_DOCS           = YES
+SEPARATE_MEMBER_PAGES  = NO
+TAB_SIZE               = 4
+ALIASES                =
+OPTIMIZE_OUTPUT_FOR_C  = NO
+OPTIMIZE_OUTPUT_JAVA   = NO
+OPTIMIZE_FOR_FORTRAN   = NO
+OPTIMIZE_OUTPUT_VHDL   = NO
+OPTIMIZE_OUTPUT_SLICE  = NO
+EXTENSION_MAPPING      =
+MARKDOWN_SUPPORT       = YES
+TOC_INCLUDE_HEADINGS   = 5
+AUTOLINK_SUPPORT       = YES
+BUILTIN_STL_SUPPORT    = NO
+CPP_CLI_SUPPORT        = NO
+SIP_SUPPORT            = NO
+IDL_PROPERTY_SUPPORT   = YES
+DISTRIBUTE_GROUP_DOC   = NO
+GROUP_NESTED_COMPOUNDS = NO
+SUBGROUPING            = YES
+INLINE_GROUPED_CLASSES = NO
+INLINE_SIMPLE_STRUCTS  = NO
+TYPEDEF_HIDES_STRUCT   = NO
+LOOKUP_CACHE_SIZE      = 0
+NUM_PROC_THREADS       = 1
+#---------------------------------------------------------------------------
+# Build related configuration options
+#---------------------------------------------------------------------------
+EXTRACT_ALL            = NO
+EXTRACT_PRIVATE        = NO
+EXTRACT_PRIV_VIRTUAL   = NO
+EXTRACT_PACKAGE        = NO
+EXTRACT_STATIC         = NO
+EXTRACT_LOCAL_CLASSES  = YES
+EXTRACT_LOCAL_METHODS  = NO
+EXTRACT_ANON_NSPACES   = NO
+RESOLVE_UNNAMED_PARAMS = YES
+HIDE_UNDOC_MEMBERS     = NO
+HIDE_UNDOC_CLASSES     = NO
+HIDE_FRIEND_COMPOUNDS  = NO
+HIDE_IN_BODY_DOCS      = NO
+INTERNAL_DOCS          = NO
+CASE_SENSE_NAMES       = NO
+HIDE_SCOPE_NAMES       = NO
+HIDE_COMPOUND_REFERENCE= NO
+SHOW_HEADERFILE        = YES
+SHOW_INCLUDE_FILES     = YES
+SHOW_GROUPED_MEMB_INC  = NO
+FORCE_LOCAL_INCLUDES   = NO
+INLINE_INFO            = YES
+SORT_MEMBER_DOCS       = YES
+SORT_BRIEF_DOCS        = NO
+SORT_MEMBERS_CTORS_1ST = NO
+SORT_GROUP_NAMES       = NO
+SORT_BY_SCOPE_NAME     = NO
+STRICT_PROTO_MATCHING  = NO
+GENERATE_TODOLIST      = YES
+GENERATE_TESTLIST      = YES
+GENERATE_BUGLIST       = YES
+GENERATE_DEPRECATEDLIST= YES
+ENABLED_SECTIONS       =
+MAX_INITIALIZER_LINES  = 30
+SHOW_USED_FILES        = YES
+SHOW_FILES             = YES
+SHOW_NAMESPACES        = YES
+FILE_VERSION_FILTER    =
+LAYOUT_FILE            =
+CITE_BIB_FILES         =
+#---------------------------------------------------------------------------
+# Configuration options related to warning and progress messages
+#---------------------------------------------------------------------------
+QUIET                  = NO
+WARNINGS               = YES
+WARN_IF_UNDOCUMENTED   = YES
+WARN_IF_DOC_ERROR      = YES
+WARN_IF_INCOMPLETE_DOC = YES
+WARN_NO_PARAMDOC       = NO
+WARN_AS_ERROR          = NO
+WARN_FORMAT            = "$file:$line: $text"
+WARN_LINE_FORMAT       = "at line $line of file $file"
+WARN_LOGFILE           =
+#---------------------------------------------------------------------------
+# Configuration options related to the input files
+#---------------------------------------------------------------------------
+INPUT                  = ../nanoarrow \
+                         .
+INPUT_ENCODING         = UTF-8
+FILE_PATTERNS          = *.c \
+                         *.cc \
+                         *.cxx \
+                         *.cpp \
+                         *.c++ \
+                         *.java \
+                         *.ii \
+                         *.ixx \
+                         *.ipp \
+                         *.i++ \
+                         *.inl \
+                         *.idl \
+                         *.ddl \
+                         *.odl \
+                         *.h \
+                         *.hh \
+                         *.hxx \
+                         *.hpp \
+                         *.h++ \
+                         *.l \
+                         *.cs \
+                         *.d \
+                         *.php \
+                         *.php4 \
+                         *.php5 \
+                         *.phtml \
+                         *.inc \
+                         *.m \
+                         *.markdown \
+                         *.md \
+                         *.mm \
+                         *.dox \
+                         *.py \
+                         *.pyw \
+                         *.f90 \
+                         *.f95 \
+                         *.f03 \
+                         *.f08 \
+                         *.f18 \
+                         *.f \
+                         *.for \
+                         *.vhd \
+                         *.vhdl \
+                         *.ucf \
+                         *.qsf \
+                         *.ice
+RECURSIVE              = NO
+EXCLUDE                =
+EXCLUDE_SYMLINKS       = NO
+EXCLUDE_PATTERNS       =
+EXCLUDE_SYMBOLS        =
+EXAMPLE_PATH           =
+EXAMPLE_PATTERNS       = *
+EXAMPLE_RECURSIVE      = NO
+IMAGE_PATH             =
+INPUT_FILTER           =
+FILTER_PATTERNS        =
+FILTER_SOURCE_FILES    = NO
+FILTER_SOURCE_PATTERNS =
+USE_MDFILE_AS_MAINPAGE =
+#---------------------------------------------------------------------------
+# Configuration options related to source browsing
+#---------------------------------------------------------------------------
+SOURCE_BROWSER         = NO
+INLINE_SOURCES         = NO
+STRIP_CODE_COMMENTS    = YES
+REFERENCED_BY_RELATION = NO
+REFERENCES_RELATION    = NO
+REFERENCES_LINK_SOURCE = YES
+SOURCE_TOOLTIPS        = YES
+USE_HTAGS              = NO
+VERBATIM_HEADERS       = YES
+#---------------------------------------------------------------------------
+# Configuration options related to the alphabetical class index
+#---------------------------------------------------------------------------
+ALPHABETICAL_INDEX     = YES
+IGNORE_PREFIX          =
+#---------------------------------------------------------------------------
+# Configuration options related to the HTML output
+#---------------------------------------------------------------------------
+GENERATE_HTML          = NO
+HTML_OUTPUT            = html
+HTML_FILE_EXTENSION    = .html
+HTML_HEADER            =
+HTML_FOOTER            =
+HTML_STYLESHEET        =
+HTML_EXTRA_STYLESHEET  =
+HTML_EXTRA_FILES       =
+HTML_COLORSTYLE_HUE    = 220
+HTML_COLORSTYLE_SAT    = 100
+HTML_COLORSTYLE_GAMMA  = 80
+HTML_TIMESTAMP         = NO
+HTML_DYNAMIC_MENUS     = YES
+HTML_DYNAMIC_SECTIONS  = NO
+HTML_INDEX_NUM_ENTRIES = 100
+GENERATE_DOCSET        = NO
+DOCSET_FEEDNAME        = "Doxygen generated docs"
+DOCSET_FEEDURL         =
+DOCSET_BUNDLE_ID       = org.doxygen.Project
+DOCSET_PUBLISHER_ID    = org.doxygen.Publisher
+DOCSET_PUBLISHER_NAME  = Publisher
+GENERATE_HTMLHELP      = NO
+CHM_FILE               =
+HHC_LOCATION           =
+GENERATE_CHI           = NO
+CHM_INDEX_ENCODING     =
+BINARY_TOC             = NO
+TOC_EXPAND             = NO
+GENERATE_QHP           = NO
+QCH_FILE               =
+QHP_NAMESPACE          = org.doxygen.Project
+QHP_VIRTUAL_FOLDER     = doc
+QHP_CUST_FILTER_NAME   =
+QHP_CUST_FILTER_ATTRS  =
+QHP_SECT_FILTER_ATTRS  =
+QHG_LOCATION           =
+GENERATE_ECLIPSEHELP   = NO
+ECLIPSE_DOC_ID         = org.doxygen.Project
+DISABLE_INDEX          = NO
+GENERATE_TREEVIEW      = NO
+FULL_SIDEBAR           = NO
+ENUM_VALUES_PER_LINE   = 4
+TREEVIEW_WIDTH         = 250
+EXT_LINKS_IN_WINDOW    = NO
+OBFUSCATE_EMAILS       = YES
+HTML_FORMULA_FORMAT    = png
+FORMULA_FONTSIZE       = 10
+FORMULA_TRANSPARENT    = YES
+FORMULA_MACROFILE      =
+USE_MATHJAX            = NO
+MATHJAX_VERSION        = MathJax_2
+MATHJAX_FORMAT         = HTML-CSS
+MATHJAX_RELPATH        =
+MATHJAX_EXTENSIONS     =
+MATHJAX_CODEFILE       =
+SEARCHENGINE           = YES
+SERVER_BASED_SEARCH    = NO
+EXTERNAL_SEARCH        = NO
+SEARCHENGINE_URL       =
+SEARCHDATA_FILE        = searchdata.xml
+EXTERNAL_SEARCH_ID     =
+EXTRA_SEARCH_MAPPINGS  =
+#---------------------------------------------------------------------------
+# Configuration options related to the LaTeX output
+#---------------------------------------------------------------------------
+GENERATE_LATEX         = NO
+LATEX_OUTPUT           = latex
+LATEX_CMD_NAME         =
+MAKEINDEX_CMD_NAME     = makeindex
+LATEX_MAKEINDEX_CMD    = makeindex
+COMPACT_LATEX          = NO
+PAPER_TYPE             = a4
+EXTRA_PACKAGES         =
+LATEX_HEADER           =
+LATEX_FOOTER           =
+LATEX_EXTRA_STYLESHEET =
+LATEX_EXTRA_FILES      =
+PDF_HYPERLINKS         = YES
+USE_PDFLATEX           = YES
+LATEX_BATCHMODE        = NO
+LATEX_HIDE_INDICES     = NO
+LATEX_BIB_STYLE        = plain
+LATEX_TIMESTAMP        = NO
+LATEX_EMOJI_DIRECTORY  =
+#---------------------------------------------------------------------------
+# Configuration options related to the RTF output
+#---------------------------------------------------------------------------
+GENERATE_RTF           = NO
+RTF_OUTPUT             = rtf
+COMPACT_RTF            = NO
+RTF_HYPERLINKS         = NO
+RTF_STYLESHEET_FILE    =
+RTF_EXTENSIONS_FILE    =
+#---------------------------------------------------------------------------
+# Configuration options related to the man page output
+#---------------------------------------------------------------------------
+GENERATE_MAN           = NO
+MAN_OUTPUT             = man
+MAN_EXTENSION          = .3
+MAN_SUBDIR             =
+MAN_LINKS              = NO
+#---------------------------------------------------------------------------
+# Configuration options related to the XML output
+#---------------------------------------------------------------------------
+GENERATE_XML           = YES
+XML_OUTPUT             = xml
+XML_PROGRAMLISTING     = YES
+XML_NS_MEMB_FILE_SCOPE = NO
+#---------------------------------------------------------------------------
+# Configuration options related to the DOCBOOK output
+#---------------------------------------------------------------------------
+GENERATE_DOCBOOK       = NO
+DOCBOOK_OUTPUT         = docbook
+#---------------------------------------------------------------------------
+# Configuration options for the AutoGen Definitions output
+#---------------------------------------------------------------------------
+GENERATE_AUTOGEN_DEF   = NO
+#---------------------------------------------------------------------------
+# Configuration options related to the Perl module output
+#---------------------------------------------------------------------------
+GENERATE_PERLMOD       = NO
+PERLMOD_LATEX          = NO
+PERLMOD_PRETTY         = YES
+PERLMOD_MAKEVAR_PREFIX =
+#---------------------------------------------------------------------------
+# Configuration options related to the preprocessor
+#---------------------------------------------------------------------------
+ENABLE_PREPROCESSING   = YES
+MACRO_EXPANSION        = NO
+EXPAND_ONLY_PREDEF     = NO
+SEARCH_INCLUDES        = YES
+INCLUDE_PATH           =
+INCLUDE_FILE_PATTERNS  =
+PREDEFINED             =
+EXPAND_AS_DEFINED      =
+SKIP_FUNCTION_MACROS   = YES
+#---------------------------------------------------------------------------
+# Configuration options related to external references
+#---------------------------------------------------------------------------
+TAGFILES               =
+GENERATE_TAGFILE       =
+ALLEXTERNALS           = NO
+EXTERNAL_GROUPS        = YES
+EXTERNAL_PAGES         = YES
+#---------------------------------------------------------------------------
+# Configuration options related to the dot tool
+#---------------------------------------------------------------------------
+DIA_PATH               =
+HIDE_UNDOC_RELATIONS   = YES
+HAVE_DOT               = NO
+DOT_NUM_THREADS        = 0
+DOT_FONTNAME           = Helvetica
+DOT_FONTSIZE           = 10
+DOT_FONTPATH           =
+CLASS_GRAPH            = YES
+COLLABORATION_GRAPH    = YES
+GROUP_GRAPHS           = YES
+UML_LOOK               = NO
+UML_LIMIT_NUM_FIELDS   = 10
+DOT_UML_DETAILS        = NO
+DOT_WRAP_THRESHOLD     = 17
+TEMPLATE_RELATIONS     = NO
+INCLUDE_GRAPH          = YES
+INCLUDED_BY_GRAPH      = YES
+CALL_GRAPH             = NO
+CALLER_GRAPH           = NO
+GRAPHICAL_HIERARCHY    = YES
+DIRECTORY_GRAPH        = YES
+DIR_GRAPH_MAX_DEPTH    = 1
+DOT_IMAGE_FORMAT       = png
+INTERACTIVE_SVG        = NO
+DOT_PATH               =
+DOTFILE_DIRS           =
+MSCFILE_DIRS           =
+DIAFILE_DIRS           =
+PLANTUML_JAR_PATH      =
+PLANTUML_CFG_FILE      =
+PLANTUML_INCLUDE_PATH  =
+DOT_GRAPH_MAX_NODES    = 50
+MAX_DOT_GRAPH_DEPTH    = 0
+DOT_TRANSPARENT        = NO
+DOT_MULTI_TARGETS      = NO
+GENERATE_LEGEND        = YES
+DOT_CLEANUP            = YES
diff --git a/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device.c b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device.c
new file mode 100644
index 0000000..4be7a93
--- /dev/null
+++ b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device.c
@@ -0,0 +1,491 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <errno.h>
+#include <string.h>
+
+#include "nanoarrow.h"
+
+#include "nanoarrow_device.h"
+
+ArrowErrorCode ArrowDeviceCheckRuntime(struct ArrowError* error) {
+  const char* nanoarrow_runtime_version = ArrowNanoarrowVersion();
+  const char* nanoarrow_device_build_time_version = NANOARROW_VERSION;
+
+  if (strcmp(nanoarrow_runtime_version, nanoarrow_device_build_time_version) != 0) {
+    ArrowErrorSet(error, "Expected nanoarrow runtime version '%s' but found version '%s'",
+                  nanoarrow_device_build_time_version, nanoarrow_runtime_version);
+    return EINVAL;
+  }
+
+  return NANOARROW_OK;
+}
+
+static void ArrowDeviceArrayInitDefault(struct ArrowDevice* device,
+                                        struct ArrowDeviceArray* device_array,
+                                        struct ArrowArray* array) {
+  memset(device_array, 0, sizeof(struct ArrowDeviceArray));
+  device_array->device_type = device->device_type;
+  device_array->device_id = device->device_id;
+  ArrowArrayMove(array, &device_array->array);
+}
+
+static ArrowErrorCode ArrowDeviceCpuBufferInit(struct ArrowDevice* device_src,
+                                               struct ArrowBufferView src,
+                                               struct ArrowDevice* device_dst,
+                                               struct ArrowBuffer* dst) {
+  if (device_dst->device_type != ARROW_DEVICE_CPU ||
+      device_src->device_type != ARROW_DEVICE_CPU) {
+    return ENOTSUP;
+  }
+
+  ArrowBufferInit(dst);
+  dst->allocator = ArrowBufferAllocatorDefault();
+  NANOARROW_RETURN_NOT_OK(ArrowBufferAppend(dst, src.data.as_uint8, src.size_bytes));
+  return NANOARROW_OK;
+}
+
+static ArrowErrorCode ArrowDeviceCpuBufferMove(struct ArrowDevice* device_src,
+                                               struct ArrowBuffer* src,
+                                               struct ArrowDevice* device_dst,
+                                               struct ArrowBuffer* dst) {
+  if (device_dst->device_type != ARROW_DEVICE_CPU ||
+      device_src->device_type != ARROW_DEVICE_CPU) {
+    return ENOTSUP;
+  }
+
+  ArrowBufferMove(src, dst);
+  return NANOARROW_OK;
+}
+
+static ArrowErrorCode ArrowDeviceCpuBufferCopy(struct ArrowDevice* device_src,
+                                               struct ArrowBufferView src,
+                                               struct ArrowDevice* device_dst,
+                                               struct ArrowBufferView dst) {
+  if (device_dst->device_type != ARROW_DEVICE_CPU ||
+      device_src->device_type != ARROW_DEVICE_CPU) {
+    return ENOTSUP;
+  }
+
+  memcpy((uint8_t*)dst.data.as_uint8, src.data.as_uint8, dst.size_bytes);
+  return NANOARROW_OK;
+}
+
+static ArrowErrorCode ArrowDeviceCpuSynchronize(struct ArrowDevice* device,
+                                                void* sync_event,
+                                                struct ArrowError* error) {
+  switch (device->device_type) {
+    case ARROW_DEVICE_CPU:
+      if (sync_event != NULL) {
+        ArrowErrorSet(error, "Expected NULL sync_event for ARROW_DEVICE_CPU but got %p",
+                      sync_event);
+        return EINVAL;
+      } else {
+        return NANOARROW_OK;
+      }
+    default:
+      return device->synchronize_event(device, sync_event, error);
+  }
+}
+
+static void ArrowDeviceCpuRelease(struct ArrowDevice* device) { device->release = NULL; }
+
+struct ArrowDevice* ArrowDeviceCpu(void) {
+  static struct ArrowDevice* cpu_device_singleton = NULL;
+  if (cpu_device_singleton == NULL) {
+    cpu_device_singleton = (struct ArrowDevice*)ArrowMalloc(sizeof(struct ArrowDevice));
+    ArrowDeviceInitCpu(cpu_device_singleton);
+  }
+
+  return cpu_device_singleton;
+}
+
+void ArrowDeviceInitCpu(struct ArrowDevice* device) {
+  device->device_type = ARROW_DEVICE_CPU;
+  device->device_id = 0;
+  device->array_init = NULL;
+  device->array_move = NULL;
+  device->buffer_init = &ArrowDeviceCpuBufferInit;
+  device->buffer_move = &ArrowDeviceCpuBufferMove;
+  device->buffer_copy = &ArrowDeviceCpuBufferCopy;
+  device->synchronize_event = &ArrowDeviceCpuSynchronize;
+  device->release = &ArrowDeviceCpuRelease;
+  device->private_data = NULL;
+}
+
+#ifdef NANOARROW_DEVICE_WITH_METAL
+struct ArrowDevice* ArrowDeviceMetalDefaultDevice(void);
+#endif
+
+#ifdef NANOARROW_DEVICE_WITH_CUDA
+struct ArrowDevice* ArrowDeviceCuda(ArrowDeviceType device_type, int64_t device_id);
+#endif
+
+struct ArrowDevice* ArrowDeviceResolve(ArrowDeviceType device_type, int64_t device_id) {
+  if (device_type == ARROW_DEVICE_CPU && device_id == 0) {
+    return ArrowDeviceCpu();
+  }
+
+#ifdef NANOARROW_DEVICE_WITH_METAL
+  if (device_type == ARROW_DEVICE_METAL) {
+    struct ArrowDevice* default_device = ArrowDeviceMetalDefaultDevice();
+    if (device_id == default_device->device_id) {
+      return default_device;
+    }
+  }
+#endif
+
+#ifdef NANOARROW_DEVICE_WITH_CUDA
+  if (device_type == ARROW_DEVICE_CUDA || device_type == ARROW_DEVICE_CUDA_HOST) {
+    return ArrowDeviceCuda(device_type, device_id);
+  }
+#endif
+
+  return NULL;
+}
+
+ArrowErrorCode ArrowDeviceArrayInit(struct ArrowDevice* device,
+                                    struct ArrowDeviceArray* device_array,
+                                    struct ArrowArray* array) {
+  if (device->array_init != NULL) {
+    return device->array_init(device, device_array, array);
+  } else {
+    ArrowDeviceArrayInitDefault(device, device_array, array);
+    return NANOARROW_OK;
+  }
+}
+
+ArrowErrorCode ArrowDeviceBufferInit(struct ArrowDevice* device_src,
+                                     struct ArrowBufferView src,
+                                     struct ArrowDevice* device_dst,
+                                     struct ArrowBuffer* dst) {
+  int result = device_dst->buffer_init(device_src, src, device_dst, dst);
+  if (result == ENOTSUP) {
+    result = device_src->buffer_init(device_src, src, device_dst, dst);
+  }
+
+  return result;
+}
+
+ArrowErrorCode ArrowDeviceBufferMove(struct ArrowDevice* device_src,
+                                     struct ArrowBuffer* src,
+                                     struct ArrowDevice* device_dst,
+                                     struct ArrowBuffer* dst) {
+  int result = device_dst->buffer_move(device_src, src, device_dst, dst);
+  if (result == ENOTSUP) {
+    result = device_src->buffer_move(device_src, src, device_dst, dst);
+  }
+
+  return result;
+}
+
+ArrowErrorCode ArrowDeviceBufferCopy(struct ArrowDevice* device_src,
+                                     struct ArrowBufferView src,
+                                     struct ArrowDevice* device_dst,
+                                     struct ArrowBufferView dst) {
+  int result = device_dst->buffer_copy(device_src, src, device_dst, dst);
+  if (result == ENOTSUP) {
+    result = device_src->buffer_copy(device_src, src, device_dst, dst);
+  }
+
+  return result;
+}
+
+struct ArrowBasicDeviceArrayStreamPrivate {
+  struct ArrowDevice* device;
+  struct ArrowArrayStream naive_stream;
+};
+
+static int ArrowDeviceBasicArrayStreamGetSchema(
+    struct ArrowDeviceArrayStream* array_stream, struct ArrowSchema* schema) {
+  struct ArrowBasicDeviceArrayStreamPrivate* private_data =
+      (struct ArrowBasicDeviceArrayStreamPrivate*)array_stream->private_data;
+  return private_data->naive_stream.get_schema(&private_data->naive_stream, schema);
+}
+
+static int ArrowDeviceBasicArrayStreamGetNext(struct ArrowDeviceArrayStream* array_stream,
+                                              struct ArrowDeviceArray* device_array) {
+  struct ArrowBasicDeviceArrayStreamPrivate* private_data =
+      (struct ArrowBasicDeviceArrayStreamPrivate*)array_stream->private_data;
+
+  struct ArrowArray tmp;
+  NANOARROW_RETURN_NOT_OK(
+      private_data->naive_stream.get_next(&private_data->naive_stream, &tmp));
+  int result = ArrowDeviceArrayInit(private_data->device, device_array, &tmp);
+  if (result != NANOARROW_OK) {
+    tmp.release(&tmp);
+    return result;
+  }
+
+  return NANOARROW_OK;
+}
+
+static const char* ArrowDeviceBasicArrayStreamGetLastError(
+    struct ArrowDeviceArrayStream* array_stream) {
+  struct ArrowBasicDeviceArrayStreamPrivate* private_data =
+      (struct ArrowBasicDeviceArrayStreamPrivate*)array_stream->private_data;
+  return private_data->naive_stream.get_last_error(&private_data->naive_stream);
+}
+
+static void ArrowDeviceBasicArrayStreamRelease(
+    struct ArrowDeviceArrayStream* array_stream) {
+  struct ArrowBasicDeviceArrayStreamPrivate* private_data =
+      (struct ArrowBasicDeviceArrayStreamPrivate*)array_stream->private_data;
+  private_data->naive_stream.release(&private_data->naive_stream);
+  ArrowFree(private_data);
+  array_stream->release = NULL;
+}
+
+ArrowErrorCode ArrowDeviceBasicArrayStreamInit(
+    struct ArrowDeviceArrayStream* device_array_stream,
+    struct ArrowArrayStream* array_stream, struct ArrowDevice* device) {
+  struct ArrowBasicDeviceArrayStreamPrivate* private_data =
+      (struct ArrowBasicDeviceArrayStreamPrivate*)ArrowMalloc(
+          sizeof(struct ArrowBasicDeviceArrayStreamPrivate));
+  if (private_data == NULL) {
+    return ENOMEM;
+  }
+
+  private_data->device = device;
+  ArrowArrayStreamMove(array_stream, &private_data->naive_stream);
+
+  device_array_stream->device_type = device->device_type;
+  device_array_stream->get_schema = &ArrowDeviceBasicArrayStreamGetSchema;
+  device_array_stream->get_next = &ArrowDeviceBasicArrayStreamGetNext;
+  device_array_stream->get_last_error = &ArrowDeviceBasicArrayStreamGetLastError;
+  device_array_stream->release = &ArrowDeviceBasicArrayStreamRelease;
+  device_array_stream->private_data = private_data;
+  return NANOARROW_OK;
+}
+
+void ArrowDeviceArrayViewInit(struct ArrowDeviceArrayView* device_array_view) {
+  memset(device_array_view, 0, sizeof(struct ArrowDeviceArrayView));
+}
+
+void ArrowDeviceArrayViewReset(struct ArrowDeviceArrayView* device_array_view) {
+  ArrowArrayViewReset(&device_array_view->array_view);
+  device_array_view->device = NULL;
+}
+
+static ArrowErrorCode ArrowDeviceBufferGetInt32(struct ArrowDevice* device,
+                                                struct ArrowBufferView buffer_view,
+                                                int64_t i, int32_t* out) {
+  struct ArrowBufferView out_view;
+  out_view.data.as_int32 = out;
+  out_view.size_bytes = sizeof(int32_t);
+
+  struct ArrowBufferView device_buffer_view;
+  device_buffer_view.data.as_int32 = buffer_view.data.as_int32 + i;
+  device_buffer_view.size_bytes = sizeof(int32_t);
+  NANOARROW_RETURN_NOT_OK(
+      ArrowDeviceBufferCopy(device, device_buffer_view, ArrowDeviceCpu(), out_view));
+  return NANOARROW_OK;
+}
+
+static ArrowErrorCode ArrowDeviceBufferGetInt64(struct ArrowDevice* device,
+                                                struct ArrowBufferView buffer_view,
+                                                int64_t i, int64_t* out) {
+  struct ArrowBufferView out_view;
+  out_view.data.as_int64 = out;
+  out_view.size_bytes = sizeof(int64_t);
+
+  struct ArrowBufferView device_buffer_view;
+  device_buffer_view.data.as_int64 = buffer_view.data.as_int64 + i;
+  device_buffer_view.size_bytes = sizeof(int64_t);
+  NANOARROW_RETURN_NOT_OK(
+      ArrowDeviceBufferCopy(device, device_buffer_view, ArrowDeviceCpu(), out_view));
+  return NANOARROW_OK;
+}
+
+static ArrowErrorCode ArrowDeviceArrayViewResolveBufferSizes(
+    struct ArrowDevice* device, struct ArrowArrayView* array_view) {
+  // Calculate buffer sizes that require accessing the offset buffer
+  // (at this point all other sizes have been resolved).
+  int64_t offset_plus_length = array_view->offset + array_view->length;
+  int32_t last_offset32;
+  int64_t last_offset64;
+
+  switch (array_view->storage_type) {
+    case NANOARROW_TYPE_STRING:
+    case NANOARROW_TYPE_BINARY:
+      if (array_view->buffer_views[1].size_bytes == 0) {
+        array_view->buffer_views[2].size_bytes = 0;
+      } else if (array_view->buffer_views[2].size_bytes == -1) {
+        NANOARROW_RETURN_NOT_OK(ArrowDeviceBufferGetInt32(
+            device, array_view->buffer_views[1], offset_plus_length, &last_offset32));
+        array_view->buffer_views[2].size_bytes = last_offset32;
+      }
+      break;
+
+    case NANOARROW_TYPE_LARGE_STRING:
+    case NANOARROW_TYPE_LARGE_BINARY:
+      if (array_view->buffer_views[1].size_bytes == 0) {
+        array_view->buffer_views[2].size_bytes = 0;
+      } else if (array_view->buffer_views[2].size_bytes == -1) {
+        NANOARROW_RETURN_NOT_OK(ArrowDeviceBufferGetInt64(
+            device, array_view->buffer_views[1], offset_plus_length, &last_offset64));
+        array_view->buffer_views[2].size_bytes = last_offset64;
+      }
+      break;
+    default:
+      break;
+  }
+
+  // Recurse for children
+  for (int64_t i = 0; i < array_view->n_children; i++) {
+    NANOARROW_RETURN_NOT_OK(
+        ArrowDeviceArrayViewResolveBufferSizes(device, array_view->children[i]));
+  }
+
+  return NANOARROW_OK;
+}
+
+ArrowErrorCode ArrowDeviceArrayViewSetArrayMinimal(
+    struct ArrowDeviceArrayView* device_array_view, struct ArrowDeviceArray* device_array,
+    struct ArrowError* error) {
+  // Resolve device
+  struct ArrowDevice* device =
+      ArrowDeviceResolve(device_array->device_type, device_array->device_id);
+  if (device == NULL) {
+    ArrowErrorSet(error, "Can't resolve device with type %d and identifier %ld",
+                  (int)device_array->device_type, (long)device_array->device_id);
+    return EINVAL;
+  }
+
+  // Set the device array device
+  device_array_view->device = device;
+
+  // Populate the array_view
+  NANOARROW_RETURN_NOT_OK(ArrowArrayViewSetArrayMinimal(&device_array_view->array_view,
+                                                        &device_array->array, error));
+
+  return NANOARROW_OK;
+}
+
+ArrowErrorCode ArrowDeviceArrayViewSetArray(
+    struct ArrowDeviceArrayView* device_array_view, struct ArrowDeviceArray* device_array,
+    struct ArrowError* error) {
+  NANOARROW_RETURN_NOT_OK(
+      ArrowDeviceArrayViewSetArrayMinimal(device_array_view, device_array, error));
+
+  // Wait on device_array to synchronize with the CPU
+  // TODO: This is not actually sufficient for CUDA, where the synchronization
+  // should happen after the cudaMemcpy, not before it. The ordering of
+  // these operations should be explicit and asynchronous (and is probably outside
+  // the scope of what can be done with a generic callback).
+  NANOARROW_RETURN_NOT_OK(device_array_view->device->synchronize_event(
+      device_array_view->device, device_array->sync_event, error));
+
+  // Resolve unknown buffer sizes (i.e., string, binary, large string, large binary)
+  NANOARROW_RETURN_NOT_OK_WITH_ERROR(
+      ArrowDeviceArrayViewResolveBufferSizes(device_array_view->device,
+                                             &device_array_view->array_view),
+      error);
+
+  return NANOARROW_OK;
+}
+
+static ArrowErrorCode ArrowDeviceArrayViewCopyInternal(struct ArrowDevice* device_src,
+                                                       struct ArrowArrayView* src,
+                                                       struct ArrowDevice* device_dst,
+                                                       struct ArrowArray* dst) {
+  // Currently no attempt to minimize the amount of memory copied (i.e.,
+  // by applying offset + length and copying potentially fewer bytes)
+  dst->length = src->length;
+  dst->offset = src->offset;
+  dst->null_count = src->null_count;
+
+  for (int i = 0; i < 3; i++) {
+    if (src->layout.buffer_type[i] == NANOARROW_BUFFER_TYPE_NONE) {
+      break;
+    }
+
+    NANOARROW_RETURN_NOT_OK(ArrowDeviceBufferInit(device_src, src->buffer_views[i],
+                                                  device_dst, ArrowArrayBuffer(dst, i)));
+  }
+
+  for (int64_t i = 0; i < src->n_children; i++) {
+    NANOARROW_RETURN_NOT_OK(ArrowDeviceArrayViewCopyInternal(
+        device_src, src->children[i], device_dst, dst->children[i]));
+  }
+
+  if (src->dictionary != NULL) {
+    NANOARROW_RETURN_NOT_OK(ArrowDeviceArrayViewCopyInternal(
+        device_src, src->dictionary, device_dst, dst->dictionary));
+  }
+
+  return NANOARROW_OK;
+}
+
+ArrowErrorCode ArrowDeviceArrayViewCopy(struct ArrowDeviceArrayView* src,
+                                        struct ArrowDevice* device_dst,
+                                        struct ArrowDeviceArray* dst) {
+  struct ArrowArray tmp;
+  NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromArrayView(&tmp, &src->array_view, NULL));
+
+  int result =
+      ArrowDeviceArrayViewCopyInternal(src->device, &src->array_view, device_dst, &tmp);
+  if (result != NANOARROW_OK) {
+    tmp.release(&tmp);
+    return result;
+  }
+
+  result = ArrowArrayFinishBuilding(&tmp, NANOARROW_VALIDATION_LEVEL_MINIMAL, NULL);
+  if (result != NANOARROW_OK) {
+    tmp.release(&tmp);
+    return result;
+  }
+
+  result = ArrowDeviceArrayInit(device_dst, dst, &tmp);
+  if (result != NANOARROW_OK) {
+    tmp.release(&tmp);
+    return result;
+  }
+
+  return result;
+}
+
+ArrowErrorCode ArrowDeviceArrayMoveToDevice(struct ArrowDeviceArray* src,
+                                            struct ArrowDevice* device_dst,
+                                            struct ArrowDeviceArray* dst) {
+  // Can always move from the same device to the same device
+  if (src->device_type == device_dst->device_type &&
+      src->device_id == device_dst->device_id) {
+    ArrowDeviceArrayMove(src, dst);
+    return NANOARROW_OK;
+  }
+
+  struct ArrowDevice* device_src = ArrowDeviceResolve(src->device_type, src->device_id);
+  if (device_src == NULL) {
+    return EINVAL;
+  }
+
+  // See if the source knows how to move
+  int result;
+  if (device_src->array_move != NULL) {
+    result = device_src->array_move(device_src, src, device_dst, dst);
+    if (result != ENOTSUP) {
+      return result;
+    }
+  }
+
+  // See if the destination knows how to move
+  if (device_dst->array_move != NULL) {
+    result = device_dst->array_move(device_src, src, device_dst, dst);
+    if (result != ENOTSUP) {
+      return result;
+    }
+  }
+
+  return ENOTSUP;
+}
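The fallback chain in ArrowDeviceArrayMoveToDevice() above (same-device moves always succeed; otherwise ask the source, then the destination, with ENOTSUP meaning "I can't, try the other endpoint") can be sketched standalone. StubDevice, StubMoveToDevice, and AlwaysDecline below are illustrative stand-ins, not part of nanoarrow:

```c
// A standalone sketch (not the library implementation) of the move-dispatch
// chain: same-device moves succeed immediately; otherwise the source device
// is asked first, then the destination; a device declines with ENOTSUP.
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct StubDevice {
  int32_t device_type;
  int64_t device_id;
  // Returns 0 on success or ENOTSUP if this endpoint cannot perform the move
  int (*array_move)(struct StubDevice* src, struct StubDevice* dst);
};

static int StubMoveToDevice(struct StubDevice* src, struct StubDevice* dst) {
  // Can always "move" between identical devices
  if (src->device_type == dst->device_type && src->device_id == dst->device_id) {
    return 0;
  }

  // See if the source knows how to move
  if (src->array_move != NULL) {
    int result = src->array_move(src, dst);
    if (result != ENOTSUP) {
      return result;
    }
  }

  // See if the destination knows how to move
  if (dst->array_move != NULL) {
    int result = dst->array_move(src, dst);
    if (result != ENOTSUP) {
      return result;
    }
  }

  return ENOTSUP;
}

static int AlwaysDecline(struct StubDevice* src, struct StubDevice* dst) {
  (void)src;
  (void)dst;
  return ENOTSUP;
}
```

Note that a success return from either endpoint short-circuits the chain; only when every candidate declines does the caller see ENOTSUP.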
diff --git a/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device.h b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device.h
new file mode 100644
index 0000000..96a2bae
--- /dev/null
+++ b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device.h
@@ -0,0 +1,334 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef NANOARROW_DEVICE_H_INCLUDED
+#define NANOARROW_DEVICE_H_INCLUDED
+
+#include "nanoarrow.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/// \defgroup nanoarrow_device-arrow-cdata Arrow C Device interface
+///
+/// These structures are part of the Arrow C Device Data and
+/// Arrow C Device Stream interfaces
+/// (https://arrow.apache.org/docs/dev/format/CDeviceDataInterface.html).
+/// See the Arrow documentation for a detailed description of each structure.
+///
+/// @{
+
+#ifndef ARROW_C_DEVICE_DATA_INTERFACE
+#define ARROW_C_DEVICE_DATA_INTERFACE
+
+// Device type for the allocated memory
+typedef int32_t ArrowDeviceType;
+
+// CPU device, same as using ArrowArray directly
+#define ARROW_DEVICE_CPU 1
+// CUDA GPU Device
+#define ARROW_DEVICE_CUDA 2
+// Pinned CUDA CPU memory by cudaMallocHost
+#define ARROW_DEVICE_CUDA_HOST 3
+// OpenCL Device
+#define ARROW_DEVICE_OPENCL 4
+// Vulkan buffer for next-gen graphics
+#define ARROW_DEVICE_VULKAN 7
+// Metal for Apple GPU
+#define ARROW_DEVICE_METAL 8
+// Verilog simulator buffer
+#define ARROW_DEVICE_VPI 9
+// ROCm GPUs for AMD GPUs
+#define ARROW_DEVICE_ROCM 10
+// Pinned ROCm CPU memory allocated by hipMallocHost
+#define ARROW_DEVICE_ROCM_HOST 11
+// Reserved for extension
+//
+// used to quickly test extension devices, semantics
+// can differ based on implementation
+#define ARROW_DEVICE_EXT_DEV 12
+// CUDA managed/unified memory allocated by cudaMallocManaged
+#define ARROW_DEVICE_CUDA_MANAGED 13
+// Unified shared memory allocated on a oneAPI
+// non-partitioned device.
+//
+// A call to the oneAPI runtime is required to determine the
+// device type, the USM allocation type and the sycl context
+// that it is bound to.
+#define ARROW_DEVICE_ONEAPI 14
+// GPU support for next-gen WebGPU standard
+#define ARROW_DEVICE_WEBGPU 15
+// Qualcomm Hexagon DSP
+#define ARROW_DEVICE_HEXAGON 16
+
+struct ArrowDeviceArray {
+  struct ArrowArray array;
+  int64_t device_id;
+  ArrowDeviceType device_type;
+  void* sync_event;
+
+  // reserved bytes for future expansion
+  int64_t reserved[3];
+};
+
+#endif  // ARROW_C_DEVICE_DATA_INTERFACE
+
+#ifndef ARROW_C_DEVICE_STREAM_INTERFACE
+#define ARROW_C_DEVICE_STREAM_INTERFACE
+
+struct ArrowDeviceArrayStream {
+  // device type that all arrays will be accessible from
+  ArrowDeviceType device_type;
+  // callbacks
+  int (*get_schema)(struct ArrowDeviceArrayStream*, struct ArrowSchema*);
+  int (*get_next)(struct ArrowDeviceArrayStream*, struct ArrowDeviceArray*);
+  const char* (*get_last_error)(struct ArrowDeviceArrayStream*);
+
+  // release callback
+  void (*release)(struct ArrowDeviceArrayStream*);
+
+  // opaque producer-specific data
+  void* private_data;
+};
+
+#endif  // ARROW_C_DEVICE_STREAM_INTERFACE
+
+/// \brief Move the contents of src into dst and set src->array.release to NULL
+static inline void ArrowDeviceArrayMove(struct ArrowDeviceArray* src,
+                                        struct ArrowDeviceArray* dst) {
+  memcpy(dst, src, sizeof(struct ArrowDeviceArray));
+  src->array.release = NULL;
+}
+
+/// @}
+
+#ifdef NANOARROW_NAMESPACE
+
+#define ArrowDeviceCheckRuntime \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceCheckRuntime)
+#define ArrowDeviceArrayInit NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceArrayInit)
+#define ArrowDeviceArrayViewInit \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceArrayViewInit)
+#define ArrowDeviceArrayViewReset \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceArrayViewReset)
+#define ArrowDeviceArrayViewSetArrayMinimal \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceArrayViewSetArrayMinimal)
+#define ArrowDeviceArrayViewSetArray \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceArrayViewSetArray)
+#define ArrowDeviceArrayViewCopy \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceArrayViewCopy)
+#define ArrowDeviceArrayViewCopyRequired \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceArrayViewCopyRequired)
+#define ArrowDeviceArrayMoveToDevice \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceArrayMoveToDevice)
+#define ArrowDeviceResolve NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceResolve)
+#define ArrowDeviceCpu NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceCpu)
+#define ArrowDeviceInitCpu NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceInitCpu)
+#define ArrowDeviceBufferInit NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceBufferInit)
+#define ArrowDeviceBufferMove NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceBufferMove)
+#define ArrowDeviceBufferCopy NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceBufferCopy)
+#define ArrowDeviceBasicArrayStreamInit \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceBasicArrayStreamInit)
+
+#endif
+
+/// \defgroup nanoarrow_device Nanoarrow Device extension
+///
+/// Except where noted, objects are not thread-safe and clients should
+/// take care to serialize accesses to methods.
+///
+/// @{
+
+/// \brief Checks the nanoarrow runtime to make sure the run/build versions match
+ArrowErrorCode ArrowDeviceCheckRuntime(struct ArrowError* error);
+
+/// \brief A Device wrapper with callbacks for basic memory management tasks
+///
+/// All device objects are currently implemented as singletons; however, this
+/// may change as implementations progress.
+struct ArrowDevice {
+  /// \brief The device type integer identifier (see ArrowDeviceArray)
+  ArrowDeviceType device_type;
+
+  /// \brief The device identifier (see ArrowDeviceArray)
+  int64_t device_id;
+
+  /// \brief Initialize an ArrowDeviceArray from a previously allocated ArrowArray
+  ///
+  /// Given a device and an uninitialized device_array, populate the fields of the
+  /// device_array (including sync_event) appropriately. If NANOARROW_OK is returned,
+  /// ownership of array is transferred to device_array. This function must allocate
+  /// the appropriate sync_event and make its address available as
+  /// device_array->sync_event (if sync_event applies to this device type).
+  ArrowErrorCode (*array_init)(struct ArrowDevice* device,
+                               struct ArrowDeviceArray* device_array,
+                               struct ArrowArray* array);
+
+  /// \brief Move an ArrowDeviceArray between devices without copying buffers
+  ///
+  /// Some devices can move an ArrowDeviceArray without an explicit buffer copy,
+  /// although, depending on the device, the performance characteristics of the
+  /// moved array may differ from those of an explicitly copied one.
+  ArrowErrorCode (*array_move)(struct ArrowDevice* device_src,
+                               struct ArrowDeviceArray* src,
+                               struct ArrowDevice* device_dst,
+                               struct ArrowDeviceArray* dst);
+
+  /// \brief Initialize an owning buffer from existing content
+  ///
+  /// Creates a new buffer whose data member can be accessed by the GPU by
+  /// copying existing content.
+  /// Implementations must check device_src and device_dst and return ENOTSUP if
+  /// not prepared to handle this operation.
+  ArrowErrorCode (*buffer_init)(struct ArrowDevice* device_src,
+                                struct ArrowBufferView src,
+                                struct ArrowDevice* device_dst, struct ArrowBuffer* dst);
+
+  /// \brief Move an owning buffer to a device
+  ///
+  /// Creates a new buffer whose data member can be accessed by the GPU by
+  /// moving an existing buffer. If NANOARROW_OK is returned, src will have
+  /// been released or moved by the implementation and dst must be released by
+  /// the caller.
+  /// Implementations must check device_src and device_dst and return ENOTSUP if
+  /// not prepared to handle this operation.
+  ArrowErrorCode (*buffer_move)(struct ArrowDevice* device_src, struct ArrowBuffer* src,
+                                struct ArrowDevice* device_dst, struct ArrowBuffer* dst);
+
+  /// \brief Copy a section of memory into a preallocated buffer
+  ///
+  /// As opposed to the other buffer operations, this is designed to support
+  /// copying very small slices of memory.
+  /// Implementations must check device_src and device_dst and return ENOTSUP if
+  /// not prepared to handle this operation.
+  ArrowErrorCode (*buffer_copy)(struct ArrowDevice* device_src,
+                                struct ArrowBufferView src,
+                                struct ArrowDevice* device_dst,
+                                struct ArrowBufferView dst);
+
+  /// \brief Wait for an event on the CPU host
+  ArrowErrorCode (*synchronize_event)(struct ArrowDevice* device, void* sync_event,
+                                      struct ArrowError* error);
+
+  /// \brief Release this device and any resources it holds
+  void (*release)(struct ArrowDevice* device);
+
+  /// \brief Opaque, implementation-specific data.
+  void* private_data;
+};
+
+struct ArrowDeviceArrayView {
+  struct ArrowDevice* device;
+  struct ArrowArrayView array_view;
+};
+
+/// \brief Initialize an ArrowDeviceArray
+///
+/// Given an ArrowArray whose buffers/release callback has been set appropriately,
+/// initialize an ArrowDeviceArray.
+ArrowErrorCode ArrowDeviceArrayInit(struct ArrowDevice* device,
+                                    struct ArrowDeviceArray* device_array,
+                                    struct ArrowArray* array);
+
+/// \brief Initialize an ArrowDeviceArrayView
+///
+/// Zeroes memory for the device array view struct. Callers must initialize the
+/// array_view member using nanoarrow core functions that can initialize from
+/// a type identifier or schema.
+void ArrowDeviceArrayViewInit(struct ArrowDeviceArrayView* device_array_view);
+
+/// \brief Release the underlying ArrowArrayView
+void ArrowDeviceArrayViewReset(struct ArrowDeviceArrayView* device_array_view);
+
+/// \brief Set minimal ArrowArrayView buffer information from a device array
+///
+/// A thin wrapper around ArrowArrayViewSetArrayMinimal() that does not attempt
+/// to resolve buffer sizes of variable-length buffers by copying data from the device.
+ArrowErrorCode ArrowDeviceArrayViewSetArrayMinimal(
+    struct ArrowDeviceArrayView* device_array_view, struct ArrowDeviceArray* device_array,
+    struct ArrowError* error);
+
+/// \brief Set ArrowArrayView buffer information from a device array
+///
+/// Runs ArrowDeviceArrayViewSetArrayMinimal() but also sets buffer sizes for
+/// variable-length buffers by copying data from the device. This function will block on
+/// the device_array's sync_event.
+ArrowErrorCode ArrowDeviceArrayViewSetArray(
+    struct ArrowDeviceArrayView* device_array_view, struct ArrowDeviceArray* device_array,
+    struct ArrowError* error);
+
+/// \brief Copy an ArrowDeviceArrayView to a device
+ArrowErrorCode ArrowDeviceArrayViewCopy(struct ArrowDeviceArrayView* src,
+                                        struct ArrowDevice* device_dst,
+                                        struct ArrowDeviceArray* dst);
+
+/// \brief Move an ArrowDeviceArray to a device if possible
+///
+/// Will attempt to move a device array to a device without copying buffers.
+/// This may result in a device array with performance characteristics that
+/// differ from those of a copied array.
+ArrowErrorCode ArrowDeviceArrayMoveToDevice(struct ArrowDeviceArray* src,
+                                            struct ArrowDevice* device_dst,
+                                            struct ArrowDeviceArray* dst);
+
+/// \brief Pointer to a statically-allocated CPU device singleton
+struct ArrowDevice* ArrowDeviceCpu(void);
+
+/// \brief Initialize a user-allocated device struct with a CPU device
+void ArrowDeviceInitCpu(struct ArrowDevice* device);
+
+/// \brief Resolve a device pointer from a type + identifier
+///
+/// Depending on which libraries this build of the device extension was built with,
+/// some device types may or may not be supported. The CPU type is always supported.
+/// Returns NULL for a device that does not exist or cannot be returned as a singleton.
+/// Callers must not release the pointed-to device.
+struct ArrowDevice* ArrowDeviceResolve(ArrowDeviceType device_type, int64_t device_id);
+
+ArrowErrorCode ArrowDeviceBufferInit(struct ArrowDevice* device_src,
+                                     struct ArrowBufferView src,
+                                     struct ArrowDevice* device_dst,
+                                     struct ArrowBuffer* dst);
+
+ArrowErrorCode ArrowDeviceBufferMove(struct ArrowDevice* device_src,
+                                     struct ArrowBuffer* src,
+                                     struct ArrowDevice* device_dst,
+                                     struct ArrowBuffer* dst);
+
+ArrowErrorCode ArrowDeviceBufferCopy(struct ArrowDevice* device_src,
+                                     struct ArrowBufferView src,
+                                     struct ArrowDevice* device_dst,
+                                     struct ArrowBufferView dst);
+
+/// \brief Initialize an ArrowDeviceArrayStream from an existing ArrowArrayStream
+///
+/// Wrap an ArrowArrayStream of ArrowArray objects whose buffers are already
+/// accessible from the specified device as an ArrowDeviceArrayStream. This
+/// function moves the ownership of
+/// array_stream to the device_array_stream. If this function returns NANOARROW_OK, the
+/// caller is responsible for releasing the ArrowDeviceArrayStream.
+ArrowErrorCode ArrowDeviceBasicArrayStreamInit(
+    struct ArrowDeviceArrayStream* device_array_stream,
+    struct ArrowArrayStream* array_stream, struct ArrowDevice* device);
+
+/// @}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
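The ArrowDeviceArrayMove() helper declared above is a shallow struct copy followed by invalidation of the source, signalled by setting the source's release callback to NULL. A minimal standalone model of that contract (MiniArray and MiniDeviceArray are pared-down stand-ins, not the real structs):

```c
// A standalone model of the move semantics used throughout the C Device
// interface: after a move, only the destination owns the contents, and the
// source is marked released (release == NULL) so it is safe to discard.
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct MiniArray {
  void (*release)(struct MiniArray* array);
  void* private_data;
};

struct MiniDeviceArray {
  struct MiniArray array;
  int64_t device_id;
  int32_t device_type;
  void* sync_event;
};

static void MiniDeviceArrayMove(struct MiniDeviceArray* src,
                                struct MiniDeviceArray* dst) {
  memcpy(dst, src, sizeof(struct MiniDeviceArray));
  src->array.release = NULL;  // the source no longer owns anything
}

// A release callback that simply marks the array as released
static void MiniNoopRelease(struct MiniArray* array) { array->release = NULL; }
```

Because the copy is shallow, the device id, device type, and sync_event pointer travel with the destination unchanged; no buffer data is touched.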
diff --git a/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device.hpp b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device.hpp
new file mode 100644
index 0000000..cd26b7e
--- /dev/null
+++ b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device.hpp
@@ -0,0 +1,123 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "nanoarrow_device.h"
+
+#ifndef NANOARROW_DEVICE_HPP_INCLUDED
+#define NANOARROW_DEVICE_HPP_INCLUDED
+
+namespace nanoarrow {
+
+namespace internal {
+
+static inline void init_pointer(struct ArrowDeviceArray* data) {
+  data->array.release = nullptr;
+  data->sync_event = nullptr;
+}
+
+static inline void move_pointer(struct ArrowDeviceArray* src,
+                                struct ArrowDeviceArray* dst) {
+  ArrowDeviceArrayMove(src, dst);
+}
+
+static inline void release_pointer(struct ArrowDeviceArray* data) {
+  if (data->array.release != nullptr) {
+    data->array.release(&data->array);
+  }
+
+  data->sync_event = nullptr;
+}
+
+static inline void init_pointer(struct ArrowDeviceArrayStream* data) {
+  data->release = nullptr;
+}
+
+static inline void move_pointer(struct ArrowDeviceArrayStream* src,
+                                struct ArrowDeviceArrayStream* dst) {
+  memcpy(dst, src, sizeof(struct ArrowDeviceArrayStream));
+  src->release = nullptr;
+}
+
+static inline void release_pointer(struct ArrowDeviceArrayStream* data) {
+  if (data->release != nullptr) {
+    data->release(data);
+  }
+}
+
+static inline void init_pointer(struct ArrowDeviceArrayView* data) {
+  ArrowDeviceArrayViewInit(data);
+}
+
+static inline void move_pointer(struct ArrowDeviceArrayView* src,
+                                struct ArrowDeviceArrayView* dst) {
+  ArrowArrayViewMove(&src->array_view, &dst->array_view);
+  dst->device = src->device;
+  src->device = nullptr;
+}
+
+static inline void release_pointer(struct ArrowDeviceArrayView* data) {
+  ArrowArrayViewReset(&data->array_view);
+}
+
+static inline void init_pointer(struct ArrowDevice* data) { data->release = nullptr; }
+
+static inline void move_pointer(struct ArrowDevice* src, struct ArrowDevice* dst) {
+  memcpy(dst, src, sizeof(struct ArrowDevice));
+  src->release = nullptr;
+}
+
+static inline void release_pointer(struct ArrowDevice* data) {
+  if (data->release != nullptr) {
+    data->release(data);
+  }
+}
+
+}  // namespace internal
+}  // namespace nanoarrow
+
+#include "nanoarrow.hpp"
+
+namespace nanoarrow {
+
+namespace device {
+
+/// \defgroup nanoarrow_device_hpp-unique Unique object wrappers
+///
+/// Extends the unique object wrappers in nanoarrow.hpp to include C structs
+/// defined in the nanoarrow_device.h header.
+///
+/// @{
+
+/// \brief Class wrapping a unique struct ArrowDeviceArray
+using UniqueDeviceArray = internal::Unique<struct ArrowDeviceArray>;
+
+/// \brief Class wrapping a unique struct ArrowDeviceArrayStream
+using UniqueDeviceArrayStream = internal::Unique<struct ArrowDeviceArrayStream>;
+
+/// \brief Class wrapping a unique struct ArrowDevice
+using UniqueDevice = internal::Unique<struct ArrowDevice>;
+
+/// \brief Class wrapping a unique struct ArrowDeviceArrayView
+using UniqueDeviceArrayView = internal::Unique<struct ArrowDeviceArrayView>;
+
+/// @}
+
+}  // namespace device
+
+}  // namespace nanoarrow
+
+#endif
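The init_pointer()/release_pointer() overloads above all follow the same lifecycle contract that nanoarrow.hpp's internal Unique&lt;T&gt; wrappers rely on: initialization marks an object empty, and release is guarded so it is a no-op on an empty or already-released object. A standalone C model of that contract (Stub and the stub_* helpers are illustrative stand-ins):

```c
// A standalone model of the init/release contract: a NULL release callback
// means "empty", so releasing an uninitialized-then-init'd or already
// released object is always safe, and an owned object is released once.
#include <assert.h>
#include <stddef.h>

struct Stub {
  void (*release)(struct Stub* data);
  int* release_count;
};

// Mirrors init_pointer(): a NULL release callback marks the object empty
static void stub_init(struct Stub* data) { data->release = NULL; }

// Mirrors release_pointer(): only invoke release if the object owns something
static void stub_release(struct Stub* data) {
  if (data->release != NULL) {
    data->release(data);
  }
}

// A release callback that counts invocations and marks the object empty
static void StubRelease(struct Stub* data) {
  (*data->release_count)++;
  data->release = NULL;
}
```

This is what lets a Unique wrapper unconditionally call release in its destructor without tracking a separate "moved-from" flag.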
diff --git a/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_cuda.c b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_cuda.c
new file mode 100644
index 0000000..00f09ef
--- /dev/null
+++ b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_cuda.c
@@ -0,0 +1,403 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cuda_runtime_api.h>
+
+#include "nanoarrow_device.h"
+
+struct ArrowDeviceCudaAllocatorPrivate {
+  ArrowDeviceType device_type;
+  int64_t device_id;
+  // When moving a buffer from CUDA_HOST to CUDA, the pointer used to access
+  // the data changes but the pointer needed to pass to cudaFreeHost does not
+  void* allocated_ptr;
+};
+
+static void ArrowDeviceCudaDeallocator(struct ArrowBufferAllocator* allocator,
+                                       uint8_t* ptr, int64_t old_size) {
+  struct ArrowDeviceCudaAllocatorPrivate* allocator_private =
+      (struct ArrowDeviceCudaAllocatorPrivate*)allocator->private_data;
+
+  int prev_device = 0;
+  // Not ideal: we have no place to communicate any errors here
+  cudaGetDevice(&prev_device);
+  cudaSetDevice((int)allocator_private->device_id);
+
+  switch (allocator_private->device_type) {
+    case ARROW_DEVICE_CUDA:
+      cudaFree(allocator_private->allocated_ptr);
+      break;
+    case ARROW_DEVICE_CUDA_HOST:
+      cudaFreeHost(allocator_private->allocated_ptr);
+      break;
+    default:
+      break;
+  }
+
+  cudaSetDevice(prev_device);
+  ArrowFree(allocator_private);
+}
+
+static ArrowErrorCode ArrowDeviceCudaAllocateBuffer(struct ArrowDevice* device,
+                                                    struct ArrowBuffer* buffer,
+                                                    int64_t size_bytes) {
+  int prev_device = 0;
+  cudaError_t result = cudaGetDevice(&prev_device);
+  if (result != cudaSuccess) {
+    return EINVAL;
+  }
+
+  result = cudaSetDevice((int)device->device_id);
+  if (result != cudaSuccess) {
+    cudaSetDevice(prev_device);
+    return EINVAL;
+  }
+
+  struct ArrowDeviceCudaAllocatorPrivate* allocator_private =
+      (struct ArrowDeviceCudaAllocatorPrivate*)ArrowMalloc(
+          sizeof(struct ArrowDeviceCudaAllocatorPrivate));
+  if (allocator_private == NULL) {
+    cudaSetDevice(prev_device);
+    return ENOMEM;
+  }
+
+  void* ptr = NULL;
+  switch (device->device_type) {
+    case ARROW_DEVICE_CUDA:
+      result = cudaMalloc(&ptr, (int64_t)size_bytes);
+      break;
+    case ARROW_DEVICE_CUDA_HOST:
+      result = cudaMallocHost(&ptr, (int64_t)size_bytes);
+      break;
+    default:
+      ArrowFree(allocator_private);
+      cudaSetDevice(prev_device);
+      return EINVAL;
+  }
+
+  if (result != cudaSuccess) {
+    ArrowFree(allocator_private);
+    cudaSetDevice(prev_device);
+    return ENOMEM;
+  }
+
+  allocator_private->device_id = device->device_id;
+  allocator_private->device_type = device->device_type;
+  allocator_private->allocated_ptr = ptr;
+
+  buffer->data = (uint8_t*)ptr;
+  buffer->size_bytes = size_bytes;
+  buffer->capacity_bytes = size_bytes;
+  buffer->allocator =
+      ArrowBufferDeallocator(&ArrowDeviceCudaDeallocator, allocator_private);
+
+  cudaSetDevice(prev_device);
+  return NANOARROW_OK;
+}
+
+struct ArrowDeviceCudaArrayPrivate {
+  struct ArrowArray parent;
+  cudaEvent_t sync_event;
+};
+
+static void ArrowDeviceCudaArrayRelease(struct ArrowArray* array) {
+  struct ArrowDeviceCudaArrayPrivate* private_data =
+      (struct ArrowDeviceCudaArrayPrivate*)array->private_data;
+  cudaEventDestroy(private_data->sync_event);
+  private_data->parent.release(&private_data->parent);
+  ArrowFree(private_data);
+  array->release = NULL;
+}
+
+static ArrowErrorCode ArrowDeviceCudaArrayInit(struct ArrowDevice* device,
+                                               struct ArrowDeviceArray* device_array,
+                                               struct ArrowArray* array) {
+  struct ArrowDeviceCudaArrayPrivate* private_data =
+      (struct ArrowDeviceCudaArrayPrivate*)ArrowMalloc(
+          sizeof(struct ArrowDeviceCudaArrayPrivate));
+  if (private_data == NULL) {
+    return ENOMEM;
+  }
+
+  int prev_device = 0;
+  cudaError_t result = cudaGetDevice(&prev_device);
+  if (result != cudaSuccess) {
+    ArrowFree(private_data);
+    return EINVAL;
+  }
+
+  result = cudaSetDevice((int)device->device_id);
+  if (result != cudaSuccess) {
+    cudaSetDevice(prev_device);
+    ArrowFree(private_data);
+    return EINVAL;
+  }
+
+  result = cudaEventCreate(&private_data->sync_event);
+  if (result != cudaSuccess) {
+    cudaSetDevice(prev_device);
+    ArrowFree(private_data);
+    return EINVAL;
+  }
+
+  memset(device_array, 0, sizeof(struct ArrowDeviceArray));
+  device_array->array = *array;
+  device_array->array.private_data = private_data;
+  device_array->array.release = &ArrowDeviceCudaArrayRelease;
+  ArrowArrayMove(array, &private_data->parent);
+
+  device_array->device_id = device->device_id;
+  device_array->device_type = device->device_type;
+  device_array->sync_event = &private_data->sync_event;
+
+  cudaSetDevice(prev_device);
+  return NANOARROW_OK;
+}
+
+// TODO: All these buffer copiers would benefit from cudaMemcpyAsync but there is
+// no good way to incorporate that just yet
+
+static ArrowErrorCode ArrowDeviceCudaBufferInit(struct ArrowDevice* device_src,
+                                                struct ArrowBufferView src,
+                                                struct ArrowDevice* device_dst,
+                                                struct ArrowBuffer* dst) {
+  struct ArrowBuffer tmp;
+  enum cudaMemcpyKind memcpy_kind;
+
+  if (device_src->device_type == ARROW_DEVICE_CPU &&
+      device_dst->device_type == ARROW_DEVICE_CUDA) {
+    NANOARROW_RETURN_NOT_OK(
+        ArrowDeviceCudaAllocateBuffer(device_dst, &tmp, src.size_bytes));
+    memcpy_kind = cudaMemcpyHostToDevice;
+
+  } else if (device_src->device_type == ARROW_DEVICE_CUDA &&
+             device_dst->device_type == ARROW_DEVICE_CUDA) {
+    NANOARROW_RETURN_NOT_OK(
+        ArrowDeviceCudaAllocateBuffer(device_dst, &tmp, src.size_bytes));
+    memcpy_kind = cudaMemcpyDeviceToDevice;
+
+  } else if (device_src->device_type == ARROW_DEVICE_CUDA &&
+             device_dst->device_type == ARROW_DEVICE_CPU) {
+    ArrowBufferInit(&tmp);
+    NANOARROW_RETURN_NOT_OK(ArrowBufferReserve(&tmp, src.size_bytes));
+    tmp.size_bytes = src.size_bytes;
+    memcpy_kind = cudaMemcpyDeviceToHost;
+
+  } else if (device_src->device_type == ARROW_DEVICE_CPU &&
+             device_dst->device_type == ARROW_DEVICE_CUDA_HOST) {
+    NANOARROW_RETURN_NOT_OK(
+        ArrowDeviceCudaAllocateBuffer(device_dst, &tmp, src.size_bytes));
+    memcpy_kind = cudaMemcpyHostToHost;
+
+  } else if (device_src->device_type == ARROW_DEVICE_CUDA_HOST &&
+             device_dst->device_type == ARROW_DEVICE_CUDA_HOST) {
+    NANOARROW_RETURN_NOT_OK(
+        ArrowDeviceCudaAllocateBuffer(device_dst, &tmp, src.size_bytes));
+    memcpy_kind = cudaMemcpyHostToHost;
+
+  } else if (device_src->device_type == ARROW_DEVICE_CUDA_HOST &&
+             device_dst->device_type == ARROW_DEVICE_CPU) {
+    ArrowBufferInit(&tmp);
+    NANOARROW_RETURN_NOT_OK(ArrowBufferReserve(&tmp, src.size_bytes));
+    tmp.size_bytes = src.size_bytes;
+    memcpy_kind = cudaMemcpyHostToHost;
+
+  } else {
+    return ENOTSUP;
+  }
+
+  cudaError_t result =
+      cudaMemcpy(tmp.data, src.data.as_uint8, (size_t)src.size_bytes, memcpy_kind);
+  if (result != cudaSuccess) {
+    ArrowBufferReset(&tmp);
+    return EINVAL;
+  }
+
+  ArrowBufferMove(&tmp, dst);
+  return NANOARROW_OK;
+}
+
+static ArrowErrorCode ArrowDeviceCudaBufferCopy(struct ArrowDevice* device_src,
+                                                struct ArrowBufferView src,
+                                                struct ArrowDevice* device_dst,
+                                                struct ArrowBufferView dst) {
+  enum cudaMemcpyKind memcpy_kind;
+
+  if (device_src->device_type == ARROW_DEVICE_CPU &&
+      device_dst->device_type == ARROW_DEVICE_CUDA) {
+    memcpy_kind = cudaMemcpyHostToDevice;
+  } else if (device_src->device_type == ARROW_DEVICE_CUDA &&
+             device_dst->device_type == ARROW_DEVICE_CUDA) {
+    memcpy_kind = cudaMemcpyDeviceToDevice;
+  } else if (device_src->device_type == ARROW_DEVICE_CUDA &&
+             device_dst->device_type == ARROW_DEVICE_CPU) {
+    memcpy_kind = cudaMemcpyDeviceToHost;
+  } else if (device_src->device_type == ARROW_DEVICE_CPU &&
+             device_dst->device_type == ARROW_DEVICE_CUDA_HOST) {
+    memcpy_kind = cudaMemcpyHostToHost;
+  } else if (device_src->device_type == ARROW_DEVICE_CUDA_HOST &&
+             device_dst->device_type == ARROW_DEVICE_CUDA_HOST) {
+    memcpy_kind = cudaMemcpyHostToHost;
+  } else if (device_src->device_type == ARROW_DEVICE_CUDA_HOST &&
+             device_dst->device_type == ARROW_DEVICE_CPU) {
+    memcpy_kind = cudaMemcpyHostToHost;
+  } else {
+    return ENOTSUP;
+  }
+
+  cudaError_t result = cudaMemcpy((void*)dst.data.as_uint8, src.data.as_uint8,
+                                  dst.size_bytes, memcpy_kind);
+  if (result != cudaSuccess) {
+    return EINVAL;
+  }
+  return NANOARROW_OK;
+}
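The if/else chain above is essentially a lookup from a (source device type, destination device type) pair to a `cudaMemcpyKind`: CUDA_HOST memory is pinned host memory, so every supported pair involving it degenerates to a host-to-host copy. A standalone sketch of that mapping, using stand-in enum values rather than the real `ArrowDeviceType`/`cudaMemcpyKind` definitions so it compiles without `cuda_runtime_api.h`:

```c
#include <assert.h>

/* Stand-ins for the real ArrowDeviceType and cudaMemcpyKind values. */
typedef enum { DEV_CPU, DEV_CUDA, DEV_CUDA_HOST } DeviceType;
typedef enum { COPY_H2H, COPY_H2D, COPY_D2H, COPY_D2D, COPY_UNSUPPORTED } CopyKind;

/* Mirror of the pairwise dispatch in ArrowDeviceCudaBufferCopy: CUDA_HOST
 * memory is host-accessible, so pairs involving it are host-to-host copies. */
static CopyKind CopyKindFor(DeviceType src, DeviceType dst) {
  if (src == DEV_CPU && dst == DEV_CUDA) return COPY_H2D;
  if (src == DEV_CUDA && dst == DEV_CUDA) return COPY_D2D;
  if (src == DEV_CUDA && dst == DEV_CPU) return COPY_D2H;
  if (src == DEV_CPU && dst == DEV_CUDA_HOST) return COPY_H2H;
  if (src == DEV_CUDA_HOST && dst == DEV_CUDA_HOST) return COPY_H2H;
  if (src == DEV_CUDA_HOST && dst == DEV_CPU) return COPY_H2H;
  return COPY_UNSUPPORTED; /* the real function returns ENOTSUP here */
}
```

Note that CUDA -> CUDA_HOST is deliberately absent, matching the `ENOTSUP` fallback above.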
+
+static ArrowErrorCode ArrowDeviceCudaSynchronize(struct ArrowDevice* device,
+                                                 void* sync_event,
+                                                 struct ArrowError* error) {
+  if (sync_event == NULL) {
+    return NANOARROW_OK;
+  }
+
+  if (device->device_type != ARROW_DEVICE_CUDA &&
+      device->device_type != ARROW_DEVICE_CUDA_HOST) {
+    return ENOTSUP;
+  }
+
+  // Memory for cuda_event is owned by the ArrowArray member of the ArrowDeviceArray
+  cudaEvent_t* cuda_event = (cudaEvent_t*)sync_event;
+  cudaError_t result = cudaEventSynchronize(*cuda_event);
+
+  if (result != cudaSuccess) {
+    ArrowErrorSet(error, "cudaEventSynchronize() failed: %s", cudaGetErrorString(result));
+    return EINVAL;
+  }
+
+  return NANOARROW_OK;
+}
+
+static ArrowErrorCode ArrowDeviceCudaArrayMove(struct ArrowDevice* device_src,
+                                               struct ArrowDeviceArray* src,
+                                               struct ArrowDevice* device_dst,
+                                               struct ArrowDeviceArray* dst) {
+  // Note that the case where the devices are the same is handled before this
+
+  if (device_src->device_type == ARROW_DEVICE_CUDA_HOST &&
+      device_dst->device_type == ARROW_DEVICE_CPU) {
+    // Move: the array's release callback is responsible for cudaFreeHost or
+    // deregistration (or perhaps this has been handled at a higher level).
+    // We do have to wait on the sync event, though, because this has to be NULL
+    // for a CPU device array.
+    NANOARROW_RETURN_NOT_OK(
+        ArrowDeviceCudaSynchronize(device_src, src->sync_event, NULL));
+    ArrowDeviceArrayMove(src, dst);
+    dst->device_type = device_dst->device_type;
+    dst->device_id = device_dst->device_id;
+    dst->sync_event = NULL;
+
+    return NANOARROW_OK;
+  }
+
+  // TODO: We can theoretically also do a move from CUDA_HOST to CUDA
+
+  return ENOTSUP;
+}
+
+static void ArrowDeviceCudaRelease(struct ArrowDevice* device) {
+  // No private_data to release
+}
+
+static ArrowErrorCode ArrowDeviceCudaInitDevice(struct ArrowDevice* device,
+                                                ArrowDeviceType device_type,
+                                                int64_t device_id,
+                                                struct ArrowError* error) {
+  switch (device_type) {
+    case ARROW_DEVICE_CUDA:
+    case ARROW_DEVICE_CUDA_HOST:
+      break;
+    default:
+      ArrowErrorSet(error, "Device type code %d not supported", (int)device_type);
+      return EINVAL;
+  }
+
+  int n_devices;
+  cudaError_t result = cudaGetDeviceCount(&n_devices);
+  if (result != cudaSuccess) {
+    ArrowErrorSet(error, "cudaGetDeviceCount() failed: %s", cudaGetErrorString(result));
+    return EINVAL;
+  }
+
+  if (device_id < 0 || device_id >= n_devices) {
+    ArrowErrorSet(error, "CUDA device_id must be between 0 and %d", n_devices - 1);
+    return EINVAL;
+  }
+
+  device->device_type = device_type;
+  device->device_id = device_id;
+  device->array_init = &ArrowDeviceCudaArrayInit;
+  device->array_move = &ArrowDeviceCudaArrayMove;
+  device->buffer_init = &ArrowDeviceCudaBufferInit;
+  device->buffer_move = NULL;
+  device->buffer_copy = &ArrowDeviceCudaBufferCopy;
+  device->synchronize_event = &ArrowDeviceCudaSynchronize;
+  device->release = &ArrowDeviceCudaRelease;
+  device->private_data = NULL;
+
+  return NANOARROW_OK;
+}
+
+struct ArrowDevice* ArrowDeviceCuda(ArrowDeviceType device_type, int64_t device_id) {
+  int n_devices;
+  cudaError_t result = cudaGetDeviceCount(&n_devices);
+  if (result != cudaSuccess) {
+    return NULL;
+  }
+  static struct ArrowDevice* devices_singleton = NULL;
+  if (devices_singleton == NULL) {
+    devices_singleton =
+        (struct ArrowDevice*)ArrowMalloc(2 * n_devices * sizeof(struct ArrowDevice));
+    if (devices_singleton == NULL) {
+      return NULL;
+    }
+
+    for (int i = 0; i < n_devices; i++) {
+      int result =
+          ArrowDeviceCudaInitDevice(devices_singleton + i, ARROW_DEVICE_CUDA, i, NULL);
+      if (result != NANOARROW_OK) {
+        ArrowFree(devices_singleton);
+        devices_singleton = NULL;
+        return NULL;
+      }
+
+      result = ArrowDeviceCudaInitDevice(devices_singleton + n_devices + i,
+                                         ARROW_DEVICE_CUDA_HOST, i, NULL);
+      if (result != NANOARROW_OK) {
+        ArrowFree(devices_singleton);
+        devices_singleton = NULL;
+        return NULL;
+      }
+    }
+  }
+
+  if (device_id < 0 || device_id >= n_devices) {
+    return NULL;
+  }
+
+  switch (device_type) {
+    case ARROW_DEVICE_CUDA:
+      return devices_singleton + device_id;
+    case ARROW_DEVICE_CUDA_HOST:
+      return devices_singleton + n_devices + device_id;
+    default:
+      return NULL;
+  }
+}
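For readers tracing the pointer arithmetic in `ArrowDeviceCuda`: the singleton is one allocation of `2 * n_devices` entries, with the `ARROW_DEVICE_CUDA` devices in slots `[0, n_devices)` and the `ARROW_DEVICE_CUDA_HOST` devices in `[n_devices, 2 * n_devices)`. A small sketch of that slot arithmetic (the function name is illustrative, not part of the API):

```c
#include <assert.h>

/* Slot of a device in the ArrowDeviceCuda singleton allocation: CUDA devices
 * occupy [0, n_devices), CUDA_HOST devices occupy [n_devices, 2 * n_devices). */
static int SingletonSlot(int is_cuda_host, int n_devices, int device_id) {
  if (device_id < 0 || device_id >= n_devices) {
    return -1; /* the real function returns NULL for an out-of-range device_id */
  }
  return is_cuda_host ? n_devices + device_id : device_id;
}
```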
diff --git a/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_cuda.h b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_cuda.h
new file mode 100644
index 0000000..b05a64d
--- /dev/null
+++ b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_cuda.h
@@ -0,0 +1,52 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef NANOARROW_DEVICE_CUDA_H_INCLUDED
+#define NANOARROW_DEVICE_CUDA_H_INCLUDED
+
+#include "nanoarrow_device.h"
+
+#ifdef NANOARROW_NAMESPACE
+
+#define ArrowDeviceCuda NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceCuda)
+
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/// \defgroup nanoarrow_device_cuda CUDA Device extension
+///
+/// A CUDA (i.e., `cuda_runtime_api.h`) implementation of the Arrow C Device
+/// interface.
+///
+/// @{
+
+/// \brief Get a CUDA device from type and ID
+///
+/// device_type must be one of ARROW_DEVICE_CUDA or ARROW_DEVICE_CUDA_HOST;
+/// device_id must be between 0 and cudaGetDeviceCount() - 1.
+struct ArrowDevice* ArrowDeviceCuda(ArrowDeviceType device_type, int64_t device_id);
+
+/// @}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_cuda_test.cc b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_cuda_test.cc
new file mode 100644
index 0000000..912aa09
--- /dev/null
+++ b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_cuda_test.cc
@@ -0,0 +1,230 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <errno.h>
+
+#include <cuda_runtime_api.h>
+#include <gtest/gtest.h>
+
+#include "nanoarrow_device.h"
+#include "nanoarrow_device_cuda.h"
+
+TEST(NanoarrowDeviceCuda, GetDevice) {
+  struct ArrowDevice* cuda = ArrowDeviceCuda(ARROW_DEVICE_CUDA, 0);
+  ASSERT_NE(cuda, nullptr);
+  EXPECT_EQ(cuda->device_type, ARROW_DEVICE_CUDA);
+  struct ArrowDevice* cuda_host = ArrowDeviceCuda(ARROW_DEVICE_CUDA_HOST, 0);
+  ASSERT_NE(cuda_host, nullptr);
+  EXPECT_EQ(cuda_host->device_type, ARROW_DEVICE_CUDA_HOST);
+
+  // null return for invalid input
+  EXPECT_EQ(ArrowDeviceCuda(ARROW_DEVICE_CUDA, std::numeric_limits<int32_t>::max()),
+            nullptr);
+  EXPECT_EQ(ArrowDeviceCuda(ARROW_DEVICE_CPU, 0), nullptr);
+}
+
+TEST(NanoarrowDeviceCuda, DeviceCudaBufferInit) {
+  struct ArrowDevice* cpu = ArrowDeviceCpu();
+  struct ArrowDevice* gpu = ArrowDeviceCuda(ARROW_DEVICE_CUDA, 0);
+  struct ArrowBuffer buffer_gpu;
+  struct ArrowBuffer buffer;
+  uint8_t data[] = {0x01, 0x02, 0x03, 0x04, 0x05};
+  struct ArrowBufferView cpu_view = {data, sizeof(data)};
+
+  // CPU -> GPU
+  ASSERT_EQ(ArrowDeviceBufferInit(cpu, cpu_view, gpu, &buffer_gpu), NANOARROW_OK);
+  EXPECT_EQ(buffer_gpu.size_bytes, sizeof(data));
+  // (Content is tested on the roundtrip)
+  struct ArrowBufferView gpu_view = {buffer_gpu.data, buffer_gpu.size_bytes};
+
+  // GPU -> GPU
+  ASSERT_EQ(ArrowDeviceBufferInit(gpu, gpu_view, gpu, &buffer), NANOARROW_OK);
+  EXPECT_EQ(buffer.size_bytes, sizeof(data));
+  // (Content is tested on the roundtrip)
+  ArrowBufferReset(&buffer);
+
+  // GPU -> CPU
+  ASSERT_EQ(ArrowDeviceBufferInit(gpu, gpu_view, cpu, &buffer), NANOARROW_OK);
+  EXPECT_EQ(buffer.size_bytes, sizeof(data));
+  EXPECT_EQ(memcmp(buffer.data, data, sizeof(data)), 0);
+  ArrowBufferReset(&buffer);
+
+  ArrowBufferReset(&buffer_gpu);
+}
+
+TEST(NanoarrowDeviceCuda, DeviceCudaHostBufferInit) {
+  struct ArrowDevice* cpu = ArrowDeviceCpu();
+  struct ArrowDevice* gpu = ArrowDeviceCuda(ARROW_DEVICE_CUDA_HOST, 0);
+  struct ArrowBuffer buffer_gpu;
+  struct ArrowBuffer buffer;
+  uint8_t data[] = {0x01, 0x02, 0x03, 0x04, 0x05};
+  struct ArrowBufferView cpu_view = {data, sizeof(data)};
+
+  // CPU -> GPU
+  ASSERT_EQ(ArrowDeviceBufferInit(cpu, cpu_view, gpu, &buffer_gpu), NANOARROW_OK);
+  EXPECT_EQ(buffer_gpu.size_bytes, sizeof(data));
+  EXPECT_EQ(memcmp(buffer_gpu.data, data, sizeof(data)), 0);
+  // Here, "GPU" is memory in the CPU space allocated by cudaMallocHost
+  struct ArrowBufferView gpu_view = {buffer_gpu.data, buffer_gpu.size_bytes};
+
+  // GPU -> GPU
+  ASSERT_EQ(ArrowDeviceBufferInit(gpu, gpu_view, gpu, &buffer), NANOARROW_OK);
+  EXPECT_EQ(buffer.size_bytes, sizeof(data));
+  EXPECT_EQ(memcmp(buffer.data, data, sizeof(data)), 0);
+  ArrowBufferReset(&buffer);
+
+  // GPU -> CPU
+  ASSERT_EQ(ArrowDeviceBufferInit(gpu, gpu_view, cpu, &buffer), NANOARROW_OK);
+  EXPECT_EQ(buffer.size_bytes, sizeof(data));
+  EXPECT_EQ(memcmp(buffer.data, data, sizeof(data)), 0);
+  ArrowBufferReset(&buffer);
+
+  ArrowBufferReset(&buffer_gpu);
+}
+
+TEST(NanoarrowDeviceCuda, DeviceCudaBufferCopy) {
+  struct ArrowDevice* cpu = ArrowDeviceCpu();
+  struct ArrowDevice* gpu = ArrowDeviceCuda(ARROW_DEVICE_CUDA, 0);
+  uint8_t data[] = {0x01, 0x02, 0x03, 0x04, 0x05};
+  struct ArrowBufferView cpu_view = {data, sizeof(data)};
+
+  void* gpu_dest;
+  cudaError_t result = cudaMalloc(&gpu_dest, sizeof(data));
+  struct ArrowBufferView gpu_view = {gpu_dest, sizeof(data)};
+  if (result != cudaSuccess) {
+    GTEST_FAIL() << "cudaMalloc(&gpu_dest) failed";
+  }
+
+  void* gpu_dest2;
+  result = cudaMalloc(&gpu_dest2, sizeof(data));
+  struct ArrowBufferView gpu_view2 = {gpu_dest2, sizeof(data)};
+  if (result != cudaSuccess) {
+    GTEST_FAIL() << "cudaMalloc(&gpu_dest2) failed";
+  }
+
+  // CPU -> GPU
+  ASSERT_EQ(ArrowDeviceBufferCopy(cpu, cpu_view, gpu, gpu_view), NANOARROW_OK);
+
+  // GPU -> GPU
+  ASSERT_EQ(ArrowDeviceBufferCopy(gpu, gpu_view, gpu, gpu_view2), NANOARROW_OK);
+
+  // GPU -> CPU
+  uint8_t cpu_dest[5];
+  struct ArrowBufferView cpu_dest_view = {cpu_dest, sizeof(data)};
+  ASSERT_EQ(ArrowDeviceBufferCopy(gpu, gpu_view, cpu, cpu_dest_view), NANOARROW_OK);
+
+  // Check roundtrip
+  EXPECT_EQ(memcmp(cpu_dest, data, sizeof(data)), 0);
+
+  // Clean up
+  result = cudaFree(gpu_dest);
+  if (result != cudaSuccess) {
+    GTEST_FAIL() << "cudaFree(gpu_dest) failed";
+  }
+
+  result = cudaFree(gpu_dest2);
+  if (result != cudaSuccess) {
+    GTEST_FAIL() << "cudaFree(gpu_dest2) failed";
+  }
+}
+
+class StringTypeParameterizedTestFixture
+    : public ::testing::TestWithParam<std::pair<ArrowDeviceType, enum ArrowType>> {
+ protected:
+  std::pair<ArrowDeviceType, enum ArrowType> info;
+};
+
+std::pair<ArrowDeviceType, enum ArrowType> DeviceAndType(ArrowDeviceType device_type,
+                                                         enum ArrowType arrow_type) {
+  return {device_type, arrow_type};
+}
+
+TEST_P(StringTypeParameterizedTestFixture, ArrowDeviceCudaArrayViewString) {
+  struct ArrowDevice* cpu = ArrowDeviceCpu();
+  struct ArrowDevice* gpu = ArrowDeviceCuda(GetParam().first, 0);
+  struct ArrowArray array;
+  struct ArrowDeviceArray device_array;
+  struct ArrowDeviceArrayView device_array_view;
+  enum ArrowType string_type = GetParam().second;
+
+  ASSERT_EQ(ArrowArrayInitFromType(&array, string_type), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayStartAppending(&array), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayAppendString(&array, ArrowCharView("abc")), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayAppendString(&array, ArrowCharView("defg")), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayAppendNull(&array, 1), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayFinishBuildingDefault(&array, nullptr), NANOARROW_OK);
+
+  ASSERT_EQ(ArrowDeviceArrayInit(cpu, &device_array, &array), NANOARROW_OK);
+
+  ArrowDeviceArrayViewInit(&device_array_view);
+  ArrowArrayViewInitFromType(&device_array_view.array_view, string_type);
+  ASSERT_EQ(ArrowDeviceArrayViewSetArray(&device_array_view, &device_array, nullptr),
+            NANOARROW_OK);
+
+  EXPECT_EQ(device_array_view.array_view.buffer_views[2].size_bytes, 7);
+  EXPECT_EQ(device_array.array.length, 3);
+
+  // A move to CUDA is not supported, so a copy is required
+  struct ArrowDeviceArray device_array2;
+  device_array2.array.release = nullptr;
+  ASSERT_EQ(ArrowDeviceArrayMoveToDevice(&device_array, gpu, &device_array2), ENOTSUP);
+  ASSERT_EQ(ArrowDeviceArrayViewCopy(&device_array_view, gpu, &device_array2),
+            NANOARROW_OK);
+  device_array.array.release(&device_array.array);
+
+  ASSERT_NE(device_array2.array.release, nullptr);
+  ASSERT_EQ(device_array2.device_id, gpu->device_id);
+  ASSERT_EQ(ArrowDeviceArrayViewSetArray(&device_array_view, &device_array2, nullptr),
+            NANOARROW_OK);
+  EXPECT_EQ(device_array_view.array_view.buffer_views[2].size_bytes, 7);
+  EXPECT_EQ(device_array_view.array_view.length, 3);
+  EXPECT_EQ(device_array2.array.length, 3);
+
+  // A copy is required back to the CPU for CUDA; CUDA_HOST memory can be moved
+  if (gpu->device_type == ARROW_DEVICE_CUDA_HOST) {
+    ASSERT_EQ(ArrowDeviceArrayMoveToDevice(&device_array2, cpu, &device_array),
+              NANOARROW_OK);
+  } else {
+    ASSERT_EQ(ArrowDeviceArrayViewCopy(&device_array_view, cpu, &device_array),
+              NANOARROW_OK);
+    device_array2.array.release(&device_array2.array);
+  }
+
+  ASSERT_NE(device_array.array.release, nullptr);
+  ASSERT_EQ(device_array.device_type, ARROW_DEVICE_CPU);
+  ASSERT_EQ(ArrowDeviceArrayViewSetArray(&device_array_view, &device_array, nullptr),
+            NANOARROW_OK);
+
+  EXPECT_EQ(device_array_view.array_view.buffer_views[2].size_bytes, 7);
+  EXPECT_EQ(memcmp(device_array_view.array_view.buffer_views[2].data.data, "abcdefg", 7),
+            0);
+
+  device_array.array.release(&device_array.array);
+  ArrowDeviceArrayViewReset(&device_array_view);
+}
+
+INSTANTIATE_TEST_SUITE_P(
+    NanoarrowDeviceCuda, StringTypeParameterizedTestFixture,
+    ::testing::Values(DeviceAndType(ARROW_DEVICE_CUDA, NANOARROW_TYPE_STRING),
+                      DeviceAndType(ARROW_DEVICE_CUDA, NANOARROW_TYPE_LARGE_STRING),
+                      DeviceAndType(ARROW_DEVICE_CUDA, NANOARROW_TYPE_BINARY),
+                      DeviceAndType(ARROW_DEVICE_CUDA, NANOARROW_TYPE_LARGE_BINARY),
+                      DeviceAndType(ARROW_DEVICE_CUDA_HOST, NANOARROW_TYPE_STRING),
+                      DeviceAndType(ARROW_DEVICE_CUDA_HOST, NANOARROW_TYPE_LARGE_STRING),
+                      DeviceAndType(ARROW_DEVICE_CUDA_HOST, NANOARROW_TYPE_BINARY),
+                      DeviceAndType(ARROW_DEVICE_CUDA_HOST,
+                                    NANOARROW_TYPE_LARGE_BINARY)));
diff --git a/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_hpp_test.cc b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_hpp_test.cc
new file mode 100644
index 0000000..fc3a555
--- /dev/null
+++ b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_hpp_test.cc
@@ -0,0 +1,75 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <gtest/gtest.h>
+
+#include "nanoarrow_device.hpp"
+
+TEST(NanoarrowDeviceHpp, UniqueDeviceArray) {
+  nanoarrow::device::UniqueDeviceArray array;
+  ASSERT_EQ(array->array.release, nullptr);
+
+  ASSERT_EQ(ArrowArrayInitFromType(&array->array, NANOARROW_TYPE_INT32), NANOARROW_OK);
+  ASSERT_NE(array->array.release, nullptr);
+
+  nanoarrow::device::UniqueDeviceArray array2 = std::move(array);
+  ASSERT_EQ(array->array.release, nullptr);
+  ASSERT_NE(array2->array.release, nullptr);
+}
+
+TEST(NanoarrowDeviceHpp, UniqueDeviceArrayStream) {
+  nanoarrow::device::UniqueDeviceArrayStream stream;
+  ASSERT_EQ(stream->release, nullptr);
+
+  nanoarrow::UniqueSchema schema;
+  ASSERT_EQ(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_INT32), NANOARROW_OK);
+  nanoarrow::UniqueArrayStream naive_stream;
+  ASSERT_EQ(ArrowBasicArrayStreamInit(naive_stream.get(), schema.get(), 0), NANOARROW_OK);
+
+  ASSERT_EQ(
+      ArrowDeviceBasicArrayStreamInit(stream.get(), naive_stream.get(), ArrowDeviceCpu()),
+      NANOARROW_OK);
+  ASSERT_NE(stream->release, nullptr);
+
+  nanoarrow::device::UniqueDeviceArrayStream stream2 = std::move(stream);
+  ASSERT_EQ(stream->release, nullptr);
+  ASSERT_NE(stream2->release, nullptr);
+}
+
+TEST(NanoarrowDeviceHpp, UniqueDevice) {
+  nanoarrow::device::UniqueDevice device;
+  ASSERT_EQ(device->release, nullptr);
+
+  ArrowDeviceInitCpu(device.get());
+
+  nanoarrow::device::UniqueDevice device2 = std::move(device);
+  ASSERT_EQ(device->release, nullptr);
+  ASSERT_NE(device2->release, nullptr);
+}
+
+TEST(NanoarrowDeviceHpp, UniqueDeviceArrayView) {
+  nanoarrow::device::UniqueDeviceArrayView array_view;
+  ASSERT_EQ(array_view->device, nullptr);
+  ArrowDeviceArrayViewInit(array_view.get());
+  ArrowArrayViewInitFromType(&array_view->array_view, NANOARROW_TYPE_INT32);
+
+  ASSERT_EQ(array_view->array_view.storage_type, NANOARROW_TYPE_INT32);
+
+  nanoarrow::device::UniqueDeviceArrayView array_view2 = std::move(array_view);
+  ASSERT_EQ(array_view2->array_view.storage_type, NANOARROW_TYPE_INT32);
+  ASSERT_EQ(array_view->array_view.storage_type, NANOARROW_TYPE_UNINITIALIZED);
+}
diff --git a/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_metal.cc b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_metal.cc
new file mode 100644
index 0000000..3bdfb71
--- /dev/null
+++ b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_metal.cc
@@ -0,0 +1,389 @@
+
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#define NS_PRIVATE_IMPLEMENTATION
+#define MTL_PRIVATE_IMPLEMENTATION
+#include <Metal/Metal.hpp>
+
+#include "nanoarrow_device.hpp"
+
+#include "nanoarrow_device_metal.h"
+
+// If non-null, caller must ->release() the return value. This doesn't
+// release the underlying memory (which must be managed separately).
+static MTL::Buffer* ArrowDeviceMetalWrapBufferNonOwning(MTL::Device* mtl_device,
+                                                        const void* arbitrary_addr,
+                                                        int64_t size_bytes = -1) {
+  // Cache the page size from the system call
+  static int pagesize = 0;
+  if (pagesize == 0) {
+    pagesize = getpagesize();
+  }
+
+  // If we don't know the size of the buffer yet, try pagesize
+  if (size_bytes == -1) {
+    size_bytes = pagesize;
+  }
+
+  // We can wrap any zero-size buffer
+  if (size_bytes == 0) {
+    return mtl_device->newBuffer(0, MTL::ResourceStorageModeShared);
+  }
+
+  int64_t allocation_size;
+  if (size_bytes % pagesize == 0) {
+    allocation_size = size_bytes;
+  } else {
+    allocation_size = ((size_bytes / pagesize) + 1) * pagesize;
+  }
+
+  // Will return nullptr if the memory is improperly aligned
+  return mtl_device->newBuffer(arbitrary_addr, allocation_size,
+                               MTL::ResourceStorageModeShared, nullptr);
+}
+
+static uint8_t* ArrowDeviceMetalAllocatorReallocate(
+    struct ArrowBufferAllocator* allocator, uint8_t* ptr, int64_t old_size,
+    int64_t new_size) {
+  // Cache the page size from the system call
+  static int pagesize = 0;
+  if (pagesize == 0) {
+    pagesize = getpagesize();
+  }
+
+  int64_t allocation_size;
+  if (new_size % pagesize == 0) {
+    allocation_size = new_size;
+  } else {
+    allocation_size = ((new_size / pagesize) + 1) * pagesize;
+  }
+
+  // Size of the existing allocation, if any (also rounded up to a whole page)
+  int64_t old_allocation_size;
+  if (old_size % pagesize == 0) {
+    old_allocation_size = old_size;
+  } else {
+    old_allocation_size = ((old_size / pagesize) + 1) * pagesize;
+  }
+
+  // If growing an existing buffer and the existing allocation is still big
+  // enough, return the same pointer and do nothing.
+  if (ptr != nullptr && new_size >= old_size && new_size <= old_allocation_size) {
+    return ptr;
+  }
+
+  int64_t copy_size;
+  if (new_size > old_size) {
+    copy_size = old_size;
+  } else {
+    copy_size = new_size;
+  }
+
+  void* new_ptr = nullptr;
+  if (posix_memalign(&new_ptr, pagesize, allocation_size) != 0) {
+    new_ptr = nullptr;
+  }
+  if (new_ptr != nullptr && ptr != nullptr) {
+    memcpy(new_ptr, ptr, copy_size);
+  }
+
+  if (ptr != nullptr) {
+    free(ptr);
+  }
+
+  return reinterpret_cast<uint8_t*>(new_ptr);
+}
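Both `ArrowDeviceMetalWrapBufferNonOwning` and the allocator above round sizes up to a whole number of pages so that the resulting allocations can be wrapped by `MTL::Device::newBuffer`. A minimal sketch of that rounding, with the page size passed in rather than obtained from `getpagesize()`:

```c
#include <assert.h>
#include <stdint.h>

/* Round size_bytes up to a whole number of pages (page_size > 0). */
static int64_t RoundUpToPage(int64_t size_bytes, int64_t page_size) {
  if (size_bytes % page_size == 0) {
    return size_bytes;
  }
  return ((size_bytes / page_size) + 1) * page_size;
}
```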
+
+static void ArrowDeviceMetalAllocatorFree(struct ArrowBufferAllocator* allocator,
+                                          uint8_t* ptr, int64_t old_size) {
+  free(ptr);
+}
+
+void ArrowDeviceMetalInitBuffer(struct ArrowBuffer* buffer) {
+  buffer->allocator.reallocate = &ArrowDeviceMetalAllocatorReallocate;
+  buffer->allocator.free = &ArrowDeviceMetalAllocatorFree;
+  buffer->allocator.private_data = nullptr;
+  buffer->data = nullptr;
+  buffer->size_bytes = 0;
+  buffer->capacity_bytes = 0;
+}
+
+ArrowErrorCode ArrowDeviceMetalAlignArrayBuffers(struct ArrowArray* array) {
+  struct ArrowBuffer* buffer;
+  struct ArrowBuffer new_buffer;
+
+  for (int64_t i = 0; i < array->n_buffers; i++) {
+    buffer = ArrowArrayBuffer(array, i);
+    ArrowDeviceMetalInitBuffer(&new_buffer);
+    NANOARROW_RETURN_NOT_OK(
+        ArrowBufferAppend(&new_buffer, buffer->data, buffer->size_bytes));
+    ArrowBufferReset(buffer);
+    ArrowBufferMove(&new_buffer, buffer);
+  }
+
+  for (int64_t i = 0; i < array->n_children; i++) {
+    NANOARROW_RETURN_NOT_OK(ArrowDeviceMetalAlignArrayBuffers(array->children[i]));
+  }
+
+  return NANOARROW_OK;
+}
+
+struct ArrowDeviceMetalArrayPrivate {
+  struct ArrowArray parent;
+  MTL::SharedEvent* event;
+};
+
+static void ArrowDeviceMetalArrayRelease(struct ArrowArray* array) {
+  struct ArrowDeviceMetalArrayPrivate* private_data =
+      (struct ArrowDeviceMetalArrayPrivate*)array->private_data;
+  private_data->event->release();
+  private_data->parent.release(&private_data->parent);
+  ArrowFree(private_data);
+  array->release = NULL;
+}
+
+static ArrowErrorCode ArrowDeviceMetalArrayInit(struct ArrowDevice* device,
+                                                struct ArrowDeviceArray* device_array,
+                                                struct ArrowArray* array) {
+  struct ArrowDeviceMetalArrayPrivate* private_data =
+      (struct ArrowDeviceMetalArrayPrivate*)ArrowMalloc(
+          sizeof(struct ArrowDeviceMetalArrayPrivate));
+  if (private_data == NULL) {
+    return ENOMEM;
+  }
+
+  auto mtl_device = reinterpret_cast<MTL::Device*>(device->private_data);
+  private_data->event = mtl_device->newSharedEvent();
+
+  memset(device_array, 0, sizeof(struct ArrowDeviceArray));
+  device_array->array = *array;
+  device_array->array.private_data = private_data;
+  device_array->array.release = &ArrowDeviceMetalArrayRelease;
+  ArrowArrayMove(array, &private_data->parent);
+
+  device_array->device_id = device->device_id;
+  device_array->device_type = device->device_type;
+  device_array->sync_event = private_data->event;
+
+  return NANOARROW_OK;
+}
+
+static ArrowErrorCode ArrowDeviceMetalBufferInit(struct ArrowDevice* device_src,
+                                                 struct ArrowBufferView src,
+                                                 struct ArrowDevice* device_dst,
+                                                 struct ArrowBuffer* dst) {
+  if (device_src->device_type == ARROW_DEVICE_CPU &&
+      device_dst->device_type == ARROW_DEVICE_METAL) {
+    struct ArrowBuffer tmp;
+    ArrowDeviceMetalInitBuffer(&tmp);
+    NANOARROW_RETURN_NOT_OK(ArrowBufferAppend(&tmp, src.data.as_uint8, src.size_bytes));
+    ArrowBufferMove(&tmp, dst);
+    return NANOARROW_OK;
+
+  } else if (device_src->device_type == ARROW_DEVICE_METAL &&
+             device_dst->device_type == ARROW_DEVICE_METAL) {
+    struct ArrowBuffer tmp;
+    ArrowDeviceMetalInitBuffer(&tmp);
+    NANOARROW_RETURN_NOT_OK(ArrowBufferAppend(&tmp, src.data.as_uint8, src.size_bytes));
+    ArrowBufferMove(&tmp, dst);
+    return NANOARROW_OK;
+
+  } else if (device_src->device_type == ARROW_DEVICE_METAL &&
+             device_dst->device_type == ARROW_DEVICE_CPU) {
+    struct ArrowBuffer tmp;
+    ArrowDeviceMetalInitBuffer(&tmp);
+    NANOARROW_RETURN_NOT_OK(ArrowBufferAppend(&tmp, src.data.as_uint8, src.size_bytes));
+    ArrowBufferMove(&tmp, dst);
+    return NANOARROW_OK;
+
+  } else {
+    return ENOTSUP;
+  }
+}
+
+static ArrowErrorCode ArrowDeviceMetalBufferMove(struct ArrowDevice* device_src,
+                                                 struct ArrowBuffer* src,
+                                                 struct ArrowDevice* device_dst,
+                                                 struct ArrowBuffer* dst) {
+  if (device_src->device_type == ARROW_DEVICE_CPU &&
+      device_dst->device_type == ARROW_DEVICE_METAL) {
+    // Check if the input is already aligned
+    auto mtl_device = reinterpret_cast<MTL::Device*>(device_dst->private_data);
+    MTL::Buffer* mtl_buffer =
+        ArrowDeviceMetalWrapBufferNonOwning(mtl_device, src->data, src->size_bytes);
+    if (mtl_buffer != nullptr) {
+      mtl_buffer->release();
+      ArrowBufferMove(src, dst);
+      return NANOARROW_OK;
+    }
+
+    // Otherwise, initialize a new buffer and copy
+    struct ArrowBuffer tmp;
+    ArrowDeviceMetalInitBuffer(&tmp);
+    NANOARROW_RETURN_NOT_OK(ArrowBufferAppend(&tmp, src->data, src->size_bytes));
+    ArrowBufferMove(&tmp, dst);
+    ArrowBufferReset(src);
+    return NANOARROW_OK;
+  } else if (device_src->device_type == ARROW_DEVICE_METAL &&
+             device_dst->device_type == ARROW_DEVICE_METAL) {
+    // Metal -> Metal is always just a move
+    ArrowBufferMove(src, dst);
+    return NANOARROW_OK;
+  } else if (device_src->device_type == ARROW_DEVICE_METAL &&
+             device_dst->device_type == ARROW_DEVICE_CPU) {
+    // Metal -> CPU is also just a move since the memory is CPU accessible
+    ArrowBufferMove(src, dst);
+    return NANOARROW_OK;
+  } else {
+    return ENOTSUP;
+  }
+}
+
+static ArrowErrorCode ArrowDeviceMetalBufferCopy(struct ArrowDevice* device_src,
+                                                 struct ArrowBufferView src,
+                                                 struct ArrowDevice* device_dst,
+                                                 struct ArrowBufferView dst) {
+  // This is all just memcpy since it's all living in the same address space
+  if (device_src->device_type == ARROW_DEVICE_CPU &&
+      device_dst->device_type == ARROW_DEVICE_METAL) {
+    memcpy((void*)dst.data.as_uint8, src.data.as_uint8, dst.size_bytes);
+    return NANOARROW_OK;
+  } else if (device_src->device_type == ARROW_DEVICE_METAL &&
+             device_dst->device_type == ARROW_DEVICE_METAL) {
+    memcpy((void*)dst.data.as_uint8, src.data.as_uint8, dst.size_bytes);
+    return NANOARROW_OK;
+  } else if (device_src->device_type == ARROW_DEVICE_METAL &&
+             device_dst->device_type == ARROW_DEVICE_CPU) {
+    memcpy((void*)dst.data.as_uint8, src.data.as_uint8, dst.size_bytes);
+    return NANOARROW_OK;
+  } else {
+    return ENOTSUP;
+  }
+}
+
+static int ArrowDeviceMetalCopyRequiredCpuToMetal(MTL::Device* mtl_device,
+                                                  struct ArrowArray* src) {
+  // Only if all buffers in src can be wrapped as an MTL::Buffer
+  for (int64_t i = 0; i < src->n_buffers; i++) {
+    MTL::Buffer* maybe_buffer =
+        ArrowDeviceMetalWrapBufferNonOwning(mtl_device, src->buffers[i]);
+    if (maybe_buffer == nullptr) {
+      return true;
+    }
+
+    maybe_buffer->release();
+  }
+
+  for (int64_t i = 0; i < src->n_children; i++) {
+    int result = ArrowDeviceMetalCopyRequiredCpuToMetal(mtl_device, src->children[i]);
+    if (result != 0) {
+      return result;
+    }
+  }
+
+  return false;
+}
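The copy-required check above boils down to whether each buffer address can be wrapped without copying: per the comment on `ArrowDeviceMetalWrapBufferNonOwning`, `newBuffer` with shared storage returns `nullptr` for improperly aligned memory. A sketch of that alignment predicate (the name `NeedsCopy` is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* A buffer can only be wrapped without copying when its address falls on a
 * page boundary; otherwise a copy into page-aligned memory is required. */
static int NeedsCopy(uintptr_t addr, uintptr_t page_size) {
  return addr % page_size != 0;
}
```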
+
+static ArrowErrorCode ArrowDeviceMetalSynchronize(struct ArrowDevice* device,
+                                                  void* sync_event,
+                                                  struct ArrowError* error) {
+  // TODO: sync events for Metal are harder than for CUDA
+  // https://developer.apple.com/documentation/metal/resource_synchronization/synchronizing_events_between_a_gpu_and_the_cpu?language=objc
+  // It would be much easier if sync_event were a command buffer
+
+  // Something like:
+  // auto listener = MTL::SharedEventListener::alloc();
+  // listener->init();
+
+  // auto event = reinterpret_cast<MTL::SharedEvent*>(sync_event);
+  // event->notifyListener(
+  //   listener, event->signaledValue(), ^(MTL::SharedEvent* pEvent, uint64_t value) {
+  //     pEvent->signaledValue = value + 1;
+  //   });
+
+  // listener->release();
+
+  return NANOARROW_OK;
+}
+
+static ArrowErrorCode ArrowDeviceMetalArrayMove(struct ArrowDevice* device_src,
+                                                struct ArrowDeviceArray* src,
+                                                struct ArrowDevice* device_dst,
+                                                struct ArrowDeviceArray* dst) {
+  // Note that the case where the devices are the same is handled before this
+
+  if (device_src->device_type == ARROW_DEVICE_CPU &&
+      device_dst->device_type == ARROW_DEVICE_METAL) {
+    // Check if we can do the move (i.e., if all buffers are page-aligned)
+    auto mtl_device = reinterpret_cast<MTL::Device*>(device_dst->private_data);
+    if (ArrowDeviceMetalCopyRequiredCpuToMetal(mtl_device, &src->array)) {
+      return ENOTSUP;
+    }
+
+    NANOARROW_RETURN_NOT_OK(ArrowDeviceArrayInit(device_dst, dst, &src->array));
+    return NANOARROW_OK;
+
+  } else if (device_src->device_type == ARROW_DEVICE_METAL &&
+             device_dst->device_type == ARROW_DEVICE_CPU) {
+    NANOARROW_RETURN_NOT_OK(
+        ArrowDeviceMetalSynchronize(device_src, src->sync_event, nullptr));
+    ArrowDeviceArrayMove(src, dst);
+    dst->device_type = device_dst->device_type;
+    dst->device_id = device_dst->device_id;
+    dst->sync_event = NULL;
+    return NANOARROW_OK;
+  }
+
+  return ENOTSUP;
+}
+
+static void ArrowDeviceMetalRelease(struct ArrowDevice* device) {
+  auto mtl_device = reinterpret_cast<MTL::Device*>(device->private_data);
+  mtl_device->release();
+  device->release = NULL;
+}
+
+struct ArrowDevice* ArrowDeviceMetalDefaultDevice(void) {
+  static struct ArrowDevice* default_device_singleton = nullptr;
+  if (default_device_singleton == nullptr) {
+    default_device_singleton =
+        (struct ArrowDevice*)ArrowMalloc(sizeof(struct ArrowDevice));
+    int result = ArrowDeviceMetalInitDefaultDevice(default_device_singleton, nullptr);
+    if (result != NANOARROW_OK) {
+      ArrowFree(default_device_singleton);
+      default_device_singleton = nullptr;
+    }
+  }
+
+  return default_device_singleton;
+}
+
+ArrowErrorCode ArrowDeviceMetalInitDefaultDevice(struct ArrowDevice* device,
+                                                 struct ArrowError* error) {
+  MTL::Device* default_device = MTL::CreateSystemDefaultDevice();
+  if (default_device == nullptr) {
+    ArrowErrorSet(error, "No default device found");
+    return EINVAL;
+  }
+
+  device->device_type = ARROW_DEVICE_METAL;
+  device->device_id = static_cast<int64_t>(default_device->registryID());
+  device->array_init = &ArrowDeviceMetalArrayInit;
+  device->array_move = &ArrowDeviceMetalArrayMove;
+  device->buffer_init = &ArrowDeviceMetalBufferInit;
+  device->buffer_move = &ArrowDeviceMetalBufferMove;
+  device->buffer_copy = &ArrowDeviceMetalBufferCopy;
+  device->synchronize_event = &ArrowDeviceMetalSynchronize;
+  device->release = &ArrowDeviceMetalRelease;
+  device->private_data = default_device;
+  return NANOARROW_OK;
+}
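The `ArrowDeviceMetalBufferCopy` dispatch above reduces every supported (source device, destination device) pair to a plain `memcpy`, since Apple Silicon shares one address space between CPU and GPU. A minimal sketch of that dispatch pattern, with illustrative device codes and error values standing in for the real `ARROW_DEVICE_*` constants (not the extension's actual implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative device codes; the real code uses ARROW_DEVICE_CPU and
 * ARROW_DEVICE_METAL. */
enum ExampleDevice { EXAMPLE_CPU = 1, EXAMPLE_METAL = 2 };

/* Dispatch on the (src, dst) device pair. Every supported combination is a
 * memcpy; unsupported pairs report an error (-1 stands in for ENOTSUP,
 * 0 for NANOARROW_OK). */
int example_buffer_copy(int src_type, const void* src, int dst_type, void* dst,
                        size_t size) {
  int supported = (src_type == EXAMPLE_CPU && dst_type == EXAMPLE_METAL) ||
                  (src_type == EXAMPLE_METAL && dst_type == EXAMPLE_METAL) ||
                  (src_type == EXAMPLE_METAL && dst_type == EXAMPLE_CPU);
  if (!supported) {
    return -1;
  }
  memcpy(dst, src, size);
  return 0;
}
```

Note that, as in the real function, the CPU-to-CPU case is deliberately not handled here; that path belongs to the CPU device's own implementation.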
diff --git a/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_metal.h b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_metal.h
new file mode 100644
index 0000000..8cb4878
--- /dev/null
+++ b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_metal.h
@@ -0,0 +1,85 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef NANOARROW_DEVICE_METAL_H_INCLUDED
+#define NANOARROW_DEVICE_METAL_H_INCLUDED
+
+#include "nanoarrow_device.h"
+
+#ifdef NANOARROW_NAMESPACE
+
+#define ArrowDeviceMetalDefaultDevice \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceMetalDefaultDevice)
+#define ArrowDeviceMetalInitDefaultDevice \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceMetalInitDefaultDevice)
+#define ArrowDeviceMetalInitBuffer \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceMetalInitBuffer)
+#define ArrowDeviceMetalAlignArrayBuffers \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowDeviceMetalAlignArrayBuffers)
+
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/// \defgroup nanoarrow_device_metal Apple Metal Device extension
+///
+/// An Apple Metal implementation of the Arrow C Device interface, primarily targeting
+/// the M1 series of CPUs/GPUs that feature shared CPU/GPU memory. Even though the
+/// memory regions are shared, it is currently not possible to wrap an arbitrary CPU
+/// memory region as an `MTL::Buffer*` unless that memory region is page-aligned.
+/// Because of this, a copy is still required in most cases to make memory GPU
+/// accessible. After GPU calculations are complete, however, moving the buffers back
+/// to the CPU is zero-copy.
+///
+/// Sync events are represented as an `MTL::Event*`. The degree to which these
+/// pointers are stable across metal-cpp versions/builds is currently unknown.
+///
+/// @{
+
+/// \brief A pointer to the default Metal device singleton
+struct ArrowDevice* ArrowDeviceMetalDefaultDevice(void);
+
+/// \brief Initialize a preallocated device struct with the default Metal device
+ArrowErrorCode ArrowDeviceMetalInitDefaultDevice(struct ArrowDevice* device,
+                                                 struct ArrowError* error);
+
+/// \brief Initialize a buffer with the Metal allocator
+///
+/// Metal uses shared memory with the CPU; however, only page-aligned buffers
+/// or buffers created explicitly using the Metal API can be sent to the GPU.
+/// This buffer's allocator uses the Metal API so that it is cheaper to send
+/// buffers to the GPU later. You can use, append to, or move this buffer just
+/// like a normal ArrowBuffer.
+void ArrowDeviceMetalInitBuffer(struct ArrowBuffer* buffer);
+
+/// \brief Convert an ArrowArray to buffers that use the Metal allocator
+///
+/// Replaces buffers from a given ArrowArray with ones that use the Metal
+/// allocator, copying existing content where necessary. The array is still
+/// valid to use just like a normal ArrowArray that was initialized with
+/// ArrowArrayInitFromType() (i.e., it can be appended to and finished with
+/// validation).
+ArrowErrorCode ArrowDeviceMetalAlignArrayBuffers(struct ArrowArray* array);
+
+/// @}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
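The header above documents that CPU memory can only be wrapped as an `MTL::Buffer*` when it is page-aligned, which is why `ArrowDeviceMetalInitBuffer()` exists at all. The core of that "copy required?" decision is just an alignment test; a minimal sketch of the idea (the helper name and the 16 KiB page size are illustrative assumptions, not the extension's actual implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical check: memory is wrappable without a copy only when its
 * address is a multiple of the page size. */
int buffer_is_page_aligned(const void* ptr, size_t page_size) {
  return ((uintptr_t)ptr % page_size) == 0;
}

void demo(void) {
  const size_t kPageSize = 16384; /* Apple Silicon uses 16 KiB pages */

  /* aligned_alloc() yields a page-aligned region: wrappable without a copy */
  void* aligned = aligned_alloc(kPageSize, kPageSize);
  assert(buffer_is_page_aligned(aligned, kPageSize));

  /* An offset pointer into the same region is no longer page-aligned,
   * so a copy would be required to make it GPU accessible */
  assert(!buffer_is_page_aligned((char*)aligned + 8, kPageSize));

  free(aligned);
}
```

This is why an arbitrary `ArrowBuffer` built with the default allocator usually triggers a copy on the way to the GPU, while buffers created via `ArrowDeviceMetalInitBuffer()` can be moved for free.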
diff --git a/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_metal_test.cc b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_metal_test.cc
new file mode 100644
index 0000000..0a29cca
--- /dev/null
+++ b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_metal_test.cc
@@ -0,0 +1,272 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <errno.h>
+
+#include <gtest/gtest.h>
+
+#include <Metal/Metal.hpp>
+
+#include "nanoarrow_device.hpp"
+
+#include "nanoarrow_device_metal.h"
+
+TEST(NanoarrowDeviceMetal, DefaultDevice) {
+  nanoarrow::device::UniqueDevice device;
+  ASSERT_EQ(ArrowDeviceMetalInitDefaultDevice(device.get(), nullptr), NANOARROW_OK);
+  ASSERT_EQ(device->device_type, ARROW_DEVICE_METAL);
+  ASSERT_NE(device->device_id, 0);
+
+  ASSERT_EQ(ArrowDeviceMetalDefaultDevice(), ArrowDeviceMetalDefaultDevice());
+}
+
+TEST(NanoarrowDeviceMetal, DeviceGpuBufferInit) {
+  struct ArrowDevice* cpu = ArrowDeviceCpu();
+  struct ArrowDevice* gpu = ArrowDeviceMetalDefaultDevice();
+  struct ArrowBuffer buffer;
+  uint8_t data[] = {0x01, 0x02, 0x03, 0x04, 0x05};
+  struct ArrowBufferView cpu_view = {data, sizeof(data)};
+
+  struct ArrowBuffer buffer_aligned;
+  ArrowDeviceMetalInitBuffer(&buffer_aligned);
+  ASSERT_EQ(ArrowBufferAppend(&buffer_aligned, data, sizeof(data)), NANOARROW_OK);
+  struct ArrowBufferView gpu_view = {buffer_aligned.data, sizeof(data)};
+
+  // CPU -> GPU
+  ASSERT_EQ(ArrowDeviceBufferInit(cpu, cpu_view, gpu, &buffer), NANOARROW_OK);
+  EXPECT_EQ(buffer.size_bytes, sizeof(data));
+  EXPECT_EQ(memcmp(buffer.data, data, sizeof(data)), 0);
+  ArrowBufferReset(&buffer);
+
+  // GPU -> GPU
+  ASSERT_EQ(ArrowDeviceBufferInit(gpu, gpu_view, gpu, &buffer), NANOARROW_OK);
+  EXPECT_EQ(buffer.size_bytes, sizeof(data));
+  EXPECT_EQ(memcmp(buffer.data, data, sizeof(data)), 0);
+  ArrowBufferReset(&buffer);
+
+  // GPU -> CPU
+  ASSERT_EQ(ArrowDeviceBufferInit(gpu, gpu_view, cpu, &buffer), NANOARROW_OK);
+  EXPECT_EQ(buffer.size_bytes, sizeof(data));
+  EXPECT_EQ(memcmp(buffer.data, data, sizeof(data)), 0);
+  ArrowBufferReset(&buffer);
+
+  ArrowBufferReset(&buffer_aligned);
+}
+
+TEST(NanoarrowDeviceMetal, DeviceGpuBufferMove) {
+  struct ArrowDevice* cpu = ArrowDeviceCpu();
+  struct ArrowDevice* gpu = ArrowDeviceMetalDefaultDevice();
+  struct ArrowBuffer buffer;
+  struct ArrowBuffer buffer2;
+
+  uint8_t data[] = {0x01, 0x02, 0x03, 0x04, 0x05};
+  struct ArrowBufferView view = {data, sizeof(data)};
+
+  ASSERT_EQ(ArrowDeviceBufferInit(cpu, view, gpu, &buffer), NANOARROW_OK);
+  auto mtl_buffer = reinterpret_cast<MTL::Buffer*>(buffer.data);
+
+  // GPU -> GPU: just a move
+  uint8_t* old_ptr = buffer.data;
+  ASSERT_EQ(ArrowDeviceBufferMove(gpu, &buffer, gpu, &buffer2), NANOARROW_OK);
+  EXPECT_EQ(buffer2.size_bytes, 5);
+  EXPECT_EQ(buffer2.data, old_ptr);
+  EXPECT_EQ(buffer.data, nullptr);
+
+  // GPU -> CPU: just a move
+  ASSERT_EQ(ArrowDeviceBufferMove(gpu, &buffer2, cpu, &buffer), NANOARROW_OK);
+  EXPECT_EQ(buffer.size_bytes, 5);
+  EXPECT_EQ(buffer.data, old_ptr);
+  EXPECT_EQ(buffer2.data, nullptr);
+
+  // CPU -> GPU: should be just a move here because the buffer is properly aligned
+  // from the initial GPU allocation.
+  ASSERT_EQ(ArrowDeviceBufferMove(cpu, &buffer, gpu, &buffer2), NANOARROW_OK);
+  EXPECT_EQ(buffer2.size_bytes, 5);
+  EXPECT_EQ(buffer2.data, old_ptr);
+  EXPECT_EQ(buffer.data, nullptr);
+  ArrowBufferReset(&buffer2);
+
+  // CPU -> GPU without alignment should trigger a copy and release the input
+  ArrowBufferInit(&buffer);
+  ASSERT_EQ(ArrowBufferAppend(&buffer, data, sizeof(data)), NANOARROW_OK);
+  old_ptr = buffer.data;
+  ASSERT_EQ(ArrowDeviceBufferMove(cpu, &buffer, gpu, &buffer2), NANOARROW_OK);
+  EXPECT_EQ(buffer2.size_bytes, 5);
+  EXPECT_NE(buffer2.data, old_ptr);
+  EXPECT_EQ(memcmp(buffer2.data, data, sizeof(data)), 0);
+  EXPECT_EQ(buffer.data, nullptr);
+
+  ArrowBufferReset(&buffer2);
+}
+
+TEST(NanoarrowDeviceMetal, DeviceGpuBufferCopy) {
+  struct ArrowDevice* cpu = ArrowDeviceCpu();
+  struct ArrowDevice* gpu = ArrowDeviceMetalDefaultDevice();
+  uint8_t data[] = {0x01, 0x02, 0x03, 0x04, 0x05};
+  struct ArrowBufferView cpu_view = {data, sizeof(data)};
+
+  struct ArrowBuffer buffer;
+  ASSERT_EQ(ArrowDeviceBufferInit(cpu, cpu_view, gpu, &buffer), NANOARROW_OK);
+  struct ArrowBufferView gpu_view = {buffer.data, sizeof(data)};
+
+  struct ArrowBuffer buffer_dest;
+  ASSERT_EQ(ArrowDeviceBufferInit(cpu, cpu_view, gpu, &buffer_dest), NANOARROW_OK);
+  struct ArrowBufferView gpu_dest_view = {buffer_dest.data, sizeof(data)};
+  void* gpu_dest = buffer_dest.data;
+
+  uint8_t cpu_dest[5];
+  struct ArrowBufferView cpu_dest_view = {cpu_dest, sizeof(data)};
+
+  // GPU -> GPU
+  ASSERT_EQ(ArrowDeviceBufferCopy(gpu, gpu_view, gpu, gpu_dest_view), NANOARROW_OK);
+  EXPECT_EQ(memcmp(gpu_dest, data, sizeof(data)), 0);
+  memset(gpu_dest, 0, sizeof(data));
+
+  // GPU -> CPU
+  ASSERT_EQ(ArrowDeviceBufferCopy(gpu, gpu_view, cpu, cpu_dest_view), NANOARROW_OK);
+  EXPECT_EQ(memcmp(cpu_dest, data, sizeof(data)), 0);
+  memset(cpu_dest, 0, sizeof(data));
+
+  // CPU -> GPU
+  ASSERT_EQ(ArrowDeviceBufferCopy(cpu, cpu_view, gpu, gpu_dest_view), NANOARROW_OK);
+  EXPECT_EQ(memcmp(gpu_dest, data, sizeof(data)), 0);
+
+  ArrowBufferReset(&buffer);
+  ArrowBufferReset(&buffer_dest);
+}
+
+TEST(NanoarrowDeviceMetal, DeviceAlignedBuffer) {
+  struct ArrowBuffer buffer;
+  int64_t data[] = {1, 2, 3, 4, 5, 6, 7, 8};
+  struct ArrowBufferView view = {data, sizeof(data)};
+
+  ArrowDeviceMetalInitBuffer(&buffer);
+  ASSERT_EQ(ArrowBufferAppendBufferView(&buffer, view), NANOARROW_OK);
+  EXPECT_EQ(memcmp(buffer.data, data, sizeof(data)), 0);
+  EXPECT_EQ(buffer.capacity_bytes, 64);
+
+  // Check that when we reallocate to a larger size that is still less than the
+  // allocation size, the pointer does not change
+  uint8_t* old_ptr = buffer.data;
+  ASSERT_EQ(ArrowBufferAppendBufferView(&buffer, view), NANOARROW_OK);
+  EXPECT_EQ(memcmp(buffer.data, data, sizeof(data)), 0);
+  EXPECT_EQ(memcmp(buffer.data + sizeof(data), data, sizeof(data)), 0);
+  EXPECT_EQ(buffer.capacity_bytes, 128);
+  EXPECT_EQ(buffer.data, old_ptr);
+
+  // But we can still shrink buffers with reallocation
+  ASSERT_EQ(ArrowBufferResize(&buffer, 64, true), NANOARROW_OK);
+  EXPECT_EQ(memcmp(buffer.data, data, sizeof(data)), 0);
+  EXPECT_NE(buffer.data, old_ptr);
+  EXPECT_EQ(buffer.size_bytes, 64);
+  EXPECT_EQ(buffer.capacity_bytes, 64);
+
+  // Reserving an impossibly large size should fail and null out the buffer
+  ArrowBufferReset(&buffer);
+  ArrowDeviceMetalInitBuffer(&buffer);
+  EXPECT_EQ(ArrowBufferReserve(&buffer, std::numeric_limits<intptr_t>::max()), ENOMEM);
+  EXPECT_EQ(buffer.data, nullptr);
+  EXPECT_EQ(buffer.allocator.private_data, nullptr);
+}
+
+TEST(NanoarrowDeviceMetal, DeviceCpuArrayBuffers) {
+  nanoarrow::UniqueArray array;
+  ASSERT_EQ(ArrowArrayInitFromType(array.get(), NANOARROW_TYPE_STRUCT), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayAllocateChildren(array.get(), 1), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayInitFromType(array->children[0], NANOARROW_TYPE_INT32),
+            NANOARROW_OK);
+
+  ASSERT_EQ(ArrowDeviceMetalAlignArrayBuffers(array.get()), NANOARROW_OK);
+
+  // Make sure we can build an array
+  ASSERT_EQ(ArrowArrayStartAppending(array.get()), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayAppendInt(array->children[0], 1234), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK);
+  ASSERT_EQ(
+      ArrowArrayFinishBuilding(array.get(), NANOARROW_VALIDATION_LEVEL_FULL, nullptr),
+      NANOARROW_OK);
+
+  // Make sure that ArrowDeviceMetalAlignArrayBuffers() copies existing content
+  ASSERT_EQ(ArrowDeviceMetalAlignArrayBuffers(array.get()), NANOARROW_OK);
+
+  auto data_ptr = reinterpret_cast<const int32_t*>(array->children[0]->buffers[1]);
+  EXPECT_EQ(data_ptr[0], 1234);
+}
+
+class StringTypeParameterizedTestFixture
+    : public ::testing::TestWithParam<enum ArrowType> {
+ protected:
+  enum ArrowType type;
+};
+
+TEST_P(StringTypeParameterizedTestFixture, ArrowDeviceMetalArrayViewString) {
+  struct ArrowDevice* metal = ArrowDeviceMetalDefaultDevice();
+  struct ArrowDevice* cpu = ArrowDeviceCpu();
+  struct ArrowArray array;
+  struct ArrowDeviceArray device_array;
+  struct ArrowDeviceArrayView device_array_view;
+  enum ArrowType string_type = GetParam();
+
+  ASSERT_EQ(ArrowArrayInitFromType(&array, string_type), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayStartAppending(&array), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayAppendString(&array, ArrowCharView("abc")), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayAppendString(&array, ArrowCharView("defg")), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayAppendNull(&array, 1), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayFinishBuildingDefault(&array, nullptr), NANOARROW_OK);
+
+  ASSERT_EQ(ArrowDeviceArrayInit(cpu, &device_array, &array), NANOARROW_OK);
+
+  ArrowDeviceArrayViewInit(&device_array_view);
+  ArrowArrayViewInitFromType(&device_array_view.array_view, string_type);
+  ASSERT_EQ(ArrowDeviceArrayViewSetArray(&device_array_view, &device_array, nullptr),
+            NANOARROW_OK);
+
+  EXPECT_EQ(device_array_view.array_view.buffer_views[2].size_bytes, 7);
+
+  // Copy required to Metal
+  struct ArrowDeviceArray device_array2;
+  device_array2.array.release = nullptr;
+  ASSERT_EQ(ArrowDeviceArrayMoveToDevice(&device_array, metal, &device_array2), ENOTSUP);
+  ASSERT_EQ(ArrowDeviceArrayViewCopy(&device_array_view, metal, &device_array2),
+            NANOARROW_OK);
+  device_array.array.release(&device_array.array);
+
+  ASSERT_NE(device_array2.array.release, nullptr);
+  ASSERT_EQ(device_array2.device_id, metal->device_id);
+  ASSERT_EQ(ArrowDeviceArrayViewSetArray(&device_array_view, &device_array2, nullptr),
+            NANOARROW_OK);
+  EXPECT_EQ(device_array_view.array_view.buffer_views[2].size_bytes, 7);
+  EXPECT_EQ(memcmp(device_array_view.array_view.buffer_views[2].data.data, "abcdefg", 7),
+            0);
+
+  // Copy shouldn't be required to the CPU
+  ASSERT_EQ(ArrowDeviceArrayMoveToDevice(&device_array2, cpu, &device_array),
+            NANOARROW_OK);
+  ASSERT_EQ(ArrowDeviceArrayViewSetArray(&device_array_view, &device_array, nullptr),
+            NANOARROW_OK);
+  EXPECT_EQ(memcmp(device_array_view.array_view.buffer_views[2].data.data, "abcdefg", 7),
+            0);
+
+  device_array.array.release(&device_array.array);
+  ArrowDeviceArrayViewReset(&device_array_view);
+}
+
+INSTANTIATE_TEST_SUITE_P(NanoarrowDeviceMetal, StringTypeParameterizedTestFixture,
+                         ::testing::Values(NANOARROW_TYPE_STRING,
+                                           NANOARROW_TYPE_LARGE_STRING,
+                                           NANOARROW_TYPE_BINARY,
+                                           NANOARROW_TYPE_LARGE_BINARY));
diff --git a/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_test.cc b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_test.cc
new file mode 100644
index 0000000..730c162
--- /dev/null
+++ b/extensions/nanoarrow_device/src/nanoarrow/nanoarrow_device_test.cc
@@ -0,0 +1,110 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <errno.h>
+
+#include <gtest/gtest.h>
+
+#include "nanoarrow_device.h"
+
+TEST(NanoarrowDevice, CheckRuntime) {
+  EXPECT_EQ(ArrowDeviceCheckRuntime(nullptr), NANOARROW_OK);
+}
+
+TEST(NanoarrowDevice, CpuDevice) {
+  struct ArrowDevice* cpu = ArrowDeviceCpu();
+  EXPECT_EQ(cpu->device_type, ARROW_DEVICE_CPU);
+  EXPECT_EQ(cpu->device_id, 0);
+  EXPECT_EQ(cpu, ArrowDeviceCpu());
+
+  void* sync_event = nullptr;
+  EXPECT_EQ(cpu->synchronize_event(cpu, sync_event, nullptr), NANOARROW_OK);
+  sync_event = cpu;
+  EXPECT_EQ(cpu->synchronize_event(cpu, sync_event, nullptr), EINVAL);
+}
+
+TEST(NanoarrowDevice, ArrowDeviceCpuBuffer) {
+  struct ArrowDevice* cpu = ArrowDeviceCpu();
+  struct ArrowBuffer buffer;
+  uint8_t data[] = {0x01, 0x02, 0x03, 0x04, 0x05};
+  struct ArrowBufferView view = {data, sizeof(data)};
+
+  ASSERT_EQ(ArrowDeviceBufferInit(cpu, view, cpu, &buffer), NANOARROW_OK);
+  EXPECT_EQ(buffer.size_bytes, 5);
+  EXPECT_EQ(memcmp(buffer.data, view.data.data, sizeof(data)), 0);
+
+  struct ArrowBuffer buffer2;
+  ASSERT_EQ(ArrowDeviceBufferMove(cpu, &buffer, cpu, &buffer2), NANOARROW_OK);
+  EXPECT_EQ(buffer2.size_bytes, 5);
+  EXPECT_EQ(memcmp(buffer2.data, view.data.data, sizeof(data)), 0);
+  EXPECT_EQ(buffer.data, nullptr);
+
+  uint8_t dest[5];
+  struct ArrowBufferView dest_view = {dest, sizeof(dest)};
+  ASSERT_EQ(ArrowDeviceBufferCopy(cpu, view, cpu, dest_view), NANOARROW_OK);
+  EXPECT_EQ(memcmp(dest, view.data.data, sizeof(data)), 0);
+
+  ArrowBufferReset(&buffer2);
+}
+
+class StringTypeParameterizedTestFixture
+    : public ::testing::TestWithParam<enum ArrowType> {
+ protected:
+  enum ArrowType type;
+};
+
+TEST_P(StringTypeParameterizedTestFixture, ArrowDeviceCpuArrayViewString) {
+  struct ArrowDevice* cpu = ArrowDeviceCpu();
+  struct ArrowArray array;
+  struct ArrowDeviceArray device_array;
+  struct ArrowDeviceArrayView device_array_view;
+  enum ArrowType string_type = GetParam();
+
+  ASSERT_EQ(ArrowArrayInitFromType(&array, string_type), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayStartAppending(&array), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayAppendString(&array, ArrowCharView("abc")), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayAppendString(&array, ArrowCharView("defg")), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayAppendNull(&array, 1), NANOARROW_OK);
+  ASSERT_EQ(ArrowArrayFinishBuildingDefault(&array, nullptr), NANOARROW_OK);
+
+  ASSERT_EQ(ArrowDeviceArrayInit(cpu, &device_array, &array), NANOARROW_OK);
+
+  ArrowDeviceArrayViewInit(&device_array_view);
+  ArrowArrayViewInitFromType(&device_array_view.array_view, string_type);
+  ASSERT_EQ(ArrowDeviceArrayViewSetArray(&device_array_view, &device_array, nullptr),
+            NANOARROW_OK);
+
+  EXPECT_EQ(device_array_view.array_view.buffer_views[2].size_bytes, 7);
+
+  // Copy shouldn't be required to the same device
+  struct ArrowDeviceArray device_array2;
+  device_array2.array.release = nullptr;
+  ASSERT_EQ(ArrowDeviceArrayMoveToDevice(&device_array, cpu, &device_array2),
+            NANOARROW_OK);
+  ASSERT_EQ(device_array.array.release, nullptr);
+  ASSERT_NE(device_array2.array.release, nullptr);
+  ASSERT_EQ(device_array2.device_id, cpu->device_id);
+
+  device_array2.array.release(&device_array2.array);
+  ArrowDeviceArrayViewReset(&device_array_view);
+}
+
+INSTANTIATE_TEST_SUITE_P(NanoarrowDevice, StringTypeParameterizedTestFixture,
+                         ::testing::Values(NANOARROW_TYPE_STRING,
+                                           NANOARROW_TYPE_LARGE_STRING,
+                                           NANOARROW_TYPE_BINARY,
+                                           NANOARROW_TYPE_LARGE_BINARY));
diff --git a/src/nanoarrow/array.c b/src/nanoarrow/array.c
index f1524cf..0f40583 100644
--- a/src/nanoarrow/array.c
+++ b/src/nanoarrow/array.c
@@ -1021,6 +1021,19 @@ ArrowErrorCode ArrowArrayViewSetArray(struct ArrowArrayView* array_view,
   return NANOARROW_OK;
 }
 
+ArrowErrorCode ArrowArrayViewSetArrayMinimal(struct ArrowArrayView* array_view,
+                                             struct ArrowArray* array,
+                                             struct ArrowError* error) {
+  // Extract information from the array into the array view
+  NANOARROW_RETURN_NOT_OK(ArrowArrayViewSetArrayInternal(array_view, array, error));
+
+  // Run default validation. Because we've marked all non-NULL buffers as having unknown
+  // size, validation will also update the buffer sizes as it goes.
+  NANOARROW_RETURN_NOT_OK(ArrowArrayViewValidateMinimal(array_view, error));
+
+  return NANOARROW_OK;
+}
+
 static int ArrowAssertIncreasingInt32(struct ArrowBufferView view,
                                       struct ArrowError* error) {
   if (view.size_bytes <= (int64_t)sizeof(int32_t)) {
diff --git a/src/nanoarrow/array_inline.h b/src/nanoarrow/array_inline.h
index bdca215..78a8353 100644
--- a/src/nanoarrow/array_inline.h
+++ b/src/nanoarrow/array_inline.h
@@ -521,6 +521,8 @@ static inline ArrowErrorCode ArrowArrayAppendString(struct ArrowArray* array,
   switch (private_data->storage_type) {
     case NANOARROW_TYPE_STRING:
     case NANOARROW_TYPE_LARGE_STRING:
+    case NANOARROW_TYPE_BINARY:
+    case NANOARROW_TYPE_LARGE_BINARY:
       return ArrowArrayAppendBytes(array, buffer_view);
     default:
       return EINVAL;
diff --git a/src/nanoarrow/nanoarrow.h b/src/nanoarrow/nanoarrow.h
index 372b9dd..413a2c0 100644
--- a/src/nanoarrow/nanoarrow.h
+++ b/src/nanoarrow/nanoarrow.h
@@ -94,6 +94,8 @@
   NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowArrayInitFromSchema)
 #define ArrowArrayInitFromArrayView \
   NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowArrayInitFromArrayView)
 #define ArrowArrayAllocateChildren \
   NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowArrayAllocateChildren)
 #define ArrowArrayAllocateDictionary \
@@ -118,6 +120,8 @@
   NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowArrayViewSetLength)
 #define ArrowArrayViewSetArray \
   NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowArrayViewSetArray)
+#define ArrowArrayViewSetArrayMinimal \
+  NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowArrayViewSetArrayMinimal)
 #define ArrowArrayViewValidate \
   NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowArrayViewValidate)
 #define ArrowArrayViewReset NANOARROW_SYMBOL(NANOARROW_NAMESPACE, ArrowArrayViewReset)
@@ -957,6 +961,12 @@ void ArrowArrayViewSetLength(struct ArrowArrayView* array_view, int64_t length);
 ArrowErrorCode ArrowArrayViewSetArray(struct ArrowArrayView* array_view,
                                       struct ArrowArray* array, struct ArrowError* error);
 
+/// \brief Set buffer sizes and data pointers from an ArrowArray except for those
+/// that require dereferencing buffer content.
+ArrowErrorCode ArrowArrayViewSetArrayMinimal(struct ArrowArrayView* array_view,
+                                             struct ArrowArray* array,
+                                             struct ArrowError* error);
+
 /// \brief Performs checks on the content of an ArrowArrayView
 ///
 /// If using ArrowArrayViewSetArray() to back array_view with an ArrowArray,