Posted to commits@arrow.apache.org by pa...@apache.org on 2023/05/12 14:53:11 UTC
[arrow-nanoarrow] branch main updated: docs: Add "getting started with nanoarrow" tutorial (#190)

This is an automated email from the ASF dual-hosted git repository.

paleolimbot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-nanoarrow.git


The following commit(s) were added to refs/heads/main by this push:
     new d408272  docs: Add "getting started with nanoarrow" tutorial (#190)
d408272 is described below

commit d40827287e0b8c7e6eedbfb51e8cffe73b033fa9
Author: Dewey Dunnington <de...@dunnington.ca>
AuthorDate: Fri May 12 10:53:05 2023 -0400

    docs: Add "getting started with nanoarrow" tutorial (#190)
    
    This PR adds a "getting started" tutorial. The basic premise is that it
    demonstrates building a library in C++ that exposes an Arrow-based
    API...in the future, one could add tutorials for wrapping it in R and
    Python as well.
    
    Open to any and all feedback (including a different premise!)
---
 .github/workflows/docs.yaml                    |   1 +
 .github/workflows/examples.yaml                |   8 +
 ci/scripts/build-docs.sh                       |   3 +
 docs/.gitignore                                |   2 +-
 docs/source/getting-started.md                 | 528 +++++++++++++++++++++++++
 docs/source/{index.rst => getting-started.rst} |  12 +-
 docs/source/index.rst                          |   1 +
 examples/linesplitter/CMakeLists.txt           |  48 +++
 examples/linesplitter/linesplitter.cc          |  98 +++++
 examples/linesplitter/linesplitter.h           |  77 ++++
 examples/linesplitter/linesplitter_test.cc     |  50 +++
 11 files changed, 816 insertions(+), 12 deletions(-)

diff --git a/.github/workflows/docs.yaml b/.github/workflows/docs.yaml
index 14f1d16..942435c 100644
--- a/.github/workflows/docs.yaml
+++ b/.github/workflows/docs.yaml
@@ -32,6 +32,7 @@ on:
       - 'extensions/nanoarrow_ipc/src/**'
       - 'src/**'
       - 'r/**'
+      - 'docs/source/**'
 
 jobs:
   docs:
diff --git a/.github/workflows/examples.yaml b/.github/workflows/examples.yaml
index 446f512..dca7a3d 100644
--- a/.github/workflows/examples.yaml
+++ b/.github/workflows/examples.yaml
@@ -101,3 +101,11 @@ jobs:
           gcc -o example_vendored_ipc_app app.c libexample_vendored_ipc_library.a
 
           cat ../schema-valid.arrows | ./example_vendored_ipc_app
+
+      - name: Getting Started Tutorial Example
+        run: |
+          cd examples/linesplitter
+          mkdir build && cd build
+          cmake ..
+          cmake --build .
+          ctest .
diff --git a/ci/scripts/build-docs.sh b/ci/scripts/build-docs.sh
index 427ea16..f39d992 100755
--- a/ci/scripts/build-docs.sh
+++ b/ci/scripts/build-docs.sh
@@ -78,6 +78,9 @@ main() {
    # Use the README as the docs homepage
    pandoc ../README.md --from markdown --to rst -s -o source/README_generated.rst
 
+   # Do some Markdown -> reST conversion
+   pandoc source/getting-started.md --from markdown --to rst -s -o source/getting-started_generated.rst
+
    # Build sphinx project
    sphinx-build source _build/html
 
diff --git a/docs/.gitignore b/docs/.gitignore
index 43dc75f..e161297 100644
--- a/docs/.gitignore
+++ b/docs/.gitignore
@@ -16,4 +16,4 @@
 # under the License.
 
 _build/
-README_generated.rst
+*_generated.rst
diff --git a/docs/source/getting-started.md b/docs/source/getting-started.md
new file mode 100644
index 0000000..78eb798
--- /dev/null
+++ b/docs/source/getting-started.md
@@ -0,0 +1,528 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Getting started with nanoarrow
+
+This tutorial provides a short example of writing a C++ library that exposes
+an Arrow-based API and uses nanoarrow to implement a simple text file reader/writer.
+In general, nanoarrow can help you write a library or application that:
+
+- exposes an Arrow-based API to read from a data source or format,
+- exposes an Arrow-based API to write to a data source or format,
+- exposes one or more compute functions that operate on and produce data
+  in the form of Arrow arrays, and/or
+- exposes an extension type implementation.
+
+Because Arrow has bindings in many languages, you or others can easily
+bind or use your tool in higher-level runtimes like R, Java, C++, Python, Rust, Julia,
+Go, or Ruby, among others.
+
+The nanoarrow library is not the only way that an Arrow-based API can be implemented:
+Arrow C++, Rust, and Go are all excellent choices and can compile into
+static libraries that are C-linkable from other languages; however, existing Arrow
+implementations produce relatively large static libraries and can present complex build-time
+or run-time linking requirements depending on the implementation and features used. If
+the libraries you're already working with provide the conveniences you require,
+nanoarrow may be all the Arrow functionality you need.
+
+Now that we've talked about why you might want to build a library with nanoarrow...let's
+build one!
+
+```{=rst}
+.. note::
+  This tutorial also goes over some of the basic structure of writing a C++ library.
+  If you already know how to do this, feel free to scroll to the code examples provided
+  below or take a look at the
+  `final example project <https://github.com/apache/arrow-nanoarrow/tree/main/examples/linesplitter>`__.
+
+```
+
+## The library
+
+The library we'll write in this tutorial is a simple text processing library that splits
+and reassembles lines of text. It will be able to:
+
+- Read text from a buffer into an `ArrowArray` as one element per line, and
+- Write elements of an `ArrowArray` into a buffer, inserting line breaks
+  after every element.
+
+For the sake of argument, we'll call it `linesplitter`.
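+
+It can help to pin down what `linesplitter` should do before involving Arrow at all.
+The sketch below expresses the same two operations with standard containers only. The
+names `split_lines()` and `join_lines()` are hypothetical and not part of the library we
+build, and treating a trailing `\n` as the end of the last line (rather than the start of
+an empty one) is an assumption of this sketch. A reference like this can also serve as a
+test oracle later on.
+
+```cpp
+#include <cassert>  // for assert() in usage checks
+#include <string>
+#include <vector>
+
+// Hypothetical reference semantics for linesplitter, without Arrow.
+// Splits src on '\n'; a trailing '\n' ends the last line rather than
+// starting a new, empty one (an assumption of this sketch).
+std::vector<std::string> split_lines(const std::string& src) {
+  std::vector<std::string> lines;
+  std::string::size_type start = 0;
+  while (start < src.size()) {
+    std::string::size_type end = src.find('\n', start);
+    if (end == std::string::npos) {
+      end = src.size();  // last line without a trailing '\n'
+    }
+    lines.push_back(src.substr(start, end - start));
+    start = end + 1;  // skip past the '\n'
+  }
+  return lines;
+}
+
+// Writes every element followed by '\n', including the last one
+std::string join_lines(const std::vector<std::string>& lines) {
+  std::string out;
+  for (const std::string& line : lines) {
+    out += line;
+    out += '\n';
+  }
+  return out;
+}
+```
+
+For example, `join_lines(split_lines("line1\nline2\nline3"))` gives
+`"line1\nline2\nline3\n"`: a newline is written after every element, including the last.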
+
+## The development environment
+
+There are many excellent IDEs that can be used to develop C and C++ libraries. For
+this tutorial, we will use [VSCode](https://code.visualstudio.com/) and
+[CMake](https://cmake.org/). You'll need both installed to follow along:
+VSCode can be downloaded from the official site for most platforms;
+CMake is typically installed via your favourite package manager
+(e.g., `brew install cmake`, `apt-get install cmake`, `dnf install cmake`,
+etc.). You will also need a C and C++ compiler: on macOS these can be installed
+using `xcode-select --install`; on Linux you will need the packages that provide
+`gcc`, `g++`, and `make` (e.g., `apt-get install build-essential`); on Windows
+you will need to install
+[Visual Studio](https://visualstudio.microsoft.com/downloads/) and
+CMake from the official download pages.
+
+Once you have VSCode installed, ensure you have the **CMake Tools** and **C/C++**
+extensions installed. Once your environment is set up, create a folder called
+`linesplitter` and open it using **File -> Open Folder**.
+
+## The interface
+
+We'll expose the interface to our library as a header called `linesplitter.h`.
+To ensure the definitions are only included once in any given source file, we'll
+add the following line at the top:
+
+```cpp
+#pragma once
+```
+
+Then, we need the
+[Arrow C Data interface](https://arrow.apache.org/docs/format/CDataInterface.html#structure-definitions)
+itself, since it provides the type definitions on which our API will be built and
+that other Arrow implementations recognize. It's designed to be copied and pasted
+in this way; there's no need to put it in a separate file or include it from
+another project.
+
+```cpp
+#include <stdint.h>
+
+#ifndef ARROW_C_DATA_INTERFACE
+#define ARROW_C_DATA_INTERFACE
+
+#define ARROW_FLAG_DICTIONARY_ORDERED 1
+#define ARROW_FLAG_NULLABLE 2
+#define ARROW_FLAG_MAP_KEYS_SORTED 4
+
+struct ArrowSchema {
+  // Array type description
+  const char* format;
+  const char* name;
+  const char* metadata;
+  int64_t flags;
+  int64_t n_children;
+  struct ArrowSchema** children;
+  struct ArrowSchema* dictionary;
+
+  // Release callback
+  void (*release)(struct ArrowSchema*);
+  // Opaque producer-specific data
+  void* private_data;
+};
+
+struct ArrowArray {
+  // Array data description
+  int64_t length;
+  int64_t null_count;
+  int64_t offset;
+  int64_t n_buffers;
+  int64_t n_children;
+  const void** buffers;
+  struct ArrowArray** children;
+  struct ArrowArray* dictionary;
+
+  // Release callback
+  void (*release)(struct ArrowArray*);
+  // Opaque producer-specific data
+  void* private_data;
+};
+
+#endif  // ARROW_C_DATA_INTERFACE
+```
+
+Next, we'll declare the functions we'll implement below. Because they use
+`std::pair` and `std::string`, the header also needs the corresponding includes:
+
+```cpp
+#include <string>
+#include <utility>
+
+// Builds an ArrowArray of type string that will contain one element for each line
+// in src and places it into out.
+//
+// On success, returns {0, ""}; on error, returns {<errno code>, <error message>}
+std::pair<int, std::string> linesplitter_read(const std::string& src,
+                                              struct ArrowArray* out);
+
+// Concatenates all elements of a string ArrowArray, inserting a newline after
+// each element.
+//
+// On success, returns {0, <result>}; on error, returns {<errno code>, <error message>}
+std::pair<int, std::string> linesplitter_write(struct ArrowArray* input);
+```
+
+```{=rst}
+.. note::
+  You may notice that we don't include or mention nanoarrow in any way in the header
+  that is exposed to users. Because nanoarrow is designed to be vendored and is not
+  distributed as a system library, it is not safe for users of your library to
+  ``#include "nanoarrow.h"`` because it might conflict with another library that does
+  the same (with possibly a different version of nanoarrow).
+
+```
+
+## Arrow C data/nanoarrow interface basics
+
+Now that we've seen the functions we need to implement and the Arrow types exposed
+in the C data interface, let's unpack a few basics about using the Arrow C data
+interface and a few conventions used in the nanoarrow implementation.
+
+First, let's discuss the `ArrowSchema` and the `ArrowArray`. You can think of an
+`ArrowSchema` as an expression of a data type, whereas an `ArrowArray` is the
+data itself. These structures accommodate nested types: children (e.g., the columns of a
+struct) are encoded in the `children` member of each. You always need to know the data type of an
+`ArrowArray` before accessing its contents. In our case we only operate on arrays
+of one type ("string") and document that in our interface; for functions that
+operate on more than one type of array you will need to accept an `ArrowSchema`
+and inspect it (e.g., using nanoarrow's helper functions).
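+
+As a rough illustration of what inspecting a schema involves, the sketch below checks the
+`format` member of an `ArrowSchema` by hand. In the C data interface, the format string
+`"u"` denotes a UTF-8 string array with 32-bit offsets and `"U"` a large string array
+with 64-bit offsets, and a released structure has its `release` callback set to `NULL`.
+The helper `is_string_schema()` is hypothetical; in practice nanoarrow's own helpers are
+the more robust choice.
+
+```cpp
+#include <cassert>  // for assert() in usage checks
+#include <cstdint>
+#include <cstring>
+
+// ArrowSchema as defined by the Arrow C data interface (see linesplitter.h)
+struct ArrowSchema {
+  const char* format;
+  const char* name;
+  const char* metadata;
+  int64_t flags;
+  int64_t n_children;
+  struct ArrowSchema** children;
+  struct ArrowSchema* dictionary;
+  void (*release)(struct ArrowSchema*);
+  void* private_data;
+};
+
+// Hypothetical helper: true if schema describes a string ("u") or
+// large string ("U") array
+bool is_string_schema(const struct ArrowSchema* schema) {
+  // A missing or released schema must not be inspected
+  if (schema == nullptr || schema->release == nullptr) {
+    return false;
+  }
+  return std::strcmp(schema->format, "u") == 0 ||
+         std::strcmp(schema->format, "U") == 0;
+}
+```
+
+A function that accepts arbitrary arrays would run a check like this on its
+`ArrowSchema*` argument before reading the matching `ArrowArray`.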
+
+Second, let's discuss error handling. You may have noticed in the function declarations
+above that the first element of each returned `std::pair` is an `int`: an errno-compatible
+error code, or `0` to indicate success. Functions in nanoarrow that might fail report errors
+the same way, returning an errno-compatible code (`NANOARROW_OK`, i.e., `0`, on success);
+those that need to communicate more detailed error information also accept an
+`ArrowError*` argument (which can be `NULL` if the caller does not care about the extra
+information). To avoid verbose code like the following:
+
+```c
+int init_string_non_null(struct ArrowSchema* schema) {
+  int code = ArrowSchemaInitFromType(schema, NANOARROW_TYPE_STRING);
+  if (code != NANOARROW_OK) {
+    return code;
+  }
+
+  schema->flags &= ~ARROW_FLAG_NULLABLE;
+  return NANOARROW_OK;
+}
+```
+
+...you can use the `NANOARROW_RETURN_NOT_OK()` macro:
+
+```c
+int init_string_non_null(struct ArrowSchema* schema) {
+  NANOARROW_RETURN_NOT_OK(ArrowSchemaInitFromType(schema, NANOARROW_TYPE_STRING));
+  schema->flags &= ~ARROW_FLAG_NULLABLE;
+  return NANOARROW_OK;
+}
+```
+
+This works as long as your internal functions that use nanoarrow also return an
+`int` error code (and, where useful, accept an `ArrowError*` argument). This usually
+means that there is an outer function that presents a more idiomatic interface (e.g.,
+returning `std::optional<>` or throwing an exception) and an inner function that uses
+nanoarrow-style error handling. Embracing `NANOARROW_RETURN_NOT_OK()` is key
+to happiness when using the nanoarrow library.
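+
+The inner/outer split can be sketched without nanoarrow. Below,
+`EXAMPLE_RETURN_NOT_OK()` is a hypothetical stand-in for `NANOARROW_RETURN_NOT_OK()`,
+`ExampleError` is a hypothetical stand-in for `ArrowError`, and rejecting elements that
+contain a `\n` is an illustrative check that `linesplitter` itself does not perform. The
+inner function only returns errno-compatible codes; the outer function converts a
+failure into an exception.
+
+```cpp
+#include <cassert>  // for assert() in usage checks
+#include <cerrno>
+#include <stdexcept>
+#include <string>
+#include <vector>
+
+// Hypothetical stand-in for NANOARROW_RETURN_NOT_OK(): evaluates expr once
+// and returns its value from the enclosing function if it is nonzero
+#define EXAMPLE_RETURN_NOT_OK(expr) \
+  do {                              \
+    int example_code_ = (expr);     \
+    if (example_code_ != 0) {       \
+      return example_code_;         \
+    }                               \
+  } while (0)
+
+// Hypothetical stand-in for ArrowError
+struct ExampleError {
+  std::string message;
+};
+
+// Inner-style helper: returns 0 or an errno code; fills in error if non-null
+static int check_no_newline(const std::string& line, ExampleError* error) {
+  if (line.find('\n') != std::string::npos) {
+    if (error != nullptr) {
+      error->message = "element contains a newline";
+    }
+    return EINVAL;
+  }
+  return 0;
+}
+
+// Inner function: error codes and early returns only
+static int join_checked_internal(const std::vector<std::string>& lines,
+                                 std::string* out, ExampleError* error) {
+  std::string result;
+  for (const std::string& line : lines) {
+    EXAMPLE_RETURN_NOT_OK(check_no_newline(line, error));
+    result += line;
+    result += '\n';
+  }
+  *out = result;  // only touch the output on success
+  return 0;
+}
+
+// Outer function: idiomatic C++ interface that throws on failure
+std::string join_checked(const std::vector<std::string>& lines) {
+  ExampleError error;
+  std::string out;
+  int code = join_checked_internal(lines, &out, &error);
+  if (code != 0) {
+    throw std::runtime_error(error.message);
+  }
+  return out;
+}
+```
+
+Keeping the inner function free of exceptions lets it compose with other code that
+returns error codes, while callers of the outer function get an idiomatic C++ interface.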
+
+Third, let's discuss memory management. Because nanoarrow is implemented in C
+and provides a C interface, the library by default uses C-style memory management
+(i.e., if you allocate it, you clean it up). This is unnecessary when you have
+C++ at your disposal, so nanoarrow also provides a C++ header (`nanoarrow.hpp`) with
+`std::unique_ptr<>`-like wrappers around anything that requires explicit cleanup.
+Whereas in C you might have to write code like this:
+
+```c
+struct ArrowSchema schema;
+struct ArrowArray array;
+
+// OK: if this returns early, array has not been initialized yet
+NANOARROW_RETURN_NOT_OK(ArrowSchemaInitFromType(&schema, NANOARROW_TYPE_STRING));
+
+// Verbose: if this fails, we need to release schema before returning
+// or it will leak.
+int code = ArrowArrayInitFromSchema(&array, &schema, NULL);
+if (code != NANOARROW_OK) {
+  schema.release(&schema);
+  return code;
+}
+
+// ...and from here on, every exit path must release both array and schema
+```
+
+...using the `nanoarrow.hpp` types we can do:
+
+```cpp
+nanoarrow::UniqueSchema schema;
+nanoarrow::UniqueArray array;
+
+NANOARROW_RETURN_NOT_OK(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_STRING));
+NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromSchema(array.get(), schema.get(), NULL));
+```
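+
+To make the idea behind these wrappers concrete, here is a stripped-down, hypothetical
+`ExampleUniqueArray`. It is not nanoarrow's implementation, only a sketch of the
+pattern: the destructor calls `release` unless the structure is already released
+(indicated by a `NULL` `release` callback), and moving transfers ownership by marking
+the moved-from structure as released. It assumes, as the C data interface requires,
+that a `release` callback sets `release` to `NULL` itself.
+
+```cpp
+#include <cassert>  // for assert() in usage checks
+#include <cstdint>
+#include <utility>  // std::move, for transferring ownership
+
+// ArrowArray as defined by the Arrow C data interface (see linesplitter.h)
+struct ArrowArray {
+  int64_t length;
+  int64_t null_count;
+  int64_t offset;
+  int64_t n_buffers;
+  int64_t n_children;
+  const void** buffers;
+  struct ArrowArray** children;
+  struct ArrowArray* dictionary;
+  void (*release)(struct ArrowArray*);
+  void* private_data;
+};
+
+// Hypothetical owning wrapper, similar in spirit to nanoarrow::UniqueArray
+class ExampleUniqueArray {
+ public:
+  // Zero-initializes the struct, so release is NULL (nothing owned yet)
+  ExampleUniqueArray() : data_() {}
+
+  // Takes ownership from other and marks other as released
+  ExampleUniqueArray(ExampleUniqueArray&& other) : data_(other.data_) {
+    other.data_.release = nullptr;
+  }
+
+  // Copying would lead to a double release, so it is disallowed
+  ExampleUniqueArray(const ExampleUniqueArray&) = delete;
+  ExampleUniqueArray& operator=(const ExampleUniqueArray&) = delete;
+
+  ~ExampleUniqueArray() { reset(); }
+
+  struct ArrowArray* get() { return &data_; }
+  struct ArrowArray* operator->() { return &data_; }
+
+  // Releases the owned array, if there is one
+  void reset() {
+    if (data_.release != nullptr) {
+      data_.release(&data_);  // a conforming callback sets release to NULL
+    }
+  }
+
+ private:
+  struct ArrowArray data_;
+};
+```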
+
+## Building the library
+
+Our library implementation will live in `linesplitter.cc`. Before writing the
+actual implementations, let's add just enough to our project that we can
+build it using VSCode's C/C++/CMake integration:
+
+```cpp
+#include <cerrno>
+#include <cstdint>
+#include <sstream>
+#include <string>
+#include <utility>
+
+#include "nanoarrow/nanoarrow.hpp"
+
+#include "linesplitter.h"
+
+std::pair<int, std::string> linesplitter_read(const std::string& src,
+                                              struct ArrowArray* out) {
+  return {ENOTSUP, ""};
+}
+
+std::pair<int, std::string> linesplitter_write(struct ArrowArray* input) {
+  return {ENOTSUP, ""};
+}
+```
+
+We also need a `CMakeLists.txt` file that tells CMake and VSCode what to build.
+CMake has a lot of options and can scale to coordinate very large projects; however,
+we only need a few lines to leverage VSCode's integration.
+
+```cmake
+project(linesplitter)
+
+set(CMAKE_CXX_STANDARD 11)
+
+include(FetchContent)
+
+FetchContent_Declare(
+  nanoarrow
+  URL https://github.com/apache/arrow-nanoarrow/releases/download/apache-arrow-nanoarrow-0.1.0/apache-arrow-nanoarrow-0.1.0.tar.gz
+  URL_HASH SHA512=dc62480b986ee76aaad8e38c6fbc602f8cef2cc35a5f5ede7da2a93b4db2b63839bdca3eefe8a44ae1cb6895a2fd3f090e3f6ea1020cf93cfe86437304dfee17)
+FetchContent_MakeAvailable(nanoarrow)
+
+add_library(linesplitter linesplitter.cc)
+target_link_libraries(linesplitter PRIVATE nanoarrow)
+```
+
+After saving `CMakeLists.txt`, you may have to close and re-open the `linesplitter`
+directory in VSCode to activate the CMake integration. From the command palette
+(i.e., Control/Command-Shift-P), choose **CMake: Build**. If all went well, you should
+see a few lines of output indicating progress towards building and linking `linesplitter`.
+
+```{=rst}
+.. note::
+  Depending on your version of CMake you might also see a few warnings. This CMakeLists.txt
+  is intentionally minimal and as such does not attempt to silence them.
+
+```
+
+```{=rst}
+.. note::
+  If you're not using VSCode, you can accomplish the equivalent task in a terminal
+  with ``mkdir build && cd build && cmake .. && cmake --build .``.
+
+```
+
+## Building an ArrowArray
+
+The input for our `linesplitter_read()` function is a `std::string`, which we'll iterate
+over, adding each detected line as its own element. First, some core logic to find
+the number of characters until the next `\n` or the end of the string:
+
+```cpp
+static int64_t find_newline(const ArrowStringView& src) {
+  for (int64_t i = 0; i < src.size_bytes; i++) {
+    if (src.data[i] == '\n') {
+      return i;
+    }
+  }
+
+  return src.size_bytes;
+}
+```
+
+The next function we'll define is an internal function that uses nanoarrow-style error
+handling. This uses the `ArrowArrayAppend*()` family of functions provided by
+nanoarrow to build the array:
+
+```cpp
+static int linesplitter_read_internal(const std::string& src, ArrowArray* out,
+                                      ArrowError* error) {
+  nanoarrow::UniqueArray tmp;
+  NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromType(tmp.get(), NANOARROW_TYPE_STRING));
+  NANOARROW_RETURN_NOT_OK(ArrowArrayStartAppending(tmp.get()));
+
+  ArrowStringView src_view = {src.data(), static_cast<int64_t>(src.size())};
+  ArrowStringView line_view;
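+  while (src_view.size_bytes > 0) {
+    // Number of characters in the current line, excluding the '\n'
+    int64_t line_length = find_newline(src_view);
+    line_view = {src_view.data, line_length};
+    NANOARROW_RETURN_NOT_OK(ArrowArrayAppendString(tmp.get(), line_view));
+
+    // Advance past the line and its '\n' (the last line may not end with one)
+    src_view.data += line_length;
+    src_view.size_bytes -= line_length;
+    if (src_view.size_bytes > 0) {
+      src_view.data++;
+      src_view.size_bytes--;
+    }
+  }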
+
+  NANOARROW_RETURN_NOT_OK(ArrowArrayFinishBuildingDefault(tmp.get(), error));
+
+  ArrowArrayMove(tmp.get(), out);
+  return NANOARROW_OK;
+}
+```
+
+Finally, we define a wrapper that corresponds to the outer function definition.
+
+```cpp
+std::pair<int, std::string> linesplitter_read(const std::string& src, ArrowArray* out) {
+  ArrowError error{};  // zero-initialized: empty message if nothing sets it
+  int code = linesplitter_read_internal(src, out, &error);
+  if (code != NANOARROW_OK) {
+    return {code, std::string(ArrowErrorMessage(&error))};
+  } else {
+    return {NANOARROW_OK, ""};
+  }
+}
+```
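+
+For intuition about what `ArrowArrayAppendString()` maintains, recall that a string
+(`"u"`) array consists of three buffers: an optional validity bitmap, `length + 1`
+32-bit offsets, and the concatenated bytes of all elements. The hypothetical
+`ExampleStringBuilder` below builds just the offsets and data buffers for an array
+without nulls; it is a sketch, not how nanoarrow is implemented, and skips the validity
+bitmap and the export to an `ArrowArray`.
+
+```cpp
+#include <cassert>  // for assert() in usage checks
+#include <cstddef>
+#include <cstdint>
+#include <string>
+#include <vector>
+
+// Hypothetical builder for the offsets and data buffers of an Arrow
+// string ("u") array that contains no nulls
+struct ExampleStringBuilder {
+  // Element i occupies data[offsets[i]] up to (not including) data[offsets[i + 1]]
+  std::vector<int32_t> offsets{0};
+  std::vector<char> data;
+
+  void append(const std::string& value) {
+    data.insert(data.end(), value.begin(), value.end());
+    offsets.push_back(static_cast<int32_t>(data.size()));
+  }
+
+  int64_t length() const { return static_cast<int64_t>(offsets.size()) - 1; }
+
+  std::string get(std::size_t i) const {
+    return std::string(data.data() + offsets[i], offsets[i + 1] - offsets[i]);
+  }
+};
+```
+
+For `"line1\nline2\nline3"`, `linesplitter_read()` appends `line1`, `line2`, and `line3`,
+so this layout would hold offsets `{0, 5, 10, 15}` and data `line1line2line3`; the newline
+characters themselves are not stored.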
+
+## Reading an ArrowArray
+
+The input for our `linesplitter_write()` function is an `ArrowArray*` like the one we
+create in `linesplitter_read()`. Just as nanoarrow provides helpers to build arrays,
+it also provides helpers to read them via the `ArrowArrayView*()` family of functions.
+Again, we first define an internal function that uses nanoarrow-style error handling:
+
+```cpp
+static int linesplitter_write_internal(ArrowArray* input, std::stringstream& out,
+                                       ArrowError* error) {
+  nanoarrow::UniqueArrayView input_view;
+  ArrowArrayViewInitFromType(input_view.get(), NANOARROW_TYPE_STRING);
+  NANOARROW_RETURN_NOT_OK(ArrowArrayViewSetArray(input_view.get(), input, error));
+
+  ArrowStringView item;
+  for (int64_t i = 0; i < input->length; i++) {
+    if (ArrowArrayViewIsNull(input_view.get(), i)) {
+      out << "\n";
+    } else {
+      item = ArrowArrayViewGetStringUnsafe(input_view.get(), i);
+      out << std::string(item.data, item.size_bytes) << "\n";
+    }
+  }
+
+  return NANOARROW_OK;
+}
+```
+
+Then, provide an outer wrapper that corresponds to the outer function definition.
+
+```cpp
+std::pair<int, std::string> linesplitter_write(ArrowArray* input) {
+  std::stringstream out;
+  ArrowError error{};  // zero-initialized: empty message if nothing sets it
+  int code = linesplitter_write_internal(input, out, &error);
+  if (code != NANOARROW_OK) {
+    return {code, std::string(ArrowErrorMessage(&error))};
+  } else {
+    return {NANOARROW_OK, out.str()};
+  }
+}
+```
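+
+For intuition about what `ArrowArrayViewIsNull()` and `ArrowArrayViewGetStringUnsafe()`
+do for a string array, the sketch below reads an `ArrowArray`'s buffers directly. Bit
+`offset + i` of the validity bitmap (least-significant bit first) is `1` if element `i`
+is valid, and the element's bytes lie between entries `offset + i` and `offset + i + 1`
+of the 32-bit offsets buffer. The function names are hypothetical and, like the
+`Unsafe` accessor, they do no bounds checking; `ArrowArrayView` remains the better tool.
+
+```cpp
+#include <cassert>  // for assert() in usage checks
+#include <cstdint>
+#include <string>
+
+// ArrowArray as defined by the Arrow C data interface (see linesplitter.h)
+struct ArrowArray {
+  int64_t length;
+  int64_t null_count;
+  int64_t offset;
+  int64_t n_buffers;
+  int64_t n_children;
+  const void** buffers;
+  struct ArrowArray** children;
+  struct ArrowArray* dictionary;
+  void (*release)(struct ArrowArray*);
+  void* private_data;
+};
+
+// Hypothetical: returns true if element i of the array is null
+bool example_is_null(const struct ArrowArray* array, int64_t i) {
+  const uint8_t* validity = static_cast<const uint8_t*>(array->buffers[0]);
+  if (validity == nullptr) {
+    // No bitmap: all elements are valid (allowed only when null_count is 0)
+    return false;
+  }
+  int64_t bit = array->offset + i;
+  return ((validity[bit / 8] >> (bit % 8)) & 1) == 0;
+}
+
+// Hypothetical: copies element i of a string ("u") array; no bounds checks
+std::string example_get_string(const struct ArrowArray* array, int64_t i) {
+  const int32_t* offsets = static_cast<const int32_t*>(array->buffers[1]);
+  const char* data = static_cast<const char*>(array->buffers[2]);
+  int64_t j = array->offset + i;
+  return std::string(data + offsets[j], offsets[j + 1] - offsets[j]);
+}
+```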
+
+## Testing
+
+We have an implementation, but does it work? Unlike in higher-level runtimes such as
+R and Python, we can't just open a prompt and type some code to find out. For
+C and C++ libraries, the
+[googletest](https://google.github.io/googletest/quickstart-cmake.html)
+framework provides a quick and easy way to do this that scales nicely as the
+complexity of your project grows.
+
+First, we'll add a stub test and some CMake to get going. In `linesplitter_test.cc`,
+add the following:
+
+```cpp
+#include <gtest/gtest.h>
+
+#include "nanoarrow/nanoarrow.hpp"
+
+#include "linesplitter.h"
+
+TEST(Linesplitter, LinesplitterRoundtrip) {
+  EXPECT_EQ(4, 4);
+}
+```
+
+Then, add the following to your `CMakeLists.txt`:
+
+```cmake
+FetchContent_Declare(
+  googletest
+  URL https://github.com/google/googletest/archive/refs/tags/v1.13.0.zip
+)
+FetchContent_MakeAvailable(googletest)
+
+enable_testing()
+
+add_executable(linesplitter_test linesplitter_test.cc)
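+# linesplitter links nanoarrow privately, so the test (which includes
+# nanoarrow.hpp) must link nanoarrow directly
+target_link_libraries(linesplitter_test linesplitter nanoarrow GTest::gtest_main)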
+
+include(GoogleTest)
+gtest_discover_tests(linesplitter_test)
+```
+
+After you're done, build the project again using the **CMake: Build** command from
+the command palette. If all goes well, choose **Test: Run All Tests** from the command
+palette to run them! You should see some output indicating that tests ran successfully,
+or you can use VSCode's "Testing" panel to visually inspect which tests passed.
+
+```{=rst}
+.. note::
+  If you're not using VSCode, you can accomplish the equivalent task in a terminal
+  with ``cd build && ctest .``.
+
+```
+
+Now we're ready to fill in the test! Our two functions should round-trip,
+so a useful first test is to check that they do.
+
+```cpp
+TEST(Linesplitter, LinesplitterRoundtrip) {
+  nanoarrow::UniqueArray out;
+  auto result = linesplitter_read("line1\nline2\nline3", out.get());
+  ASSERT_EQ(result.first, 0);
+  ASSERT_EQ(result.second, "");
+
+  ASSERT_EQ(out->length, 3);
+
+  nanoarrow::UniqueArrayView out_view;
+  ArrowArrayViewInitFromType(out_view.get(), NANOARROW_TYPE_STRING);
+  ASSERT_EQ(ArrowArrayViewSetArray(out_view.get(), out.get(), nullptr), 0);
+  ArrowStringView item;
+
+  item = ArrowArrayViewGetStringUnsafe(out_view.get(), 0);
+  ASSERT_EQ(std::string(item.data, item.size_bytes), "line1");
+
+  item = ArrowArrayViewGetStringUnsafe(out_view.get(), 1);
+  ASSERT_EQ(std::string(item.data, item.size_bytes), "line2");
+
+  item = ArrowArrayViewGetStringUnsafe(out_view.get(), 2);
+  ASSERT_EQ(std::string(item.data, item.size_bytes), "line3");
+
+  auto result2 = linesplitter_write(out.get());
+  ASSERT_EQ(result2.first, 0);
+  ASSERT_EQ(result2.second, "line1\nline2\nline3\n");
+}
+```
+
+Writing tests in this way also opens up a relatively straightforward debug
+path via the **CMake: Set Debug target** and **CMake: Debug** commands.
+If the first thing that happens when you run your test is a crash,
+running the tests with the debugger turned on will automatically pause at
+the line of code that caused the crash. For more fine-tuned debugging,
+you can set breakpoints and step through code.
+
+## Summary
+
+This tutorial covered the basics of writing and testing a C++ library exposing an
+Arrow-based API implemented using the nanoarrow C library.
diff --git a/docs/source/index.rst b/docs/source/getting-started.rst
similarity index 82%
copy from docs/source/index.rst
copy to docs/source/getting-started.rst
index 8700738..18e9b39 100644
--- a/docs/source/index.rst
+++ b/docs/source/getting-started.rst
@@ -15,14 +15,4 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-.. include:: README_generated.rst
-
-Contents
---------
-
-.. toctree::
-   :maxdepth: 2
-
-   C API Reference <c>
-   C++ API Reference <cpp>
-   IPC Extension Reference <ipc>
+.. include:: getting-started_generated.rst
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 8700738..fa31569 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -23,6 +23,7 @@ Contents
 .. toctree::
    :maxdepth: 2
 
+   Getting Started <getting-started>
    C API Reference <c>
    C++ API Reference <cpp>
    IPC Extension Reference <ipc>
diff --git a/examples/linesplitter/CMakeLists.txt b/examples/linesplitter/CMakeLists.txt
new file mode 100644
index 0000000..18dc78d
--- /dev/null
+++ b/examples/linesplitter/CMakeLists.txt
@@ -0,0 +1,48 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+project(linesplitter)
+
+set(CMAKE_CXX_STANDARD 11)
+
+include(FetchContent)
+
+FetchContent_Declare(
+  nanoarrow
+  SOURCE_DIR ${CMAKE_CURRENT_LIST_DIR}/../..
+  # We use SOURCE_DIR to simplify testing this example on CI; however,
+  # you should use a released version of nanoarrow like so:
+  # URL https://github.com/apache/arrow-nanoarrow/releases/download/apache-arrow-nanoarrow-0.1.0/apache-arrow-nanoarrow-0.1.0.tar.gz
+  # URL_HASH SHA512=dc62480b986ee76aaad8e38c6fbc602f8cef2cc35a5f5ede7da2a93b4db2b63839bdca3eefe8a44ae1cb6895a2fd3f090e3f6ea1020cf93cfe86437304dfee17)
+)
+FetchContent_MakeAvailable(nanoarrow)
+
+add_library(linesplitter linesplitter.cc)
+target_link_libraries(linesplitter PRIVATE nanoarrow)
+
+FetchContent_Declare(
+  googletest
+  URL https://github.com/google/googletest/archive/refs/tags/v1.13.0.zip
+)
+FetchContent_MakeAvailable(googletest)
+
+enable_testing()
+add_executable(linesplitter_test linesplitter_test.cc)
+target_link_libraries(linesplitter_test linesplitter nanoarrow GTest::gtest_main)
+
+include(GoogleTest)
+gtest_discover_tests(linesplitter_test)
diff --git a/examples/linesplitter/linesplitter.cc b/examples/linesplitter/linesplitter.cc
new file mode 100644
index 0000000..ff2a048
--- /dev/null
+++ b/examples/linesplitter/linesplitter.cc
@@ -0,0 +1,98 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <cerrno>
+#include <cstdint>
+#include <sstream>
+#include <string>
+#include <utility>
+
+#include "nanoarrow/nanoarrow.hpp"
+
+#include "linesplitter.h"
+
+static int64_t find_newline(const ArrowStringView& src) {
+  for (int64_t i = 0; i < src.size_bytes; i++) {
+    if (src.data[i] == '\n') {
+      return i;
+    }
+  }
+
+  return src.size_bytes;
+}
+
+static int linesplitter_read_internal(const std::string& src, ArrowArray* out,
+                                      ArrowError* error) {
+  nanoarrow::UniqueArray tmp;
+  NANOARROW_RETURN_NOT_OK(ArrowArrayInitFromType(tmp.get(), NANOARROW_TYPE_STRING));
+  NANOARROW_RETURN_NOT_OK(ArrowArrayStartAppending(tmp.get()));
+
+  ArrowStringView src_view = {src.data(), static_cast<int64_t>(src.size())};
+  ArrowStringView line_view;
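+  while (src_view.size_bytes > 0) {
+    // Number of characters in the current line, excluding the '\n'
+    int64_t line_length = find_newline(src_view);
+    line_view = {src_view.data, line_length};
+    NANOARROW_RETURN_NOT_OK(ArrowArrayAppendString(tmp.get(), line_view));
+
+    // Advance past the line and its '\n' (the last line may not end with one)
+    src_view.data += line_length;
+    src_view.size_bytes -= line_length;
+    if (src_view.size_bytes > 0) {
+      src_view.data++;
+      src_view.size_bytes--;
+    }
+  }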
+
+  NANOARROW_RETURN_NOT_OK(ArrowArrayFinishBuildingDefault(tmp.get(), error));
+
+  ArrowArrayMove(tmp.get(), out);
+  return NANOARROW_OK;
+}
+
+std::pair<int, std::string> linesplitter_read(const std::string& src, ArrowArray* out) {
+  ArrowError error{};  // zero-initialized: empty message if nothing sets it
+  int code = linesplitter_read_internal(src, out, &error);
+  if (code != NANOARROW_OK) {
+    return {code, std::string(ArrowErrorMessage(&error))};
+  } else {
+    return {NANOARROW_OK, ""};
+  }
+}
+
+static int linesplitter_write_internal(ArrowArray* input, std::stringstream& out,
+                                       ArrowError* error) {
+  nanoarrow::UniqueArrayView input_view;
+  ArrowArrayViewInitFromType(input_view.get(), NANOARROW_TYPE_STRING);
+  NANOARROW_RETURN_NOT_OK(ArrowArrayViewSetArray(input_view.get(), input, error));
+
+  ArrowStringView item;
+  for (int64_t i = 0; i < input->length; i++) {
+    if (ArrowArrayViewIsNull(input_view.get(), i)) {
+      out << "\n";
+    } else {
+      item = ArrowArrayViewGetStringUnsafe(input_view.get(), i);
+      out << std::string(item.data, item.size_bytes) << "\n";
+    }
+  }
+
+  return NANOARROW_OK;
+}
+
+std::pair<int, std::string> linesplitter_write(ArrowArray* input) {
+  std::stringstream out;
+  ArrowError error{};  // zero-initialized: empty message if nothing sets it
+  int code = linesplitter_write_internal(input, out, &error);
+  if (code != NANOARROW_OK) {
+    return {code, std::string(ArrowErrorMessage(&error))};
+  } else {
+    return {NANOARROW_OK, out.str()};
+  }
+}
diff --git a/examples/linesplitter/linesplitter.h b/examples/linesplitter/linesplitter.h
new file mode 100644
index 0000000..c1329ce
--- /dev/null
+++ b/examples/linesplitter/linesplitter.h
@@ -0,0 +1,77 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <stdint.h>
+#include <string>
+#include <utility>
+
+#ifndef ARROW_C_DATA_INTERFACE
+#define ARROW_C_DATA_INTERFACE
+
+#define ARROW_FLAG_DICTIONARY_ORDERED 1
+#define ARROW_FLAG_NULLABLE 2
+#define ARROW_FLAG_MAP_KEYS_SORTED 4
+
+struct ArrowSchema {
+  // Array type description
+  const char* format;
+  const char* name;
+  const char* metadata;
+  int64_t flags;
+  int64_t n_children;
+  struct ArrowSchema** children;
+  struct ArrowSchema* dictionary;
+
+  // Release callback
+  void (*release)(struct ArrowSchema*);
+  // Opaque producer-specific data
+  void* private_data;
+};
+
+struct ArrowArray {
+  // Array data description
+  int64_t length;
+  int64_t null_count;
+  int64_t offset;
+  int64_t n_buffers;
+  int64_t n_children;
+  const void** buffers;
+  struct ArrowArray** children;
+  struct ArrowArray* dictionary;
+
+  // Release callback
+  void (*release)(struct ArrowArray*);
+  // Opaque producer-specific data
+  void* private_data;
+};
+
+#endif  // ARROW_C_DATA_INTERFACE
+
+// Builds an ArrowArray of type string that will contain one element for each line
+// in src and places it into out.
+//
+// On success, returns {0, ""}; on error, returns {<errno code>, <error message>}
+std::pair<int, std::string> linesplitter_read(const std::string& src,
+                                              struct ArrowArray* out);
+
+// Concatenates all elements of a string ArrowArray, inserting a newline after
+// each element.
+//
+// On success, returns {0, <result>}; on error, returns {<errno code>, <error message>}
+std::pair<int, std::string> linesplitter_write(struct ArrowArray* input);
diff --git a/examples/linesplitter/linesplitter_test.cc b/examples/linesplitter/linesplitter_test.cc
new file mode 100644
index 0000000..dbddd06
--- /dev/null
+++ b/examples/linesplitter/linesplitter_test.cc
@@ -0,0 +1,50 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <gtest/gtest.h>
+
+#include <nanoarrow/nanoarrow.hpp>
+
+#include "linesplitter.h"
+
+TEST(Linesplitter, LinesplitterRoundtrip) {
+  nanoarrow::UniqueArray out;
+  auto result = linesplitter_read("line1\nline2\nline3", out.get());
+  ASSERT_EQ(result.first, 0);
+  ASSERT_EQ(result.second, "");
+
+  ASSERT_EQ(out->length, 3);
+
+  nanoarrow::UniqueArrayView out_view;
+  ArrowArrayViewInitFromType(out_view.get(), NANOARROW_TYPE_STRING);
+  ASSERT_EQ(ArrowArrayViewSetArray(out_view.get(), out.get(), nullptr), 0);
+  ArrowStringView item;
+
+  item = ArrowArrayViewGetStringUnsafe(out_view.get(), 0);
+  ASSERT_EQ(std::string(item.data, item.size_bytes), "line1");
+
+  item = ArrowArrayViewGetStringUnsafe(out_view.get(), 1);
+  ASSERT_EQ(std::string(item.data, item.size_bytes), "line2");
+
+  item = ArrowArrayViewGetStringUnsafe(out_view.get(), 2);
+  ASSERT_EQ(std::string(item.data, item.size_bytes), "line3");
+
+  auto result2 = linesplitter_write(out.get());
+  ASSERT_EQ(result2.first, 0);
+  ASSERT_EQ(result2.second, "line1\nline2\nline3\n");
+}