Posted to github@arrow.apache.org by "paleolimbot (via GitHub)" <gi...@apache.org> on 2023/03/20 19:21:29 UTC

[GitHub] [arrow-nanoarrow] paleolimbot opened a new pull request, #164: feat(extensions/nanoarrow_ipc): Add single-threaded stream reader

paleolimbot opened a new pull request, #164:
URL: https://github.com/apache/arrow-nanoarrow/pull/164

   Higher-level runtimes may be able to use the `ArrowIpcDecoder` (or more than one) and handle IO/parallelization using tools that are difficult to provide from C; however, for testing we do need the ability to read streams in their entirety. This PR provides a tool to do that based on an arbitrary source of input bytes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] lidavidm commented on a diff in pull request #164: feat(extensions/nanoarrow_ipc): Add single-threaded stream reader

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #164:
URL: https://github.com/apache/arrow-nanoarrow/pull/164#discussion_r1145303497


##########
extensions/nanoarrow_ipc/src/nanoarrow/nanoarrow_ipc_reader.c:
##########
@@ -0,0 +1,424 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <errno.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "nanoarrow.h"
+#include "nanoarrow_ipc.h"
+
+void ArrowIpcInputStreamMove(struct ArrowIpcInputStream* src,
+                             struct ArrowIpcInputStream* dst) {
+  memcpy(dst, src, sizeof(struct ArrowIpcInputStream));
+  src->release = NULL;
+}
+
+struct ArrowIpcInputStreamBufferPrivate {
+  struct ArrowBuffer input;
+  int64_t cursor_bytes;
+};
+
+static ArrowErrorCode ArrowIpcInputStreamBufferRead(struct ArrowIpcInputStream* stream,
+                                                    void* buf, int64_t buf_size_bytes,
+                                                    int64_t* size_read_out,
+                                                    struct ArrowError* error) {
+  if (buf_size_bytes == 0) {
+    return NANOARROW_OK;

Review Comment:
   nit: possibly assign to `size_read_out` anyway, so that it is always set on a successful return?
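
As an illustration of the nit above, here is a minimal sketch of a buffer-backed read callback that writes `*size_read_out` on every successful path, including the zero-length case. The struct `BufferSource` and the function `BufferSourceRead` are hypothetical stand-ins written for this example; they are not the code in the PR, and the plain `int` return value merely mirrors `NANOARROW_OK` being 0.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical stand-in for the private struct in the diff (illustrative only).
struct BufferSource {
  const uint8_t* data;
  int64_t size_bytes;
  int64_t cursor_bytes;
};

// Returns 0 on success (mirroring NANOARROW_OK). Every successful return
// path assigns *size_read_out, so callers never see an uninitialized value.
static int BufferSourceRead(struct BufferSource* src, uint8_t* buf,
                            int64_t buf_size_bytes, int64_t* size_read_out) {
  if (buf_size_bytes <= 0) {
    *size_read_out = 0;  // the early return still reports "0 bytes read"
    return 0;
  }

  int64_t remaining = src->size_bytes - src->cursor_bytes;
  int64_t n = buf_size_bytes < remaining ? buf_size_bytes : remaining;
  if (n > 0) {
    memcpy(buf, src->data + src->cursor_bytes, (size_t)n);
    src->cursor_bytes += n;
  }

  *size_read_out = n;  // 0 here signals end of input
  return 0;
}
```

With this shape, a caller can loop until `*size_read_out` is 0 without special-casing a zero-length request.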



##########
extensions/nanoarrow_ipc/src/nanoarrow/nanoarrow_ipc.h:
##########
@@ -219,6 +227,60 @@ ArrowErrorCode ArrowIpcDecoderDecodeArray(struct ArrowIpcDecoder* decoder,
                                           struct ArrowArray* out,
                                           struct ArrowError* error);
 
+/// \brief An user-extensible input data source
+struct ArrowIpcInputStream {
+  /// \brief Read up to buf_size_bytes from stream into buf
+  ///
+  /// The actual number of bytes read is placed in the value pointed to by
+  /// size_read_out. Returns NANOARROW_OK on success.
+  ArrowErrorCode (*read)(struct ArrowIpcInputStream* stream, void* buf,

Review Comment:
   I suppose:
   
   - Perhaps consider a concrete type (char*, uint8_t*) for buf so it's harder to accidentally pass an arbitrary pointer to it?
   - What do you think about `char** buf` (i.e., letting the stream do the allocation)? That would let you do things like read an in-memory buffer by slicing instead of copying (though I suppose lifetimes may get confusing there).
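
To make the second suggestion concrete, here is a rough sketch of what a "stream provides the pointer" read could look like for an in-memory source. Everything here (`SliceSource`, `SliceSourceReadView`) is hypothetical and only meant to show the trade-off: the typed `const uint8_t**` output avoids both the untyped `void*` and the copy, but the returned pointer is only valid while the underlying buffer is alive.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical in-memory source (illustrative names, not the PR's API).
struct SliceSource {
  const uint8_t* data;
  int64_t size_bytes;
  int64_t cursor_bytes;
};

// Zero-copy read: instead of filling a caller-owned buffer, point *buf_out
// at the next bytes inside the source's own memory. Returns 0 on success.
// Lifetime caveat: *buf_out borrows from src->data and must not outlive it.
static int SliceSourceReadView(struct SliceSource* src, const uint8_t** buf_out,
                               int64_t size_bytes, int64_t* size_read_out) {
  int64_t remaining = src->size_bytes - src->cursor_bytes;
  int64_t n = size_bytes < remaining ? size_bytes : remaining;
  if (n < 0) {
    n = 0;
  }

  *buf_out = src->data + src->cursor_bytes;  // a slice, not a copy
  src->cursor_bytes += n;
  *size_read_out = n;
  return 0;
}
```

A source that cannot hand out stable pointers (a file or socket, say) would have to allocate behind this signature, which is where ownership of the returned memory becomes harder to reason about.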





[GitHub] [arrow-nanoarrow] paleolimbot merged pull request #164: feat(extensions/nanoarrow_ipc): Add single-threaded stream reader

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot merged PR #164:
URL: https://github.com/apache/arrow-nanoarrow/pull/164




[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #164: feat(extensions/nanoarrow_ipc): Add single-threaded stream reader

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #164:
URL: https://github.com/apache/arrow-nanoarrow/pull/164#discussion_r1145350451


##########
extensions/nanoarrow_ipc/src/nanoarrow/nanoarrow_ipc.h:
##########
@@ -219,6 +227,60 @@ ArrowErrorCode ArrowIpcDecoderDecodeArray(struct ArrowIpcDecoder* decoder,
                                           struct ArrowArray* out,
                                           struct ArrowError* error);
 
+/// \brief An user-extensible input data source
+struct ArrowIpcInputStream {
+  /// \brief Read up to buf_size_bytes from stream into buf
+  ///
+  /// The actual number of bytes read is placed in the value pointed to by
+  /// size_read_out. Returns NANOARROW_OK on success.
+  ArrowErrorCode (*read)(struct ArrowIpcInputStream* stream, void* buf,

Review Comment:
   > Perhaps consider a concrete type (char*, uint8_t*) for buf
   
   Totally!
   
   > What do you think about char** buf (i.e., letting the stream do the allocation)?
   
   I played with changing it to `struct ArrowBuffer*`, but it gets complicated. Probably the way to go about that would be to do what Arrow C++ does and make it a separate method (I think Arrow's is called `ReadAt()` or something). If somebody really wants to slice an in-memory buffer, they can also use the `ArrowIpcDecoder` directly.
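
One way to picture the "separate method" idea is an interface where the copying `read` is required and a zero-copy callback is optional. This is only a sketch under that assumption: `SketchInputStream`, `read_view`, `SketchStreamGetBytes`, `MemState`, and `MemRead` are names invented for this example and do not correspond to anything in nanoarrow or Arrow C++.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical interface: `read` copies into a caller buffer and is required;
// `read_view` may be NULL when the source cannot expose its memory directly.
struct SketchInputStream {
  int (*read)(struct SketchInputStream* stream, uint8_t* buf,
              int64_t buf_size_bytes, int64_t* size_read_out);
  int (*read_view)(struct SketchInputStream* stream, const uint8_t** data_out,
                   int64_t size_bytes, int64_t* size_read_out);
  void* private_data;
};

// Use the zero-copy path when available; otherwise copy into `scratch`
// (which must hold at least size_bytes) and point *data_out at it.
static int SketchStreamGetBytes(struct SketchInputStream* stream, uint8_t* scratch,
                                int64_t size_bytes, const uint8_t** data_out,
                                int64_t* size_read_out) {
  if (stream->read_view != NULL) {
    return stream->read_view(stream, data_out, size_bytes, size_read_out);
  }
  *data_out = scratch;
  return stream->read(stream, scratch, size_bytes, size_read_out);
}

// A copy-only in-memory implementation of `read`, used to exercise the fallback.
struct MemState {
  const uint8_t* data;
  int64_t size_bytes;
  int64_t cursor_bytes;
};

static int MemRead(struct SketchInputStream* stream, uint8_t* buf,
                   int64_t buf_size_bytes, int64_t* size_read_out) {
  struct MemState* state = (struct MemState*)stream->private_data;
  int64_t remaining = state->size_bytes - state->cursor_bytes;
  int64_t n = buf_size_bytes < remaining ? buf_size_bytes : remaining;
  if (n > 0) {
    memcpy(buf, state->data + state->cursor_bytes, (size_t)n);
    state->cursor_bytes += n;
  } else {
    n = 0;
  }
  *size_read_out = n;
  return 0;
}
```

Keeping the copying `read` as the only required callback leaves simple sources simple, while consumers that care about avoiding copies can check for the optional method.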


