Posted to issues@arrow.apache.org by "Francois Saint-Jacques (Jira)" <ji...@apache.org> on 2019/08/23 01:18:00 UTC

[jira] [Comment Edited] (ARROW-5508) [C++] Create reusable Iterator interface

    [ https://issues.apache.org/jira/browse/ARROW-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913838#comment-16913838 ] 

Francois Saint-Jacques edited comment on ARROW-5508 at 8/23/19 1:17 AM:
------------------------------------------------------------------------

My take after implementing MapIterator and FlattenIterator, and using them heavily in the dataset code.

# `T` must be a pointer type (or support assignment from and comparison with nullptr). Iterator completion is signaled by setting `*out` to nullptr.
# Because of the previous point, the iterator can never yield nullptr as a valid value.
# The interface forces the caller to consume a value to learn whether the iterator is exhausted, i.e. there's no Done()/HasNext(). This can lead to [odd|https://github.com/fsaintjacques/arrow/commit/36ba801f47a1053c292fd461afd4ec23e63c1e97#diff-df9646433131d9cf9f31a395c2719b70R157-R190] consumption.
# I question the use of Status as a return code; maybe we should have a specialized `FailableIterator<T> : Iterator<Result<T>>` for the same effect.
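
To make the first three points concrete, here is a minimal sketch of consuming the proposed interface. It is an illustration under stated assumptions, not Arrow code: `Status` is a reduced stand-in for `arrow::Status`, and `VectorIterator` and `SumAll` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::Status, reduced to ok / not ok.
struct Status {
  bool ok_ = true;
  static Status OK() { return Status{}; }
  bool ok() const { return ok_; }
};

// The interface as proposed in the issue description.
template <typename T>
class Iterator {
 public:
  virtual ~Iterator() = default;
  // Writes the next value to *out, or nullptr once the iterator is exhausted.
  virtual Status Next(T* out) = 0;
};

// Hypothetical iterator over a vector of shared_ptr<int>.
class VectorIterator : public Iterator<std::shared_ptr<int>> {
 public:
  explicit VectorIterator(std::vector<std::shared_ptr<int>> values)
      : values_(std::move(values)) {}

  Status Next(std::shared_ptr<int>* out) override {
    assert(out != nullptr);
    if (pos_ < values_.size()) {
      *out = values_[pos_++];
    } else {
      *out = nullptr;  // completion sentinel
    }
    return Status::OK();
  }

 private:
  std::vector<std::shared_ptr<int>> values_;
  std::size_t pos_ = 0;
};

// The caller must pull a value before it can tell whether one exists.
// A null element in the input is indistinguishable from completion.
int SumAll(Iterator<std::shared_ptr<int>>* it) {
  int total = 0;
  std::shared_ptr<int> value;
  while (it->Next(&value).ok() && value != nullptr) {
    total += *value;
  }
  return total;
}
```

In this sketch, a vector holding a null element in the middle makes `SumAll` stop early without any error, which is the limitation described in the second point.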

The first and second points could be tackled by returning `Option<T>` (`Result<T>` wouldn't work because we can't use Status::OK() as a sentinel-completion value). The third is annoying for streaming iterators (where there's no way to know completion without a side effect), since the iterator itself must consume on a Done() call and cache the result. I think I prefer putting the onus on the iterator implementor rather than on the user of the interface.
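
A rough sketch of what this could look like, under several assumptions: `std::optional` (C++17) stands in for `Option<T>`, error reporting through Status is left out entirely, and `OptionIterator`, `VectorSource`, `PeekingIterator` and `SumWithDone` are hypothetical names that do not exist in Arrow.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical variant of the interface: an empty optional signals
// completion, so T no longer needs a nullptr-like sentinel value.
template <typename T>
class OptionIterator {
 public:
  virtual ~OptionIterator() = default;
  virtual std::optional<T> Next() = 0;
};

// Hypothetical source yielding the elements of a vector in order.
template <typename T>
class VectorSource : public OptionIterator<T> {
 public:
  explicit VectorSource(std::vector<T> values) : values_(std::move(values)) {}

  std::optional<T> Next() override {
    if (pos_ == values_.size()) return std::nullopt;
    return values_[pos_++];
  }

 private:
  std::vector<T> values_;
  std::size_t pos_ = 0;
};

// Adds Done() on top of a streaming source by pulling one value ahead
// and caching it. This is the burden placed on the implementor.
template <typename T>
class PeekingIterator {
 public:
  explicit PeekingIterator(OptionIterator<T>* source) : source_(source) {
    assert(source_ != nullptr);
  }

  // May consume from the source (a side effect) if nothing is cached.
  bool Done() {
    Fill();
    return !cached_.has_value();
  }

  std::optional<T> Next() {
    Fill();
    std::optional<T> out = std::move(cached_);
    cached_.reset();  // a moved-from optional stays engaged, so clear it
    return out;
  }

 private:
  void Fill() {
    if (!cached_.has_value() && !exhausted_) {
      cached_ = source_->Next();
      exhausted_ = !cached_.has_value();
    }
  }

  OptionIterator<T>* source_;
  std::optional<T> cached_;
  bool exhausted_ = false;
};

// The user-side loop no longer has to inspect a sentinel value.
int SumWithDone(PeekingIterator<int>* it) {
  int total = 0;
  while (!it->Done()) total += *it->Next();
  return total;
}
```

The caching lives in one wrapper here, so each consumer gets a plain `while (!Done())` loop, at the price of the source being read one element earlier than the caller asked for.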


> [C++] Create reusable Iterator<T> interface 
> --------------------------------------------
>
>                 Key: ARROW-5508
>                 URL: https://issues.apache.org/jira/browse/ARROW-5508
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.15.0
>
>
> We have various iterator-like classes. I envision a reusable interface like
> {code}
> template <typename T>
> class Iterator {
>  public:
>   virtual ~Iterator() = default;
>   virtual Status Next(T* out) = 0;
> };
> {code}


