Posted to issues@arrow.apache.org by "Benjamin Kietzman (JIRA)" <ji...@apache.org> on 2018/12/12 17:05:00 UTC

[jira] [Comment Edited] (ARROW-47) [C++] Consider adding a scalar type object model

    [ https://issues.apache.org/jira/browse/ARROW-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719205#comment-16719205 ] 

Benjamin Kietzman edited comment on ARROW-47 at 12/12/18 5:04 PM:
------------------------------------------------------------------

One alternative to using `vector<shared_ptr<Scalar>>` would be a flatbuffer:

```
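struct StructScalar {
  // Read the field at field_index, interpreting it as a T
  template <typename T>
  T GetFieldAs(int field_index);

  flatbuffers::Table *root_;                      // root table inside storage_
  std::vector<const reflection::Field*> fields_;  // flatbuffers reflection schema, one entry per field
  std::shared_ptr<Buffer> storage_;               // owns the serialized bytes
};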
```
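
To make the single-buffer idea concrete without pulling in flatbuffers, here is a toy sketch. It is not the flatbuffers API: `PackedStructScalar`, `AppendField`, and the offset table are hypothetical stand-ins, and it assumes fixed-width, trivially copyable fields with no nulls. It only illustrates that `GetFieldAs<T>` can read straight out of one contiguous allocation instead of chasing one `shared_ptr` per field.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Toy stand-in for a flatbuffer-backed struct scalar (hypothetical, not the
// flatbuffers API): every field value lives in one contiguous byte buffer.
class PackedStructScalar {
 public:
  // Append a fixed-width value and remember the byte offset where it starts.
  template <typename T>
  void AppendField(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "fixed-width fields only");
    offsets_.push_back(storage_.size());
    const auto* bytes = reinterpret_cast<const std::uint8_t*>(&value);
    storage_.insert(storage_.end(), bytes, bytes + sizeof(T));
  }

  // Read field `field_index` as a T. No type check is done here: the caller
  // must already know the field's type, which is part of the fragility.
  template <typename T>
  T GetFieldAs(int field_index) const {
    static_assert(std::is_trivially_copyable<T>::value, "fixed-width fields only");
    T out;
    std::memcpy(&out, storage_.data() + offsets_[static_cast<std::size_t>(field_index)],
                sizeof(T));
    return out;
  }

 private:
  std::vector<std::uint8_t> storage_;  // all field bytes, back to back
  std::vector<std::size_t> offsets_;   // start offset of each field
};
```

A real flatbuffer would also carry the schema, so a type mismatch could be detected rather than silently misread.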

In any case, the main challenge I see is the amount of fragile unboxing boilerplate that StructScalar would require to be user-friendly. Good metaprogramming can mitigate that, but it's still a bit verbose:

```
StructScalar* obj = get();

// Option 1: unbox into the arguments of a callback
Status s = Unbox1<int, string, ignore, bool>(obj, [](vector<bool> is_valid, int id, string_view name, ignore, bool admin) {
  // ...
});

// Option 2: unbox into a tuple plus a validity vector
vector<bool> is_valid;
tuple<int, string_view, ignore, bool> employee;
RETURN_NOT_OK(Unbox2(obj, &employee, &is_valid));

// Option 3: unbox one field at a time into (is_valid, value) pairs
pair<bool, int> id;
pair<bool, string_view> name;
pair<bool, bool> admin;
RETURN_NOT_OK(Unbox3<int>(obj, 0, &id));
RETURN_NOT_OK(Unbox3<string_view>(obj, 1, &name));
RETURN_NOT_OK(Unbox3<bool>(obj, 3, &admin));

// more options available beyond c++11
```
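
For a sense of what the metaprogramming behind the tuple-style option might look like, here is a minimal sketch. Everything in it is hypothetical: `Scalar`, `TypedScalar`, `Ignore`, `UnboxField`, and `Unbox` are illustrative names rather than Arrow APIs, it uses the `vector<shared_ptr<Scalar>>` layout rather than a flatbuffer, it returns `bool` where real code would return `Status`, and it assumes C++17 for the fold expression.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; none of this is Arrow's API.
struct Scalar {
  virtual ~Scalar() = default;
};

template <typename T>
struct TypedScalar : Scalar {
  explicit TypedScalar(T v) : value(std::move(v)) {}
  T value;
};

struct StructScalar {
  std::vector<std::shared_ptr<Scalar>> fields;  // nullptr marks a null field
};

struct Ignore {};  // placeholder for fields the caller wants to skip

// Skipped field: nothing is read, and it is reported as not valid.
inline bool UnboxField(const StructScalar&, std::size_t i, Ignore*,
                       std::vector<bool>* is_valid) {
  (*is_valid)[i] = false;
  return true;
}

// Unbox field i into *out. Returns false on a type mismatch.
template <typename T>
bool UnboxField(const StructScalar& obj, std::size_t i, T* out,
                std::vector<bool>* is_valid) {
  const Scalar* field = obj.fields[i].get();
  if (field == nullptr) {  // null field: leave *out untouched
    (*is_valid)[i] = false;
    return true;
  }
  const auto* typed = dynamic_cast<const TypedScalar<T>*>(field);
  if (typed == nullptr) return false;
  *out = typed->value;
  (*is_valid)[i] = true;
  return true;
}

// Expand one UnboxField call per tuple element; stops at the first failure.
template <typename Tuple, std::size_t... I>
bool UnboxImpl(const StructScalar& obj, Tuple* out, std::vector<bool>* is_valid,
               std::index_sequence<I...>) {
  return (UnboxField(obj, I, &std::get<I>(*out), is_valid) && ...);
}

// Unbox every field of obj into the matching tuple element.
template <typename... Ts>
bool Unbox(const StructScalar& obj, std::tuple<Ts...>* out,
           std::vector<bool>* is_valid) {
  if (obj.fields.size() != sizeof...(Ts)) return false;
  is_valid->assign(sizeof...(Ts), false);
  return UnboxImpl(obj, out, is_valid, std::index_sequence_for<Ts...>{});
}

// Example value: (id, name, <some double>, null admin flag).
inline StructScalar MakeEmployee() {
  StructScalar s;
  s.fields.push_back(std::make_shared<TypedScalar<int>>(7));
  s.fields.push_back(std::make_shared<TypedScalar<std::string>>("ada"));
  s.fields.push_back(std::make_shared<TypedScalar<double>>(1.5));
  s.fields.push_back(nullptr);
  return s;
}
```

Calling `Unbox(MakeEmployee(), &employee, &is_valid)` with a `std::tuple<int, std::string, Ignore, bool>` fills the first two elements and marks the last two as not valid. The per-field dispatch is hidden from the caller, but the field types still have to be spelled out at every call site, which is where the verbosity comes from.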


> [C++] Consider adding a scalar type object model
> ------------------------------------------------
>
>                 Key: ARROW-47
>                 URL: https://issues.apache.org/jira/browse/ARROW-47
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Uwe L. Korn
>            Priority: Major
>              Labels: Analytics
>             Fix For: 0.13.0
>
>
> Just did this on the Python side. In later analytics routines, passing in scalar values (example: Array + Scalar) requires some kind of container. Some systems, like the R language, solve this problem with length-1 arrays, but we should do some analysis of use cases and figure out what will work best for Arrow.


