Posted to issues@arrow.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/03/01 23:25:01 UTC

[jira] [Commented] (ARROW-2142) [Python] Conversion from Numpy struct array unimplemented

    [ https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382849#comment-16382849 ] 

ASF GitHub Bot commented on ARROW-2142:
---------------------------------------

wesm commented on a change in pull request #1635: ARROW-2142: [Python] Allow conversion from Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#discussion_r171722310
 
 

 ##########
 File path: cpp/src/arrow/array.cc
 ##########
 @@ -772,6 +773,105 @@ std::shared_ptr<Array> MakeArray(const std::shared_ptr<ArrayData>& data) {
   return out;
 }
 
+// ----------------------------------------------------------------------
+// Misc APIs
+
+namespace internal {
+
+std::vector<ArrayVector> RechunkArraysConsistently(
+    const std::vector<ArrayVector>& groups) {
+  if (groups.size() <= 1) {
+    return groups;
+  }
+  // Adjacent slices defining the desired rechunking
+  std::vector<std::pair<int64_t, int64_t>> slices;
+  // Total number of elements common to all array groups
+  int64_t total_length = -1;
+
+  {
+    // Compute a vector of slices such that each array spans
+    // one or more *entire* slices only
+    // e.g. if group #1 has bounds {0, 2, 4, 5, 10}
+    //     and group #2 has bounds {0, 5, 7, 10}
+    // then the computed slices are
+    //     {(0, 2), (2, 4), (4, 5), (5, 7), (7, 10)}
+    std::set<int64_t> bounds;
+    for (auto& group : groups) {
+      int64_t cur = 0;
+      bounds.insert(cur);
+      for (auto& array : group) {
+        cur += array->length();
+        bounds.insert(cur);
 
 Review comment:
   The complexity of this code is roughly O(ncolumns * log(num chunks)). The algorithm in `TableBatchReader::ReadNext` is linear-time; whether it's more complex than what's below may be a matter of opinion.
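
   For reference, the slice computation described in the patch's comment can be sketched as follows. This is only an illustration in Python, not the C++ code under review: `compute_slices` is a hypothetical name, and each group is assumed to be given as a plain list of chunk lengths rather than as Arrow arrays.

```python
from itertools import accumulate


def compute_slices(groups):
    """Return adjacent (start, stop) slices such that every chunk of every
    group spans one or more entire slices.

    `groups` is a list of groups; each group is a list of chunk lengths
    (an assumption of this sketch, standing in for the array lengths).
    """
    bounds = {0}
    for chunk_lengths in groups:
        # Running totals of the chunk lengths are the chunk boundaries.
        bounds.update(accumulate(chunk_lengths))
    ordered = sorted(bounds)
    # Pair each boundary with the next one to form adjacent slices.
    return list(zip(ordered, ordered[1:]))


# Example from the code comment:
# group #1 has bounds {0, 2, 4, 5, 10}, group #2 has bounds {0, 5, 7, 10}
slices = compute_slices([[2, 2, 1, 5], [5, 2, 3]])
print(slices)  # [(0, 2), (2, 4), (4, 5), (5, 7), (7, 10)]
```

   In this sketch the boundaries are collected in a set and sorted once at the end, which is where the logarithmic factor comes from; a merge over the per-group boundaries, which are already sorted, would avoid it.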

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> [Python] Conversion from Numpy struct array unimplemented
> ---------------------------------------------------------
>
>                 Key: ARROW-2142
>                 URL: https://issues.apache.org/jira/browse/ARROW-2142
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 0.8.0
>            Reporter: Antoine Pitrou
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback (most recent call last):
>   File "<ipython-input-18-27a52820b7d8>", line 1, in <module>
>     pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", line 77, in pyarrow.lib.check_status
>   File "error.pxi", line 85, in pyarrow.lib.check_status
> ArrowNotImplementedError: /home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1585 code: converter.Convert()
> NumPyConverter doesn't implement <struct<x: float>> conversion.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)