Posted to github@arrow.apache.org by "mikelui (via GitHub)" <gi...@apache.org> on 2023/03/26 05:35:17 UTC

[GitHub] [arrow] mikelui opened a new pull request, #34730: GH-34729: [C++, Python] Enhanced Arrow<->Pandas map/pydict support

mikelui opened a new pull request, #34730:
URL: https://github.com/apache/arrow/pull/34730

   ### Rationale for this change
   
   Explained in issue #34729 
   
   ### What changes are included in this PR?
   
   - Add support for lists of maps when converting Arrow to Pandas. There doesn't seem to be a strong reason to omit this; previously it was a hard error ("unsupported"), due to a bool check.
   - Refactor Arrow Map -> Pandas to support two paths: (1) list of tuples, or (2) pydicts
   - Add a switch in PandasOptions to enable (2), above
   - Bugfix in nested pydicts -> Arrow maps. 
   - Unit tests
   
   ### Are these changes tested?
   
   Unit tests are added in `test_pandas.py`
   
   ### Are there any user-facing changes?
   
   - An additional option flag in PandasOptions
   - Enable list of maps to Pandas, which was previously disabled
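
   The two output shapes this PR deals with can be sketched in plain Python (the helper names below are illustrative, not pyarrow API; the real conversion lives in `arrow_to_pandas.cc`):

```python
def map_to_tuples(keys, items):
    """Default path: an Arrow map entry becomes a list of (key, value) pairs."""
    return list(zip(keys, items))


def map_to_pydict(keys, items):
    """Opt-in path: the entry becomes a native dict; later duplicates win."""
    return dict(zip(keys, items))


keys = ["a", "b", "a"]
items = [1, 2, 3]
as_tuples = map_to_tuples(keys, items)  # lossless, preserves duplicate keys
as_dict = map_to_pydict(keys, items)    # lossy when keys repeat
```

   The tuple-list form is lossless and remains the default; the pydict form is more ergonomic but can drop entries when keys repeat, which is why it is gated behind an option.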


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34730: GH-34729: [C++, Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34730:
URL: https://github.com/apache/arrow/pull/34730#issuecomment-1484000930

   :warning: GitHub issue #34729 **has been automatically assigned in GitHub** to PR creator.




[GitHub] [arrow] mikelui commented on a diff in pull request #34730: GH-34729: [C++, Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "mikelui (via GitHub)" <gi...@apache.org>.
mikelui commented on code in PR #34730:
URL: https://github.com/apache/arrow/pull/34730#discussion_r1148481056


##########
python/pyarrow/src/arrow/python/arrow_to_pandas.cc:
##########
@@ -808,52 +808,20 @@ Status ConvertListsLike(PandasOptions options, const ChunkedArray& data,
   return Status::OK();
 }
 
-Status ConvertMap(PandasOptions options, const ChunkedArray& data,
-                  PyObject** out_values) {
-  // Get columns of underlying key/item arrays
-  std::vector<std::shared_ptr<Array>> key_arrays;
-  std::vector<std::shared_ptr<Array>> item_arrays;
-  for (int c = 0; c < data.num_chunks(); ++c) {
-    const auto& map_arr = checked_cast<const MapArray&>(*data.chunk(c));
-    key_arrays.emplace_back(map_arr.keys());
-    item_arrays.emplace_back(map_arr.items());
-  }
-
-  const auto& map_type = checked_cast<const MapType&>(*data.type());
-  auto key_type = map_type.key_type();
-  auto item_type = map_type.item_type();
+template<typename F1, typename F2, typename F3>
+Status ConvertMapHelper(

Review Comment:
   this change makes sense when combined with the next commit





[GitHub] [arrow] AlenkaF commented on pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #34730:
URL: https://github.com/apache/arrow/pull/34730#issuecomment-1503088155

   Thank you for the contribution @mikelui!
   
   The Python part of the code looks good to me. There are two long lines to be corrected:
   
   ```python
   /arrow/python/pyarrow/tests/test_pandas.py:4610:89: E501 line too long (95 > 88 characters)
   /arrow/python/pyarrow/tests/test_pandas.py:4627:89: E501 line too long (92 > 88 characters)
   ```
   
   As for the C++ part, and the question of whether this is a good option to have, Joris will probably have better comments and ideas.




[GitHub] [arrow] amol- commented on pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "amol- (via GitHub)" <gi...@apache.org>.
amol- commented on PR #34730:
URL: https://github.com/apache/arrow/pull/34730#issuecomment-1502961005

   @jorisvandenbossche would you mind having a look at this one? 
   
   I'm not sure how sold I am on a lossy conversion, but it's also true that it's opt-in, so it's the user's choice. 




[GitHub] [arrow] wjones127 commented on a diff in pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #34730:
URL: https://github.com/apache/arrow/pull/34730#discussion_r1167102035


##########
python/pyarrow/tests/test_pandas.py:
##########
@@ -4542,3 +4569,48 @@ def test_does_not_mutate_timedelta_nested():
     df = table.to_pandas()
 
     assert df["timedelta_2"][0].to_pytimedelta() == timedelta_2[0]
+
+
+def test_roundtrip_nested_map_table_with_pydicts():

Review Comment:
   This is a bit paranoid, but could you also test the `to_pandas` conversion on a MapArray that has been sliced (or even a chunked one where each chunk has been sliced)? That often brings out some bugs in these implementations.



##########
python/pyarrow/src/arrow/python/arrow_to_pandas.cc:
##########
@@ -911,6 +877,97 @@ Status ConvertMap(PandasOptions options, const ChunkedArray& data,
   return Status::OK();
 }
 
+Status ConvertMap(PandasOptions options, const ChunkedArray& data,
+                  PyObject** out_values) {
+  // Get columns of underlying key/item arrays
+  std::vector<std::shared_ptr<Array>> key_arrays;
+  std::vector<std::shared_ptr<Array>> item_arrays;
+  for (int c = 0; c < data.num_chunks(); ++c) {
+    const auto& map_arr = checked_cast<const MapArray&>(*data.chunk(c));
+    key_arrays.emplace_back(map_arr.keys());
+    item_arrays.emplace_back(map_arr.items());
+  }
+
+  const auto& map_type = checked_cast<const MapType&>(*data.type());
+  auto key_type = map_type.key_type();
+  auto item_type = map_type.item_type();
+
+  // ARROW-6899: Convert dictionary-encoded children to dense instead of
+  // failing below. A more efficient conversion than this could be done later
+  if (key_type->id() == Type::DICTIONARY) {
+    auto dense_type = checked_cast<const DictionaryType&>(*key_type).value_type();
+    RETURN_NOT_OK(DecodeDictionaries(options.pool, dense_type, &key_arrays));
+    key_type = dense_type;
+  }
+  if (item_type->id() == Type::DICTIONARY) {
+    auto dense_type = checked_cast<const DictionaryType&>(*item_type).value_type();
+    RETURN_NOT_OK(DecodeDictionaries(options.pool, dense_type, &item_arrays));
+    item_type = dense_type;
+  }
+
+  // See notes in MakeInnerOptions.
+  options = MakeInnerOptions(std::move(options));
+  // Don't blindly convert because timestamps in lists are handled differently.
+  options.timestamp_as_object = true;
+
+  auto flat_keys = std::make_shared<ChunkedArray>(key_arrays, key_type);
+  auto flat_items = std::make_shared<ChunkedArray>(item_arrays, item_type);
+  OwnedRefNoGIL owned_numpy_keys;
+  RETURN_NOT_OK(
+      ConvertChunkedArrayToPandas(options, flat_keys, nullptr, owned_numpy_keys.ref()));
+  OwnedRefNoGIL owned_numpy_items;
+  RETURN_NOT_OK(
+      ConvertChunkedArrayToPandas(options, flat_items, nullptr, owned_numpy_items.ref()));
+  PyArrayObject* py_keys = reinterpret_cast<PyArrayObject*>(owned_numpy_keys.obj());
+  PyArrayObject* py_items = reinterpret_cast<PyArrayObject*>(owned_numpy_items.obj());
+
+  if (!options.maps_as_pydicts) {
+    // The default behavior to express an Arrow MAP as a list of [(key, value), ...] pairs
+    OwnedRef list_item;
+    return ConvertMapHelper(
+        [&list_item](int64_t num_pairs) {
+          list_item.reset(PyList_New(num_pairs));
+          return CheckPyError();
+        },
+        [&list_item](int64_t idx, OwnedRef& key_value, OwnedRef& item_value) {
+          PyList_SET_ITEM(list_item.obj(), idx,
+                          PyTuple_Pack(2, key_value.obj(), item_value.obj()));
+          return CheckPyError();
+        },
+        [&list_item]{ return list_item.detach(); },
+        data,
+        py_keys,
+        py_items,
+        item_arrays,
+        out_values);
+  } else {
+    // Use a native pydict
+    OwnedRef dict_item;
+    return ConvertMapHelper(
+        [&dict_item]([[maybe_unused]] int64_t) {
+          dict_item.reset(PyDict_New());
+          return CheckPyError();
+        },
+        [&dict_item]([[maybe_unused]] int64_t idx, OwnedRef& key_value, OwnedRef& item_value) {
+          auto setitem_result =
+              PyDict_SetItem(dict_item.obj(), key_value.obj(), item_value.obj());
+          RETURN_IF_PYERROR();

Review Comment:
   This should handle if `TypeError` is raised in case `key_value.obj()` isn't hashable? Perhaps we should test that? I think having a key type of a map or list would trigger that.
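
   The concern here is a plain Python-semantics one: dict insertion requires hashable keys, so a key that converts to a Python list (e.g. from a list- or map-typed Arrow key) raises `TypeError`. A minimal illustration:

```python
# Dict insertion requires hashable keys; a key that arrives as a Python
# list raises TypeError, while an equivalent tuple works fine.
entry = {}
try:
    entry[["k1", "k2"]] = "value"  # lists are unhashable
    raised = False
except TypeError:
    raised = True

entry[("k1", "k2")] = "value"      # tuples are hashable, so this works
```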



##########
python/pyarrow/src/arrow/python/arrow_to_pandas.cc:
##########
@@ -176,6 +176,7 @@ static inline bool ListTypeSupported(const DataType& type) {
     case Type::DATE32:
     case Type::DATE64:
     case Type::STRUCT:
+    case Type::MAP:

Review Comment:
   Agreed. Although it's interesting we check the value type for list, but not the fields of structs? I was thinking you should put `Type::MAP` down with `Type::LIST`, since it is a subclass of ListType, though since the immediate child is `Type::STRUCT` right now it won't do anything meaningful.



##########
python/pyarrow/src/arrow/python/python_to_arrow.cc:
##########
@@ -762,7 +762,7 @@ class PyListConverter : public ListConverter<T, PyConverter, PyConverterTrait> {
       RETURN_NOT_OK(AppendSequence(value));
     } else if (PySet_Check(value) || (Py_TYPE(value) == &PyDictValues_Type)) {
       RETURN_NOT_OK(AppendIterable(value));
-    } else if (PyDict_Check(value) && this->options_.type->id() == Type::MAP) {
+    } else if (PyDict_Check(value) && this->type()->id() == Type::MAP) {

Review Comment:
   Agreed. `MakeConverter` doesn't seem to expect that options is specific to the type in question; it's templated over it. This change seems correct to me. Thanks for catching it. 👍 





[GitHub] [arrow] mikelui commented on pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "mikelui (via GitHub)" <gi...@apache.org>.
mikelui commented on PR #34730:
URL: https://github.com/apache/arrow/pull/34730#issuecomment-1509884569

   To address @amol- 's concerns about data loss, I added a check and warning.
   
   With the new tests that have no duplicate keys, no warnings are emitted.
   
   With the test that does have duplicate keys:
   
   ```
   arrow/python/pyarrow/src/arrow/python/arrow_to_pandas.cc:1006: After conversion of Arrow map to pydict, detected possible data loss due to duplicated keys. Input length is [3], total pydict length is [2].
   ```
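
   The check behind that warning can be sketched in plain Python: compare the number of raw (key, value) pairs against the size of the resulting dict (the helper name is illustrative; the real check is in the C++ `ConvertMap`):

```python
def dict_with_loss_check(pairs):
    """Build a dict from (key, value) pairs and report whether duplicate
    keys caused data loss, by comparing input length to dict length."""
    converted = dict(pairs)
    lossy = len(converted) < len(pairs)
    return converted, lossy


converted, lossy = dict_with_loss_check([("a", 1), ("b", 2), ("a", 3)])
# input length 3, resulting pydict length 2 -> loss detected
```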




[GitHub] [arrow] mikelui commented on pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "mikelui (via GitHub)" <gi...@apache.org>.
mikelui commented on PR #34730:
URL: https://github.com/apache/arrow/pull/34730#issuecomment-1515543134

   Looks like some merge issue. Is this an external issue or is it something weird about the HEAD I was working off of?
   
   https://gist.github.com/mikelui/95d23d440974d49c80ab354764e5a39b




[GitHub] [arrow] mikelui commented on a diff in pull request #34730: GH-34729: [C++, Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "mikelui (via GitHub)" <gi...@apache.org>.
mikelui commented on code in PR #34730:
URL: https://github.com/apache/arrow/pull/34730#discussion_r1148480831


##########
python/pyarrow/src/arrow/python/python_to_arrow.cc:
##########
@@ -762,7 +762,7 @@ class PyListConverter : public ListConverter<T, PyConverter, PyConverterTrait> {
       RETURN_NOT_OK(AppendSequence(value));
     } else if (PySet_Check(value) || (Py_TYPE(value) == &PyDictValues_Type)) {
       RETURN_NOT_OK(AppendIterable(value));
-    } else if (PyDict_Check(value) && this->options_.type->id() == Type::MAP) {
+    } else if (PyDict_Check(value) && this->type()->id() == Type::MAP) {

Review Comment:
   It looks like `options_` isn't updated/set as expected in `MakeConverter`
   
   So, when there's a nested map (e.g. list of map):
   1. `type_` is updated to be the list's value_type, but
   2. `options_` is passed along unchanged and still has its `type_` set to the list
   
   That causes this check to fail unexpectedly, since the pydict is seen as having `options_.type->id() == Type::LIST` instead of `type()->id() == Type::MAP`
   
   I am unsure what the expected behavior of `options_` is, but this check certainly is meant to check the current type.





[GitHub] [arrow] github-actions[bot] commented on pull request #34730: GH-34729: [C++, Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34730:
URL: https://github.com/apache/arrow/pull/34730#issuecomment-1484000926

   * Closes: #34729




[GitHub] [arrow] mikelui commented on a diff in pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "mikelui (via GitHub)" <gi...@apache.org>.
mikelui commented on code in PR #34730:
URL: https://github.com/apache/arrow/pull/34730#discussion_r1167555579


##########
python/pyarrow/src/arrow/python/arrow_to_pandas.cc:
##########
@@ -911,6 +877,122 @@ Status ConvertMap(PandasOptions options, const ChunkedArray& data,
   return Status::OK();
 }
 
+// A more helpful error message around TypeErrors that may stem from unhashable keys
+Status CheckMapAsPydictsError() {

Review Comment:
   Manually tested this by removing the `with pytest.raises(TypeError):` in my non-hashable key test, and got:
   
   ```
   E   TypeError: unhashable type: 'numpy.ndarray'. If keys are not hashable, then you must use `maps_as_pydicts=False`
   ```
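
   In Python terms, the effect of `CheckMapAsPydictsTypeError` is to catch the `TypeError` from the failed dict insertion and re-raise it with the hint appended. A sketch of the equivalent behavior (hypothetical helper name):

```python
def set_map_key(mapping, key, value):
    """Insert a key into a dict, augmenting an unhashable-key TypeError
    with a hint about the maps_as_pydicts option (sketch of the C++
    CheckMapAsPydictsTypeError behavior)."""
    try:
        mapping[key] = value
    except TypeError as exc:
        raise TypeError(
            f"{exc}. If keys are not hashable, then you must use "
            "`maps_as_pydicts=False`"
        ) from None
```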





[GitHub] [arrow] wjones127 commented on a diff in pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #34730:
URL: https://github.com/apache/arrow/pull/34730#discussion_r1171471068


##########
python/pyarrow/src/arrow/python/arrow_to_pandas.cc:
##########
@@ -911,6 +877,165 @@ Status ConvertMap(PandasOptions options, const ChunkedArray& data,
   return Status::OK();
 }
 
+// A more helpful error message around TypeErrors that may stem from unhashable keys
+Status CheckMapAsPydictsTypeError() {
+  if (ARROW_PREDICT_TRUE(!PyErr_Occurred())) {
+    return Status::OK();
+  }
+  if (PyErr_ExceptionMatches(PyExc_TypeError)) {
+    // Modify the error string directly, so it is re-raised
+    // with our additional info.
+    //
+    // There are not many interesting things happening when this
+    // is hit. This is intended to only be called directly after
+    // PyDict_SetItem, where a finite set of errors could occur.
+    PyObject *type, *value, *traceback;
+    PyErr_Fetch(&type, &value, &traceback);
+    std::string message;
+    RETURN_NOT_OK(internal::PyObject_StdStringStr(value, &message));
+    message += ". If keys are not hashable, then you must use the option "
+        "[maps_as_pydicts=None (default)]";
+
+    // resets the error
+    PyErr_SetString(PyExc_TypeError, message.c_str());
+  }
+  return ConvertPyError();
+}
+
+Status CheckForDuplicateKeys(bool error_on_duplicate_keys,
+                             Py_ssize_t total_dict_len, Py_ssize_t total_raw_len) {
+  if (total_dict_len < total_raw_len) {
+    const char* message =
+        "[maps_as_pydicts] "
+        "After conversion of Arrow maps to pydicts, "
+        "detected data loss due to duplicate keys. "
+        "Original input length is [%lld], total converted pydict length is [%lld].";
+    std::array<char, 256> buf;
+    std::snprintf(buf.data(), buf.size(), message, total_raw_len, total_dict_len);
+
+    if (error_on_duplicate_keys) {
+      return Status::UnknownError(buf.data());
+    } else {
+      ARROW_LOG(WARNING) << buf.data();
+    }
+  }
+  return Status::OK();
+}
+
+Status ConvertMap(PandasOptions options, const ChunkedArray& data,
+                  PyObject** out_values) {
+  // Get columns of underlying key/item arrays
+  std::vector<std::shared_ptr<Array>> key_arrays;
+  std::vector<std::shared_ptr<Array>> item_arrays;
+  for (int c = 0; c < data.num_chunks(); ++c) {
+    const auto& map_arr = checked_cast<const MapArray&>(*data.chunk(c));
+    key_arrays.emplace_back(map_arr.keys());
+    item_arrays.emplace_back(map_arr.items());
+  }
+
+  const auto& map_type = checked_cast<const MapType&>(*data.type());
+  auto key_type = map_type.key_type();
+  auto item_type = map_type.item_type();
+
+  // ARROW-6899: Convert dictionary-encoded children to dense instead of
+  // failing below. A more efficient conversion than this could be done later
+  if (key_type->id() == Type::DICTIONARY) {
+    auto dense_type = checked_cast<const DictionaryType&>(*key_type).value_type();
+    RETURN_NOT_OK(DecodeDictionaries(options.pool, dense_type, &key_arrays));
+    key_type = dense_type;
+  }
+  if (item_type->id() == Type::DICTIONARY) {
+    auto dense_type = checked_cast<const DictionaryType&>(*item_type).value_type();
+    RETURN_NOT_OK(DecodeDictionaries(options.pool, dense_type, &item_arrays));
+    item_type = dense_type;
+  }
+
+  // See notes in MakeInnerOptions.
+  options = MakeInnerOptions(std::move(options));
+  // Don't blindly convert because timestamps in lists are handled differently.
+  options.timestamp_as_object = true;
+
+  auto flat_keys = std::make_shared<ChunkedArray>(key_arrays, key_type);
+  auto flat_items = std::make_shared<ChunkedArray>(item_arrays, item_type);
+  OwnedRefNoGIL owned_numpy_keys;
+  RETURN_NOT_OK(
+      ConvertChunkedArrayToPandas(options, flat_keys, nullptr, owned_numpy_keys.ref()));
+  OwnedRefNoGIL owned_numpy_items;
+  RETURN_NOT_OK(
+      ConvertChunkedArrayToPandas(options, flat_items, nullptr, owned_numpy_items.ref()));
+  PyArrayObject* py_keys = reinterpret_cast<PyArrayObject*>(owned_numpy_keys.obj());
+  PyArrayObject* py_items = reinterpret_cast<PyArrayObject*>(owned_numpy_items.obj());
+
+  if (options.maps_as_pydicts == MapConversionType::DEFAULT) {
+    // The default behavior to express an Arrow MAP as a list of [(key, value), ...] pairs
+    OwnedRef list_item;
+    return ConvertMapHelper(
+        [&list_item](int64_t num_pairs) {
+          list_item.reset(PyList_New(num_pairs));
+          return CheckPyError();
+        },
+        [&list_item](int64_t idx, OwnedRef& key_value, OwnedRef& item_value) {
+          PyList_SET_ITEM(list_item.obj(), idx,
+                          PyTuple_Pack(2, key_value.obj(), item_value.obj()));
+          return CheckPyError();
+        },
+        [&list_item]{ return list_item.detach(); },
+        data,
+        py_keys,
+        py_items,
+        item_arrays,
+        out_values);
+  } else {
+    // Use a native pydict
+    OwnedRef dict_item;
+    Py_ssize_t total_dict_len{0};
+    Py_ssize_t total_raw_len{0};
+
+    bool error_on_duplicate_keys;
+    if (options.maps_as_pydicts == MapConversionType::LOSSY) {
+      error_on_duplicate_keys = false;
+    } else if (options.maps_as_pydicts == MapConversionType::STRICT_) {
+      error_on_duplicate_keys = true;
+    } else {
+      auto val = std::underlying_type_t<MapConversionType>(options.maps_as_pydicts);
+      return Status::UnknownError(
+          "Received unknown option for maps_as_pydicts: " + std::to_string(val)
+      );
+    }
+
+    auto status = ConvertMapHelper(
+        [&dict_item, &total_raw_len](int64_t num_pairs) {
+          total_raw_len += num_pairs;
+          dict_item.reset(PyDict_New());
+          return CheckPyError();
+        },
+        [&dict_item]([[maybe_unused]] int64_t idx, OwnedRef& key_value, OwnedRef& item_value) {
+          auto setitem_result =
+              PyDict_SetItem(dict_item.obj(), key_value.obj(), item_value.obj());
+          ARROW_RETURN_NOT_OK(CheckMapAsPydictsTypeError());
+          // returns -1 if there are internal errors around hashing/resizing
+          return setitem_result == 0 ?
+            Status::OK() :
+            Status::UnknownError("[maps_as_pydicts] "
+                "Unexpected failure inserting Arrow (key, value) pair into Python dict"
+            );
+        },
+        [&dict_item, &total_dict_len]{
+          total_dict_len += PyDict_Size(dict_item.obj());
+          return dict_item.detach();
+        },
+        data,
+        py_keys,
+        py_items,
+        item_arrays,
+        out_values);
+
+    // If there were no errors generating the pydicts,
+    // then check if we detected any data loss from duplicate keys.
+    return !status.ok() ? status : CheckForDuplicateKeys(error_on_duplicate_keys, total_dict_len, total_raw_len);

Review Comment:
   I think we typically use the `ARROW_RETURN_NOT_OK` to early return.
   ```suggestion
       ARROW_RETURN_NOT_OK(status);
       return CheckForDuplicateKeys(error_on_duplicate_keys, total_dict_len, total_raw_len);
   ```



##########
python/pyarrow/tests/test_pandas.py:
##########
@@ -2128,6 +2128,91 @@ def test_to_list_of_structs_pandas(self):
                                     DeprecationWarning)
             tm.assert_series_equal(series, expected)
 
+    def test_to_list_of_maps_pandas(self):
+        if ((Version(np.__version__) >= Version("1.25.0.dev0")) and
+                (Version(pd.__version__) < Version("2.0.0"))):
+            # TODO: regression in pandas with numpy 1.25dev
+            # https://github.com/pandas-dev/pandas/issues/50360
+            pytest.skip("Regression in pandas with numpy 1.25")
+        offsets = pa.array([0, 2, 5, 6], pa.int32())
+        keys = pa.array(['foo', 'bar', 'baz', 'qux', 'quux', 'quz'])
+        items = pa.array([['a', 'b'], ['c', 'd'], [], None, [None, 'e'], ['f', 'g']],
+                         pa.list_(pa.string()))
+        maps = pa.MapArray.from_arrays(offsets, keys, items)
+        data = pa.ListArray.from_arrays([0, 1, 3], maps)
+
+        expected = pd.Series([
+            [[('foo', ['a', 'b']), ('bar', ['c', 'd'])]],
+            [[('baz', []), ('qux', None), ('quux', [None, 'e'])], [('quz', ['f', 'g'])]]
+        ])
+
+        series = pd.Series(data.to_pandas())

Review Comment:
   I think this could be simplified:
   
   ```suggestion
           data = [
               [[('foo', ['a', 'b']), ('bar', ['c', 'd'])]],
               [[('baz', []), ('qux', None), ('quux', [None, 'e'])], [('quz', ['f', 'g'])]]
           ]
           arr = pa.array(data, pa.list_(pa.map_(pa.utf8(), pa.list_(pa.utf8()))))
           series = arr.to_pandas()
   
           expected = pd.Series(data)
   ```



##########
python/pyarrow/tests/test_pandas.py:
##########
@@ -2128,6 +2128,91 @@ def test_to_list_of_structs_pandas(self):
                                     DeprecationWarning)
             tm.assert_series_equal(series, expected)
 
+    def test_to_list_of_maps_pandas(self):
+        if ((Version(np.__version__) >= Version("1.25.0.dev0")) and
+                (Version(pd.__version__) < Version("2.0.0"))):
+            # TODO: regression in pandas with numpy 1.25dev
+            # https://github.com/pandas-dev/pandas/issues/50360
+            pytest.skip("Regression in pandas with numpy 1.25")
+        offsets = pa.array([0, 2, 5, 6], pa.int32())
+        keys = pa.array(['foo', 'bar', 'baz', 'qux', 'quux', 'quz'])
+        items = pa.array([['a', 'b'], ['c', 'd'], [], None, [None, 'e'], ['f', 'g']],
+                         pa.list_(pa.string()))
+        maps = pa.MapArray.from_arrays(offsets, keys, items)
+        data = pa.ListArray.from_arrays([0, 1, 3], maps)
+
+        expected = pd.Series([
+            [[('foo', ['a', 'b']), ('bar', ['c', 'd'])]],
+            [[('baz', []), ('qux', None), ('quux', [None, 'e'])], [('quz', ['f', 'g'])]]
+        ])
+
+        series = pd.Series(data.to_pandas())
+
+        # pandas.testing generates a
+        # DeprecationWarning: elementwise comparison failed
+        with warnings.catch_warnings():
+            warnings.filterwarnings("ignore", "elementwise comparison failed",
+                                    DeprecationWarning)
+            tm.assert_series_equal(series, expected)
+
+    def test_to_list_of_maps_pandas_sliced(self):
+        """
+        A slightly more rigorous test for chunk/slice combinations
+        """
+
+        if ((Version(np.__version__) >= Version("1.25.0.dev0")) and
+                (Version(pd.__version__) < Version("2.0.0"))):
+            # TODO: regression in pandas with numpy 1.25dev
+            # https://github.com/pandas-dev/pandas/issues/50360
+            pytest.skip("Regression in pandas with numpy 1.25")
+
+        keys_1 = pa.array(['foo', 'bar'])
+        keys_2 = pa.array(['baz', 'qux', 'quux'])
+        keys_3 = pa.array(['quz'])
+
+        items_1 = pa.array(
+            [['a', 'b'], ['c', 'd']],
+            pa.list_(pa.string()),
+        )
+        items_2 = pa.array(
+            [[], None, [None, 'e']],
+            pa.list_(pa.string()),
+        )
+        items_3 = pa.array(
+            [['f', 'g']],
+            pa.list_(pa.string()),
+        )
+
+        map_chunk_1 = pa.MapArray.from_arrays([0, 2], keys_1, items_1)
+        map_chunk_2 = pa.MapArray.from_arrays([0, 3], keys_2, items_2)
+        map_chunk_3 = pa.MapArray.from_arrays([0, 1], keys_3, items_3)
+        data = pa.chunked_array([
+            pa.ListArray.from_arrays([0, 1], map_chunk_1),
+            pa.ListArray.from_arrays([0, 1], map_chunk_2),
+            pa.ListArray.from_arrays([0, 1], map_chunk_3),
+        ])

Review Comment:
   A couple thoughts:
   
    1. I think the chunked array overcomplicated it in the end (sorry for that distraction). What's more important is the slicing.
    2. This currently doesn't slice within any arrays; it slices at array boundaries, which doesn't trigger the code paths I was hoping for.
    3. It would also be good to slice the inner arrays.
   
   Here's my suggested test case:
   
   ```suggestion
           keys = pa.array(['ignore', 'foo', 'bar', 'baz', 'qux', 'quux', 'ignore']).slice(1, 5)
           items = pa.array(
               [['ignore'], ['ignore'], ['a', 'b'], ['c', 'd'], [], None, [None, 'e']],
               pa.list_(pa.string()),
           ).slice(2, 5)
           map = pa.MapArray.from_arrays([0, 2, 4], keys, items)
           arr = pa.ListArray.from_arrays([0, 1, 2], map)
   ```
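
   The reason sliced inputs are a good stress test: an Arrow slice is a zero-copy view with a start offset into the parent buffer, so conversion code must honor the logical offset rather than reading the physical buffer from position 0. A minimal Python analogue (illustrative only, not pyarrow internals):

```python
class SlicedView:
    """Minimal analogue of an Arrow array slice: a view that shares the
    parent's values but starts at a logical offset."""

    def __init__(self, values, offset, length):
        self.values, self.offset, self.length = values, offset, length

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.values[self.offset + i]


keys = SlicedView(
    ['ignore', 'foo', 'bar', 'baz', 'qux', 'quux', 'ignore'],
    offset=1, length=5)
# Correct code sees keys[0] == 'foo'; buggy code that ignores the
# offset and reads from position 0 would see 'ignore'.
```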





[GitHub] [arrow] mikelui commented on pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "mikelui (via GitHub)" <gi...@apache.org>.
mikelui commented on PR #34730:
URL: https://github.com/apache/arrow/pull/34730#issuecomment-1513531120

   Hmmmm I wonder if it would be more ergonomic to have users supply an optional string, instead of a bool for `maps_as_pydicts`.
   
   That way, users can supply:
   * None: default behavior
   * "lossy": report warnings when duplicate keys detected
   * "strict": if duplicate keys are detected, we raise a python error
   
   I will experiment with this as a new commit.
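
   The proposed three-way semantics can be sketched in plain Python (a hypothetical helper under the assumed semantics above, not the pyarrow API):

```python
import warnings


def convert_map_entry(pairs, maps_as_pydicts=None):
    """Sketch of the proposed option: None keeps the tuple-list default,
    'lossy' warns on duplicate keys, 'strict' raises on them."""
    if maps_as_pydicts is None:
        return list(pairs)
    if maps_as_pydicts not in ("lossy", "strict"):
        raise ValueError(f"unknown maps_as_pydicts value: {maps_as_pydicts!r}")
    converted = dict(pairs)
    if len(converted) < len(pairs):
        msg = "duplicate keys detected while converting Arrow map to pydict"
        if maps_as_pydicts == "strict":
            raise ValueError(msg)
        warnings.warn(msg)
    return converted
```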




[GitHub] [arrow] mikelui commented on a diff in pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "mikelui (via GitHub)" <gi...@apache.org>.
mikelui commented on code in PR #34730:
URL: https://github.com/apache/arrow/pull/34730#discussion_r1167563568


##########
python/pyarrow/src/arrow/python/arrow_to_pandas.cc:
##########
@@ -911,6 +877,97 @@ Status ConvertMap(PandasOptions options, const ChunkedArray& data,
   return Status::OK();
 }
 
+Status ConvertMap(PandasOptions options, const ChunkedArray& data,
+                  PyObject** out_values) {
+  // Get columns of underlying key/item arrays
+  std::vector<std::shared_ptr<Array>> key_arrays;
+  std::vector<std::shared_ptr<Array>> item_arrays;
+  for (int c = 0; c < data.num_chunks(); ++c) {
+    const auto& map_arr = checked_cast<const MapArray&>(*data.chunk(c));
+    key_arrays.emplace_back(map_arr.keys());
+    item_arrays.emplace_back(map_arr.items());
+  }
+
+  const auto& map_type = checked_cast<const MapType&>(*data.type());
+  auto key_type = map_type.key_type();
+  auto item_type = map_type.item_type();
+
+  // ARROW-6899: Convert dictionary-encoded children to dense instead of
+  // failing below. A more efficient conversion than this could be done later
+  if (key_type->id() == Type::DICTIONARY) {
+    auto dense_type = checked_cast<const DictionaryType&>(*key_type).value_type();
+    RETURN_NOT_OK(DecodeDictionaries(options.pool, dense_type, &key_arrays));
+    key_type = dense_type;
+  }
+  if (item_type->id() == Type::DICTIONARY) {
+    auto dense_type = checked_cast<const DictionaryType&>(*item_type).value_type();
+    RETURN_NOT_OK(DecodeDictionaries(options.pool, dense_type, &item_arrays));
+    item_type = dense_type;
+  }
+
+  // See notes in MakeInnerOptions.
+  options = MakeInnerOptions(std::move(options));
+  // Don't blindly convert because timestamps in lists are handled differently.
+  options.timestamp_as_object = true;
+
+  auto flat_keys = std::make_shared<ChunkedArray>(key_arrays, key_type);
+  auto flat_items = std::make_shared<ChunkedArray>(item_arrays, item_type);
+  OwnedRefNoGIL owned_numpy_keys;
+  RETURN_NOT_OK(
+      ConvertChunkedArrayToPandas(options, flat_keys, nullptr, owned_numpy_keys.ref()));
+  OwnedRefNoGIL owned_numpy_items;
+  RETURN_NOT_OK(
+      ConvertChunkedArrayToPandas(options, flat_items, nullptr, owned_numpy_items.ref()));
+  PyArrayObject* py_keys = reinterpret_cast<PyArrayObject*>(owned_numpy_keys.obj());
+  PyArrayObject* py_items = reinterpret_cast<PyArrayObject*>(owned_numpy_items.obj());
+
+  if (!options.maps_as_pydicts) {
+    // The default behavior to express an Arrow MAP as a list of [(key, value), ...] pairs
+    OwnedRef list_item;
+    return ConvertMapHelper(
+        [&list_item](int64_t num_pairs) {
+          list_item.reset(PyList_New(num_pairs));
+          return CheckPyError();
+        },
+        [&list_item](int64_t idx, OwnedRef& key_value, OwnedRef& item_value) {
+          PyList_SET_ITEM(list_item.obj(), idx,
+                          PyTuple_Pack(2, key_value.obj(), item_value.obj()));
+          return CheckPyError();
+        },
+        [&list_item]{ return list_item.detach(); },
+        data,
+        py_keys,
+        py_items,
+        item_arrays,
+        out_values);
+  } else {
+    // Use a native pydict
+    OwnedRef dict_item;
+    return ConvertMapHelper(
+        [&dict_item]([[maybe_unused]] int64_t) {
+          dict_item.reset(PyDict_New());
+          return CheckPyError();
+        },
+        [&dict_item]([[maybe_unused]] int64_t idx, OwnedRef& key_value, OwnedRef& item_value) {
+          auto setitem_result =
+              PyDict_SetItem(dict_item.obj(), key_value.obj(), item_value.obj());
+          RETURN_IF_PYERROR();

Review Comment:
   This is already correctly handled and converted here: https://github.com/kszucs/arrow/blob/master/cpp/src/arrow/python/common.cc#L144
   
   But, I added an additional check and error message anyway (c708c604d3c1eb0d1c0bd2e0f5742a783c5f138e), to give users more direct information so that they don't have to google around and search through the docs for this common case.
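   For background (an illustrative Python snippet, not code from this PR): a plain dict assignment silently keeps the last value for a duplicate key, which is why duplicate detection needs an explicit check at all. A cheap after-the-fact check is a size comparison:

```python
pairs = [("k", 1), ("k", 2)]
d = dict(pairs)  # the later pair silently wins; no error or warning
assert d == {"k": 2}
# Duplicates can be detected by comparing the dict size to the pair count:
had_duplicates = len(d) != len(pairs)
assert had_duplicates
```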





[GitHub] [arrow] wjones127 merged pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 merged PR #34730:
URL: https://github.com/apache/arrow/pull/34730




[GitHub] [arrow] mikelui commented on a diff in pull request #34730: GH-34729: [C++, Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "mikelui (via GitHub)" <gi...@apache.org>.
mikelui commented on code in PR #34730:
URL: https://github.com/apache/arrow/pull/34730#discussion_r1148480976


##########
python/pyarrow/src/arrow/python/arrow_to_pandas.cc:
##########
@@ -176,6 +176,7 @@ static inline bool ListTypeSupported(const DataType& type) {
     case Type::DATE32:
     case Type::DATE64:
     case Type::STRUCT:
+    case Type::MAP:

Review Comment:
   There didn't seem to be a strong reason to have this disabled. Users can do nested maps via (list of structs of maps)
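   To illustrate the shapes involved (a hypothetical sketch of the expected Python values, not actual pyarrow output): one cell of a list-of-maps column would default to a list of list-of-(key, value)-tuples, and the pydict option would turn each inner map into a dict.

```python
# Assumed default representation of one list<map<string, int>> cell:
# each inner map is a list of (key, value) tuples.
cell = [
    [("a", 1), ("b", 2)],  # first map in the list
    [("c", 3)],            # second map in the list
]
# Under the proposed maps_as_pydicts option, each inner map would
# instead be a native dict:
cell_as_dicts = [dict(m) for m in cell]
assert cell_as_dicts == [{"a": 1, "b": 2}, {"c": 3}]
```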





[GitHub] [arrow] ursabot commented on pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34730:
URL: https://github.com/apache/arrow/pull/34730#issuecomment-1518909786

   Benchmark runs are scheduled for baseline = 929894e415bdb633f135903c1fdcbc83cb223fac and contender = 1a697abd08d6db6ac1496e548dcdb86a9a82c152. 1a697abd08d6db6ac1496e548dcdb86a9a82c152 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3c04579b56754e1f882b18a68921a936...ee7718732b3a477482bc73758ec3ad92/)
   [Failed] [test-mac-arm](https://conbench.ursa.dev/compare/runs/0d2827fa2d8d4f058a3485b785aeebd6...3269fdd33074413db254784b3308cb30/)
   [Finished :arrow_down:2.81% :arrow_up:0.26%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/f02199ea50284c7490a8c9ced6882836...c6a1ce87c70f420dba28c9957645e98f/)
   [Finished :arrow_down:0.21% :arrow_up:0.06%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/1a97c6bc55964b9fbbfa3810ed1efb2d...34f67b0ba3e44d958cd1e65f0d0865f8/)
   Buildkite builds:
   [Finished] [`1a697abd` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2765)
   [Failed] [`1a697abd` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2799)
   [Finished] [`1a697abd` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2763)
   [Finished] [`1a697abd` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2790)
   [Finished] [`929894e4` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2764)
   [Failed] [`929894e4` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2798)
   [Finished] [`929894e4` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2762)
   [Finished] [`929894e4` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2789)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow] ursabot commented on pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34730:
URL: https://github.com/apache/arrow/pull/34730#issuecomment-1518910172

   ['Python', 'R'] benchmarks have a high level of regressions.
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/f02199ea50284c7490a8c9ced6882836...c6a1ce87c70f420dba28c9957645e98f/)
   






[GitHub] [arrow] mikelui commented on a diff in pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "mikelui (via GitHub)" <gi...@apache.org>.
mikelui commented on code in PR #34730:
URL: https://github.com/apache/arrow/pull/34730#discussion_r1167563924


##########
python/pyarrow/tests/test_pandas.py:
##########
@@ -4542,3 +4569,48 @@ def test_does_not_mutate_timedelta_nested():
     df = table.to_pandas()
 
     assert df["timedelta_2"][0].to_pytimedelta() == timedelta_2[0]
+
+
+def test_roundtrip_nested_map_table_with_pydicts():

Review Comment:
   Added some additional tests in 0939fa65ab99e709f0ec4f3bf2609f4f353b218b
   
   The actual scenario was a bit difficult to generate. Let me know if the tests are sufficient or if they need more refinement.





[GitHub] [arrow] mikelui commented on pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "mikelui (via GitHub)" <gi...@apache.org>.
mikelui commented on PR #34730:
URL: https://github.com/apache/arrow/pull/34730#issuecomment-1509866110

   What's the right way to handle the errors in:  https://github.com/apache/arrow/actions/runs/4708295556/jobs/8350646115?pr=34730
   
   It looks like newer versions of pandas have modified the testing utilities to avoid this error, and those tests do seem to pass.




[GitHub] [arrow] mikelui commented on pull request #34730: GH-34729: [C++][Python] Enhanced Arrow<->Pandas map/pydict support

Posted by "mikelui (via GitHub)" <gi...@apache.org>.
mikelui commented on PR #34730:
URL: https://github.com/apache/arrow/pull/34730#issuecomment-1514735829

   @wjones127 I've clarified and improved some of the related error logic, and added some extra tests for it. Can you take a look and provide any feedback when you have time?

