Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/23 01:41:30 UTC

[GitHub] [arrow] wesm opened a new pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

wesm opened a new pull request #7522:
URL: https://github.com/apache/arrow/pull/7522


   An array object was not being decref'd on the DatetimeTZ conversion path. The fix is slightly complicated by the different reference-ownership semantics of the two calling paths: the Array/ChunkedArray conversion path expects to own the created array when it calls `GetSeriesResult`, while the `GetResultBlock` code retains its array in an `OwnedRefNoGIL`. This was the simplest change that fixed the memory leak for me. The leak-checking script in python/scripts/test_leak.py can be used to verify this (just run the script).
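
A minimal sketch of the two reference-ownership conventions described above, written as a toy C++ model rather than against the real CPython API. `ToyObject`, `IncRef`/`DecRef`, `ToyOwnedRef`, and `ToyWriter` are hypothetical stand-ins (loosely inspired by `PyObject`, `Py_INCREF`/`Py_DECREF`, and Arrow's `OwnedRefNoGIL`), and none of the functions reproduce Arrow's actual conversion code. The model shows that transferring ownership and returning a borrowed pointer are each balanced on their own, while handing a new reference to a caller that treats it as borrowed leaves one reference unreleased.

```cpp
#include <cassert>
#include <utility>

// Toy stand-in for a reference-counted PyObject (hypothetical; not the CPython API).
struct ToyObject {
  int refcnt = 1;  // a freshly created object starts with one (new) reference
};

void IncRef(ToyObject* o) { ++o->refcnt; }
void DecRef(ToyObject* o) { --o->refcnt; }  // toy: only counts, never frees

// RAII holder that owns one reference and drops it on destruction, similar in
// spirit to Arrow's OwnedRef/OwnedRefNoGIL (hypothetical simplified version).
class ToyOwnedRef {
 public:
  explicit ToyOwnedRef(ToyObject* o = nullptr) : obj_(o) {}
  ToyOwnedRef(const ToyOwnedRef&) = delete;
  ToyOwnedRef& operator=(const ToyOwnedRef&) = delete;
  ~ToyOwnedRef() { reset(); }

  // Release the currently held reference (if any) and take ownership of `o`.
  void reset(ToyObject* o = nullptr) {
    if (obj_ != nullptr) DecRef(obj_);
    obj_ = o;
  }
  // Borrowed pointer: the holder keeps ownership.
  ToyObject* obj() const { return obj_; }
  // Transfer ownership to the caller without touching the refcount.
  ToyObject* detach() { return std::exchange(obj_, nullptr); }

 private:
  ToyObject* obj_;
};

// Hypothetical writer that keeps its result block in a ToyOwnedRef member.
struct ToyWriter {
  ToyOwnedRef block;
};

// Convention 1: the caller takes ownership (detach) and releases it itself.
int TransferCase() {
  ToyObject obj;  // refcnt 1
  {
    ToyOwnedRef caller_ref;
    {
      ToyWriter w;
      w.block.reset(&obj);                 // writer owns the only reference
      caller_ref.reset(w.block.detach());  // ownership moves to the caller
    }                                      // writer holds nothing to release
  }                                        // caller releases: refcnt 0
  return obj.refcnt;
}

// Convention 2: the writer keeps ownership and hands out a borrowed pointer.
int BorrowCase() {
  ToyObject obj;  // refcnt 1
  {
    ToyWriter w;
    w.block.reset(&obj);
    ToyObject* out = w.block.obj();  // borrowed; the caller must not DecRef it
    assert(out->refcnt == 1);
  }                                  // writer releases its reference: refcnt 0
  return obj.refcnt;
}

// Mixing the conventions: a new reference is handed to a caller that treats
// the pointer as borrowed, so that extra reference is never released.
int LeakCase() {
  ToyObject obj;  // refcnt 1
  {
    ToyWriter w;
    w.block.reset(&obj);
    ToyObject* out = w.block.obj();
    IncRef(out);  // callee returns a new reference (refcnt 2)...
    // ...but the caller assumes it is borrowed and never calls DecRef(out).
  }               // writer releases its own reference: refcnt 1
  return obj.refcnt;
}
```

Each function returns the object's final reference count: 0 means every reference was released, while the 1 returned by `LeakCase` is the kind of leak that a repeated-conversion check like test_leak.py would surface as steadily growing memory.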


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7522:
URL: https://github.com/apache/arrow/pull/7522#issuecomment-648145200


   +1, I'll go ahead and merge this since I confirmed the memory leak is fixed


[GitHub] [arrow] github-actions[bot] commented on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7522:
URL: https://github.com/apache/arrow/pull/7522#issuecomment-647857111


   https://issues.apache.org/jira/browse/ARROW-8801


[GitHub] [arrow] wesm closed pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

Posted by GitBox <gi...@apache.org>.
wesm closed pull request #7522:
URL: https://github.com/apache/arrow/pull/7522


   


[GitHub] [arrow] jorisvandenbossche commented on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #7522:
URL: https://github.com/apache/arrow/pull/7522#issuecomment-648146247


   Was just testing it, and can also confirm the case from the issue is fixed


[GitHub] [arrow] pitrou commented on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7522:
URL: https://github.com/apache/arrow/pull/7522#issuecomment-648113053


   Perhaps @jorisvandenbossche can review this, because I don't know much about pandas conversions and internals.


[GitHub] [arrow] wesm commented on a change in pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7522:
URL: https://github.com/apache/arrow/pull/7522#discussion_r443913968



##########
File path: cpp/src/arrow/python/arrow_to_pandas.cc
##########
@@ -1269,13 +1283,18 @@ class DatetimeTZWriter : public DatetimeNanoWriter {
       : DatetimeNanoWriter(options, num_rows, 1), timezone_(timezone) {}
 
  protected:
-  Status GetResultBlock(PyObject** out) override { return GetBlock1D(out); }
+  Status GetResultBlock(PyObject** out) override {
+    RETURN_NOT_OK(MakeBlock1D());
+    *out = block_arr_.obj();
+    return Status::OK();
+  }
 
   Status AddResultMetadata(PyObject* result) override {
     PyObject* py_tz = PyUnicode_FromStringAndSize(
         timezone_.c_str(), static_cast<Py_ssize_t>(timezone_.size()));
     RETURN_IF_PYERROR();
     PyDict_SetItemString(result, "timezone", py_tz);
+    Py_DECREF(py_tz);

Review comment:
       This was a small memory leak that users are unlikely to observe, but I found it while looking at usages of `PyDict_SetItemString`, which increments the reference count of the value passed to it rather than stealing the caller's reference.
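
A toy C++ model of the `PyDict_SetItemString` behavior described in this comment: the dict takes its own reference to the value instead of stealing the caller's, so the caller still has to release the new reference it received when it created the object. `ToyObject`, `ToyDict`, and `AddTimezoneMetadata` are hypothetical names, not CPython or Arrow APIs; the `DecRef` after insertion plays the role of the `Py_DECREF(py_tz)` line added in the diff.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a reference-counted PyObject (hypothetical; not the CPython API).
struct ToyObject {
  int refcnt = 1;  // the creator holds one new reference
};

void IncRef(ToyObject* o) { ++o->refcnt; }
void DecRef(ToyObject* o) { --o->refcnt; }  // toy: only counts, never frees

// Toy dict whose SetItem mirrors PyDict_SetItemString: it takes its *own*
// reference to the value (IncRef) instead of stealing the caller's.
class ToyDict {
 public:
  ToyDict() = default;
  ToyDict(const ToyDict&) = delete;
  ToyDict& operator=(const ToyDict&) = delete;
  ~ToyDict() {
    for (auto& kv : items_) DecRef(kv.second);  // drop the dict's references
  }

  void SetItem(const std::string& key, ToyObject* value) {
    IncRef(value);  // the dict now co-owns the value
    auto it = items_.find(key);
    if (it != items_.end()) {
      DecRef(it->second);  // release the value being replaced
      it->second = value;
    } else {
      items_.emplace(key, value);
    }
  }

 private:
  std::map<std::string, ToyObject*> items_;
};

// Hypothetical analogue of AddResultMetadata; returns the value's final refcount.
int AddTimezoneMetadata(bool decref_after_insert) {
  ToyObject py_tz;  // like PyUnicode_FromStringAndSize: a new reference (1)
  {
    ToyDict result;
    result.SetItem("timezone", &py_tz);  // caller + dict: refcnt 2
    assert(py_tz.refcnt == 2);
    if (decref_after_insert) DecRef(&py_tz);  // the fix: drop the caller's ref
  }  // dict destroyed: drops its own reference
  return py_tz.refcnt;  // 0 if balanced, 1 if the caller's reference leaked
}
```

With `decref_after_insert` set, the final count is 0; without it, the caller's original reference is never released and the count stays at 1, which is the leak fixed by the added `Py_DECREF`.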




[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #7522: ARROW-8801: [Python] Fix memory leak when converting datetime64-with-tz data to pandas

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche edited a comment on pull request #7522:
URL: https://github.com/apache/arrow/pull/7522#issuecomment-648146247


   Was just testing it, and can also confirm the case from the issue is fixed, and the code looks good to me

