You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/14 13:03:58 UTC

[GitHub] [arrow] rymurr opened a new pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

rymurr opened a new pull request #7753:
URL: https://github.com/apache/arrow/pull/7753


   @wesm I couldn't push to #7744 so I have added my fixes to yours on this PR. Feel free to cherry-pick my fixes and close this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou edited a comment on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou edited a comment on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658296432


   Rebased. There were some Java changes in-between, hopefully they won't interfere.
   
   Edit: of course they do...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rymurr commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454383698



##########
File path: python/pyarrow/jvm.py
##########
@@ -43,9 +43,14 @@ def jvm_buffer(arrowbuf):
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
+    import jpype
     address = arrowbuf.memoryAddress()
     size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, arrowbuf.asNettyBuffer())
+
+    # TODO: why can we not use arrowbuf as the base?

Review comment:
       Hey @pitrou I am not actually sure. The test passes locally for me with `base=arrowbuf`. I have uploaded a patch w/ this removed. Lets see if its just my machine.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658329952


   JPype build is green, AppVeyor is [green](https://ci.appveyor.com/project/pitrou/arrow). Will merge :-)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658292321


   Revision: 28f6500515fee577dc17f969a819ef01830ffdcd
   
   Submitted crossbow builds: [ursa-labs/crossbow @ actions-416](https://github.com/ursa-labs/crossbow/branches/all?query=actions-416)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.8-jpype|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-416-github-test-conda-python-3.8-jpype)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-416-github-test-conda-python-3.8-jpype)|


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rymurr commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454498500



##########
File path: ci/scripts/python_test.sh
##########
@@ -29,4 +29,4 @@ export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${LD_LIBRARY_PATH}
 # Enable some checks inside Python itself
 export PYTHONDEVMODE=1
 
-pytest -r s --pyargs pyarrow
+pytest -r s --pyargs pyarrow.tests.test_jvm

Review comment:
       Did this sneak in on accident?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658317095


   Revision: 4752024fc2fbcb5b8e058d9f9fa8d2864c5c74ba
   
   Submitted crossbow builds: [ursa-labs/crossbow @ actions-417](https://github.com/ursa-labs/crossbow/branches/all?query=actions-417)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.8-jpype|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-417-github-test-conda-python-3.8-jpype)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-417-github-test-conda-python-3.8-jpype)|


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658283148


   @github-actions crossbow submit test-conda-python-3.8-jpype


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454493669



##########
File path: python/pyarrow/jvm.py
##########
@@ -28,24 +28,38 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+    """
+    An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+    memory alive.
+    """
+
+    def __init__(self, jvm_buf):
+        self.jvm_buf = jvm_buf
+        self.jvm_buf.retain()
+
+    def __del__(self):
+        self.jvm_buf.release()
+
+
+def jvm_buffer(jvm_buf):
     """
     Construct an Arrow buffer from org.apache.arrow.memory.ArrowBuf
 
     Parameters
     ----------
 
-    arrowbuf: org.apache.arrow.memory.ArrowBuf
+    jvm_buf: org.apache.arrow.memory.ArrowBuf
         Arrow Buffer representation on the JVM.
 
     Returns
     -------
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
-    address = arrowbuf.memoryAddress()
-    size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, base=arrowbuf)
+    address = jvm_buf.memoryAddress()

Review comment:
       I'm mostly trying to understand where and when this buffer will be deallcocated. In any case it's outside the scope of this PR so we can let someone more invested in this code deal with it later




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rymurr commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454387527



##########
File path: python/pyarrow/jvm.py
##########
@@ -43,9 +43,14 @@ def jvm_buffer(arrowbuf):
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
+    import jpype
     address = arrowbuf.memoryAddress()
     size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, arrowbuf.asNettyBuffer())
+
+    # TODO: why can we not use arrowbuf as the base?

Review comment:
       `arrowbuf.asNettyBuffer()` fails as we don't expose Netty dependencies in the Java API anymore. I have changed it to just take an `ArrowBuf`, I am not sure why it originally needed a Netty Buffer.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658243587


   Well, this should not go in if every buffer creates a Java memory leak, IMHO.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454494904



##########
File path: python/pyarrow/jvm.py
##########
@@ -28,24 +28,38 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+    """
+    An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+    memory alive.
+    """
+
+    def __init__(self, jvm_buf):
+        self.jvm_buf = jvm_buf
+        self.jvm_buf.retain()
+
+    def __del__(self):
+        self.jvm_buf.release()
+
+
+def jvm_buffer(jvm_buf):
     """
     Construct an Arrow buffer from org.apache.arrow.memory.ArrowBuf
 
     Parameters
     ----------
 
-    arrowbuf: org.apache.arrow.memory.ArrowBuf
+    jvm_buf: org.apache.arrow.memory.ArrowBuf
         Arrow Buffer representation on the JVM.
 
     Returns
     -------
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
-    address = arrowbuf.memoryAddress()
-    size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, base=arrowbuf)
+    address = jvm_buf.memoryAddress()

Review comment:
       I'm trying to improve this, hold on.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454374835



##########
File path: python/pyarrow/jvm.py
##########
@@ -43,9 +43,14 @@ def jvm_buffer(arrowbuf):
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
+    import jpype
     address = arrowbuf.memoryAddress()
     size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, arrowbuf.asNettyBuffer())
+
+    # TODO: why can we not use arrowbuf as the base?

Review comment:
       @rymurr Do you have the answer to this?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] BryanCutler commented on a change in pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime

Posted by GitBox <gi...@apache.org>.
BryanCutler commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r496922290



##########
File path: python/pyarrow/jvm.py
##########
@@ -28,24 +28,45 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+    """
+    An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+    memory alive.
+    """
+    ref_manager = None

Review comment:
       So if you have multiple threads creating `jvm_buffer`s with different backing reference manager instances, isn't it possible that one thread will overwrite the class attribute `ref_manager` being used in another thread, then both will call `ref_manager.retain()` using the same `ref_manager` class attribute?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454438960



##########
File path: python/pyarrow/jvm.py
##########
@@ -43,9 +43,14 @@ def jvm_buffer(arrowbuf):
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
+    import jpype
     address = arrowbuf.memoryAddress()
     size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, arrowbuf.asNettyBuffer())
+
+    # TODO: why can we not use arrowbuf as the base?

Review comment:
       @xhochy Try [weakref.finalize](https://docs.python.org/3/library/weakref.html#weakref.finalize). 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658281712


   @wesm Trying it as well...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454500127



##########
File path: ci/scripts/python_test.sh
##########
@@ -29,4 +29,4 @@ export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${LD_LIBRARY_PATH}
 # Enable some checks inside Python itself
 export PYTHONDEVMODE=1
 
-pytest -r s --pyargs pyarrow
+pytest -r s --pyargs pyarrow.tests.test_jvm

Review comment:
       Yes :-)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658244375


   Oh, I see... then we can merge this anyway.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454441837



##########
File path: python/pyarrow/jvm.py
##########
@@ -43,9 +43,14 @@ def jvm_buffer(arrowbuf):
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
+    import jpype
     address = arrowbuf.memoryAddress()
     size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, arrowbuf.asNettyBuffer())
+
+    # TODO: why can we not use arrowbuf as the base?

Review comment:
       Well, it should be even easier with `pa.foreign_buffer`. Just pass a custom base object with the desired destructor...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rymurr commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454427669



##########
File path: python/pyarrow/jvm.py
##########
@@ -43,9 +43,14 @@ def jvm_buffer(arrowbuf):
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
+    import jpype
     address = arrowbuf.memoryAddress()
     size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, arrowbuf.asNettyBuffer())
+
+    # TODO: why can we not use arrowbuf as the base?

Review comment:
       @wesm ref counting is a possibility. Though I don't see anywhere where the ref count of the netty java object is explicitly referenced.  
   
   When the python buffer object is garbage collected there is nothing (as far as I can tell) to collect the java based buffer. When the Python object is collected it will decrement the python reference count of the java object and eventually GC it but we would still have to manually close the Java side buffer regardless of it being wrapped in netty or not. So we would have a memory leak regardless if we don't carefully clean up buffers. Unless jpype does something cool and undocumented w/ Netty buffers.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #7753:
URL: https://github.com/apache/arrow/pull/7753


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454493669



##########
File path: python/pyarrow/jvm.py
##########
@@ -28,24 +28,38 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+    """
+    An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+    memory alive.
+    """
+
+    def __init__(self, jvm_buf):
+        self.jvm_buf = jvm_buf
+        self.jvm_buf.retain()
+
+    def __del__(self):
+        self.jvm_buf.release()
+
+
+def jvm_buffer(jvm_buf):
     """
     Construct an Arrow buffer from org.apache.arrow.memory.ArrowBuf
 
     Parameters
     ----------
 
-    arrowbuf: org.apache.arrow.memory.ArrowBuf
+    jvm_buf: org.apache.arrow.memory.ArrowBuf
         Arrow Buffer representation on the JVM.
 
     Returns
     -------
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
-    address = arrowbuf.memoryAddress()
-    size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, base=arrowbuf)
+    address = jvm_buf.memoryAddress()

Review comment:
       I'm mostly trying to understand where and when this buffer will be destroyed. In any case it's outside the scope of this PR so we can let someone more invested in this code deal with it later




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r496926751



##########
File path: python/pyarrow/jvm.py
##########
@@ -28,24 +28,45 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+    """
+    An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+    memory alive.
+    """
+    ref_manager = None

Review comment:
       Hmm, no. The class attribute `ref_manager` will always remain `None`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658239934


   @xhochy Can you please validate the buffer handling changes?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454390287



##########
File path: python/pyarrow/jvm.py
##########
@@ -43,9 +43,14 @@ def jvm_buffer(arrowbuf):
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
+    import jpype
     address = arrowbuf.memoryAddress()
     size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, arrowbuf.asNettyBuffer())
+
+    # TODO: why can we not use arrowbuf as the base?

Review comment:
       I wasn't sure if there might be some kind of memory leak issue, @xhochy would know more since Uwe wrote this code originally




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rymurr commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658246552


   Raised https://issues.apache.org/jira/browse/ARROW-9468 to follow up on the memory leak


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] BryanCutler commented on a change in pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime

Posted by GitBox <gi...@apache.org>.
BryanCutler commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r496938084



##########
File path: python/pyarrow/jvm.py
##########
@@ -28,24 +28,45 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+    """
+    An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+    memory alive.
+    """
+    ref_manager = None

Review comment:
       Ok, no problem, thanks for clearing that up for me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658270844


   I'm trying to add a nanny object with a custom dtor


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454378016



##########
File path: python/pyarrow/jvm.py
##########
@@ -43,9 +43,14 @@ def jvm_buffer(arrowbuf):
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
+    import jpype
     address = arrowbuf.memoryAddress()
     size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, arrowbuf.asNettyBuffer())
+
+    # TODO: why can we not use arrowbuf as the base?

Review comment:
       I think it has something to do with allocator reference counting but it would be good to know




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658166655


   https://issues.apache.org/jira/browse/ARROW-9385


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r496537681



##########
File path: python/pyarrow/jvm.py
##########
@@ -28,24 +28,45 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+    """
+    An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+    memory alive.
+    """
+    ref_manager = None

Review comment:
       You are misunderstanding how this works. This is not C++, when you write:
   ```python
           self.ref_manager = ref_manager
   ```
   it creates an instance attribute which shadows the class attribute.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] xhochy commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
xhochy commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454437715



##########
File path: python/pyarrow/jvm.py
##########
@@ -43,9 +43,14 @@ def jvm_buffer(arrowbuf):
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
+    import jpype
     address = arrowbuf.memoryAddress()
     size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, arrowbuf.asNettyBuffer())
+
+    # TODO: why can we not use arrowbuf as the base?

Review comment:
       At least not without writing C-code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] xhochy commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
xhochy commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658244070


   > Well, this should not go in if every buffer creates a Java memory leak, IMHO.
   
   AFAIK this isn't introducing a new leak :(


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454489525



##########
File path: python/pyarrow/jvm.py
##########
@@ -28,24 +28,38 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+    """
+    An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+    memory alive.
+    """
+
+    def __init__(self, jvm_buf):
+        self.jvm_buf = jvm_buf
+        self.jvm_buf.retain()
+
+    def __del__(self):
+        self.jvm_buf.release()
+
+
+def jvm_buffer(jvm_buf):
     """
     Construct an Arrow buffer from org.apache.arrow.memory.ArrowBuf
 
     Parameters
     ----------
 
-    arrowbuf: org.apache.arrow.memory.ArrowBuf
+    jvm_buf: org.apache.arrow.memory.ArrowBuf
         Arrow Buffer representation on the JVM.
 
     Returns
     -------
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
-    address = arrowbuf.memoryAddress()
-    size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, base=arrowbuf)
+    address = jvm_buf.memoryAddress()

Review comment:
       Do you mean the object could be collected from another thread?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658251176


   It just needs to call `release()` when it's done with the buffer, right?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658169074


   Revision: 68b92e9f21072008f3a0bfb1b464ab2c3b279fd3
   
   Submitted crossbow builds: [ursa-labs/crossbow @ actions-414](https://github.com/ursa-labs/crossbow/branches/all?query=actions-414)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.8-jpype|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-414-github-test-conda-python-3.8-jpype)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-414-github-test-conda-python-3.8-jpype)|


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658316233


   @github-actions crossbow submit test-conda-python-3.8-jpype


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] BryanCutler commented on a change in pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime

Posted by GitBox <gi...@apache.org>.
BryanCutler commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r496931942



##########
File path: python/pyarrow/jvm.py
##########
@@ -28,24 +28,45 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+    """
+    An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+    memory alive.
+    """
+    ref_manager = None

Review comment:
       Oh, ok. I think I was confused over having a local var `ref_manager` too. So why do you need a class attribute? Is it in case of an exception is thrown and the class instance doesn't get assigned?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] BryanCutler commented on a change in pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime

Posted by GitBox <gi...@apache.org>.
BryanCutler commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r496398998



##########
File path: python/pyarrow/jvm.py
##########
@@ -28,24 +28,45 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+    """
+    An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+    memory alive.
+    """
+    ref_manager = None

Review comment:
       Hi @pitrou I was trying to understand why you used a static variable here instead of `jvm_buf.retain()` like before? I'm a little concerned about how this would behave with multiple threads and buffers that have different `ReferenceManager`s since `retain()` is called on the static var, but `release()` is called on the instance var..




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rymurr commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658235361


   @github-actions crossbow submit test-conda-python-3.8-jpype


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rymurr commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658166708


   @github-actions crossbow submit test-conda-python-3.8-jpype


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] xhochy commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
xhochy commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454437578



##########
File path: python/pyarrow/jvm.py
##########
@@ -43,9 +43,14 @@ def jvm_buffer(arrowbuf):
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
+    import jpype
     address = arrowbuf.memoryAddress()
     size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, arrowbuf.asNettyBuffer())
+
+    # TODO: why can we not use arrowbuf as the base?

Review comment:
       Yes, it probably leaks then. I thought that when the Java object gets garbage collected that the memory would be freed. As this doesn't seem to be the case, this is a problem. 
   
   Not sure though how to call something when an object is GC'ed on the Python side.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r496933666



##########
File path: python/pyarrow/jvm.py
##########
@@ -28,24 +28,45 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+    """
+    An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+    memory alive.
+    """
+    ref_manager = None

Review comment:
       Ah, no, it's not needed. It's just a common idiom to write:
   ```python
   class A:
       attr = None
   ```
   
   instead of:
   ```python
   class A:
       def __init__(self):
           self.attr = None
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658296432


   Rebased. There were some Java changes in-between, hopefully they won't interfere.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7753: ARROW-9385: [Python] Fix JPype tests and JVM buffer lifetime

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r496934279



##########
File path: python/pyarrow/jvm.py
##########
@@ -28,24 +28,45 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+    """
+    An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+    memory alive.
+    """
+    ref_manager = None

Review comment:
       So, yes, I could have written `self.ref_manager = None` inside the constructor instead. Sorry for the confusion.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#issuecomment-658238575


   Revision: bcacea55c68644c795f8498c396517ef4376dc88
   
   Submitted crossbow builds: [ursa-labs/crossbow @ actions-415](https://github.com/ursa-labs/crossbow/branches/all?query=actions-415)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.8-jpype|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-415-github-test-conda-python-3.8-jpype)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-415-github-test-conda-python-3.8-jpype)|


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7753: ARROW-9385: [Python] finish Fix JPype tests

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7753:
URL: https://github.com/apache/arrow/pull/7753#discussion_r454488716



##########
File path: python/pyarrow/jvm.py
##########
@@ -28,24 +28,38 @@
 import pyarrow as pa
 
 
-def jvm_buffer(arrowbuf):
+class _JvmBufferNanny:
+    """
+    An object that keeps a org.apache.arrow.memory.ArrowBuf's underlying
+    memory alive.
+    """
+
+    def __init__(self, jvm_buf):
+        self.jvm_buf = jvm_buf
+        self.jvm_buf.retain()
+
+    def __del__(self):
+        self.jvm_buf.release()
+
+
+def jvm_buffer(jvm_buf):
     """
     Construct an Arrow buffer from org.apache.arrow.memory.ArrowBuf
 
     Parameters
     ----------
 
-    arrowbuf: org.apache.arrow.memory.ArrowBuf
+    jvm_buf: org.apache.arrow.memory.ArrowBuf
         Arrow Buffer representation on the JVM.
 
     Returns
     -------
     pyarrow.Buffer
         Python Buffer that references the JVM memory.
     """
-    address = arrowbuf.memoryAddress()
-    size = arrowbuf.capacity()
-    return pa.foreign_buffer(address, size, base=arrowbuf)
+    address = jvm_buf.memoryAddress()

Review comment:
       Do we expect that this is a "borrowed" reference here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org