You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/06 08:43:11 UTC

[GitHub] [arrow] milesgranger opened a new pull request, #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault

milesgranger opened a new pull request, #14331:
URL: https://github.com/apache/arrow/pull/14331

   Fixes [ARROW-17944](https://issues.apache.org/jira/browse/ARROW-17944)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14331:
URL: https://github.com/apache/arrow/pull/14331#issuecomment-1269600978

   https://issues.apache.org/jira/browse/ARROW-17944


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault

Posted by GitBox <gi...@apache.org>.
milesgranger commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989082209


##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +124,13 @@ def run_query(plan, table_provider=None):
         function[CNamedTableProvider] c_named_table_provider
         CConversionOptions c_conversion_options
 
-    c_buf_plan = pyarrow_unwrap_buffer(plan)
+    if isinstance(plan, bytes):
+        c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+    elif isinstance(plan, Buffer):
+        c_buf_plan = pyarrow_unwrap_buffer(plan)
+    else:
+        raise TypeError(
+            f"Expected '{Buffer}' or bytes, got '{type(plan)}'")

Review Comment:
   maybe `{Buffer.__module__}.{Buffer.__class__.__name__}`? :sweat_smile: 
   I'll change it to `pyarrow.Buffer` :+1: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989192063


##########
python/pyarrow/tests/test_substrait.py:
##########
@@ -86,6 +86,22 @@ def test_run_serialized_query(tmpdir):
     assert table.select(["foo"]) == res_tb.select(["foo"])
 
 
+@pytest.mark.parametrize("query", (pa.py_buffer(b'buffer'), b"bytes", 1))
+def test_run_query_input_types(tmpdir, query):
+
+    # Passing unsupported type, like int, will not segfault.
+    if not any(isinstance(query, c) for c in (pa.Buffer, bytes)):

Review Comment:
   FYI you can pass a tuple to check if the object matches any of the types:
   
   ```suggestion
       if not isinstance(query, (pa.Buffer, bytes)):
   ```
   
   https://docs.python.org/3.6/library/functions.html#isinstance



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche merged pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche merged PR #14331:
URL: https://github.com/apache/arrow/pull/14331


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault

Posted by GitBox <gi...@apache.org>.
milesgranger commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989082209


##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +124,13 @@ def run_query(plan, table_provider=None):
         function[CNamedTableProvider] c_named_table_provider
         CConversionOptions c_conversion_options
 
-    c_buf_plan = pyarrow_unwrap_buffer(plan)
+    if isinstance(plan, bytes):
+        c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+    elif isinstance(plan, Buffer):
+        c_buf_plan = pyarrow_unwrap_buffer(plan)
+    else:
+        raise TypeError(
+            f"Expected '{Buffer}' or bytes, got '{type(plan)}'")

Review Comment:
   ~maybe `{Buffer.__module__}.{Buffer.__class__.__name__}`?~ :sweat_smile:
   I'll change it to `pyarrow.Buffer` :+1: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989075163


##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +124,13 @@ def run_query(plan, table_provider=None):
         function[CNamedTableProvider] c_named_table_provider
         CConversionOptions c_conversion_options
 
-    c_buf_plan = pyarrow_unwrap_buffer(plan)
+    if isinstance(plan, bytes):
+        c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+    elif isinstance(plan, Buffer):
+        c_buf_plan = pyarrow_unwrap_buffer(plan)
+    else:
+        raise TypeError(
+            f"Expected '{Buffer}' or bytes, got '{type(plan)}'")

Review Comment:
   > if it's ever changed or relocated
   
   I think that's quite unlikely



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault

Posted by GitBox <gi...@apache.org>.
milesgranger commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989072221


##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +124,13 @@ def run_query(plan, table_provider=None):
         function[CNamedTableProvider] c_named_table_provider
         CConversionOptions c_conversion_options
 
-    c_buf_plan = pyarrow_unwrap_buffer(plan)
+    if isinstance(plan, bytes):
+        c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+    elif isinstance(plan, Buffer):
+        c_buf_plan = pyarrow_unwrap_buffer(plan)
+    else:
+        raise TypeError(
+            f"Expected '{Buffer}' or bytes, got '{type(plan)}'")

Review Comment:
   ya, I initially had that, but thought if it's ever changed or relocated this would be outdated. I agree it's prettier though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #14331:
URL: https://github.com/apache/arrow/pull/14331#issuecomment-1275297943

   Benchmark runs are scheduled for baseline = 49a53d2fe01145ade49e4af68092af1b73570f9c and contender = 5fb02bdb13965988a9711cd69e5bf0be69eeea62. 5fb02bdb13965988a9711cd69e5bf0be69eeea62 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/b2b0b1a60f5a41b4a98caefadd33ba0c...7263874130404c32b49b3ad64bbb4cab/)
   [Failed :arrow_down:1.11% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/3ad0d3ea3ac342ea9031c672f41d02eb...e5203bd0132b4bd8a5ae9a984cc79308/)
   [Failed :arrow_down:0.82% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/393de7b32bb04d0da7545d618fade874...e6ae1c449067456e9a784a2e8111ed15/)
   [Finished :arrow_down:0.29% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/b56c38d578184166b1776febba03edaa...5cc0d729191a4fa68851a5d65421b92c/)
   Buildkite builds:
   [Finished] [`5fb02bdb` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1660)
   [Failed] [`5fb02bdb` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1679)
   [Failed] [`5fb02bdb` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1662)
   [Finished] [`5fb02bdb` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1673)
   [Finished] [`49a53d2f` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1659)
   [Failed] [`49a53d2f` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1678)
   [Failed] [`49a53d2f` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1661)
   [Finished] [`49a53d2f` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1672)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r988771112


##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +129,10 @@ def run_query(plan, table_provider=None):
         function[CNamedTableProvider] c_named_table_provider
         CConversionOptions c_conversion_options
 
-    c_buf_plan = pyarrow_unwrap_buffer(plan)
+    if isinstance(plan, bytes):
+        c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+    else:
+        c_buf_plan = pyarrow_unwrap_buffer(plan)

Review Comment:
   ```suggestion
       elif isinstance(plan, Buffer):
           c_buf_plan = pyarrow_unwrap_buffer(plan)
       else:
           raise TypeError(...)
   ```
   
   Possible alternative to the fueed type. While that is a nice trick, it seems to give a confusing error message? ("No matching signature found" ?)



##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +129,10 @@ def run_query(plan, table_provider=None):
         function[CNamedTableProvider] c_named_table_provider
         CConversionOptions c_conversion_options
 
-    c_buf_plan = pyarrow_unwrap_buffer(plan)
+    if isinstance(plan, bytes):
+        c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+    else:
+        c_buf_plan = pyarrow_unwrap_buffer(plan)

Review Comment:
   ```suggestion
       elif isinstance(plan, Buffer):
           c_buf_plan = pyarrow_unwrap_buffer(plan)
       else:
           raise TypeError(...)
   ```
   
   Possible alternative to the fused type. While that is a nice trick, it seems to give a confusing error message? ("No matching signature found" ?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r988765977


##########
python/pyarrow/tests/test_substrait.py:
##########
@@ -74,12 +76,22 @@ def test_run_serialized_query(tmpdir):
     """
 
     file_name = "read_data.arrow"
-    table = pa.table([[1, 2, 3, 4, 5]], names=['foo'])
+    table = pa.table([[1, 2, 3, 4, 5]], names=["foo"])
     path = _write_dummy_data_to_disk(tmpdir, file_name, table)
     query = tobytes(substrait_query.replace("FILENAME_PLACEHOLDER", path))
 
     buf = pa._substrait._parse_json_plan(query)
 
+    # Passing unsupported type, like int, will not segfault.
+    if input_type == "unsupported":
+        with pytest.raises(TypeError, match="No matching signature found"):
+            substrait.run_query(1)
+        return

Review Comment:
   Since this part doesn't use anything of the rest of the test (it doesn't require any of the variables that is set up above), I would personally just put this in its own test instead of relying on parametrization



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989068685


##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +124,13 @@ def run_query(plan, table_provider=None):
         function[CNamedTableProvider] c_named_table_provider
         CConversionOptions c_conversion_options
 
-    c_buf_plan = pyarrow_unwrap_buffer(plan)
+    if isinstance(plan, bytes):
+        c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+    elif isinstance(plan, Buffer):
+        c_buf_plan = pyarrow_unwrap_buffer(plan)
+    else:
+        raise TypeError(
+            f"Expected '{Buffer}' or bytes, got '{type(plan)}'")

Review Comment:
   ```suggestion
               f"Expected pyarrow.Buffer or bytes, got '{type(plan)}'")
   ```
   
   (automatically printing it will give `"<class 'pyarrow.lib.Buffer'>"`, which is more verbose)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault

Posted by GitBox <gi...@apache.org>.
milesgranger commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989353252


##########
python/pyarrow/tests/test_substrait.py:
##########
@@ -86,6 +86,22 @@ def test_run_serialized_query(tmpdir):
     assert table.select(["foo"]) == res_tb.select(["foo"])
 
 
+@pytest.mark.parametrize("query", (pa.py_buffer(b'buffer'), b"bytes", 1))
+def test_run_query_input_types(tmpdir, query):
+
+    # Passing unsupported type, like int, will not segfault.
+    if not any(isinstance(query, c) for c in (pa.Buffer, bytes)):

Review Comment:
   Oh my, lovely! :open_mouth: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org