You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/06 08:43:11 UTC
[GitHub] [arrow] milesgranger opened a new pull request, #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault
milesgranger opened a new pull request, #14331:
URL: https://github.com/apache/arrow/pull/14331
Fixes [ARROW-17944](https://issues.apache.org/jira/browse/ARROW-17944)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14331:
URL: https://github.com/apache/arrow/pull/14331#issuecomment-1269600978
https://issues.apache.org/jira/browse/ARROW-17944
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] milesgranger commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault
Posted by GitBox <gi...@apache.org>.
milesgranger commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989082209
##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +124,13 @@ def run_query(plan, table_provider=None):
function[CNamedTableProvider] c_named_table_provider
CConversionOptions c_conversion_options
- c_buf_plan = pyarrow_unwrap_buffer(plan)
+ if isinstance(plan, bytes):
+ c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+ elif isinstance(plan, Buffer):
+ c_buf_plan = pyarrow_unwrap_buffer(plan)
+ else:
+ raise TypeError(
+ f"Expected '{Buffer}' or bytes, got '{type(plan)}'")
Review Comment:
maybe `{Buffer.__module__}.{Buffer.__class__.__name__}`? :sweat_smile:
I'll change it to `pyarrow.Buffer` :+1:
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wjones127 commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault
Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989192063
##########
python/pyarrow/tests/test_substrait.py:
##########
@@ -86,6 +86,22 @@ def test_run_serialized_query(tmpdir):
assert table.select(["foo"]) == res_tb.select(["foo"])
+@pytest.mark.parametrize("query", (pa.py_buffer(b'buffer'), b"bytes", 1))
+def test_run_query_input_types(tmpdir, query):
+
+ # Passing unsupported type, like int, will not segfault.
+ if not any(isinstance(query, c) for c in (pa.Buffer, bytes)):
Review Comment:
FYI you can pass a tuple to check if the object matches any of the types:
```suggestion
if not isinstance(query, (pa.Buffer, bytes)):
```
https://docs.python.org/3.6/library/functions.html#isinstance
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorisvandenbossche merged pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault
Posted by GitBox <gi...@apache.org>.
jorisvandenbossche merged PR #14331:
URL: https://github.com/apache/arrow/pull/14331
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] milesgranger commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault
Posted by GitBox <gi...@apache.org>.
milesgranger commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989082209
##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +124,13 @@ def run_query(plan, table_provider=None):
function[CNamedTableProvider] c_named_table_provider
CConversionOptions c_conversion_options
- c_buf_plan = pyarrow_unwrap_buffer(plan)
+ if isinstance(plan, bytes):
+ c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+ elif isinstance(plan, Buffer):
+ c_buf_plan = pyarrow_unwrap_buffer(plan)
+ else:
+ raise TypeError(
+ f"Expected '{Buffer}' or bytes, got '{type(plan)}'")
Review Comment:
~maybe `{Buffer.__module__}.{Buffer.__class__.__name__}`?~ :sweat_smile:
I'll change it to `pyarrow.Buffer` :+1:
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault
Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989075163
##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +124,13 @@ def run_query(plan, table_provider=None):
function[CNamedTableProvider] c_named_table_provider
CConversionOptions c_conversion_options
- c_buf_plan = pyarrow_unwrap_buffer(plan)
+ if isinstance(plan, bytes):
+ c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+ elif isinstance(plan, Buffer):
+ c_buf_plan = pyarrow_unwrap_buffer(plan)
+ else:
+ raise TypeError(
+ f"Expected '{Buffer}' or bytes, got '{type(plan)}'")
Review Comment:
> if it's ever changed or relocated
I think that's quite unlikely
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] milesgranger commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault
Posted by GitBox <gi...@apache.org>.
milesgranger commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989072221
##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +124,13 @@ def run_query(plan, table_provider=None):
function[CNamedTableProvider] c_named_table_provider
CConversionOptions c_conversion_options
- c_buf_plan = pyarrow_unwrap_buffer(plan)
+ if isinstance(plan, bytes):
+ c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+ elif isinstance(plan, Buffer):
+ c_buf_plan = pyarrow_unwrap_buffer(plan)
+ else:
+ raise TypeError(
+ f"Expected '{Buffer}' or bytes, got '{type(plan)}'")
Review Comment:
ya, I initially had that, but thought if it's ever changed or relocated this would be outdated. I agree it's prettier though.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault
Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #14331:
URL: https://github.com/apache/arrow/pull/14331#issuecomment-1275297943
Benchmark runs are scheduled for baseline = 49a53d2fe01145ade49e4af68092af1b73570f9c and contender = 5fb02bdb13965988a9711cd69e5bf0be69eeea62. 5fb02bdb13965988a9711cd69e5bf0be69eeea62 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/b2b0b1a60f5a41b4a98caefadd33ba0c...7263874130404c32b49b3ad64bbb4cab/)
[Failed :arrow_down:1.11% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/3ad0d3ea3ac342ea9031c672f41d02eb...e5203bd0132b4bd8a5ae9a984cc79308/)
[Failed :arrow_down:0.82% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/393de7b32bb04d0da7545d618fade874...e6ae1c449067456e9a784a2e8111ed15/)
[Finished :arrow_down:0.29% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/b56c38d578184166b1776febba03edaa...5cc0d729191a4fa68851a5d65421b92c/)
Buildkite builds:
[Finished] [`5fb02bdb` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1660)
[Failed] [`5fb02bdb` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1679)
[Failed] [`5fb02bdb` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1662)
[Finished] [`5fb02bdb` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1673)
[Finished] [`49a53d2f` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1659)
[Failed] [`49a53d2f` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1678)
[Failed] [`49a53d2f` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1661)
[Finished] [`49a53d2f` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1672)
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault
Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r988771112
##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +129,10 @@ def run_query(plan, table_provider=None):
function[CNamedTableProvider] c_named_table_provider
CConversionOptions c_conversion_options
- c_buf_plan = pyarrow_unwrap_buffer(plan)
+ if isinstance(plan, bytes):
+ c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+ else:
+ c_buf_plan = pyarrow_unwrap_buffer(plan)
Review Comment:
```suggestion
elif isinstance(plan, Buffer):
c_buf_plan = pyarrow_unwrap_buffer(plan)
else:
raise TypeError(...)
```
Possible alternative to the fueed type. While that is a nice trick, it seems to give a confusing error message? ("No matching signature found" ?)
##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +129,10 @@ def run_query(plan, table_provider=None):
function[CNamedTableProvider] c_named_table_provider
CConversionOptions c_conversion_options
- c_buf_plan = pyarrow_unwrap_buffer(plan)
+ if isinstance(plan, bytes):
+ c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+ else:
+ c_buf_plan = pyarrow_unwrap_buffer(plan)
Review Comment:
```suggestion
elif isinstance(plan, Buffer):
c_buf_plan = pyarrow_unwrap_buffer(plan)
else:
raise TypeError(...)
```
Possible alternative to the fused type. While that is a nice trick, it seems to give a confusing error message? ("No matching signature found" ?)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault
Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r988765977
##########
python/pyarrow/tests/test_substrait.py:
##########
@@ -74,12 +76,22 @@ def test_run_serialized_query(tmpdir):
"""
file_name = "read_data.arrow"
- table = pa.table([[1, 2, 3, 4, 5]], names=['foo'])
+ table = pa.table([[1, 2, 3, 4, 5]], names=["foo"])
path = _write_dummy_data_to_disk(tmpdir, file_name, table)
query = tobytes(substrait_query.replace("FILENAME_PLACEHOLDER", path))
buf = pa._substrait._parse_json_plan(query)
+ # Passing unsupported type, like int, will not segfault.
+ if input_type == "unsupported":
+ with pytest.raises(TypeError, match="No matching signature found"):
+ substrait.run_query(1)
+ return
Review Comment:
Since this part doesn't use anything of the rest of the test (it doesn't require any of the variables that is set up above), I would personally just put this in its own test instead of relying on parametrization
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault
Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989068685
##########
python/pyarrow/_substrait.pyx:
##########
@@ -124,7 +124,13 @@ def run_query(plan, table_provider=None):
function[CNamedTableProvider] c_named_table_provider
CConversionOptions c_conversion_options
- c_buf_plan = pyarrow_unwrap_buffer(plan)
+ if isinstance(plan, bytes):
+ c_buf_plan = pyarrow_unwrap_buffer(py_buffer(plan))
+ elif isinstance(plan, Buffer):
+ c_buf_plan = pyarrow_unwrap_buffer(plan)
+ else:
+ raise TypeError(
+ f"Expected '{Buffer}' or bytes, got '{type(plan)}'")
Review Comment:
```suggestion
f"Expected pyarrow.Buffer or bytes, got '{type(plan)}'")
```
(automatically printing it will give `"<class 'pyarrow.lib.Buffer'>"`, which is more verbose)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] milesgranger commented on a diff in pull request #14331: ARROW-17944: [Python] substrait.run_query accept bytes/Buffer and not segfault
Posted by GitBox <gi...@apache.org>.
milesgranger commented on code in PR #14331:
URL: https://github.com/apache/arrow/pull/14331#discussion_r989353252
##########
python/pyarrow/tests/test_substrait.py:
##########
@@ -86,6 +86,22 @@ def test_run_serialized_query(tmpdir):
assert table.select(["foo"]) == res_tb.select(["foo"])
+@pytest.mark.parametrize("query", (pa.py_buffer(b'buffer'), b"bytes", 1))
+def test_run_query_input_types(tmpdir, query):
+
+ # Passing unsupported type, like int, will not segfault.
+ if not any(isinstance(query, c) for c in (pa.Buffer, bytes)):
Review Comment:
Oh my, lovely! :open_mouth:
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org