You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "AlenkaF (via GitHub)" <gi...@apache.org> on 2023/06/19 10:27:00 UTC

[GitHub] [arrow] AlenkaF opened a new pull request, #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

AlenkaF opened a new pull request, #36162:
URL: https://github.com/apache/arrow/pull/36162

   ### Rationale for this change
   
   Currently, `pyarrow.array `doesn't accept list of pyarrow Scalars and this PR adds a check to allow that.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1252110602


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2457,14 +2457,29 @@ def test_array_from_pylist_offset_overflow():
     )
 ])
 def test_array_accepts_pyarrow_scalar(seq, data, scalar_data, value_type):
-    if type(seq(scalar_data)) == set:
-        pytest.skip("TODO: The elements in the set get reordered.")
     expect = pa.array(data, type=value_type)
     result = pa.array(seq(scalar_data))
-    assert expect.equals(result)
+
+    if type(seq(scalar_data)) == set:
+        try:
+            import pyarrow.compute as pc
+            expect = [expect[a] for a in pc.array_sort_indices(expect).to_pylist()]
+            result = [result[a] for a in pc.array_sort_indices(result).to_pylist()]
+            expect == result
+        except pa.lib.ArrowNotImplementedError:
+            set(expect) == set(result)

Review Comment:
   > There is always a case that doesn't fit in this PR
   
   Which case exactly gives that error message with ``sorted(expect.to_pylist()) == sorted(result.to_pylist())``?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #36162:
URL: https://github.com/apache/arrow/pull/36162#issuecomment-1602364407

   The tests failing already have an open issue: https://github.com/apache/arrow/issues/35728, so I think this PR is ready for review.
   
   One missing thing are the test for sets where the elements in the set get reordered when getting passed to the `pa.array` constructor. Not sure there is a solution for it.
   https://github.com/apache/arrow/blob/3da9ba67144643d5331aeda7accb5c5795aeb1df/python/pyarrow/tests/test_convert_builtin.py#L2448-L2449


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36162:
URL: https://github.com/apache/arrow/pull/36162#issuecomment-1627829784

   Conbench analyzed the 6 benchmark runs on commit `b116b8ab`.
   
   There were 4 benchmark results indicating a performance regression:
   
   - Commit Run on `ursa-thinkcentre-m75q` at [2023-07-07 07:16:12Z](http://conbench.ursa.dev/compare/runs/69b02051183b449f9bb2b9792d0b31cb...bc447b828e384771bfaa199c65a7052d/)
     - [params=DecodeArrowNonNull_Dense/4096, source=cpp-micro, suite=parquet-encoding-benchmark](http://conbench.ursa.dev/compare/benchmarks/064a79000ead798b8000a8dfb120adc1...064a7bc3feec7b6d8000c535c75aed5b)
     - [params=DecodeArrowNonNull_Dense/65536, source=cpp-micro, suite=parquet-encoding-benchmark](http://conbench.ursa.dev/compare/benchmarks/064a79000fc47534800011febda5f4c6...064a7bc400067ecd800003e6b297b80c)
   - and 2 more (see the report linked below)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14894253403) has more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1251884730


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2457,14 +2457,29 @@ def test_array_from_pylist_offset_overflow():
     )
 ])
 def test_array_accepts_pyarrow_scalar(seq, data, scalar_data, value_type):
-    if type(seq(scalar_data)) == set:
-        pytest.skip("TODO: The elements in the set get reordered.")
     expect = pa.array(data, type=value_type)
     result = pa.array(seq(scalar_data))
-    assert expect.equals(result)
+
+    if type(seq(scalar_data)) == set:
+        try:
+            import pyarrow.compute as pc
+            expect = [expect[a] for a in pc.array_sort_indices(expect).to_pylist()]
+            result = [result[a] for a in pc.array_sort_indices(result).to_pylist()]
+            expect == result
+        except pa.lib.ArrowNotImplementedError:
+            set(expect) == set(result)

Review Comment:
   That was my first try, but in the case of dictionary type the set mixed the elements differently 🤷‍♀️ 
   But `sorted(expect.to_pylist()) == sorted(result.to_pylist())` is a great idea! Tried without the `to_pylist` but it ofcourse didn't work. Will update today 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1233863309


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2363,3 +2363,113 @@ def test_array_from_pylist_offset_overflow():
     assert isinstance(arr, pa.ChunkedArray)
     assert len(arr) == 2**31
     assert len(arr.chunks) > 1
+
+
+@parametrize_with_collections_types
+@pytest.mark.parametrize(('data', 'scalar_data'), [
+    ([True, False, None], [pa.scalar(True), pa.scalar(False), None]),
+    ([1, 2, None], [pa.scalar(1), pa.scalar(2), None]),
+    ([1, None, None], [pa.scalar(1), None, pa.scalar(None, type=pa.int64())]),
+    ([None, None], [pa.scalar(None), pa.scalar(None)]),
+    ([1., 2., None], [pa.scalar(1.), pa.scalar(2.), None]),
+    ([None, datetime.date.today()], [None, pa.scalar(datetime.date.today())]),
+    ([datetime.time(1, 1, 1), None], [pa.scalar(datetime.time(1, 1, 1)), None]),
+    ([datetime.timedelta(seconds=10)], [pa.scalar(datetime.timedelta(seconds=10))]),
+    ([None, datetime.datetime(2014, 1, 1)], [
+     None, pa.scalar(datetime.datetime(2014, 1, 1))]),
+    ([pa.MonthDayNano([1, -1, -10100])], [pa.scalar(pa.MonthDayNano([1, -1, -10100]))]),
+    (["a", "b"], [pa.scalar("a"), pa.scalar("b")]),
+    ([b"a", b"b"], [pa.scalar(b"a"), pa.scalar(b"b")]),
+    ([1, 2, 3], pa.scalar([1, 2, 3])),
+    (["a", "b"], pa.scalar(["a", "b"])),
+])
+def test_array_accepts_pyarrow_scalar(seq, data, scalar_data):
+    if type(seq(scalar_data)) == set:
+        pytest.skip("TODO: look at the reordering of the elements in the set")
+    expect = pa.array(data)
+    result = pa.array(seq(scalar_data))
+    assert expect.equals(result)
+
+
+@parametrize_with_collections_types
+@pytest.mark.parametrize(('data', 'scalar_data', 'value_type'), [
+    ([1, 2, None], [pa.scalar(1, type=pa.int8()),
+     pa.scalar(2, type=pa.int8()), None], pa.int8()),
+    ([1, None], [pa.scalar(1.0, type=pa.int32()), None], pa.int32()),
+    (["aaa", "bbb"], [pa.scalar("aaa", type=pa.binary(3)),
+     pa.scalar("bbb", type=pa.binary(3))], pa.binary(3)),
+    ([b"a"], [pa.scalar("a", type=pa.large_binary())], pa.large_binary()),
+    (["a"], [pa.scalar("a", type=pa.large_string())], pa.large_string()),
+    (
+        ["a"],
+        [pa.scalar("a", type=pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        ["a", "b"],
+        [pa.scalar("a", pa.dictionary(pa.int64(), pa.string())),
+         pa.scalar("b", pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        [1],
+        [pa.scalar(1, type=pa.dictionary(pa.int64(), pa.int32()))],
+        pa.dictionary(pa.int64(), pa.int32())
+    ),
+    (
+        [(1, 2)],
+        [pa.scalar([('a', 1), ('b', 2)], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.int8())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.int8())])
+    ),
+    (
+        [(1, 'bar')],
+        [pa.scalar([('a', 1), ('b', 'bar')], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.string())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.string())])
+    )
+])
+def test_array_accepts_pyarrow_scalar_with_type(seq, data, scalar_data, value_type):
+    if type(seq(scalar_data)) == set:
+        pytest.skip("TODO: look at the reordering of the elements in the set")
+    expect = pa.array(data, type=value_type)
+    result = pa.array(seq(scalar_data), type=value_type)
+    assert expect.equals(result)
+
+
+def test_array_accepts_pyarrow_scalar_something():
+    arr = pa.array([1, 2, 3])
+    result = pa.array([arr.sum()])
+    expect = pa.array([6])
+    assert expect.equals(result)
+
+
+@parametrize_with_collections_types
+def test_array_accepts_pyarrow_scalar_errors(seq):
+    sequence = seq([pa.scalar(1), pa.scalar("a"), pa.scalar(3.0)])
+    with pytest.raises(pa.ArrowInvalid,
+                       match="cannot mix scalars with different types"):
+        pa.array(sequence)
+
+    sequence = seq([1, pa.scalar("a"), None])
+    with pytest.raises(pa.ArrowInvalid,
+                       match="pyarrow scalars cannot be mixed with other "
+                             "Python scalar values currently"):
+        pa.array(sequence)
+
+    sequence = seq([np.float16("0.1"), pa.scalar("a"), None])
+    with pytest.raises(pa.ArrowInvalid,
+                       match="pyarrow scalars cannot be mixed with other "
+                             "Python scalar values currently"):
+        pa.array(sequence)
+
+    sequence = seq([pa.scalar("a"), np.float16("0.1"), None])
+    with pytest.raises(pa.ArrowInvalid,
+                       match="pyarrow scalars cannot be mixed with other "
+                             "Python scalar values currently"):
+        pa.array(sequence)
+
+    with pytest.raises(pa.ArrowInvalid,
+                       match="Cannot append scalar of type string "
+                             "to builder for type int32"):
+        pa.array([pa.scalar("a")], type=pa.int32())

Review Comment:
   This should fail with a different message.
   Also casting `int64` to `int8` should work but it currently doesn't.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1241494996


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2363,3 +2363,136 @@ def test_array_from_pylist_offset_overflow():
     assert isinstance(arr, pa.ChunkedArray)
     assert len(arr) == 2**31
     assert len(arr.chunks) > 1
+
+

Review Comment:
   Yeah, I took approx one example for the group of datatypes represented in the specific `PyPrimitiveConverter` class that holds an `Append` method in `python/pyarrow/src/arrow/python/python_to_arrow.cc`. There should be no difference between for example `date32` vs `date64`. But will add an extra case to the test just to be sure 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1251858303


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2457,14 +2457,29 @@ def test_array_from_pylist_offset_overflow():
     )
 ])
 def test_array_accepts_pyarrow_scalar(seq, data, scalar_data, value_type):
-    if type(seq(scalar_data)) == set:
-        pytest.skip("TODO: The elements in the set get reordered.")
     expect = pa.array(data, type=value_type)
     result = pa.array(seq(scalar_data))
-    assert expect.equals(result)
+
+    if type(seq(scalar_data)) == set:
+        try:
+            import pyarrow.compute as pc
+            expect = [expect[a] for a in pc.array_sort_indices(expect).to_pylist()]
+            result = [result[a] for a in pc.array_sort_indices(result).to_pylist()]
+            expect == result
+        except pa.lib.ArrowNotImplementedError:
+            set(expect) == set(result)

Review Comment:
   I think it's probably good enough to just always use the fallback (or could also be `sorted(expect.to_pylist()) == sorted(result.to_pylist())` ?)
   
   (that would avoid some of the complexity)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #36162:
URL: https://github.com/apache/arrow/pull/36162#issuecomment-1612975020

   The failing tests are all due to a known issue with `test_total_bytes_allocated`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1246861245


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2363,3 +2363,141 @@ def test_array_from_pylist_offset_overflow():
     assert isinstance(arr, pa.ChunkedArray)
     assert len(arr) == 2**31
     assert len(arr.chunks) > 1
+
+
+@parametrize_with_collections_types
+@pytest.mark.parametrize(('data', 'scalar_data', 'value_type'), [
+    ([True, False, None], [pa.scalar(True), pa.scalar(False), None], pa.bool_()),
+    (
+        [1, 2, None],
+        [pa.scalar(1), pa.scalar(2), pa.scalar(None, pa.int64())],
+        pa.int64()
+    ),
+    ([1, None, None], [pa.scalar(1), None, pa.scalar(None, pa.int64())], pa.int64()),
+    ([None, None], [pa.scalar(None), pa.scalar(None)], pa.null()),
+    ([1., 2., None], [pa.scalar(1.), pa.scalar(2.), None], pa.float64()),
+    (
+        [None, datetime.date.today()],
+        [None, pa.scalar(datetime.date.today())],
+        pa.date32()
+    ),
+    (
+        [None, datetime.date.today()],
+        [None, pa.scalar(datetime.date.today(), pa.date64())],
+        pa.date64()
+    ),
+    (
+        [datetime.time(1, 1, 1), None],
+        [pa.scalar(datetime.time(1, 1, 1)), None],
+        pa.time64('us')
+    ),
+    (
+        [datetime.timedelta(seconds=10)],
+        [pa.scalar(datetime.timedelta(seconds=10))],
+        pa.duration('us')
+    ),
+    (
+        [None, datetime.datetime(2014, 1, 1)],
+        [None, pa.scalar(datetime.datetime(2014, 1, 1))],
+        pa.timestamp('us')
+    ),
+    (
+        [pa.MonthDayNano([1, -1, -10100])],
+        [pa.scalar(pa.MonthDayNano([1, -1, -10100]))],
+        pa.month_day_nano_interval()
+    ),
+    (["a", "b"], [pa.scalar("a"), pa.scalar("b")], pa.string()),
+    ([b"a", b"b"], [pa.scalar(b"a"), pa.scalar(b"b")], pa.binary()),
+    (
+        [b"a", b"b"],
+        [pa.scalar(b"a", pa.binary(1)), pa.scalar(b"b", pa.binary(1))],
+        pa.binary(1)
+    ),
+    ([[1, 2, 3]], [pa.scalar([1, 2, 3])], pa.list_(pa.int64())),
+    ([["a", "b"]], [pa.scalar(["a", "b"])], pa.list_(pa.string())),
+    (
+        [1, 2, None],
+        [pa.scalar(1, type=pa.int8()), pa.scalar(2, type=pa.int8()), None],
+        pa.int8()
+    ),
+    ([1, None], [pa.scalar(1.0, type=pa.int32()), None], pa.int32()),
+    (
+        ["aaa", "bbb"],
+        [pa.scalar("aaa", type=pa.binary(3)), pa.scalar("bbb", type=pa.binary(3))],
+        pa.binary(3)),
+    ([b"a"], [pa.scalar("a", type=pa.large_binary())], pa.large_binary()),
+    (["a"], [pa.scalar("a", type=pa.large_string())], pa.large_string()),
+    (
+        ["a"],
+        [pa.scalar("a", type=pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        ["a", "b"],
+        [pa.scalar("a", pa.dictionary(pa.int64(), pa.string())),
+         pa.scalar("b", pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        [1],
+        [pa.scalar(1, type=pa.dictionary(pa.int64(), pa.int32()))],
+        pa.dictionary(pa.int64(), pa.int32())
+    ),
+    (
+        [(1, 2)],
+        [pa.scalar([('a', 1), ('b', 2)], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.int8())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.int8())])
+    ),
+    (
+        [(1, 'bar')],
+        [pa.scalar([('a', 1), ('b', 'bar')], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.string())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.string())])
+    )
+])
+def test_array_accepts_pyarrow_scalar(seq, data, scalar_data, value_type):
+    if type(seq(scalar_data)) == set:
+        pytest.skip("TODO: The elements in the set get reordered.")

Review Comment:
   One more thing I forgot, we should maybe resolve this comment. Either just keep skipping it (and remove the TODO), or actually address it. 
   
   The actual behaviour is expected: the iteration order of a set is not defined (and not the same as the order it was created), so it is OK that this doesn't pass out of the box. The question is whether we want to change the assert specifically for this case, or just leave it as is (the skip)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1238239388


##########
python/pyarrow/src/arrow/python/inference.cc:
##########
@@ -77,6 +79,8 @@ Status ImportPresentIntervalTypes(OwnedRefNoGIL* interval_types_tuple) {
 
 }  // namespace
 
+int import_pyarrow();

Review Comment:
   Removing `import_pyarrow();` altogether also works so I went with that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1252246982


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2457,14 +2457,29 @@ def test_array_from_pylist_offset_overflow():
     )
 ])
 def test_array_accepts_pyarrow_scalar(seq, data, scalar_data, value_type):
-    if type(seq(scalar_data)) == set:
-        pytest.skip("TODO: The elements in the set get reordered.")
     expect = pa.array(data, type=value_type)
     result = pa.array(seq(scalar_data))
-    assert expect.equals(result)
+
+    if type(seq(scalar_data)) == set:
+        try:
+            import pyarrow.compute as pc
+            expect = [expect[a] for a in pc.array_sort_indices(expect).to_pylist()]
+            result = [result[a] for a in pc.array_sort_indices(result).to_pylist()]
+            expect == result
+        except pa.lib.ArrowNotImplementedError:
+            set(expect) == set(result)

Review Comment:
   `ListType` and `StructType`, that is:
   - `data = [(1, 'bar')], scalar_data = [<pyarrow.StructScalar: [('a', 1), ('b', 'bar')]>], value_type = StructType(struct<a: int8, b: string>)`
   - `data = [(1, 2)], scalar_data = [<pyarrow.StructScalar: [('a', 1), ('b', 2)]>], value_type = StructType(struct<a: int8, b: int8>)`
   - `data = [['a', 'b']], scalar_data = [<pyarrow.ListScalar: ['a', 'b']>], value_type = ListType(list<item: string>)`
   - `data = [[1, 2, 3]], scalar_data = [<pyarrow.ListScalar: [1, 2, 3]>], value_type = ListType(list<item: int64>)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1252643646


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2457,14 +2457,29 @@ def test_array_from_pylist_offset_overflow():
     )
 ])
 def test_array_accepts_pyarrow_scalar(seq, data, scalar_data, value_type):
-    if type(seq(scalar_data)) == set:
-        pytest.skip("TODO: The elements in the set get reordered.")
     expect = pa.array(data, type=value_type)
     result = pa.array(seq(scalar_data))
-    assert expect.equals(result)
+
+    if type(seq(scalar_data)) == set:
+        try:
+            import pyarrow.compute as pc
+            expect = [expect[a] for a in pc.array_sort_indices(expect).to_pylist()]
+            result = [result[a] for a in pc.array_sort_indices(result).to_pylist()]
+            expect == result
+        except pa.lib.ArrowNotImplementedError:
+            set(expect) == set(result)

Review Comment:
   Hmm. Sorry for the back and forth here, but then I would maybe go back to the initial skip (with an updated comment). I don't think this is worth the complexity. We know that sets can be iterated multiple times (just with an non-deterministic order), and won't give a problem here, accept being difficult to test with general data. 
   (eg, while testing some other options, I also noticed that for the case of null type, for a set it actually created a different array because of de-duplicating the sequence)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1234913799


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2363,3 +2363,113 @@ def test_array_from_pylist_offset_overflow():
     assert isinstance(arr, pa.ChunkedArray)
     assert len(arr) == 2**31
     assert len(arr.chunks) > 1
+
+
+@parametrize_with_collections_types
+@pytest.mark.parametrize(('data', 'scalar_data'), [
+    ([True, False, None], [pa.scalar(True), pa.scalar(False), None]),
+    ([1, 2, None], [pa.scalar(1), pa.scalar(2), None]),
+    ([1, None, None], [pa.scalar(1), None, pa.scalar(None, type=pa.int64())]),
+    ([None, None], [pa.scalar(None), pa.scalar(None)]),
+    ([1., 2., None], [pa.scalar(1.), pa.scalar(2.), None]),
+    ([None, datetime.date.today()], [None, pa.scalar(datetime.date.today())]),
+    ([datetime.time(1, 1, 1), None], [pa.scalar(datetime.time(1, 1, 1)), None]),
+    ([datetime.timedelta(seconds=10)], [pa.scalar(datetime.timedelta(seconds=10))]),
+    ([None, datetime.datetime(2014, 1, 1)], [
+     None, pa.scalar(datetime.datetime(2014, 1, 1))]),
+    ([pa.MonthDayNano([1, -1, -10100])], [pa.scalar(pa.MonthDayNano([1, -1, -10100]))]),
+    (["a", "b"], [pa.scalar("a"), pa.scalar("b")]),
+    ([b"a", b"b"], [pa.scalar(b"a"), pa.scalar(b"b")]),
+    ([1, 2, 3], pa.scalar([1, 2, 3])),
+    (["a", "b"], pa.scalar(["a", "b"])),

Review Comment:
   ```suggestion
       ([[1, 2, 3]], [pa.scalar([1, 2, 3])]),
       ([["a", "b"]], [pa.scalar(["a", "b"])]),
   ```
   
   ?



##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2363,3 +2363,113 @@ def test_array_from_pylist_offset_overflow():
     assert isinstance(arr, pa.ChunkedArray)
     assert len(arr) == 2**31
     assert len(arr.chunks) > 1
+
+
+@parametrize_with_collections_types
+@pytest.mark.parametrize(('data', 'scalar_data'), [
+    ([True, False, None], [pa.scalar(True), pa.scalar(False), None]),
+    ([1, 2, None], [pa.scalar(1), pa.scalar(2), None]),
+    ([1, None, None], [pa.scalar(1), None, pa.scalar(None, type=pa.int64())]),
+    ([None, None], [pa.scalar(None), pa.scalar(None)]),
+    ([1., 2., None], [pa.scalar(1.), pa.scalar(2.), None]),
+    ([None, datetime.date.today()], [None, pa.scalar(datetime.date.today())]),
+    ([datetime.time(1, 1, 1), None], [pa.scalar(datetime.time(1, 1, 1)), None]),
+    ([datetime.timedelta(seconds=10)], [pa.scalar(datetime.timedelta(seconds=10))]),
+    ([None, datetime.datetime(2014, 1, 1)], [
+     None, pa.scalar(datetime.datetime(2014, 1, 1))]),
+    ([pa.MonthDayNano([1, -1, -10100])], [pa.scalar(pa.MonthDayNano([1, -1, -10100]))]),
+    (["a", "b"], [pa.scalar("a"), pa.scalar("b")]),
+    ([b"a", b"b"], [pa.scalar(b"a"), pa.scalar(b"b")]),
+    ([1, 2, 3], pa.scalar([1, 2, 3])),
+    (["a", "b"], pa.scalar(["a", "b"])),
+])
+def test_array_accepts_pyarrow_scalar(seq, data, scalar_data):
+    if type(seq(scalar_data)) == set:
+        pytest.skip("TODO: look at the reordering of the elements in the set")
+    expect = pa.array(data)
+    result = pa.array(seq(scalar_data))
+    assert expect.equals(result)
+
+
+@parametrize_with_collections_types
+@pytest.mark.parametrize(('data', 'scalar_data', 'value_type'), [
+    ([1, 2, None], [pa.scalar(1, type=pa.int8()),
+     pa.scalar(2, type=pa.int8()), None], pa.int8()),
+    ([1, None], [pa.scalar(1.0, type=pa.int32()), None], pa.int32()),
+    (["aaa", "bbb"], [pa.scalar("aaa", type=pa.binary(3)),
+     pa.scalar("bbb", type=pa.binary(3))], pa.binary(3)),
+    ([b"a"], [pa.scalar("a", type=pa.large_binary())], pa.large_binary()),
+    (["a"], [pa.scalar("a", type=pa.large_string())], pa.large_string()),
+    (
+        ["a"],
+        [pa.scalar("a", type=pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        ["a", "b"],
+        [pa.scalar("a", pa.dictionary(pa.int64(), pa.string())),
+         pa.scalar("b", pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        [1],
+        [pa.scalar(1, type=pa.dictionary(pa.int64(), pa.int32()))],
+        pa.dictionary(pa.int64(), pa.int32())
+    ),
+    (
+        [(1, 2)],
+        [pa.scalar([('a', 1), ('b', 2)], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.int8())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.int8())])
+    ),
+    (
+        [(1, 'bar')],
+        [pa.scalar([('a', 1), ('b', 'bar')], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.string())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.string())])
+    )
+])
+def test_array_accepts_pyarrow_scalar_with_type(seq, data, scalar_data, value_type):
+    if type(seq(scalar_data)) == set:
+        pytest.skip("TODO: look at the reordering of the elements in the set")
+    expect = pa.array(data, type=value_type)
+    result = pa.array(seq(scalar_data), type=value_type)
+    assert expect.equals(result)
+
+
+def test_array_accepts_pyarrow_scalar_something():
+    arr = pa.array([1, 2, 3])
+    result = pa.array([arr.sum()])
+    expect = pa.array([6])
+    assert expect.equals(result)
+
+
+@parametrize_with_collections_types
+def test_array_accepts_pyarrow_scalar_errors(seq):
+    sequence = seq([pa.scalar(1), pa.scalar("a"), pa.scalar(3.0)])
+    with pytest.raises(pa.ArrowInvalid,
+                       match="cannot mix scalars with different types"):
+        pa.array(sequence)
+
+    sequence = seq([1, pa.scalar("a"), None])
+    with pytest.raises(pa.ArrowInvalid,
+                       match="pyarrow scalars cannot be mixed with other "
+                             "Python scalar values currently"):
+        pa.array(sequence)
+
+    sequence = seq([np.float16("0.1"), pa.scalar("a"), None])
+    with pytest.raises(pa.ArrowInvalid,
+                       match="pyarrow scalars cannot be mixed with other "
+                             "Python scalar values currently"):
+        pa.array(sequence)
+
+    sequence = seq([pa.scalar("a"), np.float16("0.1"), None])
+    with pytest.raises(pa.ArrowInvalid,
+                       match="pyarrow scalars cannot be mixed with other "
+                             "Python scalar values currently"):
+        pa.array(sequence)
+
+    with pytest.raises(pa.ArrowInvalid,
+                       match="Cannot append scalar of type string "
+                             "to builder for type int32"):
+        pa.array([pa.scalar("a")], type=pa.int32())

Review Comment:
   > Also casting `int64` to `int8` should work but it currently doesn't.
   
   I suppose it is the builder that currently doesn't support that? 
   (it might also be fine to not support this for this initial PR)



##########
python/pyarrow/src/arrow/python/inference.cc:
##########
@@ -77,6 +79,8 @@ Status ImportPresentIntervalTypes(OwnedRefNoGIL* interval_types_tuple) {
 
 }  // namespace
 
+int import_pyarrow();

Review Comment:
   ```suggestion
   import_pyarrow();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #36162:
URL: https://github.com/apache/arrow/pull/36162#issuecomment-1621344258

   Thanks Alenka!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1250513824


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2363,3 +2363,141 @@ def test_array_from_pylist_offset_overflow():
     assert isinstance(arr, pa.ChunkedArray)
     assert len(arr) == 2**31
     assert len(arr.chunks) > 1
+
+
+@parametrize_with_collections_types
+@pytest.mark.parametrize(('data', 'scalar_data', 'value_type'), [
+    ([True, False, None], [pa.scalar(True), pa.scalar(False), None], pa.bool_()),
+    (
+        [1, 2, None],
+        [pa.scalar(1), pa.scalar(2), pa.scalar(None, pa.int64())],
+        pa.int64()
+    ),
+    ([1, None, None], [pa.scalar(1), None, pa.scalar(None, pa.int64())], pa.int64()),
+    ([None, None], [pa.scalar(None), pa.scalar(None)], pa.null()),
+    ([1., 2., None], [pa.scalar(1.), pa.scalar(2.), None], pa.float64()),
+    (
+        [None, datetime.date.today()],
+        [None, pa.scalar(datetime.date.today())],
+        pa.date32()
+    ),
+    (
+        [None, datetime.date.today()],
+        [None, pa.scalar(datetime.date.today(), pa.date64())],
+        pa.date64()
+    ),
+    (
+        [datetime.time(1, 1, 1), None],
+        [pa.scalar(datetime.time(1, 1, 1)), None],
+        pa.time64('us')
+    ),
+    (
+        [datetime.timedelta(seconds=10)],
+        [pa.scalar(datetime.timedelta(seconds=10))],
+        pa.duration('us')
+    ),
+    (
+        [None, datetime.datetime(2014, 1, 1)],
+        [None, pa.scalar(datetime.datetime(2014, 1, 1))],
+        pa.timestamp('us')
+    ),
+    (
+        [pa.MonthDayNano([1, -1, -10100])],
+        [pa.scalar(pa.MonthDayNano([1, -1, -10100]))],
+        pa.month_day_nano_interval()
+    ),
+    (["a", "b"], [pa.scalar("a"), pa.scalar("b")], pa.string()),
+    ([b"a", b"b"], [pa.scalar(b"a"), pa.scalar(b"b")], pa.binary()),
+    (
+        [b"a", b"b"],
+        [pa.scalar(b"a", pa.binary(1)), pa.scalar(b"b", pa.binary(1))],
+        pa.binary(1)
+    ),
+    ([[1, 2, 3]], [pa.scalar([1, 2, 3])], pa.list_(pa.int64())),
+    ([["a", "b"]], [pa.scalar(["a", "b"])], pa.list_(pa.string())),
+    (
+        [1, 2, None],
+        [pa.scalar(1, type=pa.int8()), pa.scalar(2, type=pa.int8()), None],
+        pa.int8()
+    ),
+    ([1, None], [pa.scalar(1.0, type=pa.int32()), None], pa.int32()),
+    (
+        ["aaa", "bbb"],
+        [pa.scalar("aaa", type=pa.binary(3)), pa.scalar("bbb", type=pa.binary(3))],
+        pa.binary(3)),
+    ([b"a"], [pa.scalar("a", type=pa.large_binary())], pa.large_binary()),
+    (["a"], [pa.scalar("a", type=pa.large_string())], pa.large_string()),
+    (
+        ["a"],
+        [pa.scalar("a", type=pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        ["a", "b"],
+        [pa.scalar("a", pa.dictionary(pa.int64(), pa.string())),
+         pa.scalar("b", pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        [1],
+        [pa.scalar(1, type=pa.dictionary(pa.int64(), pa.int32()))],
+        pa.dictionary(pa.int64(), pa.int32())
+    ),
+    (
+        [(1, 2)],
+        [pa.scalar([('a', 1), ('b', 2)], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.int8())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.int8())])
+    ),
+    (
+        [(1, 'bar')],
+        [pa.scalar([('a', 1), ('b', 'bar')], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.string())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.string())])
+    )
+])
+def test_array_accepts_pyarrow_scalar(seq, data, scalar_data, value_type):
+    if type(seq(scalar_data)) == set:
+        pytest.skip("TODO: The elements in the set get reordered.")

Review Comment:
   Updated tests to include sets also (probably not the nicest way) in [bfb17c6](https://github.com/apache/arrow/pull/36162/commits/bfb17c693cce362998f3bc80675e22f01d935123).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1239031012


##########
python/pyarrow/src/arrow/python/inference.cc:
##########
@@ -560,6 +573,17 @@ class TypeInferrer {
     return Status::OK();
   }
 
+  Status VisitArrowScalar(PyObject* obj, bool* keep_going) {

Review Comment:
   I think we can add an `/* unused */` comment for `keep_going` since we do want to check all values.



##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2363,3 +2363,136 @@ def test_array_from_pylist_offset_overflow():
     assert isinstance(arr, pa.ChunkedArray)
     assert len(arr) == 2**31
     assert len(arr.chunks) > 1
+
+

Review Comment:
   Great test cases! I see we aren't testing all scalar types defined in https://arrow.apache.org/docs/python/api/datatypes.html, but I don't think we need to since many of them take identical code paths, right? e.g. date32 vs date64



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1246410455


##########
python/pyarrow/src/arrow/python/inference.cc:
##########
@@ -27,7 +27,9 @@
 #include <utility>
 #include <vector>
 
+#include "arrow/scalar.h"
 #include "arrow/status.h"
+#include "arrow/type.h"

Review Comment:
   Will check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1246423519


##########
python/pyarrow/src/arrow/python/inference.cc:
##########
@@ -27,7 +27,9 @@
 #include <utility>
 #include <vector>
 
+#include "arrow/scalar.h"
 #include "arrow/status.h"
+#include "arrow/type.h"

Review Comment:
   Correct, I guess it must be a leftover. Will remove it 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1250252616


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2363,3 +2363,141 @@ def test_array_from_pylist_offset_overflow():
     assert isinstance(arr, pa.ChunkedArray)
     assert len(arr) == 2**31
     assert len(arr.chunks) > 1
+
+
+@parametrize_with_collections_types
+@pytest.mark.parametrize(('data', 'scalar_data', 'value_type'), [
+    ([True, False, None], [pa.scalar(True), pa.scalar(False), None], pa.bool_()),
+    (
+        [1, 2, None],
+        [pa.scalar(1), pa.scalar(2), pa.scalar(None, pa.int64())],
+        pa.int64()
+    ),
+    ([1, None, None], [pa.scalar(1), None, pa.scalar(None, pa.int64())], pa.int64()),
+    ([None, None], [pa.scalar(None), pa.scalar(None)], pa.null()),
+    ([1., 2., None], [pa.scalar(1.), pa.scalar(2.), None], pa.float64()),
+    (
+        [None, datetime.date.today()],
+        [None, pa.scalar(datetime.date.today())],
+        pa.date32()
+    ),
+    (
+        [None, datetime.date.today()],
+        [None, pa.scalar(datetime.date.today(), pa.date64())],
+        pa.date64()
+    ),
+    (
+        [datetime.time(1, 1, 1), None],
+        [pa.scalar(datetime.time(1, 1, 1)), None],
+        pa.time64('us')
+    ),
+    (
+        [datetime.timedelta(seconds=10)],
+        [pa.scalar(datetime.timedelta(seconds=10))],
+        pa.duration('us')
+    ),
+    (
+        [None, datetime.datetime(2014, 1, 1)],
+        [None, pa.scalar(datetime.datetime(2014, 1, 1))],
+        pa.timestamp('us')
+    ),
+    (
+        [pa.MonthDayNano([1, -1, -10100])],
+        [pa.scalar(pa.MonthDayNano([1, -1, -10100]))],
+        pa.month_day_nano_interval()
+    ),
+    (["a", "b"], [pa.scalar("a"), pa.scalar("b")], pa.string()),
+    ([b"a", b"b"], [pa.scalar(b"a"), pa.scalar(b"b")], pa.binary()),
+    (
+        [b"a", b"b"],
+        [pa.scalar(b"a", pa.binary(1)), pa.scalar(b"b", pa.binary(1))],
+        pa.binary(1)
+    ),
+    ([[1, 2, 3]], [pa.scalar([1, 2, 3])], pa.list_(pa.int64())),
+    ([["a", "b"]], [pa.scalar(["a", "b"])], pa.list_(pa.string())),
+    (
+        [1, 2, None],
+        [pa.scalar(1, type=pa.int8()), pa.scalar(2, type=pa.int8()), None],
+        pa.int8()
+    ),
+    ([1, None], [pa.scalar(1.0, type=pa.int32()), None], pa.int32()),
+    (
+        ["aaa", "bbb"],
+        [pa.scalar("aaa", type=pa.binary(3)), pa.scalar("bbb", type=pa.binary(3))],
+        pa.binary(3)),
+    ([b"a"], [pa.scalar("a", type=pa.large_binary())], pa.large_binary()),
+    (["a"], [pa.scalar("a", type=pa.large_string())], pa.large_string()),
+    (
+        ["a"],
+        [pa.scalar("a", type=pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        ["a", "b"],
+        [pa.scalar("a", pa.dictionary(pa.int64(), pa.string())),
+         pa.scalar("b", pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        [1],
+        [pa.scalar(1, type=pa.dictionary(pa.int64(), pa.int32()))],
+        pa.dictionary(pa.int64(), pa.int32())
+    ),
+    (
+        [(1, 2)],
+        [pa.scalar([('a', 1), ('b', 2)], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.int8())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.int8())])
+    ),
+    (
+        [(1, 'bar')],
+        [pa.scalar([('a', 1), ('b', 'bar')], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.string())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.string())])
+    )
+])
+def test_array_accepts_pyarrow_scalar(seq, data, scalar_data, value_type):
+    if type(seq(scalar_data)) == set:
+        pytest.skip("TODO: The elements in the set get reordered.")

Review Comment:
   I would change the assert. Will think about how to do it and hopefully make a commit soon =)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #36162:
URL: https://github.com/apache/arrow/pull/36162#issuecomment-1598402232

   I am not sure why linter is failing on this PR. I will try to rebase and see if it helps.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1250513824


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2363,3 +2363,141 @@ def test_array_from_pylist_offset_overflow():
     assert isinstance(arr, pa.ChunkedArray)
     assert len(arr) == 2**31
     assert len(arr.chunks) > 1
+
+
+@parametrize_with_collections_types
+@pytest.mark.parametrize(('data', 'scalar_data', 'value_type'), [
+    ([True, False, None], [pa.scalar(True), pa.scalar(False), None], pa.bool_()),
+    (
+        [1, 2, None],
+        [pa.scalar(1), pa.scalar(2), pa.scalar(None, pa.int64())],
+        pa.int64()
+    ),
+    ([1, None, None], [pa.scalar(1), None, pa.scalar(None, pa.int64())], pa.int64()),
+    ([None, None], [pa.scalar(None), pa.scalar(None)], pa.null()),
+    ([1., 2., None], [pa.scalar(1.), pa.scalar(2.), None], pa.float64()),
+    (
+        [None, datetime.date.today()],
+        [None, pa.scalar(datetime.date.today())],
+        pa.date32()
+    ),
+    (
+        [None, datetime.date.today()],
+        [None, pa.scalar(datetime.date.today(), pa.date64())],
+        pa.date64()
+    ),
+    (
+        [datetime.time(1, 1, 1), None],
+        [pa.scalar(datetime.time(1, 1, 1)), None],
+        pa.time64('us')
+    ),
+    (
+        [datetime.timedelta(seconds=10)],
+        [pa.scalar(datetime.timedelta(seconds=10))],
+        pa.duration('us')
+    ),
+    (
+        [None, datetime.datetime(2014, 1, 1)],
+        [None, pa.scalar(datetime.datetime(2014, 1, 1))],
+        pa.timestamp('us')
+    ),
+    (
+        [pa.MonthDayNano([1, -1, -10100])],
+        [pa.scalar(pa.MonthDayNano([1, -1, -10100]))],
+        pa.month_day_nano_interval()
+    ),
+    (["a", "b"], [pa.scalar("a"), pa.scalar("b")], pa.string()),
+    ([b"a", b"b"], [pa.scalar(b"a"), pa.scalar(b"b")], pa.binary()),
+    (
+        [b"a", b"b"],
+        [pa.scalar(b"a", pa.binary(1)), pa.scalar(b"b", pa.binary(1))],
+        pa.binary(1)
+    ),
+    ([[1, 2, 3]], [pa.scalar([1, 2, 3])], pa.list_(pa.int64())),
+    ([["a", "b"]], [pa.scalar(["a", "b"])], pa.list_(pa.string())),
+    (
+        [1, 2, None],
+        [pa.scalar(1, type=pa.int8()), pa.scalar(2, type=pa.int8()), None],
+        pa.int8()
+    ),
+    ([1, None], [pa.scalar(1.0, type=pa.int32()), None], pa.int32()),
+    (
+        ["aaa", "bbb"],
+        [pa.scalar("aaa", type=pa.binary(3)), pa.scalar("bbb", type=pa.binary(3))],
+        pa.binary(3)),
+    ([b"a"], [pa.scalar("a", type=pa.large_binary())], pa.large_binary()),
+    (["a"], [pa.scalar("a", type=pa.large_string())], pa.large_string()),
+    (
+        ["a"],
+        [pa.scalar("a", type=pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        ["a", "b"],
+        [pa.scalar("a", pa.dictionary(pa.int64(), pa.string())),
+         pa.scalar("b", pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        [1],
+        [pa.scalar(1, type=pa.dictionary(pa.int64(), pa.int32()))],
+        pa.dictionary(pa.int64(), pa.int32())
+    ),
+    (
+        [(1, 2)],
+        [pa.scalar([('a', 1), ('b', 2)], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.int8())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.int8())])
+    ),
+    (
+        [(1, 'bar')],
+        [pa.scalar([('a', 1), ('b', 'bar')], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.string())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.string())])
+    )
+])
+def test_array_accepts_pyarrow_scalar(seq, data, scalar_data, value_type):
+    if type(seq(scalar_data)) == set:
+        pytest.skip("TODO: The elements in the set get reordered.")

Review Comment:
   Created (probably not the nicest way) to test sets also ([bfb17c6](https://github.com/apache/arrow/pull/36162/commits/bfb17c693cce362998f3bc80675e22f01d935123)).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche merged pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche merged PR #36162:
URL: https://github.com/apache/arrow/pull/36162


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1243955620


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2363,3 +2363,144 @@ def test_array_from_pylist_offset_overflow():
     assert isinstance(arr, pa.ChunkedArray)
     assert len(arr) == 2**31
     assert len(arr.chunks) > 1
+
+
+@parametrize_with_collections_types
+@pytest.mark.parametrize(('data', 'scalar_data', 'value_type'), [
+    ([True, False, None], [pa.scalar(True), pa.scalar(False), None], pa.bool_()),
+    ([1, 2, None], [pa.scalar(1), pa.scalar(2), None], pa.int64()),
+    ([1, None, None], [pa.scalar(1), None, pa.scalar(None, pa.int64())], pa.int64()),
+    ([None, None], [pa.scalar(None), pa.scalar(None)], pa.null()),
+    ([1., 2., None], [pa.scalar(1.), pa.scalar(2.), None], pa.float64()),
+    (
+        [None, datetime.date.today()],
+        [None, pa.scalar(datetime.date.today())],
+        pa.date32()
+    ),
+    (
+        [None, datetime.date.today()],
+        [None, pa.scalar(datetime.date.today(), pa.date64())],
+        pa.date64()
+    ),
+    (
+        [datetime.time(1, 1, 1), None],
+        [pa.scalar(datetime.time(1, 1, 1)), None],
+        pa.time64('us')
+    ),
+    (
+        [datetime.timedelta(seconds=10)],
+        [pa.scalar(datetime.timedelta(seconds=10))],
+        pa.duration('us')
+    ),
+    (
+        [None, datetime.datetime(2014, 1, 1)],
+        [None, pa.scalar(datetime.datetime(2014, 1, 1))],
+        pa.timestamp('us')
+    ),
+    (
+        [pa.MonthDayNano([1, -1, -10100])],
+        [pa.scalar(pa.MonthDayNano([1, -1, -10100]))],
+        pa.month_day_nano_interval()
+    ),
+    (["a", "b"], [pa.scalar("a"), pa.scalar("b")], pa.string()),
+    ([b"a", b"b"], [pa.scalar(b"a"), pa.scalar(b"b")], pa.binary()),
+    (
+        [b"a", b"b"],
+        [pa.scalar(b"a", pa.binary(1)), pa.scalar(b"b", pa.binary(1))],
+        pa.binary(1)
+    ),
+    ([[1, 2, 3]], [pa.scalar([1, 2, 3])], pa.list_(pa.int64())),
+    ([["a", "b"]], [pa.scalar(["a", "b"])], pa.list_(pa.string())),
+    (
+        [1, 2, None],
+        [pa.scalar(1, type=pa.int8()), pa.scalar(2, type=pa.int8()), None],
+        pa.int8()
+    ),
+    ([1, None], [pa.scalar(1.0, type=pa.int32()), None], pa.int32()),
+    (
+        ["aaa", "bbb"],
+        [pa.scalar("aaa", type=pa.binary(3)), pa.scalar("bbb", type=pa.binary(3))],
+        pa.binary(3)),
+    ([b"a"], [pa.scalar("a", type=pa.large_binary())], pa.large_binary()),
+    (["a"], [pa.scalar("a", type=pa.large_string())], pa.large_string()),
+    (
+        ["a"],
+        [pa.scalar("a", type=pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        ["a", "b"],
+        [pa.scalar("a", pa.dictionary(pa.int64(), pa.string())),
+         pa.scalar("b", pa.dictionary(pa.int64(), pa.string()))],
+        pa.dictionary(pa.int64(), pa.string())
+    ),
+    (
+        [1],
+        [pa.scalar(1, type=pa.dictionary(pa.int64(), pa.int32()))],
+        pa.dictionary(pa.int64(), pa.int32())
+    ),
+    (
+        [(1, 2)],
+        [pa.scalar([('a', 1), ('b', 2)], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.int8())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.int8())])
+    ),
+    (
+        [(1, 'bar')],
+        [pa.scalar([('a', 1), ('b', 'bar')], type=pa.struct(
+            [('a', pa.int8()), ('b', pa.string())]))],
+        pa.struct([('a', pa.int8()), ('b', pa.string())])
+    )
+])
+def test_array_accepts_pyarrow_scalar(seq, data, scalar_data, value_type):
+    if type(seq(scalar_data)) == set:
+        pytest.skip("TODO: The elements in the set get reordered.")
+    expect = pa.array(data, type=value_type)
+    result = pa.array(seq(scalar_data))
+    assert expect.equals(result)
+
+    result = pa.array(seq(scalar_data), type=value_type)
+    assert expect.equals(result)
+
+
+def test_array_accepts_pyarrow_scalar_from_compute():
+    arr = pa.array([1, 2, 3])
+    result = pa.array([arr.sum()])
+    expect = pa.array([6])
+    assert expect.equals(result)

Review Comment:
   Given that this is essentially tested by the test above (we know that the sum() returns a scalar), you can maybe drop this one



##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2363,3 +2363,144 @@ def test_array_from_pylist_offset_overflow():
     assert isinstance(arr, pa.ChunkedArray)
     assert len(arr) == 2**31
     assert len(arr.chunks) > 1
+
+
+@parametrize_with_collections_types
+@pytest.mark.parametrize(('data', 'scalar_data', 'value_type'), [
+    ([True, False, None], [pa.scalar(True), pa.scalar(False), None], pa.bool_()),
+    ([1, 2, None], [pa.scalar(1), pa.scalar(2), None], pa.int64()),

Review Comment:
   ```suggestion
       ([1, 2, None], [pa.scalar(1), pa.scalar(2), pa.scalar(None, pa.int64())], pa.int64()),
   ```
   
   To have a case with a null scalar instead of None as well



##########
python/pyarrow/src/arrow/python/inference.cc:
##########
@@ -27,7 +27,9 @@
 #include <utility>
 #include <vector>
 
+#include "arrow/scalar.h"
 #include "arrow/status.h"
+#include "arrow/type.h"

Review Comment:
   Is this additional include needed? (we already use shared pointers to DataType in this file, so I would expect your diff to not need this)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1234972630


##########
python/pyarrow/src/arrow/python/inference.cc:
##########
@@ -77,6 +79,8 @@ Status ImportPresentIntervalTypes(OwnedRefNoGIL* interval_types_tuple) {
 
 }  // namespace
 
+int import_pyarrow();

Review Comment:
   This change produced an error:
   ```
   /arrow/python/pyarrow/src/arrow/python/inference.cc
     /arrow/python/pyarrow/src/arrow/python/inference.cc:82:17: error: expected constructor, destructor, or type conversion before ';' token
        82 | import_pyarrow();
           |   
   ```
   
   I do not fully understand the usage of this function. But if we need to call it before using any other function in the pyarrow C++ we could do an if statement or something?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1251917493


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2457,14 +2457,29 @@ def test_array_from_pylist_offset_overflow():
     )
 ])
 def test_array_accepts_pyarrow_scalar(seq, data, scalar_data, value_type):
-    if type(seq(scalar_data)) == set:
-        pytest.skip("TODO: The elements in the set get reordered.")
     expect = pa.array(data, type=value_type)
     result = pa.array(seq(scalar_data))
-    assert expect.equals(result)
+
+    if type(seq(scalar_data)) == set:
+        try:
+            import pyarrow.compute as pc
+            expect = [expect[a] for a in pc.array_sort_indices(expect).to_pylist()]
+            result = [result[a] for a in pc.array_sort_indices(result).to_pylist()]
+            expect == result
+        except pa.lib.ArrowNotImplementedError:
+            set(expect) == set(result)

Review Comment:
   There is always a case that doesn't fit in this PR 🤣  
   
   ```python
   TypeError: unhashable type: 'list'
   TypeError: unhashable type: 'dict'
   ```
   
   I can do `try-except` and take the items of the lists in the case of a `TypeError` from above (in case of ListType and StructType as `to_pylist()` returns a dictionary in a list).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on a diff in pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #36162:
URL: https://github.com/apache/arrow/pull/36162#discussion_r1252679542


##########
python/pyarrow/tests/test_convert_builtin.py:
##########
@@ -2457,14 +2457,29 @@ def test_array_from_pylist_offset_overflow():
     )
 ])
 def test_array_accepts_pyarrow_scalar(seq, data, scalar_data, value_type):
-    if type(seq(scalar_data)) == set:
-        pytest.skip("TODO: The elements in the set get reordered.")
     expect = pa.array(data, type=value_type)
     result = pa.array(seq(scalar_data))
-    assert expect.equals(result)
+
+    if type(seq(scalar_data)) == set:
+        try:
+            import pyarrow.compute as pc
+            expect = [expect[a] for a in pc.array_sort_indices(expect).to_pylist()]
+            result = [result[a] for a in pc.array_sort_indices(result).to_pylist()]
+            expect == result
+        except pa.lib.ArrowNotImplementedError:
+            set(expect) == set(result)

Review Comment:
   Agree, will do 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on pull request #36162: GH-21761: [Python] accept pyarrow scalars in array constructor

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #36162:
URL: https://github.com/apache/arrow/pull/36162#issuecomment-1621336058

   Will merge this today as the CI is green 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org