You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/25 04:31:46 UTC

[GitHub] [arrow] arw2019 opened a new pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

arw2019 opened a new pull request #8044:
URL: https://github.com/apache/arrow/pull/8044


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
arw2019 commented on pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#issuecomment-692737718


   Thanks @jorisvandenbossche for reviewing!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#discussion_r478505587



##########
File path: python/pyarrow/tests/test_pandas.py
##########
@@ -3531,11 +3533,11 @@ def test_dictionary_from_pandas_specified_type():
     assert result.type.equals(typ)
     assert result.to_pylist() == ['a', 'b']
 
-    # mismatching values type -> raise error (for now a deprecation warning)
+    # mismatching values type -> raise error
     typ = pa.dictionary(index_type=pa.int8(), value_type=pa.int64())
-    with pytest.warns(FutureWarning):
+    with pytest.raises(pa.ArrowInvalid):
         result = pa.array(cat, type=typ)
-    assert result.to_pylist() == ['a', 'b']
+    assert result.to_pylist() == ["a", "b"]

Review comment:
       this line can be removed now? (it's actually testing again the `result` of the previous case above)

##########
File path: python/pyarrow/tests/test_convert_builtin.py
##########
@@ -382,11 +382,8 @@ def test_sequence_custom_integers(seq):
 @parametrize_with_iterable_types
 def test_broken_integers(seq):

Review comment:
       I think that is fine, personally

##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -562,7 +562,7 @@ def test_is_null():
 def test_fill_null():
     arr = pa.array([1, 2, None, 4], type=pa.int8())
     fill_value = pa.array([5], type=pa.int8())
-    with pytest.raises(TypeError):
+    with pytest.raises(pa.ArrowInvalid):

Review comment:
       What did the error message say before, and what does it show now?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#issuecomment-691845741


   @jorisvandenbossche was there more to be done here?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche closed pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche closed pull request #8044:
URL: https://github.com/apache/arrow/pull/8044


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#issuecomment-692733341


   There are some known failures on Mac and Appveyor at the moment, so nothing to worry about for this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on a change in pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
arw2019 commented on a change in pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#discussion_r479811805



##########
File path: cpp/src/arrow/python/python_to_arrow.cc
##########
@@ -106,7 +106,12 @@ struct ValueConverter<Type, enable_if_integer<Type>> {
 
   static inline Result<ValueType> FromPython(PyObject* obj) {
     ValueType value;
-    RETURN_NOT_OK(internal::CIntFromPython(obj, &value));
+    arrow::Status s_ = internal::CIntFromPython(obj, &value);
+    if (!s_.ok() && !internal::PyIntScalar_Check(obj)) {
+      return internal::InvalidValue(obj, "tried to convert to int");
+    } else {
+      RETURN_NOT_OK(s_);

Review comment:
       Yes, and also when converting a negative integer to a `uint`:
   ```
   pa.scalar(-1, type='uint8')
   ```
   No other tests are touched if I recompile without this check




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#discussion_r478879571



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -562,7 +562,7 @@ def test_is_null():
 def test_fill_null():
     arr = pa.array([1, 2, None, 4], type=pa.int8())
     fill_value = pa.array([5], type=pa.int8())
-    with pytest.raises(TypeError):
+    with pytest.raises(pa.ArrowInvalid):

Review comment:
       Hmm, for this case I find the original error message clearer ..
   
   That's the consequence of the `scalar(..)` conversion using the array conversion under the hood, I suppose? 
   But OK, I suppose this is fine (it's maybe mainly the multiline repr of the array in the middle of the sentence that makes it more confusing)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on a change in pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
arw2019 commented on a change in pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#discussion_r478814578



##########
File path: python/pyarrow/tests/test_pandas.py
##########
@@ -3531,11 +3533,11 @@ def test_dictionary_from_pandas_specified_type():
     assert result.type.equals(typ)
     assert result.to_pylist() == ['a', 'b']
 
-    # mismatching values type -> raise error (for now a deprecation warning)
+    # mismatching values type -> raise error
     typ = pa.dictionary(index_type=pa.int8(), value_type=pa.int64())
-    with pytest.warns(FutureWarning):
+    with pytest.raises(pa.ArrowInvalid):
         result = pa.array(cat, type=typ)
-    assert result.to_pylist() == ['a', 'b']
+    assert result.to_pylist() == ["a", "b"]

Review comment:
       removed 

##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -562,7 +562,7 @@ def test_is_null():
 def test_fill_null():
     arr = pa.array([1, 2, None, 4], type=pa.int8())
     fill_value = pa.array([5], type=pa.int8())
-    with pytest.raises(TypeError):
+    with pytest.raises(pa.ArrowInvalid):

Review comment:
       On master it's
   ``` python
   TypeError: an integer is required (got type pyarrow.lib.Int8Array)
   ```
   verus on this branch
   ``` python
   
   ArrowInvalid: Could not convert [
     5
   ] with type pyarrow.lib.Int8Array: tried to convert to int
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#issuecomment-691921944


   Thanks for the ping. I think all good. @arw2019 can you just rebase to ensure it's still all passing with latest master?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#issuecomment-679599784


   https://issues.apache.org/jira/browse/ARROW-7663


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on a change in pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
arw2019 commented on a change in pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#discussion_r476648698



##########
File path: python/pyarrow/tests/test_convert_builtin.py
##########
@@ -382,11 +382,8 @@ def test_sequence_custom_integers(seq):
 @parametrize_with_iterable_types
 def test_broken_integers(seq):

Review comment:
       We lose the the more specific traceback and `ZeroDivisionError` message, in favor of
   ``` python
   In [11]: class MyBrokenInt: 
       ...:     def __init__(self): 
       ...:         1/0
   In [12]: pa.array([MyBrokenInt()], type=pa.int64())        
   ---------------------------------------------------------------------------
   ArrowInvalid                              Traceback (most recent call last)
   <ipython-input-12-1cf156b165b3> in <module>
   ----> 1 pa.array([MyBrokenInt()], type=pa.int64())
   
   ~/git_repo/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
       269     else:
       270         # ConvertPySequence does strict conversion if type is explicitly passed
   --> 271         return _sequence_to_array(obj, mask, size, type, pool, c_from_pandas)
       272 
       273 
   
   ~/git_repo/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()
        38 
        39     with nogil:
   ---> 40         check_status(ConvertPySequence(sequence, mask, options, &out))
        41 
        42     if out.get().num_chunks() == 1:
   
   ~/git_repo/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
        82 
        83         if status.IsInvalid():
   ---> 84             raise ArrowInvalid(message)
        85         elif status.IsIOError():
        86             # Note: OSError constructor is
   ArrowInvalid: Could not convert <__main__.MyBrokenInt object at 0x7fc331394290> with type MyBrokenInt: tried to convert to int
   ```
   but this is the same message as what we get on master for 
   ``` python
   In [11]: class MyBrokenInt: 
       ...:     def __init__(self): 
       ...:         1/1 
   ```
   so maybe it's ok?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#issuecomment-692734903


   Thanks @arw2019 !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#discussion_r478880416



##########
File path: cpp/src/arrow/python/python_to_arrow.cc
##########
@@ -106,7 +106,12 @@ struct ValueConverter<Type, enable_if_integer<Type>> {
 
   static inline Result<ValueType> FromPython(PyObject* obj) {
     ValueType value;
-    RETURN_NOT_OK(internal::CIntFromPython(obj, &value));
+    arrow::Status s_ = internal::CIntFromPython(obj, &value);
+    if (!s_.ok() && !internal::PyIntScalar_Check(obj)) {
+      return internal::InvalidValue(obj, "tried to convert to int");
+    } else {
+      RETURN_NOT_OK(s_);

Review comment:
       What are the cases that this couldn't be converted, but that `obj` *is* an integer? When the integer is too big to fit in a C int?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
arw2019 commented on pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#issuecomment-692246161


   @jorisvandenbossche Rebased and seeing some failures. They're ones also popping up in other, unrelated, PRs, so not sure they're to do with this patch? I'm happy to investigate, though


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on a change in pull request #8044: ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table

Posted by GitBox <gi...@apache.org>.
arw2019 commented on a change in pull request #8044:
URL: https://github.com/apache/arrow/pull/8044#discussion_r476648698



##########
File path: python/pyarrow/tests/test_convert_builtin.py
##########
@@ -382,11 +382,8 @@ def test_sequence_custom_integers(seq):
 @parametrize_with_iterable_types
 def test_broken_integers(seq):

Review comment:
       We lose the the more specific traceback and `ZeroDivisionError` message, in favor of
   ``` python
   In [11]: class MyBrokenInt: 
       ...:     def __init__(self): 
       ...:         1/0
   In [12]: pa.array([MyBrokenInt()], type=pa.int64())        
   ---------------------------------------------------------------------------
   ArrowInvalid                              Traceback (most recent call last)
   <ipython-input-12-1cf156b165b3> in <module>
   ----> 1 pa.array([MyBrokenInt()], type=pa.int64())
   
   ~/git_repo/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
       269     else:
       270         # ConvertPySequence does strict conversion if type is explicitly passed
   --> 271         return _sequence_to_array(obj, mask, size, type, pool, c_from_pandas)
       272 
       273 
   
   ~/git_repo/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()
        38 
        39     with nogil:
   ---> 40         check_status(ConvertPySequence(sequence, mask, options, &out))
        41 
        42     if out.get().num_chunks() == 1:
   
   ~/git_repo/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
        82 
        83         if status.IsInvalid():
   ---> 84             raise ArrowInvalid(message)
        85         elif status.IsIOError():
        86             # Note: OSError constructor is
   ArrowInvalid: Could not convert <__main__.MyBrokenInt object at 0x7fc331394290> with type MyBrokenInt: tried to convert to int
   ```
   but this is the same message as what we get on master for 
   ``` python
   In [11]: class MyBrokenInt: 
       ...:     def __init__(self): 
       ...:         1/1 
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org