You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/15 15:22:14 UTC

[GitHub] [arrow] amol- opened a new pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

amol- opened a new pull request #10054:
URL: https://github.com/apache/arrow/pull/10054


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] amol- commented on a change in pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#discussion_r614775487



##########
File path: python/pyarrow/io.pxi
##########
@@ -1449,6 +1449,10 @@ cdef shared_ptr[CBuffer] as_c_buffer(object o) except *:
 
 
 cdef NativeFile get_native_file(object source, c_bool use_memory_map):
+    if source is None:

Review comment:
       Yes, I can confirm that `not None` is not allowed in `cdef`  functions:
   
   ```
   pyarrow/io.pxi:1451:32: 'not None' only allowed in Python functions
   ```
   
   I think that the main value probably comes from having a test to cover for a past `segfault`, the better error message is a nice to have but not strictly something that values the extra code. If we want to go deeper we might want to investigate where the segfault was originally coming from (as my feeling is that we have just hidden it behind other python changes) but it doesn't seem to value the extra time at the moment if it's already solved.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] amol- commented on a change in pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#discussion_r615844457



##########
File path: python/pyarrow/io.pxi
##########
@@ -1449,6 +1449,10 @@ cdef shared_ptr[CBuffer] as_c_buffer(object o) except *:
 
 
 cdef NativeFile get_native_file(object source, c_bool use_memory_map):
+    if source is None:

Review comment:
       The same issue seems to involved btw also callers of `get_input_stream` which is a bit more widespread.
   See for example
   ```
   >>> json.read_json(None)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "pyarrow/_json.pyx", line 238, in pyarrow._json.read_json
       _get_reader(input_file, &stream)
     File "pyarrow/_json.pyx", line 193, in pyarrow._json._get_reader
       get_input_stream(input_file, use_memory_map, out)
     File "pyarrow/io.pxi", line 1494, in pyarrow.lib.get_input_stream
       input_stream = nf.get_input_stream()
   AttributeError: 'NoneType' object has no attribute 'get_input_stream'
   ```
   
   I wonder if a possible strategy might also be to provide an error signaling return value to `get_reader` and `get_input_stream` (as they don't return anything) and use it for the purpose of adding an `except -1` or something like that.
   
   That seems would cover more paths that could all end up in this situation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] amol- edited a comment on pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
amol- edited a comment on pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#issuecomment-828272172


   I simplified further the solution as suggested by @jorisvandenbossche 
   I think that at this point the change is so minor that even if we don't think it provides much value the maintenance cost is negligible and still verifies for something that was a segfault in the past.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#issuecomment-826926941


   Personally, I still find it worth it to provide a better error message. Yes, it's not typical to pass `None` as the file source, but it can accidentally happen, and then the current message is not that helpful (it doesn't give any indication which is incorrectly being passed as None, but just showing a later error about None not having some attribute)
   
   For example adding a simple `not None` to ParquetReader:
   
   ```diff
   --- a/python/pyarrow/_parquet.pyx
   +++ b/python/pyarrow/_parquet.pyx
   @@ -927,7 +927,7 @@ cdef class ParquetReader(_Weakrefable):
            self.pool = maybe_unbox_memory_pool(memory_pool)
            self._metadata = None
    
   -    def open(self, object source, bint use_memory_map=True,
   +    def open(self, object source not None, bint use_memory_map=True,
                 read_dictionary=None, FileMetaData metadata=None,
                 int buffer_size=0):
   ```
   
   could also help if we don't want to add an explicit `if .. is None: ..` check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] amol- commented on pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
amol- commented on pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#issuecomment-828272172


   I simplified further the solution as suggested by @jorisvandenbossche 
   I think that at this point the change is so minor that even if we don't think it provides much value the maintenance cost is negligible


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche edited a comment on pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#issuecomment-826926941


   Personally, I still find it worth it to provide a better error message. Yes, it's not typical to pass `None` as the file source, but it can accidentally happen, and then the current message is not that helpful (it doesn't give any indication which keyword is incorrectly being passed as None, but just showing a later error about None not having some attribute)
   
   For example adding a simple `not None` to ParquetReader:
   
   ```diff
   --- a/python/pyarrow/_parquet.pyx
   +++ b/python/pyarrow/_parquet.pyx
   @@ -927,7 +927,7 @@ cdef class ParquetReader(_Weakrefable):
            self.pool = maybe_unbox_memory_pool(memory_pool)
            self._metadata = None
    
   -    def open(self, object source, bint use_memory_map=True,
   +    def open(self, object source not None, bint use_memory_map=True,
                 read_dictionary=None, FileMetaData metadata=None,
                 int buffer_size=0):
   ```
   
   could also help if we don't want to add an explicit `if .. is None: ..` check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#discussion_r615867888



##########
File path: python/pyarrow/io.pxi
##########
@@ -1449,6 +1449,10 @@ cdef shared_ptr[CBuffer] as_c_buffer(object o) except *:
 
 
 cdef NativeFile get_native_file(object source, c_bool use_memory_map):
+    if source is None:

Review comment:
       Again, I'm not sure what the concern is. Is it common to pass `None` instead of a file object? It's not an optional parameter, so I don't know where the `None` could come from.
   
   (also, getting this kind of `AttributeError` when passing the wrong type is quite common in Python)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#issuecomment-820594065


   https://issues.apache.org/jira/browse/ARROW-10910


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] amol- commented on a change in pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#discussion_r614775487



##########
File path: python/pyarrow/io.pxi
##########
@@ -1449,6 +1449,10 @@ cdef shared_ptr[CBuffer] as_c_buffer(object o) except *:
 
 
 cdef NativeFile get_native_file(object source, c_bool use_memory_map):
+    if source is None:

Review comment:
       Yes, I can confirm that `not None` is not allowed in `cdef`  functions:
   
   ```
   pyarrow/io.pxi:1451:32: 'not None' only allowed in Python functions
   ```
   
   I think that the main value probably comes from having a test to cover for a past `segfault`, the better error message is a nice to have but not strictly something that values the extra code. 
   
   If we want to go deeper we might want to investigate where the segfault was originally coming from (as my feeling is that we have just hidden it behind other python changes) but it doesn't seem to value the extra time at the moment if it's already solved.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#discussion_r614260138



##########
File path: python/pyarrow/io.pxi
##########
@@ -1449,6 +1449,10 @@ cdef shared_ptr[CBuffer] as_c_buffer(object o) except *:
 
 
 cdef NativeFile get_native_file(object source, c_bool use_memory_map):
+    if source is None:

Review comment:
       I'm not sure this is useful. Why would you pass `None` in the first place?
   Not crashing is nice, but I don't think we want to maintain a specific error message for this case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] amol- commented on a change in pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
amol- commented on a change in pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#discussion_r614775487



##########
File path: python/pyarrow/io.pxi
##########
@@ -1449,6 +1449,10 @@ cdef shared_ptr[CBuffer] as_c_buffer(object o) except *:
 
 
 cdef NativeFile get_native_file(object source, c_bool use_memory_map):
+    if source is None:

Review comment:
       Yes, I can confirm that `not None` is not allowed in `cdef`  functions:
   
   ```
   pyarrow/io.pxi:1451:32: 'not None' only allowed in Python functions
   ```
   
   I think that the main value probably comes from having a test to cover for a past `segfault`, the better error message is a nice to have but not strictly something that values the extra code. I wonder if we should not just keep the test without any error message customization.
   
   If we want to go deeper we might want to investigate where the segfault was originally coming from (as my feeling is that we have just hidden it behind other python changes) but it doesn't seem to value the extra time at the moment if it's already solved.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] amol- commented on pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
amol- commented on pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#issuecomment-826913704


   @jorisvandenbossche @pitrou should I delete this one? If we don't think it values doing it there is little benefit in keeping the PR around


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#discussion_r615808906



##########
File path: python/pyarrow/io.pxi
##########
@@ -1449,6 +1449,10 @@ cdef shared_ptr[CBuffer] as_c_buffer(object o) except *:
 
 
 cdef NativeFile get_native_file(object source, c_bool use_memory_map):
+    if source is None:

Review comment:
       @amol- another solution could be to add `not None` in places where `get_native_file` eventually gets called, to get an error higher up. Eg `ParquetReader.open` or `ORCReader.open` could have `object source not None`, since those are `def` methods.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #10054:
URL: https://github.com/apache/arrow/pull/10054


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10054: ARROW-10910: [Python] Provide better error message when trying to read from None source

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #10054:
URL: https://github.com/apache/arrow/pull/10054#discussion_r614293940



##########
File path: python/pyarrow/io.pxi
##########
@@ -1449,6 +1449,10 @@ cdef shared_ptr[CBuffer] as_c_buffer(object o) except *:
 
 
 cdef NativeFile get_native_file(object source, c_bool use_memory_map):
+    if source is None:

Review comment:
       I suggested to add `object source not None` in the signature (that would automatically give a better error message), but so apparently that only works for `def` functions and not for `cdef` functions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org