You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/23 16:37:51 UTC

[GitHub] [arrow] pitrou opened a new pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

pitrou opened a new pull request #12698:
URL: https://github.com/apache/arrow/pull/12698


   The numpydoc-validation routine in Archery would skip over many Cython-generated methods and properties.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #12698:
URL: https://github.com/apache/arrow/pull/12698


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#issuecomment-1077583674


   @pitrou I don't quite understand this change: `inspect_signature` already supports cython generated embedded signatures but this PR seems to disable that. On the other hand now it properly retrieves the module of the method, so I assume `_get_module` does the actual fix?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#issuecomment-1076771965


   Hmm, I need to fix more docstrings before this is ready. My local testing was unfortunately incomplete.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r837719810



##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -816,48 +843,25 @@ cdef class Fragment(_Weakrefable):
             Schema to use for scanning. This is used to unify a Fragment to
             it's Dataset's schema. If not specified this will use the
             Fragment's physical schema which might differ for each Fragment.
-        columns : list of str, default None

Review comment:
       Ok, I opened https://issues.apache.org/jira/browse/ARROW-16063




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#issuecomment-1076698302


   https://issues.apache.org/jira/browse/ARROW-15321


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r834258200



##########
File path: dev/archery/archery/lang/python.py
##########
@@ -139,10 +143,19 @@ def traverse(self, fn, obj, from_package):
                     continue
 
                 member = getattr(obj, name)
-                module = getattr(member, '__module__', None)
-                if not (module and module.startswith(from_package)):
+                module = _get_module(member)
+                if module is None or not module.startswith(from_package):
                     continue
-
+                # Is it a Cython-generated method? If so, try to detect
+                # whether it has a implicitly-generated docstring due to
+                # the Cython `embedsignature` directive; and if so, skip

Review comment:
       `inspect_signature` supports cython's signature, so we shouldn't skip those symbols.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#issuecomment-1080454647


   @amol- @jorisvandenbossche Could you give this a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#issuecomment-1080904780


   CI passed on my fork:
   https://github.com/pitrou/arrow/actions/runs/2053167401
   https://github.com/pitrou/arrow/actions/runs/2053167403
   https://github.com/pitrou/arrow/actions/runs/2053167405


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#issuecomment-1080916898


   Benchmark runs are scheduled for baseline = 3eb5673597bf67246271b6c9a98e6f812d4e01a7 and contender = 6ab947b19246e137e3ef9f4478c7a15d363dcfd3. 6ab947b19246e137e3ef9f4478c7a15d363dcfd3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/545d323a996d444793d18283379176ed...f96b63ac1da5401995db3ce29cd467ce/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/b12320a2aacf47928ec7ce1424347f67...8f7bc2e876b644099bda95fcb83bff30/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/5245005e4a5346c78955ac5295410972...526f0a8411da4646a2dac7735622a1a2/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/028bc87a866a4c40852037bfeff8ab50...e8c8389d329e4948857be6ee517cc3e1/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r836538465



##########
File path: python/pyarrow/array.pxi
##########
@@ -849,19 +883,48 @@ cdef class Array(_PandasConvertible):
     def sum(self, **kwargs):
         """
         Sum the values in a numerical array.
+
+        See `pyarrow.compute.sum` for full usage.

Review comment:
       Good point, I'll do this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r834253612



##########
File path: .github/workflows/docs_light.yml
##########
@@ -21,7 +21,7 @@ on:
   pull_request:
     paths:
       - 'docs/**'
-      - '.github/workflows/docs.yml'
+      - '.github/workflows/docs_light.yml'

Review comment:
       Nice catch!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r836329892



##########
File path: python/pyarrow/_compute.pyx
##########
@@ -2127,21 +2138,76 @@ cdef class Expression(_Weakrefable):
         return Expression._call("divide_checked", [self, other])
 
     def is_valid(self):
-        """Checks whether the expression is not-null (valid)"""
+        """
+        Check whether the expression is not-null (valid).
+
+        This creates a new expression equivalent to calling the
+        `is_valid` compute function on this expression.
+
+        Returns
+        -------
+        is_valid : Expression

Review comment:
       Not that it needs to be changed for this PR, but just a general future note: I think putting only "Expression" (i.e. the type) on this line is valid as well for numpydoc, and personally I find the repeating of the name "is_valid: " not giving any added value.

##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -816,48 +843,25 @@ cdef class Fragment(_Weakrefable):
             Schema to use for scanning. This is used to unify a Fragment to
             it's Dataset's schema. If not specified this will use the
             Fragment's physical schema which might differ for each Fragment.
-        columns : list of str, default None

Review comment:
       For a separate JIRA: it might be nice to actually share this part of the docstring in both places (so users don't have to check another object's docstring to see the help), but for now it's good to deduplicate this and have a single up-to-date version.

##########
File path: python/pyarrow/table.pxi
##########
@@ -1966,9 +2085,12 @@ cdef class Table(_PandasConvertible):
 
         return pyarrow_wrap_table(c_table)
 
-    def to_batches(self, max_chunksize=None, **kwargs):
+    def to_batches(self, max_chunksize=None):
         """
-        Convert Table to list of (contiguous) RecordBatch objects.
+        Convert Table to a list of RecordBatch objects.
+
+        Note that this method is zero-copy, it merely exposes the same data
+        under a different API.

Review comment:
       This is not necessarily true, as the chunking of each column does not have to be the same? (in which case you will get some copies?)

##########
File path: python/pyarrow/_compute.pyx
##########
@@ -2127,21 +2138,76 @@ cdef class Expression(_Weakrefable):
         return Expression._call("divide_checked", [self, other])
 
     def is_valid(self):
-        """Checks whether the expression is not-null (valid)"""
+        """
+        Check whether the expression is not-null (valid).
+
+        This creates a new expression equivalent to calling the
+        `is_valid` compute function on this expression.
+
+        Returns
+        -------
+        is_valid : Expression

Review comment:
       Or did numpydoc complain about this? (seeing you changes this also in some existing docstrings)

##########
File path: python/pyarrow/table.pxi
##########
@@ -1966,9 +2085,12 @@ cdef class Table(_PandasConvertible):
 
         return pyarrow_wrap_table(c_table)
 
-    def to_batches(self, max_chunksize=None, **kwargs):
+    def to_batches(self, max_chunksize=None):
         """
-        Convert Table to list of (contiguous) RecordBatch objects.
+        Convert Table to a list of RecordBatch objects.
+
+        Note that this method is zero-copy, it merely exposes the same data
+        under a different API.

Review comment:
       Actually, I suppose we still only _slice_ data in that case, so still being zero-copy?

##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -700,17 +700,43 @@ cdef class FileFormat(_Weakrefable):
         return self.wrapped
 
     def inspect(self, file, filesystem=None):
-        """Infer the schema of a file."""
+        """
+        Infer the schema of a file.
+
+        Parameters
+        ----------
+        file : file-like object, path-like or str
+            The file or file path to infer a schema from.
+        filesystem : Filesystem, optional
+            If `filesystem` is given, `file` must be a string and specifies
+            the path of the file to read from the filesystem.
+
+        Returns
+        -------
+        schema : Schema
+            The schema inferred from the file
+        """
         c_source = _make_file_source(file, filesystem)
         c_schema = GetResultValue(self.format.Inspect(c_source))
         return pyarrow_wrap_schema(move(c_schema))
 
     def make_fragment(self, file, filesystem=None,
                       Expression partition_expression=None):
         """
-        Make a FileFragment of this FileFormat. The filter may not reference
-        fields absent from the provided schema. If no schema is provided then
-        one will be inferred.
+        Make a FileFragment from a given file.
+
+        The filter may not reference fields absent from the provided schema.
+        If no schema is provided then one will be inferred.

Review comment:
       This was already in the original docstring, but the "if not schema is provided" seems wrong, given there is no way to provide a schema. 

##########
File path: python/pyarrow/io.pxi
##########
@@ -1257,6 +1283,10 @@ cdef class BufferReader(NativeFile):
     cdef:
         Buffer buffer
 
+    # XXX Needed to make numpydoc happy
+    def __init__(self, obj):

Review comment:
       Do you remember why this was needed? (which error did it give?)
   
   Aside, we should probably try to instruct sphinx/autodoc/numpydoc to not include `__init__` functions in the reference docs, as I don't think we have any case where this adds value (compared to the class docstring). See eg https://arrow.apache.org/docs/dev/python/generated/pyarrow.parquet.ParquetDataset.html#pyarrow.parquet.ParquetDataset.__init__

##########
File path: python/pyarrow/array.pxi
##########
@@ -849,19 +883,48 @@ cdef class Array(_PandasConvertible):
     def sum(self, **kwargs):
         """
         Sum the values in a numerical array.
+
+        See `pyarrow.compute.sum` for full usage.

Review comment:
       We could make this:
   
   ```suggestion
           See :func:`pyarrow.compute.sum` for full usage.
   ```
   
   which would give a proper link in the html version (and not too much visual noise in the text version)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r836541612



##########
File path: python/pyarrow/table.pxi
##########
@@ -1966,9 +2085,12 @@ cdef class Table(_PandasConvertible):
 
         return pyarrow_wrap_table(c_table)
 
-    def to_batches(self, max_chunksize=None, **kwargs):
+    def to_batches(self, max_chunksize=None):
         """
-        Convert Table to list of (contiguous) RecordBatch objects.
+        Convert Table to a list of RecordBatch objects.
+
+        Note that this method is zero-copy, it merely exposes the same data
+        under a different API.

Review comment:
       Yes, it's still zero-copy thanks to slicing.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#issuecomment-1077471308


   @github-actions crossbow submit -g python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r834271751



##########
File path: dev/archery/archery/lang/python.py
##########
@@ -139,10 +143,19 @@ def traverse(self, fn, obj, from_package):
                     continue
 
                 member = getattr(obj, name)
-                module = getattr(member, '__module__', None)
-                if not (module and module.startswith(from_package)):
+                module = _get_module(member)
+                if module is None or not module.startswith(from_package):
                     continue
-
+                # Is it a Cython-generated method? If so, try to detect
+                # whether it has a implicitly-generated docstring due to
+                # the Cython `embedsignature` directive; and if so, skip

Review comment:
       The problem is not the signature, it's that the generated docstring doesn't have parameter description, so this would fail the numpydoc rule PR01.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r834283962



##########
File path: dev/archery/archery/lang/python.py
##########
@@ -139,10 +143,19 @@ def traverse(self, fn, obj, from_package):
                     continue
 
                 member = getattr(obj, name)
-                module = getattr(member, '__module__', None)
-                if not (module and module.startswith(from_package)):
+                module = _get_module(member)
+                if module is None or not module.startswith(from_package):
                     continue
-
+                # Is it a Cython-generated method? If so, try to detect
+                # whether it has a implicitly-generated docstring due to
+                # the Cython `embedsignature` directive; and if so, skip

Review comment:
       Ohh I think the comment is a bit misleading. So the condition below only skips if there is a signature but no docstrings, right? So for example it still validates the following:
   
   ```python
   In [9]: print(ds.Dataset.get_fragments.__doc__)
   Dataset.get_fragments(self, Expression filter=None)
   Returns an iterator over the fragments in this dataset.
   
           Parameters
           ----------
           filter : Expression, default None
               Return fragments matching the optional filter, either using the
               partition_expression or internal information like Parquet's
               statistics.
   
           Returns
           -------
           fragments : iterator of Fragment
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r837720972



##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -816,48 +843,25 @@ cdef class Fragment(_Weakrefable):
             Schema to use for scanning. This is used to unify a Fragment to
             it's Dataset's schema. If not specified this will use the
             Fragment's physical schema which might differ for each Fragment.
-        columns : list of str, default None

Review comment:
       I also asked about this here: https://github.com/apache/arrow/pull/12560#discussion_r825877794 (I'll put this on Jira)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r836535154



##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -700,17 +700,43 @@ cdef class FileFormat(_Weakrefable):
         return self.wrapped
 
     def inspect(self, file, filesystem=None):
-        """Infer the schema of a file."""
+        """
+        Infer the schema of a file.
+
+        Parameters
+        ----------
+        file : file-like object, path-like or str
+            The file or file path to infer a schema from.
+        filesystem : Filesystem, optional
+            If `filesystem` is given, `file` must be a string and specifies
+            the path of the file to read from the filesystem.
+
+        Returns
+        -------
+        schema : Schema
+            The schema inferred from the file
+        """
         c_source = _make_file_source(file, filesystem)
         c_schema = GetResultValue(self.format.Inspect(c_source))
         return pyarrow_wrap_schema(move(c_schema))
 
     def make_fragment(self, file, filesystem=None,
                       Expression partition_expression=None):
         """
-        Make a FileFragment of this FileFormat. The filter may not reference
-        fields absent from the provided schema. If no schema is provided then
-        one will be inferred.
+        Make a FileFragment from a given file.
+
+        The filter may not reference fields absent from the provided schema.
+        If no schema is provided then one will be inferred.

Review comment:
       Will change indeed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r836534470



##########
File path: python/pyarrow/_compute.pyx
##########
@@ -2127,21 +2138,76 @@ cdef class Expression(_Weakrefable):
         return Expression._call("divide_checked", [self, other])
 
     def is_valid(self):
-        """Checks whether the expression is not-null (valid)"""
+        """
+        Check whether the expression is not-null (valid).
+
+        This creates a new expression equivalent to calling the
+        `is_valid` compute function on this expression.
+
+        Returns
+        -------
+        is_valid : Expression

Review comment:
       No, it seems to pass actually. Given that other docstrings used this convention, I just thought it was preferred.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#issuecomment-1077504497


   Revision: 05919a88a00341e8b7c02cb6b0b936b470d1db54
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-1790](https://github.com/ursacomputing/crossbow/branches/all?query=actions-1790)
   
   |Task|Status|
   |----|------|
   |test-conda-python-3.10|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.10)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.10)|
   |test-conda-python-3.7|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.7)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.7)|
   |test-conda-python-3.7-hdfs-2.9.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.7-hdfs-2.9.2)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.7-hdfs-2.9.2)|
   |test-conda-python-3.7-hdfs-3.2.1|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.7-hdfs-3.2.1)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.7-hdfs-3.2.1)|
   |test-conda-python-3.7-kartothek-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.7-kartothek-latest)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.7-kartothek-latest)|
   |test-conda-python-3.7-kartothek-master|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.7-kartothek-master)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.7-kartothek-master)|
   |test-conda-python-3.7-pandas-0.24|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.7-pandas-0.24)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.7-pandas-0.24)|
   |test-conda-python-3.7-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.7-pandas-latest)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.7-pandas-latest)|
   |test-conda-python-3.7-spark-v3.1.2|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.7-spark-v3.1.2)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.7-spark-v3.1.2)|
   |test-conda-python-3.7-turbodbc-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.7-turbodbc-latest)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.7-turbodbc-latest)|
   |test-conda-python-3.7-turbodbc-master|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.7-turbodbc-master)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.7-turbodbc-master)|
   |test-conda-python-3.8|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.8)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.8)|
   |test-conda-python-3.8-hypothesis|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.8-hypothesis)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.8-hypothesis)|
   |test-conda-python-3.8-pandas-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.8-pandas-latest)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.8-pandas-latest)|
   |test-conda-python-3.8-pandas-nightly|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.8-pandas-nightly)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.8-pandas-nightly)|
   |test-conda-python-3.8-spark-v3.2.0|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.8-spark-v3.2.0)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.8-spark-v3.2.0)|
   |test-conda-python-3.9|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.9)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.9)|
   |test-conda-python-3.9-dask-latest|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.9-dask-latest)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.9-dask-latest)|
   |test-conda-python-3.9-dask-master|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.9-dask-master)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.9-dask-master)|
   |test-conda-python-3.9-pandas-master|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.9-pandas-master)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.9-pandas-master)|
   |test-conda-python-3.9-spark-master|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-1790-github-test-conda-python-3.9-spark-master)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-1790-github-test-conda-python-3.9-spark-master)|
   |test-debian-11-python-3|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-1790-azure-test-debian-11-python-3)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-1790-azure-test-debian-11-python-3)|
   |test-fedora-35-python-3|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-1790-azure-test-fedora-35-python-3)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-1790-azure-test-fedora-35-python-3)|
   |test-ubuntu-20.04-python-3|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-1790-azure-test-ubuntu-20.04-python-3)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-1790-azure-test-ubuntu-20.04-python-3)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r834255172



##########
File path: dev/archery/archery/cli.py
##########
@@ -303,15 +303,22 @@ def lint(ctx, src, fix, iwyu_all, **checks):
         sys.exit(1)
 
 
+def _flatten_numpydoc_rules(rules):
+    flattened = []
+    for rule in rules:
+        flattened.extend(filter(None, rule.split(',')))
+    return flattened
+
+
 @archery.command(short_help="Lint python docstring with NumpyDoc")
 @click.argument('symbols', nargs=-1)
 @click.option("--src", metavar="<arrow_src>", default=None,
               callback=validate_arrow_sources,
               help="Specify Arrow source directory")
 @click.option("--allow-rule", "-a", multiple=True,
-              help="Allow only these rules")
+              help="Allow only these rules (can be comma-separated)")

Review comment:
       Nice!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r833608420



##########
File path: .github/workflows/docs_light.yml
##########
@@ -52,8 +52,8 @@ jobs:
         uses: actions/cache@v2
         with:
           path: .docker
-          key: ubuntu-docs-${{ hashFiles('cpp/**') }}
-          restore-keys: ubuntu-docs-
+          key: conda-docs-${{ hashFiles('docs/**') }}
+          restore-keys: conda-docs-

Review comment:
       Hmm, this change might partly be a mistake. Perhaps we want to compute the hashes on the C++ files?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#issuecomment-1080916898


   Benchmark runs are scheduled for baseline = 3eb5673597bf67246271b6c9a98e6f812d4e01a7 and contender = 6ab947b19246e137e3ef9f4478c7a15d363dcfd3. 6ab947b19246e137e3ef9f4478c7a15d363dcfd3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/545d323a996d444793d18283379176ed...f96b63ac1da5401995db3ce29cd467ce/)
   [Finished :arrow_down:0.17% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/b12320a2aacf47928ec7ce1424347f67...8f7bc2e876b644099bda95fcb83bff30/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/5245005e4a5346c78955ac5295410972...526f0a8411da4646a2dac7735622a1a2/)
   [Finished :arrow_down:3.11% :arrow_up:13.36%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/028bc87a866a4c40852037bfeff8ab50...e8c8389d329e4948857be6ee517cc3e1/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#issuecomment-1080916898


   Benchmark runs are scheduled for baseline = 3eb5673597bf67246271b6c9a98e6f812d4e01a7 and contender = 6ab947b19246e137e3ef9f4478c7a15d363dcfd3. 6ab947b19246e137e3ef9f4478c7a15d363dcfd3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/545d323a996d444793d18283379176ed...f96b63ac1da5401995db3ce29cd467ce/)
   [Finished :arrow_down:0.17% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/b12320a2aacf47928ec7ce1424347f67...8f7bc2e876b644099bda95fcb83bff30/)
   [Failed :arrow_down:1.07% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/5245005e4a5346c78955ac5295410972...526f0a8411da4646a2dac7735622a1a2/)
   [Finished :arrow_down:3.11% :arrow_up:13.36%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/028bc87a866a4c40852037bfeff8ab50...e8c8389d329e4948857be6ee517cc3e1/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#issuecomment-1077598894


   @kszucs As answered in the comments: the problem is not the signature, it's that the Cython-generated docstring doesn't have parameter description, so this would fail the numpydoc rule PR01.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r834306523



##########
File path: dev/archery/archery/lang/python.py
##########
@@ -139,10 +143,19 @@ def traverse(self, fn, obj, from_package):
                     continue
 
                 member = getattr(obj, name)
-                module = getattr(member, '__module__', None)
-                if not (module and module.startswith(from_package)):
+                module = _get_module(member)
+                if module is None or not module.startswith(from_package):
                     continue
-
+                # Is it a Cython-generated method? If so, try to detect
+                # whether it has a implicitly-generated docstring due to
+                # the Cython `embedsignature` directive; and if so, skip

Review comment:
       Yes, exactly. I could improve the comment.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#issuecomment-1080916898


   Benchmark runs are scheduled for baseline = 3eb5673597bf67246271b6c9a98e6f812d4e01a7 and contender = 6ab947b19246e137e3ef9f4478c7a15d363dcfd3. 6ab947b19246e137e3ef9f4478c7a15d363dcfd3 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/545d323a996d444793d18283379176ed...f96b63ac1da5401995db3ce29cd467ce/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/b12320a2aacf47928ec7ce1424347f67...8f7bc2e876b644099bda95fcb83bff30/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/5245005e4a5346c78955ac5295410972...526f0a8411da4646a2dac7735622a1a2/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/028bc87a866a4c40852037bfeff8ab50...e8c8389d329e4948857be6ee517cc3e1/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12698: ARROW-15321: [Dev][Python] Also numpydoc-validate Cython-generated methods

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12698:
URL: https://github.com/apache/arrow/pull/12698#discussion_r836541028



##########
File path: python/pyarrow/io.pxi
##########
@@ -1257,6 +1283,10 @@ cdef class BufferReader(NativeFile):
     cdef:
         Buffer buffer
 
+    # XXX Needed to make numpydoc happy
+    def __init__(self, obj):

Review comment:
       The problem is that numpydoc tries to match the parameters documented in the `BufferReader` docstring with its `__init__` signature. To be honest, I'm not sure why some classes prefer to define `__cinit__` rather than `__init__`, there doesn't seem to be a consistent convention across the codebase.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org