Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/03 13:12:09 UTC

[GitHub] [arrow] pitrou commented on a change in pull request #12543: ARROW-15432: [Python] Address CSV docstrings

pitrou commented on a change in pull request #12543:
URL: https://github.com/apache/arrow/pull/12543#discussion_r818633656



##########
File path: python/pyarrow/_csv.pyx
##########
@@ -178,6 +183,33 @@ cdef class ReadOptions(_Weakrefable):
         The number of rows to skip before the column names (if any)
         and the CSV data.
         See `skip_rows_after_names` for interaction description
+
+        Examples:
+        ---------
+        >>> from pyarrow import csv

Review comment:
       I'm not sure it's worth repeating the import at the top of each example. If we do keep it, though, you should also add one at the top of the `use_threads` example.
   
   @jorisvandenbossche Thoughts?
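
For context on the trade-off, here is a minimal sketch using only the standard-library `doctest` module. It assumes the docstring examples are executed docstring by docstring, which this thread does not confirm for pyarrow's documentation build. The functions `with_import`, `without_import` and `failed_examples` are hypothetical names used for illustration. Each docstring gets its own copy of the globals, so an import made in one docstring is not visible in another:

```python
import doctest


def with_import():
    """
    >>> import math
    >>> math.floor(2.5)
    2
    """


def without_import():
    """
    >>> math.floor(2.5)
    2
    """


def failed_examples(func):
    """Run the doctests in func's docstring and return the failure count."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    # Start from empty globals so nothing leaks in from this module.
    for test in finder.find(func, name=func.__name__, globs={}):
        # Silence the failure report; only the count matters here.
        failed += runner.run(test, out=lambda text: None).failed
    return failed


failures = {f.__name__: failed_examples(f) for f in (with_import, without_import)}
print(failures)
```

Under that assumption, a docstring that omits the import only works if the test setup injects `csv` into the namespace some other way.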

##########
File path: python/pyarrow/_csv.pyx
##########
@@ -340,6 +465,22 @@ cdef class ParseOptions(_Weakrefable):
         """
         The character used optionally for quoting CSV values
         (False if quoting is not allowed).
+
+        Examples:
+        ---------
+
+        >>> from pyarrow import csv
+
+        >>> parse_options = csv.ParseOptions(quote_char=",")
+        >>> csv.read_csv("animals.csv", parse_options=parse_options)
+        pyarrow.Table
+        "animals": string
+        "n_legs": int64
+        "entry": string
+        ----
+        "animals": [[""Flamingo"",""Horse"",""Brittle stars"",""Centipede""]]

Review comment:
       Similar question here, and the result will probably confuse the user.

##########
File path: python/pyarrow/_csv.pyx
##########
@@ -223,6 +296,34 @@ cdef class ReadOptions(_Weakrefable):
         - `skip_rows` is applied (if non-zero);
         - column names aread (unless `column_names` is set);
         - `skip_rows_after_names` is applied (if non-zero).
+
+        Examples:
+        ---------
+
+        >>> from pyarrow import csv
+
+        >>> read_options = csv.ReadOptions(skip_rows_after_names=1)
+        >>> csv.read_csv("animals.csv", read_options=read_options)
+        pyarrow.Table
+        animals: string
+        n_legs: int64
+        entry: string
+        ----
+        animals: [["Horse","Brittle stars","Centipede"]]
+        n_legs: [[4,5,100]]
+        entry: [["02/03/2022","03/03/2022","04/03/2022"]]

Review comment:
       Sidenote: if the dates were in ISO format (e.g. "2022-03-02"), they would be inferred neatly as date32.

##########
File path: python/pyarrow/_csv.pyx
##########
@@ -328,6 +429,30 @@ cdef class ParseOptions(_Weakrefable):
     def delimiter(self):
         """
         The character delimiting individual cells in the CSV data.
+
+        Examples:
+        ---------
+
+        >>> from pyarrow import csv
+
+        >>> parse_options = csv.ParseOptions(delimiter=";")
+        >>> csv.read_csv("animals.csv", parse_options=parse_options)
+        pyarrow.Table
+        animals,"n_legs","entry": string
+        ----
+        animals,"n_legs","entry": [["Flamingo,2,"01/03/2022"","Horse,4,"02/03/2022"",
+        "Brittle stars,5,"03/03/2022"","Centipede,100,"04/03/2022""]]

Review comment:
       I don't know... is it useful to show a CSV file being parsed with the wrong delimiter?

##########
File path: python/pyarrow/_csv.pyx
##########
@@ -190,6 +222,21 @@ cdef class ReadOptions(_Weakrefable):
         """
         The column names of the target table.  If empty, fall back on
         `autogenerate_column_names`.
+
+        Examples:
+        ---------
+        >>> from pyarrow import csv
+
+        >>> >>> read_options = csv.ReadOptions(column_names=["a", "n", "d"])

Review comment:
       ```suggestion
           >>> read_options = csv.ReadOptions(column_names=["a", "n", "d"])
   ```

##########
File path: python/pyarrow/_csv.pyx
##########
@@ -152,6 +152,11 @@ cdef class ReadOptions(_Weakrefable):
     def use_threads(self):
         """
         Whether to use multiple threads to accelerate reading.
+
+        Examples:
+        ---------

Review comment:
       I don't think numpydoc expects a trailing colon:
   ```suggestion
           Examples
           --------
   ```
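
For reference, a minimal sketch of a numpydoc-style docstring whose section titles are bare words underlined with hyphens, checked with the standard-library `doctest` module. The function `n_legs_total` is a hypothetical helper for illustration and is not part of pyarrow:

```python
import doctest


def n_legs_total(legs):
    """
    Sum a list of leg counts (hypothetical helper, for illustration only).

    Parameters
    ----------
    legs : list of int
        Number of legs per animal.

    Returns
    -------
    int

    Examples
    --------
    >>> n_legs_total([2, 4, 5, 100])
    111
    """
    return sum(legs)


finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
# The docstring example needs the function itself in its namespace.
tests = finder.find(n_legs_total, name="n_legs_total",
                    globs={"n_legs_total": n_legs_total})
result = runner.run(tests[0], out=lambda text: None)
print(result)
```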

##########
File path: python/pyarrow/_csv.pyx
##########
@@ -359,6 +500,11 @@ cdef class ParseOptions(_Weakrefable):
         """
         Whether two quotes in a quoted CSV value denote a single quote
         in the data.
+
+        Examples:
+        ---------
+        >>> parse_options = csv.ParseOptions(double_quote=False)
+        >>> csv.read_csv(input_file, parse_options=parse_options)

Review comment:
       I think we don't necessarily have to add an example if the example doesn't show anything interesting :-)



