Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/02/21 08:22:28 UTC
[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #33785: GH-33741: [Python] Address docstrings in Data Types Factory Functions

jorisvandenbossche commented on code in PR #33785:
URL: https://github.com/apache/arrow/pull/33785#discussion_r1112696114


##########
python/pyarrow/types.pxi:
##########
@@ -2509,6 +2724,16 @@ def timestamp(unit, tz=None):
     >>> pa.timestamp('s', tz='+07:30')
     TimestampType(timestamp[s, tz=+07:30])
 
+    Use timestamp type when creating a scalar object:
+
+    >>> from datetime import datetime
+    >>> pa.scalar(datetime(2012, 1, 1),
+    ...           type=pa.timestamp('s', tz='+07:30'))

Review Comment:
   ```suggestion
       >>> pa.scalar(datetime(2012, 1, 1), type=pa.timestamp('s', tz='+07:30'))
   ```
   
   (Doesn't this fit on a single line within the PEP 8 line-length limit?)



##########
python/pyarrow/types.pxi:
##########
@@ -2664,41 +2901,145 @@ def month_day_nano_interval():
     """
     Create instance of an interval type representing months, days and
     nanoseconds between two dates.
+
+    Examples
+    --------
+    Create an instance of an month_day_nano_interval type:
+
+    >>> import pyarrow as pa
+    >>> pa.month_day_nano_interval()
+    DataType(month_day_nano_interval)
+
+    Create a scalar with month_day_nano_interval type:
+
+    >>> pa.scalar((1, 15, -30), type=pa.month_day_nano_interval())
+    <pyarrow.MonthDayNanoIntervalScalar: MonthDayNano(months=1, days=15, nanoseconds=-30)>
     """
     return primitive_type(_Type_INTERVAL_MONTH_DAY_NANO)
 
 
 def date32():
     """
     Create instance of 32-bit date (days since UNIX epoch 1970-01-01).
+
+    Examples
+    --------
+    Create an instance of 32-bit date type:
+
+    >>> import pyarrow as pa
+    >>> pa.date32()
+    DataType(date32[day])
+
+    Create a scalar with 32-bit date type:
+
+    >>> from datetime import datetime
+    >>> pa.scalar(datetime(2012, 1, 1), type=pa.date32())

Review Comment:
   Do we want to show creating it from `datetime.date` instead? (or in addition)



##########
python/pyarrow/types.pxi:
##########
@@ -2509,6 +2724,16 @@ def timestamp(unit, tz=None):
     >>> pa.timestamp('s', tz='+07:30')
     TimestampType(timestamp[s, tz=+07:30])
 
+    Use timestamp type when creating a scalar object:
+
+    >>> from datetime import datetime
+    >>> pa.scalar(datetime(2012, 1, 1),
+    ...           type=pa.timestamp('s', tz='+07:30'))
+    <pyarrow.TimestampScalar: datetime.datetime(2012, 1, 1, 7, 30, tzinfo=pytz.FixedOffset(450))>
+    >>> pa.scalar(datetime(2012, 1, 1),
+    ...           type=pa.timestamp('us'))

Review Comment:
   ```suggestion
       >>> pa.scalar(datetime(2012, 1, 1), type=pa.timestamp('us'))
   ```



##########
python/pyarrow/types.pxi:
##########
@@ -2509,6 +2724,16 @@ def timestamp(unit, tz=None):
     >>> pa.timestamp('s', tz='+07:30')
     TimestampType(timestamp[s, tz=+07:30])
 
+    Use timestamp type when creating a scalar object:
+
+    >>> from datetime import datetime
+    >>> pa.scalar(datetime(2012, 1, 1),
+    ...           type=pa.timestamp('s', tz='+07:30'))

Review Comment:
   I would also consider showing a timezone name instead of a fixed offset (in general, I think there are few reasons to use a fixed offset). Maybe "UTC", to show that this is also an option.



##########
python/pyarrow/types.pxi:
##########
@@ -2914,6 +3454,34 @@ cpdef MapType map_(key_type, item_type, keys_sorted=False):
     Returns
     -------
     map_type : DataType
+
+    Examples
+    --------
+    Create an instance of MapType:
+
+    >>> import pyarrow as pa
+    >>> pa.map_(pa.string(), pa.int32())
+    MapType(map<string, int32>)
+    >>> pa.map_(pa.string(), pa.int32(), keys_sorted=True)
+    MapType(map<string, int32, keys_sorted>)
+
+    Use MapType to create an array:
+
+    >>> data = [[{'key': 'a', 'value': 1}, {'key': 'b', 'value': 2}]]

Review Comment:
   ```suggestion
       >>> data = [[{'key': 'a', 'value': 1}, {'key': 'b', 'value': 2}], [{'key': 'c', 'value': 3}]]
   ```
   
   By adding a second element to the array, it might be clearer that this is actually like a nested list (where the first element of the array has two key/value mappings, and the second element has one). Currently, someone not familiar with the repr might interpret it as an array of length 2.



##########
python/pyarrow/types.pxi:
##########
@@ -2810,6 +3237,26 @@ def large_binary():
 
     This data type may not be supported by all Arrow implementations.  Unless
     you need to represent data larger than 2GB, you should prefer binary().
+
+    Examples
+    --------
+    Create an instance of large variable-length binary type:
+
+    >>> import pyarrow as pa
+    >>> pa.large_binary()
+    DataType(large_binary)
+
+    and use the type to create an array:
+
+    >>> pa.array(['foo', 'bar'] * 50, type=pa.large_binary())

Review Comment:
   I don't know if the `* 50` is needed here. I would maybe use the same `['foo', 'bar', 'baz']` as you used above for the normal binary type. The `* 50` might indicate that it is "larger", but at the same time it is still tiny data, so I am not sure that helps.


