
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/26 18:48:31 UTC

[GitHub] [arrow] westonpace commented on a change in pull request #11008: ARROW-13755: [Python] Allow writing datasets using a partitioning that only specifies field_names

westonpace commented on a change in pull request #11008:
URL: https://github.com/apache/arrow/pull/11008#discussion_r696893157



##########
File path: python/pyarrow/dataset.py
##########
@@ -678,13 +678,24 @@ def dataset(source, schema=None, format=None, filesystem=None,
         )
 
 
-def _ensure_write_partitioning(scheme):
-    if scheme is None:
-        scheme = partitioning(pa.schema([]))
-    if not isinstance(scheme, Partitioning):
-        # TODO support passing field names, and get types from schema
+def _ensure_write_partitioning(part, schema):
+    if isinstance(part, (tuple, list)):
+        # Name of fields were provided instead of a partitioning object.
+        # Create a partitioning factory with those field names.
+        part = partitioning(field_names=part)
+
+    if part is None:
+        part = partitioning(pa.schema([]))
+    elif isinstance(part, PartitioningFactory):
+        # If a schema is provided, combine the factory with the schema
+        # to build a real Partitioning object that the Writer can accept.
+        if schema is not None:
+            part = part.create_with_schema(schema)

Review comment:
       Perhaps raise a more specific error in an else branch when the schema is None. Something like "PartitioningFactory provided but no schema; a schema must be provided."
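
A rough sketch of what that else branch might look like. To keep it runnable without pyarrow, `PartitioningFactory` here is a hypothetical stand-in class and the schema is a plain dict; only the `create_with_schema` call and the error message come from the diff and the comment above. The function name `ensure_write_partitioning` is likewise illustrative, not the final code.

```python
class PartitioningFactory:
    """Hypothetical stand-in for pyarrow.dataset.PartitioningFactory."""

    def __init__(self, field_names):
        self.field_names = list(field_names)

    def create_with_schema(self, schema):
        # Mirrors the call used in the diff. In this sketch the "schema"
        # is just a dict mapping field name -> type name.
        return {name: schema[name] for name in self.field_names}


def ensure_write_partitioning(part, schema):
    if isinstance(part, PartitioningFactory):
        if schema is not None:
            # Combine the factory with the schema to get a concrete
            # partitioning.
            part = part.create_with_schema(schema)
        else:
            # Fail early with a message that names the actual problem,
            # rather than letting the factory reach the writer.
            raise ValueError(
                "PartitioningFactory provided but no schema; "
                "a schema must be provided."
            )
    return part
```

With this shape, a caller who passes only field names and no schema gets a `ValueError` at the point where the factory is detected, instead of a less specific failure later on.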



