Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/04 14:47:08 UTC

[GitHub] [arrow] sanjibansg commented on a diff in pull request #12530: ARROW-14612: [C++] Support for filename-based partitioning

sanjibansg commented on code in PR #12530:
URL: https://github.com/apache/arrow/pull/12530#discussion_r841825091


##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -569,6 +570,22 @@ def test_partitioning():
         with pytest.raises(pa.ArrowInvalid):
             partitioning.parse(shouldfail)
 
+    partitioning = ds.FilenamePartitioning(
+        pa.schema([
+            pa.field('group', pa.int64()),
+            pa.field('key', pa.float64())
+        ])
+    )
+    assert partitioning.dictionaries is None

Review Comment:
   To test the dictionaries, I was trying to do something like the example in the docs:
   https://arrow.apache.org/docs/python/generated/pyarrow.dataset.partitioning.html?highlight=partitioning#pyarrow.dataset.partitioning
   A similar test now exists here:
   https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_dataset.py#L618
   
   I noticed that with the previous implementation, reading `dictionaries` on the Partitioning object returned `None` in spite of the schema having some dictionary fields.
   
   So I changed the implementation to what we have now, where the dictionary fields are returned correctly.
   We can, however, still return `None` when no dictionary fields are present. That only needs a check on the `res` list here,
   https://github.com/apache/arrow/blob/master/python/pyarrow/_dataset.pyx#L1359
   returning `None` if it is empty.
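
A minimal sketch of the proposed check, in plain Python rather than the actual Cython property. The helper name `dictionaries_or_none` is hypothetical, and `res` stands in for the list that the `dictionaries` property builds; what that list contains in `_dataset.pyx` is not shown here, so treat this as an illustration only.

```python
from typing import Any, List, Optional


def dictionaries_or_none(res: List[Any]) -> Optional[List[Any]]:
    """Return `res` unchanged, or None when it has no entries.

    Hypothetical helper: `res` represents the list of per-field
    dictionaries collected by the `dictionaries` property.
    """
    # An empty list means no dictionary fields were collected,
    # so fall back to the earlier behaviour of returning None.
    if not res:
        return None
    return res
```

With a check like this, the `assert partitioning.dictionaries is None` line in the test above would presumably still hold for a schema without dictionary fields, provided `res` is empty in that case.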
   
   


