Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/16 18:28:47 UTC

[GitHub] [arrow] wjones127 commented on a diff in pull request #14646: ARROW-18269: [C++] Slash character in partition value handling

wjones127 commented on code in PR #14646:
URL: https://github.com/apache/arrow/pull/14646#discussion_r1024364550


##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -4912,3 +4912,33 @@ def test_read_table_nested_columns(tempdir, format):
         {'user_id': 'qrs456', 'type': 'scroll', 'values': [None, 3, 4],
          'structs': [{'fizz': 'buzz', 'foo': None}], 'a.dotted.field': 2}
     ]
+
+
+def test_dataset_partition_with_slash(tmpdir):
+    from pyarrow import dataset as ds
+
+    path = tmpdir / "slash-writer-x"
+
+    dt_table = pa.Table.from_arrays([
+        pa.array([1, 2, 3, 4, 5], pa.int32()),
+        pa.array(["experiment/A/f.csv", "experiment/B/f.csv",
+                  "experiment/A/f.csv", "experiment/C/k.csv",
+                  "experiment/M/i.csv"], pa.utf8())], ["exp_id", "exp_meta"])
+
+    ds.write_dataset(
+        data=dt_table,
+        base_dir=path,
+        format='parquet',
+        partitioning=['exp_meta'],
+        partitioning_flavor='hive',
+    )
+
+    read_table = ds.dataset(
+        source=path,
+        format='parquet',
+        partitioning='hive',
+        schema=pa.schema([pa.field("exp_id", pa.int32()),
+                         pa.field("exp_meta", pa.utf8())])
+    ).to_table().combine_chunks()
+

Review Comment:
   Could we also assert the names of the escaped partition directories? Since the goal is compatibility with other systems, it seems wise to enforce the exact URI-encoded form rather than only asserting that we can round-trip.
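
   A rough sketch of what such an assertion might look like, using only the standard library. The helper names `expected_hive_dirs` and `assert_partition_dirs` are hypothetical, and it is an assumption that the writer's URI encoding matches `urllib.parse.quote(value, safe="")` for these values (so that `/` becomes `%2F`); the expected names should be checked against what the writer actually produces.

```python
import os
from urllib.parse import quote


def expected_hive_dirs(column, values):
    """Return the sorted, de-duplicated hive-style directory names.

    Assumption: each partition value is percent-encoded with no "safe"
    characters, so a slash is written as %2F.
    """
    return sorted({f"{column}={quote(v, safe='')}" for v in values})


def assert_partition_dirs(base_dir, column, values):
    """Compare the directories directly under base_dir with the expected names."""
    actual = sorted(
        name for name in os.listdir(base_dir)
        if os.path.isdir(os.path.join(base_dir, name))
    )
    assert actual == expected_hive_dirs(column, values), actual
```

   In the test above this could be called as `assert_partition_dirs(path, "exp_meta", dt_table["exp_meta"].to_pylist())` right after `ds.write_dataset(...)`, before the round-trip check.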


