You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/12 12:10:44 UTC

[GitHub] [arrow] lidavidm commented on a diff in pull request #12863: ARROW-11259: [Python] Allow to create field reference to nested field

lidavidm commented on code in PR #12863:
URL: https://github.com/apache/arrow/pull/12863#discussion_r848356453


##########
python/pyarrow/compute.py:
##########
@@ -597,16 +597,29 @@ def field(name_or_index):
     Stores only the field's name. Type and other information is known only when
     the expression is bound to a dataset having an explicit scheme.
 
+    Nested references are allowed by passing a tuple of names.
+    For example ``('foo', 'bar')`` references the field named "bar" inside
+    the field named "foo".
+
     Parameters
     ----------
-    name_or_index : string or int
-        The name or index of the field the expression references to.
+    name_or_index : string, tuple or int

Review Comment:
   I wonder if we should also allow dotted names (though that could get ambiguous), or else, allow `field("a", "b")` instead of requiring `field(("a", "b"))`. That said neither are essential here.



##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -391,6 +393,14 @@ def test_dataset(dataset, dataset_reader):
     assert sorted(result['group']) == [1, 2]
     assert sorted(result['key']) == ['xxx', 'yyy']
 
+    condition = ds.field(('struct', 'b')) == '1'
+    result = dataset.to_table(use_threads=True, filter=condition).to_pydict()

Review Comment:
   Can we test a column projection as well somewhere here?



##########
python/pyarrow/_compute.pyx:
##########
@@ -2219,10 +2219,26 @@ cdef class Expression(_Weakrefable):
 
     @staticmethod
     def _field(name_or_idx not None):
-        if isinstance(name_or_idx, str):
-            return Expression.wrap(CMakeFieldExpression(tobytes(name_or_idx)))
-        else:
+        cdef:
+            CFieldRef c_field
+
+        if isinstance(name_or_idx, int):
             return Expression.wrap(CMakeFieldExpressionByIndex(name_or_idx))
+        else:
+            c_field = CFieldRef(<c_string> tobytes(name_or_idx))
+            return Expression.wrap(CMakeFieldExpression(c_field))
+
+    @staticmethod
+    def _nested_field(tuple names not None):
+        cdef:
+            vector[CFieldRef] nested
+
+        if len(names) == 0:
+            raise ValueError("nested field reference should be non-empty")
+        nested.reserve(len(names))
+        for name in names:
+            nested.push_back(CFieldRef(<c_string> tobytes(name)))

Review Comment:
   We might want to file a follow up for nested field refs that use indices? ARROW-15658/#12664 are working on support for that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org