You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/14 15:08:10 UTC

[GitHub] [arrow] KirillLykov opened a new pull request #8461: ARROW-10197: [python][Gandiva] Execute expression on filtered data

KirillLykov opened a new pull request #8461:
URL: https://github.com/apache/arrow/pull/8461


   This PR is to enable running queries on top of filtered data in python. For that, I extended Cython wrapper and added a unit test which is the same as a test in cpp module https://github.com/apache/arrow/blob/c5fa23ea0e15abe47b35524fa6a79c7b8c160fa0/cpp/src/gandiva/tests/filter_project_test.cc#L208


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8461: ARROW-10197: [python][Gandiva] Execute expression on filtered data

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8461:
URL: https://github.com/apache/arrow/pull/8461#issuecomment-708466709


   https://issues.apache.org/jira/browse/ARROW-10197


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #8461: ARROW-10197: [python][Gandiva] Execute expression on filtered data

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #8461:
URL: https://github.com/apache/arrow/pull/8461#issuecomment-725667526


   @pitrou or @bkietz could one of you have a look?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #8461: ARROW-10197: [python][Gandiva] Execute expression on filtered data

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #8461:
URL: https://github.com/apache/arrow/pull/8461#issuecomment-710473983


   The community is in the middle of making a release so it may be a little while before a reviewer can get to this


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz closed pull request #8461: ARROW-10197: [python][Gandiva] Execute expression on filtered data

Posted by GitBox <gi...@apache.org>.
bkietz closed pull request #8461:
URL: https://github.com/apache/arrow/pull/8461


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] KirillLykov commented on a change in pull request #8461: ARROW-10197: [python][Gandiva] Execute expression on filtered data

Posted by GitBox <gi...@apache.org>.
KirillLykov commented on a change in pull request #8461:
URL: https://github.com/apache/arrow/pull/8461#discussion_r527184657



##########
File path: python/pyarrow/includes/libgandiva.pxd
##########
@@ -58,6 +67,31 @@ cdef extern from "gandiva/selection_vector.h" namespace "gandiva" nogil:
             int64_t max_slots, CMemoryPool* pool,
             shared_ptr[CSelectionVector]* selection_vector)
 
+cdef inline CSelectionVector_Mode _ensure_selection_mode(str name) except *:
+    uppercase = name.upper()
+    if uppercase == 'NONE':
+        return CSelectionVector_Mode_NONE
+    elif uppercase == 'UINT16':
+        return CSelectionVector_Mode_UINT16
+    elif uppercase == 'UINT32':
+        return CSelectionVector_Mode_UINT32
+    elif uppercase == 'UINT64':
+        return CSelectionVector_Mode_UINT64
+    else:
+        raise ValueError('Invalid value for Selection Mode: {!r}'.format(name))
+
+cdef inline str _selection_mode_name(CSelectionVector_Mode ctype):

Review comment:
       It cannot be added because `Exception clause not allowed for function returning Python object`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] KirillLykov commented on pull request #8461: ARROW-10197: [python][Gandiva] Execute expression on filtered data

Posted by GitBox <gi...@apache.org>.
KirillLykov commented on pull request #8461:
URL: https://github.com/apache/arrow/pull/8461#issuecomment-731040216


   It looks like I addressed all the comments by @bkietz . The only thing is that the commit history looks a bit too long and I would recommend squashing (don't know project policy about merging/rebasing/squashing).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #8461: ARROW-10197: [python][Gandiva] Execute expression on filtered data

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #8461:
URL: https://github.com/apache/arrow/pull/8461#discussion_r522287090



##########
File path: python/pyarrow/includes/libgandiva.pxd
##########
@@ -58,6 +67,31 @@ cdef extern from "gandiva/selection_vector.h" namespace "gandiva" nogil:
             int64_t max_slots, CMemoryPool* pool,
             shared_ptr[CSelectionVector]* selection_vector)
 
+cdef inline CSelectionVector_Mode _ensure_selection_mode(str name) except *:
+    uppercase = name.upper()
+    if uppercase == 'NONE':
+        return CSelectionVector_Mode_NONE
+    elif uppercase == 'UINT16':
+        return CSelectionVector_Mode_UINT16
+    elif uppercase == 'UINT32':
+        return CSelectionVector_Mode_UINT32
+    elif uppercase == 'UINT64':
+        return CSelectionVector_Mode_UINT64
+    else:
+        raise ValueError('Invalid value for Selection Mode: {!r}'.format(name))
+
+cdef inline str _selection_mode_name(CSelectionVector_Mode ctype):

Review comment:
       ```suggestion
   cdef inline str _selection_mode_name(CSelectionVector_Mode ctype) except *:
   ```

##########
File path: python/pyarrow/gandiva.pyx
##########
@@ -405,9 +437,24 @@ cpdef make_projector(Schema schema, children, MemoryPool pool):
                                 &result))
     return Projector.create(result, pool)
 
+cpdef make_projector_with_mode(Schema schema, children,
+                               str selection_mode, MemoryPool pool):

Review comment:
       Some duplicated code could be removed by making selection_mode an optional argument of make_projector instead of writing a new function

##########
File path: python/pyarrow/gandiva.pyx
##########
@@ -405,9 +437,24 @@ cpdef make_projector(Schema schema, children, MemoryPool pool):
                                 &result))
     return Projector.create(result, pool)
 
+cpdef make_projector_with_mode(Schema schema, children,
+                               str selection_mode, MemoryPool pool):
+    cdef c_vector[shared_ptr[CExpression]] c_children
+    cdef Expression child
+    for child in children:
+        c_children.push_back(child.expression)
+    cdef shared_ptr[CProjector] result
+    cdef Configuration configuration = Configuration.create()
+    cdef CSelectionVector_Mode mode = _ensure_selection_mode(selection_mode)
+    check_status(
+        Projector_Make(schema.sp_schema, c_children, mode,
+                       configuration.config, &result))

Review comment:
       ```suggestion
       check_status(
           Projector_Make(schema.sp_schema,
                          c_children,
                          _ensure_selection_mode(selection_mode),
                          CConfigurationBuilder.DefaultConfiguration(),
                          &result))
   ```

##########
File path: python/pyarrow/gandiva.pyx
##########
@@ -161,6 +179,20 @@ cdef class Projector(_Weakrefable):
             arrays.append(pyarrow_wrap_array(result))
         return arrays
 
+    def evaluate_with_selection(self, RecordBatch batch,
+                                SelectionVector selection):

Review comment:
       Likewise `selection` could be an optional argument of evaluate()

##########
File path: python/pyarrow/gandiva.pyx
##########
@@ -97,6 +100,21 @@ cdef class Expression(_Weakrefable):
     cdef void init(self, shared_ptr[CExpression] expression):
         self.expression = expression
 
+cdef class Configuration(_Weakrefable):
+    cdef:
+        shared_ptr[CConfiguration] config
+
+    def __init__(self):
+        raise TypeError("Do not call {}'s constructor directly, use the "
+                        "TreeExprBuilder API directly"
+                        .format(self.__class__.__name__))
+
+    @staticmethod
+    cdef create():
+        cdef Configuration self = Configuration.__new__(Configuration)
+        self.config = CConfigurationBuilder.DefaultConfiguration()
+        return self
+

Review comment:
       ```suggestion
   ```
   
   This class is currently unnecessary since it only wraps the default configuration. Exposing it can wait until there's a need for python to adjust a Configuration.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] KirillLykov commented on pull request #8461: ARROW-10197: [python][Gandiva] Execute expression on filtered data

Posted by GitBox <gi...@apache.org>.
KirillLykov commented on pull request #8461:
URL: https://github.com/apache/arrow/pull/8461#issuecomment-708485699


   Lint failure is clear but two other failing checks:
   * Python / AMD64 MacOS 10.15 Python 3
   * continuous-integration/appveyor/pr
   
   Do not seem to be related with my changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org