Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/13 02:01:50 UTC

[GitHub] [arrow] svjack commented on issue #9172: Is Expression have decomposition methods ?

svjack commented on issue #9172:
URL: https://github.com/apache/arrow/issues/9172#issuecomment-759153642


   > @svjack because the Expression APIs are somewhat provisional, we didn't expose much functionality to inspect / interact with it, for now.
   >
   > Do you have a specific use case for this?
   
   I searched for usages of Expression in the pyarrow project.
   It seems that each element of dataset.pieces, i.e. a piece (Fragment),
   has partition_expression as an attribute.
   Suppose I want to filter some pieces by their partition_expression.
   The officially supported way is to pass the filters (Expression) argument to the DatasetV2 constructor.
   But because the methods I can apply to an Expression are limited,
   I replaced the officially supported way with a custom filter on the pieces, and I want to use partition_expression as the formal representation of each piece (partition Fragment).
   1. So I think some boolean simplification should be supported, such as:
   (ExpressionA or ExpressionB) and ExpressionA -> ExpressionA
   2. I have not dug into how Expression filters are executed underneath, but consider the case below:
   	total_expression = reduce(lambda pe_a, pe_b: pe_a.__or__(pe_b), map(lambda piece: piece.partition_expression, dataset.pieces))
   	total_expression can be seen as a trial expression for the union of all pieces. If the underlying logic simplified total_expression first and then executed the simplified form, I think this could run faster than performing a lot of __or__ (union) operations over many fragments.
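
For reference, the reduce call in point 2 can be written a little more compactly with operator.or_. This is only a sketch: union_expression is a hypothetical helper name, and it assumes the inputs support the | operator, as pyarrow dataset Expressions do. It builds the union and nothing more; no simplification happens here.

```python
from functools import reduce
import operator


def union_expression(expressions):
    """OR together an iterable of expressions (hypothetical helper)."""
    expressions = list(expressions)
    if not expressions:
        # reduce() without an initial value fails on an empty input,
        # so report the problem explicitly instead.
        raise ValueError("need at least one expression")
    # operator.or_(a, b) is the function form of a | b, i.e. a.__or__(b).
    return reduce(operator.or_, expressions)


# Assumed usage with the dataset from this comment (not run here):
# total_expression = union_expression(
#     piece.partition_expression for piece in dataset.pieces
# )
```

Because the helper relies only on the | operator, it can be tried out on plain Python sets before pointing it at real partition expressions.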
   
   So I would like complex Expressions to have a logical simplification method, plus some form of pre-simplification at execution time.
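
To make the rule in point 1 concrete, here is a minimal sketch of the absorption law, (A or B) and A -> A. Atom, Or, And and absorb are hypothetical stand-ins written for this illustration; they are not pyarrow classes, since the pyarrow Expression does not currently expose its operands.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


# Hypothetical, inspectable stand-ins for Expression nodes.
@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Or:
    terms: FrozenSet["Node"]


@dataclass(frozen=True)
class And:
    terms: FrozenSet["Node"]


Node = Union[Atom, Or, And]


def absorb(node):
    """Apply (A or B) and A -> A to the direct terms of an And node."""
    if not isinstance(node, And):
        return node
    kept = set()
    for term in node.terms:
        # An Or term is redundant when one of its alternatives is
        # already required by a sibling term of the same And.
        if isinstance(term, Or) and any(
            other in term.terms for other in node.terms if other is not term
        ):
            continue
        kept.add(term)
    if len(kept) == 1:
        # A conjunction with a single remaining term is just that term.
        return next(iter(kept))
    return And(frozenset(kept))


a, b = Atom("ExpressionA"), Atom("ExpressionB")
simplified = absorb(And(frozenset({Or(frozenset({a, b})), a})))
```

In this sketch simplified ends up equal to a. The function only looks at one level of nesting, so a real simplifier would have to recurse and cover more rules than absorption.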
   I think this depends on a function that can retrieve the minimal logical units from total_expression (this is about the elements). As for
   the "op" ("=", "in" and so on in _filters_to_expression), there should be a formal reverse method that transforms an Expression back into filters (built from nested Python collections).
   With the help of these functions, Expression would be complete both algebraically (mathematically) and programmatically.
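
As an illustration of what such a reverse method could look like, here is a sketch that turns an inspectable expression tree back into filters in disjunctive normal form (an outer list of OR-ed groups, each an inner list of AND-ed (column, op, value) tuples). The tree encoding and the name to_filters are assumptions made for this example; pyarrow does not provide them.

```python
from itertools import product


def to_filters(node):
    """Convert a nested ("and" | "or", child, ...) tree into DNF filters.

    Hypothetical encoding: a leaf is a (column, op, value) tuple, so a
    column literally named "and" or "or" would be ambiguous here.
    """
    kind = node[0]
    if kind == "or":
        # OR: concatenate the AND-groups of every child.
        return [group for child in node[1:] for group in to_filters(child)]
    if kind == "and":
        # AND: distribute over the children's OR alternatives.
        child_filters = [to_filters(child) for child in node[1:]]
        return [sum(combo, []) for combo in product(*child_filters)]
    # Leaf: one group containing one predicate.
    return [[node]]


tree = (
    "and",
    ("or", ("year", "=", 2020), ("year", "=", 2021)),
    ("month", "in", [1, 2]),
)
filters = to_filters(tree)
```

Here filters contains two AND-groups, one per year, each paired with the month predicate. Distributing AND over OR can grow the result quickly, which is one reason simplifying first would matter.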


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org