Posted to issues@hivemall.apache.org by "Takeshi Yamamuro (JIRA)" <ji...@apache.org> on 2018/04/04 02:18:00 UTC

[jira] [Created] (HIVEMALL-185) Add an optimizer rule to push down a Sample plan node into fact tables

Takeshi Yamamuro created HIVEMALL-185:
-----------------------------------------

             Summary: Add an optimizer rule to push down a Sample plan node into fact tables
                 Key: HIVEMALL-185
                 URL: https://issues.apache.org/jira/browse/HIVEMALL-185
             Project: Hivemall
          Issue Type: Sub-task
            Reporter: Takeshi Yamamuro
            Assignee: Takeshi Yamamuro


Sampling is a common technique for extracting a subset of rows from joined relations (fact tables and dimension tables) to use as training data. The Spark optimizer cannot push a Sample plan node down into the larger fact tables because this node is non-deterministic. However, by using referential integrity (RI) constraints, we could push this node down into the fact tables in some cases.
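
To illustrate the idea only, here is a minimal sketch in plain Python. It is not Spark or Hivemall code, and it is not the proposed optimizer rule. The toy tables and the helper names (join, sample) are hypothetical. It assumes an RI constraint under which every fact row references exactly one existing dimension row, so the join neither drops nor duplicates fact rows, and sampling before the join selects the same rows as sampling after it.

```python
import random

# Hypothetical toy data. RI assumption: every dim_id in `fact` exists in
# `dimension`, and dimension keys are unique.
dimension = {1: "books", 2: "games", 3: "music"}
fact = [(order_id, 1 + order_id % 3) for order_id in range(1000)]  # (order_id, dim_id)


def join(fact_rows, dim):
    """Inner join on the foreign key; under RI each fact row yields one output row."""
    return [(oid, did, dim[did]) for oid, did in fact_rows if did in dim]


def sample(rows, fraction, seed):
    """Bernoulli sample: keep each row independently with probability `fraction`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


# Original plan shape: Sample on top of Join.
sample_after_join = sample(join(fact, dimension), 0.1, seed=42)

# Pushed-down plan shape: Sample applied to the fact table, then Join.
join_after_sample = join(sample(fact, 0.1, seed=42), dimension)

# With the same seed and a row-preserving join, both plans keep the same rows.
print(sample_after_join == join_after_sample)
```

If the RI assumption did not hold (for example, fact rows with dangling keys, or duplicate keys on the dimension side), the join would change the number of rows and the two plans would no longer be interchangeable, which is presumably why the pushdown applies only in some cases.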



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)