Posted to issues@hivemall.apache.org by "Takeshi Yamamuro (JIRA)" <ji...@apache.org> on 2018/04/04 02:18:00 UTC
[jira] [Created] (HIVEMALL-185) Add an optimizer rule to push down a Sample plan node into fact tables
Takeshi Yamamuro created HIVEMALL-185:
-----------------------------------------
Summary: Add an optimizer rule to push down a Sample plan node into fact tables
Key: HIVEMALL-185
URL: https://issues.apache.org/jira/browse/HIVEMALL-185
Project: Hivemall
Issue Type: Sub-task
Reporter: Takeshi Yamamuro
Assignee: Takeshi Yamamuro
Sampling is a common technique for extracting a subset of the data in joined relations (fact tables and dimension tables) to use as training data. The Spark optimizer cannot push a Sample plan node down into the larger fact tables because this node is non-deterministic. However, by using referential integrity (RI) constraints, we could push this node down into fact tables in some cases.
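To illustrate the idea, here is a minimal sketch in plain Python. It is not Spark code and not the proposed optimizer rule. The tables, the keep() helper, and the join() helper are hypothetical, and the sampling decision is keyed on the fact row ID (an assumption made only so that both plans can be compared row by row). The sketch assumes an RI constraint under which every fact row has a non-null foreign key that matches exactly one dimension row.

```python
import random

# Hypothetical toy data. dim maps a primary key to an attribute, and every
# fact row is (fact_id, dim_key) with dim_key guaranteed to exist in dim.
dim = {1: "books", 2: "music", 3: "games"}
fact = [(i, (i % 3) + 1) for i in range(1000)]

def keep(fact_id, fraction=0.1, seed=42):
    # Bernoulli decision keyed on the fact row (an assumption of this sketch),
    # so both plans below make the same choice for the same fact row.
    return random.Random(seed * 1_000_003 + fact_id).random() < fraction

def join(fact_rows, dim_table):
    # Inner join of fact rows to the dimension table on the foreign key.
    return [(fid, key, dim_table[key]) for fid, key in fact_rows if key in dim_table]

# Plan A: join first, then sample the joined rows.
sample_after_join = [row for row in join(fact, dim) if keep(row[0])]

# Plan B: sample the fact table first, then join the smaller input.
sample_before_join = join([row for row in fact if keep(row[0])], dim)

# Under the RI assumption the join neither drops nor duplicates fact rows,
# so both plans return the same rows in this toy.
assert sample_after_join == sample_before_join
```

If a fact row could match zero or several dimension rows, the join would change the set of rows being sampled, and the two plans would in general no longer be interchangeable. That is the kind of case an RI constraint rules out.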