Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2017/11/15 00:20:00 UTC

[jira] [Created] (PIG-5313) Support PARALLEL in STORE statement

Rohini Palaniswamy created PIG-5313:
---------------------------------------

             Summary: Support PARALLEL in STORE statement
                 Key: PIG-5313
                 URL: https://issues.apache.org/jira/browse/PIG-5313
             Project: Pig
          Issue Type: New Feature
          Components: tez
            Reporter: Rohini Palaniswamy


Restricting the number of output files is a very common use case. In Pig, users currently add an ORDER BY, GROUP BY or DISTINCT with the required parallelism before STORE to achieve it. All of these operations add unnecessary processing overhead. It would be ideal if the STORE statement supported the PARALLEL clause and the partitioning of data were handled in a simpler and more efficient manner.
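
For illustration, the workaround described above might look like the following Pig Latin sketch. The paths, schema and parallelism value are hypothetical. The commented-out STORE ... PARALLEL line only shows what the requested feature could look like; its syntax is an assumption and is not supported by Pig at the time of this issue.

```pig
-- Hypothetical input path and schema, for illustration only.
raw = LOAD '/data/input' USING PigStorage('\t') AS (id:chararray, value:long);

-- Current workaround: ORDER BY forces a reduce phase, and PARALLEL 10
-- sets the number of reducers, so the output has 10 part files.
-- The sort itself is not needed; it is only there to control file count.
sorted = ORDER raw BY id PARALLEL 10;
STORE sorted INTO '/data/output' USING PigStorage('\t');

-- Requested feature (assumed syntax, NOT valid Pig today):
-- STORE raw INTO '/data/output' USING PigStorage('\t') PARALLEL 10;
```

GROUP BY or DISTINCT with PARALLEL can be used the same way, but they also change the shape or contents of the data, which is part of the overhead this issue aims to avoid.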

This jira is mostly Tez-specific and depends on TEZ-3865, which has more details on how this can be done in Tez. We will also have to add APIs to StoreFunc (for HCatStorer, MultiStorage, etc.) to get the partition keys used to partition the data for the STORE statement.



