You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Divya (JIRA)" <ji...@apache.org> on 2016/05/20 15:38:12 UTC

[jira] [Created] (PIG-4901) To use Multistorage for each Group

Divya created PIG-4901:
--------------------------

             Summary: To use Multistorage for each Group
                 Key: PIG-4901
                 URL: https://issues.apache.org/jira/browse/PIG-4901
             Project: Pig
          Issue Type: Task
          Components: piggybank
    Affects Versions: 0.11.1
         Environment: Hadoop 1.2.0
            Reporter: Divya
            Priority: Minor


I am trying to group my data and store in hdfs with a folder for each 'name' and subfolders for each 'YearMonth' under each name folder.

Input:
(Date)            (name)     (col3)     (col4)
2015-02-02    abc              y          z
2016-01-02    xyz              i            j
2015-03-02    abc              f          b
2015-02-06    abc              y          z
2016-03-02    xyz              a          q
    
Expected out in hdfs:
abc folder
    ->201502 subfolder
           2015-02-02    abc              y          z
           2015-02-06    abc              y          z
    ->201503 subfolder
           2015-03-02    abc              f           b
xyz folder
    ->201601
          2016-01-02    xyz              i            j
    ->201603
          2016-03-02    xyz              a          q

I am not sure of how to use the Multistorage option on Name column after grouping the tuples by date.

Any help is appreciated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)