Posted to dev@pig.apache.org by "Gaurav Jain (JIRA)" <ji...@apache.org> on 2009/11/26 01:41:39 UTC

[jira] Created: (PIG-1111) [Zebra]

[Zebra]
-------

                 Key: PIG-1111
                 URL: https://issues.apache.org/jira/browse/PIG-1111
             Project: Pig
          Issue Type: New Feature
            Reporter: Gaurav Jain
            Assignee: Gaurav Jain
             Fix For: 0.6.0, 0.7.0



Zebra enables an application to stream data into different Zebra table instances.

New interface added:

setMultipleOutputs(JobConf jobconf, String commaSeparatedLocation, Class<? extends ZebraOutputPartitioner> theClass)

Zebra maintains a list of table instances based on commaSeparatedLocation (in that order).

The ZebraOutputPartitioner interface has a getOutputPartition method, which is implemented by the application. It returns an index into the list, and Zebra writes to the table instance at that index.

We also introduce a new mapred property for setting multiple outputs.

mapred.lib.table.multi.output.dirs
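
The routing idea can be illustrated with a small, self-contained sketch. This is not the Zebra implementation: OutputPartitionerSketch, MultiOutputSketch, and the in-memory lists standing in for tables are hypothetical names invented for illustration, and the real getOutputPartition signature may take different arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the partitioner contract; NOT the real
// ZebraOutputPartitioner signature.
interface OutputPartitionerSketch<R> {
    // Returns an index into the ordered list of output locations.
    int getOutputPartition(R record);
}

// Hypothetical dispatcher: one in-memory "table" per location, kept in the
// same order as the comma-separated location string.
final class MultiOutputSketch<R> {
    private final List<String> locations = new ArrayList<String>();
    private final List<List<R>> tables = new ArrayList<List<R>>();
    private final OutputPartitionerSketch<R> partitioner;

    MultiOutputSketch(String commaSeparatedLocations,
                      OutputPartitionerSketch<R> partitioner) {
        for (String loc : commaSeparatedLocations.split(",")) {
            locations.add(loc.trim());
            tables.add(new ArrayList<R>());
        }
        this.partitioner = partitioner;
    }

    // Asks the application-supplied partitioner where the record belongs,
    // then appends it to the table at that index.
    void write(R record) {
        int index = partitioner.getOutputPartition(record);
        if (index < 0 || index >= tables.size()) {
            throw new IllegalArgumentException("partition index out of range: " + index);
        }
        tables.get(index).add(record);
    }

    String location(int index) { return locations.get(index); }
    List<R> rows(int index) { return tables.get(index); }
}

final class MultiOutputDemo {
    public static void main(String[] args) {
        // Paths are made up; records starting with "us" go to the first table.
        MultiOutputSketch<String> out = new MultiOutputSketch<String>(
            "/tmp/us,/tmp/other", r -> r.startsWith("us") ? 0 : 1);
        out.write("us,alice");
        out.write("in,bob");
        System.out.println(out.location(0) + " -> " + out.rows(0));
        System.out.println(out.location(1) + " -> " + out.rows(1));
    }
}
```

The order of the locations matters because the partitioner only returns an index, so the application and the output format must agree on the same ordering.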
 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-1111) [Zebra] multiple outputs support

Posted by "Chao Wang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785419#action_12785419 ] 

Chao Wang commented on PIG-1111:
--------------------------------

Why do we need a build script change to run the multiple outputs test cases? Are they any different from other test cases?
I took a look at that part of the change in the patch; it seems to have nothing to do with the multiple outputs feature.

> [Zebra] multiple outputs support
> --------------------------------
>
>                 Key: PIG-1111
>                 URL: https://issues.apache.org/jira/browse/PIG-1111
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: Gaurav Jain
>            Assignee: Gaurav Jain
>             Fix For: 0.6.0, 0.7.0
>
>         Attachments: PIG-1111.patch, PIG-1111.patch

[jira] Commented: (PIG-1111) [Zebra] multiple outputs support

Posted by "Yan Zhou (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785736#action_12785736 ] 

Yan Zhou commented on PIG-1111:
-------------------------------

Patch committed to trunk.

[jira] Updated: (PIG-1111) [Zebra] multiple outputs support

Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaurav Jain updated PIG-1111:
-----------------------------

    Attachment: PIG-1111.patch


I did some code cleanup in this patch.

Please review at your earliest convenience.

[jira] Commented: (PIG-1111) [Zebra] multiple outputs support

Posted by "Yan Zhou (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785388#action_12785388 ] 

Yan Zhou commented on PIG-1111:
-------------------------------

+1

[jira] Updated: (PIG-1111) [Zebra] multiple outputs support

Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaurav Jain updated PIG-1111:
-----------------------------

    Status: Patch Available  (was: Open)


Please review

[jira] Updated: (PIG-1111) [Zebra] multiple outputs support

Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaurav Jain updated PIG-1111:
-----------------------------

    Affects Version/s: 0.7.0
                       0.6.0
               Status: Patch Available  (was: Open)


Please review and provide feedback at your earliest convenience

[jira] Updated: (PIG-1111) [Zebra] multiple outputs support

Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaurav Jain updated PIG-1111:
-----------------------------

    Attachment: PIG-1111.patch

Source code and test cases for the feature

[jira] Updated: (PIG-1111) [Zebra] multiple outputs support

Posted by "Yan Zhou (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1111:
--------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to the 0.6 branch as well.

[jira] Commented: (PIG-1111) [Zebra] multiple outputs support

Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785216#action_12785216 ] 

Gaurav Jain commented on PIG-1111:
----------------------------------


Yan Zhou provided the following code review feedback outside of JIRA:

1)       Why does build.xml need any changes?

2)       BasicTableOutputFormat.IS_MULTI should have package scope instead of public.

3)       In the RecordWriter::write() method, the check "if(jobConf.getBoolean(BasicTableOutputFormat.IS_MULTI, false) == true)" should be replaced with a simple "if (op != null)". As a consequence, the "jobConf" variable is no longer needed.

4)       Many RuntimeExceptions are thrown; these should be replaced with IOExceptions.

5)       getRecordWriter: why was the check for the Path's nullness removed? The patch seems to be inconsistent with what is on trunk: the patch says the check is completely removed, while trunk has an empty check.

6)       TableRecordWriter: commaSeparatedLocs is never used.

7)       In getOutputPartition, why are setConf/getConf necessary? Just curious.


All of the above issues have been addressed in the latest patch.
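
Review items 3 and 4 above can be illustrated with a minimal sketch. It is not the code from the patch: PartitionerSketch and RecordWriterSketch are hypothetical stand-ins, and rows are plain strings collected in memory. The point is only that the presence of a partitioner object ("op") already tells the writer whether it is in multiple-output mode, and that a bad index is reported as a checked IOException.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; not the real Zebra partitioner type.
interface PartitionerSketch {
    int getOutputPartition(String row);
}

final class RecordWriterSketch {
    // null means single-output mode, so no configuration lookup is needed.
    private final PartitionerSketch op;
    private final List<List<String>> outputs = new ArrayList<List<String>>();

    RecordWriterSketch(int numOutputs, PartitionerSketch op) {
        if (numOutputs < 1) {
            throw new IllegalArgumentException("need at least one output");
        }
        this.op = op;
        for (int i = 0; i < numOutputs; i++) {
            outputs.add(new ArrayList<String>());
        }
    }

    void write(String row) throws IOException {
        int index = 0;
        if (op != null) { // item 3: test the object, not a config flag
            index = op.getOutputPartition(row);
            if (index < 0 || index >= outputs.size()) {
                // item 4: a checked IOException rather than a RuntimeException
                throw new IOException("Invalid output partition " + index);
            }
        }
        outputs.get(index).add(row);
    }

    List<String> rows(int index) { return outputs.get(index); }
}

final class RecordWriterDemo {
    public static void main(String[] args) throws IOException {
        RecordWriterSketch single = new RecordWriterSketch(1, null);
        single.write("only-output");
        RecordWriterSketch multi = new RecordWriterSketch(2, row -> row.length() % 2);
        multi.write("ab");
        multi.write("abc");
        System.out.println(single.rows(0) + " " + multi.rows(0) + " " + multi.rows(1));
    }
}
```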


[jira] Commented: (PIG-1111) [Zebra] multiple outputs support

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785420#action_12785420 ] 

Hadoop QA commented on PIG-1111:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12426757/PIG-1111.patch
  against trunk revision 886650.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 13 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/82/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/82/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/82/console

This message is automatically generated.

[jira] Updated: (PIG-1111) [Zebra] multiple outputs support

Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaurav Jain updated PIG-1111:
-----------------------------

    Status: Open  (was: Patch Available)


Submitting an update

[jira] Updated: (PIG-1111) [Zebra] multiple outputs support

Posted by "Jing Huang (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Huang updated PIG-1111:
----------------------------

    Description: 
Zebra enables an application to stream data into different Zebra table instances.

New interface added:

setMultipleOutputs(JobConf jobconf, String commaSeparatedLocation, Class<? extends ZebraOutputPartitioner> theClass)

Zebra maintains a list of table instances based on commaSeparatedLocation (in that order).

The ZebraOutputPartitioner interface has a getOutputPartition method, which is implemented by the application. It returns an index into the list, and Zebra writes to the table instance at that index.

We also introduce a new mapred property for setting multiple outputs.

mapred.lib.table.multi.output.dirs

        Summary: [Zebra] multiple outputs support  (was: [Zebra])

[jira] Commented: (PIG-1111) [Zebra] multiple outputs support

Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785396#action_12785396 ] 

Gaurav Jain commented on PIG-1111:
----------------------------------


In response to feedback:

1) build.xml has tags to run the multiple outputs tests.

2) Changed to package scope.

3) The change has been made.

4) Throws IOException now.

5) This was done as part of code cleanup. The null check is now done in getOutputPaths().

6) This variable has been removed.

7) ZebraOutputPartitioner implements the Configurable interface; setConf and getConf are methods of that interface.
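
To illustrate the last point: Hadoop's Configurable interface declares setConf(Configuration) and getConf(). The sketch below mirrors that shape with java.util.Properties in place of Configuration so that it runs without Hadoop on the classpath. ConfigurableSketch, ModuloPartitionerSketch, and the "sketch.num.outputs" key are hypothetical names, not part of Zebra or Hadoop.

```java
import java.util.Properties;

// Stand-in with the same shape as Hadoop's Configurable, using Properties
// instead of org.apache.hadoop.conf.Configuration (an assumption made so the
// example is self-contained).
interface ConfigurableSketch {
    void setConf(Properties conf);
    Properties getConf();
}

// Hypothetical partitioner that reads the number of outputs from its
// configuration when setConf is called.
final class ModuloPartitionerSketch implements ConfigurableSketch {
    private Properties conf = new Properties();
    private int numOutputs = 1;

    @Override
    public void setConf(Properties conf) {
        this.conf = conf;
        // "sketch.num.outputs" is a made-up key used only in this example.
        this.numOutputs = Integer.parseInt(conf.getProperty("sketch.num.outputs", "1"));
    }

    @Override
    public Properties getConf() {
        return conf;
    }

    // Maps a key to an index in [0, numOutputs).
    int getOutputPartition(String key) {
        return Math.floorMod(key.hashCode(), numOutputs);
    }
}

final class ConfigurableDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("sketch.num.outputs", "3");
        ModuloPartitionerSketch p = new ModuloPartitionerSketch();
        p.setConf(conf);
        System.out.println("partition index: " + p.getOutputPartition("some-key"));
    }
}
```

Having the configuration injected this way lets a partitioner pick up job-level settings without the application passing them through a constructor.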

[jira] Commented: (PIG-1111) [Zebra] multiple outputs support

Posted by "Jing Huang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785429#action_12785429 ] 

Jing Huang commented on PIG-1111:
---------------------------------

In response to Chao's question: we need to add the system property "whichCluster" to build.xml so that the test cases in TestMultipleOutputs.java are portable across miniCluster and realCluster.
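
A test could read such a property roughly as follows. This is a sketch under assumptions: the accepted values ("miniCluster", "realCluster") and the default when the property is unset are guesses based on the wording above, and ClusterModeSketch is a hypothetical helper, not code from TestMultipleOutputs.java.

```java
final class ClusterModeSketch {
    enum Mode { MINI_CLUSTER, REAL_CLUSTER }

    // Maps the property value to a mode. Treating an unset property as the
    // mini cluster is an assumption made for this example.
    static Mode fromProperty(String value) {
        if (value == null || value.equalsIgnoreCase("miniCluster")) {
            return Mode.MINI_CLUSTER;
        }
        if (value.equalsIgnoreCase("realCluster")) {
            return Mode.REAL_CLUSTER;
        }
        throw new IllegalArgumentException("Unknown whichCluster value: " + value);
    }

    // Reads the JVM system property, e.g. set with -DwhichCluster=realCluster.
    static Mode current() {
        return fromProperty(System.getProperty("whichCluster"));
    }

    public static void main(String[] args) {
        System.out.println("Running tests against: " + current());
    }
}
```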

[jira] Commented: (PIG-1111) [Zebra] multiple outputs support

Posted by "Chao Wang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785431#action_12785431 ] 

Chao Wang commented on PIG-1111:
--------------------------------

+1
