Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2009/12/16 18:50:18 UTC

[jira] Created: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators

Add aliases to ExecJobs and PhysicalOperators
---------------------------------------------

                 Key: PIG-1156
                 URL: https://issues.apache.org/jira/browse/PIG-1156
             Project: Pig
          Issue Type: Improvement
            Reporter: Dmitriy V. Ryaboy
            Assignee: Dmitriy V. Ryaboy
             Fix For: 0.7.0


Currently, the way to use multi-query from Java is as follows:

1. pigServer.setBatchOn();
2. Register your queries with pigServer.
3. List<ExecJob> jobs = pigServer.executeBatch();
4. for (ExecJob job : jobs) { Iterator<Tuple> results = job.getResults(); }

This causes all stores to be evaluated in a single batch. However, there is no way to tell which ExecJob corresponds to which store. We should give each ExecJob the alias of the relation it stored, so that the user can identify what each job corresponds to.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-1156:
-----------------------------------

    Attachment: pig_batchAliases.patch



[jira] Closed: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1156.
---------------------------




[jira] Commented: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791489#action_12791489 ] 

Dmitriy V. Ryaboy commented on PIG-1156:
----------------------------------------

The attached patch adds a new field, alias, to both ExecJob and PhysicalOperator.

The PhysicalOperator alias is set to the alias of the LogicalOperator that was compiled into this PO. When multiple POs are needed to represent a single LogicalOperator, all of them get the same alias. Note that this means *there is a one-to-many correspondence* between LogicalOperator aliases and PhysicalOperator aliases.
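To illustrate the one-to-many rule, here is a minimal sketch using hypothetical, simplified types (not Pig's actual classes): compiling one logical operator may yield several physical operators, and each of them is stamped with the logical operator's alias. The COGROUP expansion into POLocalRearrange, POGlobalRearrange and POPackage is only an illustrative assumption about how such a split might look.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AliasPropagation {
    // Hypothetical, simplified physical operator: a kind name plus the alias it inherited.
    static final class PhysicalOp {
        final String kind;
        final String alias;

        PhysicalOp(String kind, String alias) {
            this.kind = kind;
            this.alias = alias;
        }

        @Override
        public String toString() {
            return kind + "[" + alias + "]";
        }
    }

    // Illustrative expansion table: some logical operators need several physical ones.
    static final Map<String, List<String>> EXPANSION = Map.of(
            "COGROUP", List.of("POLocalRearrange", "POGlobalRearrange", "POPackage"),
            "FILTER", List.of("POFilter"));

    // Compile one logical operator (kind + alias); every physical operator
    // produced for it receives the same alias.
    static List<PhysicalOp> compile(String logicalKind, String alias) {
        List<PhysicalOp> out = new ArrayList<>();
        for (String kind : EXPANSION.getOrDefault(logicalKind, List.of("PO" + logicalKind))) {
            out.add(new PhysicalOp(kind, alias));
        }
        return out;
    }

    public static void main(String[] args) {
        // B = COGROUP A BY $0;  ->  three physical operators, all aliased "B"
        System.out.println(compile("COGROUP", "B"));
    }
}
```

Running `main` prints `[POLocalRearrange[B], POGlobalRearrange[B], POPackage[B]]`: one alias on the logical side, three operators carrying it on the physical side.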

POStore is assigned the alias of the relation being stored -- so, "store A into ...." will have the alias 'A'.

ExecJob also gets an alias, which is assigned to it based on the alias of its POStore.
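A minimal sketch of how the stored relation's alias could flow from the store statement into the job, assuming hypothetical `Store` and `Job` stand-ins for POStore and ExecJob, and a deliberately simplified regex that only recognizes `store <alias> into '<location>'` (real Pig Latin parsing is far richer):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StoreAlias {
    // Simplified, case-insensitive pattern for "store <alias> into '<location>'".
    static final Pattern STORE = Pattern.compile("(?i)^\\s*store\\s+(\\w+)\\s+into\\s+'([^']*)'");

    // Hypothetical stand-in for POStore: remembers which relation it stores.
    static final class Store {
        final String alias;
        final String location;

        Store(String alias, String location) {
            this.alias = alias;
            this.location = location;
        }
    }

    // Hypothetical stand-in for ExecJob: its alias is taken from its store.
    static final class Job {
        final Store store;

        Job(Store store) {
            this.store = store;
        }

        String getAlias() {
            return store.alias;
        }
    }

    // The store operator is assigned the alias of the relation being stored.
    static Store parseStore(String stmt) {
        Matcher m = STORE.matcher(stmt);
        if (!m.find()) {
            throw new IllegalArgumentException("not a store statement: " + stmt);
        }
        return new Store(m.group(1), m.group(2));
    }

    public static void main(String[] args) {
        Job job = new Job(parseStore("store A into 'out/a' using PigStorage();"));
        System.out.println(job.getAlias()); // prints "A"
    }
}
```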

This lets us call pigServer.executeBatch(), get a List of ExecJobs, and identify each ExecJob by the name of the relation it stored, so we can fetch the appropriate result iterator for each relation.
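A minimal sketch of that lookup, assuming a hypothetical `Job` stand-in for ExecJob with a `getAlias()` accessor (the name is illustrative) and plain strings in place of Tuples. Jobs are grouped into lists because a script may store the same relation to more than one location:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchByAlias {
    // Hypothetical stand-in for ExecJob: the alias of the stored relation
    // plus its results (strings here instead of Tuples).
    static final class Job {
        private final String alias;
        private final List<String> results;

        Job(String alias, List<String> results) {
            this.alias = alias;
            this.results = results;
        }

        String getAlias() {
            return alias;
        }

        Iterator<String> getResults() {
            return results.iterator();
        }
    }

    // Group the jobs returned by a batch run under the alias each one stored.
    static Map<String, List<Job>> indexByAlias(List<Job> jobs) {
        Map<String, List<Job>> byAlias = new LinkedHashMap<>();
        for (Job job : jobs) {
            byAlias.computeIfAbsent(job.getAlias(), k -> new ArrayList<>()).add(job);
        }
        return byAlias;
    }

    public static void main(String[] args) {
        // Simulated executeBatch() output for "store A ..." and "store B ...".
        List<Job> jobs = List.of(
                new Job("A", List.of("(1,a)", "(2,b)")),
                new Job("B", List.of("(3,c)")));
        // Fetch the results for relation B by name rather than by list position.
        for (Job job : indexByAlias(jobs).get("B")) {
            Iterator<String> it = job.getResults();
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }
}
```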

Note that adding aliases to PhysicalOperators will allow us to generate more meaningful plans and error messages, as users will be able to correlate elements of the physical plan with their Pig Latin script. This brings us a step closer to solving PIG-908.
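As a rough illustration of that benefit, here is a sketch with a hypothetical `PhysicalOp` type and made-up output formats (neither reflects Pig's actual explain or error output): once every physical operator carries an alias, both a plan dump and an error message can name the user's relation.

```java
import java.util.List;

public class AliasedExplain {
    // Hypothetical physical operator carrying the alias of its source relation.
    static final class PhysicalOp {
        final String kind;
        final String alias;

        PhysicalOp(String kind, String alias) {
            this.kind = kind;
            this.alias = alias;
        }
    }

    // Render a linear plan one operator per line, tagged with its alias.
    static String render(List<PhysicalOp> plan) {
        StringBuilder sb = new StringBuilder();
        for (PhysicalOp op : plan) {
            sb.append(op.kind).append(" (alias: ").append(op.alias).append(")\n");
        }
        return sb.toString();
    }

    // Build an error message that points back to the Pig Latin relation.
    static String errorFor(PhysicalOp op, String cause) {
        return "Error in " + op.kind + " for relation '" + op.alias + "': " + cause;
    }

    public static void main(String[] args) {
        List<PhysicalOp> plan = List.of(
                new PhysicalOp("POLoad", "A"),
                new PhysicalOp("POFilter", "B"),
                new PhysicalOp("POStore", "B"));
        System.out.print(render(plan));
        System.out.println(errorFor(plan.get(1), "division by zero"));
    }
}
```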




[jira] Updated: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1156:
----------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed.  Thanks Dmitriy.



[jira] Commented: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792575#action_12792575 ] 

Alan Gates commented on PIG-1156:
---------------------------------

No worries on the release audit warnings.  I'll review the patch.



[jira] Commented: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791768#action_12791768 ] 

Dmitriy V. Ryaboy commented on PIG-1156:
----------------------------------------

The release audit warnings are about the missing Apache license headers on the automatically generated HTML doc files. I'm not sure what I can do about that.

Please review / advise.



[jira] Updated: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-1156:
-----------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791745#action_12791745 ] 

Hadoop QA commented on PIG-1156:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428195/pig_batchAliases.patch
  against trunk revision 891419.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 7 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 406 release audit warnings (more than the trunk's current 402 warnings).

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/132/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/132/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/132/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/132/console

This message is automatically generated.
