Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2009/12/14 21:53:23 UTC

[jira] Created: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Allow instantiation of SampleLoaders with parametrized LoadFuncs
----------------------------------------------------------------

                 Key: PIG-1149
                 URL: https://issues.apache.org/jira/browse/PIG-1149
             Project: Pig
          Issue Type: Bug
            Reporter: Dmitriy V. Ryaboy
            Assignee: Dmitriy V. Ryaboy
            Priority: Minor
             Fix For: 0.7.0


Currently, it is not possible to instantiate a SampleLoader with something like PigStorage(':').  We should allow passing parameters to the loaders being sampled.
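As a rough illustration of the request, the sketch below splits a spec string such as PigStorage(':') into a class name and constructor arguments, then instantiates the class reflectively with String arguments. This is not the code from the patch: FuncSpecSketch, ParsedSpec, parse and instantiate are hypothetical names, and java.lang.StringBuilder stands in for a loader class only so that the example runs without Pig on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FuncSpecSketch {

    /** Hypothetical holder for a parsed spec such as PigStorage(':'). */
    static final class ParsedSpec {
        final String className;
        final List<String> args;

        ParsedSpec(String className, List<String> args) {
            this.className = className;
            this.args = args;
        }
    }

    /** Splits "Name('a','b')" into a class name and unquoted arguments. */
    static ParsedSpec parse(String spec) {
        int open = spec.indexOf('(');
        if (open < 0) {
            // No parentheses: a bare class name with no arguments.
            return new ParsedSpec(spec.trim(), new ArrayList<String>());
        }
        int close = spec.lastIndexOf(')');
        if (close < open) {
            throw new IllegalArgumentException("Unbalanced parentheses: " + spec);
        }
        String name = spec.substring(0, open).trim();
        String body = spec.substring(open + 1, close).trim();
        List<String> args = new ArrayList<String>();
        if (!body.isEmpty()) {
            // Naive split: a comma inside a quoted argument is NOT handled.
            for (String raw : body.split(",")) {
                String arg = raw.trim();
                if (arg.length() >= 2 && arg.startsWith("'") && arg.endsWith("'")) {
                    arg = arg.substring(1, arg.length() - 1);
                }
                args.add(arg);
            }
        }
        return new ParsedSpec(name, args);
    }

    /** Instantiates the class through a constructor that takes only Strings. */
    static Object instantiate(ParsedSpec spec) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(spec.className);
        Class<?>[] types = new Class<?>[spec.args.size()];
        Arrays.fill(types, String.class);
        return cls.getConstructor(types).newInstance(spec.args.toArray());
    }

    public static void main(String[] argv) throws Exception {
        ParsedSpec spec = parse("PigStorage(':')");
        System.out.println(spec.className + " " + spec.args);

        // StringBuilder is only a stand-in for a loader with a String constructor.
        Object built = instantiate(parse("java.lang.StringBuilder('abc')"));
        System.out.println(built);
    }
}
```

The comma split is deliberately simplistic; a delimiter argument that is itself a comma would need a real tokenizer.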

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793269#action_12793269 ] 

Thejas M Nair commented on PIG-1149:
------------------------------------

+1 to the lsr branch version.
But the FIXME comment in the test case is not correct. There does not have to be > 1 sample for every map if the number of rows is very small. Although this behavior differs from the earlier trunk version of the Poisson sampler, it satisfies the requirements in http://wiki.apache.org/pig/PigSampler and PIG-1062.
I can remove the FIXME comment as part of the patch I am going to submit to fix the other test case.




[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791218#action_12791218 ] 

Hadoop QA commented on PIG-1149:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428005/pig_1149.patch
  against trunk revision 890596.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/126/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/126/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/126/console

This message is automatically generated.



[jira] Updated: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-1149:
-----------------------------------

    Attachment: pig_1149.patch



[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792578#action_12792578 ] 

Pradeep Kamath commented on PIG-1149:
-------------------------------------

I tried applying it on the branch and it failed:
{noformat}
:/tmp/load-store-redesign]patch -p0 < /homes/pradeepk/dev/pig-apache/pig/trunk/pig_1149.patch 
patching file src/org/apache/pig/impl/builtin/SampleLoader.java
Hunk #1 succeeded at 31 with fuzz 2 (offset -4 lines).
Hunk #2 FAILED at 46.
1 out of 2 hunks FAILED -- saving rejects to file src/org/apache/pig/impl/builtin/SampleLoader.java.rej
patching file test/org/apache/pig/test/TestPoissonSampleLoader.java
[pradeepk@chargesize:/tmp/load-store-redesign]
{noformat}

Since Thejas worked on PIG-1062, he might be in a better position to check whether this patch needs changes.



[jira] Updated: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-1149:
-----------------------------------

    Attachment: pig_1149_lsr-branch.patch

Attaching patch for lsr branch.
I also retabbed the involved files to replace tabs with spaces, and got rid of some unused imports.

Note the FIXME in the test case, as discussed.



[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792573#action_12792573 ] 

Alan Gates commented on PIG-1149:
---------------------------------

Changes look fine.

Pradeep, will this apply as is to the load-store redesign branch or will we need a separate patch for that?



[jira] Closed: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1149.
---------------------------




[jira] Updated: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-1149:
-----------------------------------

    Status: Patch Available  (was: Open)

Because the string is parsed several times along the way, three backslashes must precede the escaped quote in Pig Latin, which means six backslashes when the Pig Latin is written as a string literal in Java.
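The Java half of that arithmetic can be checked in isolation. In the sketch below, six backslashes in Java source followed by a quote compile to a four-character string: three backslashes and the quote. BackslashSketch and its members are illustrative names only, and the example says nothing about how Pig itself goes on to parse the result.

```java
public class BackslashSketch {

    // Java source: six backslashes followed by a single quote.
    // Each pair of backslashes compiles to one backslash character.
    static final String FRAGMENT = "\\\\\\'";

    /** Counts the backslash characters at the start of a string. */
    static int leadingBackslashes(String s) {
        int count = 0;
        while (count < s.length() && s.charAt(count) == '\\') {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("runtime text: " + FRAGMENT);
        System.out.println("length: " + FRAGMENT.length());
        System.out.println("backslashes: " + leadingBackslashes(FRAGMENT));
    }
}
```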



[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792627#action_12792627 ] 

Thejas M Nair commented on PIG-1149:
------------------------------------

I notice that this patch uses org.mortbay.log instead of org.apache.commons.logging. org.mortbay.log is not used anywhere else in the Pig code. Should we replace it with org.apache.commons.logging?

A small change is required to get the patch working with the load-store branch. That branch no longer requires the load func to implement the SamplableLoader interface, and that interface has been removed. I can submit the modified patch.
{code}
+        loader = (SamplableLoader)PigContext.instantiateFuncFromSpec(funcSpec);
{code}
changes to
{code}
+        loader = (LoadFunc)PigContext.instantiateFuncFromSpec(funcSpec);
{code}




[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793321#action_12793321 ] 

Olga Natkovich commented on PIG-1149:
-------------------------------------

Patch pig_1149.patch is committed to the trunk.



[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792698#action_12792698 ] 

Thejas M Nair commented on PIG-1149:
------------------------------------

I spoke too soon about the special string being unnecessary: GetMemNumRows uses it. I will add some comments to document that in PoissonSampleLoader.
In my previous comment, "special string gets added to the last row only" should read "special string gets added to the last *sample* row only".
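For test code that has to cope with that extra data, a check along the following lines may help. It is only a sketch: it assumes the layout of the single tuple quoted elsewhere in this thread (data fields, then the marker string, then a row count), it matches only the ASCII tail of the marker so the source file stays ASCII, and MarkerRowSketch, hasRowCountMarker and dataFields are hypothetical names, not Pig APIs.

```java
import java.util.Arrays;
import java.util.List;

public class MarkerRowSketch {

    // ASCII tail of the marker seen in the tuple quoted in this thread;
    // the two leading non-ASCII characters are deliberately left out.
    static final String MARKER_SUFFIX =
        "_pig_inTeRnal-spEcial_roW_num_tuple3kt579CFLehkblah";

    /** True if the second-to-last field looks like the special marker. */
    static boolean hasRowCountMarker(List<Object> row) {
        if (row.size() < 2) {
            return false;
        }
        Object candidate = row.get(row.size() - 2);
        return candidate instanceof String
            && ((String) candidate).endsWith(MARKER_SUFFIX);
    }

    /** Returns the row without the trailing marker and count, if present. */
    static List<Object> dataFields(List<Object> row) {
        return hasRowCountMarker(row) ? row.subList(0, row.size() - 2) : row;
    }

    public static void main(String[] args) {
        List<Object> last = Arrays.<Object>asList(
            100, "apple1", "aaa0", "xx" + MARKER_SUFFIX, 300L);
        List<Object> plain = Arrays.<Object>asList(100, "apple1", "aaa0");

        System.out.println(hasRowCountMarker(last) + " " + dataFields(last));
        System.out.println(hasRowCountMarker(plain) + " " + dataFields(plain));
    }
}
```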




[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792689#action_12792689 ] 

Thejas M Nair commented on PIG-1149:
------------------------------------

The first test case failure is known; I will fix it with a patch in PIG-1094.

The special string gets added to the last row only, but that looks unnecessary. I will remove it with a new patch in PIG-1062.

You can submit your patch for the LSR branch by checking for 5 columns in your test case. I will change your new test case as well when I submit the new PIG-1062 patch (to check for 4 columns).




[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792636#action_12792636 ] 

Dmitriy V. Ryaboy commented on PIG-1149:
----------------------------------------

The log reference and the log.info call can be removed completely (I can post a new patch, or you can just drop the change when integrating the patch). That's Eclipse being "helpful"...

I already made a patch for the LSR branch, but TestPoissonSampleLoader fails on both tests. I'm not sure why the first one fails, but in the second one (the one I added), the problem is that when I fetch the first result, I get an unexpected number of fields -- the tuple looks like this:

(100,apple1,aaa0,䥖㠸_pig_inTeRnal-spEcial_roW_num_tuple3kt579CFLehkblah,300L)

I don't recall -- does the special stuff get inserted into every row, or just the last one? What should I "properly" expect?



[jira] Updated: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1149:
--------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Patch for lsr branch also committed, thanks Dmitriy!
