Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2011/07/15 03:08:01 UTC

[jira] [Created] (PIG-2166) UDFs to flatten a bag

UDFs to flatten a bag
---------------------

                 Key: PIG-2166
                 URL: https://issues.apache.org/jira/browse/PIG-2166
             Project: Pig
          Issue Type: Improvement
            Reporter: Daniel Dai
            Priority: Minor


We have received several requests for a UDF to flatten a bag. It seems reasonable to add these to the builtins:
1. BagToTuple: {(a),(b),(c)} -> (a,b,c)
2. BagToString(delimit="_"): {(a),(b),(c)} -> "a_b_c"
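
The two conversions proposed above can be modeled in a few lines of Python. This is only a sketch of the intended semantics, not the Pig implementation: a bag is represented as a list of tuples, and the function names bag_to_tuple and bag_to_string are hypothetical stand-ins for the proposed UDFs.

```python
def bag_to_tuple(bag):
    """Concatenate the fields of every tuple in the bag into one tuple."""
    return tuple(field for tup in bag for field in tup)


def bag_to_string(bag, delimiter):
    """Join the fields of every tuple in the bag into one delimited string."""
    return delimiter.join(str(field) for field in bag_to_tuple(bag))


bag = [("a",), ("b",), ("c",)]   # models the Pig bag {(a),(b),(c)}
print(bag_to_tuple(bag))         # ('a', 'b', 'c')
print(bag_to_string(bag, "_"))   # a_b_c
```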

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2166) UDFs to join a bag

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2166:
----------------------------

    Summary: UDFs to join a bag  (was: UDFs to flatten a bag)
    
> UDFs to join a bag
> ------------------
>
>                 Key: PIG-2166
>                 URL: https://issues.apache.org/jira/browse/PIG-2166
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Daniel Dai
>            Assignee: Hien Luu
>            Priority: Minor
>              Labels: newbie, simple

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292149#comment-13292149 ] 

Daniel Dai commented on PIG-2166:
---------------------------------

Hi, Hien,
Are you able to run e2e tests? Do you need any help?
                
> UDFs to join a bag
> ------------------
>
>                 Key: PIG-2166
>                 URL: https://issues.apache.org/jira/browse/PIG-2166
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Daniel Dai
>            Assignee: Hien Luu
>            Priority: Minor
>              Labels: newbie, simple
>         Attachments: PIG-2166.diff, bagtotuplestring.diff, test_harnesss_1338753364

[jira] [Updated] (PIG-2166) UDFs to join a bag

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2166:
----------------------------

       Resolution: Fixed
    Fix Version/s: 0.11
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

+1.

Patch committed to trunk.

Thanks Hien!
                
> UDFs to join a bag
> ------------------
>
>                 Key: PIG-2166
>                 URL: https://issues.apache.org/jira/browse/PIG-2166
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Daniel Dai
>            Assignee: Hien Luu
>            Priority: Minor
>              Labels: newbie, simple
>             Fix For: 0.11
>
>         Attachments: PIG-2166-e2e.diff, PIG-2166.diff, bagtotuplestring.diff, test_harnesss_1338753364

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Hien Luu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290753#comment-13290753 ] 

Hien Luu commented on PIG-2166:
-------------------------------

Awesome.  I was able to make some progress after "svn up".
                

[jira] [Updated] (PIG-2166) UDFs to join a bag

Posted by "Hien Luu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hien Luu updated PIG-2166:
--------------------------

    Status: Patch Available  (was: Open)

I tested these UDFs in a Pig script by modifying one of the scripts in the tutorial directory.  That was manual testing, so it is not ideal.

The tests that call the exec method directly are in TestBuiltInBagToTupleOrString.java.

For these two UDFs, is it necessary to test them in a Pig script?  
                
> UDFs to join a bag
> ------------------
>
>                 Key: PIG-2166
>                 URL: https://issues.apache.org/jira/browse/PIG-2166
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Daniel Dai
>            Assignee: Hien Luu
>            Priority: Minor
>              Labels: newbie, simple
>         Attachments: bagtotuplestring.diff

[jira] [Updated] (PIG-2166) UDFs to join a bag

Posted by "Hien Luu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hien Luu updated PIG-2166:
--------------------------

    Attachment: PIG-2166-e2e.diff

Added 2 test groups to nightly.conf.  There are 4 tests in total, and they all passed.
                
> UDFs to join a bag
> ------------------
>
>                 Key: PIG-2166
>                 URL: https://issues.apache.org/jira/browse/PIG-2166
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Daniel Dai
>            Assignee: Hien Luu
>            Priority: Minor
>              Labels: newbie, simple
>         Attachments: PIG-2166-e2e.diff, PIG-2166.diff, bagtotuplestring.diff, test_harnesss_1338753364

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281704#comment-13281704 ] 

Thejas M Nair commented on PIG-2166:
------------------------------------

Bags in Pig are expected to contain tuples. So the bag should actually be {(a),({(b,c)}),(d)}, and the output of BagToTuple on it should be the same as what Daniel said: (a,{(b,c)},d).

                

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Hien Luu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292185#comment-13292185 ] 

Hien Luu commented on PIG-2166:
-------------------------------

Yes, I am able to run e2e tests.  I am hoping to finish adding tests to nightly.conf this weekend :)
                

[jira] [Commented] (PIG-2166) UDFs to flatten a bag

Posted by "Hien Luu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279595#comment-13279595 ] 

Hien Luu commented on PIG-2166:
-------------------------------

Hi Daniel,

These two UDFs will perform flattening only at the first level, right? They don't need to recursively flatten nested bags, do they?

For example:
For the input {(a),{(b,c)},(d)}, will the output be (a,{(b,c)},d), or should it be (a,b,c,d)?

I don't think they should, but just wanted to double check.
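
The two candidate behaviors in the question above can be compared with a small Python model. It is a sketch under assumptions, not Pig code: bags are modeled as lists, tuples as Python tuples, and the inner bag is wrapped in its own tuple, as {(a),({(b,c)}),(d)}, so that every element of the outer bag is a tuple. The function names are hypothetical.

```python
def flatten_first_level(bag):
    """Unwrap only the outer tuples; nested bags stay intact as fields."""
    return tuple(field for tup in bag for field in tup)


def flatten_recursive(value):
    """Unwrap every nested bag (list) and tuple down to the scalar fields."""
    if isinstance(value, (list, tuple)):
        return tuple(f for item in value for f in flatten_recursive(item))
    return (value,)


# Models {(a),({(b,c)}),(d)}: the inner bag is a field of the second tuple.
bag = [("a",), ([("b", "c")],), ("d",)]
print(flatten_first_level(bag))  # ('a', [('b', 'c')], 'd')
print(flatten_recursive(bag))    # ('a', 'b', 'c', 'd')
```

The first function corresponds to the (a,{(b,c)},d) answer, the second to (a,b,c,d).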
                
> UDFs to flatten a bag
> ---------------------
>
>                 Key: PIG-2166
>                 URL: https://issues.apache.org/jira/browse/PIG-2166
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Daniel Dai
>            Assignee: Hien Luu
>            Priority: Minor
>              Labels: newbie, simple

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285866#comment-13285866 ] 

Daniel Dai commented on PIG-2166:
---------------------------------

Hi, Hien,
Patch looks good. For BagToString, it is better to have a default delimiter, so it does not complain if we don't pass one.
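
A minimal Python sketch of what a default delimiter buys the caller. The default value "_" is an assumption taken from the delimit="_" example in the issue description, and bag_to_string is a hypothetical model of the UDF, not its implementation.

```python
DEFAULT_DELIMITER = "_"  # assumed default, taken from the delimit="_" example


def bag_to_string(bag, delimiter=DEFAULT_DELIMITER):
    """Join all fields of all tuples in the bag; the delimiter is optional."""
    return delimiter.join(str(field) for tup in bag for field in tup)


bag = [("a",), ("b",), ("c",)]
print(bag_to_string(bag))        # a_b_c  (no delimiter passed)
print(bag_to_string(bag, "-"))   # a-b-c  (explicit delimiter)
```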

It also makes sense to add some e2e tests to test/e2e/pig/tests/nightly.conf. You can use the input file studentcomplextab10k, which contains bags. Refer to https://cwiki.apache.org/confluence/display/PIG/HowToTest for how to run e2e tests.
                

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Hien Luu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286311#comment-13286311 ] 

Hien Luu commented on PIG-2166:
-------------------------------

I ran into a problem when trying to generate test data using the command "ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir -Dharness.cluster.bin=hadoop_script test-e2e-deploy" from the https://cwiki.apache.org/confluence/display/PIG/HowToTest page.

Can't locate IPC/Run.pm in @INC (@INC contains: . . . ./libexec . . ./libexec /Library/Perl/Updates/5.10.0 /System/Library/Perl/

Then I tried to install IPC::Run perl module and ran into another error.

On cpan.org, there is this paragraph:
"OSX comes with Perl pre-installed, in order to build and install your own modules you will need to install the 'developer' package which can be found on your OSX install DVD (you only need the 'unix tools'). Once you have done this you can use all of the tools mentioned above."

Do you know the 'developer' package it is talking about?

BTW, I am on Mac OSX.

Thanks,

Hien
                

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Hien Luu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286692#comment-13286692 ] 

Hien Luu commented on PIG-2166:
-------------------------------

I had to upgrade to Xcode version 3.2 to get past the IPC::Run installation error.  It was complaining about a missing header file.

Does it really take 10 hours to complete the e2e tests when running in local mode?  Is there a way to run a specific set of tests only?
                

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290315#comment-13290315 ] 

Daniel Dai commented on PIG-2166:
---------------------------------

We fixed it on trunk and the 0.10 branch. Please try "svn up" and run again. (You don't need PH_BENCHMARK_CACHE_PATH.)
                

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Hien Luu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290285#comment-13290285 ] 

Hien Luu commented on PIG-2166:
-------------------------------

How do I set a value for this environment variable?

I tried to add an environment variable to my .bash_profile like below and still ran into the same issue:

export PH_BENCHMARK_CACHE_PATH=/Users/hluu/dev/pig_project/pig/cache

Here is the error in the test harness log file (<pig home>/test/e2e/pig/testdist/out/log/test_harnesss_1339002322):


sort ./out/pigtest/hluu/hluu.1339002409/Distinct_1.out/out_original
ERROR TestDriver::run at : 470 Failed to run test Distinct_1 <Unable to open file ${PH_BENCHMARK_CACHE_PATH}/Distinct_1_benchmark.pig to write pig script, No such file or directory
>


This issue is blocking my progress.  Please let me know what needs to be done so I can move forward.

Thanks.
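
One plausible reading of the error above is that the ${PH_BENCHMARK_CACHE_PATH} placeholder was never substituted before the file was opened. The Python sketch below only illustrates that failure mode; it is not the test harness's own logic, and the helper name expand_path and the /tmp/cache value are hypothetical.

```python
import os
from string import Template


def expand_path(path_template, env=None):
    """Substitute ${VAR} references; unknown variables are left as-is."""
    env = os.environ if env is None else env
    return Template(path_template).safe_substitute(env)


template = "${PH_BENCHMARK_CACHE_PATH}/Distinct_1_benchmark.pig"

# Variable missing from the mapping: the placeholder survives verbatim,
# so any attempt to open the resulting path cannot find the directory.
print(expand_path(template, env={}))

# Variable present: the placeholder is replaced with the configured directory.
print(expand_path(template, env={"PH_BENCHMARK_CACHE_PATH": "/tmp/cache"}))
```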
                

[jira] [Updated] (PIG-2166) UDFs to join a bag

Posted by "Hien Luu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hien Luu updated PIG-2166:
--------------------------

    Attachment: bagtotuplestring.diff

I tried my best to follow the convention about which exception to throw in a UDF.  Let me know if I missed anything.
                

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Gianmarco De Francisci Morales (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281714#comment-13281714 ] 

Gianmarco De Francisci Morales commented on PIG-2166:
-----------------------------------------------------

I think we need at least 2 delimiters, one for bag elements (which are tuples) and one for tuple elements (which are anything), but I am not sure it is worth supporting nested structures in the tuples.
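
The two-delimiter idea can be sketched in Python as follows. The parameter names and both default values are assumptions made for illustration, and nested structures inside a tuple are simply rendered with str() rather than handled specially.

```python
def bag_to_string(bag, tuple_delimiter="_", field_delimiter=","):
    """Join fields within each tuple, then join the tuples of the bag."""
    return tuple_delimiter.join(
        field_delimiter.join(str(field) for field in tup) for tup in bag
    )


bag = [("a", 1), ("b", 2)]             # models the Pig bag {(a,1),(b,2)}
print(bag_to_string(bag))              # a,1_b,2
print(bag_to_string(bag, "|", ":"))    # a:1|b:2
```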
                

[jira] [Commented] (PIG-2166) UDFs to flatten a bag

Posted by "Julien Le Dem (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279202#comment-13279202 ] 

Julien Le Dem commented on PIG-2166:
------------------------------------

Hi Alan, I was merely commenting on the title of the JIRA, not the UDF name.
                

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289821#comment-13289821 ] 

Daniel Dai commented on PIG-2166:
---------------------------------

It is the cache directory for benchmark files. 
                

[jira] [Commented] (PIG-2166) UDFs to flatten a bag

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279128#comment-13279128 ] 

Alan Gates commented on PIG-2166:
---------------------------------

-1 to join.  We already use that for another concept.  What's wrong with BagToTuple and BagToString?
                

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286700#comment-13286700 ] 

Thejas M Nair commented on PIG-2166:
------------------------------------

> Is there a way to run a specific set of tests only?
Yes. For example, to run test number 1 in the Checkin test group, add the following parameter to the command line:   -Dtests.to.run="-t Checkin_1"
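
For scripting this, the command line can be assembled as an argument list in Python, which sidesteps the shell quoting around the space in "-t Checkin_1". This is a sketch: the helper name e2e_command is hypothetical, and the test-e2e-local target and the harness.old.pig property are taken from the commands quoted elsewhere in this thread.

```python
import shlex


def e2e_command(test_name, properties=None, target="test-e2e-local"):
    """Build the ant argument list that runs a single e2e test."""
    argv = ["ant"]
    for key, value in (properties or {}).items():
        argv.append(f"-D{key}={value}")
    # One argv element, so no extra quoting is needed for the space.
    argv.append(f"-Dtests.to.run=-t {test_name}")
    argv.append(target)
    return argv


cmd = e2e_command("Checkin_1", {"harness.old.pig": "old_pig"})
print(shlex.join(cmd))  # shell-safe rendering of the same command
```

The list can be passed directly to subprocess.run without going through a shell.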
                

[jira] [Commented] (PIG-2166) UDFs to flatten a bag

Posted by "Julien Le Dem (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13275913#comment-13275913 ] 

Julien Le Dem commented on PIG-2166:
------------------------------------

We should not say "flatten" in that case. What about saying "join"?
                

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Hien Luu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286729#comment-13286729 ] 

Hien Luu commented on PIG-2166:
-------------------------------

Cool. Thanks for the answers and suggestions.  Very helpful.
                

[jira] [Commented] (PIG-2166) UDFs to flatten a bag

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276056#comment-13276056 ] 

Daniel Dai commented on PIG-2166:
---------------------------------

Sounds good.
                

[jira] [Updated] (PIG-2166) UDFs to join a bag

Posted by "Hien Luu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hien Luu updated PIG-2166:
--------------------------

    Attachment: test_harnesss_1338753364

I tried to add more tests to nightly.conf and I kept getting an error when trying to run it.

Here is the command I used to run a specific test:

ant -Dharness.old.pig=/Users/hluu/dev/pig_project/pig/old_pig/pig-0.10.0 -Dharness.cluster.conf=hadoop_conf_dir -Dharness.cluster.bin=hadoop_script test-e2e-local -Dtests.to.run="-t BagToString_1"

The test log file is attached and the error is on line 412.

Here is line 412:

ERROR TestDriver::run at : 470 Failed to run test BagToString_1 <Unable to open file ${PH_BENCHMARK_CACHE_PATH}/BagToString_1_benchmark.pig to write pig script, No such file or directory
>


Any ideas? What is PH_BENCHMARK_CACHE_PATH?

Thanks.
                

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Julien Le Dem (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286721#comment-13286721 ] 

Julien Le Dem commented on PIG-2166:
------------------------------------

Here is an example of how you could test your UDF in a Pig script from a Java unit test:
http://svn.apache.org/viewvc/pig/trunk/test/org/apache/pig/builtin/mock/TestMockStorage.java?revision=1331070&view=markup
                

[jira] [Updated] (PIG-2166) UDFs to join a bag

Posted by "Hien Luu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hien Luu updated PIG-2166:
--------------------------

    Attachment: PIG-2166.diff

Added support for a default delimiter in the BagToString UDF, plus more tests using an embedded PigServer.
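
For readers without the patch at hand, here is a minimal sketch of what a default delimiter could look like. It is not the code in PIG-2166.diff: the class name BagToStringSketch is hypothetical, plain java.util lists stand in for Pig's bag and tuple types, and the "_" default is an assumption taken from the example in the issue description.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical stand-in for BagToString; lists model Pig bags and tuples. */
final class BagToStringSketch {
    // Assumed default, taken from the "a_b_c" example in the issue description.
    static final String DEFAULT_DELIMITER = "_";

    /** One-argument form: falls back to the default delimiter. */
    static String bagToString(List<List<Object>> bag) {
        return bagToString(bag, DEFAULT_DELIMITER);
    }

    /** Joins every field of every tuple in the bag with the given delimiter. */
    static String bagToString(List<List<Object>> bag, String delimiter) {
        StringJoiner joiner = new StringJoiner(delimiter);
        for (List<Object> tuple : bag) {
            for (Object field : tuple) {
                // String.valueOf renders a null field as the text "null".
                joiner.add(String.valueOf(field));
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<List<Object>> bag = Arrays.asList(
            Arrays.<Object>asList("a"),
            Arrays.<Object>asList("b"),
            Arrays.<Object>asList("c"));
        System.out.println(bagToString(bag));      // prints a_b_c
        System.out.println(bagToString(bag, ",")); // prints a,b,c
    }
}
```

Keeping the default in a single overload means the two call forms cannot drift apart; how the real UDF handles nulls and nested values may differ from this sketch.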
                

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286676#comment-13286676 ] 

Thejas M Nair commented on PIG-2166:
------------------------------------

This might also work for you: cpan install IPC::Run
                

[jira] [Commented] (PIG-2166) UDFs to flatten a bag

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279988#comment-13279988 ] 

Daniel Dai commented on PIG-2166:
---------------------------------

Hi, Hien, I don't think either. BagToTuple({(a),{(b,c)},(d)}) should return (a,{(b,c)},d); otherwise the result is ambiguous.
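
The one-level behaviour described above can be sketched as follows. This is an illustration only, not the attached patch: BagToTupleSketch is a hypothetical name, and plain java.util lists stand in for Pig's bag and tuple types. Each tuple's fields are appended in order, and a field that is itself a bag is carried over untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for BagToTuple; lists model Pig bags and tuples. */
final class BagToTupleSketch {
    /** Concatenates the fields of every tuple in the bag, one level deep only. */
    static List<Object> bagToTuple(List<List<Object>> bag) {
        List<Object> result = new ArrayList<>();
        for (List<Object> tuple : bag) {
            // Fields are copied as-is, so a nested bag stays a single field.
            result.addAll(tuple);
        }
        return result;
    }

    public static void main(String[] args) {
        // Inner bag {(b,c)}: a bag holding one two-field tuple.
        List<List<Object>> innerBag =
            Arrays.asList(Arrays.<Object>asList("b", "c"));
        // Outer bag: (a), a tuple whose only field is the inner bag, (d).
        List<List<Object>> bag = Arrays.asList(
            Arrays.<Object>asList("a"),
            Arrays.<Object>asList(innerBag),
            Arrays.<Object>asList("d"));
        System.out.println(bagToTuple(bag)); // prints [a, [[b, c]], d]
    }
}
```

The result always has one entry per input field, so the output shape is predictable; recursing into nested bags would make it impossible to tell from the output where one level ended and the next began.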


                

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286317#comment-13286317 ] 

Daniel Dai commented on PIG-2166:
---------------------------------

You need to install the IPC::Run module. I usually download it from http://search.cpan.org/~toddr/IPC-Run-0.91/lib/IPC/Run.pm, then build and install it.
                

[jira] [Commented] (PIG-2166) UDFs to join a bag

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286705#comment-13286705 ] 

Prashant Kommireddi commented on PIG-2166:
------------------------------------------

You can also run a single unit test if you would like: https://cwiki.apache.org/PIG/howtotest.html#HowToTest-Runningasingleunittest
                