Posted to dev@pig.apache.org by "Stan Rosenberg (Created) (JIRA)" <ji...@apache.org> on 2012/03/12 03:06:40 UTC

[jira] [Created] (PIG-2579) Support for multiple input schemas in AvroStorage

Support for multiple input schemas in AvroStorage
-------------------------------------------------

                 Key: PIG-2579
                 URL: https://issues.apache.org/jira/browse/PIG-2579
             Project: Pig
          Issue Type: New Feature
          Components: piggybank
            Reporter: Stan Rosenberg
            Priority: Minor
         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz

This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  

A simple illustrative example is attached: run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474550#comment-13474550 ] 

Santhosh Srinivasan commented on PIG-2579:
------------------------------------------

I ran all the unit test cases and for Hadoop23, there are 2 failures and 1 error. I verified that these failures and error were not related to this patch by reproducing them on the latest source from trunk.

{code}
~/src/apache/pig/trunk/contrib/piggybank/java/build/test/logs $ grep Failures TEST-org.apache.pig.piggybank.test.* | grep -v "Failures: 0"
TEST-org.apache.pig.piggybank.test.storage.TestDBStorage.txt:Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 8.462 sec
TEST-org.apache.pig.piggybank.test.storage.TestMultiStorage.txt:Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 7.989 sec

~/src/apache/pig/trunk/contrib/piggybank/java/build/test/logs $ grep Errors TEST-org.apache.pig.piggybank.test.* | grep -v "Errors: 0"
TEST-org.apache.pig.piggybank.test.evaluation.string.TestLookupInFiles.txt:Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 8.041 sec

{code}

The patch and the updated binaries for unit tests along with the deletions are now committed.

Thanks Cheolsoo.
                
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>            Priority: Minor
>             Fix For: 0.11
>
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz, PIG-2579-2-avro_test_files.tar.gz, PIG-2579-2.patch, PIG-2579-3.patch, PIG-2579-4.patch, PIG-2579-5.patch, PIG-2579-6.patch
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


[jira] [Assigned] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park reassigned PIG-2579:
----------------------------------

    Assignee: Cheolsoo Park
    
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


        

[jira] [Commented] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475740#comment-13475740 ] 

Cheolsoo Park commented on PIG-2579:
------------------------------------

@Santhosh,
Thank you very much.

Btw, regarding the other test failures, I realized that I was hitting MAPREDUCE-3933, and I was able to fix them by setting MALLOC_ARENA_MAX to 4 on my CentOS 6 VM. Please see PIG-2966.

But I cannot reproduce the same failures on my Mac, which seems correct as this issue is CentOS-6-specific. Are you setting JAVA_HOME=`/usr/libexec/java_home` on your Mac? As far as I understand, those 3 tests are the only ones that use MiniCluster, and not setting JAVA_HOME will make them fail.
                

[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2579:
-------------------------------

    Attachment:     (was: PIG-2579-avro_test_files.tar.gz)
    
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz, PIG-2579-2-avro_test_files.tar.gz, PIG-2579-2.patch
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2579:
-------------------------------

    Attachment: PIG-2579-4.patch

Updated the patch based on Santhosh's comments in review board.
                
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz, PIG-2579-2-avro_test_files.tar.gz, PIG-2579-2.patch, PIG-2579-3.patch, PIG-2579-4.patch
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2579:
-------------------------------

    Attachment: PIG-2579-3.patch

I rebased the patch to trunk.
                
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz, PIG-2579-2-avro_test_files.tar.gz, PIG-2579-2.patch, PIG-2579-3.patch
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


[jira] [Commented] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446559#comment-13446559 ] 

Cheolsoo Park commented on PIG-2579:
------------------------------------

Review board:
https://reviews.apache.org/r/6884/diff/
                
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz, PIG-2579-avro_test_files.tar.gz, PIG-2579.patch
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


[jira] [Commented] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475488#comment-13475488 ] 

Cheolsoo Park commented on PIG-2579:
------------------------------------

@Santhosh,
I think that you omitted two files:
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testMultipleSchemas1.avro
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testMultipleSchemas2.avro

TestAvroStorage is failing due to missing files. Can you please commit them to trunk and branch-0.11?

Thanks!
                

[jira] [Commented] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475507#comment-13475507 ] 

Santhosh Srinivasan commented on PIG-2579:
------------------------------------------

My apologies for not adding these files. I have committed both of them to trunk and branch-0.11. Cheolsoo, thanks for pointing it out.
                

[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2579:
-------------------------------

    Attachment: PIG-2579.patch
                PIG-2579-avro_test_files.tar.gz

I updated Stan's original patch, rebasing it onto trunk. While I kept the core logic unchanged, I made the following modifications:
# Removed glob-pattern-related code as it's resolved in PIG-2492.
# Added an option 'multiple_schema' to AvroStorage. By default, AvroStorage assumes that all the input files have the same schema, but if 'multiple_schema' is passed to the load function, it tries to merge every input schema.
# Allowed multiple schemas with the same name. I use paths to identify schemas instead of their names.
# Refactored code.
# Added unit tests.

I think that the most arguable part is how to merge two different schemas into one. In short, the rules are as follows:
# Different primitive types can be merged if certain conditions are met. Please see AvroStorageUtils.mergeType() for more details.
# Only the same kind of complex types can be merged. e.g. record + record => ok, but record + array => error.
# For records, the union of fields is returned.
# For arrays/maps, their element types/value types are merged.
# For unions, the union of unions is returned.
# For fixed types, only those of the same size can be merged.

It's easy to see in a unit test (TestAvroStorageUtils) what's expected when two schemas are merged.
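
For readers without the patch at hand, the merge rules listed above can be sketched roughly as follows. This is an illustrative Python sketch over Avro schemas in their JSON form, not the code in AvroStorageUtils. The function name merge_schemas, the exception SchemaMergeError, and in particular the numeric widening order used for primitive types are assumptions made for this example; the actual conditions for merging primitives are whatever AvroStorageUtils.mergeType() implements.

```python
# Illustrative sketch only: schemas are plain Python values in Avro's JSON form
# (str for primitives, list for unions, dict for complex types).

# Assumption: differing numeric primitives widen in this order. The real
# conditions live in AvroStorageUtils.mergeType() and may differ.
NUMERIC_ORDER = ["int", "long", "float", "double"]
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


class SchemaMergeError(ValueError):
    """Raised when two schemas cannot be merged under the rules above."""


def kind(schema):
    """Return the Avro type name of a schema given in JSON form."""
    if isinstance(schema, str):
        return schema
    if isinstance(schema, list):
        return "union"
    return schema["type"]


def merge_schemas(a, b):
    """Merge two schemas following the six rules listed in the comment."""
    if a == b:
        return a
    ka, kb = kind(a), kind(b)
    # Rule 1 (assumed form): differing numeric primitives widen to the larger.
    if ka in NUMERIC_ORDER and kb in NUMERIC_ORDER:
        return max(ka, kb, key=NUMERIC_ORDER.index)
    # Rule 2: otherwise only schemas of the same kind can be merged.
    if ka != kb:
        raise SchemaMergeError(f"cannot merge {ka} with {kb}")
    if ka in PRIMITIVES:
        return ka
    if ka == "record":
        # Rule 3: union of fields; fields present in both are merged recursively.
        merged = {f["name"]: f["type"] for f in a["fields"]}
        for f in b["fields"]:
            name = f["name"]
            if name in merged:
                merged[name] = merge_schemas(merged[name], f["type"])
            else:
                merged[name] = f["type"]
        fields = [{"name": n, "type": t} for n, t in merged.items()]
        return {"type": "record", "name": a["name"], "fields": fields}
    if ka == "array":
        # Rule 4: merge element types.
        return {"type": "array", "items": merge_schemas(a["items"], b["items"])}
    if ka == "map":
        # Rule 4: merge value types.
        return {"type": "map", "values": merge_schemas(a["values"], b["values"])}
    if ka == "union":
        # Rule 5: union of unions, keeping each branch once.
        return a + [branch for branch in b if branch not in a]
    if ka == "fixed":
        # Rule 6: fixed types must have the same size.
        if a["size"] != b["size"]:
            raise SchemaMergeError("cannot merge fixed types of different sizes")
        return a
    raise SchemaMergeError(f"no merge rule for {ka}")
```

With this sketch, merging a record with fields (id: int, name: string) and a record with fields (id: long, score: double) would give a record with fields id, name, and score, where id is widened to long under the assumed order, while merging a record with an array raises SchemaMergeError.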

Please let me know if you have any questions/concerns.

Thanks!
                
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz, PIG-2579-avro_test_files.tar.gz, PIG-2579.patch
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-2579:
-------------------------------------

    Fix Version/s: 0.11
    

[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-2579:
-------------------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Patch reviewed and committed.
                
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz, PIG-2579-2-avro_test_files.tar.gz, PIG-2579-2.patch, PIG-2579-3.patch, PIG-2579-4.patch, PIG-2579-5.patch, PIG-2579-6.patch
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2579:
-------------------------------

    Attachment: PIG-2579-2-avro_test_files.tar.gz
                PIG-2579-2.patch
    
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz, PIG-2579-2-avro_test_files.tar.gz, PIG-2579-2.patch, PIG-2579-avro_test_files.tar.gz, PIG-2579.patch
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


[jira] [Commented] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474558#comment-13474558 ] 

Cheolsoo Park commented on PIG-2579:
------------------------------------

Thanks Santhosh.

I opened PIG-2966 for the piggybank test failures.
                

[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2579:
-------------------------------

    Attachment:     (was: PIG-2579.patch)
    
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz, PIG-2579-2-avro_test_files.tar.gz, PIG-2579-2.patch
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2579:
-------------------------------

    Attachment: PIG-2579-6.patch

Updating the patch.

@Santhosh,
Can you please also remove the following files when committing the patch? They are no longer used by tests so should be deleted.
{code}
#	deleted:    contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_generic_union_schema.avro
#	deleted:    contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_recursive_schema.avro
{code}
                
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz, PIG-2579-2-avro_test_files.tar.gz, PIG-2579-2.patch, PIG-2579-3.patch, PIG-2579-4.patch, PIG-2579-5.patch, PIG-2579-6.patch
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Stan Rosenberg (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stan Rosenberg updated PIG-2579:
--------------------------------

          Description: 
This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  

A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.

  was:
This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  

A simple illustrative example is attached: run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.

    Affects Version/s: 0.11
                       0.9.2
    
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2579:
-------------------------------

    Status: Patch Available  (was: Open)
    
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz, PIG-2579-2-avro_test_files.tar.gz, PIG-2579-2.patch, PIG-2579-avro_test_files.tar.gz, PIG-2579.patch
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2579:
-------------------------------

    Attachment: PIG-2579-5.patch
    
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>    Affects Versions: 0.9.2, 0.11
>            Reporter: Stan Rosenberg
>            Assignee: Cheolsoo Park
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz, PIG-2579-2-avro_test_files.tar.gz, PIG-2579-2.patch, PIG-2579-3.patch, PIG-2579-4.patch, PIG-2579-5.patch
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached (avro_storage_union_schema_test.tar.gz): run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.


[jira] [Updated] (PIG-2579) Support for multiple input schemas in AvroStorage

Posted by "Stan Rosenberg (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stan Rosenberg updated PIG-2579:
--------------------------------

    Attachment: avro_storage_union_schema_test.tar.gz
                avro_storage_union_schema.patch
    
> Support for multiple input schemas in AvroStorage
> -------------------------------------------------
>
>                 Key: PIG-2579
>                 URL: https://issues.apache.org/jira/browse/PIG-2579
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>            Reporter: Stan Rosenberg
>            Priority: Minor
>         Attachments: avro_storage_union_schema.patch, avro_storage_union_schema_test.tar.gz
>
>
> This is a barebones patch for AvroStorage which enables support of multiple input schemas.  The assumption is that the input consists of avro files having different schemas that can be unioned, e.g., flat records.  
> A simple illustrative example is attached: run create_avro1.pig, followed by create_avro2.pig, followed by read_avro.pig.
