Posted to dev@pig.apache.org by "Ted Malaska (Created) (JIRA)" <ji...@apache.org> on 2012/01/29 14:31:10 UTC

[jira] [Created] (PIG-2494) Improvement to SequenceFileLoader (NullWritable and Delimiter)

Improvement to SequenceFileLoader (NullWritable and Delimiter)
--------------------------------------------------------------

                 Key: PIG-2494
                 URL: https://issues.apache.org/jira/browse/PIG-2494
             Project: Pig
          Issue Type: Improvement
          Components: piggybank
    Affects Versions: 0.9.1
         Environment: All
            Reporter: Ted Malaska
            Priority: Minor


I wanted to add two features to SequenceFileLoader.
1.	I added a delimiter so it will act more like PigStorage, in that it will split the value if it is of type Text (chararray).
2.	I added the option of the key being a NullWritable.  I wanted to be able to process my Hive files in both Hive and Pig, but because my Hive sequence files have a NullWritable key I could not make this work with the current implementation of SequenceFileLoader.
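
A minimal sketch of the two behaviours described above, not the attached implementation. It assumes plain Java stand-ins so it runs without Hadoop on the classpath: a String plays the role of a Text value and a null key plays the role of a NullWritable key. The class name DelimitedRecordSketch and the method toFields are hypothetical, and dropping the key field when the key is null is an assumption about the desired output.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimitedRecordSketch {

    /**
     * Builds the fields for one record.
     * A null key (stand-in for NullWritable) contributes no field.
     * A String value (stand-in for Text) is split on the delimiter;
     * any other value is passed through as a single field.
     */
    public static List<Object> toFields(Object key, Object value, char delimiter) {
        List<Object> fields = new ArrayList<>();
        if (key != null) {
            fields.add(key);
        }
        if (value instanceof String) {
            String text = (String) value;
            int start = 0;
            for (int i = 0; i < text.length(); i++) {
                if (text.charAt(i) == delimiter) {
                    fields.add(text.substring(start, i));
                    start = i + 1;
                }
            }
            // Last field (or the whole value if no delimiter was found).
            fields.add(text.substring(start));
        } else {
            fields.add(value);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hive-style record: no key, tab-delimited text value.
        System.out.println(toFields(null, "a\tb\tc", '\t')); // prints [a, b, c]
        // Keyed record with a non-text value: nothing is split.
        System.out.println(toFields(42L, 3.5, '\t'));        // prints [42, 3.5]
    }
}
```

Splitting by scanning for a single character, rather than using a regular expression, keeps empty trailing fields, which is closer to how a delimiter-based loader is usually expected to behave.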


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2494) Improvement to SequenceFileLoader (NullWritable and Delimiter)

Posted by "Ted Malaska (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Malaska updated PIG-2494:
-----------------------------

    Attachment: SequenceFileLoader.java

Here is my implementation
                
> Improvement to SequenceFileLoader (NullWritable and Delimiter)
> --------------------------------------------------------------
>
>                 Key: PIG-2494
>                 URL: https://issues.apache.org/jira/browse/PIG-2494
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>    Affects Versions: 0.9.1
>         Environment: All
>            Reporter: Ted Malaska
>            Priority: Minor
>              Labels: newbie
>         Attachments: SequenceFileLoader.java
>
>
> I wanted to add two features to SequenceFileLoader.
> 1.	I added a delimiter so it will act more like PigStorage, in that it will split the value if it is of type Text (chararray).
> 2.	I added the option of the key being a NullWritable.  I wanted to be able to process my Hive files in both Hive and Pig, but because my Hive sequence files have a NullWritable key I could not make this work with the current implementation of SequenceFileLoader.


        

[jira] [Commented] (PIG-2494) Improvement to SequenceFileLoader (NullWritable and Delimiter)

Posted by "Joey Echeverria (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196169#comment-13196169 ] 

Joey Echeverria commented on PIG-2494:
--------------------------------------

Hi Ted,

Thanks for the contribution! Would you mind formatting your submission as a patch? You can find instructions on how to generate the patch here:

https://cwiki.apache.org/confluence/display/PIG/HowToContribute

This will make it easier to review your changes.
                

        

[jira] [Updated] (PIG-2494) Improvement to SequenceFileLoader (NullWritable and Delimiter)

Posted by "Ted Malaska (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Malaska updated PIG-2494:
-----------------------------

    Description: 
I wanted to add two features to SequenceFileLoader.
1.	I added a delimiter so it will act more like PigStorage, in that it will split the value if it is of type Text (chararray).
2.	I added the option of the key being a NullWritable.  I wanted to be able to process my Hive files in both Hive and Pig, but because my Hive sequence files have a NullWritable key I could not make this work with the current implementation of SequenceFileLoader.

My change is attached to this Jiri Issue.

  was:
I wanted to add two features to SequenceFileLoader.
1.	I added a delimiter so it will act more like PigStorage, in that it will split the value if it is of type Text (chararray).
2.	I added the option of the key being a NullWritable.  I wanted to be able to process my Hive files in both Hive and Pig, but because my Hive sequence files have a NullWritable key I could not make this work with the current implementation of SequenceFileLoader.


    

        

[jira] [Updated] (PIG-2494) Improvement to SequenceFileLoader (NullWritable and Delimiter)

Posted by "Ted Malaska (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Malaska updated PIG-2494:
-----------------------------

    Description: 
I wanted to add two features to SequenceFileLoader.
1.	I added a delimiter so it will act more like PigStorage, in that it will split the value if it is of type Text (chararray).
2.	I added the option of the key being a NullWritable.  I wanted to be able to process my Hive files in both Hive and Pig, but because my Hive sequence files have a NullWritable key I could not make this work with the current implementation of SequenceFileLoader.

My change is attached to this Issue.

  was:
I wanted to add two features to SequenceFileLoader.
1.	I added a delimiter so it will act more like PigStorage, in that it will split the value if it is of type Text (chararray).
2.	I added the option of the key being a NullWritable.  I wanted to be able to process my Hive files in both Hive and Pig, but because my Hive sequence files have a NullWritable key I could not make this work with the current implementation of SequenceFileLoader.

My change is attached to this Jiri Issue.

    

        

[jira] [Commented] (PIG-2494) Improvement to SequenceFileLoader (NullWritable and Delimiter)

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264628#comment-13264628 ] 

Dmitriy V. Ryaboy commented on PIG-2494:
----------------------------------------

Note that a far more powerful version of a Sequence File Loader is available in Elephant-Bird: https://github.com/kevinweil/elephant-bird

This is a pretty small patch, though. Good one to practice patch submission on, if someone wanted to post it using the procedure Joey linked to above.
                
> Improvement to SequenceFileLoader (NullWritable and Delimiter)
> --------------------------------------------------------------
>
>                 Key: PIG-2494
>                 URL: https://issues.apache.org/jira/browse/PIG-2494
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>    Affects Versions: 0.9.1
>         Environment: All
>            Reporter: Ted Malaska
>            Priority: Minor
>              Labels: newbie, simple
>         Attachments: SequenceFileLoader.java
>
>
> I wanted to add two features to SequenceFileLoader.
> 1.	I added a delimiter so it will act more like PigStorage, in that it will split the value if it is of type Text (chararray).
> 2.	I added the option of the key being a NullWritable.  I wanted to be able to process my Hive files in both Hive and Pig, but because my Hive sequence files have a NullWritable key I could not make this work with the current implementation of SequenceFileLoader.
> My change is attached to this Issue.


        

[jira] [Commented] (PIG-2494) Improvement to SequenceFileLoader (NullWritable and Delimiter)

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447870#comment-13447870 ] 

Ted Malaska commented on PIG-2494:
----------------------------------

Hey Dmitriy,

I know it's been a long time, but I'm going to try to finish this issue now.

I just reviewed the SequenceFileLoader code in elephant-bird, and it looks like the major piece to bring over is the idea of the converter and its ability to transform the raw data and provide a schema for the output format.

This would add a lot of power to the existing implementation.

I'll start on this tonight.
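
To make the converter idea above concrete, here is a rough sketch in plain Java. It is not the elephant-bird API: the interface RawConverter, its methods convert and schema, and the example class NameAgeConverter are all hypothetical names chosen for illustration, and the schema is represented as simple "name:type" strings rather than a real Pig schema object.

```java
import java.util.Arrays;
import java.util.List;

public class ConverterSketch {

    /** Hypothetical converter: turns a raw value into fields and describes the fields it produces. */
    public interface RawConverter<R> {
        List<Object> convert(R raw);

        List<String> schema();
    }

    /** Example converter for raw strings of the form "name,age". */
    public static class NameAgeConverter implements RawConverter<String> {
        @Override
        public List<Object> convert(String raw) {
            // Split into at most two parts: the name and the age.
            String[] parts = raw.split(",", 2);
            return Arrays.asList((Object) parts[0], Integer.valueOf(parts[1].trim()));
        }

        @Override
        public List<String> schema() {
            return Arrays.asList("name:chararray", "age:int");
        }
    }

    public static void main(String[] args) {
        RawConverter<String> converter = new NameAgeConverter();
        // prints [name:chararray, age:int] -> [ted, 30]
        System.out.println(converter.schema() + " -> " + converter.convert("ted, 30"));
    }
}
```

The point of the design is that the loader itself stays generic: it only reads raw keys and values, while the converter decides how they are transformed and what schema is reported.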
                

[jira] [Commented] (PIG-2494) Improvement to SequenceFileLoader (NullWritable and Delimiter)

Posted by "Ted Malaska (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448084#comment-13448084 ] 

Ted Malaska commented on PIG-2494:
----------------------------------

So I have four options for how to address this issue.

1. Update SequenceFileLoader so that it can handle NullWritable keys and also handle delimiters like PigStorage.
2. All of option (1), plus turn the sequence loader into a sequence storage so we can use it to write data out to sequence files.
3. Bring the elephant-bird implementation over to piggybank and add support for delimiters.
4. Drop the whole delimiter thing because we can use TOKENIZE.

Let me know.



                