Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2010/01/23 02:57:21 UTC

[jira] Created: (HIVE-1087) Let user script write out binary data into a table

Let user script write out binary data into a table
--------------------------------------------------

                 Key: HIVE-1087
                 URL: https://issues.apache.org/jira/browse/HIVE-1087
             Project: Hadoop Hive
          Issue Type: New Feature
    Affects Versions: 0.6.0
            Reporter: Zheng Shao
            Assignee: Zheng Shao
         Attachments: HIVE-1087.1.patch

We want to allow a user script to write out binary stream data.
We don't need to understand the binary stream format; we just want to write the data to disk as is.

Since everything inside Hive is a row object, we need to add a RecordReader that splits the binary stream into records, and a BinaryOutputFormat that writes the data out unchanged, without any separators.
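
To make the idea above concrete, here is a minimal sketch in Python. It is not the code from the patch. The function names and the 1000-byte chunk size are assumptions chosen for illustration. It only shows the principle: cut an opaque byte stream into records at arbitrary boundaries, then write the records back to back so that the output is byte-for-byte identical to the input.

```python
import io
from typing import BinaryIO, Iterable, Iterator


def read_binary_records(stream: BinaryIO, max_record_length: int = 1000) -> Iterator[bytes]:
    """Yield chunks of at most max_record_length bytes until end of stream.

    The chunk boundaries carry no meaning; they only exist so that the
    stream can be handled as a sequence of row-like records.
    """
    if max_record_length <= 0:
        raise ValueError("max_record_length must be positive")
    while True:
        chunk = stream.read(max_record_length)
        if not chunk:  # empty bytes object means end of stream
            return
        yield chunk


def write_binary_records(records: Iterable[bytes], out: BinaryIO) -> int:
    """Write each record back to back with no separators; return bytes written."""
    total = 0
    for record in records:
        out.write(record)
        total += len(record)
    return total


# Demo payload that contains newlines, tabs and NUL bytes on purpose.
payload = bytes(range(256)) * 10
records = list(read_binary_records(io.BytesIO(payload), max_record_length=1000))

sink = io.BytesIO()
written = write_binary_records(records, sink)
```

Because no delimiter is inserted between records and nothing is escaped, the place where the reader happens to cut the stream does not affect what ends up on disk.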


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1087) Let user script write out binary data into a table

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804741#action_12804741 ] 

Ning Zhang commented on HIVE-1087:
----------------------------------

+1. Looks good. Will commit after tests pass.




[jira] Updated: (HIVE-1087) Let user script write out binary data into a table

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1087:
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.6.0
           Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Zheng!




[jira] Updated: (HIVE-1087) Let user script write out binary data into a table

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1087:
-----------------------------

    Attachment: HIVE-1087.2.patch

Fixed the compilation problem with Hadoop 0.17.





[jira] Updated: (HIVE-1087) Let user script write out binary data into a table

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1087:
-----------------------------

    Status: Patch Available  (was: Open)




[jira] Commented: (HIVE-1087) Let user script write out binary data into a table

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804835#action_12804835 ] 

Ning Zhang commented on HIVE-1087:
----------------------------------

The test passed on Hadoop 0.20 but failed on Hadoop 0.17. Can you take a look?




[jira] Updated: (HIVE-1087) Let user script write out binary data into a table

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1087:
-----------------------------

    Description: 
We want to allow a user script to write out binary stream data.
We don't need to understand the binary stream format; we just want to write the data to disk as is.

Since everything inside Hive is a row object, we need to add a RecordReader that splits the binary stream into records, and a BinaryOutputFormat that writes the data out unchanged, without any separators.

Example:
{code}
DROP TABLE dest1;

-- Create a table with binary output format
CREATE TABLE dest1(mydata STRING)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.last.column.takes.rest'='true'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveBinaryOutputFormat';

-- Insert into that table using transform
EXPLAIN EXTENDED
INSERT OVERWRITE TABLE dest1
SELECT TRANSFORM(*)
  USING 'cat'
  AS mydata STRING
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES (
      'serialization.last.column.takes.rest'='true'
    )
    RECORDREADER 'org.apache.hadoop.hive.ql.exec.BinaryRecordReader'
FROM src;

INSERT OVERWRITE TABLE dest1
SELECT TRANSFORM(*)
  USING 'cat'
  AS mydata STRING
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES (
      'serialization.last.column.takes.rest'='true'
    )
    RECORDREADER 'org.apache.hadoop.hive.ql.exec.BinaryRecordReader'
FROM src;

-- Test the result
SELECT * FROM dest1;

DROP TABLE dest1;
{code}


  was:
We want to allow a user script to write out binary stream data.
We don't need to understand the binary stream format; we just want to write the data to disk as is.

Since everything inside Hive is a row object, we need to add a RecordReader that splits the binary stream into records, and a BinaryOutputFormat that writes the data out unchanged, without any separators.






[jira] Updated: (HIVE-1087) Let user script write out binary data into a table

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1087:
-----------------------------

    Attachment: HIVE-1087.1.patch

