Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/08/24 19:41:59 UTC

[jira] Created: (HIVE-785) Add RecordWriter for ScriptOperator

Add RecordWriter for ScriptOperator
-----------------------------------

                 Key: HIVE-785
                 URL: https://issues.apache.org/jira/browse/HIVE-785
             Project: Hadoop Hive
          Issue Type: New Feature
    Affects Versions: 0.5.0
            Reporter: Zheng Shao
            Assignee: Namit Jain


HIVE-708 added RecordReader, but it hardcodes a "RecordWriter" that uses a newline for Text and writes out data directly for BytesWritable.
We should make this configurable as well.
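
As a rough illustration of the request (not taken from any attached patch), a pluggable writer could look like the sketch below. The names SketchRecordWriter, NewlineTextWriter and RawBytesWriter are hypothetical, and plain byte arrays stand in for Hadoop's Text and BytesWritable so that the example has no dependencies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; none of these names come from the HIVE-785 patches.
class RecordWriterSketch {

  /** Minimal contract for pushing rows to the stream that feeds the script. */
  interface SketchRecordWriter {
    void initialize(OutputStream out) throws IOException;
    void write(byte[] row) throws IOException;
    void close() throws IOException;
  }

  /** Mirrors the hardcoded Text behavior: row bytes followed by a newline. */
  static class NewlineTextWriter implements SketchRecordWriter {
    private OutputStream out;
    public void initialize(OutputStream out) { this.out = out; }
    public void write(byte[] row) throws IOException {
      out.write(row);
      out.write('\n');
    }
    public void close() throws IOException { out.flush(); }
  }

  /** Mirrors the hardcoded BytesWritable behavior: row bytes written as-is. */
  static class RawBytesWriter implements SketchRecordWriter {
    private OutputStream out;
    public void initialize(OutputStream out) { this.out = out; }
    public void write(byte[] row) throws IOException { out.write(row); }
    public void close() throws IOException { out.flush(); }
  }

  /** Sends every row through whichever writer the caller selected. */
  static byte[] writeAll(SketchRecordWriter writer, String... rows) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try {
      writer.initialize(sink);
      for (String row : rows) {
        writer.write(row.getBytes(StandardCharsets.UTF_8));
      }
      writer.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    byte[] text = writeAll(new NewlineTextWriter(), "a", "b");
    System.out.print(new String(text, StandardCharsets.UTF_8));
  }
}
```

With an interface like this, the operator only needs to know which implementation to instantiate; the framing of each row is left to the writer.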


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-785) Add RecordWriter for ScriptOperator

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750044#action_12750044 ] 

Namit Jain commented on HIVE-785:
---------------------------------

Yes, the above code is copied from Hadoop; we can remove it once we stop supporting Hadoop 0.17.

> Add RecordWriter for ScriptOperator
> -----------------------------------
>
>                 Key: HIVE-785
>                 URL: https://issues.apache.org/jira/browse/HIVE-785
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: Zheng Shao
>            Assignee: Namit Jain
>         Attachments: hive.785.1.patch, hive.785.2.patch, hive.785.3.patch
>
>
> HIVE-708 added RecordReader, but it is hardcoding a "RecordWriter" that uses newline for Text and write out data directly for BytesWritable.
> We should make this configurable as well.



[jira] Updated: (HIVE-785) Add RecordWriter for ScriptOperator

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-785:
----------------------------

    Status: Patch Available  (was: Open)

> Add RecordWriter for ScriptOperator
> -----------------------------------
>
>                 Key: HIVE-785
>                 URL: https://issues.apache.org/jira/browse/HIVE-785
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: Zheng Shao
>            Assignee: Namit Jain
>         Attachments: hive.785.1.patch
>
>
> HIVE-708 added RecordReader, but it is hardcoding a "RecordWriter" that uses newline for Text and write out data directly for BytesWritable.
> We should make this configurable as well.



[jira] Updated: (HIVE-785) Add RecordWriter for ScriptOperator

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-785:
----------------------------

    Attachment: hive.785.3.patch

Uploaded an old patch by mistake previously.

> Add RecordWriter for ScriptOperator
> -----------------------------------
>
>                 Key: HIVE-785
>                 URL: https://issues.apache.org/jira/browse/HIVE-785
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: Zheng Shao
>            Assignee: Namit Jain
>         Attachments: hive.785.1.patch, hive.785.2.patch, hive.785.3.patch
>
>
> HIVE-708 added RecordReader, but it is hardcoding a "RecordWriter" that uses newline for Text and write out data directly for BytesWritable.
> We should make this configurable as well.



[jira] Updated: (HIVE-785) Add RecordWriter for ScriptOperator

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-785:
----------------------------

       Resolution: Fixed
    Fix Version/s: 0.5.0
     Release Note: HIVE-785. Add RecordWriter for ScriptOperator. (Namit Jain via zshao)
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed. Thanks Namit!

> Add RecordWriter for ScriptOperator
> -----------------------------------
>
>                 Key: HIVE-785
>                 URL: https://issues.apache.org/jira/browse/HIVE-785
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: Zheng Shao
>            Assignee: Namit Jain
>             Fix For: 0.5.0
>
>         Attachments: hive.785.1.patch, hive.785.2.patch, hive.785.3.patch, hive.785.4.patch
>
>
> HIVE-708 added RecordReader, but it is hardcoding a "RecordWriter" that uses newline for Text and write out data directly for BytesWritable.
> We should make this configurable as well.



[jira] Commented: (HIVE-785) Add RecordWriter for ScriptOperator

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750039#action_12750039 ] 

Zheng Shao commented on HIVE-785:
---------------------------------

@HIVE-785.2.patch:

{code}
+++ common/src/java/org/apache/hadoop/hive/conf/HiveConf.java	(working copy)
+    HIVESCRIPTRECORDREADER("hive.script.recordreader", "org.apache.hadoop.hive.contrib.util.typedbytes.TextRecordReader"),
+    HIVESCRIPTRECORDWRITER("hive.script.recordwriter", "org.apache.hadoop.hive.contrib.util.typedbytes.TextRecordWriter"),
{code}
TextRecordReader/Writer are in the org.apache.hadoop.hive.ql.exec package.
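
To show why the default value matters: if the configured string is turned into a class by name (an assumption here, not something confirmed by the patch excerpt), a default that points at the wrong package only fails at run time. A minimal sketch of such a lookup, using java.util.Properties in place of HiveConf; WriterConfigSketch and DemoWriter are hypothetical names, and the default below simply uses the package named in this comment:

```java
import java.util.Properties;

// Hypothetical sketch of a name-based lookup; it does not use HiveConf.
class WriterConfigSketch {
  // Property key as it appears in the patch excerpt above.
  static final String WRITER_KEY = "hive.script.recordwriter";
  // Default assumed from the review comment (ql.exec package).
  static final String DEFAULT_WRITER =
      "org.apache.hadoop.hive.ql.exec.TextRecordWriter";

  /** Stand-in writer class so the example runs without Hive on the classpath. */
  public static class DemoWriter {
    public DemoWriter() { }
  }

  /** Returns the configured writer class name, or the default if unset. */
  static String writerClassName(Properties conf) {
    return conf.getProperty(WRITER_KEY, DEFAULT_WRITER);
  }

  /** Instantiates the named class; a wrong package name surfaces here. */
  static Object newInstance(String className) {
    try {
      return Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(writerClassName(conf));
    conf.setProperty(WRITER_KEY, DemoWriter.class.getName());
    System.out.println(newInstance(writerClassName(conf)).getClass().getName());
  }
}
```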


{code}
+public class TypedBytesWritableOutput {
...
+  public void write(Writable w) throws IOException {
+    if (w instanceof TypedBytesWritable) {
+      writeTypedBytes((TypedBytesWritable) w);
+    } else if (w instanceof BytesWritable) {
+      writeBytes((BytesWritable) w);
+    } else if (w instanceof ByteWritable) {
...
{code}

This write method is not very efficient. I am OK with leaving it as is (I guess it's from Hadoop?), or we can optimize it by using object + ObjectInspector.
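
For illustration only, the difference between the two approaches can be sketched without Hadoop or Hive types. The instanceof chain inspects every value at write time, while a schema-driven approach (the role an ObjectInspector would play) picks the encoder once per column and reuses it for every row. All names below (DispatchOnceSketch, ColumnType, encoderFor) are hypothetical, and the string encodings are placeholders rather than the typed bytes format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; it does not use Hive's ObjectInspector API.
class DispatchOnceSketch {
  enum ColumnType { INT, STRING, BYTES }

  /** Per-value dispatch, similar in shape to the instanceof chain above. */
  static String encodePerValue(Object v) {
    if (v instanceof Integer) {
      return "i:" + v;
    } else if (v instanceof String) {
      return "s:" + v;
    } else if (v instanceof byte[]) {
      return "b:" + ((byte[]) v).length;
    }
    throw new IllegalArgumentException("Unsupported value: " + v);
  }

  /** Schema-driven dispatch: the type check happens once per column. */
  static Function<Object, String> encoderFor(ColumnType type) {
    switch (type) {
      case INT:
        return v -> "i:" + (Integer) v;
      case STRING:
        return v -> "s:" + (String) v;
      case BYTES:
        return v -> "b:" + ((byte[]) v).length;
      default:
        throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }

  /** Resolves encoders from the schema up front, then encodes every row. */
  static List<String> encodeRows(ColumnType[] schema, Object[][] rows) {
    List<Function<Object, String>> encoders = new ArrayList<>();
    for (ColumnType type : schema) {
      encoders.add(encoderFor(type));
    }
    List<String> out = new ArrayList<>();
    for (Object[] row : rows) {
      StringBuilder line = new StringBuilder();
      for (int i = 0; i < row.length; i++) {
        if (i > 0) {
          line.append('\t');
        }
        line.append(encoders.get(i).apply(row[i]));
      }
      out.add(line.toString());
    }
    return out;
  }

  public static void main(String[] args) {
    ColumnType[] schema = { ColumnType.INT, ColumnType.STRING };
    Object[][] rows = { { 1, "a" }, { 2, "b" } };
    for (String line : encodeRows(schema, rows)) {
      System.out.println(line);
    }
  }
}
```

Whether this is a measurable win depends on the row width and the number of types in the chain; the sketch only shows where the type decision moves.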


> Add RecordWriter for ScriptOperator
> -----------------------------------
>
>                 Key: HIVE-785
>                 URL: https://issues.apache.org/jira/browse/HIVE-785
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: Zheng Shao
>            Assignee: Namit Jain
>         Attachments: hive.785.1.patch, hive.785.2.patch
>
>
> HIVE-708 added RecordReader, but it is hardcoding a "RecordWriter" that uses newline for Text and write out data directly for BytesWritable.
> We should make this configurable as well.



[jira] Updated: (HIVE-785) Add RecordWriter for ScriptOperator

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-785:
----------------------------

    Attachment: hive.785.2.patch

Some more cleanups.

> Add RecordWriter for ScriptOperator
> -----------------------------------
>
>                 Key: HIVE-785
>                 URL: https://issues.apache.org/jira/browse/HIVE-785
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: Zheng Shao
>            Assignee: Namit Jain
>         Attachments: hive.785.1.patch, hive.785.2.patch
>
>
> HIVE-708 added RecordReader, but it is hardcoding a "RecordWriter" that uses newline for Text and write out data directly for BytesWritable.
> We should make this configurable as well.



[jira] Commented: (HIVE-785) Add RecordWriter for ScriptOperator

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750052#action_12750052 ] 

Zheng Shao commented on HIVE-785:
---------------------------------

@HIVE-785.3.patch: 
I still see the same default values in HiveConf.java ...

> Add RecordWriter for ScriptOperator
> -----------------------------------
>
>                 Key: HIVE-785
>                 URL: https://issues.apache.org/jira/browse/HIVE-785
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: Zheng Shao
>            Assignee: Namit Jain
>         Attachments: hive.785.1.patch, hive.785.2.patch, hive.785.3.patch
>
>
> HIVE-708 added RecordReader, but it is hardcoding a "RecordWriter" that uses newline for Text and write out data directly for BytesWritable.
> We should make this configurable as well.



[jira] Updated: (HIVE-785) Add RecordWriter for ScriptOperator

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-785:
----------------------------

    Attachment: hive.785.4.patch

> Add RecordWriter for ScriptOperator
> -----------------------------------
>
>                 Key: HIVE-785
>                 URL: https://issues.apache.org/jira/browse/HIVE-785
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: Zheng Shao
>            Assignee: Namit Jain
>         Attachments: hive.785.1.patch, hive.785.2.patch, hive.785.3.patch, hive.785.4.patch
>
>
> HIVE-708 added RecordReader, but it is hardcoding a "RecordWriter" that uses newline for Text and write out data directly for BytesWritable.
> We should make this configurable as well.



[jira] Updated: (HIVE-785) Add RecordWriter for ScriptOperator

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-785:
----------------------------

    Attachment: hive.785.1.patch

> Add RecordWriter for ScriptOperator
> -----------------------------------
>
>                 Key: HIVE-785
>                 URL: https://issues.apache.org/jira/browse/HIVE-785
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: Zheng Shao
>            Assignee: Namit Jain
>         Attachments: hive.785.1.patch
>
>
> HIVE-708 added RecordReader, but it is hardcoding a "RecordWriter" that uses newline for Text and write out data directly for BytesWritable.
> We should make this configurable as well.



[jira] Commented: (HIVE-785) Add RecordWriter for ScriptOperator

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750097#action_12750097 ] 

Zheng Shao commented on HIVE-785:
---------------------------------

+1. Will test and commit.

> Add RecordWriter for ScriptOperator
> -----------------------------------
>
>                 Key: HIVE-785
>                 URL: https://issues.apache.org/jira/browse/HIVE-785
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.5.0
>            Reporter: Zheng Shao
>            Assignee: Namit Jain
>         Attachments: hive.785.1.patch, hive.785.2.patch, hive.785.3.patch, hive.785.4.patch
>
>
> HIVE-708 added RecordReader, but it is hardcoding a "RecordWriter" that uses newline for Text and write out data directly for BytesWritable.
> We should make this configurable as well.
