Posted to dev@hive.apache.org by "Siying Dong (JIRA)" <ji...@apache.org> on 2010/11/19 00:29:13 UTC

[jira] Created: (HIVE-1801) HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice

HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice
----------------------------------------------------------------------------

                 Key: HIVE-1801
                 URL: https://issues.apache.org/jira/browse/HIVE-1801
             Project: Hive
          Issue Type: Improvement
            Reporter: Siying Dong
            Assignee: Siying Dong


HiveInputFormat or CombineHiveInputFormat calls RCFile.Reader.sync() twice: once in getReader() and once in initIOContext(). We can avoid the latter call by reading the position reached by the former sync().

We also sync() twice for SequenceFile, but since SequenceFileReader is not part of the Hive code, we should probably be careful about depending on its implementation.
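
To illustrate the idea, here is a minimal, self-contained sketch. It does not use the real Hive classes: BlockReader, CachingRecordReader and blockStart() are hypothetical stand-ins for RCFile.Reader, the record reader and the initIOContext() step, and the rule that sync() lands on the first marker at or after the requested offset is a simplifying assumption. The record reader remembers where its first sync() landed, so the later step can read that position instead of opening a second reader and syncing again.

```java
class SyncOnceSketch {

    /** Hypothetical stand-in for a block-structured reader such as RCFile.Reader. */
    static final class BlockReader {
        static int syncCalls = 0;        // counts sync() calls, for demonstration only
        private final long[] syncMarks;  // sorted offsets of sync markers (assumed layout)
        private final long fileLength;
        private long position;

        BlockReader(long[] syncMarks, long fileLength) {
            this.syncMarks = syncMarks;
            this.fileLength = fileLength;
        }

        /** Assumption: move to the first sync marker at or after offset, or to EOF. */
        void sync(long offset) {
            syncCalls++;
            position = fileLength;
            for (long mark : syncMarks) {
                if (mark >= offset) {
                    position = mark;
                    break;
                }
            }
        }

        long getPosition() {
            return position;
        }
    }

    /** Hypothetical record reader that keeps the position reached by its initial sync(). */
    static final class CachingRecordReader {
        private final long start;

        CachingRecordReader(BlockReader in, long splitStart) {
            in.sync(splitStart);            // the first (and ideally only) sync
            this.start = in.getPosition();  // remember where it landed
        }

        long getStart() {
            return start;
        }
    }

    /** Reuse the cached position when the reader offers one; otherwise sync again. */
    static long blockStart(Object recordReader, long[] marks, long fileLength, long splitStart) {
        if (recordReader instanceof CachingRecordReader) {
            return ((CachingRecordReader) recordReader).getStart();
        }
        BlockReader in = new BlockReader(marks, fileLength);
        in.sync(splitStart);
        return in.getPosition();
    }

    public static void main(String[] args) {
        long[] marks = {0L, 4096L, 8192L};
        CachingRecordReader rr =
            new CachingRecordReader(new BlockReader(marks, 10000L), 5000L);
        long start = blockStart(rr, marks, 10000L, 5000L);
        System.out.println("blockStart=" + start + ", sync calls=" + BlockReader.syncCalls);
    }
}
```

The fallback branch keeps the old behaviour for any reader that does not expose its start position, so only readers under our control depend on the cached value.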

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1801) HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934536#action_12934536 ] 

He Yongqiang commented on HIVE-1801:
------------------------------------

Can you put "if (recordReader instanceof RCFileRecordReader)" at the same level as "else if (inputFormatClass.getName().contains("RCFile")) {"?

     } else if (inputFormatClass.getName().contains("RCFile")) {
-      RCFile.Reader in = new RCFile.Reader(fs, path, job);
       blockPointer = true;
-      in.sync(fileSplit.getStart());
-      blockStart = in.getPosition();
-      in.close();
+
+      if (recordReader instanceof RCFileRecordReader) {
+        blockStart = ((RCFileRecordReader)recordReader).getStart();
+      } else {
+        RCFile.Reader in = new RCFile.Reader(fs, path, job);
+        in.sync(fileSplit.getStart());
+        blockStart = in.getPosition();
+        in.close();
+      }


>         Attachments: HIVE-1801.1.patch


[jira] Commented: (HIVE-1801) HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935048#action_12935048 ] 

He Yongqiang commented on HIVE-1801:
------------------------------------

Siying, the patch is not for this JIRA.
Can you upload a new patch?

>         Attachments: HIVE-1801.1.patch, HIVE-1802.1.patch


[jira] Updated: (HIVE-1801) HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1801:
------------------------------

    Attachment: HIVE-1802.1.patch

Addresses Yongqiang's comment.

>         Attachments: HIVE-1801.1.patch, HIVE-1802.1.patch


[jira] Updated: (HIVE-1801) HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1801:
-------------------------------

    Status: Open  (was: Patch Available)

>         Attachments: HIVE-1801.1.patch


[jira] Updated: (HIVE-1801) HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1801:
------------------------------

    Status: Patch Available  (was: Open)

>         Attachments: HIVE-1801.1.patch, HIVE-1802.1.patch


[jira] Updated: (HIVE-1801) HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1801:
------------------------------

    Attachment: HIVE-1801.1.patch

Still running tests.

>         Attachments: HIVE-1801.1.patch


[jira] Updated: (HIVE-1801) HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1801:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed! Thanks Siying!

>         Attachments: HIVE-1801.1.patch, HIVE-1801.2.patch


[jira] Updated: (HIVE-1801) HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1801:
------------------------------

    Status: Patch Available  (was: Open)

>         Attachments: HIVE-1801.1.patch


[jira] Updated: (HIVE-1801) HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1801:
------------------------------

    Attachment: HIVE-1801.2.patch

This one should be right.

>         Attachments: HIVE-1801.1.patch, HIVE-1801.2.patch


[jira] Commented: (HIVE-1801) HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935079#action_12935079 ] 

He Yongqiang commented on HIVE-1801:
------------------------------------

+1. Running tests.

>         Attachments: HIVE-1801.1.patch, HIVE-1801.2.patch


[jira] Updated: (HIVE-1801) HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1801:
------------------------------

    Attachment:     (was: HIVE-1802.1.patch)

>         Attachments: HIVE-1801.1.patch