Posted to dev@hive.apache.org by "Siying Dong (JIRA)" <ji...@apache.org> on 2010/11/19 00:29:13 UTC
[jira] Created: (HIVE-1801) HiveInputFormat or
CombineHiveInputFormat always sync blocks of RCFile twice
HiveInputFormat or CombineHiveInputFormat always sync blocks of RCFile twice
----------------------------------------------------------------------------
Key: HIVE-1801
URL: https://issues.apache.org/jira/browse/HIVE-1801
Project: Hive
Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
HiveInputFormat or CombineHiveInputFormat calls RCFile.Reader.sync() twice for each split: once in getReader() and once in initIOContext(). We can avoid the second call by reading the sync() position that the first call already found.
We also sync() twice for SequenceFile, but since SequenceFileReader is not part of the Hive code, we should be careful about depending on its implementation.
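To illustrate the idea, here is a minimal, self-contained sketch. It does not use the real Hive classes: SketchReader, SketchRecordReader and SyncOnceSketch are hypothetical stand-ins, and sync() is simplified to "seek to the first marker at or after the offset". The point is only that the record reader can remember where its own sync() landed, so the later step can read that value instead of opening a second reader.

```java
/** Hypothetical stand-in for a block-structured reader such as RCFile.Reader. */
final class SketchReader {
    static int syncCalls = 0;            // counts every sync() so the saving is visible
    private final long[] syncMarks;      // sorted positions of sync markers (assumed)
    private final long fileLength;
    private long position = 0L;

    SketchReader(long[] syncMarks, long fileLength) {
        this.syncMarks = syncMarks.clone();
        this.fileLength = fileLength;
    }

    /** Simplified sync: move to the first marker at or after offset, else to end of file. */
    void sync(long offset) {
        syncCalls++;
        position = fileLength;
        for (long mark : syncMarks) {
            if (mark >= offset) {
                position = mark;
                break;
            }
        }
    }

    long getPosition() {
        return position;
    }
}

/** Hypothetical record reader that syncs once and remembers where it landed. */
final class SketchRecordReader {
    private final long start;

    SketchRecordReader(long[] syncMarks, long fileLength, long splitStart) {
        SketchReader in = new SketchReader(syncMarks, fileLength);
        in.sync(splitStart);             // the one sync we cannot avoid
        this.start = in.getPosition();
    }

    long getStart() {
        return start;
    }
}

final class SyncOnceSketch {
    /** Old behaviour: open a second reader and sync again just to learn the block start. */
    static long blockStartWithSecondSync(long[] syncMarks, long fileLength, long splitStart) {
        SketchReader in = new SketchReader(syncMarks, fileLength);
        in.sync(splitStart);
        return in.getPosition();
    }

    /** Proposed behaviour: reuse the position the record reader already found. */
    static long blockStartReusing(SketchRecordReader recordReader) {
        return recordReader.getStart();
    }

    public static void main(String[] args) {
        long[] marks = {0L, 100L, 250L};
        SketchRecordReader rr = new SketchRecordReader(marks, 400L, 120L);
        System.out.println("blockStart=" + blockStartReusing(rr)
                + " syncCalls=" + SketchReader.syncCalls);
    }
}
```

In this sketch both methods return the same block start; the reusing variant just gets there without the extra sync() call.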
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1801) HiveInputFormat or
CombineHiveInputFormat always sync blocks of RCFile twice
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934536#action_12934536 ]
He Yongqiang commented on HIVE-1801:
------------------------------------
Can you put "if (recordReader instanceof RCFileRecordReader)" at the same level as "else if (inputFormatClass.getName().contains("RCFile")) {"?
     } else if (inputFormatClass.getName().contains("RCFile")) {
-      RCFile.Reader in = new RCFile.Reader(fs, path, job);
       blockPointer = true;
-      in.sync(fileSplit.getStart());
-      blockStart = in.getPosition();
-      in.close();
+
+      if (recordReader instanceof RCFileRecordReader) {
+        blockStart = ((RCFileRecordReader)recordReader).getStart();
+      } else {
+        RCFile.Reader in = new RCFile.Reader(fs, path, job);
+        in.sync(fileSplit.getStart());
+        blockStart = in.getPosition();
+        in.close();
+      }
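One possible reading of this request, sketched with hypothetical stand-ins rather than the real Hive types: the instanceof test becomes its own branch in the if/else chain instead of being nested inside the RCFile branch. HasStart, FlatBranchSketch, NO_BLOCK and the resync callback are all assumptions made for illustration, and the sketch leaves out the blockPointer flag that the real code sets.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a record reader that remembers its synced start. */
interface HasStart {
    long getStart();
}

final class FlatBranchSketch {
    /** Assumed marker meaning "this input format has no block pointer". */
    static final long NO_BLOCK = -1L;

    /**
     * Flattened branch structure: the instanceof check sits at the same level
     * as the name-based RCFile check instead of inside it.
     *
     * @param recordReader    the reader already created for the split
     * @param inputFormatName class name of the input format
     * @param resync          fallback that opens a reader and syncs again
     */
    static long blockStart(Object recordReader, String inputFormatName, LongSupplier resync) {
        if (recordReader instanceof HasStart) {
            // Reuse the position found when the record reader was created.
            return ((HasStart) recordReader).getStart();
        } else if (inputFormatName.contains("RCFile")) {
            // Unknown reader type for an RCFile format: fall back to a second sync.
            return resync.getAsLong();
        }
        return NO_BLOCK;
    }
}
```

With this shape, the fallback that syncs again only runs when the reader is not the expected type.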
[jira] Commented: (HIVE-1801) HiveInputFormat or
CombineHiveInputFormat always sync blocks of RCFile twice
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935048#action_12935048 ]
He Yongqiang commented on HIVE-1801:
------------------------------------
Siying, the attached patch is not for this JIRA.
Can you upload a new patch?
[jira] Updated: (HIVE-1801) HiveInputFormat or
CombineHiveInputFormat always sync blocks of RCFile twice
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1801:
------------------------------
Attachment: HIVE-1802.1.patch
Addresses Yongqiang's comment.
[jira] Updated: (HIVE-1801) HiveInputFormat or
CombineHiveInputFormat always sync blocks of RCFile twice
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1801:
-------------------------------
Status: Open (was: Patch Available)
[jira] Updated: (HIVE-1801) HiveInputFormat or
CombineHiveInputFormat always sync blocks of RCFile twice
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1801:
------------------------------
Status: Patch Available (was: Open)
[jira] Updated: (HIVE-1801) HiveInputFormat or
CombineHiveInputFormat always sync blocks of RCFile twice
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1801:
------------------------------
Attachment: HIVE-1801.1.patch
Still running tests.
[jira] Updated: (HIVE-1801) HiveInputFormat or
CombineHiveInputFormat always sync blocks of RCFile twice
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1801:
-------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Committed! Thanks Siying!
[jira] Updated: (HIVE-1801) HiveInputFormat or
CombineHiveInputFormat always sync blocks of RCFile twice
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1801:
------------------------------
Status: Patch Available (was: Open)
[jira] Updated: (HIVE-1801) HiveInputFormat or
CombineHiveInputFormat always sync blocks of RCFile twice
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1801:
------------------------------
Attachment: HIVE-1801.2.patch
This one should be right.
[jira] Commented: (HIVE-1801) HiveInputFormat or
CombineHiveInputFormat always sync blocks of RCFile twice
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935079#action_12935079 ]
He Yongqiang commented on HIVE-1801:
------------------------------------
+1. Running tests.
[jira] Updated: (HIVE-1801) HiveInputFormat or
CombineHiveInputFormat always sync blocks of RCFile twice
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-1801:
------------------------------
Attachment: (was: HIVE-1802.1.patch)