Posted to dev@hive.apache.org by "He Yongqiang (JIRA)" <ji...@apache.org> on 2010/01/23 21:00:18 UTC
[jira] Created: (HIVE-1088) RCFile RecordReader's first split will
read duplicate rows if the split end is < the first SYNC mark
RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
----------------------------------------------------------------------------------------------------
Key: HIVE-1088
URL: https://issues.apache.org/jira/browse/HIVE-1088
Project: Hadoop Hive
Issue Type: Bug
Reporter: He Yongqiang
Fix For: 0.5.0, 0.6.0
Attachments: hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.patch
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1088) RCFile RecordReader's first split will
read duplicate rows if the split end is < the first SYNC mark
Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-1088:
---------------------------------
Component/s: Serializers/Deserializers
Affects Version/s: (was: 0.6.0)
(was: 0.5.0)
Fix Version/s: (was: 0.6.0)
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: hive-1088-branch0.5-2010-1-25.2.patch, hive-1088-branch0.5-2010-1-25.patch, hive-1088-trunk-2010-1-25.2.patch, hive-1088-trunk-2010-1-25.patch, hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.2.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Updated: (HIVE-1088) RCFile RecordReader's first split will
read duplicate rows if the split end is < the first SYNC mark
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1088:
-------------------------------
Attachment: hive-1088-trunk-2010-1-25.patch
hive-1088-branch0.5-2010-1-25.patch
Attached 2 patches. hive-1088-branch0.5-2010-1-25.patch is for branch 0.5, and hive-1088-trunk-2010-1-25.patch is for trunk code.
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.5.0, 0.6.0
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Priority: Blocker
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-1088-branch0.5-2010-1-25.patch, hive-1088-trunk-2010-1-25.patch, hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.2.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Updated: (HIVE-1088) RCFile RecordReader's first split will
read duplicate rows if the split end is < the first SYNC mark
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1088:
-------------------------------
Attachment: (was: hive-1088-trunk-2010-1-25.2.patch)
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.5.0, 0.6.0
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Priority: Blocker
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-1088-branch0.5-2010-1-25.patch, hive-1088-trunk-2010-1-25.patch, hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.2.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Updated: (HIVE-1088) RCFile RecordReader's first split will
read duplicate rows if the split end is < the first SYNC mark
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1088:
-------------------------------
Attachment: hive-1088-trunk-2010-1-25.2.patch
hive-1088-branch0.5-2010-1-25.2.patch
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.5.0, 0.6.0
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Priority: Blocker
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-1088-branch0.5-2010-1-25.2.patch, hive-1088-branch0.5-2010-1-25.patch, hive-1088-trunk-2010-1-25.2.patch, hive-1088-trunk-2010-1-25.patch, hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.2.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Updated: (HIVE-1088) RCFile RecordReader's first split will
read duplicate rows if the split end is < the first SYNC mark
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1088:
-------------------------------
Attachment: hive-rcfile-reader-trunk.patch
hive-rcfile-reader-branch-0.5.patch
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Updated: (HIVE-1088) RCFile RecordReader's first split will
read duplicate rows if the split end is < the first SYNC mark
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-1088:
-----------------------------
Priority: Blocker (was: Major)
Affects Version/s: 0.6.0
0.5.0
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.5.0, 0.6.0
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Priority: Blocker
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.2.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Commented: (HIVE-1088) RCFile RecordReader's first split
will read duplicate rows if the split end is < the first SYNC mark
Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804853#action_12804853 ]
Ning Zhang commented on HIVE-1088:
----------------------------------
+1. Looks good. Will commit if tests pass.
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.5.0, 0.6.0
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Priority: Blocker
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-1088-branch0.5-2010-1-25.2.patch, hive-1088-branch0.5-2010-1-25.patch, hive-1088-trunk-2010-1-25.2.patch, hive-1088-trunk-2010-1-25.patch, hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.2.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Updated: (HIVE-1088) RCFile RecordReader's first split will
read duplicate rows if the split end is < the first SYNC mark
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1088:
-------------------------------
Attachment: (was: hive-1088-branch0.5-2010-1-25.2.patch)
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.5.0, 0.6.0
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Priority: Blocker
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-1088-branch0.5-2010-1-25.patch, hive-1088-trunk-2010-1-25.patch, hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.2.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Resolved: (HIVE-1088) RCFile RecordReader's first split will
read duplicate rows if the split end is < the first SYNC mark
Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ning Zhang resolved HIVE-1088.
------------------------------
Resolution: Fixed
Committed to 0.5.0 and trunk (0.6.0). Thanks Yongqiang!
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.5.0, 0.6.0
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Priority: Blocker
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-1088-branch0.5-2010-1-25.2.patch, hive-1088-branch0.5-2010-1-25.patch, hive-1088-trunk-2010-1-25.2.patch, hive-1088-trunk-2010-1-25.patch, hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.2.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Updated: (HIVE-1088) RCFile RecordReader's first split will
read duplicate rows if the split end is < the first SYNC mark
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1088:
-------------------------------
Attachment: hive-rcfile-reader-trunk.2.patch
Added a test case to the trunk patch (hive-rcfile-reader-trunk.2.patch).
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.2.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Updated: (HIVE-1088) RCFile RecordReader's first split will
read duplicate rows if the split end is < the first SYNC mark
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-1088:
-------------------------------
Attachment: hive-1088-trunk-2010-1-25.2.patch
hive-1088-branch0.5-2010-1-25.2.patch
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.5.0, 0.6.0
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Priority: Blocker
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-1088-branch0.5-2010-1-25.2.patch, hive-1088-branch0.5-2010-1-25.patch, hive-1088-trunk-2010-1-25.2.patch, hive-1088-trunk-2010-1-25.patch, hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.2.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Commented: (HIVE-1088) RCFile RecordReader's first split
will read duplicate rows if the split end is < the first SYNC mark
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804434#action_12804434 ]
He Yongqiang commented on HIVE-1088:
------------------------------------
Will add a test case.
>>Also, do you think it is a good idea to convert a subset of tests to use rcfile?
Yes. We will need to do this sooner or later, and right now is a good time. It may be better to do this in a separate JIRA.
What do others think?
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Assigned: (HIVE-1088) RCFile RecordReader's first split will
read duplicate rows if the split end is < the first SYNC mark
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain reassigned HIVE-1088:
--------------------------------
Assignee: He Yongqiang
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Commented: (HIVE-1088) RCFile RecordReader's first split
will read duplicate rows if the split end is < the first SYNC mark
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804389#action_12804389 ]
Namit Jain commented on HIVE-1088:
----------------------------------
Can you add a test for this?
Also, do you think it is a good idea to convert a subset of tests to use rcfile?
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Commented: (HIVE-1088) RCFile RecordReader's first split
will read duplicate rows if the split end is < the first SYNC mark
Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804639#action_12804639 ]
Ning Zhang commented on HIVE-1088:
----------------------------------
Another suggestion: in RCFileRecordReader.java, next(LongWritable, BytesRefArrayWritable) shares a lot of code with next(LongWritable). Can you refactor this function to something like:
    public boolean next(LongWritable key, BytesRefArrayWritable value)
        throws IOException {
      // Delegate the advance and end-of-split check to the key-only overload.
      more = next(key);
      if (more) {
        // Only materialize the row when the key-only call found one.
        in.getCurrentRow(value);
      }
      return more;
    }
This will keep the logic of the two next() functions consistent.
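For readers who want to try the delegation pattern in isolation, here is a minimal, self-contained sketch. It does not use the Hive classes: ToySource, ToyRecordReader, readAll, and the index-based split end are all hypothetical stand-ins invented for illustration, and the real RCFileRecordReader may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types are Hive classes.
class DelegatingNextSketch {

    /** Hypothetical row source standing in for the underlying reader ("in"). */
    static class ToySource {
        private final List<String> rows;
        private int pos = -1; // index of the current row, -1 before the first advance

        ToySource(List<String> rows) {
            this.rows = rows;
        }

        /** Moves to the next row; returns false when the source is exhausted. */
        boolean advance() {
            if (pos + 1 >= rows.size()) {
                return false;
            }
            pos++;
            return true;
        }

        int position() {
            return pos;
        }

        /** Copies the current row into the caller-supplied buffer. */
        void getCurrentRow(StringBuilder value) {
            value.setLength(0);
            value.append(rows.get(pos));
        }
    }

    /** Hypothetical record reader with two next() overloads. */
    static class ToyRecordReader {
        private final ToySource in;
        private final int end; // exclusive split end, as a row index (assumption)
        private boolean more = true;

        ToyRecordReader(ToySource in, int end) {
            this.in = in;
            this.end = end;
        }

        /** Key-only overload: the single place that advances and checks the split end. */
        boolean next(long[] key) {
            if (!more) {
                return false;
            }
            more = in.advance() && in.position() < end;
            if (more) {
                key[0] = in.position();
            }
            return more;
        }

        /** Key/value overload: delegates to next(key), then fetches the row. */
        boolean next(long[] key, StringBuilder value) {
            more = next(key);
            if (more) {
                in.getCurrentRow(value);
            }
            return more;
        }
    }

    /** Reads every row the reader yields for a split ending at the given index. */
    static List<String> readAll(List<String> rows, int end) {
        ToyRecordReader reader = new ToyRecordReader(new ToySource(rows), end);
        long[] key = new long[1];
        StringBuilder value = new StringBuilder();
        List<String> out = new ArrayList<>();
        while (reader.next(key, value)) {
            out.add(value.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readAll(Arrays.asList("a", "b", "c"), 2));
    }
}
```

Because the end-of-split check lives only in the key-only overload, a fix to that check applies to both overloads at once.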
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.2.patch, hive-rcfile-reader-trunk.patch
>
>
[jira] Commented: (HIVE-1088) RCFile RecordReader's first split
will read duplicate rows if the split end is < the first SYNC mark
Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804632#action_12804632 ]
Ning Zhang commented on HIVE-1088:
----------------------------------
Yongqiang, I noticed you have added a unit test in TestRCFile.java. Is it possible to add a unit test as a .q file as well? The benefit is that TestRCFile only tests one code path for particular functions. If we later change the code path in the real execution engine, it may no longer cover the error case, whereas a .q file is more likely to catch it.
> RCFile RecordReader's first split will read duplicate rows if the split end is < the first SYNC mark
> ----------------------------------------------------------------------------------------------------
>
> Key: HIVE-1088
> URL: https://issues.apache.org/jira/browse/HIVE-1088
> Project: Hadoop Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Fix For: 0.5.0, 0.6.0
>
> Attachments: hive-rcfile-reader-branch-0.5.patch, hive-rcfile-reader-trunk.2.patch, hive-rcfile-reader-trunk.patch
>
>