Posted to common-dev@hadoop.apache.org by "ZhuGuanyin (JIRA)" <ji...@apache.org> on 2009/05/06 13:11:30 UTC
[jira] Created: (HADOOP-5779) KeyFieldBasedPartitioner should encode free and handle ArrayOutOfIndex exception!
KeyFieldBasedPartitioner should encode free and handle ArrayOutOfIndex exception!
---------------------------------------------------------------------------------
Key: HADOOP-5779
URL: https://issues.apache.org/jira/browse/HADOOP-5779
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Reporter: ZhuGuanyin
Fix For: 0.20.1
1) Currently, KeyFieldBasedPartitioner only supports UTF-8 encoded records; it should work on the Text or BytesWritable data types instead.
2) When using KeyFieldBasedPartitioner, if a record doesn't contain the specified field, endChar equals array.length, which throws an ArrayIndexOutOfBoundsException and loses that record!
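A minimal Java sketch of item 2, under stated assumptions: the partitioner hashes the key bytes from a start offset to an inclusive end offset with a 31-based rolling hash, and a missing field yields an end offset equal to the array length. The class and method names are hypothetical and this is not the Hadoop source; the guarded variant simply adds an `i < b.length` condition to the loop.

```java
import java.nio.charset.StandardCharsets;

final class EndOffsetSketch {
    // Hypothetical: hashes b[start..end] inclusive with no bounds check,
    // reproducing the failure mode described in item 2.
    static int hashUnchecked(byte[] b, int start, int end, int seed) {
        int h = seed;
        for (int i = start; i <= end; i++) {
            h = 31 * h + b[i];
        }
        return h;
    }

    // Hypothetical guarded variant: the loop also stops at the last valid
    // index, so an end offset equal to b.length no longer reads past the array.
    static int hashGuarded(byte[] b, int start, int end, int seed) {
        int h = seed;
        for (int i = start; i <= end && i < b.length; i++) {
            h = 31 * h + b[i];
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] key = "a\tb".getBytes(StandardCharsets.UTF_8);
        int end = key.length; // endChar == array.length, as in the report
        try {
            hashUnchecked(key, 0, end, 0);
            System.out.println("unchecked: no exception");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked: ArrayIndexOutOfBoundsException");
        }
        System.out.println("guarded hash: " + hashGuarded(key, 0, end, 0));
    }
}
```

The guarded hash equals the unchecked hash over the valid range 0..key.length - 1, so the record is still partitioned instead of being lost.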
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-5779) KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8
Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amar Kamat updated HADOOP-5779:
-------------------------------
Attachment: HADOOP-5779-partial.patch
Hey. I tested your patch against my test case and it failed. The code changes dealing with the exception seem insufficient, so I am attaching code changes that fix the issue. Can you please update the patch accordingly?
Note that I haven't coded anything for the UTF-8 issue.
[jira] Commented: (HADOOP-5779) KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8
Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710191#action_12710191 ]
Jothi Padmanabhan commented on HADOOP-5779:
-------------------------------------------
Some minor comments:
* I think it would be better to do the instanceof checks on the key in a single if block:
{code}
if (key instanceof BytesWritable) {
  // Handle BytesWritable
} else if (key instanceof Text) {
  // Handle Text
} else {
  // error
}
{code}
* A test case for Text and BytesWritable keys would be good to have for this patch. It could either be a new test case or a modification of TestStreamDataProtocol. It would be really nice if the test case also demonstrated the fix for the ArrayIndexOutOfBoundsException, i.e. it should fail without this patch and pass with it.
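The single instanceof chain suggested in the first comment can be sketched in Java as below. To keep it compilable without Hadoop, `TextLike` and `BytesLike` are hypothetical stand-ins for Text and BytesWritable; they only mimic a getBytes()/getLength() pair in which the backing array may be longer than the valid data. `keyBytes` is likewise an illustrative name, not a Hadoop API.

```java
import java.util.Arrays;

// Hypothetical stand-in for Text: only the first getLength() bytes are valid.
final class TextLike {
    private final byte[] buf;
    private final int len;
    TextLike(byte[] buf, int len) { this.buf = buf; this.len = len; }
    byte[] getBytes() { return buf; }
    int getLength() { return len; }
}

// Hypothetical stand-in for BytesWritable, with the same getBytes()/getLength() shape.
final class BytesLike {
    private final byte[] buf;
    private final int size;
    BytesLike(byte[] buf, int size) { this.buf = buf; this.size = size; }
    byte[] getBytes() { return buf; }
    int getLength() { return size; }
}

final class KeyBytesSketch {
    // One if / else-if chain over the supported key types; any other type is
    // rejected instead of being converted through toString().
    static byte[] keyBytes(Object key) {
        if (key instanceof BytesLike) {
            BytesLike k = (BytesLike) key;
            return Arrays.copyOf(k.getBytes(), k.getLength());
        } else if (key instanceof TextLike) {
            TextLike k = (TextLike) key;
            return Arrays.copyOf(k.getBytes(), k.getLength());
        } else {
            throw new IllegalArgumentException("Unsupported key type: "
                + (key == null ? "null" : key.getClass().getName()));
        }
    }
}
```

Copying keeps the sketch short; a partitioner could instead pass the backing array together with getLength() to its hash loop and avoid the allocation.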
[jira] Updated: (HADOOP-5779) KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8
Posted by "ZhuGuanyin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ZhuGuanyin updated HADOOP-5779:
-------------------------------
Attachment: encode-free-KeyFieldBasedPartitioner-v1.patch
create patch using svn diff instead of diff
[jira] Updated: (HADOOP-5779) KeyFieldBasedPartitioner should encode free and handle ArrayOutOfIndex exception!
Posted by "ZhuGuanyin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ZhuGuanyin updated HADOOP-5779:
-------------------------------
Attachment: encode-free-KeyFieldBasedPartitioner.patch
Here comes the patch :)
[jira] Commented: (HADOOP-5779) KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8
Posted by "ZhuGuanyin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719503#action_12719503 ]
ZhuGuanyin commented on HADOOP-5779:
------------------------------------
Thanks very much. I was busy last month and did not follow this issue. I'll soon attach an example dataset that makes key.toString().getBytes("UTF-8") throw an exception.
Thanks again!
[jira] Commented: (HADOOP-5779) KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8
Posted by "Jothi Padmanabhan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718791#action_12718791 ]
Jothi Padmanabhan commented on HADOOP-5779:
-------------------------------------------
Some minor comments:
# I think endChar cannot be negative, so the check endChar < 0 can be removed. Could you check?
# Instead of checking i <= end && i < b.length in hashCode(), I think we should ideally fix getEndOffset to return min(end, b.length - 1). But I would not -1 the patch for that; I am also OK with the existing simple change in the patch.
# In the test case, it would be good to add an assert verifying that the returned partition is 0.
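Points 2 and 3 can be sketched together in Java, under assumptions: the end offset is clamped once to min(end, b.length - 1), the hash starts from 0 and is a 31-based rolling hash over an inclusive range, the partition is the non-negative hash modulo the number of reduce tasks, and a missing field starts at the end of the key. With a missing field the clamped range is empty, so the hash stays 0 and the partition is 0, which is what the suggested assert would check. All names are hypothetical, not the Hadoop source.

```java
import java.nio.charset.StandardCharsets;

final class ClampedEndSketch {
    // Hypothetical helper: clamp the end offset once instead of
    // re-checking i < b.length on every loop iteration.
    static int clampEnd(int end, byte[] b) {
        return Math.min(end, b.length - 1);
    }

    // Rolling hash over b[start..end] inclusive; the caller passes a clamped
    // end. An empty range (start > end) leaves the seed unchanged.
    static int hash(byte[] b, int start, int end, int seed) {
        int h = seed;
        for (int i = start; i <= end; i++) {
            h = 31 * h + b[i];
        }
        return h;
    }

    // Assumed partition formula: non-negative hash modulo the reducer count.
    static int partition(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        byte[] key = "onlyonefield".getBytes(StandardCharsets.UTF_8);
        // Assumed: the missing field starts at key.length and its raw end
        // offset also equals key.length.
        int start = key.length;
        int end = clampEnd(key.length, key);
        int p = partition(hash(key, start, end, 0), 10);
        System.out.println("partition for missing field: " + p);
    }
}
```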
[jira] Issue Comment Edited: (HADOOP-5779) KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8
Posted by "ZhuGuanyin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708734#action_12708734 ]
ZhuGuanyin edited comment on HADOOP-5779 at 5/17/09 6:51 PM:
-------------------------------------------------------------
create patch using svn diff instead of diff
was (Author: buptzhugy):
create patsh using svn diff instead of diff
[jira] Updated: (HADOOP-5779) KeyFieldBasedPartitioner should encode free and handle ArrayOutOfIndex exception!
Posted by "ZhuGuanyin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ZhuGuanyin updated HADOOP-5779:
-------------------------------
Release Note: KeyFieldBasedPartitioner uses the BytesWritable or Text data types, so it supports encodings other than UTF-8 and avoids the ArrayIndexOutOfBoundsException.
Status: Patch Available (was: Open)
[jira] Updated: (HADOOP-5779) KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8
Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amar Kamat updated HADOOP-5779:
-------------------------------
Attachment: HADOOP-5779-v1.0.patch.patch
Attaching a merged patch. Testing in progress.
[jira] Commented: (HADOOP-5779) KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8
Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719486#action_12719486 ]
Amar Kamat commented on HADOOP-5779:
------------------------------------
ZhuGuanyin, I am in the process of merging the patches and adding a test case. I hope that's OK with you. Can you please let me know how to simulate the situation where key.toString().getBytes("UTF-8") throws an exception?
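One way to see why a partitioner that goes through key.toString() is tied to UTF-8 is to decode bytes that are not valid UTF-8 and encode them again. The Java sketch below uses only java.lang.String, not Hadoop's Text, and assumes the GBK bytes 0xC4 0xE3 (the character U+4F60) as sample input; the class name is hypothetical. This round trip does not raise an exception: new String(byte[], Charset) replaces malformed input with U+FFFD, so the bytes silently change. It therefore illustrates altered key bytes rather than reproducing the exception asked about above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RoundTripSketch {
    // Decode raw key bytes as UTF-8 and encode them again, roughly what an
    // encoding-dependent partitioner does when it hashes key.toString().
    static byte[] viaUtf8String(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed sample: GBK encoding of U+4F60, which is not valid UTF-8.
        byte[] gbk = {(byte) 0xC4, (byte) 0xE3};
        byte[] back = viaUtf8String(gbk);
        System.out.println("round trip preserved bytes: " + Arrays.equals(gbk, back));
    }
}
```

Plain ASCII keys survive the round trip unchanged, which is why the problem only shows up on data in other encodings.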
[jira] Updated: (HADOOP-5779) KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8
Posted by "ZhuGuanyin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ZhuGuanyin updated HADOOP-5779:
-------------------------------
Fix Version/s: (was: 0.20.1)
0.21.0
Description:
1) Currently, KeyFieldBasedPartitioner only supports UTF-8 encoded records; it should work on the Text or BytesWritable data types instead.
2) When using KeyFieldBasedPartitioner, if a record doesn't contain the specified field, endChar equals array.length, which throws an ArrayIndexOutOfBoundsException and loses that record!
was:
1) Currently, KeyFieldBasedPartitioner only supports UTF-8 encoded records; it should work on the Text or BytesWritable data types instead.
2) When using KeyFieldBasedPartitioner, if a record doesn't contain the specified field, endChar equals array.length, which throws an ArrayIndexOutOfBoundsException and loses that record!
Affects Version/s: 0.20.0
Summary: KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8 (was: KeyFieldBasedPartitioner should encode free and handle ArrayOutOfIndex exception!)
[jira] Commented: (HADOOP-5779) KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8
Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719989#action_12719989 ]
Amar Kamat commented on HADOOP-5779:
------------------------------------
Opened HADOOP-6052 to address the ArrayIndexOutOfBoundsException. ZhuGuanyin, could you please provide a new patch with the framework fix and a test case for the UTF-8 encoding issue?
[jira] Commented: (HADOOP-5779) KeyFieldBasedPartitioner would lost data if specifed field not exist, and it should encode free not only support utf8
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707525#action_12707525 ]
Hadoop QA commented on HADOOP-5779:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12407333/encode-free-KeyFieldBasedPartitioner.patch
against trunk revision 772960.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.
-1 patch. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/307/console
This message is automatically generated.