Posted to issues@kylin.apache.org by "Wang, Gang (JIRA)" <ji...@apache.org> on 2017/10/23 05:44:00 UTC
[jira] [Updated] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
[ https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wang, Gang updated KYLIN-2956:
------------------------------
Description:
In the new release, Kylin checks the value length when building a trie dictionary. The check is made in method buildTrieBytes of class TrieDictionaryBuilder, through these methods:
    private void positiveShortPreCheck(int i, String fieldName) {
        if (!BytesUtil.isPositiveShort(i)) {
            throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
        }
    }

    public static boolean isPositiveShort(int i) {
        return (i & 0xFFFF7000) == 0;
    }
0xFFFF7000 in binary is 1111 1111 1111 1111 0111 0000 0000 0000, so a value length passes the check only if bits 12 to 14 and bits 16 to 31 are all zero. In practice the maximum supported length is 0000 0000 0000 0000 0000 1111 1111 1111, which is 4095 in decimal.
I wonder why the mask is 0xFFFF7000. Is
0xFFFF8000: 1111 1111 1111 1111 1000 0000 0000 0000
support max length: 0000 0000 0000 0000 0111 1111 1111 1111 (32767)
what was intended? Since 32767 may be too large, I would prefer to use 0xFFFFE000:
0xFFFFE000: 1111 1111 1111 1111 1110 0000 0000 0000
support max length: 0000 0000 0000 0000 0001 1111 1111 1111 (8191)
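The mask arithmetic above can be checked with a small standalone sketch. It does not use Kylin code: the class MaskCheck and the helpers passes and maxAccepted are hypothetical names introduced only for this illustration, and passes simply mirrors the shape of isPositiveShort with the mask turned into a parameter. The last line also shows that 0xFFFF7000 leaves bit 15 unchecked, which suggests the constant may be a typo rather than a deliberate limit.

```java
class MaskCheck {
    // Hypothetical helper: same test as the quoted isPositiveShort,
    // but with the mask passed in so different masks can be compared.
    static boolean passes(int i, int mask) {
        return (i & mask) == 0;
    }

    // Brute-force search for the largest length in [0, 32767]
    // that the given mask accepts. Returns -1 if none is accepted.
    static int maxAccepted(int mask) {
        int max = -1;
        for (int i = 0; i <= Short.MAX_VALUE; i++) {
            if (passes(i, mask)) {
                max = i;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxAccepted(0xFFFF7000)); // 4095
        System.out.println(maxAccepted(0xFFFF8000)); // 32767
        System.out.println(maxAccepted(0xFFFFE000)); // 8191
        // Bit 15 is not covered by 0xFFFF7000, so 32768 passes the check.
        System.out.println(passes(0x8000, 0xFFFF7000)); // true
    }
}
```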
> building trie dictionary blocked on value of length over 4095
> --------------------------------------------------------------
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
> Issue Type: Bug
> Components: General
> Reporter: Wang, Gang
> Assignee: Wang, Gang
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)