Posted to issues@kylin.apache.org by "Wang, Gang (JIRA)" <ji...@apache.org> on 2017/12/18 07:33:01 UTC

[jira] [Assigned] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job

     [ https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wang, Gang reassigned KYLIN-3115:
---------------------------------

    Assignee: Wang, Gang

> Incompatible RowKeySplitter initialize between build and merge job
> ------------------------------------------------------------------
>
>                 Key: KYLIN-3115
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3115
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: Wang, Gang
>            Assignee: Wang, Gang
>
> In class NDCuboidBuilder:
>     public NDCuboidBuilder(CubeSegment cubeSegment) {
>         this.cubeSegment = cubeSegment;
>         this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
>         this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
>     }
> The RowKeySplitter here pre-allocates temp byte arrays of length 256 to hold the rowkey column bytes.
> Meanwhile, in class MergeCuboidMapper it is initialized with length 255:
>     rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);
> So if a dimension is encoded with fixed length and that length is 256, the cube build job will succeed while the merge job will always fail, as illustrated below.
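> To make the sizes concrete, here is a hypothetical dimension (illustrative only, not taken from this report):
>     dimension: USER_NAME, mapped from a Hive varchar(256) column
>     encoding:  fixed_length:256 -> the column occupies 256 bytes in the rowkey
>     build job: per-column buffers of 256 bytes -> a 256-byte copy fits exactly
>     merge job: per-column buffers of 255 bytes -> a 256-byte copy overflows by one byte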
> In method doMap, the merge mapper invokes RowKeySplitter.split(byte[] bytes):
>     public void doMap(Text key, Text value, Context context) throws IOException, InterruptedException {
>         long cuboidID = rowKeySplitter.split(key.getBytes());
>         Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);
> Inside split(), the rowkey-column loop copies each column into a pre-allocated split buffer:
>         // rowkey columns
>         for (int i = 0; i < cuboid.getColumns().size(); i++) {
>             splitOffsets[i] = offset;
>             TblColRef col = cuboid.getColumns().get(i);
>             int colLength = colIO.getColumnLength(col);
>             SplittedBytes split = this.splitBuffers[this.bufferSize++];
>             split.length = colLength;
>             System.arraycopy(bytes, offset, split.value, 0, colLength);
>             offset += colLength;
>         }
> System.arraycopy will throw an IndexOutOfBoundsException when a 256-byte column is copied into a destination array of length 255.
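> A minimal standalone sketch of the failure mode (the array sizes come from the constructor calls above; the variable names are illustrative):
>     // merge-side split buffer is pre-allocated one byte too small
>     byte[] rowkeyColumn = new byte[256]; // a fixed_length:256 dimension value
>     byte[] mergeBuffer = new byte[255];  // buffer sized with max split length 255
>     // copying 256 bytes into a 255-byte array throws
>     // java.lang.ArrayIndexOutOfBoundsException (a subclass of IndexOutOfBoundsException)
>     System.arraycopy(rowkeyColumn, 0, mergeBuffer, 0, rowkeyColumn.length);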
> The same incompatibility also occurs in class FilterRecommendCuboidDataMapper, which initializes the RowKeySplitter as:
>     rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);
> I think the better way is to always set the max split length to 256, for example as sketched below.
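> A possible shape for the fix, assuming we simply align the merge-side constructor calls with NDCuboidBuilder (the shared constant is my suggestion, not existing Kylin code):
>     // e.g. on RowKeySplitter itself, so all callers agree on one value
>     public static final int MAX_COLUMN_LENGTH = 256; // hypothetical constant name
>     // MergeCuboidMapper
>     rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, RowKeySplitter.MAX_COLUMN_LENGTH);
>     // FilterRecommendCuboidDataMapper
>     rowKeySplitter = new RowKeySplitter(originalSegment, 65, RowKeySplitter.MAX_COLUMN_LENGTH);
> With one shared constant, the build, merge, and filter paths cannot drift apart again.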
> And dimensions encoded in fixed length 256 are actually pretty common in our production. Since the Hive type varchar(256) is so common, users without much knowledge of encodings will prefer to choose fixed_length encoding for such dimensions and set the max length to 256.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)