Posted to issues@kylin.apache.org by "nichunen (JIRA)" <ji...@apache.org> on 2019/07/05 14:06:00 UTC
[jira] [Closed] (KYLIN-3845) Kylin build error If the Kafka data source lacks selected dimensions or metrics in the kylin stream build.

     [ https://issues.apache.org/jira/browse/KYLIN-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nichunen closed KYLIN-3845.
---------------------------

Resolved in release 2.6.3

> Kylin build error If the Kafka data source lacks selected dimensions or metrics in the kylin stream build.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-3845
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3845
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine, NRT Streaming
>    Affects Versions: v2.5.2
>         Environment: Fusion Insight
>            Reporter: zhao jintao
>            Assignee: zhao jintao
>            Priority: Major
>              Labels: easyfix
>             Fix For: v2.6.3, v3.0.0-beta
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Hi team,
> I'm developing an OLAP platform based on Kylin 2.5.2. As part of this work, I built a streaming cube from a Kafka source using the Kafka demo.
> In my streaming project, I set country and currency as dimensions and userId as a metric. The cube build failed in the 3rd step ("Extract Fact Table Distinct Columns") with a java.lang.ArrayIndexOutOfBoundsException.
> Here are the logs:
> 2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Do cleanup, available memory: 1334m
> 2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Total rows: 127
> 2019-03-02 14:21:01,492 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0
> 2019-03-02 14:21:01,492 INFO [main] org.apache.hadoop.mapred.YarnChild: Exception running child: java.lang.ArrayIndexOutOfBoundsException:2
> 2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Do cleanup, available memory: 1334m
>  at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper.doMap(FactDistinctColumnsMapper.java:177)
>  at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
>  at org.apache.hadoop.mapreduce.Mapper.run(MapperTask.java:146)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:187)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1781)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:180)
>  
> I then found that some of the streaming data in the Kafka data source lacks the userId column. Most records (country, currency, userId) look like ("China","CNY","843c4d"), but a small number lack userId and look like ("China","CNY"). So when the 3rd step ("Extract Fact Table Distinct Columns") runs, the MR engine throws the exception whenever a record lacks userId.
> Then I checked the Kylin source, FactDistinctColumnsMapper.java:
> public void doMap(KEYIN key, Object record, Context context) throws IOException, InterruptedException {
>     Collection<String[]> rowCollection = flatTableInputFormat.parseMapperInput(record);
>     for (String[] row : rowCollection) {
>         context.getCounter(RawDataCounter.BYTES).increment(countSizeInBytes(row));
>         for (int i = 0; i < allCols.size(); i++) {
>             String fieldValue = row[columnIndex[i]];
>             if (fieldValue == null)
>                 continue;
>             final DataType type = allCols.get(i).getType();
>             ...
> I found that columnIndex[i] equals the length of row when a streaming record lacks one column, so row[columnIndex[i]] throws the ArrayIndexOutOfBoundsException. I changed the code to compare columnIndex[i] with the length of row: if columnIndex[i] is equal to or larger than the length of row, I set fieldValue to an empty value. After this change, the 3rd step ("Extract Fact Table Distinct Columns") runs successfully.
> This is what I found; it may cause problems for other developers.
> What do you think?
> Best regards,
> jintao
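
The bounds check described in the report can be sketched roughly as follows. This is a standalone illustration, not the patch that was merged into Kylin: the class SafeFieldAccess and the helper fieldOrEmpty are hypothetical names, and treating the reporter's "empty value" as an empty string is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SafeFieldAccess {

    // Hypothetical helper (not part of Kylin): returns the field at `index`,
    // or an empty string when the row is too short to contain that column.
    // Assumption: "empty value" in the report means "" rather than null.
    public static String fieldOrEmpty(String[] row, int index) {
        if (index < 0 || index >= row.length) {
            return "";
        }
        return row[index];
    }

    public static void main(String[] args) {
        // Column positions for (country, currency, userId), as in the report.
        int[] columnIndex = {0, 1, 2};

        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"China", "CNY", "843c4d"}); // complete record
        rows.add(new String[] {"China", "CNY"});           // record lacking userId

        for (String[] row : rows) {
            String[] values = new String[columnIndex.length];
            for (int i = 0; i < columnIndex.length; i++) {
                // row[columnIndex[i]] would throw for the short record;
                // the guarded access substitutes an empty string instead.
                values[i] = fieldOrEmpty(row, columnIndex[i]);
            }
            System.out.println(Arrays.toString(values));
        }
    }
}
```

Returning null instead of an empty string would be another option: the existing `if (fieldValue == null) continue;` check in the quoted doMap would then skip the missing column altogether. Which behaviour is preferable depends on how missing values should be counted, which the report does not specify.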


