Posted to dev@kylin.apache.org by "zhao jintao (JIRA)" <ji...@apache.org> on 2019/03/06 04:05:00 UTC

[jira] [Created] (KYLIN-3845) Kylin build error If the Kafka data source lacks selected dimensions or metrics in the kylin stream build.

zhao jintao created KYLIN-3845:
----------------------------------

             Summary: Kylin build error If the Kafka data source lacks selected dimensions or metrics in the kylin stream build.
                 Key: KYLIN-3845
                 URL: https://issues.apache.org/jira/browse/KYLIN-3845
             Project: Kylin
          Issue Type: Bug
          Components: Job Engine
    Affects Versions: v2.5.2
         Environment: Fusion Insight
            Reporter: zhao jintao
             Fix For: Future


Hi dear team:
I'm developing an OLAP platform based on Kylin 2.5.2. During this work, I built a streaming cube from a Kafka source using the Kafka demo.
In my streaming project, I set country and currency as dimensions and userId as a metric. The cube build failed in the 3rd step ("Extract Fact Table Distinct Columns") with a java.lang.ArrayIndexOutOfBoundsException.
These are the logs:
2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Do cleanup, available memory: 1334m
2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Total rows: 127
2019-03-02 14:21:01,492 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0
2019-03-02 14:21:01,492 INFO [main] org.apache.hadoop.mapred.YarnChild: Exception running child: java.lang.ArrayIndexOutOfBoundsException:2
2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Do cleanup, available memory: 1334m
 at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper.doMap(FactDistinctColumnsMapper.java:177)
 at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
 at org.apache.hadoop.mapreduce.Mapper.run(MapperTask.java:146)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:187)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1781)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:180)
 
Then I found that some of the streaming data in the Kafka data source lacks the userId column. Most records (country, currency, userId) look like ("China","CNY","843c4d"), but a small number lack userId and look like ("China","CNY"). So when the 3rd step ("Extract Fact Table Distinct Columns") runs, the MR engine throws the exception for any record that lacks userId.

Then I checked the Kylin source, FactDistinctColumnsMapper.java:

public void doMap(KEYIN key, Object record, Context context) throws IOException, InterruptedException {
    Collection<String[]> rowCollection = flatTableInputFormat.parseMapperInput(record);

    for (String[] row : rowCollection) {
        context.getCounter(RawDataCounter.BYTES).increment(countSizeInBytes(row));
        for (int i = 0; i < allCols.size(); i++) {
            String fieldValue = row[columnIndex[i]];
            if (fieldValue == null)
                continue;

            final DataType type = allCols.get(i).getType();
            ...

I found that columnIndex[i] equals the length of row when the streaming record lacks one column, so row[columnIndex[i]] throws the ArrayIndexOutOfBoundsException. I changed this code to compare columnIndex[i] with the length of row: if columnIndex[i] is equal to or larger than the length of row, I set fieldValue to an empty value. After this change, the 3rd step ("Extract Fact Table Distinct Columns") runs successfully.
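
The check described above can be sketched roughly as follows. This is only an illustration, not the actual change made to FactDistinctColumnsMapper: the class RowFieldSketch and the helper fieldOrDefault are hypothetical names, and the sample rows and columnIndex values are assumed to mirror the (country, currency, userId) records mentioned earlier.

```java
import java.util.Arrays;
import java.util.List;

public class RowFieldSketch {

    // Hypothetical helper (not part of Kylin): returns row[index] when the index
    // falls inside the row, otherwise the supplied default instead of throwing.
    static String fieldOrDefault(String[] row, int index, String defaultValue) {
        if (row == null || index < 0 || index >= row.length) {
            return defaultValue;
        }
        return row[index];
    }

    public static void main(String[] args) {
        // Assumed column mapping: country -> 0, currency -> 1, userId -> 2.
        int[] columnIndex = {0, 1, 2};
        List<String[]> rows = Arrays.asList(
                new String[] {"China", "CNY", "843c4d"},
                new String[] {"China", "CNY"}); // record that lacks userId

        for (String[] row : rows) {
            for (int i = 0; i < columnIndex.length; i++) {
                // Guarded access replaces the direct row[columnIndex[i]] lookup.
                String fieldValue = fieldOrDefault(row, columnIndex[i], "");
                System.out.println("column " + i + " -> '" + fieldValue + "'");
            }
        }
    }
}
```

Passing null instead of "" as the default would make a missing column follow the existing `if (fieldValue == null) continue;` path; whether a missing value should be skipped or treated as empty is a design decision for the project.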

This is what I found; it could cause problems for other developers.
What do you think?

Best regards
jintao



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)