Posted to issues@kylin.apache.org by "kangkaisen (JIRA)" <ji...@apache.org> on 2016/08/02 02:31:20 UTC
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

    [ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403250#comment-15403250 ] 

kangkaisen commented on KYLIN-1834:
-----------------------------------

I have run into the same problem.

It is strange that building the cube from 6.13 to 7.1 and from 7.1 to 7.15 both succeed, but merging the cube from 6.13 to 7.15 fails.

The log looks like this:

2016-08-01 19:04:49,265 ERROR [main] org.apache.kylin.dict.TrieDictionary: Not a valid value: 00000000000000025963
2016-08-01 19:04:49,267 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2016-08-01 19:04:49,267 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
2016-08-01 19:04:49,267 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 9429876; bufvoid = 104857600
2016-08-01 19:04:49,267 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 26214396(104857584); kvend = 26110792(104443168); length = 103605/6553600
2016-08-01 19:04:49,389 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.lzo_deflate]
2016-08-01 19:04:49,390 WARN [main] org.apache.hadoop.io.compress.LzoCodec: org.apache.hadoop.io.compress.LzoCodec is deprecated. You should use com.hadoop.compression.lzo.LzoCodec instead to generate LZO compressed data.
2016-08-01 19:04:49,579 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0
2016-08-01 19:04:49,585 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException: Value not exists!
	at org.apache.kylin.common.util.Dictionary.getIdFromValueBytes(Dictionary.java:162)
	at org.apache.kylin.common.util.Dictionary.getIdFromValueBytes(Dictionary.java:140)
	at org.apache.kylin.engine.mr.steps.MergeCuboidMapper.map(MergeCuboidMapper.java:207)
	at org.apache.kylin.engine.mr.steps.MergeCuboidMapper.map(MergeCuboidMapper.java:63)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)



When I test "00000000000000025963" with "TrieDictionary" directly, nothing goes wrong.

It may be a hidden bug.
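
To make the suspected failure mode concrete, here is a minimal sketch. It assumes the merge step re-encodes each dimension value by decoding it with the old segment dictionary and looking it up in the merged dictionary, which is what the MergeCuboidMapper.map -> Dictionary.getIdFromValueBytes frames in the trace suggest. SimpleDict and reencode are hypothetical stand-ins written for illustration; they are not Kylin classes or APIs.

```java
import java.util.Arrays;

public class MergeReencodeSketch {

    /** Hypothetical stand-in for a dictionary: values kept sorted, ID = array index. */
    static final class SimpleDict {
        final String[] sorted;

        SimpleDict(String... values) {
            sorted = values.clone();
            Arrays.sort(sorted);
        }

        String valueOf(int id) {
            return sorted[id];
        }

        int idOf(String value) {
            int idx = Arrays.binarySearch(sorted, value);
            if (idx < 0)
                throw new IllegalArgumentException("Value not exists!");
            return idx;
        }
    }

    /** Decode an ID with the segment dictionary, then encode it with the merged one. */
    static int reencode(SimpleDict segmentDict, SimpleDict mergedDict, int oldId) {
        return mergedDict.idOf(segmentDict.valueOf(oldId));
    }

    public static void main(String[] args) {
        SimpleDict segment = new SimpleDict("00000000000000025963", "00000000000000025964");
        // A merged dictionary that contains every value of the segment.
        SimpleDict complete = new SimpleDict("00000000000000025960",
                "00000000000000025963", "00000000000000025964");
        // A merged dictionary that is missing one value (the assumed failure case).
        SimpleDict incomplete = new SimpleDict("00000000000000025960", "00000000000000025964");

        System.out.println(reencode(segment, complete, 0)); // prints 1
        try {
            reencode(segment, incomplete, 0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints Value not exists!
        }
    }
}
```

If this model is right, each segment dictionary can be valid on its own while the merge still fails, because the lookup happens against a different (merged) dictionary. That would be consistent with the single builds succeeding and only the merge failing.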

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
> ----------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-1834
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1834
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v1.5.2, v1.5.2.1
>            Reporter: Richard Calaba
>            Priority: Blocker
>         Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
> 	at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
> 	at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
> 	at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
> 	at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
> 	at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
> 	at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
> 	at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
> 	at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
> 	at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
> 	at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> 	at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
> 	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
> 	at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> 	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
> 	at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>      * A lower level API, return ID integer from raw value bytes. In case of not found 
>      * <p>
>      * - if roundingFlag=0, throw IllegalArgumentException; <br>
>      * - if roundingFlag<0, the closest smaller ID integer if exist; <br>
>      * - if roundingFlag>0, the closest bigger ID integer if exist. <br>
>      * <p>
>      * Bypassing the cache layer, this could be significantly slower than getIdFromValue(T value).
>      * 
>      * @throws IllegalArgumentException
>      *             if value is not found in dictionary and rounding is off;
>      *             or if rounding cannot find a smaller or bigger ID
>      */
>     final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException {
>         if (isNullByteForm(value, offset, len))
>             return nullId();
>         else {
>             int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag);
>             if (id < 0)
>                 throw new IllegalArgumentException("Value not exists!");
>             return id;
>         }
>     } 
> ==========================================================
> The Cube is big: the fact table has 110 million rows, and the largest dimension (customer) has 10 million rows. I have increased the JVM -Xmx to 16 GB and set kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube build doesn't fail (previously we were getting an exception complaining about the 300 MB limit for the Dimension dictionary size; approx. 700 MB was required).
> ==========================================================
> Before that we were getting an exception complaining about a Dictionary encoding problem, "Too high cardinality is not suitable for dictionary -- cardinality: 10873977". We resolved this by changing the affected dimension/row key Encoding from "dict" to "int; length=8" in the Advanced Settings of the Cube.
> ==========================================================
> We have 2 high-cardinality fields (one from the fact table and one from the big customer dimension, see above) that we need to use in count_distinct measures for our calculations. I wonder whether this "Value not exists!" exception is somehow related. One of those count_distinct measures is defined with return type "bitmap" (exact precision, only for Int columns) and the second with return type "hllc16" (error rate <= 1.22 %).
> ==========================================================
> I am looking for any clues to debug the cause of this error and for a way to circumvent it.
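
The roundingFlag contract described in the quoted Javadoc of getIdFromValueBytes can be illustrated with a small sketch over a sorted array. This is only a model of the documented behaviour (exact match, closest smaller, closest bigger, exception otherwise); the lookup method and the use of a sorted String array are assumptions made for illustration and do not reflect how TrieDictionary is implemented.

```java
import java.util.Arrays;

public class RoundingFlagSketch {

    /**
     * Hypothetical lookup over sorted values, where ID = array index.
     * roundingFlag == 0: exact match only;
     * roundingFlag < 0: closest smaller ID if the value is missing;
     * roundingFlag > 0: closest bigger ID if the value is missing.
     */
    static int lookup(String[] sorted, String value, int roundingFlag) {
        int idx = Arrays.binarySearch(sorted, value);
        if (idx >= 0)
            return idx; // exact match

        int insertion = -(idx + 1); // index of the first element greater than value
        int id;
        if (roundingFlag == 0)
            id = -1; // rounding is off
        else if (roundingFlag < 0)
            id = insertion - 1; // closest smaller; -1 if there is none
        else
            id = insertion < sorted.length ? insertion : -1; // closest bigger, if any

        if (id < 0)
            throw new IllegalArgumentException("Value not exists!");
        return id;
    }

    public static void main(String[] args) {
        String[] dict = {"10", "20", "30"};
        System.out.println(lookup(dict, "20", 0));  // exact match
        System.out.println(lookup(dict, "25", -1)); // rounds down to "20"
        System.out.println(lookup(dict, "25", 1));  // rounds up to "30"
    }
}
```

Under this model, the exception in the trace corresponds either to an exact lookup (roundingFlag=0) of a missing value, or to a rounded lookup with no smaller or bigger neighbour.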



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)