Posted to user@kylin.apache.org by "quzhengpeng@hetrone.com" <qu...@hetrone.com> on 2016/12/06 08:34:26 UTC

Reply: Re: Cube Dup key found

Hi,
    I use Sqoop 1.4.6 to load data from MySQL into Hive. The orders table has its own key, but something seems to be wrong in Kylin. How do I add the key of the lookup table (my orders table)?

Hi,
     This error occurs because a dimension table has more than one record when the fact table joins on it through the key '5847,ufenqi,2016-11-11'. You can avoid this by adding key columns to the join condition.


On 2016-12-06, at 3:20 PM, quzhengpeng@hetrone.com wrote:

Hi,
    I have two tables, users and orders. One user can place many orders, so their relation is one-to-many.
    I created the model with an inner join between users and orders.
    Finally, I built the cube and it raised a Dup key error. How can I build the cube?

java.lang.IllegalStateException: Dup key found, key=[5847,ufenqi,2016-11-11], value1=[2615,product,5847,ufenqi,2014-09-09 23:23:31.0,338800,170,10,2016-11-11,2099-12-31], value2=[3635,product,5847,ufenqi,2014-09-11 22:51:06.0,336800,170,10,2016-11-11,2099-12-31]
	at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:85)
	at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.java:68)
	at org.apache.kylin.dict.lookup.LookupStringTable.init(LookupStringTable.java:79)
	at org.apache.kylin.dict.lookup.LookupTable.<init>(LookupTable.java:56)
	at org.apache.kylin.dict.lookup.LookupStringTable.<init>(LookupStringTable.java:65)
	at org.apache.kylin.cube.CubeManager.getLookupTable(CubeManager.java:674)
	at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:60)
	at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:41)
	at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:54)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
	at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
	at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
result code:2

Re: Re: Cube Dup key found

Posted by Billy Liu <bi...@apache.org>.

In your case, the orders table would be the fact table, and the users table
would be the lookup table. Could you give that a try?
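
A rough illustration of this suggestion, as a minimal Python sketch rather than anything Kylin itself runs: with users as the lookup table, the join key (user_id, user_modify, start_dt) identifies at most one user row, so many orders can point at it without a duplicate. The function names and the user's create_time value are made up for the example; the two order rows are taken from the error message.

```python
KEY_COLS = ("user_id", "user_modify", "start_dt")

def index_lookup(rows, key_cols):
    """Index lookup rows by join key; every key must appear only once."""
    index = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key in index:
            raise ValueError(f"Dup key found, key={list(key)}")
        index[key] = row
    return index

def inner_join(fact_rows, lookup_index, key_cols):
    """Pair each fact row with its single matching lookup row, if any."""
    pairs = []
    for fact in fact_rows:
        match = lookup_index.get(tuple(fact[c] for c in key_cols))
        if match is not None:
            pairs.append((fact, match))
    return pairs

# Lookup side: one row per user key (create_time is a hypothetical value).
users = [
    {"user_id": 5847, "user_modify": "ufenqi", "create_time": "2014-09-01 00:00:00.0",
     "start_dt": "2016-11-11", "end_dt": "2099-12-31"},
]

# Fact side: the two order rows from the error message (subset of columns).
orders = [
    {"order_id": 2615, "order_modify": "product", "user_id": 5847,
     "user_modify": "ufenqi", "order_money": 338800, "start_dt": "2016-11-11"},
    {"order_id": 3635, "order_modify": "product", "user_id": 5847,
     "user_modify": "ufenqi", "order_money": 336800, "start_dt": "2016-11-11"},
]

joined = inner_join(orders, index_lookup(users, KEY_COLS), KEY_COLS)
```

Both orders join to the same user row, which is fine on the fact side; only the lookup side has to be unique on the join key.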

2016-12-06 17:51 GMT+08:00 quzhengpeng@hetrone.com <qu...@hetrone.com>:

> Hi,
>     hive> desc uc_users;
>     OK
>     *user_id              int*
>     *user_modify          string*
>     create_time          string
>     *start_dt             string*
>     end_dt               string
>
>     hive> desc oc_orders;
>     OK
>     *order_id             int*
>     *order_modify         string*
>     user_id              int
>     user_modify          string
>     create_time          string
>     order_money          int
>     status               int
>     pay_status           int
>     *start_dt             string*
>     end_dt               string
>
>     These are the table structures. In the RDBMS, user_id and user_modify
> form the FK; the underlined columns are the primary key and the red ones are
> my join condition. But now I can't find an additional column for the join
> step, and the key (5847,ufenqi,2016-11-11) in the orders table has multiple
> results, which is logically correct. Now I don't know how to keep every
> record in my lookup table unique for the key (5847,ufenqi,2016-11-11).

Reply: Re: Cube Dup key found

Posted by "quzhengpeng@hetrone.com" <qu...@hetrone.com>.

Hi,
    hive> desc uc_users;
    OK
    user_id              int
    user_modify          string
    create_time          string
    start_dt             string
    end_dt               string

    hive> desc oc_orders;
    OK
    order_id             int
    order_modify         string
    user_id              int
    user_modify          string
    create_time          string
    order_money          int
    status               int
    pay_status           int
    start_dt             string
    end_dt               string

    These are the table structures. In the RDBMS, user_id and user_modify form the FK; the underlined columns are the primary key and the red ones are my join condition. But now I can't find an additional column for the join step, and the key (5847,ufenqi,2016-11-11) in the orders table has multiple results, which is logically correct. Now I don't know how to keep every record in my lookup table unique for the key (5847,ufenqi,2016-11-11).






Re: Cube Dup key found

Posted by Mars Xu <xu...@gmail.com>.

When you join the fact table and the lookup table in Kylin, you need to define the join key columns, such as orderedid, username and date, right? In your lookup table (the orders table), it seems there is more than one record where orderedid=5847, username=ufenqi, date=2016-11-11. You need to define an additional column in the join step according to your business meaning, such as order_seqno.

Keep every record in your lookup table unique.
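
To make the uniqueness requirement concrete, here is a minimal Python sketch of the kind of check that produces this error. It is not Kylin's actual LookupTable code; build_lookup, try_build and the column positions are assumptions for illustration. The two rows are value1 and value2 from the exception message, laid out in the column order of oc_orders.

```python
def build_lookup(rows, key_idx):
    """Index rows by their join-key columns; fail on the first repeated key."""
    table = {}
    for row in rows:
        key = tuple(row[i] for i in key_idx)
        if key in table:
            raise ValueError(
                f"Dup key found, key={list(key)}, value1={table[key]}, value2={row}"
            )
        table[key] = row
    return table

def try_build(rows, key_idx):
    """Return 'ok' if every key is unique, otherwise 'dup'."""
    try:
        build_lookup(rows, key_idx)
        return "ok"
    except ValueError:
        return "dup"

# Columns: order_id, order_modify, user_id, user_modify, create_time,
#          order_money, status, pay_status, start_dt, end_dt
ORDERS = [
    ["2615", "product", "5847", "ufenqi", "2014-09-09 23:23:31.0",
     "338800", "170", "10", "2016-11-11", "2099-12-31"],
    ["3635", "product", "5847", "ufenqi", "2014-09-11 22:51:06.0",
     "336800", "170", "10", "2016-11-11", "2099-12-31"],
]

JOIN_KEY = (2, 3, 8)     # user_id, user_modify, start_dt (the key in the error)
ORDERS_OWN_KEY = (0, 1, 8)  # order_id, order_modify, start_dt (for contrast)

join_key_result = try_build(ORDERS, JOIN_KEY)
own_key_result = try_build(ORDERS, ORDERS_OWN_KEY)
```

With the join key from the error, both rows collapse onto the same key and the build fails; keyed by the orders table's own key columns, the same two rows are distinct.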
