Posted to dev@kylin.apache.org by "胡志华 (万里通科技及数据中心商务智能团队数据分析组)" <HU...@pingan.com.cn> on 2016/08/23 09:41:13 UTC

Re: Dup key found when building cube

Hi Shaofeng,

I ran into the same problem. Let me describe my data model.

My lookup table has fields A, B, C, D, E, F, G, ...; A, B, C, D together form the primary key.

I use C, D, E, F to inner join the fact table, and I hit the problem below. Does this mean I can't create a model like this?


Another question: can I put field G into the filter condition when I create the model?







-----Original Message-----
From: ShaoFeng Shi [mailto:shaofengshi@gmail.com]
Sent: August 20, 2016 16:27
To: dev@kylin.apache.org; dev
Subject: Re: Dup key found when building cube

Usually the join is on the lookup table's PK, which needs to be unique; if it is not, Kylin reports this error. You need to clean the data to remove the ambiguity.
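
As a rough illustration of the data check suggested above, the following Python sketch groups lookup rows by the join key and reports any key that maps to more than one row. It is not part of Kylin; the function name find_duplicate_keys, the SCAN_ID column, and the sample values are hypothetical and only ORDER_NAME comes from this thread.

```python
from collections import defaultdict


def find_duplicate_keys(rows, key_columns):
    """Group rows by the join key and return only the keys seen more than once."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        groups[key].append(row)
    return {key: matches for key, matches in groups.items() if len(matches) > 1}


# Hypothetical lookup rows: two scans share the same ORDER_NAME.
lookup_rows = [
    {"SCAN_ID": 1, "ORDER_NAME": "ORD-1"},
    {"SCAN_ID": 2, "ORDER_NAME": "ORD-1"},
    {"SCAN_ID": 3, "ORDER_NAME": "ORD-2"},
]

duplicates = find_duplicate_keys(lookup_rows, ["ORDER_NAME"])
for key, matches in duplicates.items():
    # Each reported key needs to be reduced to a single row before the build.
    print(key, len(matches))
```

Any key reported here would have to be resolved (for example by keeping one row per key) before the lookup table can serve as the join target.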

Regards,

Shaofeng Shi

shaofengshi@gmail.com

From Outlook Mobile




On Fri, Aug 19, 2016 at 7:22 PM +0800, "北极晨光" <17...@qq.com> wrote:










Hi,


We are using the latest version of Kylin, but we get the following error when building a cube:
java.lang.IllegalStateException: Dup key found, key=[NA94092521], value1=[95859459,NA22686932NA94092521,NA22686932,NA94092521,A100,5,A201,20,NAC03,NAC03,PartScanE,2014/09/26,10:52:34,2014/09/26,1], value2=[95859462,NA22686904NA94092521,NA22686904,NA94092521,A201,20,A401,40,NAC03,NAC03,a61007,2014/09/26,10:52:39,2014/09/26,1]
	at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:84)
	at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.java:67)
	at org.apache.kylin.dict.lookup.LookupStringTable.init(LookupStringTable.java:79)
	at org.apache.kylin.dict.lookup.LookupTable.<init>(LookupTable.java:55)
	at org.apache.kylin.dict.lookup.LookupStringTable.<init>(LookupStringTable.java:65)
	at org.apache.kylin.cube.CubeManager.getLookupTable(CubeManager.java:619)
	at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:61)
	at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
	at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112)
	at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112)
	at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:127)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)


There are two tables: the fact table is M_ORDER and the lookup table is M_SCAN. M_ORDER is inner joined to M_SCAN on the column ORDER_NAME. ORDER_NAME is unique in M_ORDER but duplicated in M_SCAN.


What can we do to solve this problem? Thanks.


Zhang






Re: Re: Dup key found when building cube

Posted by Li Yang <li...@apache.org>.
Currently Kylin assumes the joining columns (C,D,E,F in your case) to be the
primary key of the lookup table, and that is wrong for your case. This is
something to improve. Could you open a JIRA?
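
To make the assumption concrete, here is a minimal Python sketch of a lookup index keyed on the join columns that fails on the first repeated key. It is an illustration only and not Kylin's LookupTable code; build_lookup and the sample rows are hypothetical, with column names A to G borrowed from the message quoted below.

```python
def build_lookup(rows, join_columns):
    """Index rows by the join columns, failing on the first repeated key."""
    index = {}
    for row in rows:
        key = tuple(row[col] for col in join_columns)
        if key in index:
            # Mirrors the idea behind the reported error, not its exact format.
            raise ValueError(f"Dup key found, key={list(key)}")
        index[key] = row
    return index


# Hypothetical rows: (A, B, C, D) is unique per row, but (C, D, E, F) is not.
rows = [
    {"A": 1, "B": 1, "C": "c", "D": "d", "E": "e", "F": "f", "G": 10},
    {"A": 2, "B": 1, "C": "c", "D": "d", "E": "e", "F": "f", "G": 20},
]

# Keyed on the real primary key, every row gets its own entry.
by_primary_key = build_lookup(rows, ["A", "B", "C", "D"])

# Keyed on the join columns, the second row collides with the first.
try:
    build_lookup(rows, ["C", "D", "E", "F"])
except ValueError as err:
    print(err)
```

The sketch shows why a table with a valid primary key can still trip the check: uniqueness is tested on the join columns, not on the declared key.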

On Tue, Aug 23, 2016 at 5:41 PM, 胡志华(万里通科技及数据中心商务智能团队数据分析组) <
HUZHIHUA160@pingan.com.cn> wrote:

> Hi Shaofeng,
>
> I ran into the same problem. Let me describe my data model.
>
> My lookup table has fields A, B, C, D, E, F, G, ...; A, B, C, D together
> form the primary key.
>
> I use C, D, E, F to inner join the fact table, and I hit the problem below.
> Does this mean I can't create a model like this?
>
>
> Another question: can I put field G into the filter condition when I create
> the model?