Posted to dev@kylin.apache.org by "胡志华 (万里通科技及数据中心商务智能团队数据分析组)" <HU...@pingan.com.cn> on 2016/05/06 15:35:25 UTC

Re: Re: problem happened at step "build base cuboid data"

I think that is difficult to check, because the amount of data is huge.

Could you give me some suggestions?

-----Original Message-----
From: ShaoFeng Shi [mailto:shaofengshi@apache.org]
Sent: May 6, 2016 23:33
To: dev@kylin.apache.org
Subject: Re: Re: problem happened at step "build base cuboid data"

I mean the data in the Hive table; if there is some dirty data (e.g., a column was declared as decimal but actually holds a string), it may cause the cube build to fail.
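
To illustrate the failure mode: the top frames of the stack trace quoted in this thread are inside the java.math.BigDecimal constructor, which throws NumberFormatException when the text is not a valid decimal literal. The following is only a minimal sketch; the class name, the helper parses, and the sample values are hypothetical and are not taken from Kylin or from the real table.

```java
import java.math.BigDecimal;

class BigDecimalParseDemo {
    /** Returns true if the text is accepted by the BigDecimal(String) constructor. */
    static boolean parses(String text) {
        try {
            new BigDecimal(text);
            return true;
        } catch (NumberFormatException e) {
            // Same exception type as in the MapReduce log.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not real rows from the source table.
        String[] samples = {"12.5000000", "-3", "abc", "", "1,234.5"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + (parses(s) ? "ok" : "NumberFormatException"));
        }
    }
}
```

In this sketch the first two samples parse, while "abc", the empty string, and "1,234.5" are rejected.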

2016-05-06 23:29 GMT+08:00 胡志华(万里通科技及数据中心商务智能团队数据分析组) <
HUZHIHUA160@pingan.com.cn>:

> You mean the data types? Here is the table description:
>
> > desc partner_txn_sub_order_ft0_s;
> OK
> sub_txn_id              string
> txn_id                  string
> order_id                string
> sub_order_id            string
> pp_order_id             string
> pp_sub_order_id         string
> process_dt              string
> slt_doc_id              string
> doc_id                  string
> sub_doc_id              string
> gain_pay_ind            string
> pathway_ind             string
> txn_type_ind            string
> txn_sub_type_ind        string
> product_id              string
> product_name            string
> rule_id                 string
> rule_name               string
> slt_partner_id          string
> slt_partner_desc        string
> pay_cash                decimal(22,7)
> pay_points              decimal(22,7)
> gain_points             decimal(22,7)
> discount                decimal(22,7)
> age_level_ind           string
> gender_ind              string
> phone_province_ind      string
> phone_city_ind          string
> point_current_level_ind string
> binding_d               string
> binding_m               string
> is_email_verified       int
> is_mobile_verified      int
> is_app                  int
> partner_gain_pt_level_ind       string
> wlt_txn_level_ind       string
> brand_point_no          string
> pathway_desc            string
> is_activity             int
> pt_log_d                string
> partner_id              string
>
> # Partition Information
> # col_name              data_type               comment
>
> pt_log_d                string
> partner_id              string
> Time taken: 0.124 seconds, Fetched: 47 row(s)
> hive>
>
> -----Original Message-----
> From: ShaoFeng Shi [mailto:shaofengshi@apache.org]
> Sent: May 6, 2016 23:27
> To: dev@kylin.apache.org
> Subject: Re: problem happened at step "build base cuboid data"
>
> It seems some data couldn't be parsed as a BigDecimal. You may need to
> check the data types in the source table.
>
> 2016-05-06 20:34 GMT+08:00 胡志华(万里通科技及数据中心商务智能团队数据分析组) <
> HUZHIHUA160@pingan.com.cn>:
>
> > Hi all,
> >
> > I am encountering a problem at the step "build base cuboid data".
> > The MapReduce log is below.
> >
> > I googled it but found nothing useful. Can anyone help me?
> >
> >
> > Error: java.lang.NumberFormatException
> >     at java.math.BigDecimal.<init>(BigDecimal.java:470)
> >     at java.math.BigDecimal.<init>(BigDecimal.java:739)
> >     at org.apache.kylin.measure.basic.BigDecimalIngester.valueOf(BigDecimalIngester.java:39)
> >     at org.apache.kylin.measure.basic.BigDecimalIngester.valueOf(BigDecimalIngester.java:29)
> >     at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189)
> >     at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159)
> >     at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206)
> >     at org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53)
> >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:415)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> >
> >
> > **********************************************************************
> > The information in this email is confidential and may be legally
> > privileged. If you have received this email in error or are not the
> > intended recipient, please immediately notify the sender and delete
> > this message from your computer. Any use, distribution, or copying
> > of this email other than by the intended recipient is strictly
> > prohibited. All messages sent to and from us may be monitored to
> > ensure compliance with internal policies and to protect our business.
> > Emails are not secure and cannot be guaranteed to be error free as
> > they can be intercepted, amended, lost or destroyed, or contain
> > viruses. Anyone who communicates with us by email is taken to accept
> > these risks.
> >
> > Attention, email senders and recipients:
> > This email contains confidential information. If you have received it
> > in error, please notify the sender and delete it immediately; do not
> > use, distribute, or copy it.
> > Inbound and outbound emails are subject to the company's compliance
> > monitoring. Emails may be intercepted, modified, lost, or damaged, or
> > may contain computer viruses or be otherwise insecure.
> > **********************************************************************
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi
>
>



--
Best regards,

Shaofeng Shi

Re: Re: Re: problem happened at step "build base cuboid data"

Posted by ShaoFeng Shi <sh...@apache.org>.
Hive is more error-tolerant, while Kylin needs the data to be cleaned first,
so as to ensure accuracy at a high aggregation level.

I don't have a good idea either; you may need to check the upstream system to
see whether there was some problem.
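
One possible way to narrow down dirty values in a large data set is to scan the raw text of a decimal column and stop after the first few failures. The following is only a rough sketch under stated assumptions: it assumes the column values (for example pay_cash) have already been exported as plain strings, and the class name, method name, and sample values are hypothetical rather than part of Kylin or Hive.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DirtyDecimalScan {
    /**
     * Returns up to maxHits entries of the form "rowIndex:value" for values
     * that the BigDecimal(String) constructor cannot parse.
     */
    static List<String> findUnparseable(List<String> values, int maxHits) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < values.size() && hits.size() < maxHits; i++) {
            String v = values.get(i);
            try {
                new BigDecimal(v);
            } catch (NumberFormatException | NullPointerException e) {
                // A null value raises NullPointerException; treat it as dirty too.
                hits.add(i + ":" + v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical exported values for one decimal column.
        List<String> payCash = Arrays.asList("10.0000000", "N/A", "3.5", "", "7");
        System.out.println(findUnparseable(payCash, 10)); // prints [1:N/A, 3:]
    }
}
```

Capping the number of hits keeps the scan cheap once a few bad rows are found, which is usually enough to trace the problem back to the upstream system.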


-- 
Best regards,

Shaofeng Shi