Posted to user@kylin.apache.org by "Lu, Kang-Sen" <kl...@rbbn.com> on 2019/04/08 15:10:05 UTC

question about kylin cube build failure

I am running Kylin 2.5.1.

While building a cube with the Spark engine, I got the following error at "#4 Step Name: Extract Fact Table Distinct Columns".

The log shows the following exception:

2019-04-08 12:59:10,375 WARN scheduler.TaskSetManager: Lost task 5.0 in stage 0.0 (TID 0, hadoop9, executor 1): java.lang.NumberFormatException: For input string: "\N"
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
        at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
        at java.lang.Double.parseDouble(Double.java:538)
        at org.apache.kylin.engine.mr.steps.SelfDefineSortableKey.init(SelfDefineSortableKey.java:57)
        at org.apache.kylin.engine.mr.steps.SelfDefineSortableKey.init(SelfDefineSortableKey.java:66)
        at org.apache.kylin.engine.spark.SparkFactDistinct$FlatOutputFucntion.addFieldValue(SparkFactDistinct.java:444)
        at org.apache.kylin.engine.spark.SparkFactDistinct$FlatOutputFucntion.call(SparkFactDistinct.java:315)
        at org.apache.kylin.engine.spark.SparkFactDistinct$FlatOutputFucntion.call(SparkFactDistinct.java:226)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
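
For context on the failing call: in Hive's default text serialization, SQL NULL is written as the literal two-character marker \N, and Double.parseDouble rejects it outright. A minimal standalone sketch (the class and method names below are illustrative, not from Kylin):

```java
public class HiveNullParseDemo {
    // Returns true if the string parses as a double, false otherwise.
    // Hive's text-format NULL marker "\N" (backslash + capital N) does not parse.
    static boolean parsesAsDouble(String value) {
        try {
            Double.parseDouble(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsDouble("3.14")); // true
        System.out.println(parsesAsDouble("\\N"));  // false: the same failure as in the log above
    }
}
```

Any row where Hive emitted NULL for a double-typed column can hit this parse during the "Extract Fact Table Distinct Columns" step.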

Has anybody seen this same problem?

Thanks.

Kang-sen


-----------------------------------------------------------------------------------------------------------------------
Notice: This e-mail together with any attachments may contain information of Ribbon Communications Inc. that
is confidential and/or proprietary for the sole use of the intended recipient.  Any review, disclosure, reliance or
distribution by others or forwarding without express permission is strictly prohibited.  If you are not the intended
recipient, please notify the sender immediately and then delete all copies, including any attachments.
-----------------------------------------------------------------------------------------------------------------------

RE: Re:RE: Re:RE: Re:question about kylin cube build failure

Posted by "Lu, Kang-Sen" <kl...@rbbn.com>.
Hi, Chunen:

Thanks for your finding. The bug KYLIN-3644 is exactly what I have seen: the exception and stack trace match. The bug was fixed and merged into 2.5.1 on 2018-11-06.

Shaofeng Shi (https://issues.apache.org/jira/secure/ViewProfile.jspa?name=Shaofengshi) added a comment on 06/Nov/18 02:21:
"Resolved in release 2.5.1 (2018-11-06)"
I will merge the bug fix into my sandbox.

Kang-sen

Re:RE: Re:RE: Re:question about kylin cube build failure

Posted by nichunen <ni...@apache.org>.
Hi,


I found JIRA issue https://issues.apache.org/jira/browse/KYLIN-3644, which shows a similar exception to yours (though not the same stack trace), and it should be resolved in 2.5.1. I'll check the source code later.



--


Best regards,

 

Ni Chunen / George



RE: Re:RE: Re:question about kylin cube build failure

Posted by "Lu, Kang-Sen" <kl...@rbbn.com>.
Hi, Chunen:

It is hard for me to check the fact table content because it is huge. KYLIN-3828 was about "ArrayIndexOutOfBoundsException thrown when building a streaming cube with empty data in its first dimension", which is a different type of exception.

I am wondering about a data column of type "bigint" in Kylin: could defining the metric as TOPN versus SUM cause the Spark engine to use a different data type? With TOPN aggregation I did not see this parseDouble exception, but with SUM aggregation the Spark engine can throw it.

BTW: “java.lang.NumberFormatException: For input string: "\N"”, is “\N” the same as “\n”?

Kang-sen
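
[Editor's note on the last question: \N and \n are not the same. Hive's default text serialization writes SQL NULL as the two-character sequence backslash + capital N, whereas \n is a single newline control character. A quick standalone check, added here for illustration:]

```java
public class NullMarkerVsNewline {
    public static void main(String[] args) {
        String hiveNull = "\\N"; // two characters: '\' then 'N' (Hive's text-format NULL marker)
        String newline = "\n";   // one control character
        System.out.println(hiveNull.equals(newline)); // false
        System.out.println(hiveNull.length());        // 2
        System.out.println(newline.length());         // 1
    }
}
```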


Re:RE: Re:question about kylin cube build failure

Posted by nichunen <ni...@apache.org>.
Hi Kang-Sen,


Have you checked your Hive table for any dirty data?


By the way, empty data in your first dimension may also cause such an exception; that was fixed in 2.6.1: https://issues.apache.org/jira/browse/KYLIN-3828


--


Best regards,

 

Ni Chunen / George
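
[Editor's note: one way to screen for this kind of dirty data before parsing is to filter out the Hive NULL marker explicitly. A hypothetical pre-filter along those lines (illustrative only, not the actual Kylin patch):]

```java
import java.util.Arrays;
import java.util.List;

public class DirtyDataFilter {
    static final String HIVE_NULL = "\\N"; // Hive's default text-format NULL marker

    // Drop NULL markers and blanks, then parse the remaining values as doubles.
    static double[] parseDoubleColumn(List<String> rawValues) {
        return rawValues.stream()
                .filter(v -> v != null && !v.isEmpty() && !HIVE_NULL.equals(v))
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("1.5", "\\N", "2.5", "");
        System.out.println(Arrays.toString(parseDoubleColumn(column))); // [1.5, 2.5]
    }
}
```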



RE: Re:question about kylin cube build failure

Posted by "Lu, Kang-Sen" <kl...@rbbn.com>.
Hi, Chunen:

Thanks for your reply.

I am puzzled: based on the same data model, I created two cubes, one for computing a TOPN metric and the other for all other aggregations. I separated the TOPN cube from the normal cube because the TOPN is related to a high-cardinality dimension such as SUBSCRIBER_ID.

The same fact table is used to build cuboids for both cube specs. I have no problem building the TOPN cube with the Spark engine, but when I build the normal cube with the Spark engine, I get this "\N" format exception. In addition, if I build the normal cube with the MR engine, there is no format exception.

Does it make sense to you?

Kang-sen

Re:question about kylin cube build failure

Posted by nichunen <ni...@apache.org>.
Hi Kang-Sen,


It looks like there is a "\N" in your source data in a column with double type.


--


Best regards,

 

Ni Chunen / George

