Posted to dev@carbondata.apache.org by 刘feng <f....@neusoft.com> on 2017/09/19 02:05:38 UTC

insert carbondata table failed

Hi, community:

   I am inserting records from a source Hive table (kc22_p1) into a target
CarbonData table (kc22_ca).

kc22_p1: 102200946 records, 51.5 GB

Steps:

spark-shell --master yarn-client --driver-memory 20G --executor-cores 1 --num-executors 12 --executor-memory 5G

 

val cc = new CarbonContext(sc, "hdfs://cluster1/opt/CarbonStore")

 

cc.sql("create table if not exists kc22_ca (akb020 String,akc190
String,aae072 String,akc220 String,ake005 String,bka135 String,bkc301
String,ake001 String,ake002 String,ake006 String,akc221 String,ake010
String,aka065 String,ake003 String,aka063 String,akc225 double,akc226
double,aae019 double,akc228 double,ake051 double,aka068 double,akc268
double,bkc228 double,bka635 double,aka069 double,bka107 double,bka108
double,bkc127 String,aka064 String,aae100 String,bkc126 String,bkc125
String,bka231 String,bae073 double,bka636 double,bka637 double,bka104
double,bka609 String,aka070 String,aka067 String,aka074 String,bkc378
String,bkc379 String,bkc380 String,bkc381 String,aae011 String,aae036
String,bkc319 double,bkf050 String,akc273 String,aka071 double,aka072
String,aka107 String,bka076 String,akf002 String,bkc241 double,bkc242
String,bkc243 String,bka205 String,bkb401 String,bka650 double,bka651
String,aka130 String,aka120 String,bae075 double,aae017 String,aae032
String,bkc060 double,bkc061 double,bkc062 double,bkc063 double,bkc064
double,bkc065 double,bkc066 String,bkc067 String,bkc068 String,bkc069
String,baz001 double,baz002 double,bze011 String,bze036 String,aaa027
String,aab034 String,aac001 double,bkb070 String,bkb071 String,bkc077
String,bkc078 String,bkc079 String,bkc081 double,bka610 String,bka971
double,bka972 double,bka973 String,bka974 String) STORED BY 'carbondata'
TBLPROPERTIES('DICTIONARY_INCLUD'='akb020,  aae072, bka135, akc220, ake005,
bkc301','DICTIONARY_EXCLUDE'='akc190,ake001,ake002,ake006,akc221,ake010,aka0
65,ake003,aka063,bkc127,aka064,aae100,bkc126,bkc125,bka231,bka609,aka070,aka
067,aka074,bkc378,bkc379,bkc380,bkc381,aae011,aae036,bkf050,akc273,aka072,ak
a107,bka076,akf002,bkc242,bkc243,bka205,bkb401,bka651,aka130,aka120,aae017,a
ae032,bkc066,bkc067,bkc068,bkc069,bze011,bze036,aaa027,aab034,bkb070,bkb071,
bkc077,bkc078,bkc079,bka610,bka973,bka974')")
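
The insert statement itself is not included in this mail; it would have been issued along these lines (the exact statement is an assumption):

cc.sql("insert into table kc22_ca select * from kc22_p1")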

 

Note: the shuffle volume differs between using DICTIONARY_INCLUDE alone and
using DICTIONARY_INCLUDE together with DICTIONARY_EXCLUDE.

See the attached file for reference.

 

Log:

17/09/19 09:29:51 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID 1039) in 8523 ms on node2 (3/7)
17/09/19 09:30:13 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1036) in 30754 ms on node2 (4/7)
17/09/19 09:30:18 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 1037) in 35309 ms on node1 (5/7)
17/09/19 09:33:49 WARN HeartbeatReceiver: Removing executor 5 with no recent heartbeats: 135938 ms exceeds timeout 120000 ms
17/09/19 09:33:49 ERROR YarnScheduler: Lost executor 5 on node1: Executor heartbeat timed out after 135938 ms
17/09/19 09:33:49 WARN TaskSetManager: Lost task 6.0 in stage 1.0 (TID 1041, node1): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 135938 ms
17/09/19 09:33:49 INFO TaskSetManager: Starting task 6.1 in stage 1.0 (TID 1042, node3, partition 6,PROCESS_LOCAL, 1894 bytes)
17/09/19 09:33:49 INFO DAGScheduler: Executor lost: 5 (epoch 1)
17/09/19 09:33:49 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 5
17/09/19 09:33:49 INFO BlockManagerMasterEndpoint: Trying to remove executor 5 from BlockManagerMaster.
17/09/19 09:33:49 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(5, node1, 58006)
17/09/19 09:33:49 INFO BlockManagerMaster: Removed 5 successfully in removeExecutor
17/09/19 09:33:49 INFO ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 5 (917/1035, false)
17/09/19 09:33:49 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on node3:57113 (size: 3.7 KB, free: 4.1 GB)
17/09/19 09:33:49 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to node3:33757
17/09/19 09:33:49 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 5830 bytes
17/09/19 09:33:49 WARN TaskSetManager: Lost task 6.1 in stage 1.0 (TID 1042, node3): FetchFailed(null, shuffleId=0, mapId=-1, reduceId=6, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
        at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:542)
        at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:538)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:538)
        at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:155)
        at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:47)
        at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:372)
        at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:345)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
)
17/09/19 09:33:49 INFO DAGScheduler: Marking ResultStage 1 (collect at GlobalDictionaryUtil.scala:746) as failed due to a fetch failure from ShuffleMapStage 0 (RDD at CarbonGlobalDictionaryRDD.scala:271)
17/09/19 09:33:49 INFO DAGScheduler: ResultStage 1 (collect at GlobalDictionaryUtil.scala:746) failed in 247.083 s
17/09/19 09:33:49 INFO DAGScheduler: Resubmitting ShuffleMapStage 0 (RDD at CarbonGlobalDictionaryRDD.scala:271) and ResultStage 1 (collect at GlobalDictionaryUtil.scala:746) due to fetch failure
17/09/19 09:33:50 INFO DAGScheduler: Resubmitting failed stages
17/09/19 09:33:50 INFO DAGScheduler: Submitting ShuffleMapStage 0 (CarbonBlockDistinctValuesCombineRDD[11] at RDD at CarbonGlobalDictionaryRDD.scala:271), which has no missing parents
17/09/19 09:33:50 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 15.0 KB, free 1291.9 KB)
17/09/19 09:33:50 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 7.1 KB, free 1299.0 KB)
17/09/19 09:33:50 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 10.9.22.15:58333 (size: 7.1 KB, free: 14.2 GB)
17/09/19 09:33:50 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006
17/09/19 09:33:50 INFO DAGScheduler: Submitting 118 missing tasks from ShuffleMapStage 0 (CarbonBlockDistinctValuesCombineRDD[11] at RDD at CarbonGlobalDictionaryRDD.scala:271)

 

---------------------------------------------------------------------------------------------------

Liu Feng

Technology Development Department (TDD)

Neusoft Corporation

Neusoft Park A2-105A, No. 2 Xinxiu Street, Hunnan New District, Shenyang

Postcode: 110179

Mobile: 13889865456

 




Re: insert carbondata table failed

Posted by 刘feng <f....@neusoft.com>.
Sorry,

The cluster has 4 nodes in total, of which 3 are datanodes; the secondary namenode runs on one of the datanodes.

Version:
CarbonData 1.1.0
Spark 1.6.0
Hadoop 2.7.2

Thank you for your help; I'm trying again.
=========================
Liu feng


Re: insert carbondata table failed

Posted by ravipesala <ra...@gmail.com>.
Hello,

I can't tell much from the logs, but the error looks like a memory issue in
Spark. From your earlier emails I understand that you are using a 3-node
cluster. Do all 3 nodes run both a nodemanager and a datanode?
It is better to use fewer executors and give each one more memory, as below.
For data loading it is recommended to run one executor per nodemanager.

spark-shell --master yarn-client --driver-memory 10G --executor-cores 4 --num-executors 3 --executor-memory 25G

Also, if any configuration produces an error, please provide the executor log.
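
On YARN, the executor logs for a finished or killed application can be pulled with the standard log aggregation command (the application id is the one printed by the failed run):

yarn logs -applicationId <applicationId>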

Thank you,
Ravindra.




Re: insert carbondata table failed

Posted by 刘feng <f....@neusoft.com>.
Thank you,
  I have resolved this issue by changing the Spark configuration and using
only two fields as DICTIONARY_INCLUDE.
  The test data (30 GB) was loaded 8 times, each load completing in about
1.5 minutes; a sketch of that kind of change is shown below.
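
A minimal sketch, with a cut-down schema for illustration; the choice of akb020 and aae072 as the two dictionary columns is an assumption, since the mail does not name them:

cc.sql("create table if not exists kc22_ca (akb020 String, aae072 String, akc225 double) STORED BY 'carbondata' TBLPROPERTIES('DICTIONARY_INCLUDE'='akb020,aae072')")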

 I am currently testing another, larger dataset and hope it succeeds. Thank
you very much for the help!
=========================
Liu feng



Re: insert carbondata table failed

Posted by manishgupta88 <to...@gmail.com>.
Hi Feng,

You can also refer to the links below, where Spark users have resolved this
issue by changing their configuration. They might help you.

https://stackoverflow.com/questions/28901123/why-do-spark-jobs-fail-with-org-apache-spark-shuffle-metadatafetchfailedexceptio

https://stackoverflow.com/questions/29850784/what-are-the-likely-causes-of-org-apache-spark-shuffle-metadatafetchfailedexcept
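
The fixes discussed in those threads mostly come down to giving each executor more off-heap headroom and tolerating longer pauses before declaring it lost; for example (the values here are illustrative, not taken from this thread):

spark-shell --master yarn-client --driver-memory 10G --executor-cores 4 --num-executors 3 --executor-memory 25G --conf spark.yarn.executor.memoryOverhead=4096 --conf spark.network.timeout=300s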

Regards
Manish Gupta


