Posted to dev@carbondata.apache.org by "wwyxg@163.com" <ww...@163.com> on 2017/03/25 10:48:03 UTC

insert into carbon table failed

Hello!

0. The failure
When I insert into a carbon table, I encounter a failure. The failure is as follows:
Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Slave lost
Driver stacktrace:
The stage:

Steps:
1. Start spark-shell
./bin/spark-shell \ 
--master yarn-client \ 
--num-executors 5 \  (I tried values from 10 to 20 for this parameter, but the second job still had only 5 tasks)
--executor-cores 5 \ 
--executor-memory 20G \ 
--driver-memory 8G \ 
--queue root.default \ 
--jars /xxx.jar

// spark-defaults.conf: spark.default.parallelism=320

import org.apache.spark.sql.CarbonContext 
val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore") 
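
A small side check, assuming the spark-shell session above has started, to see the effective parallelism and how many executors actually registered (just a sketch):
// getExecutorMemoryStatus has one entry per executor plus one for the driver
println(sc.defaultParallelism)
println(sc.getExecutorMemoryStatus.size)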

2. Create table
cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst String,plat String,sty String,is_pay String,is_vip String,is_mpack String,scene String,status String,nw String,isc String,area String,spttag String,province String,isp String,city String,tv String,hwm String,pip String,fo String,sh String,mid String,user_id String,play_pv Int,spt_cnt Int,prg_spt_cnt Int) row format delimited fields terminated by '|' STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')") 

// Note: the "fo" column is set as BUCKETCOLUMNS because it is used to join another table
// The column distinct values are as follows:


3. Insert into table (xxxx_table_tmp is a Hive external ORC table with 2,000,000,000 records)
cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id ,play_pv,spt_cnt,prg_spt_cnt from xxxx_table_tmp where dt='2017-01-01'")

4. Spark split the SQL into two jobs; the first finished successfully, but the second failed:

 
5. The second job stage:



Questions:
1. Why does the second job have only five tasks, while the first job has 994 tasks? (Note: my Hadoop cluster has 5 datanodes.)
      I guess this is what caused the failure.
2. In the sources I found DataLoadPartitionCoalescer.class. Does it mean that each datanode gets only one partition, so only one task runs per datanode?
3. In the ExampleUtils class, "carbon.table.split.partition.enable" is set as follows, but I cannot find "carbon.table.split.partition.enable" anywhere else in the project.
     I set "carbon.table.split.partition.enable" to true, but the second job still has only five tasks. How should this property be used? (See the sketch after this list.)
     ExampleUtils :
    // whether use table split partition 
    // true -> use table split partition, support multiple partition loading 
    // false -> use node split partition, support data load by host partition 
    CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable", "false") 
4. The insert into the carbon table takes 3 hours but eventually fails. How can I speed it up?
5. In the spark-shell, I tried setting --num-executors from 10 to 20, but the second job still has only 5 tasks.
     Is the other parameter, executor-memory = 20G, enough?
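
A minimal sketch of what I mean in question 3, assuming the CarbonProperties class lives at org.apache.carbondata.core.util in this release (treat the import path as an assumption) and that the flag is read at data-load time:
import org.apache.carbondata.core.util.CarbonProperties
// hypothetical: set the flag before re-running the insert from step 3, in the same spark-shell session
CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable", "true")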

I need your help! Thank you very much!

wwyxg@163.com




Re:Re: insert into carbon table failed

Posted by a <ww...@163.com>.
| col_name | data_type | cardinality (distinct values) |
| dt       | string | date |
| pt       | string | 3 |
| lst      | string | 1 |
| plat     | string | 1 |
| sty      | string | 2 |
| is_pay | string | 2 |
| is_vip | string | 2 |
| is_mpack | string | 2 |
| scene    | string | 3 |
| status   | string | 4 |
| nw       | string | 5 |
| isc      | string | 5 |
| area     | string | 9 |
| spttag   | string | 18 |
| province | string | 484 |
| isp      | string | 706 |
| city     | string | 1127 |
| tv       | string | 1577 |
| hwm      | string | 10000 |
| pip      | string | 1000000 |
| fo | string | 6307095 |
| sh       | string | 10000000 |
| mid      | string | 80000000 |
| user_id  | string | 80000000 |
| play_pv | bigint |   |
| spt_cnt  | bigint |   |
| prg_spt_cnt | bigint |   |
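
For reference, a rough sketch (not necessarily how the numbers above were produced) of how such cardinalities can be computed from the source Hive table in the same spark-shell session:
// example: distinct counts for a few of the high-cardinality columns
cc.sql("select count(distinct fo), count(distinct mid), count(distinct user_id) " +
  "from xxxx_table_tmp").show()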






At 2017-03-25 18:52:07, "Liang Chen" <ch...@gmail.com> wrote:
>Hi
>
>Please provide all columns' cardinality info(distinct value).
>
>Regards
>Liang


Re: insert into carbon table failed

Posted by Liang Chen <ch...@gmail.com>.
Hi

Please provide all columns' cardinality info (distinct values).

Regards
Liang



Re: Re:Re:Re:Re: insert into carbon table failed

Posted by Ravindra Pesala <ra...@gmail.com>.
Hi,

It is a little weird; I tried to reproduce this issue but was not
successful. Can you make sure that the latest jar is deployed on all the
datanodes and on the driver? There may be an old jar still being picked up
on either the driver or a datanode.
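
For example, a quick way to check on the driver side (only a sketch, and it
only covers the driver; the executors would need an equivalent check inside
a task) is to print which jar a CarbonData class is actually loaded from:

// prints where the driver-side classloader resolved this CarbonData class from
println(classOf[org.apache.carbondata.processing.newflow.DataLoadExecutor]
  .getProtectionDomain.getCodeSource.getLocation)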


Regards,
Ravindra

On 27 March 2017 at 01:40, a <ww...@163.com> wrote:

> I downloaded the newest source code (master) and compiled it, generating the jar
> carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar
> Then I used Spark 2.1 to test again. The error logs are as follows:
>
>
>  Container log :
> 17/03/27 02:27:21 ERROR newflow.DataLoadExecutor: Executor task launch
> worker-9 Data Loading failed for table carbon_table
> java.lang.NullPointerException
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 17/03/27 02:27:21 INFO rdd.NewDataFrameLoaderRDD: DataLoad failure
> org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
>         ... 10 more
> 17/03/27 02:27:21 ERROR rdd.NewDataFrameLoaderRDD: Executor task launch
> worker-9
> org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
>         ... 10 more
> 17/03/27 02:27:21 ERROR executor.Executor: Exception in task 0.3 in stage
> 2.0 (TID 538)
> org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
>         ... 10 more
>
>
>
> Spark log:
>
> ERROR 27-03 02:27:21,407 - Task 0 in stage 2.0 failed 4 times; aborting job
> ERROR 27-03 02:27:21,419 - main load data frame failed
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 2.0 (TID 538, hd25): org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
>         ... 10 more
>
>
> Driver stacktrace:
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$
> scheduler$DAGScheduler$$failJobAndIndependentStages(
> DAGScheduler.scala:1431)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> abortStage$1.apply(DAGScheduler.scala:1419)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> abortStage$1.apply(DAGScheduler.scala:1418)
>         at scala.collection.mutable.ResizableArray$class.foreach(
> ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(
> ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(
> DAGScheduler.scala:1418)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(
> DAGScheduler.scala:799)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
> doOnReceive(DAGScheduler.scala:1640)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
> onReceive(DAGScheduler.scala:1599)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
> onReceive(DAGScheduler.scala:1588)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>         at org.apache.spark.scheduler.DAGScheduler.runJob(
> DAGScheduler.scala:620)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>         at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.
> scala:927)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:150)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:111)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>         at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
>         at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.
> loadDataFrame$1(CarbonDataRDDFactory.scala:665)
>         at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.
> loadCarbonData(CarbonDataRDDFactory.scala:794)
>         at org.apache.spark.sql.execution.command.LoadTable.
> run(carbonTableSchema.scala:579)
>         at org.apache.spark.sql.execution.command.LoadTableByInsert.run(
> carbonTableSchema.scala:297)
>         at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult$lzycompute(commands.scala:58)
>         at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult(commands.scala:56)
>         at org.apache.spark.sql.execution.ExecutedCommand.
> doExecute(commands.scala:70)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:132)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:130)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:150)
>         at org.apache.spark.sql.execution.SparkPlan.execute(
> SparkPlan.scala:130)
>         at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(
> QueryExecution.scala:55)
>         at org.apache.spark.sql.execution.QueryExecution.
> toRdd(QueryExecution.scala:55)
>         at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>         at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>         at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>
> (<console>:31)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<
> console>:36)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>
> :38)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>         at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
>         at $line23.$read$$iwC$$iwC.<init>(<console>:46)
>         at $line23.$read$$iwC.<init>(<console>:48)
>         at $line23.$read.<init>(<console>:50)
>         at $line23.$read$.<init>(<console>:54)
>         at $line23.$read$.<clinit>(<console>)
>         at $line23.$eval$.<init>(<console>:7)
>         at $line23.$eval$.<clinit>(<console>)
>         at $line23.$eval.$print(<console>)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(
> SparkIMain.scala:1065)
>         at org.apache.spark.repl.SparkIMain$Request.loadAndRun(
> SparkIMain.scala:1346)
>         at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(
> SparkIMain.scala:840)
>         at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:871)
>         at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:819)
>         at org.apache.spark.repl.SparkILoop.reallyInterpret$1(
> SparkILoop.scala:857)
>         at org.apache.spark.repl.SparkILoop.interpretStartingWith(
> SparkILoop.scala:902)
>         at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>         at org.apache.spark.repl.SparkILoop.processLine$1(
> SparkILoop.scala:657)
>         at org.apache.spark.repl.SparkILoop.innerLoop$1(
> SparkILoop.scala:665)
>         at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$loop(SparkILoop.scala:670)
>         at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>         at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>         at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>         at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(
> ScalaClassLoader.scala:135)
>         at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$process(SparkILoop.scala:945)
>         at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>         at org.apache.spark.repl.Main$.main(Main.scala:31)
>         at org.apache.spark.repl.Main.main(Main.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
> SparkSubmit.scala:181)
>         at org.apache.spark.deploy.SparkSubmit$.submit(
> SparkSubmit.scala:206)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.
> scala:121)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
>         ... 10 more
> ERROR 27-03 02:27:21,422 - main
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 2.0 (TID 538, hd25): org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
>         ... 10 more
>
>
> Driver stacktrace:
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$
> scheduler$DAGScheduler$$failJobAndIndependentStages(
> DAGScheduler.scala:1431)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> abortStage$1.apply(DAGScheduler.scala:1419)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> abortStage$1.apply(DAGScheduler.scala:1418)
>         at scala.collection.mutable.ResizableArray$class.foreach(
> ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(
> ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(
> DAGScheduler.scala:1418)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(
> DAGScheduler.scala:799)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
> doOnReceive(DAGScheduler.scala:1640)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
> onReceive(DAGScheduler.scala:1599)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
> onReceive(DAGScheduler.scala:1588)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>         at org.apache.spark.scheduler.DAGScheduler.runJob(
> DAGScheduler.scala:620)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>         at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.
> scala:927)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:150)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:111)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>         at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
>         at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.
> loadDataFrame$1(CarbonDataRDDFactory.scala:665)
>         at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.
> loadCarbonData(CarbonDataRDDFactory.scala:794)
>         at org.apache.spark.sql.execution.command.LoadTable.
> run(carbonTableSchema.scala:579)
>         at org.apache.spark.sql.execution.command.LoadTableByInsert.run(
> carbonTableSchema.scala:297)
>         at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult$lzycompute(commands.scala:58)
>         at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult(commands.scala:56)
>         at org.apache.spark.sql.execution.ExecutedCommand.
> doExecute(commands.scala:70)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:132)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:130)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:150)
>         at org.apache.spark.sql.execution.SparkPlan.execute(
> SparkPlan.scala:130)
>         at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(
> QueryExecution.scala:55)
>         at org.apache.spark.sql.execution.QueryExecution.
> toRdd(QueryExecution.scala:55)
>         at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>         at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>         at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>
> (<console>:31)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<
> console>:36)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>
> :38)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>         at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
>         at $line23.$read$$iwC$$iwC.<init>(<console>:46)
>         at $line23.$read$$iwC.<init>(<console>:48)
>         at $line23.$read.<init>(<console>:50)
>         at $line23.$read$.<init>(<console>:54)
>         at $line23.$read$.<clinit>(<console>)
>         at $line23.$eval$.<init>(<console>:7)
>         at $line23.$eval$.<clinit>(<console>)
>         at $line23.$eval.$print(<console>)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(
> SparkIMain.scala:1065)
>         at org.apache.spark.repl.SparkIMain$Request.loadAndRun(
> SparkIMain.scala:1346)
>         at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(
> SparkIMain.scala:840)
>         at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:871)
>         at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:819)
>         at org.apache.spark.repl.SparkILoop.reallyInterpret$1(
> SparkILoop.scala:857)
>         at org.apache.spark.repl.SparkILoop.interpretStartingWith(
> SparkILoop.scala:902)
>         at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>         at org.apache.spark.repl.SparkILoop.processLine$1(
> SparkILoop.scala:657)
>         at org.apache.spark.repl.SparkILoop.innerLoop$1(
> SparkILoop.scala:665)
>         at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$loop(SparkILoop.scala:670)
>         at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>         at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>         at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>         at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(
> ScalaClassLoader.scala:135)
>         at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$process(SparkILoop.scala:945)
>         at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>         at org.apache.spark.repl.Main$.main(Main.scala:31)
>         at org.apache.spark.repl.Main.main(Main.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
> SparkSubmit.scala:181)
>         at org.apache.spark.deploy.SparkSubmit$.submit(
> SparkSubmit.scala:206)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.
> scala:121)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
> Data Loading failed for table carbon_table
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:54)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$
> anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>         at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.
> compute(NewCarbonDataLoadRDD.scala:322)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.
> scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(
> Executor.scala:227)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:
> 158)
>         at org.apache.carbondata.processing.newflow.
> DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>         at org.apache.carbondata.processing.newflow.
> DataLoadExecutor.execute(DataLoadExecutor.java:43)
>         ... 10 more
> AUDIT 27-03 02:27:21,453 - [hd21][storm][Thread-1]Data load is failed for
> default.carbon_table
> ERROR 27-03 02:27:21,453 - main
> java.lang.Exception: DataLoad failure: Data Loading failed for table
> carbon_table
>         at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.
> loadCarbonData(CarbonDataRDDFactory.scala:937)
>         at org.apache.spark.sql.execution.command.LoadTable.
> run(carbonTableSchema.scala:579)
>         at org.apache.spark.sql.execution.command.LoadTableByInsert.run(
> carbonTableSchema.scala:297)
>         at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult$lzycompute(commands.scala:58)
>         at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult(commands.scala:56)
>         at org.apache.spark.sql.execution.ExecutedCommand.
> doExecute(commands.scala:70)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:132)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:130)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:150)
>         at org.apache.spark.sql.execution.SparkPlan.execute(
> SparkPlan.scala:130)
>         at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(
> QueryExecution.scala:55)
>         at org.apache.spark.sql.execution.QueryExecution.
> toRdd(QueryExecution.scala:55)
>         at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>         at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>         at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>
> (<console>:31)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<
> console>:36)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>
> :38)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>         at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>         at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
>         at $line23.$read$$iwC$$iwC.<init>(<console>:46)
>         at $line23.$read$$iwC.<init>(<console>:48)
>         at $line23.$read.<init>(<console>:50)
>         at $line23.$read$.<init>(<console>:54)
>         at $line23.$read$.<clinit>(<console>)
>         at $line23.$eval$.<init>(<console>:7)
>         at $line23.$eval$.<clinit>(<console>)
>         at $line23.$eval.$print(<console>)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(
> SparkIMain.scala:1065)
>         at org.apache.spark.repl.SparkIMain$Request.loadAndRun(
> SparkIMain.scala:1346)
>         at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(
> SparkIMain.scala:840)
>         at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:871)
>         at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:819)
>         at org.apache.spark.repl.SparkILoop.reallyInterpret$1(
> SparkILoop.scala:857)
>         at org.apache.spark.repl.SparkILoop.interpretStartingWith(
> SparkILoop.scala:902)
>         at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>         at org.apache.spark.repl.SparkILoop.processLine$1(
> SparkILoop.scala:657)
>         at org.apache.spark.repl.SparkILoop.innerLoop$1(
> SparkILoop.scala:665)
>         at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$loop(SparkILoop.scala:670)
>         at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>         at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>         at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>         at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(
> ScalaClassLoader.scala:135)
>         at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$process(SparkILoop.scala:945)
>         at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>         at org.apache.spark.repl.Main$.main(Main.scala:31)
>         at org.apache.spark.repl.Main.main(Main.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
> SparkSubmit.scala:181)
>         at org.apache.spark.deploy.SparkSubmit$.submit(
> SparkSubmit.scala:206)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.
> scala:121)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> AUDIT 27-03 02:27:21,454 - [hd21][storm][Thread-1]Dataload failure for
> default.carbon_table. Please check the logs
> java.lang.Exception: DataLoad failure: Data Loading failed for table
> carbon_table
>         at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.
> loadCarbonData(CarbonDataRDDFactory.scala:937)
>         at org.apache.spark.sql.execution.command.LoadTable.
> run(carbonTableSchema.scala:579)
>         at org.apache.spark.sql.execution.command.LoadTableByInsert.run(
> carbonTableSchema.scala:297)
>         at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult$lzycompute(commands.scala:58)
>         at org.apache.spark.sql.execution.ExecutedCommand.
> sideEffectResult(commands.scala:56)
>         at org.apache.spark.sql.execution.ExecutedCommand.
> doExecute(commands.scala:70)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:132)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$
> execute$5.apply(SparkPlan.scala:130)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:150)
>         at org.apache.spark.sql.execution.SparkPlan.execute(
> SparkPlan.scala:130)
>         at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(
> QueryExecution.scala:55)
>         at org.apache.spark.sql.execution.QueryExecution.
> toRdd(QueryExecution.scala:55)
>         at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>         at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>         at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>         at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>         at $iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>         at $iwC$$iwC$$iwC.<init>(<console>:44)
>         at $iwC$$iwC.<init>(<console>:46)
>         at $iwC.<init>(<console>:48)
>         at <init>(<console>:50)
>         at .<init>(<console>:54)
>         at .<clinit>(<console>)
>         at .<init>(<console>:7)
>         at .<clinit>(<console>)
>         at $print(<console>)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(
> SparkIMain.scala:1065)
>         at org.apache.spark.repl.SparkIMain$Request.loadAndRun(
> SparkIMain.scala:1346)
>         at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(
> SparkIMain.scala:840)
>         at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:871)
>         at org.apache.spark.repl.SparkIMain.interpret(
> SparkIMain.scala:819)
>         at org.apache.spark.repl.SparkILoop.reallyInterpret$1(
> SparkILoop.scala:857)
>         at org.apache.spark.repl.SparkILoop.interpretStartingWith(
> SparkILoop.scala:902)
>         at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>         at org.apache.spark.repl.SparkILoop.processLine$1(
> SparkILoop.scala:657)
>         at org.apache.spark.repl.SparkILoop.innerLoop$1(
> SparkILoop.scala:665)
>         at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$loop(SparkILoop.scala:670)
>         at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>         at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>         at org.apache.spark.repl.SparkILoop$$anonfun$org$
> apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>         at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(
> ScalaClassLoader.scala:135)
>         at org.apache.spark.repl.SparkILoop.org$apache$spark$
> repl$SparkILoop$$process(SparkILoop.scala:945)
>         at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>         at org.apache.spark.repl.Main$.main(Main.scala:31)
>         at org.apache.spark.repl.Main.main(Main.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(
> SparkSubmit.scala:181)
>         at org.apache.spark.deploy.SparkSubmit$.submit(
> SparkSubmit.scala:206)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.
> scala:121)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> At 2017-03-27 00:42:28, "a" <ww...@163.com> wrote:
>
>
>
>  Container log : error executor.CoarseGrainedExecutorBackend: RECEIVED
> SIGNAL 15: SIGTERM.
>  spark log: 17/03/26 23:40:30 ERROR YarnScheduler: Lost executor 2 on
> hd25: Container killed by YARN for exceeding memory limits. 49.0 GB of 49
> GB physical memory used. Consider boosting spark.yarn.executor.
> memoryOverhead.
> The test sql
>
>
>
>
>
>
>
> At 2017-03-26 23:34:36, "a" <ww...@163.com> wrote:
> >
> >
> >I have set the parameters as follow:
> >1、fs.hdfs.impl.disable.cache=true
> >2、dfs.socket.timeout=1800000  (Exception: Caused by: java.io.IOException:
> Filesystem closed)
> >3、dfs.datanode.socket.write.timeout=3600000
> >4、set carbondata property enable.unsafe.sort=true
> >5、remove BUCKETCOLUMNS property from the create table sql
> >6、set spark job parameter executor-memory=48G (from 20G to 48G)
> >
> >
> >But it  still failed, the error is "executor.CoarseGrainedExecutorBackend:
> RECEIVED SIGNAL 15: SIGTERM。"
> >
> >
> >Then I tried to insert 400 million records into the carbondata table, and
> >it succeeded.
> >
> >
> >How can I insert 2 billion records into carbondata?
> >Should I set executor-memory big enough? Or should I generate a csv file
> >from the hive table first, then load the csv file into the carbon table?
> >Can anybody give me some help?
> >
> >
> >Regards
> >fish
> >
> >
> >
> >
> >
> >
> >
> >At 2017-03-26 00:34:18, "a" <ww...@163.com> wrote:
> >>Thank you  Ravindra!
> >>Version:
> >>My carbondata version is 1.0, spark version is 1.6.3, hadoop version is
> 2.7.1, hive version is 1.1.0.
> >>one of the containers log:
> >>17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED
> SIGNAL 15: SIGTERM
> >>17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called
> >>17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called
> >>17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory
> /data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/
> appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-
> 700345a84109
> >>17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl:
> pool-23-thread-2
> >>java.io.IOException: Error reading file: hdfs://xxxx_table_tmp/dt=2017-
> 01-01/pt=ios/000006_0
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(
> RecordReaderImpl.java:1046)
> >>        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$
> OriginalReaderPair.next(OrcRawRecordMerger.java:263)
> >>        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(
> OrcRawRecordMerger.java:547)
> >>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(
> OrcInputFormat.java:1234)
> >>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(
> OrcInputFormat.java:1218)
> >>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$
> NullKeyRecordReader.next(OrcInputFormat.java:1150)
> >>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$
> NullKeyRecordReader.next(OrcInputFormat.java:1136)
> >>        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(
> HadoopRDD.scala:249)
> >>        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(
> HadoopRDD.scala:211)
> >>        at org.apache.spark.util.NextIterator.hasNext(
> NextIterator.scala:73)
> >>        at org.apache.spark.InterruptibleIterator.hasNext(
> InterruptibleIterator.scala:39)
> >>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.
> scala:327)
> >>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.
> scala:327)
> >>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.
> scala:327)
> >>        at org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(
> NewCarbonDataLoadRDD.scala:412)
> >>        at org.apache.carbondata.processing.newflow.steps.
> InputProcessorStepImpl$InputProcessorIterator.internalHasNext(
> InputProcessorStepImpl.java:163)
> >>        at org.apache.carbondata.processing.newflow.steps.
> InputProcessorStepImpl$InputProcessorIterator.getBatch(
> InputProcessorStepImpl.java:221)
> >>        at org.apache.carbondata.processing.newflow.steps.
> InputProcessorStepImpl$InputProcessorIterator.next(
> InputProcessorStepImpl.java:183)
> >>        at org.apache.carbondata.processing.newflow.steps.
> InputProcessorStepImpl$InputProcessorIterator.next(
> InputProcessorStepImpl.java:117)
> >>        at org.apache.carbondata.processing.newflow.steps.
> DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl
> .java:80)
> >>        at org.apache.carbondata.processing.newflow.steps.
> DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl
> .java:73)
> >>        at org.apache.carbondata.processing.newflow.sort.impl.
> ParallelReadMergeSorterImpl$SortIteratorThread.call(
> ParallelReadMergeSorterImpl.java:196)
> >>        at org.apache.carbondata.processing.newflow.sort.impl.
> ParallelReadMergeSorterImpl$SortIteratorThread.call(
> ParallelReadMergeSorterImpl.java:177)
> >>        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >>        at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> >>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> >>        at java.lang.Thread.run(Thread.java:745)
> >>Caused by: java.io.IOException: Filesystem closed
> >>        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.
> java:808)
> >>        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(
> DFSInputStream.java:868)
> >>        at org.apache.hadoop.hdfs.DFSInputStream.read(
> DFSInputStream.java:934)
> >>        at java.io.DataInputStream.readFully(DataInputStream.java:195)
> >>        at org.apache.hadoop.hive.ql.io.orc.MetadataReader.
> readStripeFooter(MetadataReader.java:112)
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.
> readStripeFooter(RecordReaderImpl.java:228)
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.
> beginReadStripe(RecordReaderImpl.java:805)
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.
> readStripe(RecordReaderImpl.java:776)
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.
> advanceStripe(RecordReaderImpl.java:986)
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.
> advanceToNextRow(RecordReaderImpl.java:1019)
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(
> RecordReaderImpl.java:1042)
> >>        ... 26 more
> >>I will try to set enable.unsafe.sort=true and remove the BUCKETCOLUMNS
> property, and try again.
> >>
> >>
> >>At 2017-03-25 20:55:03, "Ravindra Pesala" <ra...@gmail.com> wrote:
> >>>Hi,
> >>>
> >>>Carbondata launches one job per node to sort the data at node level and
> >>>avoid shuffling. Internally it uses threads for parallel loading. Please
> >>>use the carbon.number.of.cores.while.loading property in the
> >>>carbon.properties file and set the number of cores it should use per
> >>>machine while loading.
> >>>Carbondata sorts the data at each node level to maintain the Btree for
> >>>each node per segment. It improves query performance by filtering faster
> >>>when there is a Btree at node level instead of at each block level.
> >>>
> >>>1.Which version of Carbondata are you using?
> >>>2.There are memory issues in the Carbondata-1.0 version, and they are
> >>>fixed in the current master.
> >>>3.And you can improve the performance by enabling enable.unsafe.sort=true
> >>>in the carbon.properties file. But it is not supported if bucketing of
> >>>columns is enabled. We are planning to support unsafe sort load for
> >>>bucketing as well in the next version.
> >>>
> >>>Please send the executor log to know about the error you are facing.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>Regards,
> >>>Ravindra
> >>>
> >>>On 25 March 2017 at 16:18, wwyxg@163.com <ww...@163.com> wrote:
> >>>
> >>>> Hello!
> >>>>
> >>>> *0、The failure*
> >>>> When i insert into carbon table,i encounter failure。The failure is  as
> >>>> follow:
> >>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times,
> most
> >>>> recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
> >>>> ExecutorLostFailure (executor 1 exited caused by one of the running
> tasks)
> >>>> Reason: Slave lost+details
> >>>>
> >>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times,
> most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
> Reason: Slave lost
> >>>> Driver stacktrace:
> >>>>
> >>>> the stage:
> >>>>
> >>>> *Step:*
> >>>> *1、start spark-shell*
> >>>> ./bin/spark-shell \
> >>>> --master yarn-client \
> >>>> --num-executors 5 \  (I tried to set this parameter range from 10 to
> >>>> 20,but the second job has only 5 tasks)
> >>>> --executor-cores 5 \
> >>>> --executor-memory 20G \
> >>>> --driver-memory 8G \
> >>>> --queue root.default \
> >>>> --jars /xxx.jar
> >>>>
> >>>> //spark-default.conf spark.default.parallelism=320
> >>>>
> >>>> import org.apache.spark.sql.CarbonContext
> >>>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
> >>>>
> >>>> *2、create table*
> >>>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst
> >>>> String,plat String,sty String,is_pay String,is_vip String,is_mpack
> >>>> String,scene String,status String,nw String,isc String,area
> String,spttag
> >>>> String,province String,isp String,city String,tv String,hwm String,pip
> >>>> String,fo String,sh String,mid String,user_id String,play_pv
> Int,spt_cnt
> >>>> Int,prg_spt_cnt Int) row format delimited fields terminated by '|'
> STORED
> >>>> BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,
> >>>> mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,
> >>>> is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,
> >>>> province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,
> >>>> pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
> >>>>
> >>>> //notes,set "fo" column BUCKETCOLUMNS is to join another table
> >>>> //the column distinct values are as follows:
> >>>>
> >>>>
> >>>> *3、insert into table*(xxxx_table_tmp  is a hive extenal orc table,has
> 20
> >>>> 0000 0000 records)
> >>>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_
> >>>> vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,
> >>>> city,tv,hwm,pip,fo,sh,mid,user_id ,play_pv,spt_cnt,prg_spt_cnt from
> >>>> xxxx_table_tmp where dt='2017-01-01'")
> >>>>
> >>>> *4、spark split sql into two jobs,the first finished succeeded, but the
> >>>> second failed:*
> >>>>
> >>>>
> >>>> *5、The second job stage:*
> >>>>
> >>>>
> >>>>
> >>>> *Question:*
> >>>> 1、Why the second job has only five jobs,but the first job has 994
> jobs ?(
> >>>> note:My hadoop cluster has 5 datanode)
> >>>>       I guess it caused the failure
> >>>> 2、In the sources,i find DataLoadPartitionCoalescer.class,is it means
> that
> >>>> "one datanode has only one partition ,and then the task is only one
> on the
> >>>> datanode"?
> >>>> 3、In the ExampleUtils class,"carbon.table.split.partition.enable" is
> set
> >>>> as follow,but i can not find "carbon.table.split.partition.enable" in
> >>>> other parts of the project。
> >>>>      I set "carbon.table.split.partition.enable" to true, but the
> second
> >>>> job has only five jobs.How to use this property?
> >>>>      ExampleUtils :
> >>>>     // whether use table split partition
> >>>>     // true -> use table split partition, support multiple partition
> >>>> loading
> >>>>     // false -> use node split partition, support data load by host
> >>>> partition
> >>>>     CarbonProperties.getInstance().addProperty("carbon.table.
> split.partition.enable",
> >>>> "false")
> >>>> 4、Insert into carbon table takes 3 hours ,but eventually failed 。How
> can
> >>>> i speed it.
> >>>> 5、in the spark-shell  ,I tried to set this parameter range from 10 to
> >>>> 20,but the second job has only 5 tasks
> >>>>      Is the other parameter executor-memory = 20G enough?
> >>>>
> >>>> I need your help!Thank you very much!
> >>>>
> >>>> wwyxg@163.com
> >>>>
> >>>> ------------------------------
> >>>> wwyxg@163.com
> >>>>
> >>>
> >>>
> >>>
> >>>--
> >>>Thanks & Regards,
> >>>Ravi
>



-- 
Thanks & Regards,
Ravi

Re:Re: Re:Re:Re:Re:Re:Re: insert into carbon table failed

Posted by a <ww...@163.com>.
Thank you very much!


I have divided the 2 billion records into 4 pieces and loaded them into the table.
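
For anyone hitting the same problem, a minimal sketch of such a split from the
spark-shell could look like the following (the pt values are only hypothetical
examples; any predicate that cuts the source table into non-overlapping pieces
of roughly 0.5 billion rows each would do, and each insert then produces its
own segment):

// run four smaller inserts instead of one 2-billion-row insert
val pieces = Seq("ios", "android", "web", "other")   // hypothetical pt values
pieces.foreach { p =>
  cc.sql(s"insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_vip," +
    s"is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip," +
    s"fo,sh,mid,user_id,play_pv,spt_cnt,prg_spt_cnt from xxxx_table_tmp " +
    s"where dt='2017-01-01' and pt='$p'")
}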


The three parameters carbon.graph.rowset.size, carbon.sort.size, and carbon.number.of.cores.while.loading may also have an effect.


Best regards!

At 2017-03-27 13:53:58, "Liang Chen" <ch...@gmail.com> wrote:
>Hi 
>
>1.Using your current test environment (CarbonData 1.0 + Spark 1.6), please
>divide the 2 billion records into 4 pieces (0.5 billion each) and load the
>data again.
>
>2.For CarbonData 1.0 + Spark 1.6 with kettle for loading data, please
>configure the below 3 parameters in carbon.properties (note: please copy the
>latest carbon.properties to all nodes):
>
>carbon.graph.rowset.size=10000   (the default is 100000; set to 1/10 to reduce
>the rowset size exchanged between data load graph steps)
>carbon.number.of.cores.while.loading=5 (because your machine has 5 cores)
>carbon.sort.size=50000 (the default is 500000; set to 1/10 to reduce temp
>intermediate files)
>
>
>Regards
>Liang
>
>
>
>--
>View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/insert-into-carbon-table-failed-tp9609p9688.html
>Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.

Re: Re:Re:Re:Re:Re:Re: insert into carbon table failed

Posted by Liang Chen <ch...@gmail.com>.
Hi 

1.Using your current test environment (CarbonData 1.0 + Spark 1.6), please
divide the 2 billion records into 4 pieces (0.5 billion each) and load the
data again.

2.For CarbonData 1.0 + Spark 1.6 with kettle for loading data, please
configure the below 3 parameters in carbon.properties (note: please copy the
latest carbon.properties to all nodes):

carbon.graph.rowset.size=10000   (the default is 100000; set to 1/10 to reduce
the rowset size exchanged between data load graph steps)
carbon.number.of.cores.while.loading=5 (because your machine has 5 cores)
carbon.sort.size=50000 (the default is 500000; set to 1/10 to reduce temp
intermediate files)
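
If it is easier to experiment from the spark-shell first, the same keys can
also be set programmatically through CarbonProperties before running the
insert. This is only a sketch; depending on the version, some load-time keys
may only be picked up from carbon.properties on each node, so the file-based
setup above remains the safer route.

import org.apache.carbondata.core.util.CarbonProperties

val carbonProps = CarbonProperties.getInstance()
carbonProps.addProperty("carbon.graph.rowset.size", "10000")           // 1/10 of the 100000 default
carbonProps.addProperty("carbon.number.of.cores.while.loading", "5")   // 5 cores per machine
carbonProps.addProperty("carbon.sort.size", "50000")                   // 1/10 of the 500000 default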


Regards
Liang



--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/insert-into-carbon-table-failed-tp9609p9688.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.

Re: Re:Re:Re:Re:Re:Re: insert into carbon table failed

Posted by Liang Chen <ch...@gmail.com>.
Hi 

Please enable the vector reader; it might help the limit query.

import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.constants.CarbonCommonConstants
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_VECTOR_READER,
"true")

Regards
Liang


a wrote
> TEST SQL :
> High-cardinality random query
> select * From carbon_table where dt='2017-01-01' and user_id='XXXX' limit
> 100;
> 
> 
> High-cardinality random query with LIKE
> select * From carbon_table where dt='2017-01-01' and fo like '%YYYY%'
> limit 100;
> 
> 
> Low-cardinality random query
> select * From carbon_table where dt='2017-01-01' and plat='android' and
> tv='8400' limit 100
> 
> 
> 1-dimension aggregation query
> select province,sum(play_pv) play_pv ,sum(spt_cnt) spt_cnt
> from carbon_table where dt='2017-01-01' and sty='AAAA'
> group by province
> 
> 
> 2-dimension aggregation query
> select province,city,sum(play_pv) play_pv ,sum(spt_cnt) spt_cnt
> from carbon_table where dt='2017-01-01' and sty='AAAA'
> group by province,city
> 
> 
> 3-dimension aggregation query
> select province,city,isp,sum(play_pv) play_pv ,sum(spt_cnt) spt_cnt
> from carbon_table where dt='2017-01-01' and sty='AAAA'
> group by province,city,isp
> 
> 
> Multi-dimension aggregation query
> select sty,isc,status,nw,tv,area,province,city,isp,sum(play_pv)
> play_pv_sum ,sum(spt_cnt) spt_cnt_sum
> from carbon_table where dt='2017-01-01' and sty='AAAA'
> group by sty,isc,status,nw,tv,area,province,city,isp
> 
> 
> distinct on a single column
> select tv, count(distinct user_id) 
> from carbon_table where dt='2017-01-01' and sty='AAAA' and fo like
> '%YYYY%' group by tv
> 
> 
> distinct on multiple columns
> select count(distinct user_id) ,count(distinct mid),count(distinct case
> when sty='AAAA' then mid end)
> from carbon_table where dt='2017-01-01' and sty='AAAA'
> 
> 
> Sort (order by) query
> select user_id,sum(play_pv) play_pv_sum
> from carbon_table
> group by user_id
> order by play_pv_sum desc limit 100
> 
> 
> Simple join query
> select b.fo_level1,b.fo_level2,sum(a.play_pv) play_pv_sum From
> carbon_table a
> left join dim_carbon_table b
> on a.fo=b.fo and a.dt = b.dt where a.dt = '2017-01-01' group by
> b.fo_level1,b.fo_level2
> 
> At 2017-03-27 04:10:04, "a" <ww...@163.com> wrote:
>>I downloaded the newest source code (master) and compiled it to generate the
jar carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar
>>Then I tested with spark2.1 again. The error logs are as follows:
>>
>>
>> Container log :
>>17/03/27 02:27:21 ERROR newflow.DataLoadExecutor: Executor task launch
worker-9 Data Loading failed for table carbon_table
>>java.lang.NullPointerException
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>>        at
>> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>        at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>        at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>>        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:745)
>>17/03/27 02:27:21 INFO rdd.NewDataFrameLoaderRDD: DataLoad failure
>>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
Data Loading failed for table carbon_table
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>>        at
>> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>        at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>        at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>>        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:745)
>>Caused by: java.lang.NullPointerException
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>>        ... 10 more
>>17/03/27 02:27:21 ERROR rdd.NewDataFrameLoaderRDD: Executor task launch
worker-9 
>>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
Data Loading failed for table carbon_table
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>>        at
>> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>        at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>        at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>>        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:745)
>>Caused by: java.lang.NullPointerException
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>>        ... 10 more
>>17/03/27 02:27:21 ERROR executor.Executor: Exception in task 0.3 in stage
2.0 (TID 538)
>>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
Data Loading failed for table carbon_table
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>>        at
>> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>        at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>        at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>>        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:745)
>>Caused by: java.lang.NullPointerException
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>>        ... 10 more
>>
>>
>>
>>Spark log:
>>
>>ERROR 27-03 02:27:21,407 - Task 0 in stage 2.0 failed 4 times; aborting
job
>>ERROR 27-03 02:27:21,419 - main load data frame failed
>>org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0
(TID 538, hd25):
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
Data Loading failed for table carbon_table
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>>        at
>> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>        at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>        at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>>        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:745)
>>Caused by: java.lang.NullPointerException
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>>        ... 10 more
>>
>>
>>Driver stacktrace:
>>        at
>> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>>        at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>>        at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>>        at
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>        at
>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>        at
>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>>        at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>>        at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>>        at scala.Option.foreach(Option.scala:236)
>>        at
>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>>        at
>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>>        at
>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>>        at
>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>>        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>        at
>> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>>        at
>> org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
>>        at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>        at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>>        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>>        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
>>        at
>> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadDataFrame$1(CarbonDataRDDFactory.scala:665)
>>        at
>> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:794)
>>        at
>> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>>        at
>> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>>        at
>> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>>        at
>> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>>        at
>> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>>        at
>> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>>        at
>> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>>        at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>        at
>> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>>        at
>> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>>        at
>> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>>        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>>        at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
>>        at $line23.$read$$iwC$$iwC.<init>(<console>:46)
>>        at $line23.$read$$iwC.<init>(<console>:48)
>>        at $line23.$read.<init>(<console>:50)
>>        at $line23.$read$.<init>(<console>:54)
>>        at $line23.$read$.<clinit>(<console>)
>>        at $line23.$eval$.<init>(<console>:7)
>>        at $line23.$eval$.<clinit>(<console>)
>>        at $line23.$eval.$print(<console>)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:606)
>>        at
>> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>        at
>> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>>        at
>> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>        at
>> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>        at
>> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>        at
>> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>        at
>> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>        at
>> org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>        at
>> org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>        at
>> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>        at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>        at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>        at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>        at
>> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>        at
>> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>        at org.apache.spark.repl.Main$.main(Main.scala:31)
>>        at org.apache.spark.repl.Main.main(Main.scala)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:606)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>Caused by:
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
Data Loading failed for table carbon_table
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>>        at
>> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>        at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>        at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>>        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:745)
>>Caused by: java.lang.NullPointerException
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>>        ... 10 more
>>ERROR 27-03 02:27:21,422 - main 
>>org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0
(TID 538, hd25):
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
Data Loading failed for table carbon_table
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>>        at
>> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>        at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>        at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>>        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:745)
>>Caused by: java.lang.NullPointerException
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>>        ... 10 more
>>
>>
>>Driver stacktrace:
>>        at
>> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>>        at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>>        at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>>        at
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>        at
>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>        at
>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>>        at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>>        at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>>        at scala.Option.foreach(Option.scala:236)
>>        at
>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>>        at
>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>>        at
>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>>        at
>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>>        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>        at
>> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>>        at
>> org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
>>        at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>        at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>>        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>>        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
>>        at
>> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadDataFrame$1(CarbonDataRDDFactory.scala:665)
>>        at
>> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:794)
>>        at
>> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>>        at
>> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>>        at
>> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>>        at
>> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>>        at
>> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>>        at
>> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>>        at
>> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>>        at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>        at
>> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>>        at
>> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>>        at
>> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>>        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>>        at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
>>        at $line23.$read$$iwC$$iwC.<init>(<console>:46)
>>        at $line23.$read$$iwC.<init>(<console>:48)
>>        at $line23.$read.<init>(<console>:50)
>>        at $line23.$read$.<init>(<console>:54)
>>        at $line23.$read$.<clinit>(<console>)
>>        at $line23.$eval$.<init>(<console>:7)
>>        at $line23.$eval$.<clinit>(<console>)
>>        at $line23.$eval.$print(<console>)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:606)
>>        at
>> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>        at
>> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>>        at
>> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>        at
>> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>        at
>> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>        at
>> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>        at
>> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>        at
>> org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>        at
>> org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>        at
>> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>        at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>        at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>        at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>        at
>> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>        at
>> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>        at org.apache.spark.repl.Main$.main(Main.scala:31)
>>        at org.apache.spark.repl.Main.main(Main.scala)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:606)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>Caused by:
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException:
Data Loading failed for table carbon_table
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>>        at
>> org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>        at
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>        at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>>        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:745)
>>Caused by: java.lang.NullPointerException
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>>        at
>> org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>>        ... 10 more
>>AUDIT 27-03 02:27:21,453 - [hd21][storm][Thread-1]Data load is failed for
default.carbon_table
>>ERROR 27-03 02:27:21,453 - main 
>>java.lang.Exception: DataLoad failure: Data Loading failed for table
carbon_table
>>        at
>> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:937)
>>        at
>> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>>        at
>> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>>        at
>> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>>        at
>> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>>        at
>> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>>        at
>> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>>        at
>> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>>        at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>        at
>> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>>        at
>> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>>        at
>> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>>        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>>        at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>>        at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
>>        at $line23.$read$$iwC$$iwC.<init>(<console>:46)
>>        at $line23.$read$$iwC.<init>(<console>:48)
>>        at $line23.$read.<init>(<console>:50)
>>        at $line23.$read$.<init>(<console>:54)
>>        at $line23.$read$.<clinit>(<console>)
>>        at $line23.$eval$.<init>(<console>:7)
>>        at $line23.$eval$.<clinit>(<console>)
>>        at $line23.$eval.$print(<console>)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:606)
>>        at
>> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>        at
>> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>>        at
>> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>        at
>> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>        at
>> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>        at
>> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>        at
>> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>        at
>> org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>        at
>> org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>        at
>> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>        at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>        at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>        at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>        at
>> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>        at
>> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>        at org.apache.spark.repl.Main$.main(Main.scala:31)
>>        at org.apache.spark.repl.Main.main(Main.scala)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:606)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>AUDIT 27-03 02:27:21,454 - [hd21][storm][Thread-1]Dataload failure for
default.carbon_table. Please check the logs
>>java.lang.Exception: DataLoad failure: Data Loading failed for table
carbon_table
>>        at
>> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:937)
>>        at
>> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>>        at
>> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>>        at
>> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>>        at
>> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>>        at
>> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>>        at
>> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>>        at
>> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>>        at
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>>        at
>> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>>        at
>> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>>        at
>> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>>        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>>        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>>        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>>        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>>        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>>        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>>        at $iwC$$iwC$$iwC.<init>(<console>:44)
>>        at $iwC$$iwC.<init>(<console>:46)
>>        at $iwC.<init>(<console>:48)
>>        at <init>(<console>:50)
>>        at .<init>(<console>:54)
>>        at .<clinit>(<console>)
>>        at .<init>(<console>:7)
>>        at .<clinit>(<console>)
>>        at $print(<console>)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:606)
>>        at
>> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>        at
>> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>>        at
>> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>        at
>> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>        at
>> org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>        at
>> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>        at
>> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>        at
>> org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>        at
>> org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>        at
>> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>        at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>        at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>        at
>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>        at
>> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>        at
>> org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>        at org.apache.spark.repl.Main$.main(Main.scala:31)
>>        at org.apache.spark.repl.Main.main(Main.scala)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:606)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>>        at
>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>>At 2017-03-27 00:42:28, "a" <ww...@163.com> wrote:
>>
>> 
>>
>> Container log : error executor.CoarseGrainedExecutorBackend: RECEIVED
>> SIGNAL 15: SIGTERM。
>> spark log: 17/03/26 23:40:30 ERROR YarnScheduler: Lost executor 2 on
>> hd25: Container killed by YARN for exceeding memory limits. 49.0 GB of 49
>> GB physical memory used. Consider boosting
>> spark.yarn.executor.memoryOverhead.
>>The test sql
>>
>>
>>
>>
>>
>>
>>
>>At 2017-03-26 23:34:36, "a" <ww...@163.com> wrote:
>>>
>>>
>>>I have set the parameters as follow:
>>>1、fs.hdfs.impl.disable.cache=true
>>>2、dfs.socket.timeout=1800000  (Exception: Caused by: java.io.IOException:
Filesystem closed)
>>>3、dfs.datanode.socket.write.timeout=3600000
>>>4、set carbondata property enable.unsafe.sort=true
>>>5、remove BUCKETCOLUMNS property from the create table sql
>>>6、set spark job parameter executor-memory=48G (from 20G to 48G)
>>>
>>>
>>>But it  still failed, the error is
"executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM。"
>>>
>>>
>>>Then I tried to insert 400 million records into the carbondata table, and
>>>it succeeded.
>>>
>>>
>>>How can I insert 2 billion records into carbondata?
>>>Should I set executor-memory big enough? Or should I generate a csv file
>>>from the hive table first, then load the csv file into the carbon table?
>>>Can anybody give me some help?
>>>
>>>
>>>Regards
>>>fish
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>At 2017-03-26 00:34:18, "a" <ww...@163.com> wrote:
>>>>Thank you  Ravindra!
>>>>Version:
>>>>My carbondata version is 1.0,spark version is 1.6.3,hadoop version is
2.7.1,hive version is 1.1.0
>>>>one of the containers log:
>>>>17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED
SIGNAL 15: SIGTERM
>>>>17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called
>>>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called
>>>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory
/data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-700345a84109
>>>>17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl:
pool-23-thread-2 
>>>>java.io.IOException: Error reading file:
hdfs://xxxx_table_tmp/dt=2017-01-01/pt=ios/000006_0
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1046)
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263)
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547)
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1234)
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1218)
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1150)
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1136)
>>>>        at
>>>> org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:249)
>>>>        at
>>>> org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
>>>>        at
>>>> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>>>>        at
>>>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>>>        at
>>>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>>        at
>>>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>>        at
>>>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>>        at
>>>> org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(NewCarbonDataLoadRDD.scala:412)
>>>>        at
>>>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.internalHasNext(InputProcessorStepImpl.java:163)
>>>>        at
>>>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.getBatch(InputProcessorStepImpl.java:221)
>>>>        at
>>>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:183)
>>>>        at
>>>> org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:117)
>>>>        at
>>>> org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:80)
>>>>        at
>>>> org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:73)
>>>>        at
>>>> org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:196)
>>>>        at
>>>> org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:177)
>>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>        at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>        at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>        at java.lang.Thread.run(Thread.java:745)
>>>>Caused by: java.io.IOException: Filesystem closed
>>>>        at
>>>> org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>>>>        at
>>>> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
>>>>        at
>>>> org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:195)
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:112)
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805)
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776)
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
>>>>        at
>>>> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042)
>>>>        ... 26 more
>>>>I will try to set enable.unsafe.sort=true and remove BUCKETCOLUMNS
property ,and try again.
>>>>
>>>>
>>>>At 2017-03-25 20:55:03, "Ravindra Pesala" <ravi.pesala@gmail.com> wrote:
>>>>>Hi,
>>>>>
>>>>>Carbondata launches one job per node to sort the data at node level and
>>>>>avoid shuffling. Internally it uses threads to load in parallel. Please
>>>>>use carbon.number.of.cores.while.loading property in carbon.properties
file
>>>>>and set the number of cores it should use per machine while loading.
>>>>>Carbondata sorts the data  at each node level to maintain the Btree for
>>>>>each node per segment. It improves the query performance by filtering
>>>>>faster if we have Btree at node level instead of each block level.
>>>>>
>>>>>1.Which version of Carbondata are you using?
>>>>>2.There are memory issues in the Carbondata-1.0 version which are fixed in the
>>>>>current master.
>>>>>3.And you can improve the performance by enabling
enable.unsafe.sort=true in
>>>>>carbon.properties file. But it is not supported if column bucketing is
>>>>>enabled. We are planning to support unsafe sort load for bucketing as well
>>>>>in the next version.
>>>>>
>>>>>Please send the executor log to know about the error you are facing.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>Regards,
>>>>>Ravindra
>>>>>
>>>>>On 25 March 2017 at 16:18, wwyxg@163.com <wwyxg@163.com> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> *0、The failure*
>>>>>> When i insert into carbon table,i encounter failure。The failure is 
>>>>>> as
>>>>>> follow:
>>>>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times,
>>>>>> most
>>>>>> recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
>>>>>> ExecutorLostFailure (executor 1 exited caused by one of the running
>>>>>> tasks)
>>>>>> Reason: Slave lost+details
>>>>>>
>>>>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times,
>>>>>> most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
>>>>>> ExecutorLostFailure (executor 1 exited caused by one of the running
>>>>>> tasks) Reason: Slave lost
>>>>>> Driver stacktrace:
>>>>>>
>>>>>> the stage:
>>>>>>
>>>>>> *Step:*
>>>>>> *1、start spark-shell*
>>>>>> ./bin/spark-shell \
>>>>>> --master yarn-client \
>>>>>> --num-executors 5 \  (I tried to set this parameter range from 10 to
>>>>>> 20,but the second job has only 5 tasks)
>>>>>> --executor-cores 5 \
>>>>>> --executor-memory 20G \
>>>>>> --driver-memory 8G \
>>>>>> --queue root.default \
>>>>>> --jars /xxx.jar
>>>>>>
>>>>>> //spark-default.conf spark.default.parallelism=320
>>>>>>
>>>>>> import org.apache.spark.sql.CarbonContext
>>>>>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
>>>>>>
>>>>>> *2、create table*
>>>>>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt
>>>>>> String,lst
>>>>>> String,plat String,sty String,is_pay String,is_vip String,is_mpack
>>>>>> String,scene String,status String,nw String,isc String,area
>>>>>> String,spttag
>>>>>> String,province String,isp String,city String,tv String,hwm
>>>>>> String,pip
>>>>>> String,fo String,sh String,mid String,user_id String,play_pv
>>>>>> Int,spt_cnt
>>>>>> Int,prg_spt_cnt Int) row format delimited fields terminated by '|'
>>>>>> STORED
>>>>>> BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,
>>>>>> mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,
>>>>>> is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,
>>>>>> province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,
>>>>>> pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
>>>>>>
>>>>>> //notes,set "fo" column BUCKETCOLUMNS is to join another table
>>>>>> //the column distinct values are as follows:
>>>>>>
>>>>>>
>>>>>> *3、insert into table*(xxxx_table_tmp  is a hive extenal orc table,has
>>>>>> 20
>>>>>> 0000 0000 records)
>>>>>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_
>>>>>> vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,
>>>>>> city,tv,hwm,pip,fo,sh,mid,user_id ,play_pv,spt_cnt,prg_spt_cnt from
>>>>>> xxxx_table_tmp where dt='2017-01-01'")
>>>>>>
>>>>>> *4、spark split sql into two jobs,the first finished succeeded, but
>>>>>> the
>>>>>> second failed:*
>>>>>>
>>>>>>
>>>>>> *5、The second job stage:*
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Question:*
>>>>>> 1、Why the second job has only five jobs,but the first job has 994
>>>>>> jobs ?(
>>>>>> note:My hadoop cluster has 5 datanode)
>>>>>>       I guess it caused the failure
>>>>>> 2、In the sources,i find DataLoadPartitionCoalescer.class,is it means
>>>>>> that
>>>>>> "one datanode has only one partition ,and then the task is only one
>>>>>> on the
>>>>>> datanode"?
>>>>>> 3、In the ExampleUtils class,"carbon.table.split.partition.enable" is
>>>>>> set
>>>>>> as follow,but i can not find "carbon.table.split.partition.enable" in
>>>>>> other parts of the project。
>>>>>>      I set "carbon.table.split.partition.enable" to true, but the
>>>>>> second
>>>>>> job has only five jobs.How to use this property?
>>>>>>      ExampleUtils :
>>>>>>     // whether use table split partition
>>>>>>     // true -> use table split partition, support multiple partition
>>>>>> loading
>>>>>>     // false -> use node split partition, support data load by host
>>>>>> partition
>>>>>>    
>>>>>> CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable",
>>>>>> "false")
>>>>>> 4、Insert into carbon table takes 3 hours ,but eventually failed 。How
>>>>>> can
>>>>>> i speed it.
>>>>>> 5、In the spark-shell, I tried to set this parameter in the range from 10 to
>>>>>> 20, but the second job still has only 5 tasks.
>>>>>>      Is the other parameter, executor-memory = 20G, enough?
>>>>>>
>>>>>> I need your help! Thank you very much!
>>>>>>
>>>>>> wwyxg@163.com
>>>>>>
>>>>>> ------------------------------
>>>>>> wwyxg@163.com
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>-- 
>>>>>Thanks & Regards,
>>>>>Ravi





--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/insert-into-carbon-table-failed-tp9609p9707.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.

Re:Re:Re:Re:Re:Re: insert into carbon table failed

Posted by a <ww...@163.com>.
TEST SQL :
High-cardinality random query
select * From carbon_table where dt='2017-01-01' and user_id='XXXX' limit 100;


High-cardinality random query with LIKE
select * From carbon_table where dt='2017-01-01' and fo like '%YYYY%' limit 100;


Low-cardinality random query
select * From carbon_table where dt='2017-01-01' and plat='android' and tv='8400' limit 100


1-dimension query
select province,sum(play_pv) play_pv ,sum(spt_cnt) spt_cnt
from carbon_table where dt='2017-01-01' and sty='AAAA'
group by province


2-dimension query
select province,city,sum(play_pv) play_pv ,sum(spt_cnt) spt_cnt
from carbon_table where dt='2017-01-01' and sty='AAAA'
group by province,city


3-dimension query
select province,city,isp,sum(play_pv) play_pv ,sum(spt_cnt) spt_cnt
from carbon_table where dt='2017-01-01' and sty='AAAA'
group by province,city,isp


Multi-dimension query
select sty,isc,status,nw,tv,area,province,city,isp,sum(play_pv) play_pv_sum ,sum(spt_cnt) spt_cnt_sum
from carbon_table where dt='2017-01-01' and sty='AAAA'
group by sty,isc,status,nw,tv,area,province,city,isp


DISTINCT on a single column
select tv, count(distinct user_id) 
from carbon_table where dt='2017-01-01' and sty='AAAA' and fo like '%YYYY%' group by tv


DISTINCT on multiple columns
select count(distinct user_id) ,count(distinct mid),count(distinct case when sty='AAAA' then mid end)
from carbon_table where dt='2017-01-01' and sty='AAAA'


Order-by (top-N) query
select user_id,sum(play_pv) play_pv_sum
from carbon_table
group by user_id
order by play_pv_sum desc limit 100


Simple join query
select b.fo_level1,b.fo_level2,sum(a.play_pv) play_pv_sum From carbon_table a
left join dim_carbon_table b
on a.fo=b.fo and a.dt = b.dt where a.dt = '2017-01-01' group by b.fo_level1,b.fo_level2

At 2017-03-27 04:10:04, "a" <ww...@163.com> wrote:
>I downloaded the newest source code (master) and compiled it, generating the jar carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar.
>Then I used spark2.1 to test again. The error logs are as follows:
>
>
> Container log :
>17/03/27 02:27:21 ERROR newflow.DataLoadExecutor: Executor task launch worker-9 Data Loading failed for table carbon_table
>java.lang.NullPointerException
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>17/03/27 02:27:21 INFO rdd.NewDataFrameLoaderRDD: DataLoad failure
>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>17/03/27 02:27:21 ERROR rdd.NewDataFrameLoaderRDD: Executor task launch worker-9 
>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>17/03/27 02:27:21 ERROR executor.Executor: Exception in task 0.3 in stage 2.0 (TID 538)
>org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>
>
>
>Spark log:
>
>ERROR 27-03 02:27:21,407 - Task 0 in stage 2.0 failed 4 times; aborting job
>ERROR 27-03 02:27:21,419 - main load data frame failed
>org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 538, hd25): org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>
>
>Driver stacktrace:
>        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>        at scala.Option.foreach(Option.scala:236)
>        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
>        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
>        at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadDataFrame$1(CarbonDataRDDFactory.scala:665)
>        at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:794)
>        at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>        at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>        at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>        at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
>        at $line23.$read$$iwC$$iwC.<init>(<console>:46)
>        at $line23.$read$$iwC.<init>(<console>:48)
>        at $line23.$read.<init>(<console>:50)
>        at $line23.$read$.<init>(<console>:54)
>        at $line23.$read$.<clinit>(<console>)
>        at $line23.$eval$.<init>(<console>:7)
>        at $line23.$eval$.<clinit>(<console>)
>        at $line23.$eval.$print(<console>)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>        at org.apache.spark.repl.Main$.main(Main.scala:31)
>        at org.apache.spark.repl.Main.main(Main.scala)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>Caused by: org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>ERROR 27-03 02:27:21,422 - main 
>org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 538, hd25): org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>
>
>Driver stacktrace:
>        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>        at scala.Option.foreach(Option.scala:236)
>        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
>        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
>        at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadDataFrame$1(CarbonDataRDDFactory.scala:665)
>        at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:794)
>        at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>        at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>        at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>        at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
>        at $line23.$read$$iwC$$iwC.<init>(<console>:46)
>        at $line23.$read$$iwC.<init>(<console>:48)
>        at $line23.$read.<init>(<console>:50)
>        at $line23.$read$.<init>(<console>:54)
>        at $line23.$read$.<clinit>(<console>)
>        at $line23.$eval$.<init>(<console>:7)
>        at $line23.$eval$.<clinit>(<console>)
>        at $line23.$eval.$print(<console>)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>        at org.apache.spark.repl.Main$.main(Main.scala:31)
>        at org.apache.spark.repl.Main.main(Main.scala)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>Caused by: org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
>        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
>        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.NullPointerException
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
>        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
>        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
>        ... 10 more
>AUDIT 27-03 02:27:21,453 - [hd21][storm][Thread-1]Data load is failed for default.carbon_table
>ERROR 27-03 02:27:21,453 - main 
>java.lang.Exception: DataLoad failure: Data Loading failed for table carbon_table
>        at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:937)
>        at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>        at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>        at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>        at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>        at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
>        at $line23.$read$$iwC$$iwC.<init>(<console>:46)
>        at $line23.$read$$iwC.<init>(<console>:48)
>        at $line23.$read.<init>(<console>:50)
>        at $line23.$read$.<init>(<console>:54)
>        at $line23.$read$.<clinit>(<console>)
>        at $line23.$eval$.<init>(<console>:7)
>        at $line23.$eval$.<clinit>(<console>)
>        at $line23.$eval.$print(<console>)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>        at org.apache.spark.repl.Main$.main(Main.scala:31)
>        at org.apache.spark.repl.Main.main(Main.scala)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>AUDIT 27-03 02:27:21,454 - [hd21][storm][Thread-1]Dataload failure for default.carbon_table. Please check the logs
>java.lang.Exception: DataLoad failure: Data Loading failed for table carbon_table
>        at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:937)
>        at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
>        at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
>        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
>        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
>        at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
>        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
>        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
>        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
>        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
>        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
>        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
>        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
>        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
>        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
>        at $iwC$$iwC$$iwC.<init>(<console>:44)
>        at $iwC$$iwC.<init>(<console>:46)
>        at $iwC.<init>(<console>:48)
>        at <init>(<console>:50)
>        at .<init>(<console>:54)
>        at .<clinit>(<console>)
>        at .<init>(<console>:7)
>        at .<clinit>(<console>)
>        at $print(<console>)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>        at org.apache.spark.repl.Main$.main(Main.scala:31)
>        at org.apache.spark.repl.Main.main(Main.scala)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>At 2017-03-27 00:42:28, "a" <ww...@163.com> wrote:
>
> 
>
> Container log : error executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM。
> spark log: 17/03/26 23:40:30 ERROR YarnScheduler: Lost executor 2 on hd25: Container killed by YARN for exceeding memory limits. 49.0 GB of 49 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
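
A note on the hint in the log above: spark.yarn.executor.memoryOverhead cannot be changed from inside a running spark-shell; it has to be passed when the shell is launched. A minimal sketch (the 4096 MB value is only an assumption):

// Pass the overhead at launch time, e.g.
//   ./bin/spark-shell --conf spark.yarn.executor.memoryOverhead=4096 ...
// Inside the shell you can only verify what the running context actually received:
sc.getConf.getOption("spark.yarn.executor.memoryOverhead")   // None means the default is in effect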
>The test sql
>
>
>
>
>
>
>
>At 2017-03-26 23:34:36, "a" <ww...@163.com> wrote:
>>
>>
>>I have set the parameters as follow:
>>1、fs.hdfs.impl.disable.cache=true
>>2、dfs.socket.timeout=1800000  (Exception: Caused by: java.io.IOException: Filesystem closed)
>>3、dfs.datanode.socket.write.timeout=3600000
>>4、set carbondata property enable.unsafe.sort=true
>>5、remove BUCKETCOLUMNS property from the create table sql
>>6、set spark job parameter executor-memory=48G (from 20G to 48G)
>>
>>
>>But it  still failed, the error is "executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM。"
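
A minimal spark-shell sketch of how the settings listed above can be applied programmatically; the property names come from this thread, while the CarbonProperties import path and the idea of setting the HDFS values on sc.hadoopConfiguration (instead of hdfs-site.xml) are assumptions:

import org.apache.carbondata.core.util.CarbonProperties

// HDFS client settings (normally placed in hdfs-site.xml / core-site.xml)
sc.hadoopConfiguration.setBoolean("fs.hdfs.impl.disable.cache", true)
sc.hadoopConfiguration.set("dfs.socket.timeout", "1800000")
sc.hadoopConfiguration.set("dfs.datanode.socket.write.timeout", "3600000")

// CarbonData load setting (can also go into carbon.properties)
CarbonProperties.getInstance().addProperty("enable.unsafe.sort", "true")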
>>
>>
>>Then I tried to insert 40000 0000 records into the carbondata table, and it succeeded.
>>
>>
>>How can I insert 20 0000 0000 records into carbondata?
>>Should I set executor-memory big enough? Or should I generate the csv file from the hive table first, then load the csv file into the carbon table?
>>Can anybody give me some help?
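
One hedged sketch of the CSV route mentioned above, using CarbonData's LOAD DATA command; the HDFS path is a placeholder and the FILEHEADER list simply repeats the column order used in the insert statement:

// Export the hive partition to '|'-delimited files first, then load them directly.
// The path below is an example only.
cc.sql("LOAD DATA INPATH 'hdfs://xxxx/tmp/xxxx_table_csv' INTO TABLE xxxx_table " +
  "OPTIONS('DELIMITER'='|'," +
  "'FILEHEADER'='dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id,play_pv,spt_cnt,prg_spt_cnt')")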
>>
>>
>>Regards
>>fish
>>
>>
>>
>>
>>
>>
>>
>>At 2017-03-26 00:34:18, "a" <ww...@163.com> wrote:
>>>Thank you  Ravindra!
>>>Version:
>>>My carbondata version is 1.0,spark version is 1.6.3,hadoop version is 2.7.1,hive version is 1.1.0
>>>one of the containers log:
>>>17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
>>>17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called
>>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called
>>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory /data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-700345a84109
>>>17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl: pool-23-thread-2 
>>>java.io.IOException: Error reading file: hdfs://xxxx_table_tmp/dt=2017-01-01/pt=ios/000006_0
>>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1046)
>>>        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263)
>>>        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547)
>>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1234)
>>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1218)
>>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1150)
>>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1136)
>>>        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:249)
>>>        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
>>>        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>>>        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>        at org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(NewCarbonDataLoadRDD.scala:412)
>>>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.internalHasNext(InputProcessorStepImpl.java:163)
>>>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.getBatch(InputProcessorStepImpl.java:221)
>>>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:183)
>>>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:117)
>>>        at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:80)
>>>        at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:73)
>>>        at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:196)
>>>        at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:177)
>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>        at java.lang.Thread.run(Thread.java:745)
>>>Caused by: java.io.IOException: Filesystem closed
>>>        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>>>        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
>>>        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
>>>        at java.io.DataInputStream.readFully(DataInputStream.java:195)
>>>        at org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:112)
>>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
>>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805)
>>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776)
>>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
>>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
>>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042)
>>>        ... 26 more
>>>I will try to set enable.unsafe.sort=true and remove BUCKETCOLUMNS property ,and try again.
>>>
>>>
>>>At 2017-03-25 20:55:03, "Ravindra Pesala" <ra...@gmail.com> wrote:
>>>>Hi,
>>>>
>>>>Carbondata launches one job per node to sort the data at node level and
>>>>avoid shuffling. Internally it uses threads to load in parallel. Please
>>>>use carbon.number.of.cores.while.loading property in carbon.properties file
>>>>and set the number of cores it should use per machine while loading.
>>>>Carbondata sorts the data  at each node level to maintain the Btree for
>>>>each node per segment. It improves the query performance by filtering
>>>>faster if we have Btree at node level instead of each block level.
>>>>
>>>>1.Which version of Carbondata are you using?
>>>>2.There are memory issues in Carbondata-1.0 version and are fixed current
>>>>master.
>>>>3.And you can improve the performance by enabling enable.unsafe.sort=true in
>>>>carbon.properties file. But it is not supported if bucketing of columns are
>>>>enabled. We are planning to support unsafe sort load for bucketing also in
>>>>next version.
>>>>
>>>>Please send the executor log to know about the error you are facing.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>Regards,
>>>>Ravindra
>>>>
>>>>On 25 March 2017 at 16:18, wwyxg@163.com <ww...@163.com> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> *0、The failure*
>>>>> When i insert into carbon table,i encounter failure。The failure is  as
>>>>> follow:
>>>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most
>>>>> recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
>>>>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
>>>>> Reason: Slave lost+details
>>>>>
>>>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Slave lost
>>>>> Driver stacktrace:
>>>>>
>>>>> the stage:
>>>>>
>>>>> *Step:*
>>>>> *1、start spark-shell*
>>>>> ./bin/spark-shell \
>>>>> --master yarn-client \
>>>>> --num-executors 5 \  (I tried to set this parameter range from 10 to
>>>>> 20,but the second job has only 5 tasks)
>>>>> --executor-cores 5 \
>>>>> --executor-memory 20G \
>>>>> --driver-memory 8G \
>>>>> --queue root.default \
>>>>> --jars /xxx.jar
>>>>>
>>>>> //spark-default.conf spark.default.parallelism=320
>>>>>
>>>>> import org.apache.spark.sql.CarbonContext
>>>>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
>>>>>
>>>>> *2、create table*
>>>>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst
>>>>> String,plat String,sty String,is_pay String,is_vip String,is_mpack
>>>>> String,scene String,status String,nw String,isc String,area String,spttag
>>>>> String,province String,isp String,city String,tv String,hwm String,pip
>>>>> String,fo String,sh String,mid String,user_id String,play_pv Int,spt_cnt
>>>>> Int,prg_spt_cnt Int) row format delimited fields terminated by '|' STORED
>>>>> BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,
>>>>> mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,
>>>>> is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,
>>>>> province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,
>>>>> pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
>>>>>
>>>>> //notes,set "fo" column BUCKETCOLUMNS is to join another table
>>>>> //the column distinct values are as follows:
>>>>>
>>>>>
>>>>> *3、insert into table*(xxxx_table_tmp  is a hive extenal orc table,has 20
>>>>> 0000 0000 records)
>>>>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_
>>>>> vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,
>>>>> city,tv,hwm,pip,fo,sh,mid,user_id ,play_pv,spt_cnt,prg_spt_cnt from
>>>>> xxxx_table_tmp where dt='2017-01-01'")
>>>>>
>>>>> *4、spark split sql into two jobs,the first finished succeeded, but the
>>>>> second failed:*
>>>>>
>>>>>
>>>>> *5、The second job stage:*
>>>>>
>>>>>
>>>>>
>>>>> *Question:*
>>>>> 1、Why the second job has only five jobs,but the first job has 994 jobs ?(
>>>>> note:My hadoop cluster has 5 datanode)
>>>>>       I guess it caused the failure
>>>>> 2、In the sources,i find DataLoadPartitionCoalescer.class,is it means that
>>>>> "one datanode has only one partition ,and then the task is only one on the
>>>>> datanode"?
>>>>> 3、In the ExampleUtils class,"carbon.table.split.partition.enable" is set
>>>>> as follow,but i can not find "carbon.table.split.partition.enable" in
>>>>> other parts of the project。
>>>>>      I set "carbon.table.split.partition.enable" to true, but the second
>>>>> job has only five jobs.How to use this property?
>>>>>      ExampleUtils :
>>>>>     // whether use table split partition
>>>>>     // true -> use table split partition, support multiple partition
>>>>> loading
>>>>>     // false -> use node split partition, support data load by host
>>>>> partition
>>>>>     CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable",
>>>>> "false")
>>>>> 4、Insert into carbon table takes 3 hours ,but eventually failed 。How can
>>>>> i speed it.
>>>>> 5、in the spark-shell  ,I tried to set this parameter range from 10 to
>>>>> 20,but the second job has only 5 tasks
>>>>>      the other parameter executor-memory = 20G is enough?
>>>>>
>>>>> I need your help!Thank you very much!
>>>>>
>>>>> wwyxg@163.com
>>>>>
>>>>> ------------------------------
>>>>> wwyxg@163.com
>>>>>
>>>>
>>>>
>>>>
>>>>-- 
>>>>Thanks & Regards,
>>>>Ravi

Re:Re:Re:Re:Re: insert into carbon table failed

Posted by a <ww...@163.com>.
I downloaded the newest source code (master) and compiled it, generating the jar carbondata_2.11-1.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar.
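
To confirm that the shell really picks up the rebuilt SNAPSHOT jar (and not an older copy), a quick check can be run in the REPL; this is just a sketch using plain Java reflection, nothing CarbonData itself requires:

println(classOf[org.apache.spark.sql.CarbonContext].getProtectionDomain.getCodeSource.getLocation)  // should print the path of the shade jar passed via --jars
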
Then I tested again with Spark 2.1. The error logs are as follows:


Container log:
17/03/27 02:27:21 ERROR newflow.DataLoadExecutor: Executor task launch worker-9 Data Loading failed for table carbon_table
java.lang.NullPointerException
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
17/03/27 02:27:21 INFO rdd.NewDataFrameLoaderRDD: DataLoad failure
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
        ... 10 more
17/03/27 02:27:21 ERROR rdd.NewDataFrameLoaderRDD: Executor task launch worker-9 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
        ... 10 more
17/03/27 02:27:21 ERROR executor.Executor: Exception in task 0.3 in stage 2.0 (TID 538)
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
        ... 10 more



Spark log:

ERROR 27-03 02:27:21,407 - Task 0 in stage 2.0 failed 4 times; aborting job
ERROR 27-03 02:27:21,419 - main load data frame failed
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 538, hd25): org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
        ... 10 more


Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
        at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadDataFrame$1(CarbonDataRDDFactory.scala:665)
        at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:794)
        at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
        at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
        at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
        at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
        at $line23.$read$$iwC$$iwC.<init>(<console>:46)
        at $line23.$read$$iwC.<init>(<console>:48)
        at $line23.$read.<init>(<console>:50)
        at $line23.$read$.<init>(<console>:54)
        at $line23.$read$.<clinit>(<console>)
        at $line23.$eval$.<init>(<console>:7)
        at $line23.$eval$.<clinit>(<console>)
        at $line23.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
        ... 10 more
ERROR 27-03 02:27:21,422 - main 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 538, hd25): org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
        ... 10 more


Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
        at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadDataFrame$1(CarbonDataRDDFactory.scala:665)
        at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:794)
        at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
        at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
        at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
        at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
        at $line23.$read$$iwC$$iwC.<init>(<console>:46)
        at $line23.$read$$iwC.<init>(<console>:48)
        at $line23.$read.<init>(<console>:50)
        at $line23.$read$.<init>(<console>:54)
        at $line23.$read$.<clinit>(<console>)
        at $line23.$eval$.<init>(<console>:7)
        at $line23.$eval$.<clinit>(<console>)
        at $line23.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException: Data Loading failed for table carbon_table
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:54)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD$$anon$2.<init>(NewCarbonDataLoadRDD.scala:365)
        at org.apache.carbondata.spark.rdd.NewDataFrameLoaderRDD.compute(NewCarbonDataLoadRDD.scala:322)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.createConfiguration(DataLoadProcessBuilder.java:158)
        at org.apache.carbondata.processing.newflow.DataLoadProcessBuilder.build(DataLoadProcessBuilder.java:60)
        at org.apache.carbondata.processing.newflow.DataLoadExecutor.execute(DataLoadExecutor.java:43)
        ... 10 more
AUDIT 27-03 02:27:21,453 - [hd21][storm][Thread-1]Data load is failed for default.carbon_table
ERROR 27-03 02:27:21,453 - main 
java.lang.Exception: DataLoad failure: Data Loading failed for table carbon_table
        at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:937)
        at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
        at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
        at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
        at $line23.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
        at $line23.$read$$iwC$$iwC$$iwC.<init>(<console>:44)
        at $line23.$read$$iwC$$iwC.<init>(<console>:46)
        at $line23.$read$$iwC.<init>(<console>:48)
        at $line23.$read.<init>(<console>:50)
        at $line23.$read$.<init>(<console>:54)
        at $line23.$read$.<clinit>(<console>)
        at $line23.$eval$.<init>(<console>:7)
        at $line23.$eval$.<clinit>(<console>)
        at $line23.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
AUDIT 27-03 02:27:21,454 - [hd21][storm][Thread-1]Dataload failure for default.carbon_table. Please check the logs
java.lang.Exception: DataLoad failure: Data Loading failed for table carbon_table
        at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:937)
        at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:579)
        at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:297)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
        at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
        at org.apache.spark.sql.CarbonContext.sql(CarbonContext.scala:139)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
        at $iwC$$iwC$$iwC.<init>(<console>:44)
        at $iwC$$iwC.<init>(<console>:46)
        at $iwC.<init>(<console>:48)
        at <init>(<console>:50)
        at .<init>(<console>:54)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

At 2017-03-27 00:42:28, "a" <ww...@163.com> wrote:

 

 Container log : error executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM。
 spark log: 17/03/26 23:40:30 ERROR YarnScheduler: Lost executor 2 on hd25: Container killed by YARN for exceeding memory limits. 49.0 GB of 49 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
The test sql







At 2017-03-26 23:34:36, "a" <ww...@163.com> wrote:
>
>
>I have set the parameters as follow:
>1、fs.hdfs.impl.disable.cache=true
>2、dfs.socket.timeout=1800000  (Exception:aused by: java.io.IOException: Filesystem closed)
>3、dfs.datanode.socket.write.timeout=3600000
>4、set carbondata property enable.unsafe.sort=true
>5、remove BUCKETCOLUMNS property from the create table sql
>6、set spark job parameter executor-memory=48G (from 20G to 48G)
>
>
>But it  still failed, the error is "executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM。"
>
>
>Then i try to insert 40000 0000 records into carbondata table ,it works success.
>
>
>How can i insert 20 0000 0000 records into carbondata?
>Should me set  executor-memory big enough? Or Should me generate the csv file from the hive table first ,then load the csv file into carbon table?
>Any body give me same help?
>
>
>Regards
>fish
>
>
>
>
>
>
>
>At 2017-03-26 00:34:18, "a" <ww...@163.com> wrote:
>>Thank you  Ravindra!
>>Version:
>>My carbondata version is 1.0,spark version is 1.6.3,hadoop version is 2.7.1,hive version is 1.1.0
>>one of the containers log:
>>17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
>>17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called
>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called
>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory /data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-700345a84109
>>17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl: pool-23-thread-2 
>>java.io.IOException: Error reading file: hdfs://xxxx_table_tmp/dt=2017-01-01/pt=ios/000006_0
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1046)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1234)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1218)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1150)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1136)
>>        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:249)
>>        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
>>        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>>        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>        at org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(NewCarbonDataLoadRDD.scala:412)
>>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.internalHasNext(InputProcessorStepImpl.java:163)
>>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.getBatch(InputProcessorStepImpl.java:221)
>>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:183)
>>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:117)
>>        at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:80)
>>        at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:73)
>>        at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:196)
>>        at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:177)
>>        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:745)
>>Caused by: java.io.IOException: Filesystem closed
>>        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>>        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
>>        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
>>        at java.io.DataInputStream.readFully(DataInputStream.java:195)
>>        at org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:112)
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805)
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776)
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042)
>>        ... 26 more
>>I will try to set enable.unsafe.sort=true and remove BUCKETCOLUMNS property ,and try again.
>>
>>
>>At 2017-03-25 20:55:03, "Ravindra Pesala" <ra...@gmail.com> wrote:
>>>Hi,
>>>
>>>Carbodata launches one job per each node to sort the data at node level and
>>>avoid shuffling. Internally it uses threads to use parallel load. Please
>>>use carbon.number.of.cores.while.loading property in carbon.properties file
>>>and set the number of cores it should use per machine while loading.
>>>Carbondata sorts the data  at each node level to maintain the Btree for
>>>each node per segment. It improves the query performance by filtering
>>>faster if we have Btree at node level instead of each block level.
>>>
>>>1.Which version of Carbondata are you using?
>>>2.There are memory issues in Carbondata-1.0 version and are fixed current
>>>master.
>>>3.And you can improve the performance by enabling enable.unsafe.sort=true in
>>>carbon.properties file. But it is not supported if bucketing of columns are
>>>enabled. We are planning to support unsafe sort load for bucketing also in
>>>next version.
>>>
>>>Please send the executor log to know about the error you are facing.
>>>
>>>
>>>
>>>
>>>
>>>
>>>Regards,
>>>Ravindra
>>>
>>>On 25 March 2017 at 16:18, wwyxg@163.com <ww...@163.com> wrote:
>>>
>>>> Hello!
>>>>
>>>> *0、The failure*
>>>> When i insert into carbon table,i encounter failure。The failure is  as
>>>> follow:
>>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most
>>>> recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
>>>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
>>>> Reason: Slave lost+details
>>>>
>>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Slave lost
>>>> Driver stacktrace:
>>>>
>>>> the stage:
>>>>
>>>> *Step:*
>>>> *1、start spark-shell*
>>>> ./bin/spark-shell \
>>>> --master yarn-client \
>>>> --num-executors 5 \  (I tried to set this parameter range from 10 to
>>>> 20,but the second job has only 5 tasks)
>>>> --executor-cores 5 \
>>>> --executor-memory 20G \
>>>> --driver-memory 8G \
>>>> --queue root.default \
>>>> --jars /xxx.jar
>>>>
>>>> //spark-default.conf spark.default.parallelism=320
>>>>
>>>> import org.apache.spark.sql.CarbonContext
>>>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
>>>>
>>>> *2、create table*
>>>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst
>>>> String,plat String,sty String,is_pay String,is_vip String,is_mpack
>>>> String,scene String,status String,nw String,isc String,area String,spttag
>>>> String,province String,isp String,city String,tv String,hwm String,pip
>>>> String,fo String,sh String,mid String,user_id String,play_pv Int,spt_cnt
>>>> Int,prg_spt_cnt Int) row format delimited fields terminated by '|' STORED
>>>> BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,
>>>> mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,
>>>> is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,
>>>> province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,
>>>> pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
>>>>
>>>> //notes,set "fo" column BUCKETCOLUMNS is to join another table
>>>> //the column distinct values are as follows:
>>>>
>>>>
>>>> *3、insert into table*(xxxx_table_tmp  is a hive extenal orc table,has 20
>>>> 0000 0000 records)
>>>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_
>>>> vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,
>>>> city,tv,hwm,pip,fo,sh,mid,user_id ,play_pv,spt_cnt,prg_spt_cnt from
>>>> xxxx_table_tmp where dt='2017-01-01'")
>>>>
>>>> *4、spark split sql into two jobs,the first finished succeeded, but the
>>>> second failed:*
>>>>
>>>>
>>>> *5、The second job stage:*
>>>>
>>>>
>>>>
>>>> *Question:*
>>>> 1、Why the second job has only five jobs,but the first job has 994 jobs ?(
>>>> note:My hadoop cluster has 5 datanode)
>>>>       I guess it caused the failure
>>>> 2、In the sources,i find DataLoadPartitionCoalescer.class,is it means that
>>>> "one datanode has only one partition ,and then the task is only one on the
>>>> datanode"?
>>>> 3、In the ExampleUtils class,"carbon.table.split.partition.enable" is set
>>>> as follow,but i can not find "carbon.table.split.partition.enable" in
>>>> other parts of the project。
>>>>      I set "carbon.table.split.partition.enable" to true, but the second
>>>> job has only five jobs.How to use this property?
>>>>      ExampleUtils :
>>>>     // whether use table split partition
>>>>     // true -> use table split partition, support multiple partition
>>>> loading
>>>>     // false -> use node split partition, support data load by host
>>>> partition
>>>>     CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable",
>>>> "false")
>>>> 4、Insert into carbon table takes 3 hours ,but eventually failed 。How can
>>>> i speed it.
>>>> 5、in the spark-shell  ,I tried to set this parameter range from 10 to
>>>> 20,but the second job has only 5 tasks
>>>>      the other parameter executor-memory = 20G is enough?
>>>>
>>>> I need your help!Thank you very much!
>>>>
>>>> wwyxg@163.com
>>>>
>>>> ------------------------------
>>>> wwyxg@163.com
>>>>
>>>
>>>
>>>
>>>-- 
>>>Thanks & Regards,
>>>Ravi

Re: Re:Re:Re: insert into carbon table failed

Posted by Ravindra Pesala <ra...@gmail.com>.
Hi,

Please try to run on the master branch. As I mentioned earlier, there are a
few memory issues in the 1.0 release. We have already initiated the new 1.1.0
release, so it is better to try the latest code.

Also, please make sure that the property
enable.unsafe.sort=true is available on all nodes. That means carbon.properties
should be updated on every node.
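
For example, a minimal way to set it for the driver session follows the same
pattern as ExampleUtils (just a sketch; I am assuming the usual
org.apache.carbondata.core.util package for CarbonProperties):

import org.apache.carbondata.core.util.CarbonProperties

// driver-side override of the unsafe sort switch
CarbonProperties.getInstance().addProperty("enable.unsafe.sort", "true")

// executors read their own copy, so carbon.properties on every node
// should also contain the line: enable.unsafe.sort=true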

Regards,
Ravindra.

On Sun, Mar 26, 2017, 22:27 a <ww...@163.com> wrote:

>
>  Container log : error executor.CoarseGrainedExecutorBackend: RECEIVED
> SIGNAL 15: SIGTERM。
>  spark log: 17/03/26 23:40:30 ERROR YarnScheduler: Lost executor 2 on
> hd25: Container killed by YARN for exceeding memory limits. 49.0 GB of 49
> GB physical memory used. Consider boosting
> spark.yarn.executor.memoryOverhead.
> The test sql
>
>
>
>
>
> At 2017-03-26 23:34:36, "a" <ww...@163.com> wrote:
> >
> >
> >I have set the parameters as follow:
> >1、fs.hdfs.impl.disable.cache=true
> >2、dfs.socket.timeout=1800000  (Exception:aused by: java.io.IOException: Filesystem closed)
> >3、dfs.datanode.socket.write.timeout=3600000
> >4、set carbondata property enable.unsafe.sort=true
> >5、remove BUCKETCOLUMNS property from the create table sql
> >6、set spark job parameter executor-memory=48G (from 20G to 48G)
> >
> >
> >But it  still failed, the error is "executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM。"
> >
> >
> >Then i try to insert 40000 0000 records into carbondata table ,it works success.
> >
> >
> >How can i insert 20 0000 0000 records into carbondata?
> >Should me set  executor-memory big enough? Or Should me generate the csv file from the hive table first ,then load the csv file into carbon table?
> >Any body give me same help?
> >
> >
> >Regards
> >fish
> >
> >
> >
> >
> >
> >
> >
> >At 2017-03-26 00:34:18, "a" <ww...@163.com> wrote:
> >>Thank you  Ravindra!
> >>Version:
> >>My carbondata version is 1.0,spark version is 1.6.3,hadoop version is 2.7.1,hive version is 1.1.0
> >>one of the containers log:
> >>17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
> >>17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called
> >>17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called
> >>17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory /data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-700345a84109
> >>17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl: pool-23-thread-2
> >>java.io.IOException: Error reading file: hdfs://xxxx_table_tmp/dt=2017-01-01/pt=ios/000006_0
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1046)
> >>        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263)
> >>        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547)
> >>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1234)
> >>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1218)
> >>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1150)
> >>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1136)
> >>        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:249)
> >>        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
> >>        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> >>        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> >>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> >>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> >>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> >>        at org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(NewCarbonDataLoadRDD.scala:412)
> >>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.internalHasNext(InputProcessorStepImpl.java:163)
> >>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.getBatch(InputProcessorStepImpl.java:221)
> >>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:183)
> >>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:117)
> >>        at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:80)
> >>        at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:73)
> >>        at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:196)
> >>        at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:177)
> >>        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>        at java.lang.Thread.run(Thread.java:745)
> >>Caused by: java.io.IOException: Filesystem closed
> >>        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
> >>        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
> >>        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
> >>        at java.io.DataInputStream.readFully(DataInputStream.java:195)
> >>        at org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:112)
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805)
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776)
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
> >>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042)
> >>        ... 26 more
> >>I will try to set enable.unsafe.sort=true and remove BUCKETCOLUMNS property ,and try again.
> >>
> >>
> >>At 2017-03-25 20:55:03, "Ravindra Pesala" <ra...@gmail.com> wrote:
> >>>Hi,
> >>>
> >>>Carbodata launches one job per each node to sort the data at node level and
> >>>avoid shuffling. Internally it uses threads to use parallel load. Please
> >>>use carbon.number.of.cores.while.loading property in carbon.properties file
> >>>and set the number of cores it should use per machine while loading.
> >>>Carbondata sorts the data  at each node level to maintain the Btree for
> >>>each node per segment. It improves the query performance by filtering
> >>>faster if we have Btree at node level instead of each block level.
> >>>
> >>>1.Which version of Carbondata are you using?
> >>>2.There are memory issues in Carbondata-1.0 version and are fixed current
> >>>master.
> >>>3.And you can improve the performance by enabling enable.unsafe.sort=true in
> >>>carbon.properties file. But it is not supported if bucketing of columns are
> >>>enabled. We are planning to support unsafe sort load for bucketing also in
> >>>next version.
> >>>
> >>>Please send the executor log to know about the error you are facing.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>Regards,
> >>>Ravindra
> >>>
> >>>On 25 March 2017 at 16:18, wwyxg@163.com <ww...@163.com> wrote:
> >>>
> >>>> Hello!
> >>>>
> >>>> *0、The failure*
> >>>> When i insert into carbon table,i encounter failure。The failure is  as
> >>>> follow:
> >>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most
> >>>> recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
> >>>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
> >>>> Reason: Slave lost+details
> >>>>
> >>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Slave lost
> >>>> Driver stacktrace:
> >>>>
> >>>> the stage:
> >>>>
> >>>> *Step:*
> >>>> *1、start spark-shell*
> >>>> ./bin/spark-shell \
> >>>> --master yarn-client \
> >>>> --num-executors 5 \  (I tried to set this parameter range from 10 to
> >>>> 20,but the second job has only 5 tasks)
> >>>> --executor-cores 5 \
> >>>> --executor-memory 20G \
> >>>> --driver-memory 8G \
> >>>> --queue root.default \
> >>>> --jars /xxx.jar
> >>>>
> >>>> //spark-default.conf spark.default.parallelism=320
> >>>>
> >>>> import org.apache.spark.sql.CarbonContext
> >>>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
> >>>>
> >>>> *2、create table*
> >>>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst
> >>>> String,plat String,sty String,is_pay String,is_vip String,is_mpack
> >>>> String,scene String,status String,nw String,isc String,area String,spttag
> >>>> String,province String,isp String,city String,tv String,hwm String,pip
> >>>> String,fo String,sh String,mid String,user_id String,play_pv Int,spt_cnt
> >>>> Int,prg_spt_cnt Int) row format delimited fields terminated by '|' STORED
> >>>> BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,
> >>>> mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,
> >>>> is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,
> >>>> province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,
> >>>> pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
> >>>>
> >>>> //notes,set "fo" column BUCKETCOLUMNS is to join another table
> >>>> //the column distinct values are as follows:
> >>>>
> >>>>
> >>>> *3、insert into table*(xxxx_table_tmp  is a hive extenal orc table,has 20
> >>>> 0000 0000 records)
> >>>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_
> >>>> vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,
> >>>> city,tv,hwm,pip,fo,sh,mid,user_id ,play_pv,spt_cnt,prg_spt_cnt from
> >>>> xxxx_table_tmp where dt='2017-01-01'")
> >>>>
> >>>> *4、spark split sql into two jobs,the first finished succeeded, but the
> >>>> second failed:*
> >>>>
> >>>>
> >>>> *5、The second job stage:*
> >>>>
> >>>>
> >>>>
> >>>> *Question:*
> >>>> 1、Why the second job has only five jobs,but the first job has 994 jobs ?(
> >>>> note:My hadoop cluster has 5 datanode)
> >>>>       I guess it caused the failure
> >>>> 2、In the sources,i find DataLoadPartitionCoalescer.class,is it means that
> >>>> "one datanode has only one partition ,and then the task is only one on the
> >>>> datanode"?
> >>>> 3、In the ExampleUtils class,"carbon.table.split.partition.enable" is set
> >>>> as follow,but i can not find "carbon.table.split.partition.enable" in
> >>>> other parts of the project。
> >>>>      I set "carbon.table.split.partition.enable" to true, but the second
> >>>> job has only five jobs.How to use this property?
> >>>>      ExampleUtils :
> >>>>     // whether use table split partition
> >>>>     // true -> use table split partition, support multiple partition
> >>>> loading
> >>>>     // false -> use node split partition, support data load by host
> >>>> partition
> >>>>     CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable",
> >>>> "false")
> >>>> 4、Insert into carbon table takes 3 hours ,but eventually failed 。How can
> >>>> i speed it.
> >>>> 5、in the spark-shell  ,I tried to set this parameter range from 10 to
> >>>> 20,but the second job has only 5 tasks
> >>>>      the other parameter executor-memory = 20G is enough?
> >>>>
> >>>> I need your help!Thank you very much!
> >>>>
> >>>> wwyxg@163.com
> >>>>
> >>>> ------------------------------
> >>>> wwyxg@163.com
> >>>>
> >>>
> >>>
> >>>
> >>>--
> >>>Thanks & Regards,
> >>>Ravi
>
>

Re:Re:Re:Re: insert into carbon table failed

Posted by a <ww...@163.com>.
 

Container log: ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM.
Spark log: 17/03/26 23:40:30 ERROR YarnScheduler: Lost executor 2 on hd25: Container killed by YARN for exceeding memory limits. 49.0 GB of 49 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
The test SQL:
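
Since YARN reports the container hitting its 49 GB physical limit, one
option (just a sketch; the right numbers depend on the NodeManager memory
configured on these hosts) is to reserve explicit off-heap headroom instead
of giving the whole budget to the executor heap, e.g.:

./bin/spark-shell \
--master yarn-client \
--num-executors 5 \
--executor-cores 5 \
--executor-memory 40G \
--driver-memory 8G \
--conf spark.yarn.executor.memoryOverhead=8192 \
--jars /xxx.jar

The YARN container size is roughly executor-memory plus memoryOverhead, so
40G + 8G stays inside the ~49 GB limit the log shows.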







At 2017-03-26 23:34:36, "a" <ww...@163.com> wrote:
>
>
>I have set the parameters as follow:
>1、fs.hdfs.impl.disable.cache=true
>2、dfs.socket.timeout=1800000  (Exception:aused by: java.io.IOException: Filesystem closed)
>3、dfs.datanode.socket.write.timeout=3600000
>4、set carbondata property enable.unsafe.sort=true
>5、remove BUCKETCOLUMNS property from the create table sql
>6、set spark job parameter executor-memory=48G (from 20G to 48G)
>
>
>But it  still failed, the error is "executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM。"
>
>
>Then i try to insert 40000 0000 records into carbondata table ,it works success.
>
>
>How can i insert 20 0000 0000 records into carbondata?
>Should me set  executor-memory big enough? Or Should me generate the csv file from the hive table first ,then load the csv file into carbon table?
>Any body give me same help?
>
>
>Regards
>fish
>
>
>
>
>
>
>
>At 2017-03-26 00:34:18, "a" <ww...@163.com> wrote:
>>Thank you  Ravindra!
>>Version:
>>My carbondata version is 1.0,spark version is 1.6.3,hadoop version is 2.7.1,hive version is 1.1.0
>>one of the containers log:
>>17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
>>17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called
>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called
>>17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory /data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-700345a84109
>>17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl: pool-23-thread-2 
>>java.io.IOException: Error reading file: hdfs://xxxx_table_tmp/dt=2017-01-01/pt=ios/000006_0
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1046)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1234)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1218)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1150)
>>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1136)
>>        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:249)
>>        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
>>        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>>        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>        at org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(NewCarbonDataLoadRDD.scala:412)
>>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.internalHasNext(InputProcessorStepImpl.java:163)
>>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.getBatch(InputProcessorStepImpl.java:221)
>>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:183)
>>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:117)
>>        at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:80)
>>        at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:73)
>>        at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:196)
>>        at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:177)
>>        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:745)
>>Caused by: java.io.IOException: Filesystem closed
>>        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>>        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
>>        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
>>        at java.io.DataInputStream.readFully(DataInputStream.java:195)
>>        at org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:112)
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805)
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776)
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
>>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042)
>>        ... 26 more
>>I will try to set enable.unsafe.sort=true and remove BUCKETCOLUMNS property ,and try again.
>>
>>
>>At 2017-03-25 20:55:03, "Ravindra Pesala" <ra...@gmail.com> wrote:
>>>Hi,
>>>
>>>Carbodata launches one job per each node to sort the data at node level and
>>>avoid shuffling. Internally it uses threads to use parallel load. Please
>>>use carbon.number.of.cores.while.loading property in carbon.properties file
>>>and set the number of cores it should use per machine while loading.
>>>Carbondata sorts the data  at each node level to maintain the Btree for
>>>each node per segment. It improves the query performance by filtering
>>>faster if we have Btree at node level instead of each block level.
>>>
>>>1.Which version of Carbondata are you using?
>>>2.There are memory issues in Carbondata-1.0 version and are fixed current
>>>master.
>>>3.And you can improve the performance by enabling enable.unsafe.sort=true in
>>>carbon.properties file. But it is not supported if bucketing of columns are
>>>enabled. We are planning to support unsafe sort load for bucketing also in
>>>next version.
>>>
>>>Please send the executor log to know about the error you are facing.
>>>
>>>
>>>
>>>
>>>
>>>
>>>Regards,
>>>Ravindra
>>>
>>>On 25 March 2017 at 16:18, wwyxg@163.com <ww...@163.com> wrote:
>>>
>>>> Hello!
>>>>
>>>> *0、The failure*
>>>> When i insert into carbon table,i encounter failure。The failure is  as
>>>> follow:
>>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most
>>>> recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
>>>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
>>>> Reason: Slave lost+details
>>>>
>>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Slave lost
>>>> Driver stacktrace:
>>>>
>>>> the stage:
>>>>
>>>> *Step:*
>>>> *1、start spark-shell*
>>>> ./bin/spark-shell \
>>>> --master yarn-client \
>>>> --num-executors 5 \  (I tried to set this parameter range from 10 to
>>>> 20,but the second job has only 5 tasks)
>>>> --executor-cores 5 \
>>>> --executor-memory 20G \
>>>> --driver-memory 8G \
>>>> --queue root.default \
>>>> --jars /xxx.jar
>>>>
>>>> //spark-default.conf spark.default.parallelism=320
>>>>
>>>> import org.apache.spark.sql.CarbonContext
>>>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
>>>>
>>>> *2、create table*
>>>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst
>>>> String,plat String,sty String,is_pay String,is_vip String,is_mpack
>>>> String,scene String,status String,nw String,isc String,area String,spttag
>>>> String,province String,isp String,city String,tv String,hwm String,pip
>>>> String,fo String,sh String,mid String,user_id String,play_pv Int,spt_cnt
>>>> Int,prg_spt_cnt Int) row format delimited fields terminated by '|' STORED
>>>> BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,
>>>> mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,
>>>> is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,
>>>> province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,
>>>> pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
>>>>
>>>> //notes,set "fo" column BUCKETCOLUMNS is to join another table
>>>> //the column distinct values are as follows:
>>>>
>>>>
>>>> *3、insert into table*(xxxx_table_tmp  is a hive extenal orc table,has 20
>>>> 0000 0000 records)
>>>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_
>>>> vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,
>>>> city,tv,hwm,pip,fo,sh,mid,user_id ,play_pv,spt_cnt,prg_spt_cnt from
>>>> xxxx_table_tmp where dt='2017-01-01'")
>>>>
>>>> *4、spark split sql into two jobs,the first finished succeeded, but the
>>>> second failed:*
>>>>
>>>>
>>>> *5、The second job stage:*
>>>>
>>>>
>>>>
>>>> *Question:*
>>>> 1、Why the second job has only five jobs,but the first job has 994 jobs ?(
>>>> note:My hadoop cluster has 5 datanode)
>>>>       I guess it caused the failure
>>>> 2、In the sources,i find DataLoadPartitionCoalescer.class,is it means that
>>>> "one datanode has only one partition ,and then the task is only one on the
>>>> datanode"?
>>>> 3、In the ExampleUtils class,"carbon.table.split.partition.enable" is set
>>>> as follow,but i can not find "carbon.table.split.partition.enable" in
>>>> other parts of the project。
>>>>      I set "carbon.table.split.partition.enable" to true, but the second
>>>> job has only five jobs.How to use this property?
>>>>      ExampleUtils :
>>>>     // whether use table split partition
>>>>     // true -> use table split partition, support multiple partition
>>>> loading
>>>>     // false -> use node split partition, support data load by host
>>>> partition
>>>>     CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable",
>>>> "false")
>>>> 4、Insert into carbon table takes 3 hours ,but eventually failed 。How can
>>>> i speed it.
>>>> 5、in the spark-shell  ,I tried to set this parameter range from 10 to
>>>> 20,but the second job has only 5 tasks
>>>>      the other parameter executor-memory = 20G is enough?
>>>>
>>>> I need your help!Thank you very much!
>>>>
>>>> wwyxg@163.com
>>>>
>>>> ------------------------------
>>>> wwyxg@163.com
>>>>
>>>
>>>
>>>
>>>-- 
>>>Thanks & Regards,
>>>Ravi

Re:Re:Re: insert into carbon table failed

Posted by a <ww...@163.com>.

I have set the parameters as follows (one way to apply the HDFS settings is sketched after this list):
1、fs.hdfs.impl.disable.cache=true
2、dfs.socket.timeout=1800000  (to address: Caused by: java.io.IOException: Filesystem closed)
3、dfs.datanode.socket.write.timeout=3600000
4、set the carbondata property enable.unsafe.sort=true
5、remove the BUCKETCOLUMNS property from the create table sql
6、set the spark job parameter executor-memory=48G (from 20G to 48G)
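
For reference, one way such settings can be passed (an assumption about how
they might be applied, not something confirmed in this thread) is through
Spark's spark.hadoop.* prefix, with enable.unsafe.sort going into
carbon.properties instead:

./bin/spark-shell \
--master yarn-client \
--num-executors 5 \
--executor-cores 5 \
--executor-memory 48G \
--conf spark.hadoop.fs.hdfs.impl.disable.cache=true \
--conf spark.hadoop.dfs.socket.timeout=1800000 \
--conf spark.hadoop.dfs.datanode.socket.write.timeout=3600000 \
--jars /xxx.jar

Alternatively, the dfs.* values can be set in hdfs-site.xml on the nodes.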


But it still failed; the error is "executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM".


Then I tried to insert 40000 0000 (400 million) records into the carbondata table, and it succeeded.


How can I insert 20 0000 0000 (2 billion) records into carbondata?
Should I just set executor-memory big enough? Or should I generate a CSV file from the hive table first, and then load the CSV file into the carbon table?
Can anybody give me some help?
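
If the CSV route turns out to be necessary, a rough sketch of it (the output
path is a placeholder, and it assumes the spark-csv package is available on
Spark 1.6) would be to dump the hive partition to '|' delimited files and
then use CarbonData's LOAD DATA command:

    // export the hive partition to delimited files (hypothetical HDFS path)
    cc.sql("select dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id,play_pv,spt_cnt,prg_spt_cnt from xxxx_table_tmp where dt='2017-01-01'")
      .write.format("com.databricks.spark.csv")
      .option("delimiter", "|")
      .save("hdfs://xxxx/tmp/xxxx_table_csv")

    // then load the files into the carbon table in the same column order
    cc.sql("LOAD DATA INPATH 'hdfs://xxxx/tmp/xxxx_table_csv' INTO TABLE xxxx_table OPTIONS('DELIMITER'='|','FILEHEADER'='dt,pt,lst,plat,sty,is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,city,tv,hwm,pip,fo,sh,mid,user_id,play_pv,spt_cnt,prg_spt_cnt')")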


Regards
fish







At 2017-03-26 00:34:18, "a" <ww...@163.com> wrote:
>Thank you  Ravindra!
>Version:
>My carbondata version is 1.0,spark version is 1.6.3,hadoop version is 2.7.1,hive version is 1.1.0
>one of the containers log:
>17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
>17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called
>17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called
>17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory /data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-700345a84109
>17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl: pool-23-thread-2 
>java.io.IOException: Error reading file: hdfs://xxxx_table_tmp/dt=2017-01-01/pt=ios/000006_0
>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1046)
>        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263)
>        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547)
>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1234)
>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1218)
>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1150)
>        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1136)
>        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:249)
>        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
>        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>        at org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(NewCarbonDataLoadRDD.scala:412)
>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.internalHasNext(InputProcessorStepImpl.java:163)
>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.getBatch(InputProcessorStepImpl.java:221)
>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:183)
>        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:117)
>        at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:80)
>        at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:73)
>        at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:196)
>        at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:177)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>Caused by: java.io.IOException: Filesystem closed
>        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
>        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
>        at java.io.DataInputStream.readFully(DataInputStream.java:195)
>        at org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:112)
>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805)
>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776)
>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
>        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042)
>        ... 26 more
>I will try to set enable.unsafe.sort=true and remove BUCKETCOLUMNS property ,and try again.
>
>
>At 2017-03-25 20:55:03, "Ravindra Pesala" <ra...@gmail.com> wrote:
>>Hi,
>>
>>Carbodata launches one job per each node to sort the data at node level and
>>avoid shuffling. Internally it uses threads to use parallel load. Please
>>use carbon.number.of.cores.while.loading property in carbon.properties file
>>and set the number of cores it should use per machine while loading.
>>Carbondata sorts the data  at each node level to maintain the Btree for
>>each node per segment. It improves the query performance by filtering
>>faster if we have Btree at node level instead of each block level.
>>
>>1.Which version of Carbondata are you using?
>>2.There are memory issues in Carbondata-1.0 version and are fixed current
>>master.
>>3.And you can improve the performance by enabling enable.unsafe.sort=true in
>>carbon.properties file. But it is not supported if bucketing of columns are
>>enabled. We are planning to support unsafe sort load for bucketing also in
>>next version.
>>
>>Please send the executor log to know about the error you are facing.
>>
>>
>>
>>
>>
>>
>>Regards,
>>Ravindra
>>
>>On 25 March 2017 at 16:18, wwyxg@163.com <ww...@163.com> wrote:
>>
>>> Hello!
>>>
>>> *0、The failure*
>>> When i insert into carbon table,i encounter failure。The failure is  as
>>> follow:
>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most
>>> recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
>>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
>>> Reason: Slave lost+details
>>>
>>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Slave lost
>>> Driver stacktrace:
>>>
>>> the stage:
>>>
>>> *Step:*
>>> *1、start spark-shell*
>>> ./bin/spark-shell \
>>> --master yarn-client \
>>> --num-executors 5 \  (I tried to set this parameter range from 10 to
>>> 20,but the second job has only 5 tasks)
>>> --executor-cores 5 \
>>> --executor-memory 20G \
>>> --driver-memory 8G \
>>> --queue root.default \
>>> --jars /xxx.jar
>>>
>>> //spark-default.conf spark.default.parallelism=320
>>>
>>> import org.apache.spark.sql.CarbonContext
>>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
>>>
>>> *2、create table*
>>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst
>>> String,plat String,sty String,is_pay String,is_vip String,is_mpack
>>> String,scene String,status String,nw String,isc String,area String,spttag
>>> String,province String,isp String,city String,tv String,hwm String,pip
>>> String,fo String,sh String,mid String,user_id String,play_pv Int,spt_cnt
>>> Int,prg_spt_cnt Int) row format delimited fields terminated by '|' STORED
>>> BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,
>>> mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,
>>> is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,
>>> province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,
>>> pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
>>>
>>> //notes,set "fo" column BUCKETCOLUMNS is to join another table
>>> //the column distinct values are as follows:
>>>
>>>
>>> *3、insert into table*(xxxx_table_tmp  is a hive extenal orc table,has 20
>>> 0000 0000 records)
>>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_
>>> vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,
>>> city,tv,hwm,pip,fo,sh,mid,user_id ,play_pv,spt_cnt,prg_spt_cnt from
>>> xxxx_table_tmp where dt='2017-01-01'")
>>>
>>> *4、spark split sql into two jobs,the first finished succeeded, but the
>>> second failed:*
>>>
>>>
>>> *5、The second job stage:*
>>>
>>>
>>>
>>> *Question:*
>>> 1、Why the second job has only five jobs,but the first job has 994 jobs ?(
>>> note:My hadoop cluster has 5 datanode)
>>>       I guess it caused the failure
>>> 2、In the sources,i find DataLoadPartitionCoalescer.class,is it means that
>>> "one datanode has only one partition ,and then the task is only one on the
>>> datanode"?
>>> 3、In the ExampleUtils class,"carbon.table.split.partition.enable" is set
>>> as follow,but i can not find "carbon.table.split.partition.enable" in
>>> other parts of the project。
>>>      I set "carbon.table.split.partition.enable" to true, but the second
>>> job has only five jobs.How to use this property?
>>>      ExampleUtils :
>>>     // whether use table split partition
>>>     // true -> use table split partition, support multiple partition
>>> loading
>>>     // false -> use node split partition, support data load by host
>>> partition
>>>     CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable",
>>> "false")
>>> 4、Insert into carbon table takes 3 hours ,but eventually failed 。How can
>>> i speed it.
>>> 5、in the spark-shell  ,I tried to set this parameter range from 10 to
>>> 20,but the second job has only 5 tasks
>>>      the other parameter executor-memory = 20G is enough?
>>>
>>> I need your help!Thank you very much!
>>>
>>> wwyxg@163.com
>>>
>>> ------------------------------
>>> wwyxg@163.com
>>>
>>
>>
>>
>>-- 
>>Thanks & Regards,
>>Ravi

Re:Re: insert into carbon table failed

Posted by a <ww...@163.com>.
Thank you, Ravindra!
Versions:
My carbondata version is 1.0, spark version is 1.6.3, hadoop version is 2.7.1, and hive version is 1.1.0.
One of the container logs:
17/03/25 22:07:09 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
17/03/25 22:07:09 INFO storage.DiskBlockManager: Shutdown hook called
17/03/25 22:07:09 INFO util.ShutdownHookManager: Shutdown hook called
17/03/25 22:07:09 INFO util.ShutdownHookManager: Deleting directory /data1/hadoop/hd_space/tmp/nm-local-dir/usercache/storm/appcache/application_1490340325187_0042/spark-84b305f9-af7b-4f58-a809-700345a84109
17/03/25 22:07:10 ERROR impl.ParallelReadMergeSorterImpl: pool-23-thread-2 
java.io.IOException: Error reading file: hdfs://xxxx_table_tmp/dt=2017-01-01/pt=ios/000006_0
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1046)
        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.next(OrcRawRecordMerger.java:263)
        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.next(OrcRawRecordMerger.java:547)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1234)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$1.next(OrcInputFormat.java:1218)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1150)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$NullKeyRecordReader.next(OrcInputFormat.java:1136)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:249)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.apache.carbondata.spark.rdd.NewRddIterator.hasNext(NewCarbonDataLoadRDD.scala:412)
        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.internalHasNext(InputProcessorStepImpl.java:163)
        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.getBatch(InputProcessorStepImpl.java:221)
        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:183)
        at org.apache.carbondata.processing.newflow.steps.InputProcessorStepImpl$InputProcessorIterator.next(InputProcessorStepImpl.java:117)
        at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:80)
        at org.apache.carbondata.processing.newflow.steps.DataConverterProcessorStepImpl$1.next(DataConverterProcessorStepImpl.java:73)
        at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:196)
        at org.apache.carbondata.processing.newflow.sort.impl.ParallelReadMergeSorterImpl$SortIteratorThread.call(ParallelReadMergeSorterImpl.java:177)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
        at java.io.DataInputStream.readFully(DataInputStream.java:195)
        at org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:112)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1042)
        ... 26 more
I will try to set enable.unsafe.sort=true, remove the BUCKETCOLUMNS property, and try again.


At 2017-03-25 20:55:03, "Ravindra Pesala" <ra...@gmail.com> wrote:
>Hi,
>
>Carbodata launches one job per each node to sort the data at node level and
>avoid shuffling. Internally it uses threads to use parallel load. Please
>use carbon.number.of.cores.while.loading property in carbon.properties file
>and set the number of cores it should use per machine while loading.
>Carbondata sorts the data  at each node level to maintain the Btree for
>each node per segment. It improves the query performance by filtering
>faster if we have Btree at node level instead of each block level.
>
>1.Which version of Carbondata are you using?
>2.There are memory issues in Carbondata-1.0 version and are fixed current
>master.
>3.And you can improve the performance by enabling enable.unsafe.sort=true in
>carbon.properties file. But it is not supported if bucketing of columns are
>enabled. We are planning to support unsafe sort load for bucketing also in
>next version.
>
>Please send the executor log to know about the error you are facing.
>
>
>
>
>
>
>Regards,
>Ravindra
>
>On 25 March 2017 at 16:18, wwyxg@163.com <ww...@163.com> wrote:
>
>> Hello!
>>
>> *0、The failure*
>> When i insert into carbon table,i encounter failure。The failure is  as
>> follow:
>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most
>> recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
>> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
>> Reason: Slave lost+details
>>
>> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Slave lost
>> Driver stacktrace:
>>
>> the stage:
>>
>> *Step:*
>> *1、start spark-shell*
>> ./bin/spark-shell \
>> --master yarn-client \
>> --num-executors 5 \  (I tried to set this parameter range from 10 to
>> 20,but the second job has only 5 tasks)
>> --executor-cores 5 \
>> --executor-memory 20G \
>> --driver-memory 8G \
>> --queue root.default \
>> --jars /xxx.jar
>>
>> //spark-default.conf spark.default.parallelism=320
>>
>> import org.apache.spark.sql.CarbonContext
>> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
>>
>> *2、create table*
>> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst
>> String,plat String,sty String,is_pay String,is_vip String,is_mpack
>> String,scene String,status String,nw String,isc String,area String,spttag
>> String,province String,isp String,city String,tv String,hwm String,pip
>> String,fo String,sh String,mid String,user_id String,play_pv Int,spt_cnt
>> Int,prg_spt_cnt Int) row format delimited fields terminated by '|' STORED
>> BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,
>> mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,
>> is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,
>> province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,
>> pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
>>
>> //notes,set "fo" column BUCKETCOLUMNS is to join another table
>> //the column distinct values are as follows:
>>
>>
>> *3、insert into table*(xxxx_table_tmp  is a hive extenal orc table,has 20
>> 0000 0000 records)
>> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_
>> vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,
>> city,tv,hwm,pip,fo,sh,mid,user_id ,play_pv,spt_cnt,prg_spt_cnt from
>> xxxx_table_tmp where dt='2017-01-01'")
>>
>> *4、spark split sql into two jobs,the first finished succeeded, but the
>> second failed:*
>>
>>
>> *5、The second job stage:*
>>
>>
>>
>> *Question:*
>> 1、Why the second job has only five jobs,but the first job has 994 jobs ?(
>> note:My hadoop cluster has 5 datanode)
>>       I guess it caused the failure
>> 2、In the sources,i find DataLoadPartitionCoalescer.class,is it means that
>> "one datanode has only one partition ,and then the task is only one on the
>> datanode"?
>> 3、In the ExampleUtils class,"carbon.table.split.partition.enable" is set
>> as follow,but i can not find "carbon.table.split.partition.enable" in
>> other parts of the project。
>>      I set "carbon.table.split.partition.enable" to true, but the second
>> job has only five jobs.How to use this property?
>>      ExampleUtils :
>>     // whether use table split partition
>>     // true -> use table split partition, support multiple partition
>> loading
>>     // false -> use node split partition, support data load by host
>> partition
>>     CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable",
>> "false")
>> 4、Insert into carbon table takes 3 hours ,but eventually failed 。How can
>> i speed it.
>> 5、in the spark-shell  ,I tried to set this parameter range from 10 to
>> 20,but the second job has only 5 tasks
>>      the other parameter executor-memory = 20G is enough?
>>
>> I need your help!Thank you very much!
>>
>> wwyxg@163.com
>>
>> ------------------------------
>> wwyxg@163.com
>>
>
>
>
>-- 
>Thanks & Regards,
>Ravi

Re: insert into carbon table failed

Posted by william <al...@gmail.com>.
I guess the word "node" in "Carbondata launches one job per each node to
sort the data at node level and avoid shuffling" may cause some confusion.
I guess carbondata should launch one task per executor; here "job" should
be "task", and "node" should be "executor".

Maybe he can try increasing the number of executors to avoid the memory
problem, as sketched below.
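
For example (a sketch only; note that the original poster reported that
raising --num-executors from 10 to 20 still gave only 5 tasks in the second
job, so this helps only if the load is actually split across more executors):

./bin/spark-shell \
--master yarn-client \
--num-executors 10 \
--executor-cores 5 \
--executor-memory 24G \
--driver-memory 8G \
--conf spark.yarn.executor.memoryOverhead=4096 \
--jars /xxx.jar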

Re: insert into carbon table failed

Posted by Ravindra Pesala <ra...@gmail.com>.
Hi,

Carbondata launches one job per node to sort the data at node level and
avoid shuffling. Internally it uses threads to load in parallel. Please use
the carbon.number.of.cores.while.loading property in the carbon.properties
file to set the number of cores it should use per machine while loading.
Carbondata sorts the data at each node level to maintain the Btree for each
node per segment. This improves query performance, because filtering is
faster with a Btree at node level instead of at each block level.

1. Which version of Carbondata are you using?
2. There are memory issues in the Carbondata 1.0 version that are fixed in
the current master.
3. You can also improve the performance by enabling enable.unsafe.sort=true
in the carbon.properties file (see the sketch below). But it is not
supported if bucketing of columns is enabled. We are planning to support
unsafe sort load for bucketing as well in the next version.

Please send the executor log so we can see the error you are facing.
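
As an illustration only (the value 16 is a placeholder, not a recommendation
from this thread), the relevant carbon.properties entries on each node might
look like:

    # carbon.properties
    carbon.number.of.cores.while.loading=16
    # not applicable while BUCKETCOLUMNS is used (see point 3 above)
    enable.unsafe.sort=true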






Regards,
Ravindra

On 25 March 2017 at 16:18, wwyxg@163.com <ww...@163.com> wrote:

> Hello!
>
> *0、The failure*
> When i insert into carbon table,i encounter failure。The failure is  as
> follow:
> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most
> recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26):
> ExecutorLostFailure (executor 1 exited caused by one of the running tasks)
> Reason: Slave lost+details
>
> Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 1007, hd26): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Slave lost
> Driver stacktrace:
>
> the stage:
>
> *Step:*
> *1、start spark-shell*
> ./bin/spark-shell \
> --master yarn-client \
> --num-executors 5 \  (I tried to set this parameter range from 10 to
> 20,but the second job has only 5 tasks)
> --executor-cores 5 \
> --executor-memory 20G \
> --driver-memory 8G \
> --queue root.default \
> --jars /xxx.jar
>
> //spark-default.conf spark.default.parallelism=320
>
> import org.apache.spark.sql.CarbonContext
> val cc = new CarbonContext(sc, "hdfs://xxxx/carbonData/CarbonStore")
>
> *2、create table*
> cc.sql("CREATE TABLE IF NOT EXISTS xxxx_table (dt String,pt String,lst
> String,plat String,sty String,is_pay String,is_vip String,is_mpack
> String,scene String,status String,nw String,isc String,area String,spttag
> String,province String,isp String,city String,tv String,hwm String,pip
> String,fo String,sh String,mid String,user_id String,play_pv Int,spt_cnt
> Int,prg_spt_cnt Int) row format delimited fields terminated by '|' STORED
> BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='pip,sh,
> mid,fo,user_id','DICTIONARY_INCLUDE'='dt,pt,lst,plat,sty,
> is_pay,is_vip,is_mpack,scene,status,nw,isc,area,spttag,
> province,isp,city,tv,hwm','NO_INVERTED_INDEX'='lst,plat,hwm,
> pip,sh,mid','BUCKETNUMBER'='10','BUCKETCOLUMNS'='fo')")
>
> //notes,set "fo" column BUCKETCOLUMNS is to join another table
> //the column distinct values are as follows:
>
>
> *3、insert into table*(xxxx_table_tmp  is a hive extenal orc table,has 20
> 0000 0000 records)
> cc.sql("insert into xxxx_table select dt,pt,lst,plat,sty,is_pay,is_
> vip,is_mpack,scene,status,nw,isc,area,spttag,province,isp,
> city,tv,hwm,pip,fo,sh,mid,user_id ,play_pv,spt_cnt,prg_spt_cnt from
> xxxx_table_tmp where dt='2017-01-01'")
>
> *4、spark split sql into two jobs,the first finished succeeded, but the
> second failed:*
>
>
> *5、The second job stage:*
>
>
>
> *Question:*
> 1、Why the second job has only five jobs,but the first job has 994 jobs ?(
> note:My hadoop cluster has 5 datanode)
>       I guess it caused the failure
> 2、In the sources,i find DataLoadPartitionCoalescer.class,is it means that
> "one datanode has only one partition ,and then the task is only one on the
> datanode"?
> 3、In the ExampleUtils class,"carbon.table.split.partition.enable" is set
> as follow,but i can not find "carbon.table.split.partition.enable" in
> other parts of the project。
>      I set "carbon.table.split.partition.enable" to true, but the second
> job has only five jobs.How to use this property?
>      ExampleUtils :
>     // whether use table split partition
>     // true -> use table split partition, support multiple partition
> loading
>     // false -> use node split partition, support data load by host
> partition
>     CarbonProperties.getInstance().addProperty("carbon.table.split.partition.enable",
> "false")
> 4、Insert into carbon table takes 3 hours ,but eventually failed 。How can
> i speed it.
> 5、in the spark-shell  ,I tried to set this parameter range from 10 to
> 20,but the second job has only 5 tasks
>      the other parameter executor-memory = 20G is enough?
>
> I need your help!Thank you very much!
>
> wwyxg@163.com
>
> ------------------------------
> wwyxg@163.com
>



-- 
Thanks & Regards,
Ravi