Posted to dev@carbondata.apache.org by Lu Cao <wh...@gmail.com> on 2016/12/01 09:14:56 UTC

carbondata loading

Hi dev team,
I'm loading data from a Parquet file into a CarbonData file (a DataFrame
reads the Parquet file and saves it as CSV, which is then loaded into
CarbonData). The job is stuck at "collect at CarbonDataRDDFactory.scala:963":



Job Id  Description                                Submitted            Duration  Stages: Succeeded/Total  Tasks (for all stages): Succeeded/Total
6       collect at CarbonDataRDDFactory.scala:963  2016/12/01 13:56:43  3.1 h     0/1                      0/2
        <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=6>

Completed Jobs (6)

Job Id  Description                                Submitted            Duration  Stages: Succeeded/Total  Tasks (for all stages): Succeeded/Total
5       collect at GlobalDictionaryUtil.scala:800  2016/12/01 13:34:25  22 min    2/2                      422/422
        <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=5>
4       take at CarbonCsvRelation.scala:181        2016/12/01 13:34:25  0.1 s     1/1                      1/1
        <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=4>
3       saveAsTextFile at package.scala:169        2016/12/01 13:11:02  23 min    1/1                      50/50
        <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=3>
2       count at SaicSparkConvert.scala:40         2016/12/01 13:10:31  31 s      2/2                      51/51
        <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=2>
1       parquet at SaicSparkConvert.scala:35       2016/12/01 13:10:28  1 s       1/1                      2/2
        <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=1>
0       parquet at SaicSparkConvert.scala:35       2016/12/01 13:10:26  2 s       1/1                      2/2
        <http://10.129.96.13:8088/proxy/application_1479961381214_0612/jobs/job?id=0>


I looked at the stdout; the log contains nothing but the same warning,
repeated over and over:


WARN  01-12 13:56:46,096 - [pool-25-thread-5][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.

WARN  01-12 13:56:46,096 - [pool-25-thread-4][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.

WARN  01-12 13:56:46,096 - [pool-25-thread-1][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.

WARN  01-12 13:56:46,096 - [pool-25-thread-2][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.

WARN  01-12 13:56:46,096 - [pool-25-thread-6][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.

WARN  01-12 13:56:46,096 - [pool-25-thread-2][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.

WARN  01-12 13:56:46,096 - [pool-25-thread-1][partitionID:carbontest]
Cannot convert : null to Numeric type value. Value considered as null.


My configuration is:

--master yarn-cluster

--driver-memory 8g

--executor-memory 120g

--num-executors 3


Any idea what could cause this? Could it be a data type problem?
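
One way to narrow down the data type question would be to check the
intermediate CSV directly. The warning text suggests that some numeric
columns contain the literal string "null" (or some other value that does
not parse as a number). The following is only a minimal sketch in plain
Python, not part of the actual job: the function name count_non_numeric,
the column names, and the sample data are all hypothetical, and it assumes
the CSV has a header row.

```python
import csv
import io
from collections import Counter


def count_non_numeric(csv_file, numeric_columns):
    """Count, per column, the values that cannot be parsed as a number.

    csv_file is any open text file (or file-like object) with a header row.
    numeric_columns lists the columns expected to hold numeric values.
    """
    bad = Counter()
    for row in csv.DictReader(csv_file):
        for col in numeric_columns:
            try:
                float(row.get(col))
            except (TypeError, ValueError):
                # Empty strings, the literal "null", and missing fields
                # all end up here.
                bad[col] += 1
    return dict(bad)


# Hypothetical sample standing in for the exported CSV.
sample = io.StringIO("id,price\n1,9.5\n2,null\n3,\n")
print(count_non_numeric(sample, ["id", "price"]))  # prints {'price': 2}
```

If the counts roughly match the number of warnings, the warnings are
probably just reporting null values in the source data. That would not by
itself explain why the job hangs.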


Thanks,

Lionel