Posted to dev@carbondata.apache.org by 喜之郎 <25...@qq.com> on 2018/04/17 12:55:52 UTC

Re: query on string type return error

Hi Liang Chen,
I started the Thrift server and then used Beeline to execute this SQL. I loaded the data with "insert into XXX select * from a_parquet_table".
I deployed a YARN cluster.


Because I could not find the cause of the problem, I used "insert overwrite" to load the data again, and then the problem disappeared.
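
For reference, the two load statements were of this form (a minimal sketch; "XXX" is the placeholder used above for the actual target table name):

-- original load, after which the query failed:
insert into XXX select * from a_parquet_table;

-- reload that made the error disappear:
insert overwrite table XXX select * from a_parquet_table;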




------------------ Original Message ------------------
From: "Liang Chen"<ch...@gmail.com>;
Sent: Monday, April 16, 2018, 3:51 PM;
To: "dev"<de...@carbondata.apache.org>;

Subject: Re: query on string type return error



Hi

From the log message, it seems the data files cannot be found.
Can you provide more detail:
1. How did you create the CarbonSession, and how did you load the data?
2. Did you deploy a cluster, or only a single machine?

Regards
Liang


喜之郎 wrote
> Hi all, when I use CarbonData to run the query "select count(*) from
> action_carbondata where starttimestr = 20180301;", an error occurs.
> This is the error info:
> ###################
> 0: jdbc:hive2://localhost:10000> select count(*) from action_carbondata
> where starttimestr = 20180301;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 12 in stage 7.0 failed 4 times, most recent failure: Lost task 12.3
> in stage 7.0 (TID 173, sz-pg-entanalytics-research-001.tendcloud.com,
> executor 1): org.apache.spark.util.TaskCompletionListenerException:
> org.apache.carbondata.core.scan.executor.exception.QueryExecutionException:
> 
> 
> Previous exception in task: java.util.concurrent.ExecutionException:
> java.util.concurrent.ExecutionException: java.io.IOException:
> org.apache.thrift.protocol.TProtocolException: Required field
> 'data_chunk_list' was not present! Struct:
> DataChunk3(data_chunk_list:null)
> 
> org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.updateScanner(AbstractDataBlockIterator.java:136)
> 
> org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.processNextBatch(DataBlockIteratorImpl.java:64)
> 
> org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:46)
> 
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:283)
> 
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:171)
> 
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:391)
> 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown
> Source)
> 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown
> Source)
> 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
> Source)
> 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
> 	scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> 	org.apache.spark.scheduler.Task.run(Task.scala:108)
> 	org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	java.lang.Thread.run(Thread.java:745)
> 	at
> org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
> 	at
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:118)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> 	at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> 
> 
> Driver stacktrace: (state=,code=0)
> 
> ###################
> 
> 
> The create table statement:
> CREATE TABLE action_carbondata(
> cur_appversioncode  integer,
> cur_appversionname  integer,
> cur_browserid  integer,
> cur_carrierid  integer,
> cur_channelid  integer,
> cur_cityid  integer,
> cur_countryid  integer,
> cur_ip  string,
> cur_networkid  integer,
> cur_osid  integer,
> cur_provinceid  integer,
> deviceproductoffset  long,
> duration  integer,
> eventcount  integer,
> eventlabelid  integer,
> eventtypeid  integer,
> organizationid  integer,
> platformid  integer,
> productid  integer,
> relatedaccountproductoffset  long,
> sessionduration  integer,
> sessionid  string,
> sessionstarttime  long,
> sessionstatus  integer,
> sourceid  integer,
> starttime  long,
> starttimestr  string )
> partitioned by (eventid int)
> STORED BY 'carbondata'
> TBLPROPERTIES ('partition_type'='Hash','NUM_PARTITIONS'='39',
> 'SORT_COLUMNS'='productid,sourceid,starttimestr,platformid,organizationid,eventtypeid,eventlabelid,cur_channelid,cur_provinceid,cur_countryid,cur_cityid,cur_osid,cur_appversioncode,cur_appversionname,cur_carrierid,cur_networkid,cur_browserid,sessionstatus,cur_ip');
> 
> 
> 
> Sample values of the "starttimestr" field:
> 20180303
> 20180304
> 
> 
> 
> 
> Any advice is appreciated!
> 
> 
> 
> 
> 
> The CarbonData version is:
> apache-carbondata-1.3.1-bin-spark2.2.1-hadoop2.7.2.jar
> 
> 
> The Spark version is:
> spark-2.2.1-bin-hadoop2.7




