Posted to dev@carbondata.apache.org by 喜之郎 <25...@qq.com> on 2018/04/17 12:55:52 UTC
Re: query on string type return error
Hi, Liang Chen.
I started the Thrift server and then used beeline to execute this SQL. I loaded the data with "insert into XXX select * from a_parquet_table".
I deployed on a YARN cluster.
Because I could not find the cause of the problem, I loaded the data again using "insert overwrite", and then the problem disappeared.
------------------ Original Message ------------------
From: "Liang Chen"<ch...@gmail.com>;
Sent: Monday, April 16, 2018, 3:51 PM
To: "dev"<de...@carbondata.apache.org>;
Subject: Re: query on string type return error
Hi
From the log message, it seems that the data files can't be found.
Can you provide more details:
1. How did you create the CarbonSession, and how did you load the data?
2. Have you deployed a cluster, or only a single machine?
Regards
Liang
喜之郎 wrote
> Hi all, when I use CarbonData to run the query "select count(*) from
> action_carbondata where starttimestr = 20180301;", an error occurs.
> This is the error info:
> ###################
> 0: jdbc:hive2://localhost:10000> select count(*) from action_carbondata
> where starttimestr = 20180301;
> Error: org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 12 in stage 7.0 failed 4 times, most recent failure: Lost task 12.3
> in stage 7.0 (TID 173, sz-pg-entanalytics-research-001.tendcloud.com,
> executor 1): org.apache.spark.util.TaskCompletionListenerException:
> org.apache.carbondata.core.scan.executor.exception.QueryExecutionException:
>
>
> Previous exception in task: java.util.concurrent.ExecutionException:
> java.util.concurrent.ExecutionException: java.io.IOException:
> org.apache.thrift.protocol.TProtocolException: Required field
> 'data_chunk_list' was not present! Struct:
> DataChunk3(data_chunk_list:null)
>
> org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.updateScanner(AbstractDataBlockIterator.java:136)
>
> org.apache.carbondata.core.scan.processor.impl.DataBlockIteratorImpl.processNextBatch(DataBlockIteratorImpl.java:64)
>
> org.apache.carbondata.core.scan.result.iterator.VectorDetailQueryResultIterator.processNextBatch(VectorDetailQueryResultIterator.java:46)
>
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextBatch(VectorizedCarbonRecordReader.java:283)
>
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.nextKeyValue(VectorizedCarbonRecordReader.java:171)
>
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:391)
>
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown
> Source)
>
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown
> Source)
>
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
> Source)
>
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> org.apache.spark.scheduler.Task.run(Task.scala:108)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> at
> org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
> at
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
> at org.apache.spark.scheduler.Task.run(Task.scala:118)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
> Driver stacktrace: (state=,code=0)
>
> ###################
>
>
> create table statement:
> CREATE TABLE action_carbondata(
> cur_appversioncode integer,
> cur_appversionname integer,
> cur_browserid integer,
> cur_carrierid integer,
> cur_channelid integer,
> cur_cityid integer,
> cur_countryid integer,
> cur_ip string,
> cur_networkid integer,
> cur_osid integer,
> cur_provinceid integer,
> deviceproductoffset long,
> duration integer,
> eventcount integer,
> eventlabelid integer,
> eventtypeid integer,
> organizationid integer,
> platformid integer,
> productid integer,
> relatedaccountproductoffset long,
> sessionduration integer,
> sessionid string,
> sessionstarttime long,
> sessionstatus integer,
> sourceid integer,
> starttime long,
> starttimestr string )
> partitioned by (eventid int)
> STORED BY 'carbondata'
> TBLPROPERTIES ('partition_type'='Hash','NUM_PARTITIONS'='39',
> 'SORT_COLUMNS'='productid,sourceid,starttimestr,platformid,organizationid,eventtypeid,eventlabelid,cur_channelid,cur_provinceid,cur_countryid,cur_cityid,cur_osid,cur_appversioncode,cur_appversionname,cur_carrierid,cur_networkid,cur_browserid,sessionstatus,cur_ip');
>
>
>
> Values of the "starttimestr" field:
> 20180303
> 20180304.
>
>
>
>
> Any advice is appreciated!
>
>
>
>
>
> The CarbonData version is:
> apache-carbondata-1.3.1-bin-spark2.2.1-hadoop2.7.2.jar
>
>
> The Spark version is:
> spark-2.2.1-bin-hadoop2.7