Posted to issues@carbondata.apache.org by "Vandana Yadav (JIRA)" <ji...@apache.org> on 2018/04/25 06:30:00 UTC

[jira] [Created] (CARBONDATA-2399) Getting Error while applying filter on complex data type

Vandana Yadav created CARBONDATA-2399:
-----------------------------------------

             Summary: Getting Error while applying filter on complex data type 
                 Key: CARBONDATA-2399
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2399
             Project: CarbonData
          Issue Type: Bug
         Environment: spark 2.2
            Reporter: Vandana Yadav
         Attachments: arrayofstruct.csv

Getting an error while applying a filter on a complex data type.

Steps to Reproduce:

1) Create Table:

 create table ARRAY_OF_STRUCT_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, ARRAY_OF_STRUCT array<struct<ID:int,COUNTRY:string,STATE:string,CITI:string,CHECK_DATE:timestamp>>,CARD_COUNT int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) STORED BY 'org.apache.carbondata.format'
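
(As a quick sanity check, not part of the original report, the nested array<struct<...>> column can be verified with a standard describe statement after creation; exact output shape may vary by CarbonData version:)

 describe ARRAY_OF_STRUCT_com;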

2) Load data into the table:

LOAD DATA INPATH 'HDFS_URL/BabuStore/Data/complex/arrayofstruct.csv' INTO table ARRAY_OF_STRUCT_com options ('DELIMITER'=',', 'QUOTECHAR'='"', 'FILEHEADER'='CUST_ID,YEAR,MONTH,AGE,GENDER,EDUCATED,IS_MARRIED,ARRAY_OF_STRUCT,CARD_COUNT,DEBIT_COUNT,CREDIT_COUNT,DEPOSIT,HQ_DEPOSIT','COMPLEX_DELIMITER_LEVEL_1'='$','COMPLEX_DELIMITER_LEVEL_2'='&')
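
(For reference, the attached arrayofstruct.csv is not reproduced here. Given the load options above, with COMPLEX_DELIMITER_LEVEL_1 '$' separating array elements and COMPLEX_DELIMITER_LEVEL_2 '&' separating struct fields, a hypothetical row illustrating the expected layout could look like:)

 CUST001,2015,1,29,M,graduate,yes,1&India&KA&Bangalore&2015-01-10 00:00:00$2&India&TN&Chennai&2015-02-10 00:00:00,2,1,1,5000.50,6000.75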

 

3) Execute Query:

select array_of_struct.ID[1] from ARRAY_OF_STRUCT_com where array_of_struct.ID[1] >=2;
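
(To help isolate whether the failure comes from the filter on the complex column rather than the projection, the following narrowing queries may be useful; they are suggestions only and were not run as part of this report:)

 select array_of_struct.ID[1] from ARRAY_OF_STRUCT_com;
 select CUST_ID from ARRAY_OF_STRUCT_com where array_of_struct.ID[1] >= 2;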

4) Expected Result: The query should return the array_of_struct.ID[1] values that satisfy the filter.

5) Actual Result:

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 37.0 failed 1 times, most recent failure: Lost task 1.0 in stage 37.0 (TID 2090, localhost, executor driver): java.lang.ClassCastException

Driver stacktrace: (state=,code=0)

6) Error Logs:

18/04/25 11:45:13 INFO SparkExecuteStatementOperation: Running query 'select array_of_struct.ID[1] from ARRAY_OF_STRUCT_com where array_of_struct.ID[1] >=2' with 4097fea0-f1d8-497a-bda4-4e148ae08477
18/04/25 11:45:13 INFO CarbonSparkSqlParser: Parsing command: select array_of_struct.ID[1] from ARRAY_OF_STRUCT_com where array_of_struct.ID[1] >=2
18/04/25 11:45:13 INFO HiveMetaStore: 30: get_table : db=bug tbl=array_of_struct_com
18/04/25 11:45:13 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_table : db=bug tbl=array_of_struct_com 
18/04/25 11:45:13 INFO HiveMetaStore: 30: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/04/25 11:45:13 INFO ObjectStore: ObjectStore, initialize called
18/04/25 11:45:13 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
18/04/25 11:45:13 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
18/04/25 11:45:13 INFO ObjectStore: Initialized ObjectStore
18/04/25 11:45:13 INFO CatalystSqlParser: Parsing command: array<string>
18/04/25 11:45:13 INFO CarbonLRUCache: pool-23-thread-28 Removed entry from InMemory lru cache :: hdfs://localhost:54310/opt/CarbonStore/bug/array_of_struct_com/Fact/Part0/Segment_0/0_batchno0-0-1524572335034.carbonindex
18/04/25 11:45:13 INFO CarbonLRUCache: pool-23-thread-28 Removed entry from InMemory lru cache :: hdfs://localhost:54310/opt/CarbonStore/bug/array_of_struct_com/Fact/Part0/Segment_1/0_batchno0-0-1524575558281.carbonindex
18/04/25 11:45:13 INFO HiveMetaStore: 30: get_table : db=bug tbl=array_of_struct_com
18/04/25 11:45:13 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_table : db=bug tbl=array_of_struct_com 
18/04/25 11:45:13 INFO CatalystSqlParser: Parsing command: array<string>
18/04/25 11:45:13 INFO HiveMetaStore: 30: get_database: bug
18/04/25 11:45:13 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_database: bug 
18/04/25 11:45:13 INFO HiveMetaStore: 30: get_database: bug
18/04/25 11:45:13 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_database: bug 
18/04/25 11:45:13 INFO HiveMetaStore: 30: get_tables: db=bug pat=*
18/04/25 11:45:13 INFO audit: ugi=knoldus ip=unknown-ip-addr cmd=get_tables: db=bug pat=* 
18/04/25 11:45:13 INFO TableInfo: pool-23-thread-28 Table block size not specified for bug_array_of_struct_com. Therefore considering the default value 1024 MB
18/04/25 11:45:13 INFO CarbonLateDecodeRule: pool-23-thread-28 skip CarbonOptimizer
18/04/25 11:45:13 INFO CarbonLateDecodeRule: pool-23-thread-28 Skip CarbonOptimizer
18/04/25 11:45:13 INFO TableInfo: pool-23-thread-28 Table block size not specified for bug_array_of_struct_com. Therefore considering the default value 1024 MB
18/04/25 11:45:13 INFO BlockletDataMap: pool-23-thread-28 Time taken to load blocklet datamap from file : hdfs://localhost:54310/opt/CarbonStore/bug/array_of_struct_com/Fact/Part0/Segment_0/0_batchno0-0-1524572335034.carbonindexis 1
18/04/25 11:45:13 INFO BlockletDataMap: pool-23-thread-28 Time taken to load blocklet datamap from file : hdfs://localhost:54310/opt/CarbonStore/bug/array_of_struct_com/Fact/Part0/Segment_1/0_batchno0-0-1524575558281.carbonindexis 1
18/04/25 11:45:13 INFO CarbonScanRDD: 
 Identified no.of.blocks: 2,
 no.of.tasks: 2,
 no.of.nodes: 0,
 parallelism: 4
 
18/04/25 11:45:13 INFO SparkContext: Starting job: run at AccessController.java:0
18/04/25 11:45:13 INFO DAGScheduler: Got job 23 (run at AccessController.java:0) with 2 output partitions
18/04/25 11:45:13 INFO DAGScheduler: Final stage: ResultStage 37 (run at AccessController.java:0)
18/04/25 11:45:13 INFO DAGScheduler: Parents of final stage: List()
18/04/25 11:45:13 INFO DAGScheduler: Missing parents: List()
18/04/25 11:45:13 INFO DAGScheduler: Submitting ResultStage 37 (MapPartitionsRDD[105] at run at AccessController.java:0), which has no missing parents
18/04/25 11:45:13 INFO MemoryStore: Block broadcast_33 stored as values in memory (estimated size 34.4 KB, free 366.0 MB)
18/04/25 11:45:13 INFO MemoryStore: Block broadcast_33_piece0 stored as bytes in memory (estimated size 27.2 KB, free 366.0 MB)
18/04/25 11:45:13 INFO BlockManagerInfo: Added broadcast_33_piece0 in memory on 192.168.2.102:40679 (size: 27.2 KB, free: 366.2 MB)
18/04/25 11:45:13 INFO SparkContext: Created broadcast 33 from broadcast at DAGScheduler.scala:1006
18/04/25 11:45:13 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 37 (MapPartitionsRDD[105] at run at AccessController.java:0) (first 15 tasks are for partitions Vector(0, 1))
18/04/25 11:45:13 INFO TaskSchedulerImpl: Adding task set 37.0 with 2 tasks
18/04/25 11:45:13 INFO TaskSetManager: Starting task 0.0 in stage 37.0 (TID 2089, localhost, executor driver, partition 0, ANY, 6524 bytes)
18/04/25 11:45:13 INFO TaskSetManager: Starting task 1.0 in stage 37.0 (TID 2090, localhost, executor driver, partition 1, ANY, 6534 bytes)
18/04/25 11:45:13 INFO Executor: Running task 1.0 in stage 37.0 (TID 2090)
18/04/25 11:45:13 INFO Executor: Running task 0.0 in stage 37.0 (TID 2089)
18/04/25 11:45:13 INFO TableInfo: Executor task launch worker for task 2089 Table block size not specified for bug_array_of_struct_com. Therefore considering the default value 1024 MB
18/04/25 11:45:13 INFO AbstractQueryExecutor: [Executor task launch worker for task 2089][partitionID:com;queryID:20021979459171] Query will be executed on table: array_of_struct_com
18/04/25 11:45:13 INFO TableInfo: Executor task launch worker for task 2090 Table block size not specified for bug_array_of_struct_com. Therefore considering the default value 1024 MB
18/04/25 11:45:13 INFO AbstractQueryExecutor: [Executor task launch worker for task 2090][partitionID:com;queryID:20021979459171] Query will be executed on table: array_of_struct_com
18/04/25 11:45:13 INFO ResultCollectorFactory: [Executor task launch worker for task 2089][partitionID:com;queryID:20021979459171] Row based dictionary collector is used to scan and collect the data
18/04/25 11:45:13 INFO ResultCollectorFactory: [Executor task launch worker for task 2090][partitionID:com;queryID:20021979459171] Restructure based dictionary collector is used to scan and collect the data
18/04/25 11:45:13 INFO UnsafeMemoryManager: [Executor task launch worker for task 2090][partitionID:com;queryID:20021979459171] Total memory used after task 20022071181912 is 107854 Current tasks running now are : [20022033622596, 18271172672188, 17522539140626, 17607895858118, 18330821230360, 18405469228911, 18618097583871, 20022072761915, 18394132241322, 18418328233121, 18431423923731, 19634142794700, 19645360614384, 18317037545688, 19104026787023, 18368469767199, 19947340764859, 18254776726806, 18307363580438, 18146866243005, 18385290912031, 19986144509530]
18/04/25 11:45:13 ERROR Executor: Exception in task 1.0 in stage 37.0 (TID 2090)
java.lang.ClassCastException
18/04/25 11:45:13 WARN TaskSetManager: Lost task 1.0 in stage 37.0 (TID 2090, localhost, executor driver): java.lang.ClassCastException

18/04/25 11:45:13 ERROR TaskSetManager: Task 1 in stage 37.0 failed 1 times; aborting job
18/04/25 11:45:13 INFO TaskSchedulerImpl: Cancelling stage 37
18/04/25 11:45:13 INFO TaskSchedulerImpl: Stage 37 was cancelled
18/04/25 11:45:13 INFO DAGScheduler: ResultStage 37 (run at AccessController.java:0) failed in 0.083 s due to Job aborted due to stage failure: Task 1 in stage 37.0 failed 1 times, most recent failure: Lost task 1.0 in stage 37.0 (TID 2090, localhost, executor driver): java.lang.ClassCastException

Driver stacktrace:
18/04/25 11:45:13 INFO Executor: Executor is trying to kill task 0.0 in stage 37.0 (TID 2089), reason: stage cancelled
18/04/25 11:45:13 INFO DAGScheduler: Job 23 failed: run at AccessController.java:0, took 0.090779 s
18/04/25 11:45:13 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING, 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 37.0 failed 1 times, most recent failure: Lost task 1.0 in stage 37.0 (TID 2090, localhost, executor driver): java.lang.ClassCastException

Driver stacktrace:
 at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504)
 at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1504)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
 at scala.Option.foreach(Option.scala:257)
 at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1732)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1687)
 at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1676)
 at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
 at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:2050)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:2069)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
 at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
 at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
 at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
 at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
 at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:278)
 at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2861)
 at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2387)
 at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2387)
 at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:2842)
 at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
 at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2841)
 at org.apache.spark.sql.Dataset.collect(Dataset.scala:2387)
 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:245)
 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException
18/04/25 11:45:13 ERROR SparkExecuteStatementOperation: Error running hive query: 
org.apache.hive.service.cli.HiveSQLException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 37.0 failed 1 times, most recent failure: Lost task 1.0 in stage 37.0 (TID 2090, localhost, executor driver): java.lang.ClassCastException

Driver stacktrace:
 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:268)
 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
 at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)


