Posted to issues@carbondata.apache.org by "xubo245 (JIRA)" <ji...@apache.org> on 2018/04/23 08:47:00 UTC
[jira] [Created] (CARBONDATA-2385) The result is incorrect when read data from carbonfile generated by SDK
xubo245 created CARBONDATA-2385:
-----------------------------------
Summary: The result is incorrect when read data from carbonfile generated by SDK
Key: CARBONDATA-2385
URL: https://issues.apache.org/jira/browse/CARBONDATA-2385
Project: CarbonData
Issue Type: Bug
Reporter: xubo245
Assignee: xubo245
The result is incorrect when reading data from a carbonfile generated by the SDK.
When org.apache.carbondata.spark.testsuite.createTable.TestCreateTableUsingSparkCarbonFileFormat generates 10 million rows of data, count(1) returns only 5888000:
{code:java}
18/04/23 01:43:12 INFO SessionState: Created HDFS directory: /tmp/hive/root/6ebdb24c-8b92-45c3-b7c0-639da93c2984/_tmp_space.db
18/04/23 01:43:12 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /huawei/xubo/git/carbondata1/integration/spark-common/target/warehouse
18/04/23 01:43:12 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|name |string |null |
|age |int |null |
|height |double |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |sdkoutputtable | |
|Owner |root | |
|Created |Mon Apr 23 01:43:19 PDT 2018 | |
|Last Access |Wed Dec 31 16:00:00 PST 1969 | |
|Type |EXTERNAL | |
|Provider |carbonfile | |
|Table Properties |[transient_lastDdlTime=1524472999] | |
|Location |file:/huawei/xubo/git/carbondata1/integration/spark-common-test/src/test/resources/SparkCarbonFileFormat/WriterOutput/Fact/Part0/Segment_null| |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
+-------+---+------+
|name |age|height|
+-------+---+------+
|robot0 |0 |0.0 |
|robot1 |1 |0.5 |
|robot2 |2 |1.0 |
|robot3 |3 |1.5 |
|robot4 |4 |2.0 |
|robot5 |5 |2.5 |
|robot6 |6 |3.0 |
|robot7 |7 |3.5 |
|robot8 |8 |4.0 |
|robot9 |9 |4.5 |
|robot10|10 |5.0 |
|robot11|11 |5.5 |
|robot12|12 |6.0 |
|robot13|13 |6.5 |
|robot14|14 |7.0 |
|robot15|15 |7.5 |
|robot16|16 |8.0 |
|robot17|17 |8.5 |
|robot18|18 |9.0 |
|robot19|19 |9.5 |
+-------+---+------+
only showing top 20 rows
+------+---+------+
|name |age|height|
+------+---+------+
|robot0|0 |0.0 |
|robot1|1 |0.5 |
|robot2|2 |1.0 |
+------+---+------+
+-------+
|name |
+-------+
|robot0 |
|robot1 |
|robot2 |
|robot3 |
|robot4 |
|robot5 |
|robot6 |
|robot7 |
|robot8 |
|robot9 |
|robot10|
|robot11|
|robot12|
|robot13|
|robot14|
|robot15|
|robot16|
|robot17|
|robot18|
|robot19|
+-------+
only showing top 20 rows
+---+
|age|
+---+
|0 |
|1 |
|2 |
|3 |
|4 |
|5 |
|6 |
|7 |
|8 |
|9 |
|10 |
|11 |
|12 |
|13 |
|14 |
|15 |
|16 |
|17 |
|18 |
|19 |
+---+
only showing top 20 rows
+------+---+------+
|name |age|height|
+------+---+------+
|robot3|3 |1.5 |
|robot4|4 |2.0 |
|robot5|5 |2.5 |
|robot6|6 |3.0 |
|robot7|7 |3.5 |
+------+---+------+
+------+---+------+
|name |age|height|
+------+---+------+
|robot3|3 |1.5 |
+------+---+------+
+------+---+------+
|name |age|height|
+------+---+------+
|robot0|0 |0.0 |
|robot1|1 |0.5 |
|robot2|2 |1.0 |
|robot3|3 |1.5 |
|robot4|4 |2.0 |
+------+---+------+
+------+---+------+
|name |age|height|
+------+---+------+
|robot0|0 |0.0 |
|robot1|1 |0.5 |
+------+---+------+
+-------------+
|sum(age) |
+-------------+
|1515150959596|
+-------------+
+--------+
|count(1)|
+--------+
|5888000 |
+--------+
+--------+
|count(1)|
+--------+
|5888000 |
+--------+
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|name |string |null |
|age |int |null |
|height |double |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |sdkoutputtable | |
|Owner |root | |
|Created |Mon Apr 23 01:43:47 PDT 2018 | |
|Last Access |Wed Dec 31 16:00:00 PST 1969 | |
|Type |EXTERNAL | |
|Provider |carbonfile | |
|Table Properties |[transient_lastDdlTime=1524473027] | |
|Location |file:/huawei/xubo/git/carbondata1/integration/spark-common-test/src/test/resources/SparkCarbonFileFormat/WriterOutput/Fact/Part0/Segment_null| |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
+-------+---+------+
|name |age|height|
+-------+---+------+
|robot0 |0 |0.0 |
|robot1 |1 |0.5 |
|robot2 |2 |1.0 |
|robot3 |3 |1.5 |
|robot4 |4 |2.0 |
|robot5 |5 |2.5 |
|robot6 |6 |3.0 |
|robot7 |7 |3.5 |
|robot8 |8 |4.0 |
|robot9 |9 |4.5 |
|robot10|10 |5.0 |
|robot11|11 |5.5 |
|robot12|12 |6.0 |
|robot13|13 |6.5 |
|robot14|14 |7.0 |
|robot15|15 |7.5 |
|robot16|16 |8.0 |
|robot17|17 |8.5 |
|robot18|18 |9.0 |
|robot19|19 |9.5 |
+-------+---+------+
only showing top 20 rows
+------+---+------+
|name |age|height|
+------+---+------+
|robot0|0 |0.0 |
|robot1|1 |0.5 |
|robot2|2 |1.0 |
+------+---+------+
+-------+
|name |
+-------+
|robot0 |
|robot1 |
|robot2 |
|robot3 |
|robot4 |
|robot5 |
|robot6 |
|robot7 |
|robot8 |
|robot9 |
|robot10|
|robot11|
|robot12|
|robot13|
|robot14|
|robot15|
|robot16|
|robot17|
|robot18|
|robot19|
+-------+
only showing top 20 rows
+---+
|age|
+---+
|0 |
|1 |
|2 |
|3 |
|4 |
|5 |
|6 |
|7 |
|8 |
|9 |
|10 |
|11 |
|12 |
|13 |
|14 |
|15 |
|16 |
|17 |
|18 |
|19 |
+---+
only showing top 20 rows
+------+---+------+
|name |age|height|
+------+---+------+
|robot3|3 |1.5 |
|robot4|4 |2.0 |
|robot5|5 |2.5 |
|robot6|6 |3.0 |
|robot7|7 |3.5 |
+------+---+------+
+------+---+------+
|name |age|height|
+------+---+------+
|robot3|3 |1.5 |
+------+---+------+
+------+---+------+
|name |age|height|
+------+---+------+
|robot0|0 |0.0 |
|robot1|1 |0.5 |
|robot2|2 |1.0 |
|robot3|3 |1.5 |
|robot4|4 |2.0 |
+------+---+------+
+------+---+------+
|name |age|height|
+------+---+------+
|robot0|0 |0.0 |
|robot1|1 |0.5 |
+------+---+------+
+-------------+
|sum(age) |
+-------------+
|1515150959596|
+-------------+
+--------+
|count(1)|
+--------+
|5888000 |
+--------+
+--------+
|count(1)|
+--------+
|5888000 |
+--------+
18/04/23 01:43:56 ERROR Executor: Exception in task 0.0 in stage 32.0 (TID 38)
org.apache.spark.SparkException: Index file not present to read the carbondata file
at org.apache.spark.sql.SparkCarbonFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(SparkCarbonFileFormat.scala:231)
at org.apache.spark.sql.SparkCarbonFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(SparkCarbonFileFormat.scala:188)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:174)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/04/23 01:43:56 ERROR TaskSetManager: Task 0 in stage 32.0 failed 1 times; aborting job
Process finished with exit code 0
{code}
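To make the size of the discrepancy concrete, here is a minimal sketch that compares the number of rows written with the count(1) value from the log. It does not call CarbonData or Spark; the class name RowCountCheck and its helper methods are hypothetical and exist only for illustration. The row pattern in expectedRow (robot<i>, i, i / 2.0) is inferred from the first rows shown in the output above and is assumed to hold for every row.
{code:java}
// Sketch only: quantifies the gap between the rows written and the rows counted.
// RowCountCheck and its methods are hypothetical helpers, not part of CarbonData.
public class RowCountCheck {
    // Number of rows the report says were generated (10 million).
    static final long ROWS_WRITTEN = 10_000_000L;
    // count(1) value shown in the log output of this report.
    static final long ROWS_COUNTED = 5_888_000L;

    // Rows that were written but not returned by the count query.
    static long missingRows(long written, long counted) {
        return written - counted;
    }

    // Percentage of the written rows that the count query returned.
    static double percentRead(long written, long counted) {
        return 100.0 * counted / written;
    }

    // Assumed row pattern, inferred from the logged rows: robot<i>, i, i / 2.0.
    static String expectedRow(int i) {
        return "robot" + i + "," + i + "," + (i / 2.0);
    }

    public static void main(String[] args) {
        System.out.println("missing rows: " + missingRows(ROWS_WRITTEN, ROWS_COUNTED));
        System.out.println("percent read: " + percentRead(ROWS_WRITTEN, ROWS_COUNTED));
        System.out.println("row 3 should be: " + expectedRow(3));
    }
}
{code}
With the numbers from this report, 4112000 of the 10 million written rows are missing from the count, so a bit more than 40% of the data is not read back.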