Posted to issues@carbondata.apache.org by "xubo245 (JIRA)" <ji...@apache.org> on 2018/04/23 08:47:00 UTC
[jira] [Created] (CARBONDATA-2385) The result is incorrect when read data from carbonfile generated by SDK

xubo245 created CARBONDATA-2385:
-----------------------------------

             Summary: The result is incorrect when read data from carbonfile generated by SDK
                 Key: CARBONDATA-2385
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2385
             Project: CarbonData
          Issue Type: Bug
            Reporter: xubo245
            Assignee: xubo245


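The result is incorrect when reading data from a carbonfile generated by the SDK.

When 10 million rows of data are generated with org.apache.carbondata.spark.testsuite.createTable.TestCreateTableUsingSparkCarbonFileFormat, count(1) returns 5888000 instead of 10000000. The run also ends with a SparkException: "Index file not present to read the carbondata file" (see the end of the log below).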
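As a quick cross-check of the reported numbers, the expected count(1) and sum(age) can be computed directly. This is a minimal sketch that assumes row i is written as ("robot" + i, i, i / 2.0), which is what the first 20 rows in the log show; the class and method names are hypothetical and not part of CarbonData.

```java
// Hypothetical sanity check for CARBONDATA-2385 (not a CarbonData API).
// Assumption: row i carries age = i, as the first 20 rows of the log suggest.
final class ExpectedCounts {

    /** Sum of ages 0, 1, ..., n - 1, i.e. n * (n - 1) / 2 (fits in a long here). */
    static long expectedAgeSum(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        long written = 10_000_000L;             // rows written by the test
        long observedCount = 5_888_000L;        // count(1) reported in the log
        long observedSum = 1_515_150_959_596L;  // sum(age) reported in the log

        System.out.printf("count(1): expected %d, observed %d%n", written, observedCount);
        System.out.printf("sum(age): expected %d, observed %d%n",
                expectedAgeSum(written), observedSum);
        // Smallest possible sum for observedCount distinct non-negative ages.
        System.out.printf("minimum sum of %d distinct ages: %d%n",
                observedCount, expectedAgeSum(observedCount));
    }
}
```

Under that assumption, 10 million rows should give count(1) = 10000000 and sum(age) = 49999995000000. The observed sum, 1515150959596, is even smaller than 17334269056000 (the sum of ages 0 to 5887999, the smallest possible for 5888000 distinct ages), which, if the assumption holds, would suggest some rows are read more than once rather than only being dropped.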



{code:java}
18/04/23 01:43:12 INFO SessionState: Created HDFS directory: /tmp/hive/root/6ebdb24c-8b92-45c3-b7c0-639da93c2984/_tmp_space.db
18/04/23 01:43:12 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /huawei/xubo/git/carbondata1/integration/spark-common/target/warehouse
18/04/23 01:43:12 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name                    |data_type                                                                                                                                    |comment|
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|name                        |string                                                                                                                                       |null   |
|age                         |int                                                                                                                                          |null   |
|height                      |double                                                                                                                                       |null   |
|                            |                                                                                                                                             |       |
|# Detailed Table Information|                                                                                                                                             |       |
|Database                    |default                                                                                                                                      |       |
|Table                       |sdkoutputtable                                                                                                                               |       |
|Owner                       |root                                                                                                                                         |       |
|Created                     |Mon Apr 23 01:43:19 PDT 2018                                                                                                                 |       |
|Last Access                 |Wed Dec 31 16:00:00 PST 1969                                                                                                                 |       |
|Type                        |EXTERNAL                                                                                                                                     |       |
|Provider                    |carbonfile                                                                                                                                   |       |
|Table Properties            |[transient_lastDdlTime=1524472999]                                                                                                           |       |
|Location                    |file:/huawei/xubo/git/carbondata1/integration/spark-common-test/src/test/resources/SparkCarbonFileFormat/WriterOutput/Fact/Part0/Segment_null|       |
|Serde Library               |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                                                                                           |       |
|InputFormat                 |org.apache.hadoop.mapred.SequenceFileInputFormat                                                                                             |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat                                                                                    |       |
|Storage Properties          |[serialization.format=1]                                                                                                                     |       |
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+

+-------+---+------+
|name   |age|height|
+-------+---+------+
|robot0 |0  |0.0   |
|robot1 |1  |0.5   |
|robot2 |2  |1.0   |
|robot3 |3  |1.5   |
|robot4 |4  |2.0   |
|robot5 |5  |2.5   |
|robot6 |6  |3.0   |
|robot7 |7  |3.5   |
|robot8 |8  |4.0   |
|robot9 |9  |4.5   |
|robot10|10 |5.0   |
|robot11|11 |5.5   |
|robot12|12 |6.0   |
|robot13|13 |6.5   |
|robot14|14 |7.0   |
|robot15|15 |7.5   |
|robot16|16 |8.0   |
|robot17|17 |8.5   |
|robot18|18 |9.0   |
|robot19|19 |9.5   |
+-------+---+------+
only showing top 20 rows

+------+---+------+
|name  |age|height|
+------+---+------+
|robot0|0  |0.0   |
|robot1|1  |0.5   |
|robot2|2  |1.0   |
+------+---+------+

+-------+
|name   |
+-------+
|robot0 |
|robot1 |
|robot2 |
|robot3 |
|robot4 |
|robot5 |
|robot6 |
|robot7 |
|robot8 |
|robot9 |
|robot10|
|robot11|
|robot12|
|robot13|
|robot14|
|robot15|
|robot16|
|robot17|
|robot18|
|robot19|
+-------+
only showing top 20 rows

+---+
|age|
+---+
|0  |
|1  |
|2  |
|3  |
|4  |
|5  |
|6  |
|7  |
|8  |
|9  |
|10 |
|11 |
|12 |
|13 |
|14 |
|15 |
|16 |
|17 |
|18 |
|19 |
+---+
only showing top 20 rows

+------+---+------+
|name  |age|height|
+------+---+------+
|robot3|3  |1.5   |
|robot4|4  |2.0   |
|robot5|5  |2.5   |
|robot6|6  |3.0   |
|robot7|7  |3.5   |
+------+---+------+

+------+---+------+
|name  |age|height|
+------+---+------+
|robot3|3  |1.5   |
+------+---+------+

+------+---+------+
|name  |age|height|
+------+---+------+
|robot0|0  |0.0   |
|robot1|1  |0.5   |
|robot2|2  |1.0   |
|robot3|3  |1.5   |
|robot4|4  |2.0   |
+------+---+------+

+------+---+------+
|name  |age|height|
+------+---+------+
|robot0|0  |0.0   |
|robot1|1  |0.5   |
+------+---+------+

+-------------+
|sum(age)     |
+-------------+
|1515150959596|
+-------------+

+--------+
|count(1)|
+--------+
|5888000 |
+--------+

+--------+
|count(1)|
+--------+
|5888000 |
+--------+

+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name                    |data_type                                                                                                                                    |comment|
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+
|name                        |string                                                                                                                                       |null   |
|age                         |int                                                                                                                                          |null   |
|height                      |double                                                                                                                                       |null   |
|                            |                                                                                                                                             |       |
|# Detailed Table Information|                                                                                                                                             |       |
|Database                    |default                                                                                                                                      |       |
|Table                       |sdkoutputtable                                                                                                                               |       |
|Owner                       |root                                                                                                                                         |       |
|Created                     |Mon Apr 23 01:43:47 PDT 2018                                                                                                                 |       |
|Last Access                 |Wed Dec 31 16:00:00 PST 1969                                                                                                                 |       |
|Type                        |EXTERNAL                                                                                                                                     |       |
|Provider                    |carbonfile                                                                                                                                   |       |
|Table Properties            |[transient_lastDdlTime=1524473027]                                                                                                           |       |
|Location                    |file:/huawei/xubo/git/carbondata1/integration/spark-common-test/src/test/resources/SparkCarbonFileFormat/WriterOutput/Fact/Part0/Segment_null|       |
|Serde Library               |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                                                                                           |       |
|InputFormat                 |org.apache.hadoop.mapred.SequenceFileInputFormat                                                                                             |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat                                                                                    |       |
|Storage Properties          |[serialization.format=1]                                                                                                                     |       |
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+-------+

+-------+---+------+
|name   |age|height|
+-------+---+------+
|robot0 |0  |0.0   |
|robot1 |1  |0.5   |
|robot2 |2  |1.0   |
|robot3 |3  |1.5   |
|robot4 |4  |2.0   |
|robot5 |5  |2.5   |
|robot6 |6  |3.0   |
|robot7 |7  |3.5   |
|robot8 |8  |4.0   |
|robot9 |9  |4.5   |
|robot10|10 |5.0   |
|robot11|11 |5.5   |
|robot12|12 |6.0   |
|robot13|13 |6.5   |
|robot14|14 |7.0   |
|robot15|15 |7.5   |
|robot16|16 |8.0   |
|robot17|17 |8.5   |
|robot18|18 |9.0   |
|robot19|19 |9.5   |
+-------+---+------+
only showing top 20 rows

+------+---+------+
|name  |age|height|
+------+---+------+
|robot0|0  |0.0   |
|robot1|1  |0.5   |
|robot2|2  |1.0   |
+------+---+------+

+-------+
|name   |
+-------+
|robot0 |
|robot1 |
|robot2 |
|robot3 |
|robot4 |
|robot5 |
|robot6 |
|robot7 |
|robot8 |
|robot9 |
|robot10|
|robot11|
|robot12|
|robot13|
|robot14|
|robot15|
|robot16|
|robot17|
|robot18|
|robot19|
+-------+
only showing top 20 rows

+---+
|age|
+---+
|0  |
|1  |
|2  |
|3  |
|4  |
|5  |
|6  |
|7  |
|8  |
|9  |
|10 |
|11 |
|12 |
|13 |
|14 |
|15 |
|16 |
|17 |
|18 |
|19 |
+---+
only showing top 20 rows

+------+---+------+
|name  |age|height|
+------+---+------+
|robot3|3  |1.5   |
|robot4|4  |2.0   |
|robot5|5  |2.5   |
|robot6|6  |3.0   |
|robot7|7  |3.5   |
+------+---+------+

+------+---+------+
|name  |age|height|
+------+---+------+
|robot3|3  |1.5   |
+------+---+------+

+------+---+------+
|name  |age|height|
+------+---+------+
|robot0|0  |0.0   |
|robot1|1  |0.5   |
|robot2|2  |1.0   |
|robot3|3  |1.5   |
|robot4|4  |2.0   |
+------+---+------+

+------+---+------+
|name  |age|height|
+------+---+------+
|robot0|0  |0.0   |
|robot1|1  |0.5   |
+------+---+------+

+-------------+
|sum(age)     |
+-------------+
|1515150959596|
+-------------+

+--------+
|count(1)|
+--------+
|5888000 |
+--------+

+--------+
|count(1)|
+--------+
|5888000 |
+--------+

18/04/23 01:43:56 ERROR Executor: Exception in task 0.0 in stage 32.0 (TID 38)
org.apache.spark.SparkException: Index file not present to read the carbondata file
	at org.apache.spark.sql.SparkCarbonFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(SparkCarbonFileFormat.scala:231)
	at org.apache.spark.sql.SparkCarbonFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(SparkCarbonFileFormat.scala:188)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:174)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
18/04/23 01:43:56 ERROR TaskSetManager: Task 0 in stage 32.0 failed 1 times; aborting job

Process finished with exit code 0
{code}
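The run ends with "Index file not present to read the carbondata file". A first diagnostic is to check whether the segment directory from the Location row actually holds .carbonindex files next to the .carbondata files. This is a hypothetical helper, not part of CarbonData, and it assumes the reader expects at least one index file in the same directory as the data files, which this report does not confirm.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical diagnostic (not a CarbonData API): inspects one segment
// directory, e.g. .../WriterOutput/Fact/Part0/Segment_null from the log.
final class SegmentFileCheck {

    /** Counts regular files in dir whose names end with the given suffix. */
    static long countWithSuffix(Path dir, String suffix) {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(suffix))
                        .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the directory holds .carbondata files but no .carbonindex file. */
    static boolean missingIndex(Path dir) {
        return countWithSuffix(dir, ".carbondata") > 0
                && countWithSuffix(dir, ".carbonindex") == 0;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.printf("%s: %d .carbondata, %d .carbonindex, missing index: %b%n",
                dir,
                countWithSuffix(dir, ".carbondata"),
                countWithSuffix(dir, ".carbonindex"),
                missingIndex(dir));
    }
}
```

Pass the Location path as the first argument. If it reports data files without any index file, the write step is the likely place to look; if index files are present, the failure is more likely in how the reader resolves them.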




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)