Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/11/12 05:51:35 UTC

[GitHub] [hudi] JoshuaZhuCN opened a new issue #3981: [SUPPORT] If the HUDI table contains only log files, the Spark Datasource cannot obtain data in snapshot mode

JoshuaZhuCN opened a new issue #3981:
URL: https://github.com/apache/hudi/issues/3981


   When the HBase index is used for a Hudi table, only log files are generated in the table directory after initializing the data with insert or upsert; there is no parquet file.
   At this point, a snapshot read with the Spark datasource returns no data; the data can only be obtained through an incremental read.
   According to the official documentation, a snapshot read of a Hudi table merges the parquet base files with the log files, so why can't the data be read when there are only log files?
   Is this a bug?
   
   
   **Steps to reproduce the behavior:**
   
   1. Create a Hudi table with the HBase index
   2. Use insert or upsert to initialize the data (a minimal write sketch follows this list)
   3. Check whether there are only log files in the Hudi table directory
   4. Read the data using snapshot mode and incremental mode respectively
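
   A minimal sketch of steps 1 and 2 (the table name, path, and ZooKeeper settings are placeholders; the full write configuration I used appears in a later comment):

   ```
   import org.apache.spark.sql.{DataFrame, SaveMode}

   // Write a small DataFrame as a MERGE_ON_READ table with the HBase index (steps 1 and 2).
   // "df" is any DataFrame containing an "id" column; host, port and paths are placeholders.
   def initializeTable(df: DataFrame): Unit = {
       df.write.format("hudi")
           .option("hoodie.table.name", "tb_hbase_test")
           .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
           .option("hoodie.datasource.write.operation", "upsert")
           .option("hoodie.datasource.write.recordkey.field", "id")
           .option("hoodie.datasource.write.precombine.field", "id")
           .option("hoodie.index.type", "HBASE")
           .option("hoodie.index.hbase.zkquorum", "127.0.0.1")
           .option("hoodie.index.hbase.zkport", "2181")
           .option("hoodie.index.hbase.zknode.path", "/hbase")
           .option("hoodie.index.hbase.table", "hudi:tb_hbase_test")
           .mode(SaveMode.Overwrite)
           .save("hdfs://localhost:9000/hoodie/tb_hbase_test")
   }
   ```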
   
   
   **Environment Description**
   
   * Hudi version : 0.9.0
   * Spark version : 2.4.7
   * Hive version : ~
   * Hadoop version : 3.1.1
   * Storage (HDFS/S3/GCS..) : HDFS
   * Running on Docker? (yes/no) : no
   
   **Stacktrace**
   
    ```
    import org.apache.hudi.DataSourceReadOptions
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    val conf = new SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.setMaster("local[2]")

    val spark = SparkSession
        .builder()
        .config(conf)
        .getOrCreate()

    println("=================Snapshot Read===============")
    // Snapshot query: expected to merge the parquet base files with the log files
    val dfSnapshot = spark
        .read
        .format("hudi")
        .option(DataSourceReadOptions.QUERY_TYPE.key(), DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
        .load("C:\\hudi_data\\baikal\\oms_order_info_hbase\\default\\*")
    dfSnapshot.show(1, false)
    println("================================================")
    println("")
    println("=================Incremental Read===============")
    // Incremental query from an instant earlier than any commit
    val dfIncremental = spark
        .read
        .format("hudi")
        .option(DataSourceReadOptions.QUERY_TYPE.key(), DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
        .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key(), "19800101000000")
        .load("C:\\hudi_data\\baikal\\oms_order_info_hbase\\default\\*")
    dfIncremental.show(1, false)
    println("================================================")
    spark.close()
    ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] JoshuaZhuCN edited a comment on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
JoshuaZhuCN edited a comment on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-967144402


   > > after initializing the data with insert or upsert, only log files will be generated in the directory, and there is no parquet file.
   > 
   > This sounds very strange. When you insert into a new, empty table, it's not meant to create log files; it should only write parquet. Only subsequent updates result in log files. You mentioned the HBase index. With your code and data, can you configure it to run the data generation with the SIMPLE index, just to see any difference? I wanted to rule out whether the HBase index is the problem here.
   
   @xushiyan I have tried insert and upsert with the (GLOBAL_)SIMPLE and (GLOBAL_)BLOOM indexes, and both generate parquet files, but insert and upsert with the HBASE index generate only log files. A parquet file is generated only with bulk_insert.
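
   A sketch of a single comparison run: the same write of the small test DataFrame `df`, varying only the index type and the write operation. The table name and path are placeholders, and the HBASE runs additionally need the ZooKeeper options from my full test code below:

   ```
   // Only these two values change between the comparison runs.
   val indexType = "SIMPLE"   // SIMPLE, GLOBAL_SIMPLE, BLOOM, GLOBAL_BLOOM or HBASE
   val operation = "upsert"   // insert, upsert or bulk_insert
   df.write.format("hudi")
       .option("hoodie.table.name", "tb_index_test")
       .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
       .option("hoodie.datasource.write.operation", operation)
       .option("hoodie.datasource.write.recordkey.field", "id")
       .option("hoodie.datasource.write.precombine.field", "id")
       .option("hoodie.index.type", indexType)
       .mode(org.apache.spark.sql.SaveMode.Overwrite)
       .save(s"hdfs://localhost:9000/hoodie/tb_index_test_${indexType.toLowerCase}")
   ```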





[GitHub] [hudi] nsivabalan closed issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #3981:
URL: https://github.com/apache/hudi/issues/3981


   





[GitHub] [hudi] JoshuaZhuCN commented on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
JoshuaZhuCN commented on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-991898766


   > Thanks for confirming this. Can we close it out if you don't have more questions?
   
   This issue can be closed; thanks for clearing up my doubts.





[GitHub] [hudi] JoshuaZhuCN edited a comment on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
JoshuaZhuCN edited a comment on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-971393339


   @xushiyan Hi, here is my test code:
   
   
   ```
   import com.leqee.sparktool.date.DateUtil
   import com.leqee.sparktool.hoodie.HoodieProp
   import com.leqee.sparktool.spark.SparkTool
   import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, DataSourceOptionsHelper}
   import org.apache.spark.SparkConf
   import org.apache.spark.sql.{Row, SaveMode, SparkSession}
   import org.apache.spark.sql.functions.{col, lit}
   import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType, TimestampType}
   import org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER
   import org.apache.hudi.common.config.HoodieMetadataConfig
   import org.apache.hudi.common.model.HoodieCleaningPolicy
   import org.apache.hudi.config._
   import org.apache.hudi.index.HoodieIndex
   import org.apache.hudi.keygen.constant.KeyGeneratorOptions
   
   object Test4 {
       def main(args: Array[String]): Unit = {
           val AD_DATE = "1980-01-01 00:00:00"
           val spark = SparkSession
               .builder()
               .config(
                   new SparkConf()
                       .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                        .set("spark.master", "local[2]")
               )
               .getOrCreate()
   
           val data = Seq(
               Row(1, "A", 10, DateUtil.now()),
               Row(2, "B", 20, DateUtil.now()),
               Row(3, "C", 30, DateUtil.now()))
   
           val schema = StructType(List(
               StructField("id", IntegerType),
               StructField("name", StringType),
               StructField("age", IntegerType),
               StructField("dt", StringType)))
   
           val df = spark.createDataFrame(spark.sparkContext.makeRDD(data), schema)
   
           df.show(false)
   
           df.write.format("org.apache.hudi")
               .option(DataSourceWriteOptions.RECORDKEY_FIELD.key(), "id")
               .option(DataSourceWriteOptions.PRECOMBINE_FIELD.key(), "id")
               .option(HoodieWriteConfig.TBL_NAME.key(), "tb_hbase_test")
               .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
               .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
               .option("hoodie.index.type", "HBASE")
               .option("hoodie.index.hbase.zkport", "2181")
               .option("hoodie.hbase.index.update.partition.path", "true")
               .option("hoodie.index.hbase.max.qps.fraction", "10000")
               .option("hoodie.index.hbase.min.qps.fraction", "1000")
               .option("hoodie.index.hbase.table", "hudi:tb_hbase_test")
               .option("hoodie.index.hbase.zknode.path", "/hbase")
               .option("hoodie.index.hbase.get.batch.size", "1000")
               .option("hoodie.index.hbase.zkquorum", "127.0.0.1")
               .option("hoodie.index.hbase.sleep.ms.for.get.batch", "100")
               .option("hoodie.index.hbase.sleep.ms.for.get.batch", "10")
               .option("hoodie.index.hbase.max.qps.per.region.server", "1000")
               .option("hoodie.index.hbase.zk.session_timeout_ms", "5000")
               .option("hoodie.index.hbase.desired_puts_time_in_secs", "3600")
               .option(HoodieProp.INDEX_HBASE_ZKPORT_PROP, HoodieProp.INDEX_HBASE_ZKPORT_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_UPDATE_PARTITION_PATH_ENABLE_PROP, HoodieProp.INDEX_HBASE_UPDATE_PARTITION_PATH_ENABLE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_QPS_ALLOCATOR_CLASS_NAME_PROP, HoodieProp.INDEX_HBASE_QPS_ALLOCATOR_CLASS_NAME_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_AUTO_COMPUTE_PROP, HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_AUTO_COMPUTE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ROLLBACK_SYNC_ENABLE_PROP, HoodieProp.INDEX_HBASE_ROLLBACK_SYNC_ENABLE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_GET_BATCH_SIZE_PROP, HoodieProp.INDEX_HBASE_GET_BATCH_SIZE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ZKPATH_QPS_ROOT_PROP, HoodieProp.INDEX_HBASE_ZKPATH_QPS_ROOT_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_MAX_QPS_PER_REGION_SERVER_PROP, HoodieProp.INDEX_HBASE_MAX_QPS_PER_REGION_SERVER_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_MAX_QPS_FRACTION_PROP, HoodieProp.INDEX_HABSE_MAX_QPS_FRACTION_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HABSE_MIN_QPS_FRACTION_PROP, HoodieProp.INDEX_HBASE_MIN_QPS_FRACTION_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ZK_CONNECTION_TIMEOUT_MS_PROP, HoodieProp.INDEX_HBASE_ZK_CONNECTION_TIMEOUT_MS_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_COMPUTE_QPS_DYNAMICALLY_PROP, HoodieProp.INDEX_HBASE_COMPUTE_QPS_DYNAMICALLY_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_QPS_FRACTION_PROP, HoodieProp.INDEX_HBASE_QPS_FRACTION_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ZK_SESSION_TIMEOUT_MS_PROP, HoodieProp.INDEX_HBASE_ZK_SESSION_TIMEOUT_MS_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_PROP, HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_DESIRED_PUTS_TIME_IN_SECONDS_PROP, HoodieProp.INDEX_HBASE_DESIRED_PUTS_TIME_IN_SECONDS_VALUE_DEFAULT)
               .mode(SaveMode.Overwrite)
               .save("hdfs://localhost:9000/hoodie/tb_hbase_test")
   
           println("===================Snapshot Read==============================")
           spark.read
               .format("hudi")
               .option("mergeSchema", "true")
               .option(DataSourceReadOptions.QUERY_TYPE.key(), DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
               .load("hdfs://localhost:9000/hoodie/tb_hbase_test/default/*")
               .show(false)
           println("==============================================================")
           println("===================Incremental Read==============================")
           spark.read
               .format("hudi")
               .option("mergeSchema", "true")
               .option(DataSourceReadOptions.QUERY_TYPE.key(), DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
               .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key(), AD_DATE)
               .load("hdfs://localhost:9000/hoodie/tb_hbase_test/default/*")
               .show(false)
           println("==============================================================")
           
           spark.close()
       }
   }
   
   ```
   
   **The output is as follows:**
   ```
   
   Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
   +---+----+---+-----------------------+
   |id |name|age|dt                     |
   +---+----+---+-----------------------+
   |1  |A   |10 |2021-11-17 17:21:40.508|
   |2  |B   |20 |2021-11-17 17:21:40.508|
   |3  |C   |30 |2021-11-17 17:21:40.508|
   +---+----+---+-----------------------+
   
   ===================Snapshot Read==============================
   +-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+---+
   |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name|id |name|age|dt |
   +-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+---+
   +-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+---+
   
   ==============================================================
   ===================Incremental Read==============================
   +-------------------+--------------------+------------------+----------------------+--------------------------------------+---+----+---+-----------------------+
   |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                     |id |name|age|dt                     |
   +-------------------+--------------------+------------------+----------------------+--------------------------------------+---+----+---+-----------------------+
   |20211117172143     |20211117172143_0_1  |1                 |default               |f66c7e31-56ac-49cc-b147-9a1f27908c7f-0|1  |A   |10 |2021-11-17 17:21:40.508|
   |20211117172143     |20211117172143_0_2  |2                 |default               |f66c7e31-56ac-49cc-b147-9a1f27908c7f-0|2  |B   |20 |2021-11-17 17:21:40.508|
   |20211117172143     |20211117172143_0_3  |3                 |default               |f66c7e31-56ac-49cc-b147-9a1f27908c7f-0|3  |C   |30 |2021-11-17 17:21:40.508|
   +-------------------+--------------------+------------------+----------------------+--------------------------------------+---+----+---+-----------------------+
   
   ==============================================================
   
   
   ```





[GitHub] [hudi] nsivabalan edited a comment on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-967310667


   The HBase index is capable of routing inserts to log files, so that's not a bug for sure.
   Can you clarify something please:
   - Did you say that after the initial insert/upsert you see only log files and no parquet files, for all file groups?
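
   For example, something like this (a sketch, assuming an active SparkSession `spark`; the path is the partition path from the test code in this thread):

   ```
   import org.apache.hadoop.fs.{FileSystem, Path}

   // List the partition directory and check whether any parquet base files exist
   // alongside the log files. The path below is illustrative.
   val partitionPath = new Path("hdfs://localhost:9000/hoodie/tb_hbase_test/default")
   val fs = FileSystem.get(partitionPath.toUri, spark.sparkContext.hadoopConfiguration)
   val names = fs.listStatus(partitionPath).map(_.getPath.getName)
   println("parquet base files: " + names.filter(_.endsWith(".parquet")).mkString(", "))
   println("log files: " + names.filter(_.contains(".log.")).mkString(", "))
   ```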
   
   @n3nash @satishkotha @nbalajee : can you please chime in here. 
   





[GitHub] [hudi] xiarixiaoyao commented on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-977595302


   @JoshuaZhuCN Spark should support reading a pure-log table.
   When you specify the load path, do not use wildcards; just specify the path at the table level.
   load("hdfs://localhost:9000/hoodie/tb_hbase_test/default/*") -> load("hdfs://localhost:9000/hoodie/tb_hbase_test")





[GitHub] [hudi] xushiyan commented on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-966940677


   > after initializing the data with insert or upsert, only log files will be generated in the directory, and there is no parquet file.
   
   This sounds very strange. When you insert into a new, empty table, it's not meant to create log files; it should only write parquet. Only subsequent updates result in log files. You mentioned the HBase index. With your code and data, can you configure it to run the data generation with the SIMPLE index, just to see any difference? I wanted to rule out whether the HBase index is the problem here.





[GitHub] [hudi] nsivabalan commented on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-967310667


   The HBase index is capable of routing inserts to log files, so that's not a bug for sure.
   Can you clarify something please:
   - Did you say that after the initial insert/upsert you see only log files and no parquet files, for all file groups?
   
   @n3nash : can you please chime in here. 
   





[GitHub] [hudi] JoshuaZhuCN commented on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
JoshuaZhuCN commented on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-968432976


   > The HBase index is capable of routing inserts to log files, so that's not a bug for sure.
   > Can you clarify something please:
   
   @nsivabalan I mean that when I use the HBase index I cannot read data in snapshot mode, only in incremental mode. Is that a problem, or am I misinterpreting snapshot mode (merging parquet files and log files)?





[GitHub] [hudi] JoshuaZhuCN commented on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
JoshuaZhuCN commented on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-971393339


   @xushiyan Hi, here is my test code:
   
   ```
   import com.leqee.sparktool.date.DateUtil
   import com.leqee.sparktool.hoodie.HoodieProp
   import com.leqee.sparktool.spark.SparkTool
   import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, DataSourceOptionsHelper}
   import org.apache.spark.SparkConf
   import org.apache.spark.sql.{Row, SaveMode, SparkSession}
   import org.apache.spark.sql.functions.{col, lit}
   import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType, TimestampType}
   import org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER
   import org.apache.hudi.common.config.HoodieMetadataConfig
   import org.apache.hudi.common.model.HoodieCleaningPolicy
   import org.apache.hudi.config._
   import org.apache.hudi.index.HoodieIndex
   import org.apache.hudi.keygen.constant.KeyGeneratorOptions
   
   object Test4 {
       def main(args: Array[String]): Unit = {
           val AD_DATE = "1980-01-01 00:00:00"
           val spark = SparkSession
               .builder()
               .config(
                   new SparkConf()
                       .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                        .set("spark.master", "local[2]")
               )
               .getOrCreate()
           //val hudi = HoodieBuilder.apply().withSparkSession(spark).withHoodieName("leqee").build()
   
           val data = Seq(
               Row(1, "A", 10, DateUtil.now()),
               Row(2, "B", 20, DateUtil.now()),
               Row(3, "C", 30, DateUtil.now()))
   
           val schema = StructType(List(
               StructField("id", IntegerType),
               StructField("name", StringType),
               StructField("age", IntegerType),
               StructField("dt", StringType)))
   
           val df = spark.createDataFrame(spark.sparkContext.makeRDD(data), schema)
   
           df.show(false)
   
           df.write.format("org.apache.hudi")
               .option(DataSourceWriteOptions.RECORDKEY_FIELD.key(), "id")
               .option(DataSourceWriteOptions.PRECOMBINE_FIELD.key(), "id")
               .option(HoodieWriteConfig.TBL_NAME.key(), "tb_hbase_test")
               .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
               .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
               .option("hoodie.index.type", "HBASE")
               .option("hoodie.index.hbase.zkport", "2181")
               .option("hoodie.hbase.index.update.partition.path", "true")
               .option("hoodie.index.hbase.max.qps.fraction", "10000")
               .option("hoodie.index.hbase.min.qps.fraction", "1000")
               .option("hoodie.index.hbase.table", "hudi:tb_hbase_test")
               .option("hoodie.index.hbase.zknode.path", "/hbase")
               .option("hoodie.index.hbase.get.batch.size", "1000")
               .option("hoodie.index.hbase.zkquorum", "127.0.0.1")
               .option("hoodie.index.hbase.sleep.ms.for.get.batch", "100")
               .option("hoodie.index.hbase.sleep.ms.for.get.batch", "10")
               .option("hoodie.index.hbase.max.qps.per.region.server", "1000")
               .option("hoodie.index.hbase.zk.session_timeout_ms", "5000")
               .option("hoodie.index.hbase.desired_puts_time_in_secs", "3600")
               .option(HoodieProp.INDEX_HBASE_ZKPORT_PROP, HoodieProp.INDEX_HBASE_ZKPORT_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_UPDATE_PARTITION_PATH_ENABLE_PROP, HoodieProp.INDEX_HBASE_UPDATE_PARTITION_PATH_ENABLE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_QPS_ALLOCATOR_CLASS_NAME_PROP, HoodieProp.INDEX_HBASE_QPS_ALLOCATOR_CLASS_NAME_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_AUTO_COMPUTE_PROP, HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_AUTO_COMPUTE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ROLLBACK_SYNC_ENABLE_PROP, HoodieProp.INDEX_HBASE_ROLLBACK_SYNC_ENABLE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_GET_BATCH_SIZE_PROP, HoodieProp.INDEX_HBASE_GET_BATCH_SIZE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ZKPATH_QPS_ROOT_PROP, HoodieProp.INDEX_HBASE_ZKPATH_QPS_ROOT_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_MAX_QPS_PER_REGION_SERVER_PROP, HoodieProp.INDEX_HBASE_MAX_QPS_PER_REGION_SERVER_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_MAX_QPS_FRACTION_PROP, HoodieProp.INDEX_HABSE_MAX_QPS_FRACTION_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HABSE_MIN_QPS_FRACTION_PROP, HoodieProp.INDEX_HBASE_MIN_QPS_FRACTION_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ZK_CONNECTION_TIMEOUT_MS_PROP, HoodieProp.INDEX_HBASE_ZK_CONNECTION_TIMEOUT_MS_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_COMPUTE_QPS_DYNAMICALLY_PROP, HoodieProp.INDEX_HBASE_COMPUTE_QPS_DYNAMICALLY_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_QPS_FRACTION_PROP, HoodieProp.INDEX_HBASE_QPS_FRACTION_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ZK_SESSION_TIMEOUT_MS_PROP, HoodieProp.INDEX_HBASE_ZK_SESSION_TIMEOUT_MS_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_PROP, HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_DESIRED_PUTS_TIME_IN_SECONDS_PROP, HoodieProp.INDEX_HBASE_DESIRED_PUTS_TIME_IN_SECONDS_VALUE_DEFAULT)
               .mode(SaveMode.Overwrite)
               .save("hdfs://localhost:9000/hoodie/tb_hbase_test")
   
           println("===================Snapshot Read==============================")
           spark.read
               .format("hudi")
               .option("mergeSchema", "true")
               .option(DataSourceReadOptions.QUERY_TYPE.key(), DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
               .load("hdfs://localhost:9000/hoodie/tb_hbase_test/default/*")
               .show(false)
           println("==============================================================")
           println("===================Incremental Read==============================")
           spark.read
               .format("hudi")
               .option("mergeSchema", "true")
               .option(DataSourceReadOptions.QUERY_TYPE.key(), DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
               .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key(), AD_DATE)
               .load("hdfs://localhost:9000/hoodie/tb_hbase_test/default/*")
               .show(false)
           println("==============================================================")
           
           spark.close()
       }
   }
   
   ```
   
   **The output is as follows:**
   ```
    [IntelliJ java launch command and full local Maven classpath omitted; it runs com.leqee.ontariosync.function.Test4 with, among others, hudi-spark-bundle_2.11-0.9.0, spark-*_2.11-2.4.7, hadoop-*-3.1.1 and hive-*-2.3.8 jars]
   Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
   +---+----+---+-----------------------+
   |id |name|age|dt                     |
   +---+----+---+-----------------------+
   |1  |A   |10 |2021-11-17 17:21:40.508|
   |2  |B   |20 |2021-11-17 17:21:40.508|
   |3  |C   |30 |2021-11-17 17:21:40.508|
   +---+----+---+-----------------------+
   
   ===================Snapshot Read==============================
   +-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+---+
   |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name|id |name|age|dt |
   +-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+---+
   +-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+---+
   
   ==============================================================
   ===================Incremental Read==============================
   +-------------------+--------------------+------------------+----------------------+--------------------------------------+---+----+---+-----------------------+
   |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                     |id |name|age|dt                     |
   +-------------------+--------------------+------------------+----------------------+--------------------------------------+---+----+---+-----------------------+
   |20211117172143     |20211117172143_0_1  |1                 |default               |f66c7e31-56ac-49cc-b147-9a1f27908c7f-0|1  |A   |10 |2021-11-17 17:21:40.508|
   |20211117172143     |20211117172143_0_2  |2                 |default               |f66c7e31-56ac-49cc-b147-9a1f27908c7f-0|2  |B   |20 |2021-11-17 17:21:40.508|
   |20211117172143     |20211117172143_0_3  |3                 |default               |f66c7e31-56ac-49cc-b147-9a1f27908c7f-0|3  |C   |30 |2021-11-17 17:21:40.508|
   +-------------------+--------------------+------------------+----------------------+--------------------------------------+---+----+---+-----------------------+
   
   ==============================================================
   
   
   ```
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-997099421


   thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-991846087


   Thanks for confirming this. Can we close it out if you don't have more questions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] JoshuaZhuCN edited a comment on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
JoshuaZhuCN edited a comment on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-971393339


   @xushiyan Hi, here is my test code:
   
   
   ```
   import com.leqee.sparktool.date.DateUtil
   import com.leqee.sparktool.hoodie.HoodieProp
   import com.leqee.sparktool.spark.SparkTool
   import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, DataSourceOptionsHelper}
   import org.apache.spark.SparkConf
   import org.apache.spark.sql.{Row, SaveMode, SparkSession}
   import org.apache.spark.sql.functions.{col, lit}
   import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType, TimestampType}
   import org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER
   import org.apache.hudi.common.config.HoodieMetadataConfig
   import org.apache.hudi.common.model.HoodieCleaningPolicy
   import org.apache.hudi.config._
   import org.apache.hudi.index.HoodieIndex
   import org.apache.hudi.keygen.constant.KeyGeneratorOptions
   
   object Test4 {
       def main(args: Array[String]): Unit = {
           val AD_DATE = "1980-01-01 00:00:00"
           val spark = SparkSession
               .builder()
               .config(
                   new SparkConf()
                       .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                        .set("spark.master", "local[2]")
               )
               .getOrCreate()
   
           val data = Seq(
               Row(1, "A", 10, DateUtil.now()),
               Row(2, "B", 20, DateUtil.now()),
               Row(3, "C", 30, DateUtil.now()))
   
           val schema = StructType(List(
               StructField("id", IntegerType),
               StructField("name", StringType),
               StructField("age", IntegerType),
               StructField("dt", StringType)))
   
           val df = spark.createDataFrame(spark.sparkContext.makeRDD(data), schema)
   
           df.show(false)
   
           df.write.format("org.apache.hudi")
               .option(DataSourceWriteOptions.RECORDKEY_FIELD.key(), "id")
               .option(DataSourceWriteOptions.PRECOMBINE_FIELD.key(), "id")
               .option(HoodieWriteConfig.TBL_NAME.key(), "tb_hbase_test")
               .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
               .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
               .option("hoodie.index.type", "HBASE")
               .option("hoodie.index.hbase.zkport", "2181")
               .option("hoodie.hbase.index.update.partition.path", "true")
               .option("hoodie.index.hbase.max.qps.fraction", "10000")
               .option("hoodie.index.hbase.min.qps.fraction", "1000")
               .option("hoodie.index.hbase.table", "hudi:tb_hbase_test")
               .option("hoodie.index.hbase.zknode.path", "/hbase")
               .option("hoodie.index.hbase.get.batch.size", "1000")
               .option("hoodie.index.hbase.zkquorum", "127.0.0.1")
               .option("hoodie.index.hbase.sleep.ms.for.get.batch", "100")
               .option("hoodie.index.hbase.sleep.ms.for.get.batch", "10")
               .option("hoodie.index.hbase.max.qps.per.region.server", "1000")
               .option("hoodie.index.hbase.zk.session_timeout_ms", "5000")
               .option("hoodie.index.hbase.desired_puts_time_in_secs", "3600")
               .option(HoodieProp.INDEX_HBASE_ZKPORT_PROP, HoodieProp.INDEX_HBASE_ZKPORT_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_UPDATE_PARTITION_PATH_ENABLE_PROP, HoodieProp.INDEX_HBASE_UPDATE_PARTITION_PATH_ENABLE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_QPS_ALLOCATOR_CLASS_NAME_PROP, HoodieProp.INDEX_HBASE_QPS_ALLOCATOR_CLASS_NAME_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_AUTO_COMPUTE_PROP, HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_AUTO_COMPUTE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ROLLBACK_SYNC_ENABLE_PROP, HoodieProp.INDEX_HBASE_ROLLBACK_SYNC_ENABLE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_GET_BATCH_SIZE_PROP, HoodieProp.INDEX_HBASE_GET_BATCH_SIZE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ZKPATH_QPS_ROOT_PROP, HoodieProp.INDEX_HBASE_ZKPATH_QPS_ROOT_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_MAX_QPS_PER_REGION_SERVER_PROP, HoodieProp.INDEX_HBASE_MAX_QPS_PER_REGION_SERVER_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_MAX_QPS_FRACTION_PROP, HoodieProp.INDEX_HABSE_MAX_QPS_FRACTION_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HABSE_MIN_QPS_FRACTION_PROP, HoodieProp.INDEX_HBASE_MIN_QPS_FRACTION_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ZK_CONNECTION_TIMEOUT_MS_PROP, HoodieProp.INDEX_HBASE_ZK_CONNECTION_TIMEOUT_MS_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_COMPUTE_QPS_DYNAMICALLY_PROP, HoodieProp.INDEX_HBASE_COMPUTE_QPS_DYNAMICALLY_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_QPS_FRACTION_PROP, HoodieProp.INDEX_HBASE_QPS_FRACTION_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ZK_SESSION_TIMEOUT_MS_PROP, HoodieProp.INDEX_HBASE_ZK_SESSION_TIMEOUT_MS_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_PROP, HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_DESIRED_PUTS_TIME_IN_SECONDS_PROP, HoodieProp.INDEX_HBASE_DESIRED_PUTS_TIME_IN_SECONDS_VALUE_DEFAULT)
               .mode(SaveMode.Overwrite)
               .save("hdfs://localhost:9000/hoodie/tb_hbase_test")
   
           println("===================Snapshot Read==============================")
           spark.read
               .format("hudi")
               .option("mergeSchema", "true")
               .option(DataSourceReadOptions.QUERY_TYPE.key(), DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
               .load("hdfs://localhost:9000/hoodie/tb_hbase_test/default/*")
               .show(false)
           println("==============================================================")
           println("===================Incremental Read==============================")
           spark.read
               .format("hudi")
               .option("mergeSchema", "true")
               .option(DataSourceReadOptions.QUERY_TYPE.key(), DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL)
               .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key(), AD_DATE)
               .load("hdfs://localhost:9000/hoodie/tb_hbase_test/default/*")
               .show(false)
           println("==============================================================")
           
           spark.close()
       }
   }
   
   ```
   
   **The output is as follows:**
   ```
   com.leqee.ontariosync.function.Test4
   Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
   +---+----+---+-----------------------+
   |id |name|age|dt                     |
   +---+----+---+-----------------------+
   |1  |A   |10 |2021-11-17 17:21:40.508|
   |2  |B   |20 |2021-11-17 17:21:40.508|
   |3  |C   |30 |2021-11-17 17:21:40.508|
   +---+----+---+-----------------------+
   
   ===================Snapshot Read==============================
   +-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+---+
   |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name|id |name|age|dt |
   +-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+---+
   +-------------------+--------------------+------------------+----------------------+-----------------+---+----+---+---+
   
   ==============================================================
   ===================Incremental Read==============================
   +-------------------+--------------------+------------------+----------------------+--------------------------------------+---+----+---+-----------------------+
   |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                     |id |name|age|dt                     |
   +-------------------+--------------------+------------------+----------------------+--------------------------------------+---+----+---+-----------------------+
   |20211117172143     |20211117172143_0_1  |1                 |default               |f66c7e31-56ac-49cc-b147-9a1f27908c7f-0|1  |A   |10 |2021-11-17 17:21:40.508|
   |20211117172143     |20211117172143_0_2  |2                 |default               |f66c7e31-56ac-49cc-b147-9a1f27908c7f-0|2  |B   |20 |2021-11-17 17:21:40.508|
   |20211117172143     |20211117172143_0_3  |3                 |default               |f66c7e31-56ac-49cc-b147-9a1f27908c7f-0|3  |C   |30 |2021-11-17 17:21:40.508|
   +-------------------+--------------------+------------------+----------------------+--------------------------------------+---+----+---+-----------------------+
   
   ==============================================================
   
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] JoshuaZhuCN commented on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
JoshuaZhuCN commented on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-967144402


   > > after initializing the data with insert or upsert, only log files will be generated in the directory, and there is no parquet file.
   > 
   > This sounds very strange. When you insert into a new, empty table, it is not meant to create log files, only parquet; only subsequent updates result in log files. You mentioned the HBase index. With your code and data, can you configure it to run the data generation with the SIMPLE index, just to see any difference? I wanted to rule out whether the HBase index is the problem here.
   
   @xushiyan I have tried insert and upsert with the (GLOBAL) SIMPLE and (GLOBAL) BLOOM indexes, which generate parquet files rather than log files. With the HBASE index, however, insert and upsert generate only log files; a parquet file is produced only with bulk_insert.
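   
   As a sanity check, here is a minimal sketch of how the file layout can be verified (it assumes the SparkSession `spark` and the HDFS paths from the repro earlier in this thread; adjust both for your environment):
   
   ```
   // Count base (.parquet) files vs log (.log.*) files in the partition directory,
   // to confirm that insert/upsert with the HBASE index produced only log files.
   import org.apache.hadoop.fs.{FileSystem, Path}
   
   val partitionPath = new Path("hdfs://localhost:9000/hoodie/tb_hbase_test/default")
   val fs = FileSystem.get(partitionPath.toUri, spark.sparkContext.hadoopConfiguration)
   val names = fs.listStatus(partitionPath).map(_.getPath.getName)
   
   val parquetCount = names.count(_.endsWith(".parquet"))
   val logCount = names.count(_.contains(".log."))
   println(s"parquet base files: $parquetCount, log files: $logCount")
   ```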


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] JoshuaZhuCN commented on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

Posted by GitBox <gi...@apache.org>.
JoshuaZhuCN commented on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-979846358


   > @JoshuaZhuCN spark should support reading a pure log table. When you specify the load path, do not use wildcards; just specify the path at the table level: load("hdfs://localhost:9000/hoodie/tb_hbase_test/default/*") -> load("hdfs://localhost:9000/hoodie/tb_hbase_test")
   
   @xiarixiaoyao 
   Thanks for the idea. I re-tested this scenario:
   
   1. Using the old read logic (loading the path with a wildcard), the pure-log table cannot be read.
   
   2. Using the new read logic (loading the table base path without a wildcard), the pure-log table can be read, as sketched below (note that DataSourceWriteOptions.PARTITIONPATH_FIELD must be set when writing, otherwise an error is thrown).
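   
   A minimal sketch of the two load styles, using the paths from my repro (the behaviour noted in the comments is what I observed with Hudi 0.9.0 and an existing SparkSession `spark`; treat it as an assumption for other versions):
   
   ```
   import org.apache.hudi.DataSourceReadOptions
   
   // Old-style glob path: for a table whose file groups contain only log files,
   // this snapshot query came back empty in my test.
   val viaGlob = spark.read.format("hudi")
     .option(DataSourceReadOptions.QUERY_TYPE.key(), DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
     .load("hdfs://localhost:9000/hoodie/tb_hbase_test/default/*")
   
   // Table base path, no wildcard: reads the pure-log file groups, provided
   // hoodie.datasource.write.partitionpath.field was set when the table was written.
   val viaBasePath = spark.read.format("hudi")
     .option(DataSourceReadOptions.QUERY_TYPE.key(), DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
     .load("hdfs://localhost:9000/hoodie/tb_hbase_test")
   
   println(viaGlob.count())      // 0 in my test
   println(viaBasePath.count())  // 3 in my test
   ```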


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org