Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/07 14:10:27 UTC

[GitHub] [hudi] jardel-lima commented on issue #3879: [SUPPORT] Incomplete Table Migration

jardel-lima commented on issue #3879:
URL: https://github.com/apache/hudi/issues/3879#issuecomment-1007434803


   Hi @nsivabalan. 
   [HERE](https://drive.google.com/file/d/1RsesivvlLUZ9dZh7WbaGJJpnqIcWNbso/view?usp=sharing) is the dataset used to replicate this problem. The file is not public, but I will grant access as soon as you request it.
   
   Here is the code that initiates the Spark session, in case it is useful:
   ```
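   from pyspark.sql import SparkSession

   spark = (
       SparkSession.builder.appName("Hudi_Data_Processing_Framework")
       # Kryo serializer, as used in the Hudi quick start
       .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
       # Read metastore Parquet tables through the Hive SerDe, not Spark's built-in reader
       .config("spark.sql.hive.convertMetastoreParquet", "false")
       # Hudi 0.9.0 bundle for Spark 3 / Scala 2.12
       .config("spark.jars.packages", "org.apache.hudi:hudi-spark3-bundle_2.12:0.9.0")
       .config("spark.executor.memory", "4G")
       .config("spark.executor.cores", "2")
       .enableHiveSupport()
       .getOrCreate()
   )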
   ```
   
   Here is the code used to load this dataset:
   ```
   df = spark.read.load(
       '<<dataset_path>>',
       encoding='utf-8',
       format='com.databricks.spark.csv',
       header=True,
       delimiter=';',
       inferSchema=True,
   )
   ```
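
For anyone who wants to sanity-check the file before involving Spark or Hudi, here is a minimal sketch using only the Python standard library. It assumes the same `;` delimiter and header row as the load call above. The helper name `peek_csv` and the sample rows are hypothetical and are not part of the original job or dataset.

```python
import csv
import io


def peek_csv(source, delimiter=";", max_rows=5):
    """Return (header, rows) from a delimited text source.

    Reads the first line as the header and at most `max_rows` data rows.
    """
    reader = csv.reader(source, delimiter=delimiter)
    header = next(reader, [])  # empty list if the source has no lines
    rows = []
    for row in reader:
        if len(rows) >= max_rows:
            break
        rows.append(row)
    return header, rows


# Hypothetical sample standing in for the real dataset.
sample = io.StringIO("id;name;amount\n1;foo;10.5\n2;bar;20.0\n")
header, rows = peek_csv(sample)
print(header)
print(rows)
```

To run it against a real file, open the file with `open(path, encoding="utf-8", newline="")` and pass the file object to `peek_csv`. All values come back as strings, so this only confirms the delimiter and column layout, not the types that `inferSchema=True` would choose.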
   Sorry for the delay. I hope this helps you identify the problem.

