Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/06 03:13:09 UTC

[GitHub] [hudi] pengzhiwei2018 commented on pull request #2645: [HUDI-1659] Basic Implementation Of Spark Sql Support

pengzhiwei2018 commented on pull request #2645:
URL: https://github.com/apache/hudi/pull/2645#issuecomment-833192518


   Hi @vinothchandar , thanks for working on the tests.
   
   - CREATE TABLE
   > Even if it fails, it ends up creating the table (i.e its not atomic per se)
   
   Yes, CTAS is not atomic at the moment. I can fix this later in this PR.
   
   > When selecting all columns (probably need more tests across data types)
   
   Yes, I will add more tests across data types.
   
   
   > Truncate table
   
   For `Truncate table`, we need to do some work on the Hudi side which is not covered in this PR. I will file another PR to solve this.
   
   
   
   - MergeInto
   >1、 Fails due to assignment field/schema mismatch
   
   Currently, `merge into` does not support partial updates; we must specify all the fields of the target table in the update set assignments.
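   To make the distinction concrete, here is a sketch (the table `h0` and its columns are hypothetical, just for illustration):

   ```sql
   -- Assume a Hudi table h0 with record key `id` and columns (id, name, price, ts).

   -- Fails today: partial assignment, so the assignments do not match the target schema.
   MERGE INTO h0
   USING (SELECT 1 AS id, 'a1' AS name, 10.0 AS price, 1000 AS ts) s0
   ON h0.id = s0.id
   WHEN MATCHED THEN UPDATE SET price = s0.price;

   -- Works: every field of the target table is assigned.
   MERGE INTO h0
   USING (SELECT 1 AS id, 'a1' AS name, 10.0 AS price, 1000 AS ts) s0
   ON h0.id = s0.id
   WHEN MATCHED THEN UPDATE SET id = s0.id, name = s0.name, price = s0.price, ts = s0.ts;
   ```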
   
   >2、 Merges only allowed by PK
   
   Yes, this is currently a limitation of the `merge into` support in this PR, as we discussed in RFC-25. I think we can solve this in another PR.
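   For example (same hypothetical table `h0` with record key `id` as above, and a source table `s0` with the same columns):

   ```sql
   -- Supported: the ON condition matches on the record key field (`id`).
   MERGE INTO h0 USING s0
   ON h0.id = s0.id
   WHEN MATCHED THEN UPDATE SET id = s0.id, name = s0.name, price = s0.price, ts = s0.ts;

   -- Not supported yet: the ON condition uses a non-key field (`name`).
   MERGE INTO h0 USING s0
   ON h0.name = s0.name
   WHEN MATCHED THEN UPDATE SET id = s0.id, name = s0.name, price = s0.price, ts = s0.ts;
   ```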
   
   > 3、Merge not updating to new value
   
   This is the same issue as 1: currently we do not support partial updates.
   
   
   - Delete Table
   
   > Non PK based deletes are not working atm
   
   Currently we do not support delete or update on a non-PK Hudi table. For this case, we can use the `_hoodie_record_key` to identify a record and perform the delete or update. We can file a PR to support this.
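   A sketch of that workaround (the table name is hypothetical; the key value follows Hudi's `column:value` encoding):

   ```sql
   -- Delete a row from a non-PK Hudi table by matching the Hudi
   -- metadata column directly. Note the value is 'id:1' (the column
   -- name is encoded into the record key), not just '1'.
   DELETE FROM h0 WHERE _hoodie_record_key = 'id:1';
   ```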
   
   > Why do we have to encode column name into each record key? i.e _hoodie_record_key = '1' vs being _hoodie_record_key = 'id:1'
   
   This is hoodie's original behavior for `_hoodie_record_key`.
   
   
   - Create or Replace table
   This is not supported in this PR yet. I will file a PR for it.
   
   
   - Create table, partitioned by
   > create table hudi_gh_ext using hudi partitioned by (type) location 'file:///tmp/hudi-gh-ext' as select type, public, payload, repo, actor, org, id, other  from gh_raw
   java.lang.AssertionError: assertion failed
   
   The partition columns must be the last columns in the SELECT list; this is a requirement of Spark SQL for CTAS. So we should move `type` to the end of the select list, like this:
   > create table hudi_gh_ext using hudi partitioned by (type) location 'file:///tmp/hudi-gh-ext' as select public, payload, repo, actor, org, id, other , type from gh_raw
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org