Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/01/14 04:14:04 UTC

[GitHub] [incubator-hudi] nsivabalan opened a new pull request #1225: Adding util methods to assist in adding deletion support to Quick Start

nsivabalan opened a new pull request #1225: Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225
 
 
   Adding util methods to assist in adding deletion support to Quick Start
   
   ## Verify this pull request
   
   Latest master has issues with the Spark Avro dependency, so I couldn't verify this. However, this is not production code; it is only used in the Quick Start.
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574922819
 
 
   @nsivabalan please squash your commits into one with a clear commit message before merging :)


[GitHub] [incubator-hudi] nsivabalan commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-575924333
 
 
   @vinothchandar : yeah, I thought the person merging the PR would squash. I didn't know that the author of the PR is expected to squash.


[GitHub] [incubator-hudi] vinothchandar commented on issue #1225: Adding util methods to assist in adding deletion support to Quick Start

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1225: Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574005188
 
 
   @nsivabalan can you add a `[MINOR]` prefix to your commit and PR? 


[GitHub] [incubator-hudi] bhasudha commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574024580
 
 
   @nsivabalan I will merge this once you are able to verify this method with the Quick Start steps.


[GitHub] [incubator-hudi] nsivabalan commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574834841
 
 
   @bhasudha : done. 


[GitHub] [incubator-hudi] nsivabalan commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574492697
 
 
   @bhasudha : I have changed the way we generate deletes. I now pass in the insert records for which delete records should be generated. With the previous approach of generating random deletes, I couldn't verify whether the deletes actually removed any records, so I have taken this approach instead.
   
   The steps I plan to add to the Quick Start are as follows:
   
   - Generate a new batch of inserts.
   - Fetch all records from this new batch. Adjust the rider value below to match the batch, since each batch has a unique rider value.
     val ds = spark.sql("select uuid, partitionPath from hudi_ro_table where rider = 'rider-213'")
   - Generate delete records for the fetched rows.
     val deletes = dataGen.generateDeletes(ds.collectAsList())
   - Issue the deletes.
     val df = spark.read.json(spark.sparkContext.parallelize(deletes, 2))
     df.write.format("org.apache.hudi").
         options(getQuickstartWriteConfigs).
         option(OPERATION_OPT_KEY, "delete").
         option(PRECOMBINE_FIELD_OPT_KEY, "ts").
         option(RECORDKEY_FIELD_OPT_KEY, "uuid").
         option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
         option(TABLE_NAME, tableName).
         mode(Append).
         save(basePath)
   - The same select query as above should now return 0 records, since all records from the batch have been deleted.
     spark.sql("select uuid, partitionPath from hudi_ro_table where rider = 'rider-213'").count()
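
The verification idea in the steps above (delete exactly the records that were just inserted, then check that the same filter returns nothing) can be sketched without Spark or Hudi. This is a minimal in-memory illustration, not the code from the PR: `Trip`, `deleteKeysFor`, and `applyDeletes` are hypothetical names, and a plain `Seq` stands in for the table.

```scala
// Hypothetical stand-in for a table row; this is not a Hudi class.
final case class Trip(uuid: String, partitionpath: String, rider: String)

object DeleteSketch {
  // Hypothetical helper: derive one (record key, partition path) delete key
  // per inserted record, instead of generating random keys.
  def deleteKeysFor(inserted: Seq[Trip]): Seq[(String, String)] =
    inserted.map(t => (t.uuid, t.partitionpath))

  // Apply deletes to the in-memory "table" by dropping rows whose key matches.
  def applyDeletes(table: Seq[Trip], deletes: Seq[(String, String)]): Seq[Trip] = {
    val keys = deletes.toSet
    table.filterNot(t => keys.contains((t.uuid, t.partitionpath)))
  }

  def main(args: Array[String]): Unit = {
    // One pre-existing row plus a new batch that shares a rider value.
    val existing = Seq(Trip("u1", "americas/brazil/sao_paulo", "rider-100"))
    val batch = Seq(
      Trip("u2", "asia/india/chennai", "rider-213"),
      Trip("u3", "americas/united_states/san_francisco", "rider-213")
    )
    val table = existing ++ batch

    // Delete exactly the rows of the new batch.
    val after = applyDeletes(table, deleteKeysFor(batch))

    // The same filter used to fetch the batch now matches nothing.
    println(after.count(_.rider == "rider-213")) // prints 0
    println(after.size)                          // prints 1
  }
}
```

Because the delete keys come from a known batch, the final count is deterministic. Randomly generated delete keys would give no such guarantee, which is the concern raised in the comment above.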
   
   
   
   


[GitHub] [incubator-hudi] vinothchandar commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-575925772
 
 
   It goes both ways; PMC/committers should check.
   
   But it's clearly documented at https://hudi.apache.org/contributing.html#contributing-code that the contributor should squash. Please read this more carefully.


[GitHub] [incubator-hudi] leesf commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start

Posted by GitBox <gi...@apache.org>.
leesf commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574947582
 
 
   > @nsivabalan please squash your commits into one with a clear commit message before merging :)
   
   Also, PMC/committers would `squash and merge` the commits into one to keep the git history clean. :)


[GitHub] [incubator-hudi] bhasudha commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574785486
 
 
   > @bhasudha : I have changed the way we generate deletes. I now pass in the insert records for which delete records should be generated. With the previous approach of generating random deletes, I couldn't verify whether the deletes actually removed any records, so I have taken this approach instead.
   > 
   > The steps I plan to add to the Quick Start are as follows:
   > 
   > * Generate a new batch of inserts.
   > * Fetch all records from this new batch. Adjust the rider value below to match the batch, since each batch has a unique rider value.
   >   val ds = spark.sql("select uuid, partitionPath from hudi_ro_table where rider = 'rider-213'")
   > * Generate delete records for the fetched rows.
   >   val deletes = dataGen.generateDeletes(ds.collectAsList())
   > * Issue the deletes.
   >   val df = spark.read.json(spark.sparkContext.parallelize(deletes, 2))
   >   df.write.format("org.apache.hudi").
   >       options(getQuickstartWriteConfigs).
   >       option(OPERATION_OPT_KEY, "delete").
   >       option(PRECOMBINE_FIELD_OPT_KEY, "ts").
   >       option(RECORDKEY_FIELD_OPT_KEY, "uuid").
   >       option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
   >       option(TABLE_NAME, tableName).
   >       mode(Append).
   >       save(basePath)
   > * The same select query as above should now return 0 records, since all records from the batch have been deleted.
   >   spark.sql("select uuid, partitionPath from hudi_ro_table where rider = 'rider-213'").count()
   
   The plan sounds good. I think there are some checkstyle issues in the build. Once you fix them, I will be able to approve and merge.


[GitHub] [incubator-hudi] bhasudha merged pull request #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start

Posted by GitBox <gi...@apache.org>.
bhasudha merged pull request #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225
 
 
   
