Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/01/26 00:41:46 UTC

[GitHub] [incubator-hudi] bhasudha opened a new pull request #1279: [HUDI-577] update docker demo page and quick start pages

bhasudha opened a new pull request #1279: [HUDI-577] update docker demo page and quick start pages
URL: https://github.com/apache/incubator-hudi/pull/1279
 
 
   Summary:
   - contains changes that rename terminology to be in sync with the CWiki
   - contains doc changes pertaining to support of multiple scala versions
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [x] Has a corresponding JIRA in PR title & commit
    
    - [x] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [x] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bhasudha commented on a change in pull request #1279: [HUDI-577] update docker demo page and quick start pages

Posted by GitBox <gi...@apache.org>.
bhasudha commented on a change in pull request #1279: [HUDI-577] update docker demo page and quick start pages
URL: https://github.com/apache/incubator-hudi/pull/1279#discussion_r372707127
 
 

 ##########
 File path: docs/_docs/2_3_querying_data.md
 ##########
 @@ -148,5 +148,5 @@ Additionally, `HoodieReadClient` offers the following functionality using Hudi's
 
 ## Presto
 
-Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized querying on Hudi tables. 
+Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized queries on Hudi tables. 
 
 Review comment:
   > > Presto currently supports only read optimized queries
   > 
   > this is snapshot query right.. Can you please make another pass for any lines we missed like this
   
   @vinothchandar I looked at this. This is my understanding. Please correct if I am wrong.
   When we refer to "snapshot", we mean that queries can see data committed as of that point in time. For the COW table type, this refers to the latest Parquet files for each file group. For the MOR table type, snapshot queries can see real-time data merged from the base Parquet file and the delta log files (Avro files). For Presto specifically, we do not support reading from Avro files yet (see https://issues.apache.org/jira/browse/HUDI-305), so I kept this as ReadOptimized instead of Snapshot. Later, when we support this, we can switch to Snapshot query support in Presto. What do you think?
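
   To make the distinction above concrete, here is a toy Python sketch. It is not Hudi code: `FileGroup`, `read_optimized`, and `snapshot` are hypothetical names, and records are modeled as plain key/value dicts. It only illustrates the assumption stated in this comment, that a read optimized query sees base Parquet files only, while a snapshot query also merges in the delta log files.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileGroup:
    """Hypothetical stand-in for one file group of a table."""
    base: Dict[str, int]  # records in the latest base Parquet file
    log: List[Dict[str, int]] = field(default_factory=list)  # delta log updates (MOR only)


def read_optimized(groups: List[FileGroup]) -> Dict[str, int]:
    """Read only the base files; delta logs are ignored."""
    out: Dict[str, int] = {}
    for group in groups:
        out.update(group.base)
    return out


def snapshot(groups: List[FileGroup]) -> Dict[str, int]:
    """Read base files, then apply delta log updates in order."""
    out: Dict[str, int] = {}
    for group in groups:
        out.update(group.base)
        for delta in group.log:
            out.update(delta)
    return out


# COW: no delta logs, so both query types return the same data.
cow_table = [FileGroup(base={"k1": 2})]

# MOR: the delta log holds an update to k1 and a new key k2.
mor_table = [FileGroup(base={"k1": 1}, log=[{"k1": 2}, {"k2": 1}])]
```

   In this model the two query types agree on the COW table and differ on the MOR table only while there are delta log entries not yet reflected in the base file, which is why an engine that cannot read the Avro logs is described here as ReadOptimized.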

[GitHub] [incubator-hudi] bhasudha merged pull request #1279: [HUDI-577] update docker demo page and quick start pages

Posted by GitBox <gi...@apache.org>.
bhasudha merged pull request #1279: [HUDI-577] update docker demo page and quick start pages
URL: https://github.com/apache/incubator-hudi/pull/1279
 
 
   

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1279: [HUDI-577] update docker demo page and quick start pages

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1279: [HUDI-577] update docker demo page and quick start pages
URL: https://github.com/apache/incubator-hudi/pull/1279#discussion_r371022093
 
 

 ##########
 File path: docs/_docs/2_3_querying_data.md
 ##########
 @@ -148,5 +148,5 @@ Additionally, `HoodieReadClient` offers the following functionality using Hudi's
 
 ## Presto
 
-Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized querying on Hudi tables. 
+Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized queries on Hudi tables. 
 
 Review comment:
   >Presto currently supports only read optimized queries
   
   this is snapshot query right.. Can you please make another pass for any lines we missed like this 

[GitHub] [incubator-hudi] bhasudha commented on a change in pull request #1279: [HUDI-577] update docker demo page and quick start pages

Posted by GitBox <gi...@apache.org>.
bhasudha commented on a change in pull request #1279: [HUDI-577] update docker demo page and quick start pages
URL: https://github.com/apache/incubator-hudi/pull/1279#discussion_r371403506
 
 

 ##########
 File path: docs/_docs/2_3_querying_data.md
 ##########
 @@ -148,5 +148,5 @@ Additionally, `HoodieReadClient` offers the following functionality using Hudi's
 
 ## Presto
 
-Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized querying on Hudi tables. 
+Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized queries on Hudi tables. 
 
 Review comment:
   > @bhasudha Thanks for undertaking this! Feels great to have help with docs :)
   > 
   > Please take another read of all the pages, specifically questioning each mention of "Read optimized" and asking whether we really mean "snapshot".
   > 
   > Otherwise LGTM
   
   Sure. Let me check and update the PR

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1279: [HUDI-577] update docker demo page and quick start pages

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1279: [HUDI-577] update docker demo page and quick start pages
URL: https://github.com/apache/incubator-hudi/pull/1279#discussion_r373286733
 
 

 ##########
 File path: docs/_docs/2_3_querying_data.md
 ##########
 @@ -148,5 +148,5 @@ Additionally, `HoodieReadClient` offers the following functionality using Hudi's
 
 ## Presto
 
-Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized querying on Hudi tables. 
+Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized queries on Hudi tables. 
 
 Review comment:
   +1 to keep this as ReadOptimized.

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1279: [HUDI-577] update docker demo page and quick start pages

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1279: [HUDI-577] update docker demo page and quick start pages
URL: https://github.com/apache/incubator-hudi/pull/1279#discussion_r371022046
 
 

 ##########
 File path: docs/_docs/2_5_performance.md
 ##########
 @@ -43,7 +43,7 @@ For e.g , with 100M timestamp prefixed keys (5% updates, 95% inserts) on a event
 
 ## Read Optimized Queries
 
-The major design goal for read optimized querying is to achieve the latency reduction & efficiency gains in previous section,
+The major design goal for read optimized queries is to achieve the latency reduction & efficiency gains in previous section,
 
 Review comment:
   please change "Read optimized" to "snapshot" here..

[GitHub] [incubator-hudi] bvaradar commented on issue #1279: [HUDI-577] update docker demo page and quick start pages

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1279: [HUDI-577] update docker demo page and quick start pages
URL: https://github.com/apache/incubator-hudi/pull/1279#issuecomment-578475098
 
 
   @bhasudha: Can you check whether the usage of DeltaStreamer in https://hudi.incubator.apache.org/docs/writing_data.html#deltastreamer is also up to date with the recent changes in the DeltaStreamer CLI?

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1279: [HUDI-577] update docker demo page and quick start pages

Posted by GitBox <gi...@apache.org>.
lamber-ken commented on a change in pull request #1279: [HUDI-577] update docker demo page and quick start pages
URL: https://github.com/apache/incubator-hudi/pull/1279#discussion_r371075582
 
 

 ##########
 File path: docs/_docs/1_1_quick_start_guide.md
 ##########
 @@ -16,10 +16,20 @@ Hudi works with Spark-2.x versions. You can follow instructions [here](https://s
 From the extracted directory run spark-shell with Hudi as:
 
 ```scala
-bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating \
+spark-2.4.4-bin-hadoop2.7/bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating,org.apache.spark:spark-avro_2.11:2.4.4 \
     --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
 ```
 
+<div class="notice--info">
 
 Review comment:
   Very smart! 👍 

[GitHub] [incubator-hudi] bhasudha commented on issue #1279: [HUDI-577] update docker demo page and quick start pages

Posted by GitBox <gi...@apache.org>.
bhasudha commented on issue #1279: [HUDI-577] update docker demo page and quick start pages
URL: https://github.com/apache/incubator-hudi/pull/1279#issuecomment-580591695
 
 
   Thanks @leesf . Merging this now. 
