Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/10 05:57:12 UTC

[GitHub] [hudi] yihua opened a new pull request #4547: [MINOR] Fix performance table in marker blog

yihua opened a new pull request #4547:
URL: https://github.com/apache/hudi/pull/4547


   ## What is the purpose of the pull request
   
   Fix performance table content in marker blog.
   
   ## Verify this pull request
   
   The site can build and launch.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar merged pull request #4547: [MINOR] Fix performance table in marker blog

Posted by GitBox <gi...@apache.org>.
vinothchandar merged pull request #4547:
URL: https://github.com/apache/hudi/pull/4547


   





[GitHub] [hudi] nsivabalan commented on a change in pull request #4547: [MINOR] Fix performance table in marker blog

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4547:
URL: https://github.com/apache/hudi/pull/4547#discussion_r781233530



##########
File path: website/blog/2021-08-18-improving-marker-mechanism.md
##########
@@ -61,11 +61,11 @@ We evaluate the write performance over both direct and timeline-server-based mar
 
 As shown below, direct marker mechanism works really well, when a part of the table is written, e.g., 1K out of 165K data files.  However, the time of direct marker operations is non-trivial when we need to write significant number of data files. Compared to the direct marker mechanism, the timeline-server-based marker mechanism generates much fewer files storing markers because of the batch processing, leading to much less time on marker-related I/O operations, thus achieving 31% lower write completion time compared to the direct marker file mechanism.
 
-| Marker Type |   Input data size   |  Num data files written | Files created for markers | Marker deletion time | Bulk Insert Time (including marker deletion) |
-| ----------- | --------- | :---------: | :---------: | :---------: | :---------: | 
-| Direct | 600MB | 1k | 1k | 5.4secs | - |
-| Direct | 100GB | 165k | 165k | 15min | 55min |
-| Timeline-server-based | 100GB | 165k | 20 | ~3s | 38min |
+| Marker Type | Total Files |  Num data files written | Files created for markers | Marker deletion time | Bulk Insert Time (including marker deletion) |
+| ----------- |-----------| :---------: | :---------: | :---------: | :---------: | 
+| Direct | 165k | 1k | 1k | 5.4secs | - |

Review comment:
       Shouldn't the total files in the first row be 1k instead of 165k?



