Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/06/12 20:09:39 UTC

[GitHub] [hudi] tooptoop4 opened a new issue #1731: [SUPPORT] how fast (total time) to update 1 row?

tooptoop4 opened a new issue #1731:
URL: https://github.com/apache/hudi/issues/1731


   from slack
   
   
   me
   how fast can you update a single row? including time to trigger spark-submit, unless there is faster way
   
   Sudha
   There is a performance page to give you an idea (https://hudi.apache.org/docs/performance.html). Hope that's useful.
   Performance
   In this section, we go over some real world performance numbers for Hudi upserts, incremental pull and compare them against the conventional alternatives for achieving these tasks.
   
   me
   @Sudha it doesn't help, I want to know if Hudi can update 1 single row in less than 10 seconds?
   
   Sudha
   Not sure if there is a straight answer to your question, especially when you are considering spark-submit trigger times etc. Usually spark-submit uploads jars, and if there are many jars, it could take on the order of minutes. It would be useful to create a GitHub issue describing your exact use case so we can point you to the right set of configs in Hudi to achieve that.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] vinothchandar commented on issue #1731: [SUPPORT] how fast (total time) to update 1 row?

Posted by GitBox <gi...@apache.org>.

vinothchandar commented on issue #1731:
URL: https://github.com/apache/hudi/issues/1731#issuecomment-644344181


   That's for the Delta folks to answer :) If you are rewriting Parquet files or generating a new Parquet file on each write, there is nothing fundamentally different that any other system can do here. All the databases, and even the data warehouses, that you are comparing against have long-running servers with some metadata/data loaded into memory to help with such fast updates.
   
   Livy is a long-running server that already has a Spark application running, unlike issuing spark-submit every time. Of course, if you use Livy or Zeppelin, that overhead goes away.
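
As an illustration of the point about Livy, here is a minimal sketch of how a statement could be sent to an already-running Livy session over its REST API (`POST /sessions/{id}/statements`), so that no new spark-submit is triggered per update. The host name, session id, and the code string are assumptions made for the example; the request is only built here, not sent.

```python
import json
import urllib.request


def build_statement_request(livy_url: str, session_id: int, code: str) -> urllib.request.Request:
    """Build (but do not send) a POST that runs `code` in an existing Livy session."""
    url = f"{livy_url.rstrip('/')}/sessions/{session_id}/statements"
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and session id; 8998 is Livy's default port.
request = build_statement_request(
    "http://livy-host:8998/", 0, "spark.sql('select 1').show()"
)
# urllib.request.urlopen(request) would submit it to the running session.
```

Because the Spark application behind the session is already up, the per-request cost is the statement itself rather than jar upload and application startup.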
   
   



[GitHub] [hudi] tooptoop4 commented on issue #1731: [SUPPORT] how fast (total time) to update 1 row?

Posted by GitBox <gi...@apache.org>.

tooptoop4 commented on issue #1731:
URL: https://github.com/apache/hudi/issues/1731#issuecomment-643592554


   How about via Livy, could it avoid the 30 seconds? I believe Delta promises fast 1-row updates via SQL.



[GitHub] [hudi] vinothchandar closed issue #1731: [SUPPORT] how fast (total time) to update 1 row?

Posted by GitBox <gi...@apache.org>.

vinothchandar closed issue #1731:
URL: https://github.com/apache/hudi/issues/1731


   



[GitHub] [hudi] vinothchandar edited a comment on issue #1731: [SUPPORT] how fast (total time) to update 1 row?

Posted by GitBox <gi...@apache.org>.

vinothchandar edited a comment on issue #1731:
URL: https://github.com/apache/hudi/issues/1731#issuecomment-643471123


   @tooptoop4 
   > including time to trigger spark-submit, unless there is faster way
   
   If you are targeting something like spark-submit, then it's not under Hudi's control. spark-submit alone can take anywhere from 30 seconds to minutes, depending on various factors in your cluster, the size of the jar you are uploading, etc.
   
   As for the Hudi upsert itself, using Hudi to upsert 1 row at a time is not an intended design. I recommend at least batching input data for a minute or so and issuing the upsert as a batch. In other words, Hudi (or anything like Hudi) is not really designed like MySQL. Hope that helps.
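
The batching recommendation above could be sketched roughly as follows. This is a hedged, library-free illustration, not Hudi code: `MicroBatcher`, its parameters, and the `flush` callback are hypothetical names. In practice the callback is where a single batched write would go (for example a Spark DataFrame write using Hudi's `hoodie.datasource.write.operation=upsert` option).

```python
import time
from typing import Callable, Dict, List, Optional


class MicroBatcher:
    """Buffer single-row updates and hand them to `flush` as one batch per window."""

    def __init__(
        self,
        flush: Callable[[List[dict]], None],
        key_field: str,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._key_field = key_field
        self._window = window_seconds
        self._clock = clock
        self._pending: Dict[object, dict] = {}
        self._window_start: Optional[float] = None

    def add(self, row: dict) -> None:
        now = self._clock()
        if self._window_start is None:
            self._window_start = now
        # A later update to the same key replaces the earlier one in this window.
        self._pending[row[self._key_field]] = row
        # The window is only checked when a row arrives; a real service
        # would also flush from a timer so idle periods do not delay data.
        if now - self._window_start >= self._window:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._flush(list(self._pending.values()))
        self._pending = {}
        self._window_start = None
```

With a one-minute window, many single-row updates collapse into one upsert call, and repeated updates to the same key within the window are written only once.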


