Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/06 12:05:40 UTC

[GitHub] [iceberg] flyxu1991 opened a new issue #3853: Structed Streaming read iceberg use yarn resources overhead

flyxu1991 opened a new issue #3853:
URL: https://github.com/apache/iceberg/issues/3853


   I am using Spark 3.0 Structured Streaming to read an Iceberg table. I allocated only 2 executors with one core each, but the application actually uses 181 YARN cores while each micro-batch task is executing, and drops back to 2 cores once the batch finishes.
   Here is my code:
   spark.readStream()
       .format("iceberg")
       .load("/merge_into_test_a")
       .repartition(1)
       .selectExpr("name", "to_timestamp(ts) AS ts")
       .withWatermark("ts", "10 seconds")
       .groupBy(functions.window(functions.col("ts"), "1 day"), functions.col("name"))
       .count()
       .writeStream()
       .outputMode("update")
       .format("console")
       .option("checkpointLocation", "/structed_streaming_iceberg_window_test")
       .trigger(Trigger.ProcessingTime(5, TimeUnit.SECONDS))
       .start()
       .awaitTermination();
   
   Here is the spark-submit command:
   ./spark-submit \
   --class main.SparkWindowTest \
   --master yarn \
   --deploy-mode cluster \
   --driver-memory 1g \
   --num-executors 2 \
   --executor-memory 1g \
   --executor-cores 1
   
   I have found SparkReadOptions, but I don't know how to control the parallelism of the Iceberg source. Could somebody help me find the problem?
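
   One thing that may be worth checking (an assumption, since nothing in this thread confirms the cause): a core count that grows during each batch and shrinks back afterwards is the pattern Spark dynamic allocation produces. If `spark.dynamicAllocation.enabled` is set to true in the cluster defaults, `--num-executors` only sets the initial executor count. A minimal sketch of the same command with dynamic allocation switched off:

   ```shell
   # Sketch only: assumes dynamic allocation is enabled in the cluster defaults,
   # which this thread does not confirm.
   ./spark-submit \
     --class main.SparkWindowTest \
     --master yarn \
     --deploy-mode cluster \
     --driver-memory 1g \
     --num-executors 2 \
     --executor-memory 1g \
     --executor-cores 1 \
     --conf spark.dynamicAllocation.enabled=false
   # The application JAR is not shown in the original command and is left out here too.
   ```

   Alternatively, dynamic allocation could stay on with `--conf spark.dynamicAllocation.maxExecutors=2` to cap the executor count. Separately, SparkReadOptions defines a `split-size` read option; as far as I know it influences how many read tasks are planned, not how many executors YARN grants.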
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyxu1991 edited a comment on issue #3853: Structed Streaming read iceberg use yarn resources overhead

Posted by GitBox <gi...@apache.org>.
flyxu1991 edited a comment on issue #3853:
URL: https://github.com/apache/iceberg/issues/3853#issuecomment-1007086191


   @RussellSpitzer @SreeramGarlapati could you please take a look at this issue?




[GitHub] [iceberg] flyxu1991 closed issue #3853: Structed Streaming read iceberg use yarn resources overhead

Posted by GitBox <gi...@apache.org>.
flyxu1991 closed issue #3853:
URL: https://github.com/apache/iceberg/issues/3853


   




[GitHub] [iceberg] flyxu1991 commented on issue #3853: Structed Streaming read iceberg use yarn resources overhead

Posted by GitBox <gi...@apache.org>.
flyxu1991 commented on issue #3853:
URL: https://github.com/apache/iceberg/issues/3853#issuecomment-1007086191


   @RussellSpitzer could you please take a look at this issue?



