Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/07 07:19:07 UTC

[GitHub] [iceberg] wangxujin1221 opened a new issue #3858: How do I quickly traverse a very large table？

wangxujin1221 opened a new issue #3858:
URL: https://github.com/apache/iceberg/issues/3858


   Hi team,
   
   I'm new to Iceberg, and I have a question about querying a big table. 
   
   We have a Hive table with a total of 3.6 million records and 120 fields per record, and we want to transfer all the records in this table to other systems, such as PostgreSQL, Kafka, etc. 
   
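   Currently we do it like this:
   ```java
   import org.apache.spark.api.java.function.ForeachPartitionFunction;
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;

   // connection.client is assumed to be a SparkSession
   Dataset<Row> dataset = connection.client.read().format("iceberg").load("default.table");
   // this step stalls for a very long time
   dataset.foreachPartition((ForeachPartitionFunction<Row>) par -> {
       par.forEachRemaining(row -> {
           // ... write each row to the target system (elided)
       });
   });
   ```
   but the job can stall for a long time in the foreachPartition step.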
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on issue #3858: How do I quickly traverse a very large table？

Posted by GitBox <gi...@apache.org>.

RussellSpitzer commented on issue #3858:
URL: https://github.com/apache/iceberg/issues/3858#issuecomment-1012321440


   I would look at the Spark connectors for those other systems; they should be much more efficient than writing your own sinks in the foreach. For example:
   
   `spark.read.format("iceberg")....write.format("jdbc")....`
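   A minimal Java sketch of this approach, assuming Spark's built-in JDBC data source for PostgreSQL and the `spark-sql-kafka-0-10` package for Kafka. The class name `IcebergToSinks`, the connection URL, table names, credentials, topic and batch size are hypothetical placeholders. The connectors take care of opening connections per task, batching inserts and writing partitions in parallel, which a hand-written foreach sink would otherwise have to reimplement.

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SaveMode;
   import org.apache.spark.sql.SparkSession;

   public class IcebergToSinks {

       // Options for Spark's built-in "jdbc" data source.
       // All values passed in are placeholders; "batchsize" (rows per JDBC batch)
       // is an illustrative value, not a tuned recommendation.
       static Map<String, String> jdbcOptions(String url, String table, String user, String password) {
           Map<String, String> opts = new HashMap<>();
           opts.put("url", url);
           opts.put("dbtable", table);
           opts.put("user", user);
           opts.put("password", password);
           // Assumes the PostgreSQL JDBC driver jar is on the Spark classpath
           opts.put("driver", "org.postgresql.Driver");
           opts.put("batchsize", "10000");
           return opts;
       }

       // Copies an Iceberg table into a relational table via JDBC, appending rows.
       static void copyToJdbc(SparkSession spark, String sourceTable, Map<String, String> jdbcOpts) {
           Dataset<Row> df = spark.read().format("iceberg").load(sourceTable);
           df.write().format("jdbc").options(jdbcOpts).mode(SaveMode.Append).save();
       }

       // Copies an Iceberg table into a Kafka topic, one JSON message per row.
       // Requires the spark-sql-kafka-0-10 package on the classpath.
       static void copyToKafka(SparkSession spark, String sourceTable, String bootstrapServers, String topic) {
           spark.read().format("iceberg").load(sourceTable)
               // The Kafka sink needs a "value" column; serialize every field of the row as JSON
               .selectExpr("to_json(struct(*)) AS value")
               .write()
               .format("kafka")
               .option("kafka.bootstrap.servers", bootstrapServers)
               .option("topic", topic)
               .save();
       }

       public static void main(String[] args) {
           SparkSession spark = SparkSession.builder().appName("iceberg-export").getOrCreate();
           // Hypothetical connection details; replace with real ones
           copyToJdbc(spark, "default.table",
               jdbcOptions("jdbc:postgresql://localhost:5432/mydb", "public.target_table", "user", "secret"));
           copyToKafka(spark, "default.table", "localhost:9092", "target-topic");
           spark.stop();
       }
   }
   ```

   Only the options map is separated out so it can be checked without a running cluster; the actual copy happens inside `save()`, where each Spark task writes its own partition.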


[GitHub] [iceberg] wangxujin1221 edited a comment on issue #3858: How do I quickly traverse a very large table？

wangxujin1221 edited a comment on issue #3858:
URL: https://github.com/apache/iceberg/issues/3858#issuecomment-1011057333


   @powerzhangquan I think the key issue is that your table has too many small files, so scanning it is very slow. You can check your partitioning scheme or use `df.repartition()` to decrease the number of Spark partitions.
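   A minimal Java sketch of two ways to cut the number of tasks when a table has many small files, assuming Iceberg's Spark integration. The `split-size` read option makes Iceberg pack more small files into each read task (otherwise the table's `read.split.target-size`, 128 MB by default, applies), and `coalesce()` lowers the partition count without the full shuffle that `repartition()` performs. The class name, split size and partition count are hypothetical.

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SparkSession;

   public class FewerReadTasks {

       // Iceberg Spark read option: combine files into read tasks (splits) of
       // roughly this many bytes, overriding the table's read.split.target-size.
       static Map<String, String> readOptions(long splitSizeBytes) {
           Map<String, String> opts = new HashMap<>();
           opts.put("split-size", Long.toString(splitSizeBytes));
           return opts;
       }

       static Dataset<Row> loadWithFewerTasks(SparkSession spark, String table, long splitSizeBytes, int numPartitions) {
           Dataset<Row> df = spark.read().format("iceberg").options(readOptions(splitSizeBytes)).load(table);
           // coalesce() merges existing partitions without a shuffle; repartition()
           // would shuffle all rows but produce evenly sized partitions.
           return df.coalesce(numPartitions);
       }

       public static void main(String[] args) {
           SparkSession spark = SparkSession.builder().appName("iceberg-read").getOrCreate();
           // 512 MB splits and 16 partitions are illustrative values only
           Dataset<Row> df = loadWithFewerTasks(spark, "default.table", 512L * 1024 * 1024, 16);
           System.out.println("partitions: " + df.rdd().getNumPartitions());
           spark.stop();
       }
   }
   ```

   Because `coalesce()` is a narrow transformation, it also limits the parallelism of the read stage itself; if that becomes the bottleneck, `repartition()` trades a shuffle for keeping the read fully parallel.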


[GitHub] [iceberg] wangxujin1221 commented on issue #3858: How do I quickly traverse a very large table？

wangxujin1221 commented on issue #3858:
URL: https://github.com/apache/iceberg/issues/3858#issuecomment-1011057333


   @powerzhangquan I think the key issue is that your table has too many small files, so scanning it is very slow. You can check your partitioning scheme or use `df.repartition()` to decrease the number of Spark partitions.


[GitHub] [iceberg] wangxujin1221 closed issue #3858: How do I quickly traverse a very large table？

wangxujin1221 closed issue #3858:
URL: https://github.com/apache/iceberg/issues/3858


   


[GitHub] [iceberg] powerzhangquan commented on issue #3858: How do I quickly traverse a very large table？

powerzhangquan commented on issue #3858:
URL: https://github.com/apache/iceberg/issues/3858#issuecomment-1011013109


   Hi wangxujin1221, I have a similar scenario. Did you find a good approach?
