Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/18 15:27:11 UTC

[GitHub] [iceberg] rlcyf opened a new issue #2351: Trino very slow

rlcyf opened a new issue #2351:
URL: https://github.com/apache/iceberg/issues/2351


   Iceberg 0.11.0
   Trino 351
   
   3 Trino workers
   16 cores, 32 GB RAM
   
   ```
   trino:odsx1> select count(*) from sample03;
    _col0 
   -------
        9 
   (1 row)
   
   Query 20210318_152248_00001_5mzin, FINISHED, 3 nodes
   Splits: 78 total, 78 done (100.00%)
   4.11 [9 rows, 5.62KB] [2 rows/s, 1.37KB/s]
   ```
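   The last line of the CLI summary can be checked by hand. The sketch below assumes the leading `4.11` is the elapsed wall-clock time in seconds. Dividing the row and byte counts by it reproduces the reported rates, so this 9-row count took about four seconds end to end. `SummaryLineCheck` is a hypothetical name used only for this illustration.

   ```scala
   // Sketch: check that "4.11 [9 rows, 5.62KB] [2 rows/s, 1.37KB/s]" is consistent
   // with 4.11 being the wall-clock time in seconds (an assumption, not stated in the output).
   object SummaryLineCheck {
     val wallSeconds: Double = 4.11
     val rows: Int = 9
     val dataKb: Double = 5.62

     // About 2.19 rows per second; the CLI reports this as "2 rows/s".
     val rowsPerSecond: Double = rows / wallSeconds
     // About 1.37 KB per second, matching "1.37KB/s".
     val kbPerSecond: Double = dataKb / wallSeconds

     def main(args: Array[String]): Unit =
       println(s"rows/s = $rowsPerSecond, KB/s = $kbPerSecond")
   }
   ```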


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] yyanyy commented on issue #2351: Trino very slow

yyanyy commented on issue #2351:
URL: https://github.com/apache/iceberg/issues/2351#issuecomment-803115995


   I don't think I have, but we mostly use our internal repository based on the Glue catalog and S3FileIO, so our experience wouldn't be the same. @jackye1995, did you encounter this before?
   
   I think this performance problem could be associated with other factors as well. What does the data/table look like? Did you try the same query in Spark SQL or other engines? Is this a customized Trino server? I think Trino 351 only ships with Iceberg 0.9.




[GitHub] [iceberg] rlcyf commented on issue #2351: Trino very slow

rlcyf commented on issue #2351:
URL: https://github.com/apache/iceberg/issues/2351#issuecomment-807887013


   > @yyanyy @jackye1995 Have you encountered a similar issue?
   
   Hello,
   
   The sure_date values range from 2018 to 2021.
   
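   ```
   import org.apache.iceberg.PartitionSpec

   // `schema` is the table's Iceberg Schema (definition not shown).
   // Attempt 1: day partitioning on sure_date. Writing with this spec failed with:
   //   java.lang.IllegalStateException: Already closed files for partition: sure_date_day=2020-09-03
   val partitionSpec = PartitionSpec.builderFor(schema).day("sure_date").build()

   // Attempt 2: month partitioning on sure_date. Writing failed the same way:
   //   java.lang.IllegalStateException: Already closed files for partition: sure_date_month=2020-09
   val partitionSpec = PartitionSpec.builderFor(schema).month("sure_date").build()
   ```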
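   The error message points to a writer that keeps one data file open per partition at a time and closes it as soon as a row for a different partition arrives; if a row for an already-closed partition shows up later in the same task, the write fails. The sketch below is a hypothetical, simplified model of that behavior (`OneOpenFileWriter` and `ClusteringDemo` are illustrative names, not Iceberg classes). It shows unsorted input failing and the same input sorted by partition key succeeding. Assuming this is the cause here, the remedy described in the Iceberg Spark write docs is to sort rows by the partition source column within each Spark task before writing, for example with `Dataset.sortWithinPartitions("sure_date")`. Because the day and month transforms preserve the order of sure_date, sorting by sure_date also clusters rows by day or month.

   ```scala
   import scala.collection.mutable

   // Hypothetical, simplified model (NOT Iceberg's writer): at most one partition's
   // file is open, and a partition whose file was closed can never be reopened.
   final class OneOpenFileWriter[K] {
     private val closed = mutable.Set.empty[K]
     private var current: Option[K] = None
     val filesOpened: mutable.ArrayBuffer[K] = mutable.ArrayBuffer.empty[K]

     def write(partition: K): Unit =
       if (!current.contains(partition)) {
         current.foreach(p => closed += p) // roll over: close the previous partition's file
         if (closed.contains(partition))
           throw new IllegalStateException(s"Already closed files for partition: $partition")
         current = Some(partition)
         filesOpened += partition // open a new file for this partition
       }
   }

   object ClusteringDemo {
     // Feed partition keys through the model; Left(message) if the writer throws.
     def run(keys: Seq[String]): Either[String, List[String]] = {
       val writer = new OneOpenFileWriter[String]
       try {
         keys.foreach(writer.write)
         Right(writer.filesOpened.toList)
       } catch {
         case e: IllegalStateException => Left(e.getMessage)
       }
     }

     def main(args: Array[String]): Unit = {
       val unclustered = Seq("2020-09-03", "2020-09-04", "2020-09-03")
       println(run(unclustered))        // Left(Already closed files for partition: 2020-09-03)
       println(run(unclustered.sorted)) // Right(List(2020-09-03, 2020-09-04))
     }
   }
   ```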
   
   




[GitHub] [iceberg] rlcyf commented on issue #2351: Trino very slow

rlcyf commented on issue #2351:
URL: https://github.com/apache/iceberg/issues/2351#issuecomment-805493646


   ```
   spark.sql("CREATE TABLE tcsa.ods.sample3 (id bigint COMMENT 'unique id', data string) USING iceberg").show
   ```
   
   ```
   hive.metastore.uri=thrift://hive:9083
   hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml,/etc/hive/conf/hive-site.xml
   iceberg.file-format=Parquet
   ```




[GitHub] [iceberg] openinx commented on issue #2351: Trino very slow

openinx commented on issue #2351:
URL: https://github.com/apache/iceberg/issues/2351#issuecomment-802593485


   @yyanyy @jackye1995 Have you encountered a similar issue?




[GitHub] [iceberg] jackye1995 commented on issue #2351: Trino very slow

jackye1995 commented on issue #2351:
URL: https://github.com/apache/iceberg/issues/2351#issuecomment-803154665


   I think this is too little information to go on. Many cluster conditions can cause this in rare situations. Could you at least rerun the query several times and see whether it reproduces?
   
   Could you also describe the table itself: its partition scheme, size, etc.?




[GitHub] [iceberg] rdblue commented on issue #2351: Trino very slow

rdblue commented on issue #2351:
URL: https://github.com/apache/iceberg/issues/2351#issuecomment-805223711


   It's also suspicious that there were 78 splits, but only 9 rows. I'd be interested to see the contents of the files metadata table for this.
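   The numbers from the original report already show how lopsided the scan was. Assuming each row is returned by exactly one split, at most 9 of the 78 splits can have produced a row, so at least 69 splits were scheduled and returned nothing. The `files` metadata table (for example its `file_path` and `record_count` columns) would show whether that comes from many tiny or empty data files. `SplitArithmetic` is a hypothetical name used only for this sketch.

   ```scala
   // Sketch: bound how many splits could have returned rows, assuming every row
   // is produced by exactly one split (numbers taken from the Trino CLI output).
   object SplitArithmetic {
     val splits: Int = 78
     val rows: Int = 9

     // Upper bound on splits that returned at least one row.
     val maxNonEmptySplits: Int = math.min(splits, rows)
     // Lower bound on splits that were scheduled but returned no rows.
     val minEmptySplits: Int = splits - maxNonEmptySplits

     def main(args: Array[String]): Unit =
       println(s"at least $minEmptySplits of $splits splits returned no rows")
   }
   ```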




[GitHub] [iceberg] yyanyy edited a comment on issue #2351: Trino very slow

yyanyy edited a comment on issue #2351:
URL: https://github.com/apache/iceberg/issues/2351#issuecomment-803115995


   I don't think I have, but we have our own internal implementation, so our experience wouldn't be the same. @jackye1995, did you encounter this before?
   
   I think this performance problem could be associated with other factors as well. What does the data/table look like? Did you try the same query in Spark SQL or other engines? Is this a customized Trino server? I think Trino 351 only ships with Iceberg 0.9.

