Posted to issues@iceberg.apache.org by "dramaticlly (via GitHub)" <gi...@apache.org> on 2023/02/09 00:21:50 UTC

[GitHub] [iceberg] dramaticlly opened a new issue, #6780: Spark AddFiles infer incorrect partition type when reading parquet files

dramaticlly opened a new issue, #6780:
URL: https://github.com/apache/iceberg/issues/6780

   ### Apache Iceberg version
   
   0.14.0
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Parquet File Layout
   
   ```
   s3a://bucket/warehouse/foo.db/bar/data/
   .
   ├── date=2023-01-23
   │   ├── hour=00
   │   │   ├── 00000-0-99f4bf58-a862-4244-959b-a6da97155c46-00261.parquet
   │   │   ├── 00001-0-602012de-867f-4083-b179-9bb583bf3377-00269.parquet
   │   │   ├── 00004-0-6086a50c-604d-4916-b675-df5c3cdc59d8-00275.parquet
   │   │   ├── 00005-0-8d1acb73-1589-4286-9c6e-38146ad6e81d-00255.parquet
   │   │   ├── 00006-0-b1b01658-d19e-4149-95df-cb48fa1220a4-00268.parquet
   │   │   └── 00007-0-f5bd568d-a22b-4a2d-b1ac-78e1bd7e1d49-00261.parquet
   │   ├── hour=01
   │   │   ├── 00000-0-99f4bf58-a862-4244-959b-a6da97155c46-00262.parquet
   │   │   ├── 00001-0-602012de-867f-4083-b179-9bb583bf3377-00270.parquet
   │   │   ├── 00004-0-6086a50c-604d-4916-b675-df5c3cdc59d8-00276.parquet
   │   │   ├── 00005-0-8d1acb73-1589-4286-9c6e-38146ad6e81d-00256.parquet
   │   │   ├── 00006-0-b1b01658-d19e-4149-95df-cb48fa1220a4-00269.parquet
   │   │   └── 00007-0-f5bd568d-a22b-4a2d-b1ac-78e1bd7e1d49-00262.parquet
   ```
   
   Spark SQL call to the `add_files` procedure
   
   ```scala
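   // Register the external Parquet files with the Iceberg table, limited to one partition.
   // stripMargin removes the leading `|` on each line so the string is valid SQL.
   val sql = """
   |CALL iceberg.system.add_files(
   |table => 'foo.bar',
   |source_table => '`parquet`.`s3a://bucket/warehouse/foo.db/bar/data/`',
   |check_duplicate_files => false,
   |partition_filter => map('date','2023-01-23','hour','00') )""".stripMargin
   spark.sql(sql).show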
   ```
   
   This adds the 6 external Parquet files to the Iceberg table, but with an incorrect partition value: the leading zero of `hour` is dropped.
   
   - seeing `date=2023-01-23/hour=0`
   
   - expecting `date=2023-01-23/hour=00`
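   
   A minimal sketch of how this kind of mismatch can arise, assuming the `hour` value from the path is inferred as an integer rather than kept as a string. The helpers `parsePartitionPath`, `inferValue` and `render` are hypothetical and only illustrate the effect; they are not Iceberg or Spark APIs, and this is not the actual `add_files` code path.
   
   ```scala
   import scala.util.Try
   
   // Illustration only: hypothetical helpers, not Iceberg or Spark APIs.
   object PartitionInferenceSketch {
     // Split "k1=v1/k2=v2" into ordered (key, rawValue) pairs.
     def parsePartitionPath(path: String): Seq[(String, String)] =
       path.split("/").toSeq.map { segment =>
         val Array(key, value) = segment.split("=", 2)
         key -> value
       }
   
     // Assumed inference rule: a value that parses as an Int becomes an Int,
     // anything else stays a String.
     def inferValue(raw: String): Any =
       Try(raw.toInt).getOrElse(raw)
   
     // Rebuild the partition path from the (possibly inferred) values.
     def render(parts: Seq[(String, Any)]): String =
       parts.map { case (k, v) => s"$k=$v" }.mkString("/")
   
     def main(args: Array[String]): Unit = {
       val raw = parsePartitionPath("date=2023-01-23/hour=00")
       val inferred = raw.map { case (k, v) => k -> inferValue(v) }
       println(render(raw))      // date=2023-01-23/hour=00
       println(render(inferred)) // date=2023-01-23/hour=0
     }
   }
   ```
   
   Under that assumption, `"00"` round-trips through `Int` as `0`, which matches the `hour=0` seen above, while `date` is unaffected because it does not parse as an integer.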
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dramaticlly closed issue #6780: Spark AddFiles infer incorrect partition type when reading parquet files

Posted by "dramaticlly (via GitHub)" <gi...@apache.org>.
dramaticlly closed issue #6780: Spark AddFiles infer incorrect partition type when reading parquet files
URL: https://github.com/apache/iceberg/issues/6780




[GitHub] [iceberg] dramaticlly commented on issue #6780: Spark AddFiles infer incorrect partition type when reading parquet files

Posted by "dramaticlly (via GitHub)" <gi...@apache.org>.
dramaticlly commented on issue #6780:
URL: https://github.com/apache/iceberg/issues/6780#issuecomment-1423414393

   I believe @abmo-x is helping fix it in #6779 

