Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/31 14:03:28 UTC

[GitHub] [iceberg] lvyanquan opened a new issue, #5399: Error happened after deleting a partitioned column

lvyanquan opened a new issue, #5399:
URL: https://github.com/apache/iceberg/issues/5399

   Error message:
   ```
    Caused by: java.lang.NullPointerException: Cannot find source column: 3
   	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:953) ~[iceberg-bundled-guava-0.13.2.jar:na]
   	at org.apache.iceberg.PartitionSpec$Builder.add(PartitionSpec.java:503) ~[iceberg-api-0.13.2.jar:na]
   	at org.apache.iceberg.PartitionSpecParser.buildFromJsonFields(PartitionSpecParser.java:155) ~[iceberg-core-0.13.2.jar:na]
   	at org.apache.iceberg.PartitionSpecParser.fromJson(PartitionSpecParser.java:78) ~[iceberg-core-0.13.2.jar:na]
   	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:357) ~[iceberg-core-0.13.2.jar:na]
   	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:288) ~[iceberg-core-0.13.2.jar:na]
   ```
   The JSON metadata file contains the schemas, partition specs, and sort orders.
   However, nothing links a partition spec to the schema it was created against. After a partitioned column is deleted, building the historical partition specs therefore fails, because their source-id cannot be found in the current schema. I think a schema-id should be added to the JSON of each partition spec.
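The proposal above can be sketched in Python on a toy copy of the metadata: each partition spec carries a `schema-id` (the proposed key, which is hypothetical and not part of the format shown in this issue), and its source IDs are resolved against that schema rather than the current one. `bind_spec` is an illustrative helper, not Iceberg's parser.

```python
# Toy metadata: "schema-id" inside each partition spec is the *proposed* key,
# not something the current metadata format contains.
METADATA = {
    "current-schema-id": 1,
    "schemas": [
        {"schema-id": 0, "fields": [{"id": 1, "name": "name1"},
                                    {"id": 2, "name": "name2"},
                                    {"id": 3, "name": "name3"}]},
        {"schema-id": 1, "fields": [{"id": 1, "name": "name1"},
                                    {"id": 2, "name": "name2"}]},
    ],
    "partition-specs": [
        {"spec-id": 0, "schema-id": 0,
         "fields": [{"name": "name3", "transform": "identity",
                     "source-id": 3, "field-id": 1000}]},
        {"spec-id": 1, "schema-id": 1, "fields": []},
    ],
}


def bind_spec(metadata, spec):
    """Resolve each partition field's source-id against the spec's own schema."""
    schemas = {s["schema-id"]: s for s in metadata["schemas"]}
    # Fall back to the current schema when a spec has no schema-id (older metadata).
    schema_id = spec.get("schema-id", metadata["current-schema-id"])
    columns = {f["id"]: f["name"] for f in schemas[schema_id]["fields"]}
    bound = []
    for field in spec["fields"]:
        source_id = field["source-id"]
        if source_id not in columns:
            raise KeyError(f"Cannot find source column: {source_id}")
        bound.append((field["name"], columns[source_id]))
    return bound


for spec in METADATA["partition-specs"]:
    print(spec["spec-id"], bind_spec(METADATA, spec))
```

With the proposed key, spec 0 still resolves `name3` through schema 0 even though the current schema no longer has column 3.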
   Part of the metadata file:
   ```
       "last-column-id":3,
       "current-schema-id":1,
       "schemas":[
           {
               "type":"struct",
               "schema-id":0,
               "fields":[
                   {
                       "id":1,
                       "name":"name1",
                       "required":false,
                       "type":"string"
                   },
                   {
                       "id":2,
                       "name":"name2",
                       "required":false,
                       "type":"string"
                   },
                   {
                       "id":3,
                       "name":"name3",
                       "required":false,
                       "type":"string"
                   }
               ]
           },
           {
               "type":"struct",
               "schema-id":1,
               "fields":[
                   {
                       "id":1,
                       "name":"name1",
                       "required":false,
                       "type":"string"
                   },
                   {
                       "id":2,
                       "name":"name2",
                       "required":false,
                       "type":"string"
                   }
               ]
           }
       ],
       "default-spec-id":1,
       "partition-specs":[
           {
               "spec-id":0,
               "fields":[
                   {
                       "name":"name3",
                       "transform":"identity",
                       "source-id":3,
                       "field-id":1000
                   }
               ]
           },
           {
               "spec-id":1,
               "fields":[
   
               ]
           }
       ],
       "last-partition-id":1000
   ```
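To see why this excerpt fails to load, here is a minimal Python sketch that mimics the failing step as described in the report: every partition spec's `source-id`, historical or not, is looked up in the current schema only. `bind_against_current_schema` is a hypothetical helper and the JSON is trimmed to the keys the lookup needs; Iceberg's Java parser does considerably more than this.

```python
import json

# Trimmed copy of the metadata excerpt (only the keys used below).
METADATA_JSON = """
{
  "current-schema-id": 1,
  "schemas": [
    {"schema-id": 0, "fields": [{"id": 1, "name": "name1"},
                                {"id": 2, "name": "name2"},
                                {"id": 3, "name": "name3"}]},
    {"schema-id": 1, "fields": [{"id": 1, "name": "name1"},
                                {"id": 2, "name": "name2"}]}
  ],
  "default-spec-id": 1,
  "partition-specs": [
    {"spec-id": 0, "fields": [{"name": "name3", "transform": "identity",
                               "source-id": 3, "field-id": 1000}]},
    {"spec-id": 1, "fields": []}
  ]
}
"""


def bind_against_current_schema(metadata):
    """Check every spec against the current schema (the behavior that fails here)."""
    current = next(s for s in metadata["schemas"]
                   if s["schema-id"] == metadata["current-schema-id"])
    column_ids = {f["id"] for f in current["fields"]}
    for spec in metadata["partition-specs"]:
        for field in spec["fields"]:
            if field["source-id"] not in column_ids:
                # Same wording as the NullPointerException message in the report.
                raise ValueError(f"Cannot find source column: {field['source-id']}")


metadata = json.loads(METADATA_JSON)
try:
    bind_against_current_schema(metadata)
except ValueError as e:
    print(e)  # Cannot find source column: 3
```

Spec 1 (the default, with no fields) would load fine on its own; it is the historical spec 0 that trips the lookup.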


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] lvyanquan commented on issue #5399: Error happened after deleting a partitioned column

Posted by GitBox <gi...@apache.org>.
lvyanquan commented on issue #5399:
URL: https://github.com/apache/iceberg/issues/5399#issuecomment-1201926717

   We can reproduce this error with the following SQL (Spark 3.2, Iceberg 0.13 or 0.14), where `prod` is the catalog name:
   ```
   CREATE TABLE prod.db.sample (id bigint, data string, category string) USING iceberg PARTITIONED BY (category) TBLPROPERTIES('format-version' = '2');
   
   ALTER TABLE prod.db.sample DROP PARTITION FIELD category;
   
   ALTER TABLE prod.db.sample DROP COLUMN category;
   ```
   Even if the column is dropped with the Java API instead, the same NullPointerException occurs when the table is used.
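The three statements can be traced on a toy model of the metadata. The Python sketch below is hypothetical: `create_table`, `drop_partition_field` and `drop_column` only imitate what the DDL appears to do to a format v2 table, judging from the metadata excerpt in the issue. Each change appends a new spec or schema and moves the default/current pointer, but the old spec 0 keeps `source-id` 3.

```python
import copy


def _find(items, key, value):
    """Return the first dict in `items` whose `key` equals `value`."""
    return next(item for item in items if item[key] == value)


def create_table():
    """Toy metadata after CREATE TABLE ... PARTITIONED BY (category)."""
    return {
        "current-schema-id": 0,
        "schemas": [{"schema-id": 0, "fields": [
            {"id": 1, "name": "id"},
            {"id": 2, "name": "data"},
            {"id": 3, "name": "category"}]}],
        "default-spec-id": 0,
        "partition-specs": [{"spec-id": 0, "fields": [
            {"name": "category", "transform": "identity",
             "source-id": 3, "field-id": 1000}]}],
    }


def drop_partition_field(meta, name):
    """Append a new spec without `name` and make it the default; old specs stay."""
    meta = copy.deepcopy(meta)
    old = _find(meta["partition-specs"], "spec-id", meta["default-spec-id"])
    new_id = max(s["spec-id"] for s in meta["partition-specs"]) + 1
    fields = [f for f in old["fields"] if f["name"] != name]
    meta["partition-specs"].append({"spec-id": new_id, "fields": fields})
    meta["default-spec-id"] = new_id
    return meta


def drop_column(meta, name):
    """Append a new schema without `name` and make it current; old schemas stay."""
    meta = copy.deepcopy(meta)
    old = _find(meta["schemas"], "schema-id", meta["current-schema-id"])
    new_id = max(s["schema-id"] for s in meta["schemas"]) + 1
    fields = [f for f in old["fields"] if f["name"] != name]
    meta["schemas"].append({"schema-id": new_id, "fields": fields})
    meta["current-schema-id"] = new_id
    return meta


def dangling_source_ids(meta):
    """Map spec-id -> source IDs that are missing from the current schema."""
    current = _find(meta["schemas"], "schema-id", meta["current-schema-id"])
    ids = {f["id"] for f in current["fields"]}
    result = {}
    for spec in meta["partition-specs"]:
        missing = [f["source-id"] for f in spec["fields"] if f["source-id"] not in ids]
        if missing:
            result[spec["spec-id"]] = missing
    return result


meta = drop_column(drop_partition_field(create_table(), "category"), "category")
print(meta["default-spec-id"], meta["current-schema-id"])  # 1 1
print(dangling_source_ids(meta))  # {0: [3]}
```

The end state has the same shape as the excerpt in the issue: default spec 1 is empty, current schema 1 lacks column 3, and spec 0 still points at it.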


[GitHub] [iceberg] nastra commented on issue #5399: Error happened after deleting a partitioned column

Posted by GitBox <gi...@apache.org>.
nastra commented on issue #5399:
URL: https://github.com/apache/iceberg/issues/5399#issuecomment-1253521318

   Just FYI, we're tracking the same issue in https://github.com/apache/iceberg/issues/5676


[GitHub] [iceberg] rdblue closed issue #5399: Error happened after deleting a partitioned column

Posted by GitBox <gi...@apache.org>.
rdblue closed issue #5399: Error happened after deleting a partitioned column 
URL: https://github.com/apache/iceberg/issues/5399


[GitHub] [iceberg] Fokko commented on issue #5399: Error happened after deleting a partitioned column

Posted by GitBox <gi...@apache.org>.
Fokko commented on issue #5399:
URL: https://github.com/apache/iceberg/issues/5399#issuecomment-1253666472

   Hey all, I have a PR ready (https://github.com/apache/iceberg/pull/5707); it no longer looks up the historical columns.
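The idea in this comment, not resolving source columns for historical specs, can be sketched in Python as below. This is an assumption about the approach, not code taken from PR #5707: only the default spec is checked against the current schema, while older specs are kept as unbound lists of raw `source-id` values. `parse_specs` is a hypothetical helper.

```python
def parse_specs(metadata):
    """Check only the default spec against the current schema; keep older specs unbound."""
    current = next(s for s in metadata["schemas"]
                   if s["schema-id"] == metadata["current-schema-id"])
    columns = {f["id"]: f["name"] for f in current["fields"]}
    bound, unbound = {}, {}
    for spec in metadata["partition-specs"]:
        if spec["spec-id"] == metadata["default-spec-id"]:
            # The default spec is used for new writes, so its columns must still exist.
            for field in spec["fields"]:
                if field["source-id"] not in columns:
                    raise ValueError(
                        f"Cannot find source column: {field['source-id']}")
            bound[spec["spec-id"]] = [columns[f["source-id"]] for f in spec["fields"]]
        else:
            # Historical specs: keep source IDs as-is, with no lookup in the current schema.
            unbound[spec["spec-id"]] = [f["source-id"] for f in spec["fields"]]
    return bound, unbound


metadata = {
    "current-schema-id": 1,
    "schemas": [
        {"schema-id": 0, "fields": [{"id": 1, "name": "name1"},
                                    {"id": 2, "name": "name2"},
                                    {"id": 3, "name": "name3"}]},
        {"schema-id": 1, "fields": [{"id": 1, "name": "name1"},
                                    {"id": 2, "name": "name2"}]},
    ],
    "default-spec-id": 1,
    "partition-specs": [
        {"spec-id": 0, "fields": [{"name": "name3", "transform": "identity",
                                   "source-id": 3, "field-id": 1000}]},
        {"spec-id": 1, "fields": []},
    ],
}
print(parse_specs(metadata))  # ({1: []}, {0: [3]})
```

In this sketch the trade-off is that a historical spec can no longer be reported with column names once its source column is gone, but the metadata still loads, and an invalid default spec is still rejected.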


[GitHub] [iceberg] lvyanquan commented on issue #5399: Error happened after deleting a partitioned column

Posted by GitBox <gi...@apache.org>.
lvyanquan commented on issue #5399:
URL: https://github.com/apache/iceberg/issues/5399#issuecomment-1201931070

   > @lvyanquan:
   > 
   > Do you have a test case or sample SQL to reproduce this? I want to know when we build the `history partition-specs`.
   
   We can reproduce this error with the following SQL (Spark 3.2, Iceberg 0.13 or 0.14), where `prod` is the catalog name:
   ```
   CREATE TABLE prod.db.sample (id bigint, data string, category string) USING iceberg PARTITIONED BY (category) TBLPROPERTIES('format-version' = '2');
   
   ALTER TABLE prod.db.sample DROP PARTITION FIELD category;
   
   ALTER TABLE prod.db.sample DROP COLUMN category;
   ```
   Even if the column is dropped with the Java API instead, the same NullPointerException occurs when the table is used.


[GitHub] [iceberg] ajantha-bhat commented on issue #5399: Error happened after deleting a partitioned column

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on issue #5399:
URL: https://github.com/apache/iceberg/issues/5399#issuecomment-1201391911

   @lvyanquan:
   
   Do you have a test case or sample SQL to reproduce this?
   I want to know when we build the `history partition-specs`.


[GitHub] [iceberg] ajantha-bhat commented on issue #5399: Error happened after deleting a partitioned column

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on issue #5399:
URL: https://github.com/apache/iceberg/issues/5399#issuecomment-1253239248

   Update:
   The latest code throws a different exception, but the problem still exists:
   
   ```
   Cannot find source column for partition field: 1000: category: identity(3)
   org.apache.iceberg.exceptions.ValidationException: Cannot find source column for partition field: 1000: category: identity(3)
   	at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
   	at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:558)
   	at org.apache.iceberg.PartitionSpec$Builder.build(PartitionSpec.java:546)
   	at org.apache.iceberg.UnboundPartitionSpec.bind(UnboundPartitionSpec.java:45)
   	at org.apache.iceberg.PartitionSpecParser.fromJson(PartitionSpecParser.java:85)
   	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:390)
   	at org.apache.iceberg.TableMetadataParser.fromJson(TableMetadataParser.java:311)
   	at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:274)
   	at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:267)
   	at org.apache.iceberg.hadoop.HadoopTableOperations.updateVersionAndMetadata(HadoopTableOperations.java:98)
   	at org.apache.iceberg.hadoop.HadoopTableOperations.refresh(HadoopTableOperations.java:121)
   	at org.apache.iceberg.hadoop.HadoopTableOperations.current(HadoopTableOperations.java:84)
   	at org.apache.iceberg.BaseTable.properties(BaseTable.java:119)
   	at org.apache.iceberg.spark.source.SparkTable.<init>(SparkTable.java:128)
   	at org.apache.iceberg.spark.source.SparkTable.<init>(SparkTable.java:118)
   	at org.apache.iceberg.spark.SparkCatalog.alterTable(SparkCatalog.java:290)
   ```
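In this newer trace the spec is built first and only then validated (`PartitionSpec$Builder.build` calls `PartitionSpec.checkCompatibility`). A toy Python version of that two-step flow, with hypothetical `PartitionField`, `build_unchecked` and `check_compatibility` names, reproduces the message format seen in the log; it is an illustration only, not Iceberg's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionField:
    source_id: int
    field_id: int
    name: str
    transform: str

    def __str__(self):
        # Same shape as the field in the log: "1000: category: identity(3)"
        return f"{self.field_id}: {self.name}: {self.transform}({self.source_id})"


class ValidationException(Exception):
    """Stand-in for org.apache.iceberg.exceptions.ValidationException."""


def build_unchecked(fields):
    """Step 1: assemble the spec without looking up any source columns."""
    return tuple(fields)


def check_compatibility(spec, schema_column_ids):
    """Step 2: verify that every field's source column exists in the given schema."""
    for field in spec:
        if field.source_id not in schema_column_ids:
            raise ValidationException(
                f"Cannot find source column for partition field: {field}")


spec = build_unchecked([PartitionField(3, 1000, "category", "identity")])
try:
    check_compatibility(spec, {1, 2})  # column IDs left after DROP COLUMN category
except ValidationException as e:
    print(e)  # Cannot find source column for partition field: 1000: category: identity(3)
```

Because building and validating are separate steps here, the check could in principle be applied to some specs and not others; moving the lookup alone does not fix the problem, as the unchanged failure shows.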

