Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/01/09 05:07:25 UTC

[GitHub] [incubator-iceberg] shawnding opened a new issue #729: SparkBatchWrite partition write should correctly row grouped

shawnding opened a new issue #729: SparkBatchWrite partition write should correctly row grouped
URL: https://github.com/apache/incubator-iceberg/issues/729
 
 
   Create a table with this statement:
   `CREATE TABLE test(id Int, data String) PARTITIONED BY (data)`
   
   Then write data into Iceberg through SparkBatchWrite like this:
   ```
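   // Requires java.util.Random.
   // Build 5000 rows whose partition value (data) is a random letter a-z.
   Random rnd = new Random();
   StringBuilder base = new StringBuilder();
   
   for (int i = 0; i < 5000; i++) {
     char c = (char) (rnd.nextInt(26) + 'a');
     base.append("(").append(i).append(", '").append(c).append("'),");
   }
   
   // The trailing (1, 'a') row closes the VALUES list.
   spark.sql("INSERT INTO " + CATALOG_DB_TABLE + " VALUES " + base + "(1, 'a')");
   ```
   The rows built in `base` are not guaranteed to be grouped by `data`, so Iceberg throws an `IllegalStateException` at this check: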
   
   ```
   if (completedPartitions.contains(key)) {
     // if rows are not correctly grouped, detect and fail the write
     PartitionKey existingKey = Iterables.find(completedPartitions, key::equals, null);
     LOG.warn("Duplicate key: {} == {}", existingKey, key);
     throw new IllegalStateException("Already closed files for partition: " + key.toPath());
   }
   ```
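
To illustrate why ungrouped rows trip this check, here is a minimal sketch that does not use Spark or Iceberg at all. The class `GroupedWriteSketch` and its `write` method are hypothetical stand-ins: they assume the writer keeps one open file per task, closes it whenever the partition key changes, and remembers closed keys in a set, which is what the snippet above suggests but is not copied from the Iceberg source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a partitioned writer; not the Iceberg implementation.
class GroupedWriteSketch {

  // Consumes partition keys in arrival order and returns the number of files closed.
  // Throws IllegalStateException if a key reappears after its file was closed.
  static int write(List<String> partitionKeys) {
    Set<String> completedPartitions = new HashSet<>();
    String currentKey = null;
    int filesClosed = 0;

    for (String key : partitionKeys) {
      if (!key.equals(currentKey)) {
        if (currentKey != null) {
          // The key changed, so the file for the previous partition is closed.
          completedPartitions.add(currentKey);
          filesClosed++;
        }
        if (completedPartitions.contains(key)) {
          // Rows were not grouped: this partition was already written and closed.
          throw new IllegalStateException("Already closed files for partition: " + key);
        }
        currentKey = key;
      }
    }

    if (currentKey != null) {
      filesClosed++; // close the last open file
    }
    return filesClosed;
  }

  public static void main(String[] args) {
    List<String> ungrouped = Arrays.asList("a", "b", "a");

    try {
      write(ungrouped);
    } catch (IllegalStateException e) {
      System.out.println("ungrouped input failed: " + e.getMessage());
    }

    // Sorting groups equal keys together, so each partition is seen only once.
    List<String> grouped = new ArrayList<>(ungrouped);
    Collections.sort(grouped);
    System.out.println("grouped input closed " + write(grouped) + " files");
  }
}
```

Under these assumptions the input `a, b, a` fails on the third row, while the sorted input `a, a, b` succeeds. In Spark terms the analogous step would be ordering rows by `data` within each task before the write, though this thread does not confirm a specific fix.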

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] shawnding commented on issue #729: SparkBatchWrite partition write should correctly row grouped

Posted by GitBox <gi...@apache.org>.
shawnding commented on issue #729: SparkBatchWrite partition write should correctly row grouped
URL: https://github.com/apache/incubator-iceberg/issues/729#issuecomment-573985036
 
 
   OK, I will close this issue.



[GitHub] [incubator-iceberg] shawnding closed issue #729: SparkBatchWrite partition write should correctly row grouped

Posted by GitBox <gi...@apache.org>.
shawnding closed issue #729: SparkBatchWrite partition write should correctly row grouped
URL: https://github.com/apache/incubator-iceberg/issues/729
 
 
   



[GitHub] [incubator-iceberg] jerryshao commented on issue #729: SparkBatchWrite partition write should correctly row grouped

Posted by GitBox <gi...@apache.org>.
jerryshao commented on issue #729: SparkBatchWrite partition write should correctly row grouped
URL: https://github.com/apache/incubator-iceberg/issues/729#issuecomment-572405783
 
 
   I think this is a duplicate of #717.
