Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/06 19:44:08 UTC

[GitHub] [iceberg] rdblue opened a new pull request, #5214: Spark: Add the query ID to file names

rdblue opened a new pull request, #5214:
URL: https://github.com/apache/iceberg/pull/5214

   Spark writes should use the query ID in file names, but the ID was not passed through correctly, so each writer generated its own UUID. Using the shared query ID in file names should make it possible to identify files from the same write.
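
   As a rough illustration of why a shared ID helps, here is a minimal, self-contained sketch. It does not use Iceberg classes: the `QueryIdFileNames` class, its `fileName` and `filesFromWrite` helpers, the `.parquet` extension, and the name pattern are all hypothetical and only assume that the operation ID is embedded somewhere in each file name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

public class QueryIdFileNames {
  // Hypothetical naming helper (an assumption for illustration, not Iceberg's code):
  // embeds the partition ID, task ID, operation ID, and a per-writer file counter.
  static String fileName(int partitionId, long taskId, String operationId, int fileCount) {
    return String.format("%05d-%d-%s-%05d.parquet", partitionId, taskId, operationId, fileCount);
  }

  // Returns the file names that embed the given query ID.
  static List<String> filesFromWrite(List<String> fileNames, String queryId) {
    return fileNames.stream()
        .filter(name -> name.contains("-" + queryId + "-"))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // One ID generated once per write and shared by every writer task.
    String queryId = UUID.randomUUID().toString();

    List<String> names = new ArrayList<>();
    for (int partitionId = 0; partitionId < 3; partitionId++) {
      names.add(fileName(partitionId, 100L + partitionId, queryId, 1));
    }

    // A file from an unrelated write carries a different ID.
    names.add(fileName(0, 7L, UUID.randomUUID().toString(), 1));

    // Only the three files that share queryId are selected.
    filesFromWrite(names, queryId).forEach(System.out::println);
  }
}
```

   If each writer generated its own UUID instead, no single ID would match more than one writer's files, so grouping by write in this way would not be possible.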


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #5214: Spark: Add the query ID to file names

Posted by GitBox <gi...@apache.org>.
nastra commented on PR #5214:
URL: https://github.com/apache/iceberg/pull/5214#issuecomment-1379936898

   Closing this one as it's been superseded by https://github.com/apache/iceberg/pull/6569




[GitHub] [iceberg] singhpk234 commented on a diff in pull request #5214: Spark: Add the query ID to file names

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #5214:
URL: https://github.com/apache/iceberg/pull/5214#discussion_r915218890


##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java:
##########
@@ -622,6 +625,7 @@ public DataWriter<InternalRow> createWriter(int partitionId, long taskId, long e
       FileIO io = table.io();
 
       OutputFileFactory fileFactory = OutputFileFactory.builderFor(table, partitionId, taskId)
+          .operationId(queryId)

Review Comment:
   [question] Should we also do this for [SparkPositionDeltaWrite](https://github.com/apache/iceberg/blob/master/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java#L321-L326), which is used in merge-on-read (MoR)?





[GitHub] [iceberg] nastra closed pull request #5214: Spark: Add the query ID to file names

Posted by GitBox <gi...@apache.org>.
nastra closed pull request #5214: Spark: Add the query ID to file names
URL: https://github.com/apache/iceberg/pull/5214




[GitHub] [iceberg] rdblue commented on pull request #5214: Spark: Add the query ID to file names

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5214:
URL: https://github.com/apache/iceberg/pull/5214#issuecomment-1176623567

   @aokolnychyi, FYI






[GitHub] [iceberg] rdblue commented on a diff in pull request #5214: Spark: Add the query ID to file names

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5214:
URL: https://github.com/apache/iceberg/pull/5214#discussion_r915265942


##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java:
##########
@@ -622,6 +625,7 @@ public DataWriter<InternalRow> createWriter(int partitionId, long taskId, long e
       FileIO io = table.io();
 
       OutputFileFactory fileFactory = OutputFileFactory.builderFor(table, partitionId, taskId)
+          .operationId(queryId)

Review Comment:
   Done.


