You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2019/11/12 07:30:16 UTC

[GitHub] [incubator-iceberg] manishmalhotrawork commented on issue #480: Don't group files with different partition specs in BaseRewriteManifests

manishmalhotrawork commented on issue #480: Don't group files with different partition specs in BaseRewriteManifests
URL: https://github.com/apache/incubator-iceberg/issues/480#issuecomment-552770181
 
 
   @aokolnychyi trying to understand the problem and thinking loud for solution. 
   
   so, would the fix be to group manifests by partition spec before cluster by `clusterFunction`.
   so for group of each partition spec's manifest files, the cluster by logic will be applied, where currently its applied for every manifests irrespective of the underlying manifests partition spec.
   
   ```              try (ManifestReader reader =
                        ManifestReader.read(ops.io().newInputFile(manifest.path()), ops.current().specsById())) {
                   FilteredManifest filteredManifest = reader.select(Arrays.asList("*"));
                   filteredManifest.liveEntries().forEach(
                       entry -> appendEntry(entry, clusterByFunc.apply(entry.file()))
                   );
   ```
   Manifest files grouped by partition spec is done in `MergingSnapshotProducer` at [this line](https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L269). So similar logic can be applied.
   
   One more interesting point, 
   as per [this line](https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java#L242) ManifestFileWriter is created using the `TableOperations.current.spec` which will be used to create all `ManifestWriters`, though `ManifestReader` is created based on the `partitionSpec` belongs to the Manifest File, as per [this line](https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java#L176)
   
   Not sure if my understanding is correct or not, So that is also part of the problem ?
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org