Posted to issues@iceberg.apache.org by "ksmatharoo (via GitHub)" <gi...@apache.org> on 2023/04/20 08:45:23 UTC

[GitHub] [iceberg] ksmatharoo commented on issue #1617: Support relative paths in Table Metadata

ksmatharoo commented on issue #1617:
URL: https://github.com/apache/iceberg/issues/1617#issuecomment-1515953429

   > I don't think it is a good idea in general to use relative paths. We recently had an issue where using a `hdfs` location without authority caused a user's data to be deleted by the `RemoveOrphanFiles` action because the resolution of the table root changed. The main problem is that places in Iceberg would need to have some idea of "equivalent" paths and path resolution. Full URIs are much easier to work with and more reliable.
   > 
   > But there is still a way to do both. Catalogs and tables can inject their own `FileIO` implementation, which is what is used to open files. That can do any resolution that you want based on environment. So you could use an implementation that allows you to override a portion of the file URI and read it from a different underlying location. I think that works better overall because there are no mistakes about equivalent URIs, but you can still read a table copy without rewriting the metadata.
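
   A prefix-overriding FileIO of the kind described above could be built around a small path-rewriting helper. The following is a minimal, self-contained sketch; the class name, the prefixes, and the wiring into `FileIO.newInputFile`/`newOutputFile` are assumptions for illustration, not code from Iceberg or this issue:

```java
// Sketch of the path rewriting a custom FileIO could perform. In a real
// implementation this logic would sit inside an org.apache.iceberg.io.FileIO
// whose newInputFile/newOutputFile rewrite the location before delegating
// to a wrapped FileIO (e.g. HadoopFileIO or S3FileIO).
public class PrefixRewriter {
  private final String fromPrefix; // table root recorded in the metadata
  private final String toPrefix;   // actual root of the table copy

  public PrefixRewriter(String fromPrefix, String toPrefix) {
    this.fromPrefix = fromPrefix;
    this.toPrefix = toPrefix;
  }

  /** Replaces the table-location prefix recorded in metadata with the new root. */
  public String rewrite(String path) {
    if (path.startsWith(fromPrefix)) {
      return toPrefix + path.substring(fromPrefix.length());
    }
    return path; // paths outside the table location are left untouched
  }
}
```

   Keeping the rewrite confined to a single delegating FileIO is what lets the rest of the engine keep working with the unmodified metadata paths.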
   
   @rdblue We tried injecting our own FileIO that replaces the table/metadata path prefix with the new location. This works up to reading the table metadata, but when reading the Parquet files it gets stuck in BatchDataReader.java, in the function below. Please share your thoughts on whether there is another way of achieving this.

    protected CloseableIterator<ColumnarBatch> open(FileScanTask task) {
      String filePath = task.file().path().toString();
      LOG.debug("Opening data file {}", filePath);

      // update the current file for Spark's filename() function
      InputFileBlockHolder.set(filePath, task.start(), task.length());

      Map<Integer, ?> idToConstant = constantsMap(task, expectedSchema());

      /*
       * The line below causes the issue: it looks up the path recorded in the
       * metadata in a map whose entries have been replaced by the custom FileIO.
       * Changing it to
       *     InputFile inputFile = table.io().newInputFile(filePath);
       * makes it work, but that bypasses the encryption logic. In short, we
       * could not make it work with a custom FileIO alone.
       */
      InputFile inputFile = getInputFile(filePath);
      Preconditions.checkNotNull(inputFile, "Could not find InputFile associated with FileScanTask");

      SparkDeleteFilter deleteFilter =
          task.deletes().isEmpty()
              ? null
              : new SparkDeleteFilter(filePath, task.deletes(), counter());

      return newBatchIterable(
              inputFile,
              task.file().format(),
              task.start(),
              task.length(),
              task.residual(),
              idToConstant,
              deleteFilter)
          .iterator();
    }
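
   One way to read the reported failure: the lookup behind `getInputFile(filePath)` is keyed by the locations the `InputFile`s report, so if the custom FileIO rewrites the location, a lookup by the original metadata path misses and returns null. The sketch below models that mismatch with a plain map and one possible remedy, keying by the original metadata path while the value points at the rewritten physical location. All names here are illustrative assumptions, not the actual Iceberg internals:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the lookup behind getInputFile(): if the index is
// keyed by the rewritten location while the FileScanTask still carries the
// original metadata path, the lookup misses. Keying by the reported
// (original) location while resolving to the physical one avoids that.
public class InputFileLookupSketch {
  // Builds a lookup keyed by whatever location the InputFile reports,
  // mapping it to the physical location the bytes are read from.
  public static Map<String, String> index(String reportedLocation, String physicalLocation) {
    Map<String, String> byLocation = new HashMap<>();
    byLocation.put(reportedLocation, physicalLocation);
    return byLocation;
  }
}
```

   Under this reading, a custom FileIO whose `InputFile` keeps reporting the original metadata path as its location, while reading bytes from the rewritten one, would keep the map keys consistent without touching the reader code.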


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org