Posted to commits@pinot.apache.org by xi...@apache.org on 2021/02/03 07:54:13 UTC

[incubator-pinot] 01/01: Fix the issue in Pinot Spark ingestion job to handle listing input files with correct scheme in URI

This is an automated email from the ASF dual-hosted git repository.

xiangfu pushed a commit to branch fixing_spark_path_validation
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git

commit a039de3055d7ecf831a84312f4eb84a97e41fc57
Author: Xiang Fu <fx...@gmail.com>
AuthorDate: Tue Feb 2 23:53:48 2021 -0800

    Fix the issue in Pinot Spark ingestion job to handle listing input files with correct scheme in URI
---
 .../plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
index eeeceb2..387674f 100644
--- a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
+++ b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
@@ -186,7 +186,9 @@ public class SparkSegmentGenerationJobRunner implements IngestionJobRunner, Seri
         }
       }
       if (!inputDirFS.isDirectory(new URI(file))) {
-        filteredFiles.add(file);
+        // In case PinotFS implementations list files without a protocol, we may lose the scheme (e.g. hdfs://)
+        // portion of the path. Call getFileURI() to fix this up.
+        filteredFiles.add(SegmentGenerationUtils.getFileURI(file, inputDirURI).toString());
       }
     }
 
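For readers unfamiliar with the problem the comment describes, here is a minimal sketch of restoring a lost scheme from the input directory URI. It is an illustration only, not the body of SegmentGenerationUtils.getFileURI(): the class name SchemeRestoreSketch, the helper withBaseScheme, and the example host and paths are hypothetical, and the sketch assumes the listed path is either a full URI or an absolute path that is already a valid URI string.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SchemeRestoreSketch {

  // Hypothetical helper (not Pinot's implementation): if the listed path has
  // no scheme, borrow the scheme and authority from the base directory URI.
  public static URI withBaseScheme(String listedPath, URI baseUri) {
    try {
      URI fileUri = new URI(listedPath);
      if (fileUri.getScheme() != null) {
        // The listing already returned a full URI; leave it untouched.
        return fileUri;
      }
      // Rebuild the URI with the base scheme/authority and the listed path.
      return new URI(baseUri.getScheme(), baseUri.getAuthority(), fileUri.getPath(), null, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Invalid path: " + listedPath, e);
    }
  }

  public static void main(String[] args) {
    // Assumed example values, chosen only for illustration.
    URI inputDir = URI.create("hdfs://namenode:8020/data/input");
    System.out.println(withBaseScheme("/data/input/part-0.avro", inputDir));
    System.out.println(withBaseScheme("s3://bucket/data/part-1.avro", inputDir));
  }
}
```

In this sketch a path that already carries a scheme passes through unchanged, so the helper is safe to apply to every listed file regardless of how the underlying file system reports its listings.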

