Posted to commits@pinot.apache.org by xi...@apache.org on 2021/02/03 07:54:12 UTC
[incubator-pinot] branch fixing_spark_path_validation created (now a039de3)
This is an automated email from the ASF dual-hosted git repository.
xiangfu pushed a change to branch fixing_spark_path_validation
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.
at a039de3 Fix the issue in Pinot Spark ingestion job to handle listing input files with correct scheme in URI
This branch includes the following new commits:
new a039de3 Fix the issue in Pinot Spark ingestion job to handle listing input files with correct scheme in URI
The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[incubator-pinot] 01/01: Fix the issue in Pinot Spark ingestion job to handle listing input files with correct scheme in URI
Posted by xi...@apache.org.
xiangfu pushed a commit to branch fixing_spark_path_validation
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git
commit a039de3055d7ecf831a84312f4eb84a97e41fc57
Author: Xiang Fu <fx...@gmail.com>
AuthorDate: Tue Feb 2 23:53:48 2021 -0800
Fix the issue in Pinot Spark ingestion job to handle listing input files with correct scheme in URI
---
.../plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
index eeeceb2..387674f 100644
--- a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
+++ b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
@@ -186,7 +186,9 @@ public class SparkSegmentGenerationJobRunner implements IngestionJobRunner, Seri
           }
         }
         if (!inputDirFS.isDirectory(new URI(file))) {
-          filteredFiles.add(file);
+          // In case PinotFS implementations list files without a protocol, then we may lose the schema (e.g. hdfs://)
+          // portion of the path. Call getFileURI() to fix this up.
+          filteredFiles.add(SegmentGenerationUtils.getFileURI(file, inputDirURI).toString());
         }
       }
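
The fix-up described in the comment above (what it calls the "schema" is the URI scheme, e.g. hdfs) can be illustrated with a minimal, self-contained sketch. This is not the code of SegmentGenerationUtils.getFileURI(), whose implementation is not part of this diff. The class SchemeFixupSketch, the helper withSchemeOf(), and the host and paths in main() are all hypothetical names chosen for illustration. The sketch assumes that a listed path with no scheme is an absolute path on the same file system as the input directory.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SchemeFixupSketch {

  /**
   * Hypothetical helper (NOT Pinot's getFileURI): if the listed path has no
   * scheme, borrow the scheme and authority from the input directory URI.
   * Assumes a scheme-less path is absolute (starts with "/").
   */
  static URI withSchemeOf(String path, URI baseUri) throws URISyntaxException {
    URI fileUri = new URI(path);
    if (fileUri.getScheme() != null) {
      // The listing already returned a fully qualified URI; leave it alone.
      return fileUri;
    }
    // Rebuild the URI with the base's scheme and authority, keeping the listed path.
    return new URI(baseUri.getScheme(), baseUri.getAuthority(), fileUri.getPath(), null, null);
  }

  public static void main(String[] args) throws URISyntaxException {
    URI inputDir = new URI("hdfs://namenode:8020/data/input");

    // A listing that dropped the scheme gets it back from the input directory.
    System.out.println(withSchemeOf("/data/input/part-0.avro", inputDir));
    // prints hdfs://namenode:8020/data/input/part-0.avro

    // A listing that kept its scheme is returned unchanged.
    System.out.println(withSchemeOf("s3://bucket/data/part-1.avro", inputDir));
    // prints s3://bucket/data/part-1.avro
  }
}
```

Without a step like this, a scheme-less entry in filteredFiles would later be interpreted without the file system named by the input directory URI, which is the situation the commit message describes.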