Posted to commits@pinot.apache.org by xi...@apache.org on 2021/02/03 07:54:12 UTC

[incubator-pinot] branch fixing_spark_path_validation created (now a039de3)

This is an automated email from the ASF dual-hosted git repository.

xiangfu pushed a change to branch fixing_spark_path_validation
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


      at a039de3  Fix the issue in Pinot Spark ingestion job to handle listing input files with correct scheme in URI

This branch includes the following new commits:

     new a039de3  Fix the issue in Pinot Spark ingestion job to handle listing input files with correct scheme in URI

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[incubator-pinot] 01/01: Fix the issue in Pinot Spark ingestion job to handle listing input files with correct scheme in URI

Posted by xi...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

xiangfu pushed a commit to branch fixing_spark_path_validation
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git

commit a039de3055d7ecf831a84312f4eb84a97e41fc57
Author: Xiang Fu <fx...@gmail.com>
AuthorDate: Tue Feb 2 23:53:48 2021 -0800

    Fix the issue in Pinot Spark ingestion job to handle listing input files with correct scheme in URI
---
 .../plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
index eeeceb2..387674f 100644
--- a/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
+++ b/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/src/main/java/org/apache/pinot/plugin/ingestion/batch/spark/SparkSegmentGenerationJobRunner.java
@@ -186,7 +186,9 @@ public class SparkSegmentGenerationJobRunner implements IngestionJobRunner, Seri
         }
       }
       if (!inputDirFS.isDirectory(new URI(file))) {
-        filteredFiles.add(file);
+        // If a PinotFS implementation lists files without a protocol, we may lose the scheme (e.g. hdfs://)
+        // portion of the path. Call getFileURI() to restore it from the input directory URI.
+        filteredFiles.add(SegmentGenerationUtils.getFileURI(file, inputDirURI).toString());
       }
     }
 

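The fix above relies on re-attaching the scheme of the input directory URI to any listed file path that comes back without one. The following is a minimal, self-contained sketch of that idea using only java.net.URI. The class SchemeFixSketch and the method withSchemeFrom are hypothetical names for illustration; this is not the code of SegmentGenerationUtils.getFileURI(), whose actual behavior may differ.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SchemeFixSketch {

  // Hypothetical helper (an assumption, not the Pinot implementation):
  // if 'file' has no scheme, borrow scheme, user info, host and port
  // from 'baseUri'; otherwise return the file URI unchanged.
  public static URI withSchemeFrom(String file, URI baseUri) throws URISyntaxException {
    URI fileUri = new URI(file);
    if (fileUri.getScheme() != null) {
      // Already fully qualified, e.g. s3://bucket/key
      return fileUri;
    }
    // Path-only listing result: rebuild it on top of the base URI's authority.
    return new URI(baseUri.getScheme(), baseUri.getUserInfo(), baseUri.getHost(), baseUri.getPort(),
        fileUri.getPath(), fileUri.getQuery(), fileUri.getFragment());
  }

  public static void main(String[] args) throws URISyntaxException {
    // Example values are made up for illustration.
    URI inputDir = new URI("hdfs://namenode:8020/data/input");
    // prints hdfs://namenode:8020/data/input/part-0.avro
    System.out.println(withSchemeFrom("/data/input/part-0.avro", inputDir));
    // prints s3://bucket/data/part-1.avro
    System.out.println(withSchemeFrom("s3://bucket/data/part-1.avro", inputDir));
  }
}
```

Paths that already carry a scheme pass through untouched, so the call is safe to apply to every listed file regardless of which PinotFS implementation produced it.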
