Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/08/10 09:43:23 UTC

[GitHub] [incubator-pinot] KKcorps opened a new pull request #5836: Fix ingesting data from Amazon S3 bucket

KKcorps opened a new pull request #5836:
URL: https://github.com/apache/incubator-pinot/pull/5836


   This PR fixes issue #5835 (ingesting data from S3).
   
   The fix involves the following changes:
   
   - Return full paths, including the scheme, from the `listFiles` functions instead of only the path after the bucket.
   This change is needed because the output of `listFiles` is passed to the `isDirectory` function, which throws an exception because the path doesn't contain the bucket.
   
   - Take only the path instead of the full URI in the segment generation job runner when creating a local file.
   The segment generator throws the error `inputFileURI doesn't start with file://` when creating the local file. Taking `inputFileURI.getPath()` instead of `inputFileURI` fixes this issue.
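A minimal Java sketch of why the second change helps, assuming the runner names the local copy after the last segment of the input URI. The helper `localFileFor`, the class name, and the bucket and paths are hypothetical and not Pinot's code. `java.io.File(URI)` only accepts URIs with the `file` scheme, whereas `URI.getPath()` yields a plain path for any scheme, including `s3`:

```java
import java.io.File;
import java.net.URI;

public final class LocalInputFileSketch {
  private LocalInputFileSketch() {}

  /** Hypothetical helper: name the local copy after the last path segment of the input URI. */
  static File localFileFor(File localTempDir, URI inputFileURI) {
    // new File(inputFileURI) would reject any non-"file" URI; getPath() works for any scheme.
    return new File(localTempDir, new File(inputFileURI.getPath()).getName());
  }

  public static void main(String[] args) {
    URI input = URI.create("s3://my-bucket/raw/2020/data.csv");
    File tmp = new File(System.getProperty("java.io.tmpdir"));
    try {
      new File(input); // throws: the scheme is "s3", not "file"
    } catch (IllegalArgumentException e) {
      System.out.println("new File(URI) failed: " + e.getMessage());
    }
    System.out.println(localFileFor(tmp, input).getName()); // prints "data.csv"
  }
}
```

Only the file name is taken from the remote URI here, so the local copy lands inside the temp directory regardless of how deep the object key is.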
   
   The changes have been verified by running a standalone job to ingest as well as upload data in S3.  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] fx19880617 merged pull request #5836: Fix data ingestion from Amazon S3 bucket

Posted by GitBox <gi...@apache.org>.
fx19880617 merged pull request #5836:
URL: https://github.com/apache/incubator-pinot/pull/5836


   




[GitHub] [incubator-pinot] fx19880617 commented on a change in pull request #5836: Fix data ingestion from Amazon S3 bucket

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on a change in pull request #5836:
URL: https://github.com/apache/incubator-pinot/pull/5836#discussion_r468148451



##########
File path: pinot-plugins/pinot-file-system/pinot-s3/src/test/java/org/apache/pinot/plugin/filesystem/S3PinotFSTest.java
##########
@@ -28,12 +28,16 @@
 import java.util.Arrays;
 import java.util.List;
 import org.apache.commons.io.IOUtils;
+import org.apache.pinot.spi.env.PinotConfiguration;

Review comment:
       do we need those imports?






[GitHub] [incubator-pinot] KKcorps commented on a change in pull request #5836: Fix data ingestion from Amazon S3 bucket

Posted by GitBox <gi...@apache.org>.
KKcorps commented on a change in pull request #5836:
URL: https://github.com/apache/incubator-pinot/pull/5836#discussion_r468406304



##########
File path: pinot-plugins/pinot-file-system/pinot-s3/src/test/java/org/apache/pinot/plugin/filesystem/S3PinotFSTest.java
##########
@@ -28,12 +28,16 @@
 import java.util.Arrays;
 import java.util.List;
 import org.apache.commons.io.IOUtils;
+import org.apache.pinot.spi.env.PinotConfiguration;

Review comment:
       Removed unused imports.

##########
File path: pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java
##########
@@ -391,7 +393,11 @@ public long length(URI fileUri)
       listObjectsV2Response.contents().stream().forEach(object -> {
         //Only add files and not directories
         if (!object.key().equals(fileUri.getPath()) && !object.key().endsWith(DELIMITER)) {
-          builder.add(object.key());
+          String fileKey = object.key();
+          if(fileKey.startsWith(DELIMITER)){

Review comment:
       does it look fine now?
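The quoted hunk stops mid-statement. As a hypothetical sketch of the first change described in the PR (returning full `s3://bucket/key` paths from `listFiles`), the key handling could look like this. It assumes the values of `S3_SCHEME` and `DELIMITER`, uses plain strings in place of the AWS SDK response objects, and is not necessarily the code that was merged:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public final class S3FullPathSketch {
  // Assumed values; the real constants live in S3PinotFS.
  private static final String S3_SCHEME = "s3://";
  private static final String DELIMITER = "/";

  private S3FullPathSketch() {}

  /** Turns keys listed under fileUri into full paths (scheme + bucket + key), skipping directories. */
  static List<String> toFullPaths(URI fileUri, List<String> keys) {
    String bucket = fileUri.getHost();
    List<String> result = new ArrayList<>();
    for (String key : keys) {
      // Only add files and not directories, mirroring the check in the diff.
      if (key.equals(fileUri.getPath()) || key.endsWith(DELIMITER)) {
        continue;
      }
      String fileKey = key;
      if (fileKey.startsWith(DELIMITER)) {
        fileKey = fileKey.substring(DELIMITER.length()); // avoid "bucket//key"
      }
      result.add(S3_SCHEME + bucket + DELIMITER + fileKey);
    }
    return result;
  }

  public static void main(String[] args) {
    URI dir = URI.create("s3://my-bucket/raw/");
    List<String> keys = List.of("raw/", "raw/a.csv", "/raw/b.csv", "raw/sub/");
    // prints [s3://my-bucket/raw/a.csv, s3://my-bucket/raw/b.csv]
    System.out.println(toFullPaths(dir, keys));
  }
}
```

Because each returned entry carries the bucket as the URI host, a caller such as `isDirectory` can recover it with `URI.create(path).getHost()` instead of failing on a bucket-less path.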






[GitHub] [incubator-pinot] KKcorps commented on pull request #5836: Fix data ingestion from Amazon S3 bucket

Posted by GitBox <gi...@apache.org>.
KKcorps commented on pull request #5836:
URL: https://github.com/apache/incubator-pinot/pull/5836#issuecomment-671259746


   @fx19880617 @npawar 




[GitHub] [incubator-pinot] fx19880617 commented on a change in pull request #5836: Fix data ingestion from Amazon S3 bucket

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on a change in pull request #5836:
URL: https://github.com/apache/incubator-pinot/pull/5836#discussion_r468068735



##########
File path: pinot-plugins/pinot-file-system/pinot-s3/src/main/java/org/apache/pinot/plugin/filesystem/S3PinotFS.java
##########
@@ -391,7 +393,11 @@ public long length(URI fileUri)
       listObjectsV2Response.contents().stream().forEach(object -> {
         //Only add files and not directories
         if (!object.key().equals(fileUri.getPath()) && !object.key().endsWith(DELIMITER)) {
-          builder.add(object.key());
+          String fileKey = object.key();
+          if(fileKey.startsWith(DELIMITER)){

Review comment:
       nit: `if (fileKey.startsWith(DELIMITER)) {`



