Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2019/10/24 07:29:06 UTC

[GitHub] [incubator-pinot] fx19880617 opened a new pull request #4742: Adding bootstrap mode for Pinot-hadoop job to output segments into relative directories.

fx19880617 opened a new pull request #4742: Adding bootstrap mode for Pinot-hadoop job to output segments into relative directories.
URL: https://github.com/apache/incubator-pinot/pull/4742
 
 
   - Skip hidden files and temporary files created by computation frameworks such as Hadoop and Spark.
   - Add a `job.bootstrap` flag so that the output directory mirrors the relative paths under the input directory.
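
As an illustration of the first point, here is a minimal Python sketch of a hidden/temp-file filter. It is not the PR's Java code. It assumes the common Hadoop convention that names starting with `_` or `.` (for example `_SUCCESS`, `_temporary`, `.crc` checksum files) are not real input; the exact rule used in the PR is not stated here. The function names `is_hidden_or_temp` and `filter_input_files` are hypothetical.

```python
from pathlib import PurePosixPath


def is_hidden_or_temp(path, input_dir):
    """Return True if any path component below input_dir starts with '_' or '.'.

    Assumption: this prefix rule mirrors the usual Hadoop hidden-file
    convention; it is not taken from the PR itself.
    """
    relative = PurePosixPath(path).relative_to(PurePosixPath(input_dir))
    return any(part.startswith(("_", ".")) for part in relative.parts)


def filter_input_files(paths, input_dir):
    """Keep only the paths that are neither hidden nor temporary."""
    return [p for p in paths if not is_hidden_or_temp(p, input_dir)]
```

Only components below the input directory are checked, so an input directory that itself sits under a dot-prefixed parent is not excluded by accident.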
   
   **job.properties**
   ```
   input.dir = /path/to/input
   output.dir = /path/to/output
   job.bootstrap=true
   segment.table.name=mytable
   ```
   The directory layout under `/path/to/input` looks like:
   ```
   /path/to/input/yyyy=2019/mm=10/dd=1/part-0-r-aaa.avro
   /path/to/input/yyyy=2019/mm=10/dd=2/part-0-r-bbb.avro
   /path/to/input/yyyy=2019/mm=10/dd=3/part-0-r-ccc.avro
   ```
   
   We expect the output directory structure to be:
   ```
   /path/to/output/yyyy=2019/mm=10/dd=1/mytable_0.tar.gz
   /path/to/output/yyyy=2019/mm=10/dd=2/mytable_1.tar.gz
   /path/to/output/yyyy=2019/mm=10/dd=3/mytable_2.tar.gz
   ```
   The old job instead writes every segment directly under the output directory:
   ```
   /path/to/output/mytable_0.tar.gz
   /path/to/output/mytable_1.tar.gz
   /path/to/output/mytable_2.tar.gz
   ```
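
The path mapping described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PR's Java implementation: the function `segment_output_path` and its parameters are hypothetical, and the segment name is assumed to be `<table name>_<sequence id>.tar.gz` as in the listings above.

```python
from pathlib import PurePosixPath


def segment_output_path(input_file, input_dir, output_dir,
                        table_name, sequence_id, bootstrap):
    """Compute where a segment tarball would be written.

    With bootstrap=True the input file's parent directory, taken relative
    to input_dir, is re-created under output_dir. With bootstrap=False the
    segment goes directly under output_dir (the old, flat layout).
    """
    # Assumed naming scheme, inferred from the example listings.
    segment_name = f"{table_name}_{sequence_id}.tar.gz"
    target_dir = PurePosixPath(output_dir)
    if bootstrap:
        relative_parent = PurePosixPath(input_file).parent.relative_to(
            PurePosixPath(input_dir)
        )
        target_dir = target_dir / relative_parent
    return str(target_dir / segment_name)
```

For example, calling it with `/path/to/input/yyyy=2019/mm=10/dd=2/part-0-r-bbb.avro`, sequence id `1`, and `bootstrap=True` yields `/path/to/output/yyyy=2019/mm=10/dd=2/mytable_1.tar.gz`, while `bootstrap=False` yields `/path/to/output/mytable_1.tar.gz`.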
