Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/10/28 20:14:07 UTC

[GitHub] [incubator-pinot] flykent1990 opened a new issue #6206: Supports parquet file parallel processing with LocalPinoFS

flykent1990 opened a new issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206


   Hi Pinot team,
   
   Could you add support for processing Parquet files in parallel?
   
   I am currently processing 18 GB of Parquet data (152 files) with LocalPinotFS, which takes 200 minutes.
   
   Adding this feature would improve performance.
   
   ![Screen Shot 2020-10-29 at 03 13 42](https://user-images.githubusercontent.com/71525567/97491446-c1ab1100-1994-11eb-9b75-476eb1cf23b8.png)
   ![Screen Shot 2020-10-28 at 00 08 09](https://user-images.githubusercontent.com/71525567/97491384-ac35e700-1994-11eb-9bff-e17083496202.png)
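
To illustrate what is being requested, here is a minimal Python sketch of per-file parallelism. It is not Pinot's implementation: `process_file` is a hypothetical stand-in for turning one Parquet file into a segment, and the file names are made up. It only shows that each input file can be handled independently, so a pool of workers can take several files at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def process_file(path: str) -> str:
    """Hypothetical placeholder for converting one Parquet file into a segment."""
    return path.replace(".parquet", ".segment")


def process_all(paths: Iterable[str], workers: int) -> List[str]:
    """Process every file with a fixed-size worker pool.

    pool.map() returns results in input order, even though the
    individual files may finish in a different order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_file, paths))


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    files = [f"part-{i}.parquet" for i in range(4)]
    print(process_all(files, workers=2))
```

For CPU-bound work in Python a process pool would be the more natural choice; a thread pool is used here only to keep the sketch short.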
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] fx19880617 commented on issue #6206: Supports parquet file parallel processing with LocalPinoFS

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206#issuecomment-719059369


   > OK, thanks @fx19880617.
   > Using a Spark job, I reduced the Parquet file processing time by 50%.
   
   👍 




[GitHub] [incubator-pinot] flykent1990 removed a comment on issue #6206: Supports parquet file parallel processing with LocalPinoFS

Posted by GitBox <gi...@apache.org>.
flykent1990 removed a comment on issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206#issuecomment-718244971


   




[GitHub] [incubator-pinot] fx19880617 commented on issue #6206: Supports parquet file parallel processing with LocalPinoFS

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206#issuecomment-718209877


   One option is to start a local Spark cluster and submit a Spark job to it.
   You can refer to this doc; just replace the S3 parts with your local paths: https://docs.pinot.apache.org/users/tutorials/ingest-parquet-files-from-s3-using-spark
   




[GitHub] [incubator-pinot] flykent1990 commented on issue #6206: Supports parquet file parallel processing with LocalPinoFS

Posted by GitBox <gi...@apache.org>.
flykent1990 commented on issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206#issuecomment-719017406


   OK, thanks @fx19880617.
   Using a Spark job, I reduced the Parquet file processing time by 50%.
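
As a rough sanity check on that figure, the sketch below estimates the ideal wall-clock time when files are spread evenly over a number of workers. It uses the numbers from this thread (200 minutes, 152 files) and assumes, purely for illustration, equal-sized files, perfect scaling, and 2 workers (the `local[2]` master shown in the command in this thread; the settings of the final run are not stated). The function name `estimate_minutes` is hypothetical.

```python
import math


def estimate_minutes(total_minutes: float, num_files: int, workers: int) -> float:
    """Ideal wall-clock time for equal-sized files split across workers.

    Each worker processes at most ceil(num_files / workers) files in
    sequence, and every file is assumed to take total_minutes / num_files.
    """
    files_per_worker = math.ceil(num_files / workers)
    return total_minutes * files_per_worker / num_files


if __name__ == "__main__":
    # Figures from the thread: 152 files processed sequentially in 200 minutes.
    for workers in (1, 2, 4):
        print(workers, estimate_minutes(200, 152, workers))
```

Under these assumptions two workers halve the sequential time, which is in line with the 50% reduction reported above.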




[GitHub] [incubator-pinot] flykent1990 commented on issue #6206: Supports parquet file parallel processing with LocalPinoFS

Posted by GitBox <gi...@apache.org>.
flykent1990 commented on issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206#issuecomment-718244971


   Hi @fx19880617,
   
   I ran the Spark job and got the error shown in the screenshot below. Can you help me?
   
   Command run:
   ./bin/spark-submit --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand --master "local[2]" --deploy-mode client --conf "spark.driver.extraClassPath="$CLASSPATH:$PLUGINS_CLASSPATH""  local:///app/pinot/lib/pinot-all-0.5.0-jar-with-dependencies.jar -jobSpecFile /tmp/structure_schema/ingestionJobSpec_begin_spark.yaml
   
   ![Screen Shot 2020-10-29 at 05 20 43](https://user-images.githubusercontent.com/71525567/97503404-89f99480-19a7-11eb-8c31-b9a12789e845.png)
   




[GitHub] [incubator-pinot] fx19880617 commented on issue #6206: Supports parquet file parallel processing with LocalPinoFS

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206#issuecomment-719170732


   https://github.com/apache/incubator-pinot/pull/6214

