Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/10/28 20:14:07 UTC
[GitHub] [incubator-pinot] flykent1990 opened a new issue #6206: Supports parquet file parallel processing with LocalPinoFS
flykent1990 opened a new issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206
Hi Pinot team,
Could you add support for processing Parquet files in parallel?
I am currently ingesting 18 GB of Parquet data (152 files) with LocalPinotFS, and it takes 200 minutes.
Adding this feature would improve ingestion performance.
![Screen Shot 2020-10-29 at 03 13 42](https://user-images.githubusercontent.com/71525567/97491446-c1ab1100-1994-11eb-9b75-476eb1cf23b8.png)
![Screen Shot 2020-10-28 at 00 08 09](https://user-images.githubusercontent.com/71525567/97491384-ac35e700-1994-11eb-9bff-e17083496202.png)
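For illustration only: the file-level parallelism requested here can be sketched in bash with `xargs -P`, which keeps up to N per-file workers running at once. This is a hypothetical sketch, not Pinot code. `process_one` and `process_parquet_dir_parallel` are made-up names, and `process_one` only writes a marker file where a real job would read the Parquet file and build a segment.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not part of Pinot): process every *.parquet file in a
# directory with up to N concurrent workers using xargs -P.

# Hypothetical per-file step. A real job would read the Parquet file and
# generate a segment; this placeholder only writes a marker file.
process_one() {
  local out_dir="$1" file="$2"
  basename "$file" > "$out_dir/$(basename "$file").done"
}
# Make the function visible to the bash processes that xargs spawns.
export -f process_one

# Usage: process_parquet_dir_parallel IN_DIR OUT_DIR [PARALLELISM]
process_parquet_dir_parallel() {
  local in_dir="$1" out_dir="$2" parallelism="${3:-4}"
  mkdir -p "$out_dir"
  # -print0/-0 keep file names with spaces intact; -n 1 hands one file to
  # each worker; -P caps how many workers run at the same time.
  find "$in_dir" -type f -name '*.parquet' -print0 \
    | xargs -0 -n 1 -P "$parallelism" bash -c 'process_one "$@"' _ "$out_dir"
}
```

For example, `process_parquet_dir_parallel /data/parquet /tmp/markers 8` would handle the files eight at a time. How much this helps in practice depends on the CPU, disk, and memory available per worker.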
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [incubator-pinot] fx19880617 commented on issue #6206: Supports parquet file parallel processing with LocalPinoFS
Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206#issuecomment-719059369
> ok tks @fx19880617
> using spark job I reduce processing time by 50% parquet file
👍
[GitHub] [incubator-pinot] flykent1990 removed a comment on issue #6206: Supports parquet file parallel processing with LocalPinoFS
Posted by GitBox <gi...@apache.org>.
flykent1990 removed a comment on issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206#issuecomment-718244971
[GitHub] [incubator-pinot] fx19880617 commented on issue #6206: Supports parquet file parallel processing with LocalPinoFS
Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206#issuecomment-718209877
One option is to start a local Spark cluster and submit a Spark job to it.
You can refer to this doc; just replace the S3 parts with your local file system: https://docs.pinot.apache.org/users/tutorials/ingest-parquet-files-from-s3-using-spark
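A minimal sketch of that suggestion, assuming a Spark standalone install at `$SPARK_HOME` (defaulting to `/opt/spark` here, which is an assumption) and reusing the jar and job spec paths from this thread. `submit_to_local_spark`, `_run`, and the `DRY_RUN` switch are hypothetical helpers. `start-master.sh`, `start-worker.sh` (named `start-slave.sh` before Spark 3.1), and port 7077 are Spark standalone-mode defaults. The job spec itself would still need its S3 URIs swapped for local paths.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a one-node Spark standalone cluster and submit
# the Pinot ingestion job to it instead of running with --master "local[2]".
# Set DRY_RUN=1 to print the commands without executing them.

_run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

submit_to_local_spark() {
  # Assumed install location; override by exporting SPARK_HOME.
  local spark_home="${SPARK_HOME:-/opt/spark}"
  # The standalone master listens on port 7077 unless SPARK_MASTER_PORT is set.
  local master_url="spark://${SPARK_MASTER_HOST:-$(hostname -f)}:${SPARK_MASTER_PORT:-7077}"

  _run "$spark_home/sbin/start-master.sh" || return 1
  # Spark 3.1+ ships start-worker.sh; older releases call it start-slave.sh.
  if [ -x "$spark_home/sbin/start-worker.sh" ]; then
    _run "$spark_home/sbin/start-worker.sh" "$master_url" || return 1
  else
    _run "$spark_home/sbin/start-slave.sh" "$master_url" || return 1
  fi

  # Same job as in this thread, pointed at the standalone master.
  _run "$spark_home/bin/spark-submit" \
    --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
    --master "$master_url" \
    --deploy-mode client \
    --conf "spark.driver.extraClassPath=${CLASSPATH:-}:${PLUGINS_CLASSPATH:-}" \
    local:///app/pinot/lib/pinot-all-0.5.0-jar-with-dependencies.jar \
    -jobSpecFile /tmp/structure_schema/ingestionJobSpec_begin_spark.yaml
}
```

Running `DRY_RUN=1 submit_to_local_spark` first shows the exact commands, which makes it easier to check the master URL and classpath before anything is started.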
[GitHub] [incubator-pinot] flykent1990 commented on issue #6206: Supports parquet file parallel processing with LocalPinoFS
Posted by GitBox <gi...@apache.org>.
flykent1990 commented on issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206#issuecomment-719017406
OK, thanks @fx19880617.
Using a Spark job, I reduced the Parquet processing time by 50%.
[GitHub] [incubator-pinot] flykent1990 commented on issue #6206: Supports parquet file parallel processing with LocalPinoFS
Posted by GitBox <gi...@apache.org>.
flykent1990 commented on issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206#issuecomment-718244971
Hi @fx19880617,
I ran the Spark job and got the error below. Can you help me?
Command run:
./bin/spark-submit --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand --master "local[2]" --deploy-mode client --conf "spark.driver.extraClassPath=$CLASSPATH:$PLUGINS_CLASSPATH" local:///app/pinot/lib/pinot-all-0.5.0-jar-with-dependencies.jar -jobSpecFile /tmp/structure_schema/ingestionJobSpec_begin_spark.yaml
![Screen Shot 2020-10-29 at 05 20 43](https://user-images.githubusercontent.com/71525567/97503404-89f99480-19a7-11eb-8c31-b9a12789e845.png)
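The thread does not show how `CLASSPATH` and `PLUGINS_CLASSPATH` were set before this command. If the intent is for `PLUGINS_CLASSPATH` to list every plugin jar (for example the Parquet input-format plugin), one hypothetical way to build that colon-separated list is sketched below. The `jar_classpath` name and the `/app/pinot/plugins` directory are assumptions, not taken from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every *.jar under a directory as a single
# colon-separated classpath string, sorted for a stable order.
jar_classpath() {
  local dir="$1"
  find "$dir" -type f -name '*.jar' | sort | paste -s -d ':' -
}

# Example with an assumed plugins directory:
#   export PLUGINS_CLASSPATH="$(jar_classpath /app/pinot/plugins)"
```

Echoing `$PLUGINS_CLASSPATH` before calling `spark-submit` is a quick way to confirm it is not empty.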
[GitHub] [incubator-pinot] fx19880617 commented on issue #6206: Supports parquet file parallel processing with LocalPinoFS
Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6206:
URL: https://github.com/apache/incubator-pinot/issues/6206#issuecomment-719170732
https://github.com/apache/incubator-pinot/pull/6214