Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/05/14 18:16:54 UTC

[GitHub] [incubator-pinot] kishred opened a new issue #5388: Batch ingestion is too slow

kishred opened a new issue #5388:
URL: https://github.com/apache/incubator-pinot/issues/5388


   The batch ingestion process takes 26 minutes to ingest a 570 MB Parquet file. The data contains 3.5 million rows and 16 columns (see the table below for details). The slowness does not appear to be specific to the input file format: ingesting the same data in CSV and Avro formats took a similar amount of time.
   
   | Column | Type | Structure | Size | Cardinality | Max Length |
   |--------|------|-----------|------|-------------|------------|
   | 1 | STRING | fixed bytes value dictionary | 5 | 1 | 15 |
   | 2 | STRING | fixed bytes value dictionary | 222825 | 8913 | 25 |
   | 3 | STRING | fixed bytes value dictionary | 571140 | 38076 | 15 |
   | 4 | STRING | fixed bytes value dictionary | 28 | 4 | 7 |
   | 5 | STRING | fixed bytes value dictionary | 910 | 35 | 26 |
   | 6 | STRING | fixed bytes value dictionary | 1020 | 85 | 12 |
   | 7 | STRING | fixed bytes value dictionary | 110 | 11 | 10 |
   | 8 | STRING | fixed bytes value dictionary | 40 | 5 | 8 |
   | 9 | STRING | fixed bytes value dictionary | 25 | 5 | 5 |
   | 10 | STRING | fixed bytes value dictionary | 4553 | 157 | 29 |
   | 11 | LONG | dictionary | | 35943781 | |
   | 12 | INT | dictionary | | 1 | |
   | 13 | INT | dictionary | | 85 | |
   | 14 | INT | dictionary | | 63971 | |
   | 15 | INT | dictionary | | 3 | |
   | 16 | INT | dictionary | | 64784 | |
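
For a sense of scale, the figures reported above can be turned into a rough throughput number. This is only a back-of-the-envelope sketch: the constants are the rounded values from this report (3.5 million rows, 570 MB, 26 minutes), and the `throughput` helper is a made-up name for illustration, not part of Pinot.

```python
# Rounded figures taken from the issue description.
ROWS = 3_500_000
FILE_SIZE_MB = 570
ELAPSED_MINUTES = 26


def throughput(rows, size_mb, minutes):
    """Return (rows per second, MB of input per second) for one ingestion run."""
    seconds = minutes * 60
    return rows / seconds, size_mb / seconds


rows_per_sec, mb_per_sec = throughput(ROWS, FILE_SIZE_MB, ELAPSED_MINUTES)
print(f"{rows_per_sec:,.0f} rows/s, {mb_per_sec:.2f} MB/s")
# prints "2,244 rows/s, 0.37 MB/s"
```

That is, roughly 2,200 rows per second and well under 1 MB of input per second, measured against the input file size rather than the size of the generated segments.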


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] icefury71 commented on issue #5388: Batch ingestion is too slow

Posted by GitBox <gi...@apache.org>.
icefury71 commented on issue #5388:
URL: https://github.com/apache/incubator-pinot/issues/5388#issuecomment-639854690


   @kishred can you give us some more details on how you're doing the batch ingestion? How exactly are you creating the Pinot segments?

