Posted to commits@pinot.apache.org by "ankitsultana (via GitHub)" <gi...@apache.org> on 2023/03/28 18:02:07 UTC

[GitHub] [pinot] ankitsultana opened a new issue, #10494: Partial Upsert Tables Read a Lot of MMAP'ed Data

ankitsultana opened a new issue, #10494:
URL: https://github.com/apache/pinot/issues/10494

   For Full Upsert tables, if we get a new event for an existing primary-key, we simply use that new event as the new record.
   
   However, for Partial Upsert tables we have to read the entire existing record. This means doing row-based reads on columnar data, which can carry a large overhead, especially when the number of columns is high.
   
   Not only does this increase overall disk I/O utilization; even a modest spike in disk I/O utilization can also impact the ingestion latency of partial upsert tables.
   
   I don't think there's a way around this given how Partial Upsert tables are designed, but I'm interested to see what the community thinks.
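
To make the cost of a row-based read on columnar data concrete, here is a minimal sketch. It is a toy model, not Pinot code: `ColumnStore`, `read_value`, `read_row`, and `demo_row_read` are hypothetical names, and in-memory lists stand in for the mmap'ed column buffers.

```python
from typing import Any, Dict, List


class ColumnStore:
    """Toy columnar segment (hypothetical): one list per column, indexed by doc ID."""

    def __init__(self, columns: Dict[str, List[Any]]):
        self._columns = columns
        self.column_reads = 0  # number of individual column lookups performed

    def read_value(self, column: str, doc_id: int) -> Any:
        # In this toy model every lookup is counted; in a real mmap'ed
        # segment each lookup may touch a different page on disk.
        self.column_reads += 1
        return self._columns[column][doc_id]

    def read_row(self, doc_id: int) -> Dict[str, Any]:
        # Rebuilding a single row requires one lookup per column.
        return {name: self.read_value(name, doc_id) for name in self._columns}


def demo_row_read(num_columns: int = 100) -> int:
    """Read one full row and return how many column lookups it took."""
    store = ColumnStore({f"col{i}": [i, i + 1000] for i in range(num_columns)})
    store.read_row(1)
    return store.column_reads
```

In this sketch, rebuilding one record of a 100-column table costs 100 separate column lookups, which is the per-message overhead described above.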


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

Re: [I] Partial Upsert Tables Read a Lot of MMAP'ed Data [pinot]

Posted by "rohityadav1993 (via GitHub)" <gi...@apache.org>.

rohityadav1993 commented on issue #10494:
URL: https://github.com/apache/pinot/issues/10494#issuecomment-1986964747

   This issue has been addressed with lazy reads of the previous record in [#11826](https://github.com/apache/pinot/pull/11826). Only the necessary columns are read, instead of eagerly reading the complete row.
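
The idea of a lazy read can be sketched as follows. This is not the code from the pull request: `LazyRow`, `merge`, `increment_columns`, and `demo_lazy_merge` are hypothetical names, and the sketch assumes a merge in which only a few columns need their previous value.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


class LazyRow:
    """Hypothetical lazy view of the previous record: a column is read on first access only."""

    def __init__(self, read_value: Callable[[str], Any]):
        self._read_value = read_value
        self._cache: Dict[str, Any] = {}

    def get(self, column: str) -> Any:
        if column not in self._cache:
            self._cache[column] = self._read_value(column)
        return self._cache[column]


def merge(previous: LazyRow, incoming: Dict[str, Any],
          increment_columns: Iterable[str]) -> Dict[str, Any]:
    # Assumption: every column is overwritten by the incoming value except
    # the increment columns, which add the incoming value to the previous one.
    merged = dict(incoming)
    for column in increment_columns:
        merged[column] = previous.get(column) + incoming[column]
    return merged


def demo_lazy_merge() -> Tuple[int, int]:
    """Merge into a 100-column record; return (merged col7, number of column reads)."""
    stored = {f"col{i}": i for i in range(100)}
    reads: List[str] = []

    def read_value(column: str) -> Any:
        reads.append(column)  # record each read of the stored record
        return stored[column]

    incoming = {f"col{i}": 1 for i in range(100)}
    merged = merge(LazyRow(read_value), incoming, ["col7"])
    return merged["col7"], len(reads)
```

Under these assumptions the merge reads a single column of the previous record rather than all 100.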



Re: [I] Partial Upsert Tables Read a Lot of MMAP'ed Data [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.

Jackie-Jiang closed issue #10494: Partial Upsert Tables Read a Lot of MMAP'ed Data
URL: https://github.com/apache/pinot/issues/10494



[GitHub] [pinot] swaminathanmanish commented on issue #10494: Partial Upsert Tables Read a Lot of MMAP'ed Data

Posted by "swaminathanmanish (via GitHub)" <gi...@apache.org>.

swaminathanmanish commented on issue #10494:
URL: https://github.com/apache/pinot/issues/10494#issuecomment-1487565238

   @ankitsultana - Just curious, do you have any stats that indicate how bad the situation is and how frequently this happens?



[GitHub] [pinot] ankitsultana commented on issue #10494: Partial Upsert Tables Read a Lot of MMAP'ed Data

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.

ankitsultana commented on issue #10494:
URL: https://github.com/apache/pinot/issues/10494#issuecomment-1487641599

   > For some context, do you observe this issue when some columns are raw (no dictionary) and also compressed? It will be the worst case when values are compressed but we just need to read one value out.
   
   In our case we have ~100 columns, all dictionary encoded. We also have a fairly high ingestion rate (messages per second) for this use case, so the number of columns multiplied by the ingestion rate worsens the issue.
   
   > @ankitsultana - Just curious, do you have any stats that indicate how bad the situation is and how frequently this happens?
   
   We don't have any general stats to share. Most likely you'll run into this when your table is larger than the memory available for mmap, and (messages per second x number of columns) is "high".
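
As a rough illustration of the (messages per second x number of columns) product, here is a small sketch. The ingestion rate used below is a made-up figure, not a measurement from this table; only the ~100 columns come from the comment above. It also assumes the worst case in which every message updates an existing primary key and every column of the previous record is read.

```python
def column_reads_per_second(messages_per_second: int, num_columns: int) -> int:
    """Worst-case column lookups per second when each message re-reads a full row."""
    return messages_per_second * num_columns


# Hypothetical rate of 5,000 messages per second on a ~100-column table.
estimated_reads = column_reads_per_second(5_000, 100)
```

With these assumed numbers the estimate is 500,000 column lookups per second, and each lookup can turn into disk I/O once the table no longer fits in memory.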



[GitHub] [pinot] deemoliu commented on issue #10494: Partial Upsert Tables Read a Lot of MMAP'ed Data

Posted by "deemoliu (via GitHub)" <gi...@apache.org>.

deemoliu commented on issue #10494:
URL: https://github.com/apache/pinot/issues/10494#issuecomment-1527815550

   I can take a look at the partial upsert handler and propose a potential optimization.
   We might skip null values during merging and avoid accessing them, to reduce disk I/O.
   
   @Jackie-Jiang Do you think this will be helpful?
   
   @ankitsultana I think this issue can happen when there are too many columns in a high-throughput Kafka topic.
   Do we have disk I/O metrics for this table?
   
   



[GitHub] [pinot] ankitsultana commented on issue #10494: Partial Upsert Tables Read a Lot of MMAP'ed Data

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.

ankitsultana commented on issue #10494:
URL: https://github.com/apache/pinot/issues/10494#issuecomment-1527873655

   Let's follow up offline. I think we can keep the open source code as is for now.



[GitHub] [pinot] Jackie-Jiang commented on issue #10494: Partial Upsert Tables Read a Lot of MMAP'ed Data

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.

Jackie-Jiang commented on issue #10494:
URL: https://github.com/apache/pinot/issues/10494#issuecomment-1487550827

   For some context, do you observe this issue when some columns are raw (no dictionary) and also compressed? It will be the worst case when values are compressed but we just need to read one value out.

