Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/11/16 23:56:37 UTC

[GitHub] [pinot] kirkrodrigues opened a new issue, #9819: Add plugins to support storing and querying JSON log events in Pinot

kirkrodrigues opened a new issue, #9819:
URL: https://github.com/apache/pinot/issues/9819

   We want to be able to store JSON log events in Pinot so that they can be queried efficiently and so that we can reduce storage costs. Part of this involves encoding unstructured message fields in the log event using a new log compressor called CLP. The other part is to transform the log event to fit a table's schema (e.g., extracting nested fields and storing them in a column). We think this can be done with a custom `StreamMessageDecoder` and a few UDFs.
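
The schema-fitting step mentioned above (extracting nested fields into columns) could look roughly like the sketch below. It is only an illustration in plain Python, not Pinot's `StreamMessageDecoder` API; the helper names `flatten` and `to_row`, the dotted column naming, and the sample event are all assumptions made for the example.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value}; lists and scalars stay as leaf values."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def to_row(raw_event, schema_columns):
    """Map one JSON log event onto a row holding only the schema's columns.

    Columns missing from the event are set to None.
    """
    flat = flatten(json.loads(raw_event))
    return {col: flat.get(col) for col in schema_columns}


# Hypothetical log event and schema, for illustration only.
raw = (
    '{"timestamp": 1668642997, "level": "INFO", '
    '"k8s": {"pod": {"name": "web-1"}}, '
    '"message": "Task 12 assigned to container_42"}'
)
row = to_row(raw, ["timestamp", "level", "k8s.pod.name", "message"])
```

Here `row["k8s.pod.name"]` holds the nested pod name as a top-level column, while `message` is left as raw text for the CLP encoding step.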
   
   We've written more about the motivation and proposal [here](https://docs.google.com/document/d/1n5qpZgNHRDWk1Hjbu5kwjwIY8T_H5EScFoPndIt8Fis).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kirkrodrigues commented on issue #9819: Add plugins to support storing and querying JSON log events in Pinot

Posted by GitBox <gi...@apache.org>.
kirkrodrigues commented on issue #9819:
URL: https://github.com/apache/pinot/issues/9819#issuecomment-1320862229

   @chenboat Yeah, I could lift the CLP-encoding logic into a class so that it's easy for other input formats to use it.




[GitHub] [pinot] chenboat commented on issue #9819: Add plugins to support storing and querying JSON log events in Pinot

Posted by GitBox <gi...@apache.org>.
chenboat commented on issue #9819:
URL: https://github.com/apache/pinot/issues/9819#issuecomment-1320614801

   The CLP encoding should be applicable to multiple input formats, including JSON and text. Our first implementation of the stream decoder will be based on JSON log input. @kirkrodrigues can we extract the core logic so that other input stream formats can easily use it too? It may not be the first PR, but it can be a good follow-up one.




Re: [I] Add plugins to support storing and querying JSON log events in Pinot [pinot]

Posted by "bssatya (via GitHub)" <gi...@apache.org>.
bssatya commented on issue #9819:
URL: https://github.com/apache/pinot/issues/9819#issuecomment-1814986800

   Does this change support both storing and querying JSON? Asking because the [design doc](https://docs.google.com/document/d/1nHZb37re4mUwEA258x3a2pgX13EWLWMJ0uLEDk1dUyU/edit) does not have details for querying.




[GitHub] [pinot] kishoreg commented on issue #9819: Add plugins to support storing and querying JSON log events in Pinot

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #9819:
URL: https://github.com/apache/pinot/issues/9819#issuecomment-1319397344

   Is this limited to the JSON format, or does it apply to any text?




[GitHub] [pinot] kirkrodrigues commented on issue #9819: Add plugins to support storing and querying JSON log events in Pinot

Posted by GitBox <gi...@apache.org>.
kirkrodrigues commented on issue #9819:
URL: https://github.com/apache/pinot/issues/9819#issuecomment-1319521363

   The plugins would be limited to JSON input records, but CLP's encoding can be applied to any text field within those records.
   
   Our short-term goal is to use Pinot as a black-box columnar store. We'd apply CLP's encoding to decompose a text field into a columnar format and store the columns in Pinot; for logs, this should reduce the storage overhead of that field while still allowing it to be searched without resorting to a text index. When a user wants to query the field, we'd use CLP to convert their wildcard query into a SQL query on the decomposed columns in Pinot. Since the query operates on the decomposed columns, it should be faster than a query on the original text, and only matching rows would need to be reconstructed from the columns using a UDF.
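
To make the decomposition idea concrete, here is a heavily simplified sketch in Python. It is not CLP's actual algorithm or on-disk format: the placeholder characters, the whitespace-only tokenization, and the rule "clean integers are encoded, any other token with a digit goes to a dictionary column" are all assumptions chosen to keep the example short. It also assumes the message never contains the placeholder characters themselves.

```python
import re

# Hypothetical placeholder characters marking where a variable was removed.
INT_PLACEHOLDER = "\x11"
DICT_PLACEHOLDER = "\x13"


def encode(message):
    """Split a message into (logtype, dictionary variables, encoded variables)."""
    logtype_parts, dict_vars, encoded_vars = [], [], []
    # The capture group keeps the whitespace runs so the message can be rebuilt exactly.
    for token in re.split(r"(\s+)", message):
        if re.fullmatch(r"-?\d+", token) and str(int(token)) == token:
            # Integers that round-trip exactly are stored as numbers.
            encoded_vars.append(int(token))
            logtype_parts.append(INT_PLACEHOLDER)
        elif re.search(r"\d", token):
            # Any other token containing a digit is kept verbatim.
            dict_vars.append(token)
            logtype_parts.append(DICT_PLACEHOLDER)
        else:
            # Static text stays in the logtype.
            logtype_parts.append(token)
    return "".join(logtype_parts), dict_vars, encoded_vars


def decode(logtype, dict_vars, encoded_vars):
    """Rebuild the original message from the three columns."""
    dict_iter, enc_iter = iter(dict_vars), iter(encoded_vars)
    out = []
    for ch in logtype:
        if ch == INT_PLACEHOLDER:
            out.append(str(next(enc_iter)))
        elif ch == DICT_PLACEHOLDER:
            out.append(next(dict_iter))
        else:
            out.append(ch)
    return "".join(out)


msg = "Task 12 assigned to container_42 on 10.0.0.7"
logtype, dict_vars, encoded_vars = encode(msg)
restored = decode(logtype, dict_vars, encoded_vars)
```

In this sketch, many messages that differ only in their variable values share one logtype string, which is where the storage saving would come from, and `decode` plays the role of the reconstruction UDF for matching rows.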
   
   If this works well (and if the community gives us their blessing :)), we hope to integrate this deeper into Pinot, perhaps as a special type of index that could be applied to any text column that contains logs.

