Posted to commits@pinot.apache.org by "rseetham (via GitHub)" <gi...@apache.org> on 2024/02/29 07:14:12 UTC

[I] Provide pinot schema when initializing StreamMessageDecoder [pinot]

rseetham opened a new issue, #12521:
URL: https://github.com/apache/pinot/issues/12521

   [StreamMessageDecoder's](https://github.com/apache/pinot/blob/ac13a191b945a80084f0a2794391e4be2f463252/pinot-spi/src/main/java/org/apache/pinot/spi/stream/StreamMessageDecoder.java#L49) init is
   `void init(Map<String, String> props, Set<String> fieldsToRead, String topicName)`
   
   It would be great if the decoder had access to the Pinot schema as well. At Uber, we have our own internal decoder for Avro messages. We use the AvroRecordExtractor at the end, but we need access to the Pinot schema to do some custom processing.
   Initially, this class had access to the Pinot schema, but that was [removed in 2020](https://github.com/apache/pinot/pull/5309).
   The stated reason was:
   
   > RecordReader and StreamMessageDecoder is the entry point for batch and streaming data ingestion. They are expected to be implemented and plugged to provide customized format support.
   To make the abstraction more crispy and easier to understand, remove the Schema and replace it with fields to read so that users do not need to worry about extracting fields from the Pinot schema when adding a new format.
   
   fieldsToRead is generated [here](https://github.com/apache/pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java#L1477) using
   `Set<String> fieldsToRead = IngestionUtils.getFieldsForRecordExtractor(_tableConfig.getIngestionConfig(), _schema);`
   In the [implementation](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/IngestionUtils.java#L310), if SchemaConformingTransformerConfig is present, an empty fieldsToRead is returned. If fieldsToRead is empty, other parts of the decoder code assume that all the fields in the input schema have to be extracted anyway. [Example](https://github.com/apache/pinot/blob/master/pinot-plugins/pinot-input-format/pinot-avro-base/src/main/java/org/apache/pinot/plugin/inputformat/avro/AvroRecordExtractor.java#L52).
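
The "empty means extract everything" convention described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Pinot's actual extractor code: the class `FieldFilterSketch` and its `extract` method are hypothetical names, and a plain `Map` stands in for a decoded message.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a record extractor; not part of Pinot.
final class FieldFilterSketch {
  private FieldFilterSketch() {
  }

  /**
   * Copies the requested fields from a decoded message.
   * A null or empty fieldsToRead is treated as "extract all fields".
   */
  static Map<String, Object> extract(Map<String, Object> decoded, Set<String> fieldsToRead) {
    boolean extractAll = fieldsToRead == null || fieldsToRead.isEmpty();
    Map<String, Object> row = new HashMap<>();
    for (Map.Entry<String, Object> entry : decoded.entrySet()) {
      if (extractAll || fieldsToRead.contains(entry.getKey())) {
        row.put(entry.getKey(), entry.getValue());
      }
    }
    return row;
  }
}
```

With this convention a caller cannot request "no fields", which is one reason the meaning of an empty set should be stated explicitly in the contract.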
   
   The request here is to add the schema to the initializer of StreamMessageDecoder. fieldsToRead would still be there and used for its existing purposes, but the schema would be nice to have in the decoder (in our case, we want to know what the time column is). More generally, if a decoder wants to do something specific based on the Pinot schema, it would be nice to have access to it.
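
One possible shape for this request, sketched under assumptions: an additional `init` overload that receives the schema and, by default, delegates to the existing three-argument `init` so that current decoders keep working unchanged. Everything below is hypothetical. `DecoderSketch` only mirrors the signature quoted in this issue, `SchemaSketch` is a stand-in that carries just a time column name, and neither is the real Pinot interface or schema class.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the Pinot schema; only carries a time column name.
final class SchemaSketch {
  final String timeColumn;

  SchemaSketch(String timeColumn) {
    this.timeColumn = timeColumn;
  }
}

// Hypothetical interface mirroring the init signature quoted in the issue.
interface DecoderSketch {
  void init(Map<String, String> props, Set<String> fieldsToRead, String topicName);

  // Assumed new overload: ignores the schema by default, so existing decoders are unaffected.
  default void init(Map<String, String> props, Set<String> fieldsToRead, String topicName,
      SchemaSketch schema) {
    init(props, fieldsToRead, topicName);
  }
}

// Example decoder that opts in to the schema to learn the time column.
final class TimeAwareDecoderSketch implements DecoderSketch {
  String topic;
  String timeColumn;

  @Override
  public void init(Map<String, String> props, Set<String> fieldsToRead, String topicName) {
    this.topic = topicName;
  }

  @Override
  public void init(Map<String, String> props, Set<String> fieldsToRead, String topicName,
      SchemaSketch schema) {
    init(props, fieldsToRead, topicName);
    this.timeColumn = schema.timeColumn;
  }
}
```

A default method keeps the change source-compatible for existing plugins. Whether that is preferable to changing the existing signature is a design decision for the maintainers.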
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] Provide pinot schema when initializing StreamMessageDecoder [pinot]

Posted by "rseetham (via GitHub)" <gi...@apache.org>.
rseetham commented on issue #12521:
URL: https://github.com/apache/pinot/issues/12521#issuecomment-1970554202

   For clarification, 
   
   >  @param fieldsToRead The fields to read from the source stream. If blank, reads all fields (only for AVRO/JSON currently)
   
   So only Avro and JSON assume "extract all" if this is empty? Does this mean we want other input formats to also support "extract all" when this is empty?
      
   




Re: [I] Provide pinot schema when initializing StreamMessageDecoder [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #12521:
URL: https://github.com/apache/pinot/issues/12521#issuecomment-1970547053

   cc @snleee @swaminathanmanish 




Re: [I] Provide pinot schema when initializing StreamMessageDecoder [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #12521:
URL: https://github.com/apache/pinot/issues/12521#issuecomment-1972111556

   Yes. I think we can modify the javadoc, and it should apply to all decoders. It doesn't make sense to have a decoder that decodes nothing.




Re: [I] Provide pinot schema when initializing StreamMessageDecoder [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang closed issue #12521: Provide pinot schema when initializing StreamMessageDecoder 
URL: https://github.com/apache/pinot/issues/12521

