Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/03/17 04:52:33 UTC

[GitHub] [druid] clintropolis edited a comment on issue #11003: Doc question: Can protobuf extension be used with "index_parallel"?

clintropolis edited a comment on issue #11003:
URL: https://github.com/apache/druid/issues/11003#issuecomment-800791722


   I am not familiar enough with protobuf-encoded files to know whether this will work, but the error you are seeing comes from trying to use `inputSource` together with a parser. To avoid it, you need to use the older `parser`-based ingestion spec; see https://druid.apache.org/docs/latest/ingestion/index.html#parser-deprecated. Parser-based specs have no `inputSource` or `inputFormat`; iirc "firehoses" are used in place of the input source, see https://druid.apache.org/docs/latest/ingestion/native-batch.html#firehoses-deprecated.
   
   The protobuf parser expects to be handed the bytes of one encoded proto message at a time, so any file reader would need to split the underlying file into individual binary message blobs before feeding them to the parser. That is the part that makes me unsure whether protobuf files would work correctly with batch ingestion. For comparison, the CSV parser is fed single lines from an underlying text file, and each line is expected to be one CSV row. A protobuf file parser would need an equivalent way to find the message boundaries in the file, and I'm not sure that having the message schema alone is enough for that to work.
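
To make the "read out individual message blobs" step concrete, here is a minimal sketch in Python. It assumes the file stores each message with a varint length prefix, which is the framing that protobuf's Java `writeDelimitedTo` produces; a file written some other way (for example, messages simply concatenated with no framing) could not be split like this. The function names `write_delimited` and `read_delimited` are hypothetical, and none of this is Druid code.

```python
import io
from typing import BinaryIO, Iterator


def write_delimited(out: BinaryIO, payload: bytes) -> None:
    """Write one message as <varint length><payload bytes> (assumed framing)."""
    n = len(payload)
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.write(bytes([byte | 0x80]))  # more length bytes follow
        else:
            out.write(bytes([byte]))         # last length byte
            break
    out.write(payload)


def read_delimited(src: BinaryIO) -> Iterator[bytes]:
    """Yield each still-encoded message blob from a length-prefixed stream."""
    while True:
        length = 0
        shift = 0
        first = True
        while True:
            b = src.read(1)
            if not b:
                if first:
                    return  # clean end of file between messages
                raise EOFError("truncated length prefix")
            first = False
            length |= (b[0] & 0x7F) << shift
            if not (b[0] & 0x80):
                break
            shift += 7
        blob = src.read(length)
        if len(blob) != length:
            raise EOFError("truncated message body")
        yield blob  # this is what a per-message parser would receive


if __name__ == "__main__":
    buf = io.BytesIO()
    for payload in (b"first", b"second"):
        write_delimited(buf, payload)
    buf.seek(0)
    for blob in read_delimited(buf):
        print(len(blob))
```

Each yielded blob plays the role that a single text line plays for the CSV parser. The payloads in the usage example are placeholder bytes rather than real encoded proto messages, since the splitting step does not need the schema; decoding each blob afterwards does.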
   
   I think that even if a protobuf `inputFormat` did exist, it might still have this issue. A file-based protobuf decoder might need to be a specialized `InputFormat` implementation, separate from the format that processes individual streamed messages (again, I'm not very familiar with the file side of things).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


