Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2019/07/01 13:41:14 UTC

[GitHub] [pulsar] avatart93 opened a new issue #4651: HDFS sink with different schema

avatart93 opened a new issue #4651: HDFS sink with different schema
URL: https://github.com/apache/pulsar/issues/4651
 
 
   **Is your feature request related to a problem? Please describe.**
   It seems that the HDFS sink connector can only handle string messages. This is a problem when you work with schemas or with already encoded messages that can't be parsed to a string (encrypted messages, for example) and want to send those messages to a data lake in HDFS for cold storage.
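
   As a rough illustration of why binary payloads do not fit a string-only path, here is a minimal Java sketch. It uses no Pulsar or connector API and is not how the connector is implemented; the class name `StringRoundTrip` and the sample bytes are made up for the example. It only shows that decoding arbitrary bytes as UTF-8 text and encoding them back does not, in general, return the original bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringRoundTrip {
    // Decode the payload as UTF-8 text and encode it back,
    // which is what any string-only pipeline effectively does.
    static byte[] roundTripThroughString(byte[] payload) {
        String asText = new String(payload, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
        // Hypothetical stand-in for an encrypted or otherwise binary payload:
        // these bytes are not valid UTF-8.
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00, (byte) 0x80};

        // Plain text survives the round trip unchanged.
        System.out.println("text preserved:   "
                + Arrays.equals(text, roundTripThroughString(text)));
        // Invalid sequences are replaced during decoding, so the bytes change.
        System.out.println("binary preserved: "
                + Arrays.equals(binary, roundTripThroughString(binary)));
    }
}
```

   A sink that accepts raw bytes would sidestep this kind of loss, which is why support for at least the raw schema would help here.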
    
   **Describe the solution you'd like**
   I would like the HDFS connector to support schemas other than StringSchema, or at least the raw bytes schema (byte[]).
   
   **Describe alternatives you've considered**
   Implement tiered storage for HDFS as well; S3 won't be a solution when working on-premises.
   
   **Additional context**
   If the topic's schema is different from StringSchema, the Pulsar console prints an IncompatibleSchemaException when you try to run the HDFS sink connector. If you check the connector's status, you will see an IO exception and multiple retries.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services