Posted to issues@camel.apache.org by "Fernando (Jira)" <ji...@apache.org> on 2021/06/24 09:17:00 UTC

[jira] [Created] (CAMEL-16754) Camel Kafka HDFS sink connector

Fernando  created CAMEL-16754:
---------------------------------

             Summary: Camel Kafka HDFS sink connector
                 Key: CAMEL-16754
                 URL: https://issues.apache.org/jira/browse/CAMEL-16754
             Project: Camel
          Issue Type: Bug
          Components: camel-hdfs, camel-kafka
    Affects Versions: 3.10.0, 3.9.0
            Reporter: Fernando 
         Attachments: Screenshot_1.png

Hello, 

I'm trying to connect Kafka and HDFS to store data. The setup works, but when the Kafka messages are saved to HDFS, a separate file is created for each message. I would like each file to contain multiple messages, but I haven't been able to achieve this. I've tried setting
{code:java}
camel.sink.endpoint.splitStrategy=BYTES:1000000
{code}
and also
{code:java}
camel.sink.endpoint.splitStrategy=MESSAGES:10
{code}
However, when I list the files in the HDFS folder, I still see one file per message (see the attached Screenshot_1.png).
The full connector configuration is as follows:
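To make the expected behavior concrete, here is a minimal Java sketch, assuming that `MESSAGES:n` should cap each file at n records and `BYTES:n` should start a new file once roughly n bytes have been written. `SplitStrategySketch` and `groupIntoFiles` are hypothetical illustrations, not camel-hdfs code; they model a single `TYPE:value` entry only, and the exact rollover boundary in camel-hdfs may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the file grouping expected from a split strategy.
 * This is NOT the camel-hdfs implementation; it only illustrates the outcome.
 */
public class SplitStrategySketch {

    /**
     * Groups records into "files" according to one TYPE:value entry such as
     * "MESSAGES:10" or "BYTES:1000000". Before adding a record, a new file is
     * started if the current file has already reached the limit.
     */
    static List<List<String>> groupIntoFiles(List<String> records, String strategy) {
        String[] parts = strategy.split(":", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected TYPE:value, got " + strategy);
        }
        String type = parts[0];
        long limit = Long.parseLong(parts[1]);

        List<List<String>> files = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;

        for (String record : records) {
            boolean limitReached;
            if (type.equals("MESSAGES")) {
                limitReached = current.size() >= limit;
            } else if (type.equals("BYTES")) {
                limitReached = currentBytes >= limit;
            } else {
                throw new IllegalArgumentException("Type not modeled in this sketch: " + type);
            }
            // Roll over: close the current "file" and start an empty one.
            if (limitReached && !current.isEmpty()) {
                files.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(record);
            currentBytes += record.getBytes(StandardCharsets.UTF_8).length;
        }
        if (!current.isEmpty()) {
            files.add(current);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            records.add("reading-" + i);
        }
        // MESSAGES:10 -> 3 files (10 + 10 + 5 records) rather than 25.
        System.out.println(groupIntoFiles(records, "MESSAGES:10").size());
        // BYTES:1000000 -> the 25 short records all fit into 1 file.
        System.out.println(groupIntoFiles(records, "BYTES:1000000").size());
    }
}
```

Running `main` prints 3 for `MESSAGES:10` and 1 for `BYTES:1000000`: the grouping the report expects, as opposed to the one-file-per-message result observed in HDFS.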
{code:java}
name=CamelHdfsSinkConnector
connector.class=org.apache.camel.kafkaconnector.hdfs.CamelHdfsSinkConnector
tasks.max=1

# use the kafka converters that better suit your needs, these are just defaults:
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
#key.converter=org.apache.kafka.connect.json.JsonConverter
#value.converter=org.apache.kafka.connect.json.JsonConverter
# comma separated topics to get messages from
topics=modbus-office-topic
# mandatory properties (for a complete properties list see the connector documentation):
# HDFS host to use
camel.sink.path.hostName=namenode
camel.sink.path.port=9000
camel.sink.endpoint.splitStrategy=BYTES:10000000
# The directory path to use
camel.sink.path.path=Example_folder
{code}
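For reference, the `camel.sink.path.*` and `camel.sink.endpoint.*` properties above together describe a single camel-hdfs endpoint. The following Java sketch shows one way they could combine into an endpoint URI, assuming path keys fill the URI path and endpoint keys become query options. `HdfsUriSketch` is a hypothetical helper, not the connector's own code, and the connector's real URI assembly may order or encode options differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper: NOT camel-kafka-connector code, only an illustration. */
public class HdfsUriSketch {

    static final String PATH_PREFIX = "camel.sink.path.";
    static final String ENDPOINT_PREFIX = "camel.sink.endpoint.";

    /** Builds an hdfs:// URI from camel.sink.path.* and camel.sink.endpoint.* keys. */
    static String toEndpointUri(Properties props) {
        String host = props.getProperty(PATH_PREFIX + "hostName");
        String port = props.getProperty(PATH_PREFIX + "port");
        String path = props.getProperty(PATH_PREFIX + "path", "");

        StringBuilder uri = new StringBuilder("hdfs://").append(host);
        if (port != null) {
            uri.append(':').append(port);
        }
        if (!path.isEmpty()) {
            uri.append('/').append(path.startsWith("/") ? path.substring(1) : path);
        }

        // Collect endpoint options in a sorted map so the output is deterministic.
        Map<String, String> options = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(ENDPOINT_PREFIX)) {
                options.put(key.substring(ENDPOINT_PREFIX.length()), props.getProperty(key));
            }
        }
        String sep = "?";
        for (Map.Entry<String, String> e : options.entrySet()) {
            uri.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = "&";
        }
        return uri.toString();
    }

    public static void main(String[] args) throws IOException {
        // The relevant lines from the connector configuration in this report.
        String config = String.join("\n",
            "camel.sink.path.hostName=namenode",
            "camel.sink.path.port=9000",
            "camel.sink.endpoint.splitStrategy=BYTES:10000000",
            "camel.sink.path.path=Example_folder");
        Properties props = new Properties();
        props.load(new StringReader(config));
        System.out.println(toEndpointUri(props));
    }
}
```

With the configuration shown, it prints `hdfs://namenode:9000/Example_folder?splitStrategy=BYTES:10000000`, which makes it easy to check that `splitStrategy` is being passed through as an endpoint option.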
I am currently running Hadoop 3.1.2, and I'm not sure whether the problem lies with the connector, the Hadoop version, or the connection configuration.
Thanks for your time




--
This message was sent by Atlassian Jira
(v8.3.4#803005)