Posted to commits@camel.apache.org by GitBox <gi...@apache.org> on 2021/11/19 13:55:36 UTC

[GitHub] [camel-kafka-connector] FernandoDorado opened a new issue #1291: Problem with line breaks kafka-hdfs sink connector

FernandoDorado opened a new issue #1291:
URL: https://github.com/apache/camel-kafka-connector/issues/1291


   Good afternoon, 
   
   I have been testing the kafka-hdfs connector, especially the way it appends messages to files in HDFS. The connector configuration I used is as follows: 
   ```json
   {
       "name":"CamelHdfsSinkConnector",
       "config":{
           "connector.class":"org.apache.camel.kafkaconnector.hdfs.CamelHdfsSinkConnector",
           "tasks.max":2,
           "key.converter":"org.apache.kafka.connect.storage.StringConverter",
           "value.converter":"org.apache.kafka.connect.storage.StringConverter",
           "topics":"office-topic",
           "camel.sink.path.hostName": "HAcluster",
           "camel.sink.endpoint.namedNodes": "namenode-1:8020,namenode-2:8020",
           "camel.sink.endpoint.splitStrategy": "IDLE:100000,BYTES:10000000",
           "camel.sink.path.path": "user/kafka/"
       }
   }
   ```
   I have verified that the messages do arrive and are appended correctly in HDFS. The problem is that they are appended without any line break between them, which makes the files hard to use with, for example, Hive.
   
   I am not the only one affected: there is a [post on StackOverflow](https://stackoverflow.com/questions/67374326/append-a-new-line-character-at-every-record-passing-through-camelkafkaazureblobs) describing the same problem. 
   
   I tried the proposed solution, but since I am not a Java expert, I did not get good results. I modified the `apply` method of `CamelTypeConverterTransform`, but I am not sure whether the post refers to this class or to another one. This is the method I tried: 
   ```java
       @Override
       public R apply(R record) {
           final Schema schema = operatingSchema(record);
           final Object value = operatingValue(record);
   
           // Convert the record value with Camel's type converter
           final Object convertedValue = convertValueWithCamelTypeConverter(value);
           // Append the platform line separator ("\n" on Linux, "\r\n" on Windows)
           final Object updatedValue = convertedValue + System.lineSeparator();
   
           // Derive the schema for the new value and rebuild the record with it
           final Schema updatedSchema = getOrBuildRecordSchema(schema, updatedValue);
   
           return newRecord(record, updatedSchema, updatedValue);
       }
   ```
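   A minimal, self-contained sketch of the value rewrite that method is aiming for, assuming the converted value is a JSON string (or UTF-8 bytes). `NewlineDelimiter` and `appendNewline` are hypothetical names, not part of camel-kafka-connector, and the Kafka Connect plumbing (`operatingValue`, `newRecord`, schema handling) is left out. Two small differences from the snippet above: a null value is passed through rather than turned into the string `"null\n"`, and a fixed `\n` is used instead of the platform-dependent `System.lineSeparator()`.
   ```java
   import java.nio.charset.StandardCharsets;
   
   public class NewlineDelimiter {
   
       // Hypothetical helper: returns the record value as text ending in exactly one '\n'.
       // A null value (e.g. a tombstone) is passed through instead of becoming "null\n".
       public static String appendNewline(Object value) {
           if (value == null) {
               return null;
           }
           String text = (value instanceof byte[])
                   ? new String((byte[]) value, StandardCharsets.UTF_8)
                   : value.toString();
           // Use a fixed '\n' (Hive's text-file row terminator) rather than
           // System.lineSeparator(), which is "\r\n" on Windows; skip it if already present.
           return text.endsWith("\n") ? text : text + "\n";
       }
   
       public static void main(String[] args) {
           String record = "{\"fields\":{\"year\":2021},\"timestamp\":1637329937662,\"expires\":null}";
           // Two records written back to back now land on separate lines.
           System.out.print(appendNewline(record) + appendNewline(record));
       }
   }
   ```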
   A single message has this format: 
   ```json
   {
     "fields": {
       "year": 2021,
       "I1": 0,
       "I2": 8.5143e-41,
       "I3": 6.234535,
       "PHI_UI1": 0,
       "sec": 54,
       "PHI_UI2": 8.6209e-41,
       "P": 1335.8148,
       "PHI_UI3": -20.375,
       "Q": -584.91974,
       "min": 42,
       "month": 11,
       "hour": 13,
       "FREQ": 50,
       "I_N": 5.915098,
       "U1": 234.59723,
       "UUID": 35697,
       "day": 19,
       "U2": 233.06522,
       "U3": 234.024
     },
     "timestamp": 1637329377581,
     "expires": null
   }
   ```
   
   An excerpt from the HDFS file: 
   ```json
   {"fields":{"year":2021,"I1":0.0,"I2":4.8611E-41,"I3":6.2057714,"PHI_UI1":0.0,"sec":14,"PHI_UI2":4.104E-42,"P":1325.8503,"PHI_UI3":-20.5,"Q":-583.9375,"min":52,"month":11,"hour":13,"FREQ":49.75,"I_N":5.9074244,"U1":233.73982,"UUID":35697,"day":19,"U2":232.20639,"U3":233.972},"timestamp":1637329937662,"expires":null}{"fields":{"year":2021,"I1":0.0,"I2":4.1306E-41,"I3":6.2395077,"PHI_UI1":0.0,"sec":15,"PHI_UI2":5.0996E-41,"P":1323.7439,"PHI_UI3":-20.75,"Q":-599.9375,"min":52,"month":11,"hour":13,"FREQ":49.75,"I_N":5.9192123,"U1":233.64069,"UUID":35697,"day":19,"U2":232.09172,"U3":233.976},"timestamp":1637329938420,"expires":null}{"fields":{"year":2021,"I1":0.0,"I2":1.208E-41,"I3":6.243064,"PHI_UI1":0.0,"sec":15,"PHI_UI2":3.6516E-41,"P":1317.8503,"PHI_UI3":-20.5,"Q":-581.44635,"min":52,"month":11,"hour":13,"FREQ":49.75,"I_N":5.9074244,"U1":233.63208,"UUID":35697,"day":19,"U2":232.06783,"U3":233.976},"timestamp":1637329939102,"expires":null}
   ```
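   To make the failure mode concrete: the file is just the records written back to back, so a line-oriented reader such as Hive's text format sees an excerpt like the one above as a single row. The small simulation below uses hypothetical names (`DelimitedOutputDemo`, `writeAll`, `countRows`) and the three records from the excerpt trimmed to a single field; it compares the row count with and without a `\n` delimiter.
   ```java
   import java.util.List;
   
   public class DelimitedOutputDemo {
   
       // Simulates appending records to one HDFS file, each followed by the given delimiter.
       public static String writeAll(List<String> records, String delimiter) {
           StringBuilder file = new StringBuilder();
           for (String record : records) {
               file.append(record).append(delimiter);
           }
           return file.toString();
       }
   
       // Counts rows the way a line-oriented reader would: one row per line.
       public static long countRows(String fileContents) {
           return fileContents.lines().count();
       }
   
       public static void main(String[] args) {
           List<String> records = List.of(
                   "{\"fields\":{\"sec\":14},\"timestamp\":1637329937662,\"expires\":null}",
                   "{\"fields\":{\"sec\":15},\"timestamp\":1637329938420,\"expires\":null}",
                   "{\"fields\":{\"sec\":15},\"timestamp\":1637329939102,\"expires\":null}");
   
           System.out.println(countRows(writeAll(records, "")));   // everything on one line
           System.out.println(countRows(writeAll(records, "\n"))); // one row per message
       }
   }
   ```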
   
   As the excerpt shows, there is no separator between messages. Is there a way to fix this? 
   
   
   Thank you in advance


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@camel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [camel-kafka-connector] oscerd commented on issue #1291: Problem with line breaks kafka-hdfs sink connector

oscerd commented on issue #1291:
URL: https://github.com/apache/camel-kafka-connector/issues/1291#issuecomment-974099938


   As a first step, I would look at how to create an SMT and build an extended connector containing it: 
   
   https://camel.apache.org/camel-kafka-connector/next/user-guide/extending-connector/archetype-connector.html
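   Kafka Connect runs the SMTs named in a connector's `transforms` setting one after another, each receiving the previous one's output. The sketch below models that chain with plain `UnaryOperator<String>` steps; `TRIM` and `APPEND_NEWLINE` are hypothetical stand-ins, not the Kafka Connect `Transformation` interface. It illustrates one design point for a newline SMT: it should run after any step that could strip trailing whitespace.
   ```java
   import java.util.List;
   import java.util.function.UnaryOperator;
   
   public class TransformChainDemo {
   
       // Hypothetical stand-ins for two SMTs that operate on a String record value.
       public static final UnaryOperator<String> TRIM = String::strip;
       public static final UnaryOperator<String> APPEND_NEWLINE = v -> v + "\n";
   
       // Applies the steps in list order, like the aliases listed in "transforms".
       public static String applyChain(List<UnaryOperator<String>> chain, String value) {
           String current = value;
           for (UnaryOperator<String> step : chain) {
               current = step.apply(current);
           }
           return current;
       }
   
       public static void main(String[] args) {
           String value = "  {\"timestamp\":1637329937662}  ";
           // Newline appended last: it survives.
           System.out.println(applyChain(List.of(TRIM, APPEND_NEWLINE), value).endsWith("\n"));
           // Newline appended first: the later step strips it again.
           System.out.println(applyChain(List.of(APPEND_NEWLINE, TRIM), value).endsWith("\n"));
       }
   }
   ```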





[GitHub] [camel-kafka-connector] FernandoDorado closed issue #1291: Problem with line breaks kafka-hdfs sink connector

FernandoDorado closed issue #1291:
URL: https://github.com/apache/camel-kafka-connector/issues/1291


   





[GitHub] [camel-kafka-connector] oscerd commented on issue #1291: Problem with line breaks kafka-hdfs sink connector

oscerd commented on issue #1291:
URL: https://github.com/apache/camel-kafka-connector/issues/1291#issuecomment-975436387


   Leave it open if you want, so we can work on it, or wait for the PR.






[GitHub] [camel-kafka-connector] FernandoDorado commented on issue #1291: Problem with line breaks kafka-hdfs sink connector

FernandoDorado commented on issue #1291:
URL: https://github.com/apache/camel-kafka-connector/issues/1291#issuecomment-975437276


   Perfect, we will let you know when we make progress.





[GitHub] [camel-kafka-connector] oscerd commented on issue #1291: Problem with line breaks kafka-hdfs sink connector

oscerd commented on issue #1291:
URL: https://github.com/apache/camel-kafka-connector/issues/1291#issuecomment-974101581


   Here is an example you could adapt for HDFS:
   
   https://github.com/apache/camel-kafka-connector-examples/tree/main/aws2-kinesis/aws2-kinesis-source





[GitHub] [camel-kafka-connector] FernandoDorado commented on issue #1291: Problem with line breaks kafka-hdfs sink connector

FernandoDorado commented on issue #1291:
URL: https://github.com/apache/camel-kafka-connector/issues/1291#issuecomment-992709187


   I have been testing version 10.0 of the connector, and with it I do get a line break between messages. I have had some problems with later versions, but I don't know what the cause could be. I will keep testing and give you more details.





[GitHub] [camel-kafka-connector] FernandoDorado commented on issue #1291: Problem with line breaks kafka-hdfs sink connector

FernandoDorado commented on issue #1291:
URL: https://github.com/apache/camel-kafka-connector/issues/1291#issuecomment-974103931


   Perfect, thank you, I'll take a look at it. Anyway, shouldn't the connector add this line break by default?





[GitHub] [camel-kafka-connector] FernandoDorado commented on issue #1291: Problem with line breaks kafka-hdfs sink connector

FernandoDorado commented on issue #1291:
URL: https://github.com/apache/camel-kafka-connector/issues/1291#issuecomment-974145319


   Great, thank you very much for the clarifications. Since I am not the Java expert in my organisation, I will pass this issue on and let you know when we make progress. 





[GitHub] [camel-kafka-connector] oscerd commented on issue #1291: Problem with line breaks kafka-hdfs sink connector

oscerd commented on issue #1291:
URL: https://github.com/apache/camel-kafka-connector/issues/1291#issuecomment-974110810


   The engine behind all the connectors is Camel. The approach for adapting records (Camel exchanges) while routing Kafka->sink or source->Kafka is to provide specific SMTs for individual connectors; where they exist, those SMTs are listed in the documentation. So essentially, you'll need to set the SMT in your configuration or, if one fits, use one of the listed converters. What we could do is add the SMT lines to the configuration example shipped in the connector tar.gz.
   
   Making it the default behavior would require modifying the core, and the core is used by all the generated connectors, so it's not possible.
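   For reference, a sketch of what setting an SMT on this connector looks like, written as the Java map of settings a client would submit: the configuration from the issue plus the standard Kafka Connect `transforms` keys. The alias `appendNewline` and the class `com.example.smt.AppendNewlineTransform` are hypothetical placeholders for an SMT packaged in an extended connector; this thread does not show such a class shipping with camel-kafka-connector. The `aliasesMissingType` helper is likewise hypothetical and just checks that every listed alias has a `.type` entry.
   ```java
   import java.util.ArrayList;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   
   public class ConnectorConfigSketch {
   
       public static Map<String, String> hdfsSinkConfig() {
           Map<String, String> config = new LinkedHashMap<>();
           // Settings from the issue; Kafka Connect expects "tasks.max" (plural).
           config.put("connector.class", "org.apache.camel.kafkaconnector.hdfs.CamelHdfsSinkConnector");
           config.put("tasks.max", "2");
           config.put("key.converter", "org.apache.kafka.connect.storage.StringConverter");
           config.put("value.converter", "org.apache.kafka.connect.storage.StringConverter");
           config.put("topics", "office-topic");
           config.put("camel.sink.path.hostName", "HAcluster");
           config.put("camel.sink.endpoint.namedNodes", "namenode-1:8020,namenode-2:8020");
           config.put("camel.sink.endpoint.splitStrategy", "IDLE:100000,BYTES:10000000");
           config.put("camel.sink.path.path", "user/kafka/");
   
           // Kafka Connect SMT wiring: "transforms" lists aliases in the order they run,
           // and each alias needs "transforms.<alias>.type" naming its class.
           config.put("transforms", "appendNewline");
           // HYPOTHETICAL class name for a custom SMT shipped in an extended connector.
           config.put("transforms.appendNewline.type", "com.example.smt.AppendNewlineTransform");
           return config;
       }
   
       // Returns the aliases listed in "transforms" that have no ".type" entry.
       public static List<String> aliasesMissingType(Map<String, String> config) {
           List<String> missing = new ArrayList<>();
           for (String alias : config.getOrDefault("transforms", "").split(",")) {
               String trimmed = alias.trim();
               if (!trimmed.isEmpty() && !config.containsKey("transforms." + trimmed + ".type")) {
                   missing.add(trimmed);
               }
           }
           return missing;
       }
   
       public static void main(String[] args) {
           Map<String, String> config = hdfsSinkConfig();
           config.forEach((key, value) -> System.out.println(key + " = " + value));
           System.out.println("aliases missing a type: " + aliasesMissingType(config));
       }
   }
   ```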





[GitHub] [camel-kafka-connector] oscerd commented on issue #1291: Problem with line breaks kafka-hdfs sink connector

oscerd commented on issue #1291:
URL: https://github.com/apache/camel-kafka-connector/issues/1291#issuecomment-974114515


   We need to work a bit on the documentation and add more converters, but we also need to look at Kamelets, because in the long term we want to align the camel-kafka-connector connectors with the Kamelets catalog. Thanks, by the way, for reaching out. If you're able to make it work through the extended connector, it would be nice if you could open a PR with the SMT you implemented; if you don't have time, it would be enough to post the code here and I'll merge it in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@camel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org