Posted to dev@flume.apache.org by GitBox <gi...@apache.org> on 2021/07/06 14:24:32 UTC

[GitHub] [flume] wanghangyu817 opened a new pull request #346: Resolved that when using KafkaChannel, Kafka-related metadata information could not be obtained in sink

wanghangyu817 opened a new pull request #346:
URL: https://github.com/apache/flume/pull/346


   With a memory channel, a sink can read the event headers and obtain the Kafka metadata from them. With a Kafka channel, the same metadata is not available in the sink.
   
   `       System.out.println("***************************");
   
           Map<String, String> headers = event.getHeaders();
           String rain = headers.get("rain");
           String topic = headers.get("topic");
           String key = headers.get("key");
           String offset = headers.get("offset");
   
           System.out.println("rain = " + rain);
           System.out.println("topic = " + topic);
           System.out.println("key = " + key);
           System.out.println("offset = " + offset);
   
           System.out.println("***************************");`
   
   With a Kafka channel, the code above prints empty values for these headers. This PR fixes that.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@flume.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flume] rgoers commented on pull request #346: Resolved that when using KafkaChannel, Kafka-related metadata information could not be obtained in sink

Posted by GitBox <gi...@apache.org>.
rgoers commented on pull request #346:
URL: https://github.com/apache/flume/pull/346#issuecomment-1079881866


   Closing this. Support for what you need was added in [PR 356](https://github.com/apache/flume/pull/356). See [NullInitSink](https://github.com/apache/flume/blob/trunk/flume-ng-node/src/test/java/org/apache/flume/sink/NullInitSink.java) for an example of how to access another component.





[GitHub] [flume] wanghangyu817 commented on pull request #346: Resolved that when using KafkaChannel, Kafka-related metadata information could not be obtained in sink

Posted by GitBox <gi...@apache.org>.
wanghangyu817 commented on pull request #346:
URL: https://github.com/apache/flume/pull/346#issuecomment-1066272504


   @rgoers When I need to use any kafka-related attributes in sink, I currently can't get them. As mentioned above, if I want to use some attributes from the data as part of the file name, I need to pass them down from the upstream component (of course, this part needs to be customized).





[GitHub] [flume] wanghangyu817 commented on pull request #346: Resolved that when using KafkaChannel, Kafka-related metadata information could not be obtained in sink

Posted by GitBox <gi...@apache.org>.
wanghangyu817 commented on pull request #346:
URL: https://github.com/apache/flume/pull/346#issuecomment-1068814651


   My current approach is to customize the HDFS sink and get the values from KafkaChannel.





[GitHub] [flume] tmgstevens commented on pull request #346: Resolved that when using KafkaChannel, Kafka-related metadata information could not be obtained in sink

Posted by GitBox <gi...@apache.org>.
tmgstevens commented on pull request #346:
URL: https://github.com/apache/flume/pull/346#issuecomment-1066460538


   There is definitely a risk that this will change behaviour, particularly if the events are consumed by a KafkaSink. I think we need to turn this behaviour off by default and also allow the user to customise the name of the header that she wishes to use for each field. We've got some fairly good examples of that sort of pattern here: https://github.com/apache/flume/blob/trunk/flume-ng-sources/flume-kafka-source/src/main/java/org/apache/flume/source/kafka/KafkaSource.java#L264
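
An off-by-default switch with configurable header names might look roughly like the sketch below. It is self-contained and uses plain maps instead of Flume's `Context` and `Event`. The property names (`setTopicHeader`, `topicHeader`, `setOffsetHeader`, `offsetHeader`) and the class name are assumptions made for illustration, not settings that KafkaChannel is known to support.

```java
import java.util.Map;

/** Sketch: opt-in Kafka metadata headers with user-configurable names. */
class KafkaMetadataHeaders {
    // Hypothetical property names, invented for this sketch.
    static final String SET_TOPIC_HEADER = "setTopicHeader";
    static final String TOPIC_HEADER = "topicHeader";
    static final String SET_OFFSET_HEADER = "setOffsetHeader";
    static final String OFFSET_HEADER = "offsetHeader";

    private final boolean setTopicHeader;
    private final String topicHeader;
    private final boolean setOffsetHeader;
    private final String offsetHeader;

    KafkaMetadataHeaders(Map<String, String> config) {
        // Both switches default to false, so existing pipelines see no change.
        setTopicHeader = Boolean.parseBoolean(config.getOrDefault(SET_TOPIC_HEADER, "false"));
        setOffsetHeader = Boolean.parseBoolean(config.getOrDefault(SET_OFFSET_HEADER, "false"));
        // Header names are configurable to avoid clashing with headers a sink already uses.
        topicHeader = config.getOrDefault(TOPIC_HEADER, "topic");
        offsetHeader = config.getOrDefault(OFFSET_HEADER, "offset");
    }

    /** Copies only the enabled metadata fields into the event headers. */
    void apply(Map<String, String> headers, String topic, long offset) {
        if (setTopicHeader) {
            headers.put(topicHeader, topic);
        }
        if (setOffsetHeader) {
            headers.put(offsetHeader, Long.toString(offset));
        }
    }
}
```

With an empty configuration, `apply` leaves the headers untouched, which addresses the concern that a downstream KafkaSink could pick up an unexpected `topic` header.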





[GitHub] [flume] rgoers commented on pull request #346: Resolved that when using KafkaChannel, Kafka-related metadata information could not be obtained in sink

Posted by GitBox <gi...@apache.org>.
rgoers commented on pull request #346:
URL: https://github.com/apache/flume/pull/346#issuecomment-1068826739


   Correct. This will let you do that without having to pollute the events with unnecessary info.





[GitHub] [flume] wanghangyu817 commented on pull request #346: Resolved that when using KafkaChannel, Kafka-related metadata information could not be obtained in sink

Posted by GitBox <gi...@apache.org>.
wanghangyu817 commented on pull request #346:
URL: https://github.com/apache/flume/pull/346#issuecomment-1066495473


   @tmgstevens I think this is OK, but there are setups where KafkaSource is not used, such as TaildirSource -> KafkaChannel -> HDFS sink, where this value does not work.





[GitHub] [flume] rgoers commented on pull request #346: Resolved that when using KafkaChannel, Kafka-related metadata information could not be obtained in sink

Posted by GitBox <gi...@apache.org>.
rgoers commented on pull request #346:
URL: https://github.com/apache/flume/pull/346#issuecomment-1064600233


   I don't understand this PR. I've looked at HDFSEventSink and I cannot find the code referenced above. Furthermore, adding the topic to the event makes it very likely that it will remain in the event when it is passed to a Sink, which could be a KafkaSink.





[GitHub] [flume] rgoers closed pull request #346: Resolved that when using KafkaChannel, Kafka-related metadata information could not be obtained in sink

Posted by GitBox <gi...@apache.org>.
rgoers closed pull request #346:
URL: https://github.com/apache/flume/pull/346


   





[GitHub] [flume] rgoers commented on pull request #346: Resolved that when using KafkaChannel, Kafka-related metadata information could not be obtained in sink

Posted by GitBox <gi...@apache.org>.
rgoers commented on pull request #346:
URL: https://github.com/apache/flume/pull/346#issuecomment-1067398437


   I am still not understanding. "When I need to use any kafka-related attributes in sink, I currently can't get them." For the life of me I simply don't understand why you need information regarding the Kafka Channel in an HDFSEventSink. What could that possibly have to do with the file name?
   
   @tmgstevens The example you cite makes a bit more sense since that topic is where the event Flume is managing was published to - so it originated with the producer of the event. The topic used for the file channel is published to and consumed by Flume (although I know it is possible to use a KafkaChannel as if it was a sink, it still wouldn't make sense to add the topic since the consumer must already know that to consume it).
   
   If this is really about enhancing the way the HDFSEventSink constructs the file name perhaps that would make more sense.





[GitHub] [flume] rgoers commented on pull request #346: Resolved that when using KafkaChannel, Kafka-related metadata information could not be obtained in sink

Posted by GitBox <gi...@apache.org>.
rgoers commented on pull request #346:
URL: https://github.com/apache/flume/pull/346#issuecomment-1065411600


   @wanghangyu817 I am trying to understand why the HDFSEventSink would care about the topic used by the KafkaChannel. There is no guarantee that the Kafka channel was even used - they might have used the file channel. All a channel is is a place to store the events when Flume receives them until someone consumes them, so in that respect it is just an internal component that could conceivably be changed at any time. So what is the value of the HDFSEventSink knowing information about the topic, etc. that was used to temporarily store the event?





[GitHub] [flume] rgoers commented on pull request #346: Resolved that when using KafkaChannel, Kafka-related metadata information could not be obtained in sink

Posted by GitBox <gi...@apache.org>.
rgoers commented on pull request #346:
URL: https://github.com/apache/flume/pull/346#issuecomment-1068807611


   I have a feeling that a minor enhancement I am adding will address your issue.
   
   I have had to customize (i.e., replace) Flume's Application class since the day I started using it. The reason is that we have a Sink that does some internal work and then passes the result to a Source, where it gets written to a channel and is then sent along downstream. To do this the Sink needs to locate the Source instance, which it can't do from a start() method since it doesn't have access to the configuration. To accommodate this I am adding an Initializable interface with a single void initialize(MaterializedConfiguration configuration) method. So in your case you could extend the HDFSEventSink, have it implement the Initializable interface, and get the information you want from the KafkaChannel.
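
A rough, self-contained sketch of that idea follows. Only the `Initializable` interface and its `initialize(MaterializedConfiguration configuration)` signature come from the description above. `Channel` and `MaterializedConfiguration` are reduced to local stand-ins so the example compiles without Flume, and `TopicChannel`, `getTopic()`, `TopicAwareSink`, and `filePrefix()` are hypothetical names; the accessors a real KafkaChannel exposes may differ.

```java
import java.util.Map;

// Local stand-ins for the Flume types, reduced to what this sketch needs.
interface Channel {
    String getName();
}

interface MaterializedConfiguration {
    Map<String, Channel> getChannels();
}

// The interface described above: a single method that receives the configuration.
interface Initializable {
    void initialize(MaterializedConfiguration configuration);
}

// Hypothetical channel that exposes its topic (assumption, not KafkaChannel's real API).
class TopicChannel implements Channel {
    private final String name;
    private final String topic;

    TopicChannel(String name, String topic) {
        this.name = name;
        this.topic = topic;
    }

    @Override
    public String getName() {
        return name;
    }

    String getTopic() {
        return topic;
    }
}

// Hypothetical sink that looks up its channel once, at initialization time.
class TopicAwareSink implements Initializable {
    private final String channelName;
    private String topic; // stays null until initialize() finds a TopicChannel

    TopicAwareSink(String channelName) {
        this.channelName = channelName;
    }

    @Override
    public void initialize(MaterializedConfiguration configuration) {
        Channel channel = configuration.getChannels().get(channelName);
        if (channel instanceof TopicChannel) {
            topic = ((TopicChannel) channel).getTopic();
        }
    }

    // Falls back to a fixed prefix when the channel does not expose a topic.
    String filePrefix() {
        return topic == null ? "events" : topic;
    }
}
```

The point of the design is that the sink reads channel information from the configuration rather than from event headers, so the events themselves carry no extra metadata downstream.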





[GitHub] [flume] wanghangyu817 commented on pull request #346: Resolved that when using KafkaChannel, Kafka-related metadata information could not be obtained in sink

Posted by GitBox <gi...@apache.org>.
wanghangyu817 commented on pull request #346:
URL: https://github.com/apache/flume/pull/346#issuecomment-1064740901


   @rgoers Thank you for the response!
   If you want to use Kafka properties such as topic, offset, and timestamp in HDFSEventSink, that is currently impossible. So I added these attributes to KafkaChannel and passed them downstream, making them available to all sinks that use a Kafka channel. (This enables many scenarios. For example, some values in the data can be configured as the folder name of the sink.)

