
Posted to dev@flume.apache.org by Tristan Stevens <tr...@apache.org> on 2023/03/02 14:53:43 UTC

Re: Breaking up Flume (again)

It's a non-binding -1 from me. My concern is that we actually increase the complexity of the deployment and end-user experience by doing this. All of the separate modules are built into separate Maven artifacts anyway, so if people do want to package it up then they can.

My fear is that whatever we gain by splitting it up, we lose in terms of making it harder for people to deploy and use.

Tristan
________________________________
From: Ralph Goers <ra...@dslextreme.com>
Sent: 26 February 2023 18:50
To: dev@flume.apache.org <de...@flume.apache.org>
Subject: Re: Breaking up Flume (again)

The morphline solr sink has a dependency on Kite, which is a project abandoned by Cloudera. Someone would have to copy the relevant parts into the morphline repo and maintain them there. I have no interest myself in doing that.

I already split the Elasticsearch sink into the flume-search repo. As I recall, I had problems building it. We have discussed that in other emails. It needs to be upgraded. I suspect the API we would have to use has an acceptable license, but I believe ES itself has licensing problems.

To be honest, I don’t know what the deal is with the legacy sources or why we even have them. We have an Avro source and a Thrift source in Flume Core, so I don’t know why we keep them around.

I personally don’t use Hadoop or any of its related technology. While I know those are important, it is likely I personally will only apply PRs to any of them.

Ralph

> On Feb 26, 2023, at 10:29 AM, Bessenyei Balázs Donát <be...@apache.org> wrote:
>
> +1.
>
> For #3, which ones do you think can no longer be practically supported?
>
>
> Donat
>
> On Sun, Feb 26, 2023 at 8:08 AM Ralph Goers <ra...@dslextreme.com> wrote:
>>
>> As I mentioned last year, I would like to start breaking up Flume into separate repos. There are a few reasons for this:
>> 1. Flume has grown so large that the CI system can no longer build it. The jobs run out of disk space due to the large logs.
>> 2. The build takes a very long time to run.
>> 3. There are several components that can no longer practically be supported.
>>
>> To this end I am planning on creating the following Git repos:
>> flume-hadoop
>> flume-http
>> flume-irc
>> flume-jdbc
>> flume-jms
>> flume-kafka
>> flume-kudu
>> flume-legacy
>> flume-morphline
>> flume-scribe
>> flume-search
>> flume-spring-boot
>> flume-twitter
>>
>> For the time being I would propose everything else remain in the current Flume repo.
>>
>> Note that as each of these is populated, it will need to be released. However, most of these are fairly inactive, so after the initial release they may not need to be touched very often.
>>
>> Also, since Jira now requires new users to ask us to create accounts for them, I would propose that as each of these repos is set up, it be configured to enable GitHub Issues.
>>
>> I am looking for feedback on this, but if I don’t get any I plan to start work on this within a week or so.
>>
>> Ralph

Re: Breaking up Flume (again)

Posted by Ralph Goers <ra...@dslextreme.com>.

I can empathize with that. The way Flume has been packaged as a deployable zip makes it seem like adding stuff would be a problem. However, I realized what I was doing previously was completely ridiculous.

In my use of Flume I have some custom components, so I was using the Maven dependency plugin to unpack the Flume zip. I then deleted or replaced various jars and added my own before repackaging it. This was painful and had to be hand-modified for every Flume release.

In moving to Spring Boot, I realized Flume should be treated as a “normal” Java application, where my pom specifies all the dependencies I want. This means I don’t use the distribution zip at all any more, and my build makes much more sense. It also means I have far fewer potential security vulnerabilities, since I am not bringing in all the Flume modules I don’t use.

So I would suggest that viewing Flume as a monolithic tool, deployed the way Fluentd or Logstash are, is probably not the best approach.

Ralph

> On Mar 2, 2023, at 7:53 AM, Tristan Stevens <tr...@apache.org> wrote:
> 
> It's a non-binding -1 from me. My concern is that we actually increase the complexity of the deployment and end-user experience by doing this. All of the separate modules are built into separate Maven artifacts anyway, so if people do want to package it up then they can.
> 
> My fear is that whatever we gain by splitting it up, we lose in terms of making it harder for people to deploy and use.
> 
> Tristan