Posted to dev@flume.apache.org by Ralph Goers <ra...@dslextreme.com> on 2022/10/14 04:02:55 UTC

The next releases of Flume

The Flume build takes forever: about 70 minutes on my 2019 MacBook Pro, which is a pretty beefy machine. In addition, the CI builds fail continually because they generate too much output and run out of disk space.

I discussed this previously, but I would like to start breaking up Flume into separate, independently released repos. This should make releases easier.

Another point of discussion would be that Flume is currently released as a packaged zip. Personally, I think this is a bad idea as it includes ALL the Flume components whether they are required or not. It makes more sense to me to build Flume as a normal application using Maven dependencies. If you use the new Spring Boot support you will still get all the dependencies packaged in the deployable jar. Even if you don’t like using Spring Boot I believe you can still use the Spring Boot Maven plugin to generate an executable jar.
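
As a rough sketch of that last point (not taken from the Flume build): the Spring Boot Maven plugin's repackage goal can wrap an ordinary Maven project and its dependencies into a single executable jar. The com.example:my-flume-agent coordinates, the version values, and the choice of the Kafka sink are assumptions for illustration only. The main class shown is the standard Flume node entry point; it would differ when using the Spring Boot support.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical application coordinates -->
  <groupId>com.example</groupId>
  <artifactId>my-flume-agent</artifactId>
  <version>1.0.0-SNAPSHOT</version>

  <properties>
    <!-- Example values; use the releases you actually target -->
    <flume.version>1.10.1</flume.version>
    <spring-boot.version>2.7.4</spring-boot.version>
  </properties>

  <dependencies>
    <!-- Only the Flume components this agent needs -->
    <dependency>
      <groupId>org.apache.flume</groupId>
      <artifactId>flume-ng-node</artifactId>
      <version>${flume.version}</version>
    </dependency>
    <dependency>
      <!-- Coordinates assumed; check them against the release you use -->
      <groupId>org.apache.flume.flume-ng-sinks</groupId>
      <artifactId>flume-ng-kafka-sink</artifactId>
      <version>${flume.version}</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-maven-plugin</artifactId>
        <version>${spring-boot.version}</version>
        <configuration>
          <!-- Entry point of a plain (non Spring Boot) Flume agent -->
          <mainClass>org.apache.flume.node.Application</mainClass>
        </configuration>
        <executions>
          <execution>
            <goals>
              <!-- Rebuilds the jar with its dependencies nested inside -->
              <goal>repackage</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

With something like this, "mvn package" should produce a jar that can be started with "java -jar", containing only the components declared as dependencies.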

To start this discussion off I would propose immediately creating the following repos.

1. flume-spring-boot
2. flume-kafka
3. flume-jdbc
4. flume-legacy
5. flume-hadoop (would contain hive, hdfs, and hbase stuff)
6. flume-kudu 
7. flume-jms
8. flume-twitter
9. flume-thrift

In addition, flume-search was already created and I would like to move the flume-ng-morphline-solr-sink there. For the time being the Elasticsearch module will need to be bypassed until it can be upgraded to a supportable version of ES. 

Thoughts?

Ralph

Re: The next releases of Flume

Posted by Sean Busbey <sb...@apple.com.INVALID>.
Tristan, could you give some specific examples of the kinds of issues or management difficulties you are concerned about with the multi-repo approach?

The most common problem I’ve seen, for example, is how to produce an omnibus binary artifact that contains the output of all the repos without effectively forcing everyone to deal with all the repos all the time. I believe Ralph’s suggestion is essentially “we don’t produce an omnibus package and instead rely on e.g. Maven when folks want to pull pieces together,” and that would address the concern for me personally.

We would need solid testing infrastructure to flag problems between the modules in these repos at PR time. I have some experience building that kind of test infra and would be happy to help construct it in this case.



> On Oct 14, 2022, at 7:39 AM, Tristan Stevens <tr...@apache.org> wrote:
> 
> Generally, I'm reluctant to go down the multi-repo approach - I've seen it cause massive issues elsewhere and it becomes really difficult to manage. But could we do something smart where we build the modules separately, perhaps via profiles, so that we can improve the build times and keep things more loosely coupled than before? Perhaps even have different build artifacts for different scenarios?
> 
> Tristan

Re: The next releases of Flume

Posted by Tristan Stevens <tr...@apache.org>.
Generally, I'm reluctant to go down the multi-repo approach - I've seen it cause massive issues elsewhere and it becomes really difficult to manage. But could we do something smart where we build the modules separately, perhaps via profiles, so that we can improve the build times and keep things more loosely coupled than before? Perhaps even have different build artifacts for different scenarios?
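
One way to read the profile idea, as a sketch only: the parent pom keeps the core modules in the default build and moves optional components into profiles that are built on request. The profile ids and the grouping are hypothetical, the module list is not complete, and the module paths should be checked against the actual source tree.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <!-- Fragment of a parent pom.xml; coordinates and other sections omitted -->

  <!-- Always built: the core modules -->
  <modules>
    <module>flume-ng-sdk</module>
    <module>flume-ng-configuration</module>
    <module>flume-ng-core</module>
    <module>flume-ng-node</module>
  </modules>

  <profiles>
    <!-- Hypothetical profile: built only with -P kafka -->
    <profile>
      <id>kafka</id>
      <modules>
        <module>flume-ng-sinks/flume-ng-kafka-sink</module>
      </modules>
    </profile>

    <!-- Hypothetical profile: built only with -P hadoop -->
    <profile>
      <id>hadoop</id>
      <modules>
        <module>flume-ng-sinks/flume-hdfs-sink</module>
      </modules>
    </profile>
  </profiles>
</project>
```

With a layout like this, "mvn install" would build only the core modules, and "mvn -P kafka install" would add the Kafka sink to the same reactor.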

Tristan

________________________________
From: Ralph Goers <ra...@dslextreme.com>
Sent: 14 October 2022 06:42
To: dev@flume.apache.org <de...@flume.apache.org>
Subject: Re: The next releases of Flume

Yes. Flume core could be considered to be the flume-ng-configuration, flume-ng-node, and flume-ng-sdk modules, plus most of flume-ng-core. While there are a few things in flume-ng-core worth pulling out, I wouldn’t remove them initially unless they obviously fit into one of the repos I listed in my original message. I should also add that the File Channel is a child module of the flume-ng-channels module; I personally consider that to be a core component as well.

The main Flume build should also provide a flume-bom so that, if the repos are versioned independently, users would still have an easy way to include them.

Ralph


Re: The next releases of Flume

Posted by Ralph Goers <ra...@dslextreme.com>.
Yes. Flume core could be considered to be the flume-ng-configuration, flume-ng-node, and flume-ng-sdk modules, plus most of flume-ng-core. While there are a few things in flume-ng-core worth pulling out, I wouldn’t remove them initially unless they obviously fit into one of the repos I listed in my original message. I should also add that the File Channel is a child module of the flume-ng-channels module; I personally consider that to be a core component as well.

The main Flume build should also provide a flume-bom so that, if the repos are versioned independently, users would still have an easy way to include them.
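
To illustrate how a BOM would be consumed (a sketch under assumptions: the org.apache.flume:flume-bom coordinates and the flume.bom.version property are hypothetical, since the BOM is only being proposed here), a user project would import it once and then leave the versions off the individual Flume dependencies:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <!-- Fragment of a user's pom.xml; coordinates and other sections omitted -->

  <properties>
    <!-- Hypothetical property; set it to a published BOM version -->
    <flume.bom.version>REPLACE_ME</flume.bom.version>
  </properties>

  <dependencyManagement>
    <dependencies>
      <!-- Import the BOM: it pins a compatible version of every Flume artifact -->
      <dependency>
        <groupId>org.apache.flume</groupId>
        <artifactId>flume-bom</artifactId>
        <version>${flume.bom.version}</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <!-- No version here; it is resolved from the imported BOM -->
    <dependency>
      <groupId>org.apache.flume</groupId>
      <artifactId>flume-ng-node</artifactId>
    </dependency>
  </dependencies>
</project>
```

The point of the design is that each repo can move to its own version number while users track a single BOM version.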

Ralph

> On Oct 13, 2022, at 9:22 PM, Matt Sicker <bo...@gmail.com> wrote:
> 
> Is there some sort of flume core, too, or is that the spring boot one?
> 
> —
> Matt Sicker

Re: The next releases of Flume

Posted by Matt Sicker <bo...@gmail.com>.
Is there some sort of flume core, too, or is that the spring boot one?

—
Matt Sicker
