Posted to dev@flume.apache.org by Ralph Goers <ra...@dslextreme.com> on 2019/04/28 16:19:35 UTC

Better Marketing

When I read sites like https://www.slant.co/versus/959/960/~fluentd_vs_flume I get a bit discouraged at how people misunderstand Flume. Even a site like https://www.predictiveanalyticstoday.com/data-ingestion-tools/ is misleading: it copies our home page, saying only "Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data", and then copies the image. This leads users to believe that Flume is only useful in a small set of use cases and is intimately tied to Hadoop.

I believe the home page should be changed to say that "Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and streaming large amounts of data", and then follow up by indicating that it is appropriate for moving any kind of streaming data, such as application, audit, or system logs; real-time events such as stock quotes; or user transaction records.

The second sentence should also be modified to say "It is robust and fault tolerant with tunable reliability mechanisms that can ensure guaranteed delivery and many failover and recovery mechanisms".

I also think the very first image should be modified so that it does not show just a web application and HDFS, as it seems to give people the impression that Flume is only usable with Hadoop or in web applications. Unfortunately, only the PNG seems to have been committed, so redoing the diagram will mean starting from scratch.

Thoughts?

Ralph

Re: Better Marketing

Posted by Ralph Goers <ra...@dslextreme.com>.

Of course. I don’t commit or comment much but I am on the PMC.

Ralph

> On Apr 28, 2019, at 11:43 AM, Bessenyei Balázs Donát <be...@apache.org> wrote:
> 
> I see, thank you!
> 
> Are you open to creating a PR? I hope that more people would be able
> to provide feedback that way.
> 
> 
> Donat

Re: Better Marketing

Posted by Bessenyei Balázs Donát <be...@apache.org>.

I see, thank you!

Are you open to creating a PR? I hope that more people would be able
to provide feedback that way.


Donat

On Sun, Apr 28, 2019 at 8:36 PM Ralph Goers <ra...@dslextreme.com> wrote:
>
> What I am seeing is that people go to the home page and cut the first paragraph as a description of Flume. All I am really proposing is that we change that to more effectively describe Flume. The description that is there is accurate but minimal. I would just like to rephrase that paragraph to give a more complete description of what Flume can be used for.
>
> As an aside, I have been working on Log4j, Spring-Cloud-Config and Docker. In doing that I have done some crude benchmarking, which you can see at http://rgoers.github.io/log4j2-site/manual/cloud.html#Appender_Performance. I was quite surprised by the performance of the Flume Embedded Appender with a memory channel. I would have expected it to be more in line with the Async Loggers and at the most in line with the Rolling File Appender, since the event is essentially handed to another thread to be processed. It would be nice to see Flume recommended as a log forwarder/aggregator for all apps with Docker, instead of just when guaranteed delivery is required, and I would love to upgrade the Flume documentation to describe how to do that.
>
> Ralph

Re: Better Marketing

Posted by Ferenc Szabo <fs...@cloudera.com.INVALID>.

I support the change. It could help the project. Fortunately, the diagram
is simply made with draw.io and can be recreated in no time.
[image: flume_example.png]

Let me know where I can help.


On Sun, Apr 28, 2019 at 9:08 PM Mike Percy <mp...@apache.org> wrote:

> Great, sounds like you made progress on the perf thing. I’m not talking
> about other products Flume is bundled with, simply what the project ships
> with the binary artifacts at release time.
>
> Mike

Re: Better Marketing

Posted by Mike Percy <mp...@apache.org>.

Great, sounds like you made progress on the perf thing. I’m not talking about other products Flume is bundled with, simply what the project ships with the binary artifacts at release time.

Mike

> On Apr 28, 2019, at 12:04 PM, Ralph Goers <ra...@dslextreme.com> wrote:
> 
> Yes, Mike. I understand that it is shipped with a product that uses it for that purpose. To be honest, I have used Flume in three different projects so far and none of them have integrated with Hadoop. I do have an upcoming project that probably will, although Hadoop will probably be only one of the destinations the data is delivered to. The others might be a third-party SIEM product as well as some kind of ELK stack, so even in that case Hadoop wouldn’t be the primary “selling” point.
> 
> No, I haven’t done profiling yet. At this point my main focus is Log4j. Once I get past that I can take a pass at profiling. It is possible the problem might be in Log4j, but since the embedded Appender just constructs the event and passes it to the Flume Embedded Agent I would be surprised if it is in Log4j. However, while testing I did find one bug already in Log4j that was causing a performance hit with Flume and have corrected that. 
> 
> Ralph

Re: Better Marketing

Posted by Ralph Goers <ra...@dslextreme.com>.

Yes, Mike. I understand that it is shipped with a product that uses it for that purpose. To be honest, I have used Flume in three different projects so far and none of them have integrated with Hadoop. I do have an upcoming project that probably will, although Hadoop will probably be only one of the destinations the data is delivered to. The others might be a third-party SIEM product as well as some kind of ELK stack, so even in that case Hadoop wouldn’t be the primary “selling” point.

No, I haven’t done profiling yet. At this point my main focus is Log4j. Once I get past that I can take a pass at profiling. It is possible the problem might be in Log4j, but since the embedded Appender just constructs the event and passes it to the Flume Embedded Agent I would be surprised if it is in Log4j. However, while testing I did find one bug already in Log4j that was causing a performance hit with Flume and have corrected that. 

Ralph

> On Apr 28, 2019, at 11:42 AM, Mike Percy <mp...@apache.org> wrote:
> 
> I’d certainly be in favor of updating the project description to be more general. That said, part of Flume’s value proposition is integration with a bunch of components off the shelf and the main ones it ships are Hadoop ecosystem components, so we shouldn’t completely ignore that when describing the project.
> 
> Regarding the memory channel perf issues you observed, did you do any profiling? Do you think part of the issue could be Java GC? The memory channel tends to allocate and reclaim a lot of memory in a short period of time.
> 
> Mike

Re: Better Marketing

Posted by Mike Percy <mp...@apache.org>.

I’d certainly be in favor of updating the project description to be more general. That said, part of Flume’s value proposition is integration with a bunch of components off the shelf and the main ones it ships are Hadoop ecosystem components, so we shouldn’t completely ignore that when describing the project.

Regarding the memory channel perf issues you observed, did you do any profiling? Do you think part of the issue could be Java GC? The memory channel tends to allocate and reclaim a lot of memory in a short period of time.

Mike

> On Apr 28, 2019, at 11:35 AM, Ralph Goers <ra...@dslextreme.com> wrote:
> 
> What I am seeing is that people go to the home page and cut the first paragraph as a description of Flume. All I am really proposing is that we change that to more effectively describe Flume. The description that is there is accurate but minimal. I would just like to rephrase that paragraph to give a more complete description of what Flume can be used for.
> 
> As an aside, I have been working on Log4j, Spring-Cloud-Config and Docker. In doing that I have done some crude benchmarking, which you can see at http://rgoers.github.io/log4j2-site/manual/cloud.html#Appender_Performance. I was quite surprised by the performance of the Flume Embedded Appender with a memory channel. I would have expected it to be more in line with the Async Loggers and at the most in line with the Rolling File Appender, since the event is essentially handed to another thread to be processed. It would be nice to see Flume recommended as a log forwarder/aggregator for all apps with Docker, instead of just when guaranteed delivery is required, and I would love to upgrade the Flume documentation to describe how to do that.
> 
> Ralph

Re: Better Marketing

Posted by Ralph Goers <ra...@dslextreme.com>.

What I am seeing is that people go to the home page and cut the first paragraph as a description of Flume. All I am really proposing is that we change that to more effectively describe Flume. The description that is there is accurate but minimal. I would just like to rephrase that paragraph to give a more complete description of what Flume can be used for.

As an aside, I have been working on Log4j, Spring-Cloud-Config and Docker. In doing that I have done some crude benchmarking, which you can see at http://rgoers.github.io/log4j2-site/manual/cloud.html#Appender_Performance. I was quite surprised by the performance of the Flume Embedded Appender with a memory channel. I would have expected it to be more in line with the Async Loggers and at the most in line with the Rolling File Appender, since the event is essentially handed to another thread to be processed. It would be nice to see Flume recommended as a log forwarder/aggregator for all apps with Docker, instead of just when guaranteed delivery is required, and I would love to upgrade the Flume documentation to describe how to do that.

Ralph

> On Apr 28, 2019, at 9:58 AM, Bessenyei Balázs Donát <be...@apache.org> wrote:
> 
> I agree that marketing could be improved and I support finding a
> slogan that represents best what Flume is today.
> I am not sure about the wording that has been proposed, though. Can
> you please elaborate, Ralph?
> 
> 
> Thank you,
> 
> Donat

Re: Better Marketing

Posted by Bessenyei Balázs Donát <be...@apache.org>.

I agree that marketing could be improved and I support finding a
slogan that represents best what Flume is today.
I am not sure about the wording that has been proposed, though. Can
you please elaborate, Ralph?


Thank you,

Donat

On Sun, Apr 28, 2019 at 6:19 PM Ralph Goers <ra...@dslextreme.com> wrote:
>
> When I read sites like https://www.slant.co/versus/959/960/~fluentd_vs_flume I get a bit discouraged at how people misunderstand Flume. Even a site like https://www.predictiveanalyticstoday.com/data-ingestion-tools/ is misleading: it copies our home page, saying only "Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data", and then copies the image. This leads users to believe that Flume is only useful in a small set of use cases and is intimately tied to Hadoop.
>
> I believe the home page should be changed to say that "Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and streaming large amounts of data", and then follow up by indicating that it is appropriate for moving any kind of streaming data, such as application, audit, or system logs; real-time events such as stock quotes; or user transaction records.
>
> The second sentence should also be modified to say "It is robust and fault tolerant with tunable reliability mechanisms that can ensure guaranteed delivery and many failover and recovery mechanisms".
>
> I also think the very first image should be modified so that it does not show just a web application and HDFS, as it seems to give people the impression that Flume is only usable with Hadoop or in web applications. Unfortunately, only the PNG seems to have been committed, so redoing the diagram will mean starting from scratch.
>
> Thoughts?
>
> Ralph