Posted to user@flume.apache.org by Dibyajyoti Ghosh <di...@gmail.com> on 2013/10/04 21:14:29 UTC

ElasticSearchSink - A couple of feature requests

Hi all,

This is a repost from dev@flume.apache.org. I was not sure whether the Flume
developers received the email, so pardon the repost if it feels like I am
spamming the mailing list.

I have a couple of feature requests for ElasticSearchSink and didn't find
open JIRA tickets for these requirements.

I have already modified ElasticSearchSink locally for the smaller of the two
feature requests, and the larger one is in progress. I wanted to discuss the
features with you before creating the JIRA tickets, so here is a brief
summary of the improvements I have in mind.


DETAILS>>>

Flume version:

Flume 1.4.0-cdh4.4.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 154d35659212f07edc896b414a43996fb8121773
Compiled by jenkins on Tue Sep  3 20:53:28 PDT 2013
From source with checksum f95b4a7f48080f876d6482bb88bcc342

And ElasticSearch v0.90.1.

*Improvement request #1 - HDFS file suffix style index suffix in
ElasticSearchSink:*

*agent.sinks.myESsink.indexName = myIndex*

ElasticSearchSink uses the provided index name as an index prefix and appends
"YYYY-MM-DD" to generate the actual index name in ES. While that is convenient
for my testing purposes, it doesn't allow creating indices monthly or yearly
or, more generally, based on a date pattern provided in the Flume config,
similar to the HDFS sink's fileSuffix. For example:

*agent.sinks.myESsink.indexSuffix = "YYYY"* would create indices named
myIndex-2013 / myIndex-2014 etc. When the suffix is not provided, the sink
would either create the index with just the index name or fall back to the
default 'YYYY-MM-DD'.
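
To illustrate the requested behaviour, here is a minimal Java sketch. It is
not the actual ElasticSearchSink code: the class IndexNameSketch, the helper
indexName and the choice of java.time are assumptions made purely for
illustration. Note that Java date patterns spell the calendar year as
lowercase "yyyy", so the sketch uses that spelling rather than "YYYY".

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class IndexNameSketch {

    // Hypothetical helper, not part of Flume: returns "<indexName>-<formatted date>".
    // A null or empty suffix pattern yields the bare index name.
    static String indexName(String indexName, String suffixPattern, Instant eventTime) {
        if (suffixPattern == null || suffixPattern.isEmpty()) {
            return indexName;
        }
        // Format in UTC so the index name does not depend on the agent's time zone.
        DateTimeFormatter formatter =
                DateTimeFormatter.ofPattern(suffixPattern).withZone(ZoneOffset.UTC);
        return indexName + "-" + formatter.format(eventTime);
    }

    public static void main(String[] args) {
        Instant eventTime = Instant.parse("2013-10-04T21:14:29Z");
        System.out.println(indexName("myIndex", "yyyy", eventTime));       // yearly index
        System.out.println(indexName("myIndex", "yyyy-MM", eventTime));    // monthly index
        System.out.println(indexName("myIndex", "yyyy-MM-dd", eventTime)); // daily index
        System.out.println(indexName("myIndex", "", eventTime));           // no suffix
    }
}
```

Whether a missing suffix should mean "no suffix" (as in this sketch) or the
existing daily default is exactly the open question raised above.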

*Improvement request #2 - ElasticSearchSink ttl field modification to mimic
actual ES:*

*agent.sinks.myESsink.ttl = <some integer value> (current specification)*

The second one is comparatively trivial but good to have. The current
ElasticSearchSink TTL defaults to 5 days and accepts only integers, which are
treated as days.

It would be good to have a unit qualifier like "d" / "s" / "m" / "w" / "h" to
mimic the TTL configuration in the ElasticSearch mapping.

*agent.sinks.myESsink.ttl = "3w" / 3 (requested specification)*

For the ttl I have already made changes in my local Flume git repo and am
currently testing them. The change doesn't break the existing way of
specifying the TTL field; it only extends it to allow "1d" / "2w" style TTL
specifications.
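
As a rough sketch of what such backward-compatible parsing could look like
(this is not my actual patch; the class TtlSketch and the method ttlMillis are
hypothetical names used only for illustration), a bare integer is still
treated as days while an optional unit qualifier is honoured:

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TtlSketch {

    // A non-negative integer optionally followed by one of the units s, m, h, d, w.
    private static final Pattern TTL_PATTERN = Pattern.compile("^(\\d+)\\s*([smhdw]?)$");

    // Hypothetical helper: converts a TTL setting such as "3w" or "3" to milliseconds.
    static long ttlMillis(String raw) {
        Matcher matcher = TTL_PATTERN.matcher(raw.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Invalid TTL: " + raw);
        }
        long amount = Long.parseLong(matcher.group(1));
        switch (matcher.group(2)) {
            case "s": return TimeUnit.SECONDS.toMillis(amount);
            case "m": return TimeUnit.MINUTES.toMillis(amount);
            case "h": return TimeUnit.HOURS.toMillis(amount);
            case "w": return TimeUnit.DAYS.toMillis(7 * amount);
            default:  return TimeUnit.DAYS.toMillis(amount); // "d" or no unit: days
        }
    }

    public static void main(String[] args) {
        System.out.println(ttlMillis("3w")); // three weeks in milliseconds
        System.out.println(ttlMillis("3"));  // legacy form: three days in milliseconds
    }
}
```

Keeping the unit optional is what preserves the existing integer-only
configuration.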

<<<DETAILS

Kindly suggest what I should do to get these changes incorporated in
future release(s) of Flume.

Best and thanks,
- Dib

Re: ElasticSearchSink - A couple of feature requests

Posted by Dibyajyoti Ghosh <di...@gmail.com>.
Hi all,

Could one of the Flume JIRA admins please assign the
https://issues.apache.org/jira/browse/FLUME-2206 ticket to me? I am testing
the changes locally and have a patch I would like to submit for review.

Thanks,
- Dib


On Fri, Oct 4, 2013 at 1:55 PM, Dibyajyoti Ghosh
<di...@gmail.com> wrote:

> Thanks Hari.
>
> I am creating JIRA tickets for the improvements.
>
> Best,
> - Dib
>
>
> On Fri, Oct 4, 2013 at 1:45 PM, Hari Shreedharan <
> hshreedharan@cloudera.com> wrote:
>
>>  Hi,
>>
>> I am not too familiar with ElasticSearch. If you want to file a jira,
>> someone might pick it up when they have time.
>>
>>
>> Thanks,
>> Hari
>>
>> On Friday, October 4, 2013 at 12:14 PM, Dibyajyoti Ghosh wrote:
>>
>> Hi all,
>>
>> This is a repost from dev@flume.apache.org. I was not sure if flume
>> developers got the email thus pardon my repost if it feels like I am
>> spamming the mailing list.
>>
>> I have a couple of feature requests for ElasticSearchSink and didn't find
>> open JIRA tickets for these requirements.
>>
>> I have already modified ElasticSearchSink locally for the smaller of the
>> feature request and the longer one is in progress. I wanted to discuss the
>> features first with you first before creating the JIRA tickets so here is a
>> brief summary of the improvements I have in mind.
>>
>>
>> DETAILS>>>
>>
>> Flume version:
>>
>> Flume 1.4.0-cdh4.4.0
>> Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
>> Revision: 154d35659212f07edc896b414a43996fb8121773
>> Compiled by jenkins on Tue Sep  3 20:53:28 PDT 2013
>> From source with checksum f95b4a7f48080f876d6482bb88bcc342
>>
>> And ElasticSearch v0.90.1.
>>
>> *Improvement request #1 - HDFS file suffix style index suffix in
>> ElasticSearchSink:*
>>
>> *agent.sinks.myESsink.indexName = myIndex*
>>
>> ElasticSearchSink uses the provided index name as index prefix and
>> appends "YYYY-MM-DD" to generate the actual index in ES which being
>> convenient for my testing purposes, doesn't allow creating index monthly /
>> yearly or more generally speaking based on some regex provided in flume
>> config similar to HDFS fileSuffix .e.g.
>>
>> *agent.sinks.myESsink.indexSuffix = "YYYY"* will create index as
>> myIndex-2013 / myIndex-2014 etc and when not provided will create index
>> with just the index name or can default back to 'YYYY-MM-DD'.
>>
>> *Improvement request #2 - ElasticSearchSink ttl field modification to
>> mimic actual ES:*
>>
>> *agent.sinks.myESsink.ttl = <some integer value> (current specification)*
>>
>> The second one is comparatively trivial but good to have. Current ElasticSearch
>> TTL defaults to 5 days and works with integers only again which is treated
>> as days.
>>
>> It will be good to have a qualifier like "d" / "s" / "m" / "w" / "h" to
>> mimic the TTL configuration in ElasticSearch mapping.
>>
>> *agent.sinks.myESsink.ttl = "3w" / 3 (requested specification)*
>>
>> For the ttl I have already made changes in my local flume git repo and
>> currently testing it. The change doesn't break existing way of specifying
>> TTL field only extends it to allow "1d" / "2w" style TTL specification.
>>
>> <<<DETAILS
>>
>> Kindly suggest what should I do to make these changes incorporated in the
>> future release(s) of Flume.
>>
>> Best and thanks,
>> - Dib
>>
>>
>>
>

Re: ElasticSearchSink - A couple of feature requests

Posted by Dibyajyoti Ghosh <di...@gmail.com>.
Thanks Hari.

I am creating JIRA tickets for the improvements.

Best,
- Dib


On Fri, Oct 4, 2013 at 1:45 PM, Hari Shreedharan
<hs...@cloudera.com> wrote:

>  Hi,
>
> I am not too familiar with ElasticSearch. If you want to file a jira,
> someone might pick it up when they have time.
>
>
> Thanks,
> Hari
>
> On Friday, October 4, 2013 at 12:14 PM, Dibyajyoti Ghosh wrote:
>
> Hi all,
>
> This is a repost from dev@flume.apache.org. I was not sure if flume
> developers got the email thus pardon my repost if it feels like I am
> spamming the mailing list.
>
> I have a couple of feature requests for ElasticSearchSink and didn't find
> open JIRA tickets for these requirements.
>
> I have already modified ElasticSearchSink locally for the smaller of the
> feature request and the longer one is in progress. I wanted to discuss the
> features first with you first before creating the JIRA tickets so here is a
> brief summary of the improvements I have in mind.
>
>
> DETAILS>>>
>
> Flume version:
>
> Flume 1.4.0-cdh4.4.0
> Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
> Revision: 154d35659212f07edc896b414a43996fb8121773
> Compiled by jenkins on Tue Sep  3 20:53:28 PDT 2013
> From source with checksum f95b4a7f48080f876d6482bb88bcc342
>
> And ElasticSearch v0.90.1.
>
> *Improvement request #1 - HDFS file suffix style index suffix in
> ElasticSearchSink:*
>
> *agent.sinks.myESsink.indexName = myIndex*
>
> ElasticSearchSink uses the provided index name as index prefix and appends
> "YYYY-MM-DD" to generate the actual index in ES which being convenient for
> my testing purposes, doesn't allow creating index monthly / yearly or more
> generally speaking based on some regex provided in flume config similar to
> HDFS fileSuffix .e.g.
>
> *agent.sinks.myESsink.indexSuffix = "YYYY"* will create index as
> myIndex-2013 / myIndex-2014 etc and when not provided will create index
> with just the index name or can default back to 'YYYY-MM-DD'.
>
> *Improvement request #2 - ElasticSearchSink ttl field modification to
> mimic actual ES:*
>
> *agent.sinks.myESsink.ttl = <some integer value> (current specification)*
>
> The second one is comparatively trivial but good to have. Current ElasticSearch
> TTL defaults to 5 days and works with integers only again which is treated
> as days.
>
> It will be good to have a qualifier like "d" / "s" / "m" / "w" / "h" to
> mimic the TTL configuration in ElasticSearch mapping.
>
> *agent.sinks.myESsink.ttl = "3w" / 3 (requested specification)*
>
> For the ttl I have already made changes in my local flume git repo and
> currently testing it. The change doesn't break existing way of specifying
> TTL field only extends it to allow "1d" / "2w" style TTL specification.
>
> <<<DETAILS
>
> Kindly suggest what should I do to make these changes incorporated in the
> future release(s) of Flume.
>
> Best and thanks,
> - Dib
>
>
>

Re: ElasticSearchSink - A couple of feature requests

Posted by Hari Shreedharan <hs...@cloudera.com>.
Hi, 

I am not too familiar with ElasticSearch. If you want to file a jira, someone might pick it up when they have time. 


Thanks,
Hari


On Friday, October 4, 2013 at 12:14 PM, Dibyajyoti Ghosh wrote:

> Hi all,
> 
> This is a repost from dev@flume.apache.org. I was not sure if flume developers got the email thus pardon my repost if it feels like I am spamming the mailing list.
> 
> I have a couple of feature requests for ElasticSearchSink and didn't find open JIRA tickets for these requirements.  
> 
> I have already modified ElasticSearchSink locally for the smaller of the feature request and the longer one is in progress. I wanted to discuss the features first with you first before creating the JIRA tickets so here is a brief summary of the improvements I have in mind. 
> 
> 
> DETAILS>>> 
> 
> Flume version:
> 
> Flume 1.4.0-cdh4.4.0 
> Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
> Revision: 154d35659212f07edc896b414a43996fb8121773
> Compiled by jenkins on Tue Sep  3 20:53:28 PDT 2013
> From source with checksum f95b4a7f48080f876d6482bb88bcc342
> 
> 
> And ElasticSearch v0.90.1.
> 
> Improvement request #1 - HDFS file suffix style index suffix in ElasticSearchSink:
> 
> agent.sinks.myESsink.indexName = myIndex 
> 
> ElasticSearchSink uses the provided index name as index prefix and appends "YYYY-MM-DD" to generate the actual index in ES which being convenient for my testing purposes, doesn't allow creating index monthly / yearly or more generally speaking based on some regex provided in flume config similar to HDFS fileSuffix .e.g. 
> 
> agent.sinks.myESsink.indexSuffix = "YYYY" will create index as myIndex-2013 / myIndex-2014 etc and when not provided will create index with just the index name or can default back to 'YYYY-MM-DD'.  
> 
> Improvement request #2 - ElasticSearchSink ttl field modification to mimic actual ES:
> 
> agent.sinks.myESsink.ttl = <some integer value> (current specification)
> 
> The second one is comparatively trivial but good to have. Current ElasticSearch TTL defaults to 5 days and works with integers only again which is treated as days. 
> 
> It will be good to have a qualifier like "d" / "s" / "m" / "w" / "h" to mimic the TTL configuration in ElasticSearch mapping. 
> 
> agent.sinks.myESsink.ttl = "3w" / 3 (requested specification)
> 
> For the ttl I have already made changes in my local flume git repo and currently testing it. The change doesn't break existing way of specifying TTL field only extends it to allow "1d" / "2w" style TTL specification. 
> 
> <<<DETAILS
> 
> Kindly suggest what should I do to make these changes incorporated in the future release(s) of Flume.
> 
> Best and thanks,
> - Dib  

