Posted to user@flume.apache.org by Matthew Moore <ma...@crowdmob.com> on 2013/03/29 15:44:16 UTC

S3 Sink in FlumeNG Configuration?

Hey there,

I know this is a really newbish question, but I'm hoping to get a little
assistance here so I'm not stuck guess-and-checking.

I'm trying to configure FlumeNG (1.3.1), but I can't figure out how to set up
the hdfs sink to use the s3 implementations.

I'm keeping track of my progress on this gist I made:
https://gist.github.com/crowdmatt/5256881

From what I've gathered, I should be using the hdfs type, which I'm setting
up as such:

agent.sinks = s3Sink
agent.sinks.s3Sink.type = hdfs
agent.sinks.s3Sink.channel = recoverableMemoryChannel

... but that's where I end up hitting my head against the wall.  I know I
should be specifying my s3 access key, secret, and bucket in this format:
s3n://ACCESS_KEY_ID:SECRET_ACCESS_KEY@my-hdfs/

However, I don't know where to specify that, or what dot notation to use.

Can anyone point me in the right direction?

Best,
Matt
--
Matthew Moore
Co-Founder & CTO, CrowdMob Inc.
Mobile: (650) 888-5962

Need to schedule a meeting?  Invite me via Google Calendar!
matt@crowdmob.com

Re: S3 Sink in FlumeNG Configuration?

Posted by Matthew Moore <ma...@crowdmob.com>.
I am awesome at answering my own questions =\

I was using jets3t 0.7.4 instead of the 0.6.1 that ships with Hadoop (jets3t
isn't bundled with Flume, so whichever jar you put on the classpath yourself
is the one that gets used).
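
For anyone who hits the same thing, the shape of my fix was just making sure
the jets3t on Flume's classpath matches the one my Hadoop build expects.
Roughly (these paths are from my setup, so adjust for yours):

# copy the jets3t jar that ships with Hadoop into Flume's lib dir,
# which the flume-ng startup script puts on the classpath
cp $HADOOP_HOME/lib/jets3t-0.6.1.jar $FLUME_HOME/lib/
# ...and remove any newer jets3t jar (like the 0.7.4 I had) from $FLUME_HOME/lib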

Best,
Matt
--
Matthew Moore
Co-Founder & CTO, CrowdMob Inc.
Mobile: (650) 888-5962

Need to schedule a meeting?  Invite me via Google Calendar!
matt@crowdmob.com


On Fri, Mar 29, 2013 at 12:15 PM, Matthew Moore <ma...@crowdmob.com> wrote:

> Hey Guys,
>
> I've made a decent amount of progress, and now have the settings correct.
>  For completeness, the settings look like this:
>
> agent.sinks.s3Sink.type = hdfs
> agent.sinks.s3Sink.hdfs.path = s3://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@BUCKET-NAME/
>
> You can see the full setup at this gist:
> https://gist.github.com/crowdmatt/5256881
>
>
> However, I've run into the following problem:
>
>
> 2013-03-29 19:05:28,954 (SinkRunner-PollingRunner-DefaultSinkProcessor)
> [ERROR -
> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:460)]
> process failed
> org.apache.hadoop.fs.s3.S3Exception:
> org.jets3t.service.S3ServiceException: Request Error. HEAD
> '/FlumeData.1364583927762.tmp' on Host 'mybucket.s3.amazonaws.com' @
> 'Fri, 29 Mar 2013 19:05:28 GMT' -- ResponseCode: 404, ResponseStatus: Not
> Found, RequestId: 00864FE1DCD5AD95, HostId:
> 68AuSUe/XsP9zUiwe4yqhhDjETjVEnXVuTdZjYKQfj6VBKyACLH++MD1i8xgrEE4
>  at
> org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:122)
>
>
> Does anyone have any pointers on how I can start debugging?
>
> Best,
> Matt
> --
> Matthew Moore
> Co-Founder & CTO, CrowdMob Inc.
> Mobile: (650) 888-5962
>
> Need to schedule a meeting?  Invite me via Google Calendar!
> matt@crowdmob.com
>
>
> On Fri, Mar 29, 2013 at 8:47 AM, Matthew Moore <ma...@crowdmob.com> wrote:
>
>> Hey,
>>
>> Thanks for the links to the Jiras.  It seems like someone implemented
>> an S3BufferedWriter which might be helpful in the future.
>>
>> However, I'm still not sure what to set the configuration (flume.conf) to
>> use s3 as a sink?  Has anyone done that?
>>
>> Best,
>> Matt
>> --
>> Matthew Moore
>> Co-Founder & CTO, CrowdMob Inc.
>> Mobile: (650) 888-5962
>>
>> Need to schedule a meeting?  Invite me via Google Calendar!
>> matt@crowdmob.com
>>
>>
>> On Fri, Mar 29, 2013 at 7:49 AM, Brock Noland <br...@cloudera.com> wrote:
>>
>>> Sorry, I don't know much about this, but here are two relevant JIRA's:
>>>
>>> https://issues.apache.org/jira/browse/FLUME-1228
>>> https://issues.apache.org/jira/browse/FLUME-951
>>>
>>>
>>> On Fri, Mar 29, 2013 at 9:44 AM, Matthew Moore <ma...@crowdmob.com>wrote:
>>>
>>>> Hey there,
>>>>
>>>> I know this is a really newbish question, but I'm hoping to get a
>>>> little assistance here so I'm not stuck guess-and-checking.
>>>>
>>>> I'm trying to figure out how to configure FlumeNG (1.3.1), but I
>>>> couldn't figure out how to setup the hdfs sink to use the s3
>>>> implementations.
>>>>
>>>> I'm keeping track of my progress on this gist I made:
>>>> https://gist.github.com/crowdmatt/5256881
>>>>
>>>> From what I've gathered, I should be using the hdfs type, which I'm
>>>> setting up as such:
>>>>
>>>> agent.sinks = s3Sink
>>>> agent.sinks.s3Sink.type = hdfs
>>>> agent.sinks.s3Sink.channel = recoverableMemoryChannel
>>>>
>>>> ... but that's where I end up hitting my head against the wall.  I know
>>>> I should be specifying my s3 access key, secret, and bucket in this format:
>>>> s3n://ACCESS_KEY_ID:SECRET_ACCESS_KEY@my-hdfs/
>>>>
>>>> However, I don't know where to specify that, or what dot notation to
>>>> use.
>>>>
>>>> Can anyone point me in the right direction?
>>>>
>>>> Best,
>>>> Matt
>>>> --
>>>> Matthew Moore
>>>> Co-Founder & CTO, CrowdMob Inc.
>>>> Mobile: (650) 888-5962
>>>>
>>>> Need to schedule a meeting?  Invite me via Google Calendar!
>>>> matt@crowdmob.com
>>>>
>>>
>>>
>>>
>>> --
>>> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>>>
>>
>>
>

Re: S3 Sink in FlumeNG Configuration?

Posted by Matthew Moore <ma...@crowdmob.com>.
Hey Guys,

I've made a decent amount of progress, and now have the settings correct.
For completeness, the settings look like this:

agent.sinks.s3Sink.type = hdfs
agent.sinks.s3Sink.hdfs.path = s3://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@BUCKET-NAME/

You can see the full setup at this gist:
https://gist.github.com/crowdmatt/5256881
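
For readers who don't want to click through, a rough sketch of the rest of a
sink block like this (the channel name is from my first mail; the file and
roll settings here are only illustrative, not recommendations):

agent.sinks.s3Sink.channel = recoverableMemoryChannel
agent.sinks.s3Sink.hdfs.fileType = DataStream
agent.sinks.s3Sink.hdfs.filePrefix = FlumeData
agent.sinks.s3Sink.hdfs.rollInterval = 300
agent.sinks.s3Sink.hdfs.rollSize = 0
agent.sinks.s3Sink.hdfs.rollCount = 0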


However, I've run into the following problem:


2013-03-29 19:05:28,954 (SinkRunner-PollingRunner-DefaultSinkProcessor)
[ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:460)] process failed
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException:
Request Error. HEAD '/FlumeData.1364583927762.tmp' on Host 'mybucket.s3.amazonaws.com'
@ 'Fri, 29 Mar 2013 19:05:28 GMT' -- ResponseCode: 404, ResponseStatus: Not Found,
RequestId: 00864FE1DCD5AD95, HostId: 68AuSUe/XsP9zUiwe4yqhhDjETjVEnXVuTdZjYKQfj6VBKyACLH++MD1i8xgrEE4
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:122)


Does anyone have any pointers on how I can start debugging?

Best,
Matt
--
Matthew Moore
Co-Founder & CTO, CrowdMob Inc.
Mobile: (650) 888-5962

Need to schedule a meeting?  Invite me via Google Calendar!
matt@crowdmob.com


On Fri, Mar 29, 2013 at 8:47 AM, Matthew Moore <ma...@crowdmob.com> wrote:

> Hey,
>
> Thanks for the links to the Jiras.  It seems like someone implemented
> an S3BufferedWriter which might be helpful in the future.
>
> However, I'm still not sure what to set the configuration (flume.conf) to
> use s3 as a sink?  Has anyone done that?
>
> Best,
> Matt
> --
> Matthew Moore
> Co-Founder & CTO, CrowdMob Inc.
> Mobile: (650) 888-5962
>
> Need to schedule a meeting?  Invite me via Google Calendar!
> matt@crowdmob.com
>
>
> On Fri, Mar 29, 2013 at 7:49 AM, Brock Noland <br...@cloudera.com> wrote:
>
>> Sorry, I don't know much about this, but here are two relevant JIRA's:
>>
>> https://issues.apache.org/jira/browse/FLUME-1228
>> https://issues.apache.org/jira/browse/FLUME-951
>>
>>
>> On Fri, Mar 29, 2013 at 9:44 AM, Matthew Moore <ma...@crowdmob.com> wrote:
>>
>>> Hey there,
>>>
>>> I know this is a really newbish question, but I'm hoping to get a little
>>> assistance here so I'm not stuck guess-and-checking.
>>>
>>> I'm trying to figure out how to configure FlumeNG (1.3.1), but I
>>> couldn't figure out how to setup the hdfs sink to use the s3
>>> implementations.
>>>
>>> I'm keeping track of my progress on this gist I made:
>>> https://gist.github.com/crowdmatt/5256881
>>>
>>> From what I've gathered, I should be using the hdfs type, which I'm
>>> setting up as such:
>>>
>>> agent.sinks = s3Sink
>>> agent.sinks.s3Sink.type = hdfs
>>> agent.sinks.s3Sink.channel = recoverableMemoryChannel
>>>
>>> ... but that's where I end up hitting my head against the wall.  I know
>>> I should be specifying my s3 access key, secret, and bucket in this format:
>>> s3n://ACCESS_KEY_ID:SECRET_ACCESS_KEY@my-hdfs/
>>>
>>> However, I don't know where to specify that, or what dot notation to use.
>>>
>>> Can anyone point me in the right direction?
>>>
>>> Best,
>>> Matt
>>> --
>>> Matthew Moore
>>> Co-Founder & CTO, CrowdMob Inc.
>>> Mobile: (650) 888-5962
>>>
>>> Need to schedule a meeting?  Invite me via Google Calendar!
>>> matt@crowdmob.com
>>>
>>
>>
>>
>> --
>> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>>
>
>

Re: S3 Sink in FlumeNG Configuration?

Posted by Matthew Moore <ma...@crowdmob.com>.
Hey,

Thanks for the links to the Jiras.  It seems like someone implemented
an S3BufferedWriter which might be helpful in the future.

However, I'm still not sure what to put in the configuration (flume.conf) to
use s3 as a sink.  Has anyone done that?

Best,
Matt
--
Matthew Moore
Co-Founder & CTO, CrowdMob Inc.
Mobile: (650) 888-5962

Need to schedule a meeting?  Invite me via Google Calendar!
matt@crowdmob.com


On Fri, Mar 29, 2013 at 7:49 AM, Brock Noland <br...@cloudera.com> wrote:

> Sorry, I don't know much about this, but here are two relevant JIRA's:
>
> https://issues.apache.org/jira/browse/FLUME-1228
> https://issues.apache.org/jira/browse/FLUME-951
>
>
> On Fri, Mar 29, 2013 at 9:44 AM, Matthew Moore <ma...@crowdmob.com> wrote:
>
>> Hey there,
>>
>> I know this is a really newbish question, but I'm hoping to get a little
>> assistance here so I'm not stuck guess-and-checking.
>>
>> I'm trying to figure out how to configure FlumeNG (1.3.1), but I couldn't
>> figure out how to setup the hdfs sink to use the s3 implementations.
>>
>> I'm keeping track of my progress on this gist I made:
>> https://gist.github.com/crowdmatt/5256881
>>
>> From what I've gathered, I should be using the hdfs type, which I'm
>> setting up as such:
>>
>> agent.sinks = s3Sink
>> agent.sinks.s3Sink.type = hdfs
>> agent.sinks.s3Sink.channel = recoverableMemoryChannel
>>
>> ... but that's where I end up hitting my head against the wall.  I know I
>> should be specifying my s3 access key, secret, and bucket in this format:
>> s3n://ACCESS_KEY_ID:SECRET_ACCESS_KEY@my-hdfs/
>>
>> However, I don't know where to specify that, or what dot notation to use.
>>
>> Can anyone point me in the right direction?
>>
>> Best,
>> Matt
>> --
>> Matthew Moore
>> Co-Founder & CTO, CrowdMob Inc.
>> Mobile: (650) 888-5962
>>
>> Need to schedule a meeting?  Invite me via Google Calendar!
>> matt@crowdmob.com
>>
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>

Re: S3 Sink in FlumeNG Configuration?

Posted by Brock Noland <br...@cloudera.com>.
Sorry, I don't know much about this, but here are two relevant JIRAs:

https://issues.apache.org/jira/browse/FLUME-1228
https://issues.apache.org/jira/browse/FLUME-951


On Fri, Mar 29, 2013 at 9:44 AM, Matthew Moore <ma...@crowdmob.com> wrote:

> Hey there,
>
> I know this is a really newbish question, but I'm hoping to get a little
> assistance here so I'm not stuck guess-and-checking.
>
> I'm trying to figure out how to configure FlumeNG (1.3.1), but I couldn't
> figure out how to setup the hdfs sink to use the s3 implementations.
>
> I'm keeping track of my progress on this gist I made:
> https://gist.github.com/crowdmatt/5256881
>
> From what I've gathered, I should be using the hdfs type, which I'm
> setting up as such:
>
> agent.sinks = s3Sink
> agent.sinks.s3Sink.type = hdfs
> agent.sinks.s3Sink.channel = recoverableMemoryChannel
>
> ... but that's where I end up hitting my head against the wall.  I know I
> should be specifying my s3 access key, secret, and bucket in this format:
> s3n://ACCESS_KEY_ID:SECRET_ACCESS_KEY@my-hdfs/
>
> However, I don't know where to specify that, or what dot notation to use.
>
> Can anyone point me in the right direction?
>
> Best,
> Matt
> --
> Matthew Moore
> Co-Founder & CTO, CrowdMob Inc.
> Mobile: (650) 888-5962
>
> Need to schedule a meeting?  Invite me via Google Calendar!
> matt@crowdmob.com
>



-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org