Posted to user@flume.apache.org by Jean-Philippe Caruana <jp...@target2sell.com> on 2014/11/26 17:05:38 UTC

support for Google Storage ?

Hi,

I am a total newbie about Hadoop, so sorry if my questions sound stupid
(please give me pointers).

I would like to use Flume to send data to HDFS on Google Cloud:
- does GS (Google Storage) support exist? It would be great to use a
path like gs://some_path
- where does the Flume agent need to be? When I see hdfs://some_path/
I wonder why there is no server address in the path.

In fact, I am looking for feedback about sending data to a Google Cloud
Hadoop cluster from my own (on-premises) servers.

Thanks

-- 
Jean-Philippe Caruana 
http://www.barreverte.fr


Re: support for Google Storage ?

Posted by Hari Shreedharan <hs...@cloudera.com>.
Looks like it will be in the next release of Flume.



Thanks,
Hari

On Thu, Dec 4, 2014 at 2:22 AM, Jean-Philippe Caruana <jp...@target2sell.com>
wrote:

> OK, but you told me to submit a patch...
> Maybe a dedicated section in the documentation about flume-env.sh would
> prevent people from making mistakes, like I did?

Re: support for Google Storage ?

Posted by Jean-Philippe Caruana <jp...@target2sell.com>.
OK, but you told me to submit a patch...

Maybe a dedicated section in the documentation about flume-env.sh would
prevent people from making mistakes, like I did?

On 04/12/2014 11:03, Ashish wrote:
> Related JIRA that has already been fixed
> https://issues.apache.org/jira/browse/FLUME-2337

-- 
Jean-Philippe Caruana 
http://www.barreverte.fr


Re: support for Google Storage ?

Posted by Ashish <pa...@gmail.com>.
Related JIRA that has already been fixed
https://issues.apache.org/jira/browse/FLUME-2337

On Thu, Dec 4, 2014 at 3:18 PM, Jean-Philippe Caruana
<jp...@target2sell.com> wrote:
> https://issues.apache.org/jira/browse/FLUME-2569



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: support for Google Storage ?

Posted by Jean-Philippe Caruana <jp...@target2sell.com>.
https://issues.apache.org/jira/browse/FLUME-2569

On 04/12/2014 09:57, Hari Shreedharan wrote:
> Reasons are mostly historical. Feel free to submit a patch to bump it up
>
> Thanks, Hari

-- 
Jean-Philippe Caruana - jp@target2sell.com
Target2sell, le turbo du e-commerce
43 rue de Turbigo - 75003 Paris
+33 (0) 9 51 92 63 20 | +33 (0) 1 44 54 94 55

http://www.target2sell.com
http://www.barreverte.fr


Re: support for Google Storage ?

Posted by Hari Shreedharan <hs...@cloudera.com>.
Reasons are mostly historical. Feel free to submit a patch to bump it up



Thanks,
Hari

On Thu, Dec 4, 2014 at 12:55 AM, Jean-Philippe Caruana <jp...@target2sell.com>
wrote:

> Yes, flume-env.sh starts the JVM with 20 MB and GCS opens a 64 MB buffer.
> Any idea/reason why Flume starts with such a low heap space?

Re: support for Google Storage ?

Posted by Jean-Philippe Caruana <jp...@target2sell.com>.
Yes, flume-env.sh starts the JVM with 20 MB and GCS opens a 64 MB buffer.

Any idea/reason why Flume starts with such a low heap space?
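The mismatch between those two numbers can be made concrete. Below is a minimal sketch: the 64 MB figure and the BufferedOutputStream frame come from the stack trace later in this thread, but the class and method names here are illustrative, not from the GCS connector source.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class BufferVsHeap {
    // 64 MB figure as reported in this thread; the connector's actual
    // buffer size may differ between gcs-connector versions.
    static final int GCS_WRITE_BUFFER = 64 * 1024 * 1024;

    static int writeThrough(byte[] payload) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The buffer is one contiguous byte[] allocation. Run this class with
        // -Xmx20m (the small default heap) and this line throws
        // java.lang.OutOfMemoryError: Java heap space, matching the trace;
        // with a larger heap the write goes through normally.
        BufferedOutputStream out = new BufferedOutputStream(sink, GCS_WRITE_BUFFER);
        out.write(payload);
        out.flush();
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("wrote " + writeThrough("hello".getBytes()) + " bytes");
    }
}
```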

On 04/12/2014 01:19, Hari Shreedharan wrote:
> It looks like you are just running out of heap space. Try increasing
> the heap space by specifying a higher value in the flume-env.sh file.

-- 
Jean-Philippe Caruana - jp@target2sell.com
Target2sell, le turbo du e-commerce
43 rue de Turbigo - 75003 Paris
+33 (0) 9 51 92 63 20 | +33 (0) 1 44 54 94 55

http://www.target2sell.com
http://www.barreverte.fr


Re: support for Google Storage ?

Posted by Hari Shreedharan <hs...@cloudera.com>.
It looks like you are just running out of heap space. Try increasing the heap space by specifying a higher value in the flume-env.sh file.


Thanks,
Hari
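Concretely, the heap bump suggested above goes in flume-env.sh. A sketch, assuming a standard install where conf/flume-env.sh is created from the shipped flume-env.sh.template; the sizes are illustrative, not recommendations:

```
# conf/flume-env.sh -- sourced by the flume-ng launcher before starting the JVM.
# Give the agent enough heap for the GCS connector's 64 MB write buffer,
# plus headroom for channels and sinks.
export JAVA_OPTS="-Xms100m -Xmx512m"
```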

On Mon, Dec 1, 2014 at 9:02 AM, Jean-Philippe Caruana <jp...@target2sell.com>
wrote:

> I also asked the question on SO :
> https://stackoverflow.com/questions/27232966/what-causes-flume-with-gcs-sink-to-throw-a-outofmemoryexepction

Re: support for Google Storage ?

Posted by Jean-Philippe Caruana <jp...@target2sell.com>.
I also asked the question on SO:
https://stackoverflow.com/questions/27232966/what-causes-flume-with-gcs-sink-to-throw-a-outofmemoryexepction



-- 
Jean-Philippe Caruana 
http://www.barreverte.fr


Re: support for Google Storage ?

Posted by Jean-Philippe Caruana <jp...@target2sell.com>.
Hi,

I managed to write to GS from Flume [1], but it is not working 100% yet:
- files are created in the expected directories, but are empty
- Flume throws a java.lang.OutOfMemoryError: Java heap space:

java.lang.OutOfMemoryError: Java heap space
    at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:76)
    at
com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:79)
    at
com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:820)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
    at
org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:96)

(complete stack trace here: http://pastebin.com/i5iSgCM3)

Has anyone experienced this before?
Is it a bug in Google's gcs-connector-latest-hadoop2.jar?
Where should I look to find out what's wrong?

My configuration looks like this:
a1.sinks.hdfs_sink.hdfs.path =
gs://bucket_name/%{env}/%{tenant}/%{type}/%Y-%m-%d

I am running flume from Docker.
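For context, the single hdfs.path line above sits inside a larger agent definition. A hedged sketch of what the surrounding sink wiring might look like; the agent, source, and channel names are hypothetical, only the hdfs.path value is from this message:

```
# flume.conf -- illustrative wiring around the gs:// sink path.
a1.sources = r1
a1.channels = c1
a1.sinks = hdfs_sink

a1.sinks.hdfs_sink.type = hdfs
a1.sinks.hdfs_sink.channel = c1
# The HDFS sink accepts any Hadoop FileSystem URI, so a gs:// path works
# once the GCS connector jar is on Flume's classpath.
a1.sinks.hdfs_sink.hdfs.path = gs://bucket_name/%{env}/%{tenant}/%{type}/%Y-%m-%d
# SequenceFile is the sink's default fileType, matching the
# HDFSSequenceFile frame in the stack trace above.
a1.sinks.hdfs_sink.hdfs.fileType = SequenceFile
```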

[1]
http://stackoverflow.com/questions/27174033/what-is-the-minimal-setup-needed-to-write-to-hdfs-gs-on-google-cloud-storage-wit

Thanks.
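The minimal setup referenced in [1] amounts to putting the gcs-connector jar on Flume's classpath and registering the gs:// scheme with Hadoop. A core-site.xml sketch, assuming the property names used by the gcs-connector of that era; the values are placeholders:

```
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.gs.project.id</name>
  <value>your-project-id</value>
</property>
```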



-- 
Jean-Philippe Caruana 
http://www.barreverte.fr