Posted to user@flume.apache.org by Rahul Ravindran <ra...@yahoo.com> on 2012/11/19 23:07:41 UTC

Hadoop jars

Hi,

 I was exploring how to deploy Flume into our cluster. The binary package available at http://flume.apache.org/download.html does not appear to include the Hadoop jar files which are needed by the HDFS sink. I would have expected them to be packaged in the lib folder. Is this deliberate? Is there a different binary I could download that includes the Hadoop jars?
Thanks,
~Rahul.

Re: Hadoop jars

Posted by Rahul Ravindran <ra...@yahoo.com>.
Thanks. We will use that. 

Sent from my phone. Excuse the terseness.

On Nov 19, 2012, at 4:53 PM, Hari Shreedharan <hs...@cloudera.com> wrote:

> No, you don't need HDFS. Hadoop common/Hadoop core should be enough. But make sure you add it to the classpath, as I mentioned before.
> 
> Hari

Re: Hadoop jars

Posted by Hari Shreedharan <hs...@cloudera.com>.
No, you don't need HDFS. Hadoop common/Hadoop core should be enough. But make sure you add it to the classpath, as I mentioned before.

Hari
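
As a rough illustration of packaging only the core/common jar, here is a minimal sketch of a helper that builds a classpath string from just those jars in one directory. The function name hadoop_min_classpath and the directory in the usage note are hypothetical, not something named in this thread, and the helper does not resolve any libraries those jars may themselves need.

```shell
# Hypothetical helper: print a colon-separated classpath made of the
# hadoop-core / hadoop-common jars found in a single directory.
hadoop_min_classpath() {
  dir="$1"
  result=""
  for jar in "$dir"/hadoop-core*.jar "$dir"/hadoop-common*.jar; do
    # A pattern that matched nothing stays literal, so skip non-files.
    [ -f "$jar" ] || continue
    # Append with a ':' separator only when result is already non-empty.
    result="${result:+$result:}$jar"
  done
  printf '%s\n' "$result"
}
```

One possible use, assuming the jars were copied to a directory of your choosing, would be FLUME_CLASSPATH="$(hadoop_min_classpath /path/to/hadoop-jars)" in flume-env.sh.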

On Nov 19, 2012, at 4:27 PM, Rahul Ravindran <ra...@yahoo.com> wrote:

> That is unfortunate. Is it sufficient if I package just hadoop-common.jar, or is the recommended way essentially to do an apt-get install flume-ng, which will install the dependencies below?
> 
> # apt-cache depends flume-ng
> 
> flume-ng
>   Depends: adduser
>   Depends: hadoop-hdfs
>   Depends: bigtop-utils
> 
> My concern is that hadoop-hdfs brings in a ton of other stuff which will not be used on any box except the one running the HDFS sink.
> 
> Thanks,
> ~Rahul.

Re: Hadoop jars

Posted by Rahul Ravindran <ra...@yahoo.com>.
That is unfortunate. Is it sufficient if I package just hadoop-common.jar, or is the recommended way essentially to do an apt-get install flume-ng, which will install the dependencies below?

# apt-cache depends flume-ng

flume-ng
  Depends: adduser
  Depends: hadoop-hdfs
  Depends: bigtop-utils

My concern is that hadoop-hdfs brings in a ton of other stuff which will not be used on any box except the one running the HDFS sink.

Thanks,
~Rahul.

________________________________
 From: Hari Shreedharan <hs...@cloudera.com>
To: user@flume.apache.org; Rahul Ravindran <ra...@yahoo.com> 
Sent: Monday, November 19, 2012 4:08 PM
Subject: Re: Hadoop jars
 

Unfortunately, the FileChannel too has a Hadoop dependency - even though the classes are never used. So you need the Hadoop jars on machines which will use the FileChannel (they should be added to FLUME_CLASSPATH in flume-env.sh, or HADOOP_HOME/HADOOP_PREFIX should be set). The channel no longer depends on Hadoop directly, but it still needs the jars on the classpath because we support migration from the older format to the new format.


Thanks,
Hari


-- 
Hari Shreedharan


Re: Hadoop jars

Posted by Hari Shreedharan <hs...@cloudera.com>.
Unfortunately, the FileChannel too has a Hadoop dependency - even though the classes are never used. So you need the Hadoop jars on machines which will use the FileChannel (they should be added to FLUME_CLASSPATH in flume-env.sh, or HADOOP_HOME/HADOOP_PREFIX should be set). The channel no longer depends on Hadoop directly, but it still needs the jars on the classpath because we support migration from the older format to the new format.


Thanks,
Hari

-- 
Hari Shreedharan
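
The two options mentioned above could look roughly like this in conf/flume-env.sh. This is only a sketch: /opt/hadoop is a hypothetical install location, and the jar layout (jars directly under the install directory and under its lib/ subdirectory) is an assumption that varies between Hadoop versions and packages. Going by the message above, either option on its own should be enough.

```shell
# Sketch of conf/flume-env.sh. /opt/hadoop is a hypothetical path, not one
# named in this thread; adjust it to wherever Hadoop is actually installed.

# Option 1: tell Flume where Hadoop lives. The default is applied only if
# HADOOP_HOME is not already set in the environment.
: "${HADOOP_HOME:=/opt/hadoop}"
export HADOOP_HOME

# Option 2: put the Hadoop jars on Flume's classpath explicitly. The "/*"
# entries are JVM classpath wildcards; the quotes keep the shell from
# expanding them. The directory layout here is an assumption.
FLUME_CLASSPATH="${HADOOP_HOME}/*:${HADOOP_HOME}/lib/*"
```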


On Monday, November 19, 2012 at 4:04 PM, Rahul Ravindran wrote:

> Thanks for the responses.
> 
> Good to know that the only external dependencies are Hadoop and HBase. We will deploy those components only on boxes which are going to have those sinks set up.


Re: Hadoop jars

Posted by Rahul Ravindran <ra...@yahoo.com>.
Thanks for the responses.

Good to know that the only external dependencies are Hadoop and HBase. We will deploy those components only on boxes which are going to have those sinks set up.


________________________________
 From: Hari Shreedharan <hs...@cloudera.com>
To: user@flume.apache.org 
Sent: Monday, November 19, 2012 3:29 PM
Subject: Re: Hadoop jars
 

Flume installs all required binaries, except for Hadoop (and the dependencies it would pull in) and HBase. This is because Flume, like most other Hadoop ecosystem components, is meant to work against binary-incompatible versions of Hadoop (Hadoop-1/Hadoop-2). So instead of packaging Hadoop jars with Flume, we expect Hadoop to be available on the machines you are running Flume on. Once you install Hadoop, you should not have any dependency issues. The same is true for HBase.


Hari


-- 
Hari Shreedharan


Re: Hadoop jars

Posted by Hari Shreedharan <hs...@cloudera.com>.
Flume installs all required binaries, except for Hadoop (and the dependencies it would pull in) and HBase. This is because Flume, like most other Hadoop ecosystem components, is meant to work against binary-incompatible versions of Hadoop (Hadoop-1/Hadoop-2). So instead of packaging Hadoop jars with Flume, we expect Hadoop to be available on the machines you are running Flume on. Once you install Hadoop, you should not have any dependency issues. The same is true for HBase.


Hari 

-- 
Hari Shreedharan
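
Since Hadoop is expected to already be present on the machine, a pre-flight check before starting an agent might look like the sketch below. The function name find_hadoop is hypothetical, and this is not a description of how Flume itself locates Hadoop; it only looks in the places mentioned in this thread (HADOOP_HOME, HADOOP_PREFIX) plus the PATH, and assumes the launcher sits at bin/hadoop under the install directory.

```shell
# Hypothetical pre-flight check: report how (or whether) a Hadoop launcher
# can be located. Prints HADOOP_HOME, HADOOP_PREFIX, PATH, or missing.
find_hadoop() {
  if [ -n "${HADOOP_HOME:-}" ] && [ -x "${HADOOP_HOME}/bin/hadoop" ]; then
    echo "HADOOP_HOME"
  elif [ -n "${HADOOP_PREFIX:-}" ] && [ -x "${HADOOP_PREFIX}/bin/hadoop" ]; then
    echo "HADOOP_PREFIX"
  elif command -v hadoop >/dev/null 2>&1; then
    echo "PATH"
  else
    echo "missing"
  fi
}
```

A wrapper script could call find_hadoop and refuse to start an agent that uses the HDFS sink when the result is missing.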


On Monday, November 19, 2012 at 2:33 PM, Mohit Anchlia wrote:

> The easiest way is to install the CDH binary and point Flume's classpath to it.


Re: Hadoop jars

Posted by Mohit Anchlia <mo...@gmail.com>.
The easiest way is to install the CDH binary and point Flume's classpath to it.

On Mon, Nov 19, 2012 at 2:25 PM, Roshan Naik <ro...@hortonworks.com> wrote:

> Currently, unfortunately, I don't think there is any such documentation.
> A very general answer would be: normally this list would depend on the
> source/sink/channel you are using.
> I think it would be nice if the user manual listed these external
> dependencies for each component.
> I am not an expert on the HDFS sink, but I don't see why it would depend on
> anything more than HDFS itself.
> -roshan

Re: Hadoop jars

Posted by Roshan Naik <ro...@hortonworks.com>.
Currently, unfortunately, I don't think there is any such documentation.
A very general answer would be: normally this list would depend on the
source/sink/channel you are using.
I think it would be nice if the user manual listed these external
dependencies for each component.
I am not an expert on the HDFS sink, but I don't see why it would depend on
anything more than HDFS itself.
-roshan

On Mon, Nov 19, 2012 at 2:18 PM, Rahul Ravindran <ra...@yahoo.com> wrote:

> Are there other such libraries which will need to be downloaded? Is there
> a well-defined location for the Hadoop jar and any other jars that Flume
> may depend on?

Re: Hadoop jars

Posted by Rahul Ravindran <ra...@yahoo.com>.
Are there other such libraries which will need to be downloaded? Is there a well-defined location for the Hadoop jar and any other jars that Flume may depend on?


________________________________
 From: Mohit Anchlia <mo...@gmail.com>
To: "user@flume.apache.org" <us...@flume.apache.org> 
Cc: User-flume <us...@flume.apache.org> 
Sent: Monday, November 19, 2012 2:11 PM
Subject: Re: Hadoop jars
 

Generally, you need to install it separately.





Re: Hadoop jars

Posted by Mohit Anchlia <mo...@gmail.com>.
Generally, you need to install it separately.



On Nov 19, 2012, at 2:07 PM, Rahul Ravindran <ra...@yahoo.com> wrote:

> Hi,
> 
>  I was exploring how to deploy Flume into our cluster. The binary package available at http://flume.apache.org/download.html does not appear to include the Hadoop jar files which are needed by the HDFS sink. I would have expected them to be packaged in the lib folder. Is this deliberate? Is there a different binary I could download that includes the Hadoop jars?
> Thanks,
> ~Rahul.