Posted to common-user@hadoop.apache.org by Mahesh Balija <ba...@gmail.com> on 2013/01/17 11:03:24 UTC

How to copy log files from remote windows machine to Hadoop cluster

Hi,

      My log files are generated and saved on a Windows machine.
      Now I have to move those remote files to the Hadoop cluster (HDFS)
in either a synchronous or an asynchronous way.

      I have gone through Flume (various source types) but it was not helpful.
      Please suggest whether there is any popular way to do this.

Thanks,
Mahesh.B.
Calsoft Labs.

Re: How to copy log files from remote windows machine to Hadoop cluster

Posted by Mahesh Balija <ba...@gmail.com>.
Hi Mirko,

           Thanks for your reply. It works for me as well.
           I was able to mount the folder on the master node and configure
Flume so that it can either poll for logs in real time or retrieve them
periodically.

Thanks,
Mahesh Balija.
Calsoft Labs.
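
A minimal Flume agent configuration in this spirit might look as follows (a
sketch only: the agent name, the mount point /mnt/winlogs, and the HDFS path
are illustrative assumptions, not details taken from this thread):

    # spool-agent.properties -- spooling-directory source reading the mounted share
    agent1.sources = src1
    agent1.channels = ch1
    agent1.sinks = sink1

    # Spooling Directory Source: picks up files dropped into the folder.
    # Note: spooldir expects files to be complete and immutable once placed
    # there, so it suits rotated/finished logs rather than files still being written.
    agent1.sources.src1.type = spooldir
    agent1.sources.src1.spoolDir = /mnt/winlogs
    agent1.sources.src1.channels = ch1

    # File channel buffers events durably between source and sink
    agent1.channels.ch1.type = file

    # HDFS sink writes the buffered events into the cluster
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/logs/windows
    agent1.sinks.sink1.hdfs.fileType = DataStream
    agent1.sinks.sink1.channel = ch1

Such an agent would be started with: flume-ng agent -n agent1 -f spool-agent.properties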

On Thu, Jan 17, 2013 at 5:01 PM, Mirko Kämpf <mi...@gmail.com> wrote:

> One approach I used in my lab was the "data-gateway",
> which is a small Linux box that just mounts Windows shares;
> a single Flume node on the gateway then corresponds to the
> HDFS cluster. With tail or periodic log rotation you have control
> over all logfiles, depending on your use case. Either grab all
> incoming data and buffer it in Flume, or just move all new data
> to the cluster during the night. The gateway also contains Sqoop
> and an HDFS client if needed.
>
> Mirko
>
>
>
>
> 2013/1/17 Mahesh Balija <ba...@gmail.com>
>
>> That link talks about just installing Flume on a Windows machine (it does
>> NOT even have configs to push logs to the Hadoop cluster), but what if I
>> have to collect logs from various clients? Then I will end up installing it
>> on all clients.
>>
>> I have installed Flume successfully on Linux, but I have to configure it
>> in such a way that it gathers the log files from the remote Windows box.
>>
>> Harsh can you throw some light on this?
>>
>>
>> On Thu, Jan 17, 2013 at 4:21 PM, Mohammad Tariq <do...@gmail.com> wrote:
>>
>>> Yes, it is possible. I haven't tried the Windows + Flume + Hadoop combo
>>> personally, but it should work. You may find this link
>>> <http://mapredit.blogspot.in/2012/07/run-flume-13x-on-windows.html> useful.
>>> Alex has explained beautifully how to run Flume on a Windows box. If I
>>> get time I'll try to simulate your use case and let you know.
>>>
>>> BTW, could you please share with us whatever you have tried?
>>>
>>> Warm Regards,
>>> Tariq
>>> https://mtariq.jux.com/
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Thu, Jan 17, 2013 at 4:09 PM, Mahesh Balija <balijamahesh.mca@gmail.com> wrote:
>>>
>>>> I have studied Flume but I didn't find anything useful for my case.
>>>> My requirement is that there is a directory on a Windows machine in which
>>>> log files are generated and kept updated with new entries. I want a
>>>> tail-like mechanism (using the exec source) through which I can push the
>>>> latest updates into the cluster.
>>>> Or I could simply push to the cluster once a day using the
>>>> spooling-directory mechanism.
>>>>
>>>> Can somebody advise whether this is possible using Flume and, if so, the
>>>> configuration needed, specific to a remote Windows machine?
>>>>
>>>> On Thu, Jan 17, 2013 at 3:48 PM, Mirko Kämpf <mi...@gmail.com> wrote:
>>>>
>>>>> Give Flume (http://flume.apache.org/) a chance to collect your data.
>>>>>
>>>>> Mirko
>>>>>
>>>>>
>>>>>
>>>>> 2013/1/17 sirenfei <si...@gmail.com>
>>>>>
>>>>>> ftp auto upload?
>>>>>>
>>>>>>
>>>>>> 2013/1/17 Mahesh Balija <ba...@gmail.com>:
>>>>>> > the Hadoop cluster (HDFS) either in synchronous or asynchronous
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: How to copy log files from remote windows machine to Hadoop cluster

Posted by Mirko Kämpf <mi...@gmail.com>.
One approach I used in my lab was the "data-gateway",
which is a small Linux box that just mounts Windows shares;
a single Flume node on the gateway then corresponds to the
HDFS cluster. With tail or periodic log rotation you have control
over all logfiles, depending on your use case. Either grab all
incoming data and buffer it in Flume, or just move all new data
to the cluster during the night. The gateway also contains Sqoop
and an HDFS client if needed.

Mirko
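
A sketch of that gateway setup (the host name, share name, credentials, and
paths below are illustrative assumptions):

    # Mount the Windows share read-only on the Linux gateway (CIFS/SMB)
    sudo mkdir -p /mnt/winlogs
    sudo mount -t cifs //winhost/logs /mnt/winlogs -o username=loguser,password=secret,ro

    # Nightly bulk move into HDFS, as an alternative to a continuously
    # running Flume agent reading the mounted files
    hadoop fs -mkdir -p /logs/windows
    hadoop fs -put /mnt/winlogs/*.log /logs/windows/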



2013/1/17 Mahesh Balija <ba...@gmail.com>

> That link talks about just installing Flume on a Windows machine (it does
> NOT even have configs to push logs to the Hadoop cluster), but what if I
> have to collect logs from various clients? Then I will end up installing it
> on all clients.
>
> I have installed Flume successfully on Linux, but I have to configure it
> in such a way that it gathers the log files from the remote Windows box.
>
> Harsh can you throw some light on this?
>
>
> On Thu, Jan 17, 2013 at 4:21 PM, Mohammad Tariq <do...@gmail.com> wrote:
>
>> Yes, it is possible. I haven't tried the Windows + Flume + Hadoop combo
>> personally, but it should work. You may find this link
>> <http://mapredit.blogspot.in/2012/07/run-flume-13x-on-windows.html> useful.
>> Alex has explained beautifully how to run Flume on a Windows box. If I
>> get time I'll try to simulate your use case and let you know.
>>
>> BTW, could you please share with us whatever you have tried?
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Thu, Jan 17, 2013 at 4:09 PM, Mahesh Balija <balijamahesh.mca@gmail.com> wrote:
>>
>>> I have studied Flume but I didn't find anything useful for my case.
>>> My requirement is that there is a directory on a Windows machine in which
>>> log files are generated and kept updated with new entries. I want a
>>> tail-like mechanism (using the exec source) through which I can push the
>>> latest updates into the cluster.
>>> Or I could simply push to the cluster once a day using the
>>> spooling-directory mechanism.
>>>
>>> Can somebody advise whether this is possible using Flume and, if so, the
>>> configuration needed, specific to a remote Windows machine?
>>>
>>> On Thu, Jan 17, 2013 at 3:48 PM, Mirko Kämpf <mi...@gmail.com> wrote:
>>>
>>>> Give Flume (http://flume.apache.org/) a chance to collect your data.
>>>>
>>>> Mirko
>>>>
>>>>
>>>>
>>>> 2013/1/17 sirenfei <si...@gmail.com>
>>>>
>>>>> ftp auto upload?
>>>>>
>>>>>
>>>>> 2013/1/17 Mahesh Balija <ba...@gmail.com>:
>>>>> > the Hadoop cluster (HDFS) either in synchronous or asynchronous
>>>>>
>>>>
>>>>
>>>
>>
>

Re: How to copy log files from remote windows machine to Hadoop cluster

Posted by Mahesh Balija <ba...@gmail.com>.
That link talks about just installing Flume on a Windows machine (it does
NOT even have configs to push logs to the Hadoop cluster), but what if I
have to collect logs from various clients? Then I will end up installing it
on all clients.

I have installed Flume successfully on Linux, but I have to configure it
in such a way that it gathers the log files from the remote Windows box.

Harsh can you throw some light on this?
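
One common Flume pattern for the many-clients concern is a tiered setup:
each client-side agent forwards events over Avro to a single collector
agent, and only the collector needs the HDFS client. A sketch (not from
this thread; agent names, host names, ports, and paths are illustrative
assumptions):

    # collector.properties -- runs on the Linux collector/gateway box
    collector.sources = avroIn
    collector.channels = ch1
    collector.sinks = toHdfs

    # Avro source listens for events shipped by the client agents
    collector.sources.avroIn.type = avro
    collector.sources.avroIn.bind = 0.0.0.0
    collector.sources.avroIn.port = 4141
    collector.sources.avroIn.channels = ch1

    collector.channels.ch1.type = memory

    collector.sinks.toHdfs.type = hdfs
    collector.sinks.toHdfs.hdfs.path = hdfs://namenode:8020/logs/collected
    collector.sinks.toHdfs.hdfs.fileType = DataStream
    collector.sinks.toHdfs.channel = ch1

Each client agent would then point an Avro sink at the collector
(type = avro, hostname = <collector host>, port = 4141).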

On Thu, Jan 17, 2013 at 4:21 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Yes, it is possible. I haven't tried the Windows + Flume + Hadoop combo
> personally, but it should work. You may find this link
> <http://mapredit.blogspot.in/2012/07/run-flume-13x-on-windows.html> useful.
> Alex has explained beautifully how to run Flume on a Windows box. If I
> get time I'll try to simulate your use case and let you know.
>
> BTW, could you please share with us whatever you have tried?
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Thu, Jan 17, 2013 at 4:09 PM, Mahesh Balija <balijamahesh.mca@gmail.com> wrote:
>
>> I have studied Flume but I didn't find anything useful for my case.
>> My requirement is that there is a directory on a Windows machine in which
>> log files are generated and kept updated with new entries. I want a
>> tail-like mechanism (using the exec source) through which I can push the
>> latest updates into the cluster.
>> Or I could simply push to the cluster once a day using the
>> spooling-directory mechanism.
>>
>> Can somebody advise whether this is possible using Flume and, if so, the
>> configuration needed, specific to a remote Windows machine?
>>
>> On Thu, Jan 17, 2013 at 3:48 PM, Mirko Kämpf <mi...@gmail.com> wrote:
>>
>>> Give Flume (http://flume.apache.org/) a chance to collect your data.
>>>
>>> Mirko
>>>
>>>
>>>
>>> 2013/1/17 sirenfei <si...@gmail.com>
>>>
>>>> ftp auto upload?
>>>>
>>>>
>>>> 2013/1/17 Mahesh Balija <ba...@gmail.com>:
>>>> > the Hadoop cluster (HDFS) either in synchronous or asynchronous
>>>>
>>>
>>>
>>
>

Re: How to copy log files from remote windows machine to Hadoop cluster

Posted by Mohammad Tariq <do...@gmail.com>.
Yes, it is possible. I haven't tried the Windows + Flume + Hadoop combo
personally, but it should work. You may find this link
<http://mapredit.blogspot.in/2012/07/run-flume-13x-on-windows.html> useful.
Alex has explained beautifully how to run Flume on a Windows box. If I
get time I'll try to simulate your use case and let you know.

BTW, could you please share with us whatever you have tried?

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Jan 17, 2013 at 4:09 PM, Mahesh Balija <ba...@gmail.com> wrote:

> I have studied Flume but I didn't find anything useful for my case.
> My requirement is that there is a directory on a Windows machine in which
> log files are generated and kept updated with new entries. I want a
> tail-like mechanism (using the exec source) through which I can push the
> latest updates into the cluster.
> Or I could simply push to the cluster once a day using the
> spooling-directory mechanism.
>
> Can somebody advise whether this is possible using Flume and, if so, the
> configuration needed, specific to a remote Windows machine?
>
> On Thu, Jan 17, 2013 at 3:48 PM, Mirko Kämpf <mi...@gmail.com> wrote:
>
>> Give Flume (http://flume.apache.org/) a chance to collect your data.
>>
>> Mirko
>>
>>
>>
>> 2013/1/17 sirenfei <si...@gmail.com>
>>
>>> ftp auto upload?
>>>
>>>
>>> 2013/1/17 Mahesh Balija <ba...@gmail.com>:
>>> > the Hadoop cluster (HDFS) either in synchronous or asynchronous
>>>
>>
>>
>

Re: How to copy log files from remote windows machine to Hadoop cluster

Posted by Mahesh Balija <ba...@gmail.com>.
I have studied Flume but I didn't find anything useful for my case.
My requirement is that there is a directory on a Windows machine in which
log files are generated and kept updated with new entries. I want a
tail-like mechanism (using the exec source) through which I can push the
latest updates into the cluster.
Or I could simply push to the cluster once a day using the
spooling-directory mechanism.

Can somebody advise whether this is possible using Flume and, if so, the
configuration needed, specific to a remote Windows machine?
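
A sketch of the exec-source variant described above, assuming the Windows
directory is reachable from a Linux box (for example via a mounted share at
/mnt/winlogs); the agent name, file name, and HDFS path are illustrative:

    # tail-agent.properties -- exec source tailing one logfile
    tailer.sources = tailSrc
    tailer.channels = ch1
    tailer.sinks = toHdfs

    # Exec source runs tail -F and turns each new line into a Flume event.
    # Note: the exec source gives no delivery guarantee if the agent dies.
    tailer.sources.tailSrc.type = exec
    tailer.sources.tailSrc.command = tail -F /mnt/winlogs/app.log
    tailer.sources.tailSrc.channels = ch1

    tailer.channels.ch1.type = memory

    tailer.sinks.toHdfs.type = hdfs
    tailer.sinks.toHdfs.hdfs.path = hdfs://namenode:8020/logs/app
    tailer.sinks.toHdfs.hdfs.fileType = DataStream
    tailer.sinks.toHdfs.channel = ch1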

On Thu, Jan 17, 2013 at 3:48 PM, Mirko Kämpf <mi...@gmail.com> wrote:

> Give Flume (http://flume.apache.org/) a chance to collect your data.
>
> Mirko
>
>
>
> 2013/1/17 sirenfei <si...@gmail.com>
>
>> ftp auto upload?
>>
>>
>> 2013/1/17 Mahesh Balija <ba...@gmail.com>:
>> > the Hadoop cluster (HDFS) either in synchronous or asynchronous
>>
>
>

Re: How to copy log files from remote windows machine to Hadoop cluster

Posted by Mirko Kämpf <mi...@gmail.com>.
Give Flume (http://flume.apache.org/) a chance to collect your data.

Mirko


2013/1/17 sirenfei <si...@gmail.com>

> ftp auto upload?
>
>
> 2013/1/17 Mahesh Balija <ba...@gmail.com>:
> > the Hadoop cluster (HDFS) either in synchronous or asynchronou
>

Re: How to copy log files from remote windows machine to Hadoop cluster

Posted by sirenfei <si...@gmail.com>.
ftp auto upload?
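
If one went this scripted route instead of Flume, a cron-driven sketch
(the host, credentials, and paths are illustrative assumptions) could be:

    #!/bin/sh
    # Pull new log files from the Windows FTP server, then push them to HDFS.
    # wget --mirror re-fetches only files newer than the local copies.
    wget --mirror --no-host-directories -P /tmp/winlogs \
        ftp://loguser:secret@winhost/logs/
    hadoop fs -mkdir -p /logs/windows
    hadoop fs -put /tmp/winlogs/logs/*.log /logs/windows/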


2013/1/17 Mahesh Balija <ba...@gmail.com>:
> the Hadoop cluster (HDFS) either in synchronous or asynchronous
