Posted to common-user@hadoop.apache.org by Aditya Singh30 <Ad...@infosys.com> on 2011/10/05 12:13:10 UTC

Hadoop : Linux-Windows interface

Hi,

We want to use Hadoop and Hive to store and analyze some web servers' log files. The servers run on Windows, and as the Hadoop documentation notes, Windows is supported only as a development platform. Is there a way to run the Hadoop server (the NameNode) and the cluster nodes on Linux, and have an interface through which we can send files and run analysis queries from the web servers' Windows environment?
I would really appreciate it if you could point me in the right direction.


Regards,
Aditya Singh
Infosys. India



Re: Hadoop : Linux-Windows interface

Posted by "Periya.Data" <pe...@gmail.com>.
Hi Aditya,
    You may want to look into Flume. It is designed to collect
unstructured data from disparate sources and store it in HDFS (or
directly in Hive tables). I do not know whether Flume interoperates with
Windows systems (you might be able to make it work under Cygwin).

http://archive.cloudera.com/cdh/3/flume/Cookbook/
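
As a rough illustration only (this uses the newer Flume NG properties style
rather than the Flume OG syntax the CDH3 cookbook above describes; agent
names, directories, and the NameNode address are all placeholders), a
Linux-side agent that watches a spool directory and ships files into HDFS
could be set up like this:

# write a minimal Flume NG agent config (all names/paths are placeholders)
cat > /etc/flume/conf/weblog.conf <<'EOF'
a1.sources  = weblogs
a1.channels = c1
a1.sinks    = tohdfs

# files dropped here (e.g. by an FTP transfer from the Windows servers)
# are picked up and shipped to HDFS
a1.sources.weblogs.type     = spooldir
a1.sources.weblogs.spoolDir = /var/flume/incoming
a1.sources.weblogs.channels = c1

a1.channels.c1.type = memory

a1.sinks.tohdfs.type          = hdfs
a1.sinks.tohdfs.channel       = c1
a1.sinks.tohdfs.hdfs.path     = hdfs://namenode:8020/flume/weblogs
a1.sinks.tohdfs.hdfs.fileType = DataStream
EOF

# start the agent
flume-ng agent -n a1 -c /etc/flume/conf -f /etc/flume/conf/weblog.conf

The Windows side would still need some mechanism (FTP, a network share,
etc.) to drop the log files into that spool directory.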


-PD.


Re: Hadoop : Linux-Windows interface

Posted by Bejoy KS <be...@gmail.com>.
Hi Aditya
         You can definitely do it. As a very basic solution, you can FTP the
contents to the LFS (local/Linux file system) and then do a copyFromLocal
into HDFS. Create a Hive table with appropriate regex support and load the
data in; Hive has classes that effectively support parsing and loading
Apache log files into Hive tables.
For the entire data transfer you just need to write a shell script. The log
analysis won't be real time, right? So you can schedule the job with a
scheduler like cron, or, if it needs to run in conjunction with other Hadoop
jobs, use one of the workflow managers in the Hadoop ecosystem.
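
For illustration only, a minimal sketch of such a script (host names,
credentials, file names, paths, and the regex are placeholders, and the
RegexSerDe may need an ADD JAR of the hive-contrib jar depending on your
Hive version):

#!/bin/sh
# Pull a day's log from the Windows web server, push it into HDFS,
# and load it into a Hive table. All names below are placeholders.

LOCAL_DIR=/data/weblogs/incoming        # landing directory on the Linux box
STAGING=/user/hadoop/weblogs/staging    # HDFS staging directory
DAY=$(date +%Y-%m-%d)

# 1. Fetch the log over FTP (assumes an FTP service on the Windows server).
mkdir -p "$LOCAL_DIR"
wget -q --ftp-user=loguser --ftp-password=secret -nd -P "$LOCAL_DIR" \
     "ftp://webserver01/logs/access-$DAY.log"

# 2. Copy from the local Linux file system into HDFS.
hadoop fs -test -d "$STAGING" || hadoop fs -mkdir "$STAGING"
hadoop fs -copyFromLocal "$LOCAL_DIR/access-$DAY.log" "$STAGING/"

# 3. Create the table once (Apache combined log format parsed by RegexSerDe)
#    and load the staged files into it.
cat > /tmp/load_weblogs.q <<'HQL'
CREATE TABLE IF NOT EXISTS weblogs (
  host STRING, identity STRING, remote_user STRING, log_time STRING,
  request STRING, status STRING, size STRING,
  referer STRING, agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = '^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] "([^"]*)" (\\d+) (\\S+) "([^"]*)" "([^"]*)"$',
  'output.format.string' = '%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s'
)
STORED AS TEXTFILE;

-- keep this path in sync with $STAGING above
LOAD DATA INPATH '/user/hadoop/weblogs/staging' INTO TABLE weblogs;
HQL
hive -f /tmp/load_weblogs.q

Scheduled from cron (e.g. "30 0 * * * /opt/scripts/load_weblogs.sh"), that
covers the non-real-time case; a workflow manager such as Oozie is the usual
choice when the load has to be chained with other Hadoop jobs.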

