Posted to common-user@hadoop.apache.org by Lokesh Basu <lo...@gmail.com> on 2013/05/24 04:39:55 UTC

Where to begin from??

Hi all,

I'm a computer science undergraduate and have recently started to explore
Hadoop. I find it very interesting and want to get involved as both a
contributor and a developer for this open source project. I have been going
through many textbooks related to Hadoop and HDFS, but I still find it
difficult to see where a beginner should start before writing his first
line of code as a contributor or developer.

Also, please tell me what I compulsorily need to know before I dive into the
depths of these things.

Thanking you all in anticipation.




-- 

*Lokesh Chandra Basu*
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India(GMT +5hr 30min)
+91-8267805498

Re: Where to begin from??

Posted by sc...@gmail.com.
Did you try using the MultipleOutputs class?

Sent from Windows Mail

From: Sanjay Subramanian
Sent: Friday, 24 May 2013 11:13 PM
To: user@hadoop.apache.org

Hey guys

Is there a way to dynamically change the input dir and output dir?

I have the following CONSTANT directories in HDFS:
/path/to/input/9999-99-99 (empty directory)
/path/to/output/9999-99-99 (empty directory)

A new directory with yesterday's date, like /path/to/input/2013-05-23, gets created every day.

I set the job params:

mapreduce.input.fileinputformat.inputdir=/path/to/input/9999-99-99
mapreduce.output.fileoutputformat.outputdir=/path/to/output/9999-99-99

But in my Mapper I thought I could sneak in this code… but it does not work:

    protected void setup(Context context)
            throws IOException, InterruptedException {
        // Read the placeholder paths that were set on the job.
        org.apache.hadoop.conf.Configuration hadoopConf = ((JobContext) context).getConfiguration();
        String inputDir = hadoopConf.get("mapreduce.input.fileinputformat.inputdir");
        String outputDir = hadoopConf.get("mapreduce.output.fileoutputformat.outputdir");

        // Swap the 9999-99-99 placeholder for yesterday's date.
        String yesterdaysDate = DateUtils.getDayYYYYMMDD("-", -1);
        String inputDirUseThis = inputDir.replaceAll("9999-99-99", yesterdaysDate);
        String outputDirUseThis = outputDir.replaceAll("9999-99-99", yesterdaysDate);

        hadoopConf.set("mapreduce.input.fileinputformat.inputdir", inputDirUseThis);
        hadoopConf.set("mapreduce.output.fileoutputformat.outputdir", outputDirUseThis);
    }

Thanks

sanjay

CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.

Re: Where to begin from??

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Hey guys

Is there a way to dynamically change the input dir and output dir?

I have the following CONSTANT directories in HDFS:

  *   /path/to/input/9999-99-99 (empty directory)
  *   /path/to/output/9999-99-99 (empty directory)

A new directory with yesterday's date, like /path/to/input/2013-05-23, gets created every day.

I set the job params:
mapreduce.input.fileinputformat.inputdir=/path/to/input/9999-99-99
mapreduce.output.fileoutputformat.outputdir=/path/to/output/9999-99-99

But in my Mapper I thought I could sneak in this code… but it does not work:

    protected void setup(Context context)
            throws IOException, InterruptedException {
        // Read the placeholder paths that were set on the job.
        org.apache.hadoop.conf.Configuration hadoopConf = ((JobContext) context).getConfiguration();
        String inputDir = hadoopConf.get("mapreduce.input.fileinputformat.inputdir");
        String outputDir = hadoopConf.get("mapreduce.output.fileoutputformat.outputdir");

        // Swap the 9999-99-99 placeholder for yesterday's date.
        String yesterdaysDate = DateUtils.getDayYYYYMMDD("-", -1);
        String inputDirUseThis = inputDir.replaceAll("9999-99-99", yesterdaysDate);
        String outputDirUseThis = outputDir.replaceAll("9999-99-99", yesterdaysDate);

        hadoopConf.set("mapreduce.input.fileinputformat.inputdir", inputDirUseThis);
        hadoopConf.set("mapreduce.output.fileoutputformat.outputdir", outputDirUseThis);
    }

Thanks

sanjay
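
A likely reason the setup() change has no effect is that the input splits and the output location are fixed when the job is submitted, and each task only works on its own copy of the configuration. A minimal sketch of resolving the placeholder in the driver instead, before submission, might look like the following. The class name DatedPaths and its resolve method are hypothetical, and java.time stands in for the DateUtils helper from the post. The Hadoop calls appear only in comments, so the block runs without Hadoop on the classpath.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not from the thread): resolves the 9999-99-99
// placeholder in the driver, before the job is submitted.
class DatedPaths {
    static final String PLACEHOLDER = "9999-99-99";

    // Replace the placeholder with the day before `today`, formatted as yyyy-MM-dd.
    static String resolve(String templatePath, LocalDate today) {
        String yesterday = today.minusDays(1).format(DateTimeFormatter.ISO_LOCAL_DATE);
        // String.replace performs a literal (non-regex) replacement.
        return templatePath.replace(PLACEHOLDER, yesterday);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        String inputDir = resolve("/path/to/input/9999-99-99", today);
        String outputDir = resolve("/path/to/output/9999-99-99", today);

        // In a real driver (assumption: the new mapreduce API is in use), the
        // resolved paths would be set on the Job before it is submitted:
        //   FileInputFormat.addInputPath(job, new Path(inputDir));
        //   FileOutputFormat.setOutputPath(job, new Path(outputDir));
        System.out.println(inputDir);
        System.out.println(outputDir);
    }
}
```

For example, if today were 2013-05-24, resolve("/path/to/input/9999-99-99", today) would give /path/to/input/2013-05-23. Passing the date in as a parameter rather than reading the clock inside resolve keeps the helper easy to test.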



Re: Where to begin from??

Posted by shashwat shriparv <dw...@gmail.com>.
On Fri, May 24, 2013 at 10:09 AM, Lokesh Basu <lo...@gmail.com> wrote:

> I accept that I don't know much about the real world problem and have to
> begin from scratch to get some insight of what is actually driving these
> technologies.
>


Dear Lokesh,

It's best to learn by doing, and if you face a problem, start asking
:) Welcome to the Hadoop world.

*Thanks & Regards*

∞
Shashwat Shriparv

Re: Where to begin from??

Posted by Lokesh Basu <lo...@gmail.com>.
First of all thank you all.

I accept that I don't know much about real-world problems and have to
begin from scratch to get some insight into what is actually driving these
technologies.

to Chris :

I will start by finding and implementing some real-world problem
and seeing how these things are implemented in the first place before I try to
do something out of the box.

to Sanjay :

Thank you very much for the sample problems to look into before going into
much detail about it.

to Raj :

Thank you for the appreciation and support for my attempt to learn and
implement something which is new to me. The things that you mentioned, like
Linux, Java and SQL, are very familiar to me, and in fact I have some
implementation experience with SQL, PHP, Python and C++. I have made some
online event websites and a command-based search engine for small-scale
search (without anything as complex as PageRank). I also have some
experience with version control systems, as I was trying to qualify for GSoC
2012 (AbiWord, but was unsuccessful).

Right now I just need something like a guide that lets me start from the
beginning and learn as much as I can, because I'm willing to give all
the time I have to learn more and more about these things.

Thanking you all for your kind replies and support.


*Lokesh Chandra Basu*
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India(GMT +5hr 30min)
+91-8267805498



On Fri, May 24, 2013 at 9:35 AM, Raj Hadoop <ha...@yahoo.com> wrote:

>
> Hi,
>
> With all due respect to the senior members of this site, I wanted to
> first congratulate Lokesh for his interest in Hadoop. I want to know how
> many fresh graduates are interested in this technology. I guess not many.
> So we have to welcome Lokesh to the Hadoop world.
>
> I agree with the seniors... It is good and important to know the real
> world problems...
>
> But coming to your question - as per my knowledge - if u want to learn /
> shine in Hadoop - know the following compulsorily.
> 1) Linux
> 2) Java
> 3) SQL
>
> Seniors may correct me or add to or modify the above list.
>
> Thanks,
> Raj
>  ------------------------------
>  *From:* Sanjay Subramanian <Sa...@wizecommerce.com>
> *To:* "user@hadoop.apache.org" <us...@hadoop.apache.org>; "chris@embree.us"
> <ch...@embree.us>
> *Sent:* Thursday, May 23, 2013 11:03 PM
>
> *Subject:* Re: Where to begin from??
>
>  I agree with Chris…don't worry about what the technology is called
> Hadoop , Big table, Lucene, Hive….Model the problem and see what the
> solution could be….that’s very important
>
>  And Lokesh please don't mind…we are writing to u perhaps stuff that u
> don't want to hear but it's an important real perspective
>
>  To illustrate what I mean let me give u a few problems to think about
> and see how u would solve them….
>
>  1. Before Microsoft took over Skype at least this feature used to be
> there and the feature is like this……u type the name of a person and it used
> to come back with some search results in milliseconds often searching close
> to a billion names…….How would u design such a search architecture ?
>
>  2.  In 2012, say 50 million users (cookie based) searched Macys.com on a
> SALES weekend and say 20,000 bought $100 shoes. Now this year 2013
> on that SALES weekend 60 million users (cookie based) are buying on the
> website….You want to give a 25% extra reward to only those cookies that
> were from last year…So u are looking for an intersection set of possibly
> 20,000 cookies in two sets - 50 million and 60 million…..How would u solve
> this problem within milliseconds?
>
>  3. Last my favorite….The Postal Services department wants to think of
> new business ideas to avoid bankruptcy…One idea I have is they have zillion
> small delivery vans that go to each street in the country….Say I lease out
> the space to BIG wireless phone providers and promise them that I will
> mount wireless signal strength measurement systems on these vans and I will
> provide them data 3  times a day…how will u devise a solution to analyse
> and store data ?
>
>  I am sure if u look around in India as well u will see a lot of
> situations where u want to solve a problem….
>
>  As Chris says , think about the problem u want to solve, then model the
> solutions and pick the best one…
>
>  On the flip side….I can tell u it will still be a few years till many
> Banks and Stock trading houses will believe in Cassandra and Hbase for OLTP
> because that data is critical……If your timeline in Facebook does not show a
> photo, it's possibly OK, but if your 1 million deposit in a bank does not
> show up for days or suddenly vanishes - u r possibly not going to take that
> lightly…..
>
>  Ok enough RAMBLING….
>
>  Good luck
>
>  sanjay
>
>
>
>   From: Chris Embree <ce...@gmail.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>, "
> chris@embree.us" <ch...@embree.us>
> Date: Thursday, May 23, 2013 7:47 PM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Re: Where to begin from??
>
>   I'll be chastised and have mean things said about me for this.
>
>  Get some experience in IT before you start looking at Hadoop.  My
> reasoning is this:  If you don't know how to develop real applications in a
> Non-Hadoop world, you'll struggle a lot to develop with Hadoop.
>
>  Asking what "things you need to know in compulsory" is like saying you
> want to "learn computers" -- totally worthless!  Find a problem to solve
> and seek to learn the tools you need to solve your problem.  Otherwise,
> your learning is un-applied and somewhat useless.
>
>  Picture a recent acting school graduate asking how to direct the next
> Star Wars movie.  It's almost like that.
>
>
> On Thu, May 23, 2013 at 10:39 PM, Lokesh Basu <lo...@gmail.com> wrote:
>
> Hi all,
>
>  I'm a computer science undergraduate and has recently started to explore
> about Hadoop. I find it very interesting and want to get involved both as
> contributor and developer for this open source project. I have been going
> through many text book related to Hadoop and HDFS but still I find it very
> difficult as to where should a beginner start from before writing his first
> line of code as contributer or developer.
>
>  Also please tell me what are the things I compulsorily need to know
> before I dive into depth of these things.
>
>  Thanking you all in anticipation.
>
>
>
>
> --
>
> *Lokesh Chandra Basu*
>  B. Tech
> Computer Science and Engineering
>  Indian Institute of Technology, Roorkee
> India(GMT +5hr 30min)
> +91-8267805498
>
>
>
>

Re: Where to begin from??

Posted by Lokesh Basu <lo...@gmail.com>.
First of all thank you all.

I accept that I don't know much about the real world problem and have to
begin from scratch to get some insight of what is actually driving these
technologies.

to Chris :

I will start working on finding and implementing some real world problem
and see how these things are implemented in the first place before I try to
do something out of the box.

to Sanjay :

Thank you very much for the sample problems to look into before going into
much detail about it.

to Raj :

Thank you for the appreciation and support for my attempt to learn and
implement something which is new to me. The things that you mentioned like
Linux, Java and Sql are very much familiar to me and in fact I have some
implementation experience with Sql, php, python and c++. I have made some
online event websites and made a command based Search Engine for small
scale search (without something too complex as PageRank). I also have some
experience with version control system as I was trying to qualify for GSoC
2012 (AbiWord, but was unsuccessful).

Right now I just need something like a guide that can allow me to move from
start and let me learn as much as I can, because I I'm willing to give all
the time I have to learn more and more about these things.

Thanking you all for your kind replies and support.


*Lokesh Chandra Basu*
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India(GMT +5hr 30min)
+91-8267805498



On Fri, May 24, 2013 at 9:35 AM, Raj Hadoop <ha...@yahoo.com> wrote:

>
> Hi,
>
> With all due to respect to the senior members of this site, I wanted to
> first congratulate Lokesh for his interest in Hadoop. I want to know how
> many fresh graduates are interested in this technology. I guess not many.
> So we have to welcome Lokesh to Hadoop world.
>
> I agree to the seniors.......It is good and important to know the real
> world problems ....
>
> But coming to your question - as per my knowledge - if u want to learn /
> shine in Hadoop - know the following compulsorily.
> 1) Linux
> 2) Java
> 3) Sql
>
> Seniors may correct me or add or modify to the following list.
>
> Thanks,
> Raj
>  ------------------------------
>  *From:* Sanjay Subramanian <Sa...@wizecommerce.com>
> *To:* "user@hadoop.apache.org" <us...@hadoop.apache.org>; "chris@embree.us"
> <ch...@embree.us>
> *Sent:* Thursday, May 23, 2013 11:03 PM
>
> *Subject:* Re: Where to begin from??
>
>  I agree with Chris…don't worry about what the technology is called
> Hadoop , Big table, Lucene, Hive….Model the problem and see what the
> solution could be….that’s very important
>
>  And Lokesh please don't mind…we are writing to u perhaps stuff that u
> don't want to hear but its an important real perspective
>
>  To illustrate what I mean let me give u a few problems to think about
> and see how u would solve them….
>
>  1. Before Microsoft took over Skype at least this feature used to be
> there and the feature is like this……u type the name of a person and it used
> to come back with some search results in milliseconds often searching close
> to a billion names…….How would u design such a search architecture ?
>
>  2.  In 2012, say 50 million users (cookie based) searched Macys.com on a
> SALES weekend and say 20,000 bought $100 dollar shoes. Now this year 2013
> on that SALES weekend 60 million users (cookie based) are buying on the
> website….You want to give a 25% extra reward to only those cookies that
> were from last year…So u are looking for an intersection set of possibly
> 20,000 cookies in two sets - 50million and 60 million…..How would u solve
> this problem within milli seconds  ?
>
>  3. Last my favorite….The Postal Services department wants to think of
> new business ideas to avoid bankruptcy…One idea I have is they have zillion
> small delivery vans that go to each street in the country….Say I lease out
> the space to BIG wireless phone providers and promise them them that I will
> mount wireless signal strength measurement systems on these vans and I will
> provide them data 3  times a day…how will u devise a solution to analyse
> and store data ?
>
>  I am sure if u look around in India as well u will see a lot of
> situations where u want to solve a problem….
>
>  As Chris says , think about the problem u want to solve, then model the
> solutions and pick the best one…
>
>  On the flip side….I can tell u it will still be a few years till many
> Banks and Stock trading houses will believe in Cassandra and HBase for OLTP
> because that data is critical……If your timeline in Facebook does not show a
> photo, it's possibly OK but if your 1 million deposit in a bank does not
> show up for days or suddenly vanishes - u r possibly not going to take that
> lightly…..
>
>  Ok enough RAMBLING….
>
>  Good luck
>
>  sanjay
>
>
>
>   From: Chris Embree <ce...@gmail.com>
> Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>, "
> chris@embree.us" <ch...@embree.us>
> Date: Thursday, May 23, 2013 7:47 PM
> To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
> Subject: Re: Where to begin from??
>
>   I'll be chastised and have mean things said about me for this.
>
>  Get some experience in IT before you start looking at Hadoop.  My
> reasoning is this:  If you don't know how to develop real applications in a
> Non-Hadoop world, you'll struggle a lot to develop with Hadoop.
>
>  Asking what "things you need to know in compulsory" is like saying you
> want to "learn computers" -- totally worthless!  Find a problem to solve
> and seek to learn the tools you need to solve your problem.  Otherwise,
> your learning is un-applied and somewhat useless.
>
>  Picture a recent acting school graduate asking how to direct the next Star Wars
> movie.  It's almost like that.
>
>
> On Thu, May 23, 2013 at 10:39 PM, Lokesh Basu <lo...@gmail.com> wrote:
>
> Hi all,
>
>  I'm a computer science undergraduate and have recently started to explore
> Hadoop. I find it very interesting and want to get involved both as
> contributor and developer for this open source project. I have been going
> through many textbooks related to Hadoop and HDFS but still I find it very
> difficult to know where a beginner should start before writing his first
> line of code as contributor or developer.
>
>  Also please tell me what are the things I compulsorily need to know
> before I dive into depth of these things.
>
>  Thanking you all in anticipation.
>
>
>
>
> --
>
> *Lokesh Chandra Basu*
>  B. Tech
> Computer Science and Engineering
>  Indian Institute of Technology, Roorkee
> India(GMT +5hr 30min)
> +91-8267805498
>
>
>
>
> CONFIDENTIALITY NOTICE
> ======================
> This email message and any attachments are for the exclusive use of the
> intended recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact the
> sender by reply email and destroy all copies of the original message along
> with any attachments, from your computer system. If you are the intended
> recipient, please be advised that the content of this message is subject to
> access, review and disclosure by the sender's Email System Administrator.
>
>
>

Re: Where to begin from??

Posted by Lokesh Basu <lo...@gmail.com>.
First of all, thank you all.

I accept that I don't know much about real-world problems and have to
begin from scratch to get some insight into what is actually driving these
technologies.

to Chris :

I will start by finding a real-world problem and implementing a solution
to it, and see how these things are implemented in the first place, before I
try to do something out of the box.

to Sanjay :

Thank you very much for the sample problems to look into before going into
much detail about it.

to Raj :

Thank you for the appreciation and support for my attempt to learn and
implement something which is new to me. The things that you mentioned, like
Linux, Java and SQL, are very familiar to me, and in fact I have some
implementation experience with SQL, PHP, Python and C++. I have made some
online event websites and a command-based search engine for small-scale
search (without anything as complex as PageRank). I also have some
experience with version control systems, as I was trying to qualify for GSoC
2012 (AbiWord, but was unsuccessful).

Right now I just need something like a guide that lets me start from the
beginning and learn as much as I can, because I'm willing to give all
the time I have to learn more and more about these things.

Thanking you all for your kind replies and support.


*Lokesh Chandra Basu*
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India(GMT +5hr 30min)
+91-8267805498



On Fri, May 24, 2013 at 9:35 AM, Raj Hadoop <ha...@yahoo.com> wrote:

>
> Hi,
>
> With all due respect to the senior members of this site, I wanted to
> first congratulate Lokesh for his interest in Hadoop. I want to know how
> many fresh graduates are interested in this technology. I guess not many.
> So we have to welcome Lokesh to the Hadoop world.
>
> I agree with the seniors.......It is good and important to know the real
> world problems ....
>
> But coming to your question - as per my knowledge - if u want to learn /
> shine in Hadoop - know the following compulsorily.
> 1) Linux
> 2) Java
> 3) SQL
>
> Seniors may correct me, or add to or modify the list above.
>
> Thanks,
> Raj

Re: Where to begin from??

Posted by Raj Hadoop <ha...@yahoo.com>.

Hi,

With all due respect to the senior members of this site, I wanted to first congratulate Lokesh for his interest in Hadoop. I want to know how many fresh graduates are interested in this technology. I guess not many. So we have to welcome Lokesh to the Hadoop world.

I agree with the seniors.......It is good and important to know the real world problems ....

But coming to your question - as per my knowledge - if u want to learn / shine in Hadoop - know the following compulsorily.
1) Linux
2) Java
3) SQL


Seniors may correct me, or add to or modify the list above.


Thanks,
Raj


________________________________
 From: Sanjay Subramanian <Sa...@wizecommerce.com>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>; "chris@embree.us" <ch...@embree.us> 
Sent: Thursday, May 23, 2013 11:03 PM
Subject: Re: Where to begin from??
 


I agree with Chris…don't worry about what the technology is called Hadoop , Big table, Lucene, Hive….Model the problem and see what the solution could be….that’s very important 

And Lokesh please don't mind…we are writing to u perhaps stuff that u don't want to hear but it's an important real perspective

To illustrate what I mean let me give u a few problems to think about and see how u would solve them….

1. Before Microsoft took over Skype at least this feature used to be there and the feature is like this……u type the name of a person and it used to come back with some search results in milliseconds often searching close to a billion names…….How would u design such a search architecture ?

2.  In 2012, say 50 million users (cookie based) searched Macys.com on a SALES weekend and say 20,000 bought $100 shoes. Now this year 2013 on that SALES weekend 60 million users (cookie based) are buying on the website….You want to give a 25% extra reward to only those cookies that were from last year…So u are looking for an intersection set of possibly 20,000 cookies in two sets - 50 million and 60 million…..How would u solve this problem within milliseconds?

3. Last, my favorite….The Postal Services department wants to think of new business ideas to avoid bankruptcy…One idea I have is they have a zillion small delivery vans that go to each street in the country….Say I lease out the space to BIG wireless phone providers and promise them that I will mount wireless signal strength measurement systems on these vans and I will provide them data 3 times a day…how will u devise a solution to analyse and store data?

I am sure if u look around in India as well u will see a lot of situations where u want to solve a problem….

As Chris says , think about the problem u want to solve, then model the solutions and pick the best one…

On the flip side….I can tell u it will still be a few years till many Banks and Stock trading houses will believe in Cassandra and HBase for OLTP because that data is critical……If your timeline in Facebook does not show a photo, it's possibly OK but if your 1 million deposit in a bank does not show up for days or suddenly vanishes - u r possibly not going to take that lightly…..

Ok enough RAMBLING….

Good luck

sanjay
  

From: Chris Embree <ce...@gmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>, "chris@embree.us" <ch...@embree.us>
Date: Thursday, May 23, 2013 7:47 PM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Re: Where to begin from??


I'll be chastised and have mean things said about me for this. 

Get some experience in IT before you start looking at Hadoop.  My reasoning is this:  If you don't know how to develop real applications in a Non-Hadoop world, you'll struggle a lot to develop with Hadoop.

Asking what "things you need to know in compulsory" is like saying you want to "learn computers" -- totally worthless!  Find a problem to solve and seek to learn the tools you need to solve your problem.  Otherwise, your learning is un-applied and somewhat useless. 

Picture a recent acting school graduate asking how to direct the next Star Wars movie.  It's almost like that.



On Thu, May 23, 2013 at 10:39 PM, Lokesh Basu <lo...@gmail.com> wrote:

>Hi all,
>
>
>I'm a computer science undergraduate and have recently started to explore Hadoop. I find it very interesting and want to get involved both as contributor and developer for this open source project. I have been going through many textbooks related to Hadoop and HDFS but still I find it very difficult to know where a beginner should start before writing his first line of code as contributor or developer.
>
>
>Also please tell me what are the things I compulsorily need to know before I dive into depth of these things.
>
>
>Thanking you all in anticipation. 
>
>
>
>
>-- 
>
>
>Lokesh Chandra Basu
>
>B. Tech
>Computer Science and Engineering
>
>Indian Institute of Technology, Roorkee
>India(GMT +5hr 30min)
>+91-8267805498
>
>



Re: Where to begin from??

Posted by Raj Hadoop <ha...@yahoo.com>.

Hi,

With all due to respect to the senior members of this site, I wanted to first congratulate Lokesh for his interest in Hadoop. I want to know how many fresh graduates are interested in this technology. I guess not many. So we have to welcome Lokesh to Hadoop world.

I agree to the seniors.......It is good and important to know the real world problems ....

But coming to your question - as per my knowledge - if u want to learn / shine in Hadoop - know the following compulsorily.
1) Linux
2) Java
3) Sql


Seniors may correct me or add or modify to the following list.


Thanks,
Raj


________________________________
 From: Sanjay Subramanian <Sa...@wizecommerce.com>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>; "chris@embree.us" <ch...@embree.us> 
Sent: Thursday, May 23, 2013 11:03 PM
Subject: Re: Where to begin from??
 


I agree with Chris…don't worry about what the technology is called Hadoop , Big table, Lucene, Hive….Model the problem and see what the solution could be….that’s very important 

And Lokesh please don't mind…we are writing to u perhaps stuff that u don't want to hear but its an important real perspective

To illustrate what I mean let me give u a few problems to think about and see how u would solve them….

1. Before Microsoft took over Skype at least this feature used to be there and the feature is like this……u type the name of a person and it used to come back with some search results in milliseconds often searching close to a billion names…….How would u design such a search architecture ?

2.  In 2012, say 50 million users (cookie based) searched Macys.com on a SALES weekend and say 20,000 bought $100 dollar shoes. Now this year 2013 on that SALES weekend 60 million users (cookie based) are buying on the website….You want to give a 25% extra reward to only those cookies that were from last year…So u are looking for an intersection set of possibly 20,000 cookies in two sets - 50million and 60 million…..How would u solve this problem within milli seconds  ?

3. Last my favorite….The Postal Services department wants to think of new business ideas to avoid bankruptcy…One idea I have is they have zillion small delivery vans that go to each street in the country….Say I lease out the space to BIG wireless phone providers and promise them them that I will mount wireless signal strength measurement systems on these vans and I will provide them data 3  times a day…how will u devise a solution to analyse and store data ?

I am sure if u look around in India as well u will see a lot of situations where u want to solve a problem….

As Chris says , think about the problem u want to solve, then model the solutions and pick the best one…

On the flip side….I can tell u it will still be a few years till many Banks and Stock trading houses will believe in Cassandra and Hbase for OLTP because that data is critical……If your timeline in Facebook does not show a photo , its possibly OK but if your 1 million deposit I a bank does not show up for days or suddenly vanishes - u r possibly not going to take that lightly…..

Ok enough RAMBLING….

Good luck

sanjay
  

From: Chris Embree <ce...@gmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>, "chris@embree.us" <ch...@embree.us>
Date: Thursday, May 23, 2013 7:47 PM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Re: Where to begin from??


I'll be chastised and have mean things said about me for this. 

Get some experience in IT before you start looking at Hadoop.  My reasoning is this:  If you don't know how to develop real applications in a Non-Hadoop world, you'll struggle a lot to develop with Hadoop.

Asking what "things you need to know in compulsory" is like saying you want to "learn computers" -- totally worthless!  Find a problem to solve and seek to learn the tools you need to solve your problem.  Otherwise, your learning is un-applied and somewhat useless. 

Picture a recent acting school graduate how to direct the next Star Wars movie.  It's almost like that.



On Thu, May 23, 2013 at 10:39 PM, Lokesh Basu <lo...@gmail.com> wrote:

Hi all, 
>
>
>I'm a computer science undergraduate and has recently started to explore about Hadoop. I find it very interesting and want to get involved both as contributor and developer for this open source project. I have been going through many text book related to Hadoop and HDFS but still I find it very difficult as to where should a beginner start from before writing his first line of code as contributer or developer.
>
>
Also please tell me what are the things I compulsorily need to know before I dive into depth of these things.  
>
>
>Thanking you all in anticipation. 
>
>
>
>
>-- 
>
>
>Lokesh Chandra Basu
>
>B. Tech
>Computer Science and Engineering
>
>Indian Institute of Technology, Roorkee
>India(GMT +5hr 30min)
>+91-8267805498
>
>


CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient,
 please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review
 and disclosure by the sender's Email System Administrator.

Re: Where to begin from??

Posted by Raj Hadoop <ha...@yahoo.com>.

Hi,

With all due to respect to the senior members of this site, I wanted to first congratulate Lokesh for his interest in Hadoop. I want to know how many fresh graduates are interested in this technology. I guess not many. So we have to welcome Lokesh to Hadoop world.

I agree to the seniors.......It is good and important to know the real world problems ....

But coming to your question - as per my knowledge - if u want to learn / shine in Hadoop - know the following compulsorily.
1) Linux
2) Java
3) Sql


Seniors may correct me or add or modify to the following list.


Thanks,
Raj


________________________________
 From: Sanjay Subramanian <Sa...@wizecommerce.com>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>; "chris@embree.us" <ch...@embree.us> 
Sent: Thursday, May 23, 2013 11:03 PM
Subject: Re: Where to begin from??
 


I agree with Chris…don't worry about what the technology is called Hadoop , Big table, Lucene, Hive….Model the problem and see what the solution could be….that’s very important 

And Lokesh please don't mind…we are writing to u perhaps stuff that u don't want to hear but its an important real perspective

To illustrate what I mean let me give u a few problems to think about and see how u would solve them….

1. Before Microsoft took over Skype at least this feature used to be there and the feature is like this……u type the name of a person and it used to come back with some search results in milliseconds often searching close to a billion names…….How would u design such a search architecture ?

2.  In 2012, say 50 million users (cookie based) searched Macys.com on a SALES weekend and say 20,000 bought $100 dollar shoes. Now this year 2013 on that SALES weekend 60 million users (cookie based) are buying on the website….You want to give a 25% extra reward to only those cookies that were from last year…So u are looking for an intersection set of possibly 20,000 cookies in two sets - 50million and 60 million…..How would u solve this problem within milli seconds  ?

3. Last my favorite….The Postal Services department wants to think of new business ideas to avoid bankruptcy…One idea I have is they have zillion small delivery vans that go to each street in the country….Say I lease out the space to BIG wireless phone providers and promise them them that I will mount wireless signal strength measurement systems on these vans and I will provide them data 3  times a day…how will u devise a solution to analyse and store data ?

I am sure if u look around in India as well u will see a lot of situations where u want to solve a problem….

As Chris says , think about the problem u want to solve, then model the solutions and pick the best one…

On the flip side….I can tell u it will still be a few years till many Banks and Stock trading houses will believe in Cassandra and Hbase for OLTP because that data is critical……If your timeline in Facebook does not show a photo , its possibly OK but if your 1 million deposit I a bank does not show up for days or suddenly vanishes - u r possibly not going to take that lightly…..

Ok enough RAMBLING….

Good luck

sanjay
  

From: Chris Embree <ce...@gmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>, "chris@embree.us" <ch...@embree.us>
Date: Thursday, May 23, 2013 7:47 PM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Re: Where to begin from??


I'll be chastised and have mean things said about me for this. 

Get some experience in IT before you start looking at Hadoop.  My reasoning is this:  If you don't know how to develop real applications in a Non-Hadoop world, you'll struggle a lot to develop with Hadoop.

Asking what "things you need to know in compulsory" is like saying you want to "learn computers" -- totally worthless!  Find a problem to solve and seek to learn the tools you need to solve your problem.  Otherwise, your learning is un-applied and somewhat useless. 

Picture a recent acting school graduate asking how to direct the next Star Wars movie.  It's almost like that.



On Thu, May 23, 2013 at 10:39 PM, Lokesh Basu <lo...@gmail.com> wrote:

> Hi all,
>
> I'm a computer science undergraduate and have recently started to explore Hadoop. I find it very interesting and want to get involved both as a contributor and as a developer for this open source project. I have been going through many textbooks on Hadoop and HDFS, but I still find it very difficult to tell where a beginner should start before writing his first line of code as a contributor or developer.
>
> Also, please tell me what things I compulsorily need to know before I dive into the depths of these things.
>
> Thanking you all in anticipation.
>
> --
>
> Lokesh Chandra Basu
> B. Tech
> Computer Science and Engineering
> Indian Institute of Technology, Roorkee
> India (GMT +5hr 30min)
> +91-8267805498


CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.

Re: Where to begin from??

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
I agree with Chris. Don't worry about what the technology is called (Hadoop, Bigtable, Lucene, Hive). Model the problem and see what the solution could be. That's very important.

And Lokesh, please don't mind. We are perhaps writing stuff that you don't want to hear, but it's an important real-world perspective.

To illustrate what I mean, let me give you a few problems to think about and see how you would solve them.

1. Before Microsoft took over Skype, at least, there used to be this feature: you typed the name of a person and it came back with search results in milliseconds, often searching close to a billion names. How would you design such a search architecture?

2. In 2012, say 50 million users (cookie based) searched Macys.com on a sales weekend, and say 20,000 of them bought $100 shoes. This year, 2013, on that same sales weekend, 60 million users (cookie based) are buying on the website. You want to give a 25% extra reward only to those cookies that were there last year. So you are looking for an intersection of possibly 20,000 cookies across two sets, one of 50 million and one of 60 million. How would you solve this problem within milliseconds?

3. Last, my favorite. The Postal Service wants new business ideas to avoid bankruptcy. One idea I have: they have a zillion small delivery vans that go to each street in the country. Say I lease out that space to the big wireless phone providers and promise them that I will mount wireless signal strength measurement systems on these vans and provide them data 3 times a day. How would you devise a solution to store and analyse that data?
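
As a starting point for thinking about problem 2, here is a minimal sketch of one possible approach, not a definitive answer. It assumes last year's buyer cookies (about 20,000 in the example) fit in memory, so they can be loaded into a hash set once and each of this year's cookies can be checked against it in roughly constant time. The class name CookieIntersection and the method name returning are hypothetical, invented only for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not from this thread): finds cookies present in both years.
final class CookieIntersection {

    // Load last year's buyers into a hash set once, then probe it for each of
    // this year's cookies. Each probe is O(1) on average, so a single lookup
    // at request time does not depend on how many visitors there are.
    static Set<String> returning(Iterable<String> lastYearBuyers,
                                 Iterable<String> thisYearVisitors) {
        Set<String> known = new HashSet<>();
        for (String cookie : lastYearBuyers) {
            known.add(cookie);
        }
        Set<String> result = new HashSet<>();
        for (String cookie : thisYearVisitors) {
            if (known.contains(cookie)) {
                result.add(cookie);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Tiny made-up sample: c1 and c3 appear in both lists.
        Set<String> hits = returning(List.of("c1", "c2", "c3"),
                                     List.of("c3", "c4", "c1"));
        System.out.println(hits.size()); // prints 2
    }
}
```

If neither set fits in memory on one machine, the same idea would have to be partitioned across machines, which is where modelling the problem before picking a tool starts to matter.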

I am sure if you look around in India as well you will see a lot of situations where you want to solve a problem.

As Chris says, think about the problem you want to solve, then model the solutions and pick the best one.

On the flip side, I can tell you it will still be a few years before many banks and stock trading houses trust Cassandra and HBase for OLTP, because that data is critical. If your timeline on Facebook does not show a photo, it's possibly OK, but if your 1 million deposit in a bank does not show up for days or suddenly vanishes, you are possibly not going to take that lightly.

OK, enough rambling.

Good luck

sanjay



From: Chris Embree <ce...@gmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>, "chris@embree.us" <ch...@embree.us>
Date: Thursday, May 23, 2013 7:47 PM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Re: Where to begin from??

I'll be chastised and have mean things said about me for this.

Get some experience in IT before you start looking at Hadoop.  My reasoning is this:  If you don't know how to develop real applications in a Non-Hadoop world, you'll struggle a lot to develop with Hadoop.

Asking what "things you need to know in compulsory" is like saying you want to "learn computers" -- totally worthless!  Find a problem to solve and seek to learn the tools you need to solve your problem.  Otherwise, your learning is un-applied and somewhat useless.

Picture a recent acting school graduate asking how to direct the next Star Wars movie.  It's almost like that.


On Thu, May 23, 2013 at 10:39 PM, Lokesh Basu <lo...@gmail.com> wrote:
Hi all,

I'm a computer science undergraduate and have recently started to explore Hadoop. I find it very interesting and want to get involved both as a contributor and as a developer for this open source project. I have been going through many textbooks on Hadoop and HDFS, but I still find it very difficult to tell where a beginner should start before writing his first line of code as a contributor or developer.

Also, please tell me what things I compulsorily need to know before I dive into the depths of these things.

Thanking you all in anticipation.

--

Lokesh Chandra Basu
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India (GMT +5hr 30min)
+91-8267805498





Re: Where to begin from??

Posted by Chris Embree <ce...@gmail.com>.
I'll be chastised and have mean things said about me for this.

Get some experience in IT before you start looking at Hadoop.  My reasoning
is this:  If you don't know how to develop real applications in a
Non-Hadoop world, you'll struggle a lot to develop with Hadoop.

Asking what "things you need to know in compulsory" is like saying you want
to "learn computers" -- totally worthless!  Find a problem to solve and
seek to learn the tools you need to solve your problem.  Otherwise, your
learning is un-applied and somewhat useless.

Picture a recent acting school graduate asking how to direct the next Star
Wars movie.  It's almost like that.


On Thu, May 23, 2013 at 10:39 PM, Lokesh Basu <lo...@gmail.com> wrote:

> Hi all,
>
> I'm a computer science undergraduate and have recently started to explore
> Hadoop. I find it very interesting and want to get involved both as a
> contributor and as a developer for this open source project. I have been
> going through many textbooks on Hadoop and HDFS, but I still find it very
> difficult to tell where a beginner should start before writing his first
> line of code as a contributor or developer.
>
> Also, please tell me what things I compulsorily need to know before I dive
> into the depths of these things.
>
> Thanking you all in anticipation.
>
>
>
>
> --
>
> *Lokesh Chandra Basu*
> B. Tech
> Computer Science and Engineering
> Indian Institute of Technology, Roorkee
> India(GMT +5hr 30min)
> +91-8267805498
>
>
>
