Posted to user@hadoop.apache.org by Vijaya Narayana Reddy Bhoomi Reddy <vi...@huawei.com> on 2013/06/28 13:20:24 UTC

Business Analysts in Hadoop World

Hi,

I am just trying to get myself acquainted with Hadoop and related technologies. I am fascinated by the potential of the Big Data world and would like to be part of it!
However, it has been a while since I did any coding. Early in my career I worked with Java for a brief period, but these days I work as a Business Analyst in the CRM space.


Before I start exploring the Hadoop world, I would like to hear your thoughts on the following queries:

*         Being a business analyst, what would be the possible career opportunities in the Hadoop space?

*         Is it necessary to have a strong technical background before jumping into Hadoop? If so, which technologies should be learnt first? Java, SQL, etc.?

*         What certifications are available in the Hadoop world? Are there any certifications for Business Analysts?

Please let me know your valuable thoughts.

Thanks
Vijay

RE: Business Analysts in Hadoop World

Posted by John Lilley <jo...@redpoint.net>.

Hadoop does not yet have an easy learning curve, so I'd recommend that you start with Amazon Elastic MapReduce as an experimental platform for learning.
John

From: Vijaya Narayana Reddy Bhoomi Reddy [mailto:vijaya.bhoomi@huawei.com]
Sent: Friday, June 28, 2013 7:10 AM
To: user@hadoop.apache.org
Subject: RE: Business Analysts in Hadoop World

Michael,

Thanks for your advice. I am just confused because I could not see a clear career path for Business Analysts in the Hadoop world. Maybe that is because it's an evolving field, or maybe I am not yet aware of one despite reading some introductory content on the internet. However, I firmly believe the Big Data space is going to be very big in the coming years and I would like to be part of it and contribute. I would like to hear more from you about the role and responsibilities a BA can take on in this space and the areas / technologies I need to prepare myself for.

Thanks
Vijay

From: Michael Forage [mailto:Michael.Forage@livenation.co.uk]
Sent: Friday, June 28, 2013 5:23 PM
To: user@hadoop.apache.org
Subject: RE: Business Analysts in Hadoop World

Hi Vijay

My advice is to carefully consider the scope of the role you're aiming for.

As a BA I expect that you'd be able to add value by understanding your business data processing challenges and turning them into specifications for map/reduce jobs. This doesn't require you to have any Java coding skills as such, just a good handle on the map/reduce concepts. It also helps if you understand the common use-cases associated with these technologies (as it's really an ecosystem of related toolsets) as well as what they're not so good for.

This would allow you to contribute to solution design on behalf of the business but assumes you're not concerned with the actual implementation/administration side of things. Definitely only bother re-learning Java if you're going to be the one writing the code. You do need to have had experience of working with data of some kind (this is a data processing environment at the end of the day), but simply reading a decent Hadoop book and googling a few websites should give you a decent enough background as a BA.

Cheers
Mike

From: Lokesh Basu [mailto:lokesh.basu@gmail.com]
Sent: 28 June 2013 12:35
To: user@hadoop.apache.org
Subject: Re: Business Analysts in Hadoop World

Dear Vijay,

If you are a beginner in the open source project, then I would recommend that you first get familiar with Java and a version control system and then go to this page (http://wiki.apache.org/hadoop/HowToContribute).

If you are not yet familiar with the Hadoop project, then you should go through some online text/videos to get an overview.

Best of luck

Lokesh Chandra Basu
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India (GMT +5hr 30min)

On Fri, Jun 28, 2013 at 4:50 PM, Vijaya Narayana Reddy Bhoomi Reddy <vi...@huawei.com> wrote:
Hi,

I am just trying to get myself acquainted with Hadoop and related technologies. I am fascinated by the potential of the Big Data world and would like to be part of it!
However, it has been a while since I did any coding. Early in my career I worked with Java for a brief period, but these days I work as a Business Analyst in the CRM space.

Before I start exploring the Hadoop world, I would like to hear your thoughts on the following queries:

*         Being a business analyst, what would be the possible career opportunities in the Hadoop space?

*         Is it necessary to have a strong technical background before jumping into Hadoop? If so, which technologies should be learnt first? Java, SQL, etc.?

*         What certifications are available in the Hadoop world? Are there any certifications for Business Analysts?

Please let me know your valuable thoughts.

Thanks
Vijay

Michael Forage | Solutions Architect - Insight Services
Email: Michael.Forage@livenation.co.uk | Tel: +44 207 980 4362 | Mob: +44 7808 174404
Address: 4 Pentonville Road | London | N1 9HF | United Kingdom

Re: Business Analysts in Hadoop World

Posted by Peyman Mohajerian <mo...@gmail.com>.

I would say that as a BA you definitely don't have to bother with Java, and the
goal of open source big data is not to have people work with Java. If you
get familiar with Hive and HQL it will help you a lot. A lot of
organizations are entering this space from a DW/BI background and none
have to do Java. Even if you do want to do some low-level data analysis,
Python would be better from your point of view (it is the language of a lot of
Data Science guys). Java is great if you want to start contributing to the
code base, write the most optimized queries, or understand the low-level
implementations.



On Fri, Jun 28, 2013 at 7:22 PM, Michael Aro <m....@gmail.com> wrote:

> Hi Vijay,
>
> Scott Gnau of Teradata Labs mentioned something related in the recent
> Hadoop summit in San Jose. The title of his presentation was "Putting
> Hadoop to Work in the Enterprise" and you can watch the video via this
> link: http://hadoopsummit.org/san-jose/keynote-day1/. He mentioned the
> business analyst around time 07:00 in the video. All the videos were great!
>
> Mike.
>
>
> On Fri, Jun 28, 2013 at 10:49 AM, <sa...@yahoo.com> wrote:
>
>> Is a Data Scientist in Hadoop the same as a BA in IT?
>> ------------------------------
>> *From: * Michael Forage <Mi...@livenation.co.uk>
>> *Date: *Fri, 28 Jun 2013 13:57:24 +0000
>> *To: *user@hadoop.apache.org
>> *ReplyTo: * user@hadoop.apache.org
>> *Subject: *RE: Business Analysts in Hadoop World
>>
>> Hi Vijay
>>
>> I’m afraid I’m not experienced or specialised enough in either Hadoop or
>> the broader Big Data industry to give any advice on career paths.
>>
>> Obviously different organisations expect completely different levels of
>> technical contribution from their Business Analysts, but in my experience
>> the ability to collect, evaluate and interpret business requirements into
>> some kind of functional specification document is key. The responsibility
>> for subsequent technical specifications based on applied knowledge of the big
>> data tools at your disposal will probably sit with your Solution Architect.
>> Often a BA wouldn’t care how a solution is implemented under the covers.
>> However, in the same way that a BA may define user-flows in a use-case, they
>> could outline a conceptual data processing flow if they understand the
>> source data and requirements well enough.
>>
>> There are so many new tools and technologies in this space already that,
>> unless you have a specific requirement to meet, it can be pretty
>> overwhelming. I’d just start by concentrating on getting an understanding
>> of map reduce concepts. Sure, it always helps if you’ve the time to get
>> some hands-on technical experience, but that’s not a trivial undertaking
>> from a standing start and it may be completely irrelevant for you in the
>> long run, as that’s not what a BA is paid to do.
>>
>> Sorry I can’t be more help
>>
>> Mike
>
>

Re: Business Analysts in Hadoop World

Posted by Peyman Mohajerian <mo...@gmail.com>.

I would say as a BA you definitely don't have to bother with Java and the
goal of open source big data is not to have people work with Java. If you
get familiar with Hive and HQL it will help you a lot. A lot of
organizations are entering into this space from DW/BI background and none
have to do Java. Even if you do want do some low level data analysis,
Python would be better from your point of view (the language of a lot of
Data Science guys). Java is great if you want to start contributing to the
code base or writing the most optimized queries or understand the low level
implementations.



On Fri, Jun 28, 2013 at 7:22 PM, Michael Aro <m....@gmail.com>wrote:

> Hi Vijay,
>
> Scott Gnau of Teradata Labs mentioned something related in the recent
> Hadoop summit in San Jose. The title of his presentation was "Putting
> Hadoop to Work in the Enterprise" and you can watch the video via this
> link: http://hadoopsummit.org/san-jose/keynote-day1/. He mentioned the
> business analyst around time 07:00 in the video. All the videos were great!
>
> Mike.
>
>
> On Fri, Jun 28, 2013 at 10:49 AM, <sa...@yahoo.com> wrote:
>
>> Is Data scientist in hadoop same as BA in IT.
>> Sent from BlackBerry® on Airtel
>> ------------------------------
>> *From: * Michael Forage <Mi...@livenation.co.uk>
>> *Date: *Fri, 28 Jun 2013 13:57:24 +0000
>> *To: *user@hadoop.apache.org<us...@hadoop.apache.org>
>> *ReplyTo: * user@hadoop.apache.org
>> *Subject: *RE: Business Analysts in Hadoop World
>>
>>  Hi Vijay****
>>
>> ** **
>>
>> I’m afraid I’m not experienced or specialised enough in either Hadoop or
>> the broader Big data industry to give any advice on career paths****
>>
>> ** **
>>
>> Obviously different organisations expect completely different levels of
>> technical contribution from their Business Analysts but in my experience
>> the ability to collect, evaluate and interpret business requirements into
>> some kind of functional specification document is key. The responsibility
>> for subsequent technical specifications based applied knowledge of the big
>> data tools at disposal will probably sit with your Solution Architect.
>> Often a BA wouldn’t care how a solution is implemented under the covers.
>> However, in the same way that a BA may define user-flows in a use-case they
>> could outline a conceptual data processing flow if they understand the
>> source data and requirements well enough.****
>>
>> ** **
>>
>> There are so many new tools and technologies in this space already that,
>> unless you have a specific requirement to meet, it can be pretty
>> overwhelming. I’d just start by concentrating on getting an understanding
>> of map reduce concepts. Sure, it always helps if you’ve the time to get
>> some hands-on technical  experience but that’s not a trivial undertaking
>> from a standing start and it may be completely irrelevant for you in the
>> long run as that’s not what a BA is paid to do.****
>>
>> ** **
>>
>> Sorry I can’t be more help****
>>
>> Mike****
>>
>> ** **
>>
>> *From:* Vijaya Narayana Reddy Bhoomi Reddy [mailto:
>> vijaya.bhoomi@huawei.com]
>> *Sent:* 28 June 2013 14:10
>> *To:* user@hadoop.apache.org
>> *Subject:* RE: Business Analysts in Hadoop World****
>>
>> ** **
>>
>> Michael,****
>>
>> ** **
>>
>> Thanks for your advice. I am just confused because I could not see a
>> clear career path for Business Analysts in the Hadoop world. May be
>> because, it’s an evolving field or maybe I am not yet aware of the same
>> despite reading some primary content on the internet. However, I firmly
>> believe the Big Data space is going to be very big in the next years and I
>> would like to be part of it and contribute. I would like to know from you
>> more on the role and responsibilities a BA can perform in this space and
>> the possible areas / technologies which I need myself to be prepared for.
>> ****
>>
>> ** **
>>
>> Thanks****
>>
>> Vijay****
>>
>> ** **
>>
>> *From:* Michael Forage [mailto:Michael.Forage@livenation.co.uk<Mi...@livenation.co.uk>]
>>
>> *Sent:* Friday, June 28, 2013 5:23 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* RE: Business Analysts in Hadoop World****
>>
>> ** **
>>
>> Hi Vijay****
>>
>>  ****
>>
>> My advice is to carefully consider the scope of the role you’re aiming for
>> ****
>>
>>  ****
>>
>> As a BA I expect that you’d be able to add value by understanding your
>> business data processing challenges and turning them into specifications
>> for map/reduce jobs. This doesn’t require you to have any Java coding
>> skills as such, just a good handle of the map/reduce concepts. It also
>> helps if you understand common use-cases associated with these technologies
>> (as it’s really an ecosystem of related toolsets) as well as what they’re
>> not so good for.****
>>
>>  ****
>>
>> This would allow you to contribute to solution design on behalf of the
>> business but assumes you’re not concerned with the actual
>> implementation/administration side of things. Definitely only bother
>> re-learning Java if you’re going to be the one writing the code. You do
>> need to have had experience of working with data of some kind (this is a
>> data processing environment at the end of the day) but simply reading a
>> decent Hadoop book and googling a few websites should give you a decent
>> enough background as a BA****
>>
>>  ****
>>
>> Cheers****
>>
>> Mike****
>>
>>  ****
>>
>> *From:* Lokesh Basu [mailto:lokesh.basu@gmail.com <lo...@gmail.com>]
>>
>> *Sent:* 28 June 2013 12:35
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Business Analysts in Hadoop World****
>>
>>  ****
>>
>> Dear Vijay,
>>
>> If you are a beginner in the open source project then I would recommend
>> you to first get familiar with Java and some version control system and
>> then go to this (http://wiki.apache.org/hadoop/HowToContribute) page.
>>
>> If you are not aware of the Hadoop project then you should go through
>> some online text/videos to get the insight.
>>
>> Best of luck****
>>
>>
>> ****
>>
>>
>> *Lokesh Chandra Basu*****
>>
>> B. Tech****
>>
>> Computer Science and Engineering****
>>
>> Indian Institute of Technology, Roorkee****
>>
>> India(GMT +5hr 30min)****
>>
>>  ****
>>
>>  ****
>>
>>  ****
>>
>> On Fri, Jun 28, 2013 at 4:50 PM, Vijaya Narayana Reddy Bhoomi Reddy <
>> vijaya.bhoomi@huawei.com> wrote:****
>>
>> Hi,****
>>
>>  ****
>>
>> I am just trying to get myself acquainted with Hadoop and other related
>> technologies. I am very much fascinated with the potential of the Big Data
>> world and hence would like to be part of it!! ****
>>
>> However, it has been a while I have done any coding. Earlier for a brief
>> period of time during early days of my career, I have done some work in
>> Java. All these days, I am working as a Business Analyst in the CRM space.
>> ****
>>
>>  ****
>>
>> ·         Before I start exploring Hadoop world, I would like to hear
>> your thoughts on the following queries:****
>>
>>  ****
>>
>> ·         Being a business analyst, what would be the possible career
>> opportunities in the Hadoop space?****
>>
>>  ****
>>
>> ·         Is it necessary to have a strong technical background before
>> jumping into Hadoop? If so, which technologies need to be learnt primarily?
>> Java, SQL etc?****
>>
>>  ****
>>
>> ·         What are the various certifications available in the Hadoop
>> world? Are there any certifications for Business Analysts?****
>>
>>  ****
>>
>> Please let me know your valuable thoughts.****
>>
>>  ****
>>
>> Thanks****
>>
>> Vijay****
>>
>>  ****
>>
>> ****
>>
>> *Michael Forage* | Solutions Architect - Insight Services
>> Email: Michael.Forage@livenation.co.uk | Tel: +44 207 980 4362 | Mob: +44
>> 7808 174404
>> Address: 4 Pentonville Road | London | N1 9HF | United Kingdom****
>>
>> Live Nation Limited, Registered Office: 2nd Floor, Regent Arcade House
>> 19-25 Argyll Street, London, W1F 7TS. Company Number 03805556. Registered
>> in England and Wales.****
>>
>> This message is confidential and may be legally privileged or otherwise
>> protected from disclosure. If you are not the intended recipient, please
>> telephone or email the sender and delete this message and any attachment
>> from your system; you must not copy or disclose the contents of this
>> message or any attachment to any other person.****
>>
>
>

Re: Business Analysts in Hadoop World

Posted by Peyman Mohajerian <mo...@gmail.com>.

I would say as a BA you definitely don't have to bother with Java and the
goal of open source big data is not to have people work with Java. If you
get familiar with Hive and HQL it will help you a lot. A lot of
organizations are entering into this space from DW/BI background and none
have to do Java. Even if you do want do some low level data analysis,
Python would be better from your point of view (the language of a lot of
Data Science guys). Java is great if you want to start contributing to the
code base or writing the most optimized queries or understand the low level
implementations.



On Fri, Jun 28, 2013 at 7:22 PM, Michael Aro <m....@gmail.com>wrote:

> Hi Vijay,
>
> Scott Gnau of Teradata Labs mentioned something related in the recent
> Hadoop summit in San Jose. The title of his presentation was "Putting
> Hadoop to Work in the Enterprise" and you can watch the video via this
> link: http://hadoopsummit.org/san-jose/keynote-day1/. He mentioned the
> business analyst around time 07:00 in the video. All the videos were great!
>
> Mike.
>
>
> On Fri, Jun 28, 2013 at 10:49 AM, <sa...@yahoo.com> wrote:
>
>> Is Data scientist in hadoop same as BA in IT.
>> Sent from BlackBerry® on Airtel
>> ------------------------------
>> *From: * Michael Forage <Mi...@livenation.co.uk>
>> *Date: *Fri, 28 Jun 2013 13:57:24 +0000
>> *To: *user@hadoop.apache.org<us...@hadoop.apache.org>
>> *ReplyTo: * user@hadoop.apache.org
>> *Subject: *RE: Business Analysts in Hadoop World
>>
>>  Hi Vijay****
>>
>> ** **
>>
>> I’m afraid I’m not experienced or specialised enough in either Hadoop or
>> the broader Big data industry to give any advice on career paths****
>>
>> ** **
>>
>> Obviously different organisations expect completely different levels of
>> technical contribution from their Business Analysts but in my experience
>> the ability to collect, evaluate and interpret business requirements into
>> some kind of functional specification document is key. The responsibility
>> for subsequent technical specifications based applied knowledge of the big
>> data tools at disposal will probably sit with your Solution Architect.
>> Often a BA wouldn’t care how a solution is implemented under the covers.
>> However, in the same way that a BA may define user-flows in a use-case they
>> could outline a conceptual data processing flow if they understand the
>> source data and requirements well enough.****
>>
>> ** **
>>
>> There are so many new tools and technologies in this space already that,
>> unless you have a specific requirement to meet, it can be pretty
>> overwhelming. I’d just start by concentrating on getting an understanding
>> of map reduce concepts. Sure, it always helps if you’ve the time to get
>> some hands-on technical  experience but that’s not a trivial undertaking
>> from a standing start and it may be completely irrelevant for you in the
>> long run as that’s not what a BA is paid to do.****
>>
>> ** **
>>
>> Sorry I can’t be more help****
>>
>> Mike****
>>
>> ** **
>>
>> *From:* Vijaya Narayana Reddy Bhoomi Reddy [mailto:
>> vijaya.bhoomi@huawei.com]
>> *Sent:* 28 June 2013 14:10
>> *To:* user@hadoop.apache.org
>> *Subject:* RE: Business Analysts in Hadoop World****
>>
>> ** **
>>
>> Michael,****
>>
>> ** **
>>
>> Thanks for your advice. I am just confused because I could not see a
>> clear career path for Business Analysts in the Hadoop world. May be
>> because, it’s an evolving field or maybe I am not yet aware of the same
>> despite reading some primary content on the internet. However, I firmly
>> believe the Big Data space is going to be very big in the next years and I
>> would like to be part of it and contribute. I would like to know from you
>> more on the role and responsibilities a BA can perform in this space and
>> the possible areas / technologies which I need myself to be prepared for.
>> ****
>>
>> ** **
>>
>> Thanks****
>>
>> Vijay****
>>
>> ** **
>>
>> *From:* Michael Forage [mailto:Michael.Forage@livenation.co.uk<Mi...@livenation.co.uk>]
>>
>> *Sent:* Friday, June 28, 2013 5:23 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* RE: Business Analysts in Hadoop World****
>>
>> ** **
>>
>> Hi Vijay****
>>
>>  ****
>>
>> My advice is to carefully consider the scope of the role you’re aiming for
>> ****
>>
>>  ****
>>
>> As a BA I expect that you’d be able to add value by understanding your
>> business data processing challenges and turning them into specifications
>> for map/reduce jobs. This doesn’t require you to have any Java coding
>> skills as such, just a good handle of the map/reduce concepts. It also
>> helps if you understand common use-cases associated with these technologies
>> (as it’s really an ecosystem of related toolsets) as well as what they’re
>> not so good for.****
>>
>>  ****
>>
>> This would allow you to contribute to solution design on behalf of the
>> business but assumes you’re not concerned with the actual
>> implementation/administration side of things. Definitely only bother
>> re-learning Java if you’re going to be the one writing the code. You do
>> need to have had experience of working with data of some kind (this is a
>> data processing environment at the end of the day) but simply reading a
>> decent Hadoop book and googling a few websites should give you a decent
>> enough background as a BA****
>>
>>  ****
>>
>> Cheers****
>>
>> Mike****
>>
>>  ****
>>
>> *From:* Lokesh Basu [mailto:lokesh.basu@gmail.com <lo...@gmail.com>]
>>
>> *Sent:* 28 June 2013 12:35
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Business Analysts in Hadoop World****
>>
>>  ****
>>
>> Dear Vijay,
>>
>> If you are a beginner in the open source project then I would recommend
>> you to first get familiar with Java and some version control system and
>> then go to this (http://wiki.apache.org/hadoop/HowToContribute) page.
>>
>> If you are not aware of the Hadoop project then you should go through
>> some online text/videos to get the insight.
>>
>> Best of luck****
>>
>>
>> ****
>>
>>
>> *Lokesh Chandra Basu*****
>>
>> B. Tech****
>>
>> Computer Science and Engineering****
>>
>> Indian Institute of Technology, Roorkee****
>>
>> India(GMT +5hr 30min)****
>>
>>  ****
>>
>>  ****
>>
>>  ****
>>
>> On Fri, Jun 28, 2013 at 4:50 PM, Vijaya Narayana Reddy Bhoomi Reddy <
>> vijaya.bhoomi@huawei.com> wrote:****
>>
>> Hi,****
>>
>>  ****
>>
>> I am just trying to get myself acquainted with Hadoop and other related
>> technologies. I am very much fascinated with the potential of the Big Data
>> world and hence would like to be part of it!! ****
>>
>> However, it has been a while I have done any coding. Earlier for a brief
>> period of time during early days of my career, I have done some work in
>> Java. All these days, I am working as a Business Analyst in the CRM space.
>> ****
>>
>>  ****
>>
>> ·         Before I start exploring Hadoop world, I would like to hear
>> your thoughts on the following queries:****
>>
>>  ****
>>
>> ·         Being a business analyst, what would be the possible career
>> opportunities in the Hadoop space?****
>>
>>  ****
>>
>> ·         Is it necessary to have a strong technical background before
>> jumping into Hadoop? If so, which technologies need to be learnt primarily?
>> Java, SQL etc?****
>>
>>  ****
>>
>> ·         What are the various certifications available in the Hadoop
>> world? Are there any certifications for Business Analysts?****
>>
>>  ****
>>
>> Please let me know your valuable thoughts.****
>>
>>  ****
>>
>> Thanks****
>>
>> Vijay****
>>
>>  ****
>>
>> ****
>>
>> *Michael Forage* | Solutions Architect - Insight Services
>> Email: Michael.Forage@livenation.co.uk | Tel: +44 207 980 4362 | Mob: +44
>> 7808 174404
>> Address: 4 Pentonville Road | London | N1 9HF | United Kingdom****
>>
>> Live Nation Limited, Registered Office: 2nd Floor, Regent Arcade House
>> 19-25 Argyll Street, London, W1F 7TS. Company Number 03805556. Registered
>> in England and Wales.****
>>
>> This message is confidential and may be legally privileged or otherwise
>> protected from disclosure. If you are not the intended recipient, please
>> telephone or email the sender and delete this message and any attachment
>> from your system; you must not copy or disclose the contents of this
>> message or any attachment to any other person.****
>>
>
>

Re: Business Analysts in Hadoop World

Posted by Peyman Mohajerian <mo...@gmail.com>.

I would say as a BA you definitely don't have to bother with Java and the
goal of open source big data is not to have people work with Java. If you
get familiar with Hive and HQL it will help you a lot. A lot of
organizations are entering into this space from DW/BI background and none
have to do Java. Even if you do want do some low level data analysis,
Python would be better from your point of view (the language of a lot of
Data Science guys). Java is great if you want to start contributing to the
code base or writing the most optimized queries or understand the low level
implementations.



On Fri, Jun 28, 2013 at 7:22 PM, Michael Aro <m....@gmail.com>wrote:

> Hi Vijay,
>
> Scott Gnau of Teradata Labs mentioned something related in the recent
> Hadoop summit in San Jose. The title of his presentation was "Putting
> Hadoop to Work in the Enterprise" and you can watch the video via this
> link: http://hadoopsummit.org/san-jose/keynote-day1/. He mentioned the
> business analyst around time 07:00 in the video. All the videos were great!
>
> Mike.

Re: Business Analysts in Hadoop World

Posted by Michael Aro <m....@gmail.com>.

Hi Vijay,

Scott Gnau of Teradata Labs mentioned something related in the recent
Hadoop summit in San Jose. The title of his presentation was "Putting
Hadoop to Work in the Enterprise" and you can watch the video via this
link: http://hadoopsummit.org/san-jose/keynote-day1/. He mentioned the
business analyst around time 07:00 in the video. All the videos were great!

Mike.


On Fri, Jun 28, 2013 at 10:49 AM, <sa...@yahoo.com> wrote:

> Is Data scientist in hadoop same as BA in IT.
> Sent from BlackBerry® on Airtel
> ------------------------------
> *From: * Michael Forage <Mi...@livenation.co.uk>
> *Date: *Fri, 28 Jun 2013 13:57:24 +0000
> *To: *user@hadoop.apache.org<us...@hadoop.apache.org>
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *RE: Business Analysts in Hadoop World
>
> Hi Vijay
>
> I’m afraid I’m not experienced or specialised enough in either Hadoop or
> the broader Big data industry to give any advice on career paths
>
> Obviously different organisations expect completely different levels of
> technical contribution from their Business Analysts but in my experience
> the ability to collect, evaluate and interpret business requirements into
> some kind of functional specification document is key. The responsibility
> for subsequent technical specifications based applied knowledge of the big
> data tools at disposal will probably sit with your Solution Architect.
> Often a BA wouldn’t care how a solution is implemented under the covers.
> However, in the same way that a BA may define user-flows in a use-case they
> could outline a conceptual data processing flow if they understand the
> source data and requirements well enough.
>
> There are so many new tools and technologies in this space already that,
> unless you have a specific requirement to meet, it can be pretty
> overwhelming. I’d just start by concentrating on getting an understanding
> of map reduce concepts. Sure, it always helps if you’ve the time to get
> some hands-on technical experience but that’s not a trivial undertaking
> from a standing start and it may be completely irrelevant for you in the
> long run as that’s not what a BA is paid to do.
>
> Sorry I can’t be more help
>
> Mike
>
> *From:* Vijaya Narayana Reddy Bhoomi Reddy [mailto:vijaya.bhoomi@huawei.com]
> *Sent:* 28 June 2013 14:10
> *To:* user@hadoop.apache.org
> *Subject:* RE: Business Analysts in Hadoop World
>
> Michael,
>
> Thanks for your advice. I am just confused because I could not see a clear
> career path for Business Analysts in the Hadoop world. May be because, it’s
> an evolving field or maybe I am not yet aware of the same despite reading
> some primary content on the internet. However, I firmly believe the Big
> Data space is going to be very big in the next years and I would like to be
> part of it and contribute. I would like to know from you more on the role
> and responsibilities a BA can perform in this space and the possible areas
> / technologies which I need myself to be prepared for.
>
> Thanks
>
> Vijay
>
> *From:* Michael Forage [mailto:Michael.Forage@livenation.co.uk<Mi...@livenation.co.uk>]
> *Sent:* Friday, June 28, 2013 5:23 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: Business Analysts in Hadoop World
>
> Hi Vijay
>
> My advice is to carefully consider the scope of the role you’re aiming for
>
> As a BA I expect that you’d be able to add value by understanding your
> business data processing challenges and turning them into specifications
> for map/reduce jobs. This doesn’t require you to have any Java coding
> skills as such, just a good handle of the map/reduce concepts. It also
> helps if you understand common use-cases associated with these technologies
> (as it’s really an ecosystem of related toolsets) as well as what they’re
> not so good for.
>
> This would allow you to contribute to solution design on behalf of the
> business but assumes you’re not concerned with the actual
> implementation/administration side of things. Definitely only bother
> re-learning Java if you’re going to be the one writing the code. You do
> need to have had experience of working with data of some kind (this is a
> data processing environment at the end of the day) but simply reading a
> decent Hadoop book and googling a few websites should give you a decent
> enough background as a BA
>
> Cheers
>
> Mike
>
> *From:* Lokesh Basu [mailto:lokesh.basu@gmail.com <lo...@gmail.com>]
> *Sent:* 28 June 2013 12:35
> *To:* user@hadoop.apache.org
> *Subject:* Re: Business Analysts in Hadoop World
>
> Dear Vijay,
>
> If you are a beginner in the open source project then I would recommend
> you to first get familiar with Java and some version control system and
> then go to this (http://wiki.apache.org/hadoop/HowToContribute) page.
>
> If you are not aware of the Hadoop project then you should go through some
> online text/videos to get the insight.
>
> Best of luck
>
> *Lokesh Chandra Basu*
> B. Tech
> Computer Science and Engineering
> Indian Institute of Technology, Roorkee
> India(GMT +5hr 30min)
>
> On Fri, Jun 28, 2013 at 4:50 PM, Vijaya Narayana Reddy Bhoomi Reddy <
> vijaya.bhoomi@huawei.com> wrote:
>
> Hi,
>
> I am just trying to get myself acquainted with Hadoop and other related
> technologies. I am very much fascinated with the potential of the Big Data
> world and hence would like to be part of it!!
>
> However, it has been a while I have done any coding. Earlier for a brief
> period of time during early days of my career, I have done some work in
> Java. All these days, I am working as a Business Analyst in the CRM space.
>
> ·         Before I start exploring Hadoop world, I would like to hear
> your thoughts on the following queries:
>
> ·         Being a business analyst, what would be the possible career
> opportunities in the Hadoop space?
>
> ·         Is it necessary to have a strong technical background before
> jumping into Hadoop? If so, which technologies need to be learnt primarily?
> Java, SQL etc?
>
> ·         What are the various certifications available in the Hadoop
> world? Are there any certifications for Business Analysts?
>
> Please let me know your valuable thoughts.
>
> Thanks
>
> Vijay
>
> *Michael Forage* | Solutions Architect - Insight Services
> Email: Michael.Forage@livenation.co.uk | Tel: +44 207 980 4362 | Mob: +44
> 7808 174404
> Address: 4 Pentonville Road | London | N1 9HF | United Kingdom
>
> Live Nation Limited, Registered Office: 2nd Floor, Regent Arcade House
> 19-25 Argyll Street, London, W1F 7TS. Company Number 03805556. Registered
> in England and Wales.
>
> This message is confidential and may be legally privileged or otherwise
> protected from disclosure. If you are not the intended recipient, please
> telephone or email the sender and delete this message and any attachment
> from your system; you must not copy or disclose the contents of this
> message or any attachment to any other person.
>

Re: Input Split vs Task vs attempt vs computation

Posted by Sonal Goyal <so...@gmail.com>.

Inline

Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>




On Fri, Sep 27, 2013 at 10:42 AM, Sai Sai <sa...@yahoo.in> wrote:

> Hi
> I have a few questions i am trying to understand:
>
> 1. Is each input split same as a record, (a rec can be a single line or
> multiple lines).
>

An InputSplit is a chunk of input that is handled by a map task. It will
generally contain multiple records. The RecordReader provides the
key-value pairs to the map task. Check
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/InputSplit.html

>
> 2. Is each Task a collection of few computations or attempts.
>
> For ex: if i have a small file with 5 lines.
> By default there will be 1 line on which each map computation is performed.
> So totally 5 computations r done on 1 node.
>
> This means JT will spawn 1 JVM for 1 Tasktracker on a node
> and another JVM for map task which will instantiate 5 map objects 1 for
> each line.
>

I am not sure what you mean by 5 map objects. But yes, the mapper will be
invoked 5 times, once for each line.
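
A minimal sketch of the point above, written in Python rather than against Hadoop's Java API. It assumes the default line-oriented text input, where each record is one line keyed by its byte offset. The names line_record_reader, word_count_map and run_map_task are hypothetical helpers for illustration only; the sample text is the 5-line wordcount input posted elsewhere in this thread.

```python
def line_record_reader(split_text):
    """Yield (byte_offset, line) records, one per line of the split."""
    offset = 0
    for line in split_text.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))


def word_count_map(key, value):
    """Toy mapper: emit (word, 1) for each whitespace-separated token."""
    return [(word, 1) for word in value.split()]


def run_map_task(split_text):
    """One map task: a single loop over all records of one split."""
    invocations = 0
    output = []
    for key, value in line_record_reader(split_text):
        output.extend(word_count_map(key, value))
        invocations += 1
    return invocations, output


# The whole small file is treated as one split here (an assumption).
SPLIT = (
    "Hi This is a simple test.\n"
    "Hi Hadoop how r u.\n"
    "Hello Hello.\n"
    "Hi Hi.\n"
    "Hadoop Hadoop Welcome.\n"
)

invocations, output = run_map_task(SPLIT)
print(invocations)  # one mapper invocation per line
```

The single loop in run_map_task is the part to notice: one task processes one split, and the map function is simply called again for each record in it.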


> The MT JVM is called the task which will have 5 attempts for  each line.
> This means attempt is same as computation.
>
> Please let me know if anything is incorrect.
> Thanks
> Sai
>
>

Re: Input Split vs Task vs attempt vs computation

Posted by Sai Sai <sa...@yahoo.in>.

Hi
I have a few questions i am trying to understand:

1. Is each input split same as a record, (a rec can be a single line or multiple lines).

2. Is each Task a collection of few computations or attempts.

For ex: if i have a small file with 5 lines.

By default there will be 1 line on which each map computation is performed.
So totally 5 computations r done on 1 node.

This means JT will spawn 1 JVM for 1 Tasktracker on a node
and another JVM for map task which will instantiate 5 map objects 1 for each line.

The MT JVM is called the task which will have 5 attempts for  each line.
This means attempt is same as computation.

Please let me know if anything is incorrect.
Thanks
Sai

Re: 2 Map tasks running for a small input file

Posted by Shekhar Sharma <sh...@gmail.com>.

The number of map tasks in a MapReduce job doesn't depend on this
property. It depends on the number of input splits (which equals the
number of blocks if the input split size = block size).

1. What is the input format you are using? If yes, what is the value of
N you are using?

2. What is the value of the property mapred.min.split.size? Have you
changed it to something else, or is it the default, which is 1?




Regards,
Som Shekhar Sharma
+91-8197243810


On Thu, Sep 26, 2013 at 4:39 PM, Viji R <vi...@cloudera.com> wrote:
> Hi,
>
> Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
> avoid this.
>
> Regards,
> Viji
>
> On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <sa...@yahoo.in> wrote:
>> Hi
>> Here is the input file for the wordcount job:
>> ******************
>> Hi This is a simple test.
>> Hi Hadoop how r u.
>> Hello Hello.
>> Hi Hi.
>> Hadoop Hadoop Welcome.
>> ******************
>>
>> After running the wordcount successfully
>> here r the counters info:
>>
>> ***************
>> Job Counters SLOTS_MILLIS_MAPS 0 0 8,386
>> Launched reduce tasks 0 0 1
>> Total time spent by all reduces waiting after reserving slots (ms) 0 0 0
>> Total time spent by all maps waiting after reserving slots (ms) 0 0 0
>> Launched map tasks 0 0 2
>> Data-local map tasks 0 0 2
>> SLOTS_MILLIS_REDUCES 0 0 9,199
>> ***************
>> My question why r there 2 launched map tasks when i have only a small file.
>> Per my understanding it is only 1 block.
>> and should be only 1 split.
>> Then for each line a map computation should occur
>> but it shows 2 map tasks.
>> Please let me know.
>> Thanks
>> Sai
>>

Re: 2 Map tasks running for a small input file

Posted by Shekhar Sharma <sh...@gmail.com>.

-Dmapred.tasktracker.map.tasks.maximum=1: guys, this property is set
on the TaskTracker. Setting it means that particular TaskTracker will
not run more than 1 map task in parallel.

For example: if a MapReduce job requires 5 map tasks and you set this
property to 1, then only 1 map task will run at a time and the others
will wait. Once that task completes, the other tasks will be scheduled.


Could you please send the code you are trying to run (the driver code)
and the contents of mapred-site.xml?

You can control the number of map tasks through the input split size
(mapred.min.split.size, mapred.max.split.size and dfs.block.size):

max(minSplitSize, min(maxSplitSize, blockSize))
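
The formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Hadoop code: the function name, the 64 MB block size and the "effectively unbounded" maximum are assumptions chosen for the example.

```python
def compute_split_size(min_split_size, max_split_size, block_size):
    """Apply the formula above: max(minSplitSize, min(maxSplitSize, blockSize))."""
    return max(min_split_size, min(max_split_size, block_size))


MB = 1024 * 1024

# Illustrative values only; 64 MB is assumed here as the block size.
block_size = 64 * MB
unbounded = 2**63 - 1  # stand-in for "no maximum configured"

# Minimum of 1 byte and no maximum: the split size equals the block size.
default_split = compute_split_size(1, unbounded, block_size)

# Raising the minimum above the block size gives larger splits (fewer map tasks).
bigger_split = compute_split_size(128 * MB, unbounded, block_size)

# Lowering the maximum below the block size gives smaller splits (more map tasks).
smaller_split = compute_split_size(1, 32 * MB, block_size)

print(default_split // MB, bigger_split // MB, smaller_split // MB)  # prints: 64 128 32
```

So the minimum only matters when it is raised above the block size, and the maximum only matters when it is lowered below it.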



Regards,
Som Shekhar Sharma
+91-8197243810


On Thu, Sep 26, 2013 at 6:07 PM, shashwat shriparv
<dw...@gmail.com> wrote:
> just try giving -Dmapred.tasktracker.map.tasks.maximum=1 on the command line
> and check how many map task its running. and also set this in
> mapred-site.xml and check.
>
> Thanks & Regards
>
> ∞
>
> Shashwat Shriparv
>
>
>
> On Thu, Sep 26, 2013 at 5:24 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi Sai,
>>
>> What Viji indicated is that the default Apache Hadoop setting for any
>> input is 2 maps. If the input is larger than one block, regular
>> policies of splitting such as those stated by Shekhar would apply. But
>> for smaller inputs, just for an out-of-box "parallelism experience",
>> Hadoop ships with a 2-maps forced splitting default
>> (mapred.map.tasks=2).
>>
>> This means your 5 lines is probably divided as 2:3 or other ratios and
>> is processed by 2 different Tasks. As Viji also indicated, to turn off
>> this behavior, you can set the mapred.map.tasks to 1 in your configs
>> and then you'll see only one map task process all 5 lines.
>>
>> On Thu, Sep 26, 2013 at 4:59 PM, Sai Sai <sa...@yahoo.in> wrote:
>> > Thanks Viji.
>> > I am confused a little when the data is small y would there b 2 tasks.
>> > U will use the min as 2 if u need it but in this case it is not needed
>> > due
>> > to size of the data being small
>> > so y would 2 map tasks exec.
>> > Since it results in 1 block with 5 lines of data in it
>> > i am assuming this results in 5 map computations 1 per each line
>> > and all of em in 1 process/node since i m using a pseudo vm.
>> > Where is the second task coming from.
>> > The 5 computations of map on each line is 1 task.
>> > Is this right.
>> > Please help.
>> > Thanks
>> >
>> >
>> > ________________________________
>> > From: Viji R <vi...@cloudera.com>
>> > To: user@hadoop.apache.org; Sai Sai <sa...@yahoo.in>
>> > Sent: Thursday, 26 September 2013 5:09 PM
>> > Subject: Re: 2 Map tasks running for a small input file
>> >
>> > Hi,
>> >
>> > Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
>> > avoid this.
>> >
>> > Regards,
>> > Viji
>> >
>> > On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <sa...@yahoo.in> wrote:
>> >> Hi
>> >> Here is the input file for the wordcount job:
>> >> ******************
>> >> Hi This is a simple test.
>> >> Hi Hadoop how r u.
>> >> Hello Hello.
>> >> Hi Hi.
>> >> Hadoop Hadoop Welcome.
>> >> ******************
>> >>
>> >> After running the wordcount successfully
>> >> here r the counters info:
>> >>
>> >> ***************
>> >> Job Counters SLOTS_MILLIS_MAPS 0 0 8,386
>> >> Launched reduce tasks 0 0 1
>> >> Total time spent by all reduces waiting after reserving slots (ms) 0 0
>> >> 0
>> >> Total time spent by all maps waiting after reserving slots (ms) 0 0 0
>> >> Launched map tasks 0 0 2
>> >> Data-local map tasks 0 0 2
>> >> SLOTS_MILLIS_REDUCES 0 0 9,199
>> >> ***************
>> >> My question why r there 2 launched map tasks when i have only a small
>> >> file.
>> >> Per my understanding it is only 1 block.
>> >> and should be only 1 split.
>> >> Then for each line a map computation should occur
>> >> but it shows 2 map tasks.
>> >> Please let me know.
>> >> Thanks
>> >> Sai
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>

Re: 2 Map tasks running for a small input file

Posted by shashwat shriparv <dw...@gmail.com>.

Just try passing -Dmapred.tasktracker.map.tasks.maximum=1 on the command
line and check how many map tasks it runs. Also set this in
mapred-site.xml and check.

Thanks & Regards

∞
Shashwat Shriparv



On Thu, Sep 26, 2013 at 5:24 PM, Harsh J <ha...@cloudera.com> wrote:

> Hi Sai,
>
> What Viji indicated is that the default Apache Hadoop setting for any
> input is 2 maps. If the input is larger than one block, regular
> policies of splitting such as those stated by Shekhar would apply. But
> for smaller inputs, just for an out-of-box "parallelism experience",
> Hadoop ships with a 2-maps forced splitting default
> (mapred.map.tasks=2).
>
> This means your 5 lines is probably divided as 2:3 or other ratios and
> is processed by 2 different Tasks. As Viji also indicated, to turn off
> this behavior, you can set the mapred.map.tasks to 1 in your configs
> and then you'll see only one map task process all 5 lines.
>
> On Thu, Sep 26, 2013 at 4:59 PM, Sai Sai <sa...@yahoo.in> wrote:
> > Thanks Viji.
> > I am confused a little when the data is small y would there b 2 tasks.
> > U will use the min as 2 if u need it but in this case it is not needed
> due
> > to size of the data being small
> > so y would 2 map tasks exec.
> > Since it results in 1 block with 5 lines of data in it
> > i am assuming this results in 5 map computations 1 per each line
> > and all of em in 1 process/node since i m using a pseudo vm.
> > Where is the second task coming from.
> > The 5 computations of map on each line is 1 task.
> > Is this right.
> > Please help.
> > Thanks
> >
> >
> > ________________________________
> > From: Viji R <vi...@cloudera.com>
> > To: user@hadoop.apache.org; Sai Sai <sa...@yahoo.in>
> > Sent: Thursday, 26 September 2013 5:09 PM
> > Subject: Re: 2 Map tasks running for a small input file
> >
> > Hi,
> >
> > Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
> > avoid this.
> >
> > Regards,
> > Viji
> >
> > On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <sa...@yahoo.in> wrote:
> >> Hi
> >> Here is the input file for the wordcount job:
> >> ******************
> >> Hi This is a simple test.
> >> Hi Hadoop how r u.
> >> Hello Hello.
> >> Hi Hi.
> >> Hadoop Hadoop Welcome.
> >> ******************
> >>
> >> After running the wordcount successfully
> >> here r the counters info:
> >>
> >> ***************
> >> Job Counters SLOTS_MILLIS_MAPS 0 0 8,386
> >> Launched reduce tasks 0 0 1
> >> Total time spent by all reduces waiting after reserving slots (ms) 0 0 0
> >> Total time spent by all maps waiting after reserving slots (ms) 0 0 0
> >> Launched map tasks 0 0 2
> >> Data-local map tasks 0 0 2
> >> SLOTS_MILLIS_REDUCES 0 0 9,199
> >> ***************
> >> My question why r there 2 launched map tasks when i have only a small
> >> file.
> >> Per my understanding it is only 1 block.
> >> and should be only 1 split.
> >> Then for each line a map computation should occur
> >> but it shows 2 map tasks.
> >> Please let me know.
> >> Thanks
> >> Sai
> >>
> >
> >
>
>
>
> --
> Harsh J
>

Re: 2 Map tasks running for a small input file

Posted by Harsh J <ha...@cloudera.com>.

Hi Sai,

What Viji indicated is that the default Apache Hadoop setting for any
input is 2 maps. If the input is larger than one block, regular
policies of splitting such as those stated by Shekhar would apply. But
for smaller inputs, just for an out-of-box "parallelism experience",
Hadoop ships with a 2-maps forced splitting default
(mapred.map.tasks=2).

This means your 5 lines are probably divided as 2:3 or other ratios and
are processed by 2 different Tasks. As Viji also indicated, to turn off
this behavior, you can set the mapred.map.tasks to 1 in your configs
and then you'll see only one map task process all 5 lines.
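
A rough Python sketch of the behavior described above, for illustration only. It assumes the old (mapred) FileInputFormat treats mapred.map.tasks as a hint, deriving a goal size of total bytes divided by the hint and letting the last split run up to 10% over the split size. The function name, the 88-byte file size (a hand count of the 5 input lines with one newline each) and the 64 MB block size are assumptions, not values taken from the job.

```python
SPLIT_SLOP = 1.1  # assumed tolerance: the final split may be up to 10% larger than split_size


def old_api_split_lengths(file_length, num_map_tasks_hint,
                          min_split_size=1, block_size=64 * 1024 * 1024):
    """Return the byte lengths of the splits for one file, given the map-task hint."""
    # The hint (mapred.map.tasks) sets a goal size per split.
    goal_size = file_length // max(num_map_tasks_hint, 1)
    split_size = max(min_split_size, min(goal_size, block_size))

    lengths = []
    remaining = file_length
    # Cut full-size splits while clearly more than one split's worth remains.
    while remaining / split_size > SPLIT_SLOP:
        lengths.append(split_size)
        remaining -= split_size
    # Whatever is left becomes the last split.
    if remaining != 0:
        lengths.append(remaining)
    return lengths


# With the hint at 2, the assumed 88-byte file is cut into two splits.
print(old_api_split_lengths(88, 2))  # prints: [44, 44]

# With the hint at 1, the same file stays in one split.
print(old_api_split_lengths(88, 1))  # prints: [88]
```

Note that the cut is made by bytes, not by lines, so which map task ends up with which lines depends on where the byte boundary falls in the file.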

On Thu, Sep 26, 2013 at 4:59 PM, Sai Sai <sa...@yahoo.in> wrote:
> Thanks Viji.
> I am confused a little when the data is small y would there b 2 tasks.
> U will use the min as 2 if u need it but in this case it is not needed due
> to size of the data being small
> so y would 2 map tasks exec.
> Since it results in 1 block with 5 lines of data in it
> i am assuming this results in 5 map computations 1 per each line
> and all of em in 1 process/node since i m using a pseudo vm.
> Where is the second task coming from.
> The 5 computations of map on each line is 1 task.
> Is this right.
> Please help.
> Thanks
>
>
> ________________________________
> From: Viji R <vi...@cloudera.com>
> To: user@hadoop.apache.org; Sai Sai <sa...@yahoo.in>
> Sent: Thursday, 26 September 2013 5:09 PM
> Subject: Re: 2 Map tasks running for a small input file
>
> Hi,
>
> Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
> avoid this.
>
> Regards,
> Viji
>
> On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <sa...@yahoo.in> wrote:
>> Hi
>> Here is the input file for the wordcount job:
>> ******************
>> Hi This is a simple test.
>> Hi Hadoop how r u.
>> Hello Hello.
>> Hi Hi.
>> Hadoop Hadoop Welcome.
>> ******************
>>
>> After running the wordcount successfully
>> here r the counters info:
>>
>> ***************
>> Job Counters SLOTS_MILLIS_MAPS 0 0 8,386
>> Launched reduce tasks 0 0 1
>> Total time spent by all reduces waiting after reserving slots (ms) 0 0 0
>> Total time spent by all maps waiting after reserving slots (ms) 0 0 0
>> Launched map tasks 0 0 2
>> Data-local map tasks 0 0 2
>> SLOTS_MILLIS_REDUCES 0 0 9,199
>> ***************
>> My question why r there 2 launched map tasks when i have only a small
>> file.
>> Per my understanding it is only 1 block.
>> and should be only 1 split.
>> Then for each line a map computation should occur
>> but it shows 2 map tasks.
>> Please let me know.
>> Thanks
>> Sai
>>
>
>



-- 
Harsh J

Re: 2 Map tasks running for a small input file

Posted by Harsh J <ha...@cloudera.com>.

Hi Sai,

What Viji indicated is that the default Apache Hadoop setting for any
input is 2 maps. If the input is larger than one block, regular
policies of splitting such as those stated by Shekhar would apply. But
for smaller inputs, just for an out-of-box "parallelism experience",
Hadoop ships with a 2-maps forced splitting default
(mapred.map.tasks=2).

This means your 5 lines are probably divided 2:3, or in some other ratio,
and are processed by 2 different tasks. As Viji also indicated, to turn off
this behavior you can set mapred.map.tasks to 1 in your configs,
and then you'll see only one map task process all 5 lines.
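
To make the effect of the hint concrete, here is a minimal sketch of the split-size arithmetic, modeled on how the old org.apache.hadoop.mapred.FileInputFormat treats the requested number of maps as far as I understand it. It does not use the Hadoop API; the class and method names, the 90-byte file size, and the 64 MB block size are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
    // Tolerance under which the tail of a file is kept in the last split
    // instead of becoming a tiny extra split.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Returns the length in bytes of each split produced for a single file.
    static List<Long> splitLengths(long fileSize, int numMapTasksHint,
                                   long minSize, long blockSize) {
        // The hint (mapred.map.tasks) sets a target size per split.
        long goalSize = fileSize / Math.max(1, numMapTasksHint);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);

        List<Long> lengths = new ArrayList<>();
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            lengths.add(remaining);
        }
        return lengths;
    }

    public static void main(String[] args) {
        long fileSize = 90;                  // hypothetical size of the 5-line file
        long blockSize = 64L * 1024 * 1024;  // assumed 64 MB block size

        System.out.println(splitLengths(fileSize, 2, 1, blockSize)); // prints [45, 45]
        System.out.println(splitLengths(fileSize, 1, 1, blockSize)); // prints [90]
    }
}
```

With a hint of 2 the target size is half the file, so even a one-block file yields two splits and therefore two map tasks; with a hint of 1 the whole file fits in a single split.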

On Thu, Sep 26, 2013 at 4:59 PM, Sai Sai <sa...@yahoo.in> wrote:
> Thanks Viji.
> I am a little confused: when the data is small, why would there be 2 tasks?
> You would use a minimum of 2 if you needed it, but in this case it is not
> needed because the data is so small,
> so why would 2 map tasks execute?
> Since the input results in 1 block with 5 lines of data in it,
> I am assuming this results in 5 map computations, 1 per line,
> all of them in 1 process/node since I am using a pseudo-distributed VM.
> Where is the second task coming from?
> The 5 map computations, one on each line, are 1 task.
> Is this right?
> Please help.
> Thanks
>
>
> ________________________________
> From: Viji R <vi...@cloudera.com>
> To: user@hadoop.apache.org; Sai Sai <sa...@yahoo.in>
> Sent: Thursday, 26 September 2013 5:09 PM
> Subject: Re: 2 Map tasks running for a small input file
>
> Hi,
>
> Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
> avoid this.
>
> Regards,
> Viji
>
> On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <sa...@yahoo.in> wrote:
>> Hi
>> Here is the input file for the wordcount job:
>> ******************
>> Hi This is a simple test.
>> Hi Hadoop how r u.
>> Hello Hello.
>> Hi Hi.
>> Hadoop Hadoop Welcome.
>> ******************
>>
>> After running the wordcount job successfully,
>> here is the counters info:
>>
>> ***************
>> Job Counters SLOTS_MILLIS_MAPS 0 0 8,386
>> Launched reduce tasks 0 0 1
>> Total time spent by all reduces waiting after reserving slots (ms) 0 0 0
>> Total time spent by all maps waiting after reserving slots (ms) 0 0 0
>> Launched map tasks 0 0 2
>> Data-local map tasks 0 0 2
>> SLOTS_MILLIS_REDUCES 0 0 9,199
>> ***************
>> My question: why are there 2 launched map tasks when I have only a small
>> file?
>> Per my understanding it is only 1 block
>> and should be only 1 split.
>> Then for each line a map computation should occur,
>> but it shows 2 map tasks.
>> Please let me know.
>> Thanks
>> Sai
>>
>
>



-- 
Harsh J

Re: 2 Map tasks running for a small input file

Posted by Sai Sai <sa...@yahoo.in>.

Thanks Viji.
I am a little confused: when the data is small, why would there be 2 tasks?
You would use a minimum of 2 if you needed it, but in this case it is not needed because the data is so small,
so why would 2 map tasks execute?
Since the input results in 1 block with 5 lines of data in it,
I am assuming this results in 5 map computations, 1 per line,
all of them in 1 process/node since I am using a pseudo-distributed VM.
Where is the second task coming from?
The 5 map computations, one on each line, are 1 task.
Is this right?
Please help.
Thanks

________________________________
 From: Viji R <vi...@cloudera.com>
To: user@hadoop.apache.org; Sai Sai <sa...@yahoo.in> 
Sent: Thursday, 26 September 2013 5:09 PM
Subject: Re: 2 Map tasks running for a small input file

Hi,

Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
avoid this.

Regards,
Viji

On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <sa...@yahoo.in> wrote:
> Hi
> Here is the input file for the wordcount job:
> ******************
> Hi This is a simple test.
> Hi Hadoop how r u.
> Hello Hello.
> Hi Hi.
> Hadoop Hadoop Welcome.
> ******************
>
> After running the wordcount job successfully,
> here is the counters info:
>
> ***************
> Job Counters SLOTS_MILLIS_MAPS 0 0 8,386
> Launched reduce tasks 0 0 1
> Total time spent by all reduces waiting after reserving slots (ms) 0 0 0
> Total time spent by all maps waiting after reserving slots (ms) 0 0 0
> Launched map tasks 0 0 2
> Data-local map tasks 0 0 2
> SLOTS_MILLIS_REDUCES 0 0 9,199
> ***************
> My question: why are there 2 launched map tasks when I have only a small file?
> Per my understanding it is only 1 block
> and should be only 1 split.
> Then for each line a map computation should occur,
> but it shows 2 map tasks.
> Please let me know.
> Thanks
> Sai
>

Re: 2 Map tasks running for a small input file

Posted by Shekhar Sharma <sh...@gmail.com>.

The number of map tasks in a MapReduce job doesn't depend on this
property; it depends on the number of input splits (which equals the
number of blocks if the input split size equals the block size).

1. What input format are you using? If it is NLineInputFormat, what
value of N are you using?

2. What is the value of the mapred.min.split.size property? Have you
changed it to something else, or is it the default, which is 1?




Regards,
Som Shekhar Sharma
+91-8197243810


On Thu, Sep 26, 2013 at 4:39 PM, Viji R <vi...@cloudera.com> wrote:
> Hi,
>
> Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
> avoid this.
>
> Regards,
> Viji
>
> On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <sa...@yahoo.in> wrote:
>> Hi
>> Here is the input file for the wordcount job:
>> ******************
>> Hi This is a simple test.
>> Hi Hadoop how r u.
>> Hello Hello.
>> Hi Hi.
>> Hadoop Hadoop Welcome.
>> ******************
>>
>> After running the wordcount job successfully,
>> here is the counters info:
>>
>> ***************
>> Job Counters SLOTS_MILLIS_MAPS 0 0 8,386
>> Launched reduce tasks 0 0 1
>> Total time spent by all reduces waiting after reserving slots (ms) 0 0 0
>> Total time spent by all maps waiting after reserving slots (ms) 0 0 0
>> Launched map tasks 0 0 2
>> Data-local map tasks 0 0 2
>> SLOTS_MILLIS_REDUCES 0 0 9,199
>> ***************
>> My question: why are there 2 launched map tasks when I have only a small file?
>> Per my understanding it is only 1 block
>> and should be only 1 split.
>> Then for each line a map computation should occur,
>> but it shows 2 map tasks.
>> Please let me know.
>> Thanks
>> Sai
>>

Re: 2 Map tasks running for a small input file

Posted by Viji R <vi...@cloudera.com>.

Hi,

The default number of map tasks is 2. You can set mapred.map.tasks to 1 to
avoid this.
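
As a minimal sketch, assuming a Hadoop 1.x style setup where the submitting machine reads mapred-site.xml (the file and its location are assumptions about your install), the property could be set like this:

```xml
<!-- mapred-site.xml on the machine that submits the job (assumed location) -->
<property>
  <name>mapred.map.tasks</name>
  <value>1</value>
</property>
```

The value is only a hint to the framework, so inputs larger than one block can still produce more than one map task.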

Regards,
Viji

On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <sa...@yahoo.in> wrote:
> Hi
> Here is the input file for the wordcount job:
> ******************
> Hi This is a simple test.
> Hi Hadoop how r u.
> Hello Hello.
> Hi Hi.
> Hadoop Hadoop Welcome.
> ******************
>
> After running the wordcount job successfully,
> here is the counters info:
>
> ***************
> Job Counters SLOTS_MILLIS_MAPS 0 0 8,386
> Launched reduce tasks 0 0 1
> Total time spent by all reduces waiting after reserving slots (ms) 0 0 0
> Total time spent by all maps waiting after reserving slots (ms) 0 0 0
> Launched map tasks 0 0 2
> Data-local map tasks 0 0 2
> SLOTS_MILLIS_REDUCES 0 0 9,199
> ***************
> My question: why are there 2 launched map tasks when I have only a small file?
> Per my understanding it is only 1 block
> and should be only 1 split.
> Then for each line a map computation should occur,
> but it shows 2 map tasks.
> Please let me know.
> Thanks
> Sai
>

Re: Input Split vs Task vs attempt vs computation

Posted by Sai Sai <sa...@yahoo.in>.

Hi
I have a few questions I am trying to understand:

1. Is each input split the same as a record? (A record can be a single line or multiple lines.)

2. Is each task a collection of a few computations or attempts?

For example, suppose I have a small file with 5 lines.

By default each map computation is performed on 1 line,
so in total 5 computations are done on 1 node.

This means the JT will spawn 1 JVM for the TaskTracker on a node
and another JVM for the map task, which will instantiate 5 map objects, 1 for each line.

The map task (MT) JVM is called the task, which will have 5 attempts, 1 for each line.
This means an attempt is the same as a computation.

Please let me know if anything is incorrect.
Thanks
Sai
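
For reference, the usual relationship between these terms can be sketched as below. This is a minimal model that does not use the Hadoop API; the class and method names are made up for illustration. It assumes the standard MapReduce behavior: one map task per input split, one mapper object per task attempt, and one map() call per record in the split. An attempt is one execution of the whole task (a second attempt only happens on failure or speculative execution), not one call per line.

```java
import java.util.Arrays;
import java.util.List;

class MapTaskSketch {
    // Stand-in for a user mapper: one instance per task attempt.
    static class LineCountingMapper {
        int mapCalls = 0;

        // Called once per record; with line-oriented text input a record is one line.
        void map(long offset, String line) {
            mapCalls++;
        }
    }

    // Runs one attempt of one map task over every record of its split
    // and returns how many times map() was called.
    static int runAttempt(List<String> recordsInSplit) {
        LineCountingMapper mapper = new LineCountingMapper(); // created once per attempt
        long offset = 0;
        for (String line : recordsInSplit) {
            mapper.map(offset, line);
            offset += line.length() + 1; // +1 for the newline
        }
        return mapper.mapCalls;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList(
            "Hi This is a simple test.",
            "Hi Hadoop how r u.",
            "Hello Hello.",
            "Hi Hi.",
            "Hadoop Hadoop Welcome.");
        // One split, one task, one attempt, one mapper object, five map() calls.
        System.out.println("map() calls in one attempt: " + runAttempt(split));
    }
}
```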

Re: Retrieve and compute input splits

Posted by Sai Sai <sa...@yahoo.in>.

Thanks for your suggestions and replies.
I am still confused about this:

To create the list of tasks to run, the job scheduler first retrieves the input splits computed by the JobClient from the shared filesystem (step 6).

My question:

Does the input split in the above statement refer to the physical block or the logical input split?
I understand that the client splits the file and saves the blocks when writing the file to the cluster, and that the metadata
about the blocks is in the NameNode.
The NN is the only place that holds the metadata about the blocks, so can we assume that in step 6 the scheduler goes to the
NN to retrieve this metadata, and that this is what the diagram indicates as Shared File System HDFS?
If this is right, the input split is the physical block info and not the logical input split info, which could be just a single line
if we are using TextInputFormat, the default one.
Any suggestions?
Thanks
Sai

________________________________
 From: Jay Vyas <ja...@gmail.com>
To: "common-user@hadoop.apache.org" <us...@hadoop.apache.org> 
Cc: Sai Sai <sa...@yahoo.in> 
Sent: Saturday, 28 September 2013 5:35 AM
Subject: Re: Retrieve and compute input splits

Technically, the block locations are provided by the InputSplit which in the FileInputFormat case, is provided by the FileSystem Interface.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/InputSplit.html

The thing to realize here is that the FileSystem implementation is provided at runtime, so the InputSplit class is responsible for creating a FileSystem implementation using reflection and then calling getFileBlockLocations on the file or set of files that the input split corresponds to.

I think your confusion lies in the fact that the input splits create a filesystem but don't know what the filesystem implementation actually is; they rely only on the abstract contract, which provides a set of block locations.

See the FileSystem abstract class for details on that.
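
To illustrate the point that a split carries location information rather than data, here is a minimal sketch that does not use the Hadoop API. SplitInfo is a hypothetical stand-in for a file split (path, start offset, length, and host hints), and the path and host names are made up.

```java
import java.util.Arrays;
import java.util.List;

class SplitInfoSketch {
    // Hypothetical stand-in for a file split: it describes a byte range
    // and where that range is stored, but holds none of the file's bytes.
    static final class SplitInfo {
        final String path;
        final long start;
        final long length;
        final List<String> hosts;

        SplitInfo(String path, long start, long length, String... hosts) {
            this.path = path;
            this.start = start;
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    // A scheduler only needs the host hints to decide whether running the
    // task on a given node would be data-local.
    static boolean isDataLocal(SplitInfo split, String taskTrackerHost) {
        return split.hosts.contains(taskTrackerHost);
    }

    public static void main(String[] args) {
        SplitInfo split = new SplitInfo("/user/sai/input.txt", 0, 90, "node1", "node2");
        System.out.println(isDataLocal(split, "node1")); // prints true
        System.out.println(isDataLocal(split, "node3")); // prints false
    }
}
```

Only this small description travels to the scheduler; the records themselves are read later by the map task on whichever node it runs.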

On Fri, Sep 27, 2013 at 7:02 PM, Peyman Mohajerian <mo...@gmail.com> wrote:

For the JobClient to compute the input splits, doesn't it need to contact the NameNode? Only the NameNode knows where the splits are, so how can it compute them without that additional call?
>
>
>
>
>On Fri, Sep 27, 2013 at 1:41 AM, Sonal Goyal <so...@gmail.com> wrote:
>
>The input splits are not copied, only the information on the location of the splits is copied to the jobtracker so that it can assign tasktrackers which are local to the split.
>>
>>
>>Check the Job Initialization section at 
>>http://answers.oreilly.com/topic/459-anatomy-of-a-mapreduce-job-run-with-hadoop/
>>
>>
>>
>>To create the list of tasks to run, the job scheduler first retrieves the input splits computed by the JobClient from the shared filesystem (step 6). It then creates one map task for each split. The number of reduce tasks to create is determined by the mapred.reduce.tasks property in the JobConf, which is set by the setNumReduceTasks() method, and the scheduler simply creates this number of reduce tasks to be run. Tasks are given IDs at this point.
>>
>>
>>
>>Best Regards,
>>Sonal
>>Nube Technologies 
>>
>>
>>
>>
>>
>>
>>
>>
>>On Fri, Sep 27, 2013 at 10:55 AM, Sai Sai <sa...@yahoo.in> wrote:
>>
>>Hi
>>>I have attached the anatomy of MR from definitive guide.
>>>
>>>
>>>In step 6 it says the JT/scheduler retrieves the input splits computed by the client from HDFS.
>>>
>>>
>>>The above line says that the client computes the input splits.
>>>
>>>
>>>
>>>1. Why does the JT/scheduler retrieve the input splits, and what does it do with them?
>>>If it is retrieving the input split, does this mean it goes to the block, reads each record,
>>>and gets the record back to the JT? If so, this is a lot of data movement for large files,
>>>which is not data locality, so I am getting confused.
>>>
>>>
>>>2. How does the client know how to calculate the input splits?
>>>
>>>
>>>Any help please.
>>>Thanks
>>>Sai
>>
>

-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: Retrieve and compute input splits

Posted by Sai Sai <sa...@yahoo.in>.

Thanks for your suggestions and replies.
I am still confused about this:

To create the list of tasks to run, the job scheduler first retrieves the input splits computed by the JobClient from the shared filesystem (step 6).

My question:

Does the input split in the above statement refer to the physical block or the logical input split?
I understand that the client splits the file into blocks when writing it to the cluster, and that the metadata
about the blocks is kept in the NameNode (NN).
Since the NN is the only place that holds the block metadata, can we assume that in step 6 the scheduler goes to
the NN to retrieve this metadata, and that this is what the diagram labels as Shared File System (HDFS)?
And if this is right, is the input split the physical block info and not the logical input split info, which could be just a single line
if we are using TextInputFormat, the default one?
Any suggestions?
Thanks
Sai
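
To make the physical-block versus logical-split distinction concrete, here is a minimal plain-Java sketch. It is not Hadoop code: the class SplitSketch and its Split type are hypothetical names made up for illustration, and the 64 MB block size is an assumed value. The size rule, max(minSize, min(maxSize, blockSize)), is to the best of my knowledge the one used by the newer org.apache.hadoop.mapreduce FileInputFormat; the real class also lets the last split run slightly larger, which this sketch leaves out. The point is that a split is only a byte range described by an offset and a length, so no record data moves when splits are computed or retrieved.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: plain-Java stand-in for the byte-range arithmetic, not Hadoop code. */
class SplitSketch {

    /** A logical split: a byte range of a file. It holds no record data. */
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "start=" + start + " length=" + length;
        }
    }

    /** Split size rule: max(minSize, min(maxSize, blockSize)). */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Cuts a file of fileLength bytes into consecutive ranges of at most splitSize bytes. */
    static List<Split> computeSplits(long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new Split(offset, length));
            offset += length;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb; // assumed block size for this example
        long splitSize = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        // A 200 MB file yields three 64 MB ranges and one 8 MB range.
        for (Split s : computeSplits(200 * mb, splitSize)) {
            System.out.println(s);
        }
    }
}
```

With the default minimum and maximum sizes the split size equals the block size, which is why splits and blocks usually line up. Raising the minimum above the block size makes one split span several blocks, which shows that the split is a logical unit rather than the block itself.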

________________________________
 From: Jay Vyas <ja...@gmail.com>
To: "common-user@hadoop.apache.org" <us...@hadoop.apache.org> 
Cc: Sai Sai <sa...@yahoo.in> 
Sent: Saturday, 28 September 2013 5:35 AM
Subject: Re: Retrieve and compute input splits

Technically, the block locations are exposed by the InputSplit, which in the FileInputFormat case gets them from the FileSystem interface.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/InputSplit.html

The thing to realize here is that the FileSystem implementation is provided at runtime. The code that computes the splits (FileInputFormat.getSplits) obtains a FileSystem instance, which is created via reflection from the configuration, and then calls getFileBlockLocations on the file or set of files that the input split corresponds to.

I think your confusion lies in the fact that the input splits use a filesystem, but they don't know what the filesystem implementation actually is. They rely only on the abstract contract, which provides a set of block locations. With HDFS, the implementation behind that contract is what contacts the NameNode.

See the FileSystem abstract class for details on that.

On Fri, Sep 27, 2013 at 7:02 PM, Peyman Mohajerian <mo...@gmail.com> wrote:

> For the JobClient to compute the input splits doesn't it need to contact Name Node. Only Name Node knows where the splits are, how can it compute it without that additional call?
>
> On Fri, Sep 27, 2013 at 1:41 AM, Sonal Goyal <so...@gmail.com> wrote:
>
>> The input splits are not copied, only the information on the location of the splits is copied to the jobtracker so that it can assign tasktrackers which are local to the split.
>>
>> Check the Job Initialization section at
>> http://answers.oreilly.com/topic/459-anatomy-of-a-mapreduce-job-run-with-hadoop/
>>
>> To create the list of tasks to run, the job scheduler first retrieves the input splits computed by the JobClient from the shared filesystem (step 6). It then creates one map task for each split. The number of reduce tasks to create is determined by the mapred.reduce.tasks property in the JobConf, which is set by the setNumReduceTasks() method, and the scheduler simply creates this number of reduce tasks to be run. Tasks are given IDs at this point.
>>
>> Best Regards,
>> Sonal
>> Nube Technologies
>>
>> On Fri, Sep 27, 2013 at 10:55 AM, Sai Sai <sa...@yahoo.in> wrote:
>>
>>> Hi
>>> I have attached the anatomy of MR from definitive guide.
>>>
>>> In step 6 it says JT/Scheduler  retrieve  input splits computed by the client from hdfs.
>>>
>>> In the above line it refers to as the client computes input splits.
>>>
>>> 1. Why does the JT/Scheduler retrieve the input splits and what does it do.
>>> If it is retrieving the input split does this mean it goes to the block and reads each record
>>> and gets the record back to JT. If so this is a lot of data movement for large files.
>>> which is not data locality. so i m getting confused.
>>>
>>> 2. How does the client know how to calculate the input splits.
>>>
>>> Any help please.
>>> Thanks
>>> Sai

-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: Retrieve and compute input splits

Posted by Jay Vyas <ja...@gmail.com>.

Technically, the block locations are exposed by the InputSplit, which in
the FileInputFormat case gets them from the FileSystem interface.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/InputSplit.html

The thing to realize here is that the FileSystem implementation is provided
at runtime. The code that computes the splits (FileInputFormat.getSplits)
obtains a FileSystem instance, which is created via reflection from the
configuration, and then calls getFileBlockLocations on the file or set of
files that the input split corresponds to.

I think your confusion lies in the fact that the input splits use a
filesystem, but they don't know what the filesystem implementation
actually is. They rely only on the abstract contract, which provides a set
of block locations. With HDFS, the implementation behind that contract is
what contacts the NameNode.

See the FileSystem abstract class for details on that.
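
As a rough illustration of that abstract contract, here is a toy model in plain Java. Nothing in it is the Hadoop API: ToyFileSystem, ToyDistributedFs, ToyLocalFs and describeSplits are hypothetical names invented for this sketch, and the host names are made up. It only shows the shape of the idea: the code that plans splits asks an interface for the hosts of a byte range and never learns which implementation answered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a toy model of the abstraction, not the Hadoop classes themselves. */
class BlockLocationSketch {

    /** Toy stand-in for the abstract filesystem contract. */
    interface ToyFileSystem {
        /** Hosts holding the block that contains the given byte offset of the file. */
        String[] hostsForOffset(String path, long offset);
    }

    /** Toy distributed implementation; a real one would ask the NameNode at this point. */
    static final class ToyDistributedFs implements ToyFileSystem {
        private final long blockSize;
        private final String[][] hostsPerBlock;

        ToyDistributedFs(long blockSize, String[][] hostsPerBlock) {
            this.blockSize = blockSize;
            this.hostsPerBlock = hostsPerBlock;
        }

        @Override
        public String[] hostsForOffset(String path, long offset) {
            return hostsPerBlock[(int) (offset / blockSize)];
        }
    }

    /** Toy local implementation: every byte lives on the local machine. */
    static final class ToyLocalFs implements ToyFileSystem {
        @Override
        public String[] hostsForOffset(String path, long offset) {
            return new String[] {"localhost"};
        }
    }

    /** Split planner that sees only the interface, never the concrete class. */
    static List<String> describeSplits(ToyFileSystem fs, String path, long fileLength, long splitSize) {
        List<String> out = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            long length = Math.min(splitSize, fileLength - start);
            String[] hosts = fs.hostsForOffset(path, start);
            out.add(path + ":" + start + "+" + length + " " + Arrays.toString(hosts));
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] hosts = {{"dn1", "dn2"}, {"dn2", "dn3"}, {"dn1", "dn3"}};
        ToyFileSystem fs = new ToyDistributedFs(100, hosts);
        for (String line : describeSplits(fs, "/data/in.txt", 250, 100)) {
            System.out.println(line);
        }
    }
}
```

Swapping ToyDistributedFs for ToyLocalFs changes the hosts that come back but not a single line of describeSplits, which is the property the real abstract class is meant to provide.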


On Fri, Sep 27, 2013 at 7:02 PM, Peyman Mohajerian <mo...@gmail.com> wrote:

> For the JobClient to compute the input splits doesn't it need to contact
> Name Node. Only Name Node knows where the splits are, how can it compute it
> without that additional call?
>
>
> On Fri, Sep 27, 2013 at 1:41 AM, Sonal Goyal <so...@gmail.com> wrote:
>
>> The input splits are not copied, only the information on the location of
>> the splits is copied to the jobtracker so that it can assign tasktrackers
>> which are local to the split.
>>
>> Check the Job Initialization section at
>>
>> http://answers.oreilly.com/topic/459-anatomy-of-a-mapreduce-job-run-with-hadoop/
>>
>> To create the list of tasks to run, the job scheduler first retrieves
>> the input splits computed by the JobClient from the shared filesystem
>> (step 6). It then creates one map task for each split. The number of reduce
>> tasks to create is determined by the mapred.reduce.tasks property in the
>> JobConf, which is set by the setNumReduceTasks() method, and the
>> scheduler simply creates this number of reduce tasks to be run. Tasks are
>> given IDs at this point.
>>
>> Best Regards,
>> Sonal
>> Nube Technologies <http://www.nubetech.co>
>>
>>  <http://in.linkedin.com/in/sonalgoyal>
>>
>>
>>
>>
>> On Fri, Sep 27, 2013 at 10:55 AM, Sai Sai <sa...@yahoo.in> wrote:
>>
>>> Hi
>>> I have attached the anatomy of MR from definitive guide.
>>>
>>> In step 6 it says JT/Scheduler  retrieve  input splits computed by the
>>> client from hdfs.
>>>
>>> In the above line it refers to as the client computes input splits.
>>>
>>> 1. Why does the JT/Scheduler retrieve the input splits and what does it
>>> do.
>>> If it is retrieving the input split does this mean it goes to the block
>>> and reads each record
>>> and gets the record back to JT. If so this is a lot of data movement for
>>> large files.
>>> which is not data locality. so i m getting confused.
>>>
>>> 2. How does the client know how to calculate the input splits.
>>>
>>> Any help please.
>>> Thanks
>>> Sai
>>>
>>
>>
>


-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: Retrieve and compute input splits

Posted by Peyman Mohajerian <mo...@gmail.com>.

For the JobClient to compute the input splits, doesn't it need to contact
the NameNode? Only the NameNode knows where the splits are, so how can the
client compute them without that additional call?


On Fri, Sep 27, 2013 at 1:41 AM, Sonal Goyal <so...@gmail.com> wrote:

> The input splits are not copied, only the information on the location of
> the splits is copied to the jobtracker so that it can assign tasktrackers
> which are local to the split.
>
> Check the Job Initialization section at
>
> http://answers.oreilly.com/topic/459-anatomy-of-a-mapreduce-job-run-with-hadoop/
>
> To create the list of tasks to run, the job scheduler first retrieves the
> input splits computed by the JobClient from the shared filesystem (step
> 6). It then creates one map task for each split. The number of reduce tasks
> to create is determined by the mapred.reduce.tasks property in the JobConf,
> which is set by the setNumReduceTasks() method, and the scheduler simply
> creates this number of reduce tasks to be run. Tasks are given IDs at this
> point.
>
> Best Regards,
> Sonal
> Nube Technologies <http://www.nubetech.co>
>
>  <http://in.linkedin.com/in/sonalgoyal>
>
>
>
>
> On Fri, Sep 27, 2013 at 10:55 AM, Sai Sai <sa...@yahoo.in> wrote:
>
>> Hi
>> I have attached the anatomy of MR from definitive guide.
>>
>> In step 6 it says JT/Scheduler  retrieve  input splits computed by the
>> client from hdfs.
>>
>> In the above line it refers to as the client computes input splits.
>>
>> 1. Why does the JT/Scheduler retrieve the input splits and what does it
>> do.
>> If it is retrieving the input split does this mean it goes to the block
>> and reads each record
>> and gets the record back to JT. If so this is a lot of data movement for
>> large files.
>> which is not data locality. so i m getting confused.
>>
>> 2. How does the client know how to calculate the input splits.
>>
>> Any help please.
>> Thanks
>> Sai
>>
>
>

Re: Retrieve and compute input splits

Posted by Sonal Goyal <so...@gmail.com>.

The data in the input splits is not copied. Only the information on the
location of the splits is copied to the jobtracker, so that it can assign
tasktrackers which are local to each split.

Check the Job Initialization section at
http://answers.oreilly.com/topic/459-anatomy-of-a-mapreduce-job-run-with-hadoop/

To create the list of tasks to run, the job scheduler first retrieves the
input splits computed by the JobClient from the shared filesystem (step 6).
It then creates one map task for each split. The number of reduce tasks to
create is determined by the mapred.reduce.tasks property in the JobConf,
which is set by the setNumReduceTasks() method, and the scheduler simply
creates this number of reduce tasks to be run. Tasks are given IDs at this
point.
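
The task-creation step described in that passage can be sketched in a few lines of plain Java. This is a hypothetical illustration, not the scheduler's actual code: TaskPlanSketch, planTasks and the "map-N" / "reduce-N" labels are invented here, and real Hadoop task IDs look different. In a real job the reduce count would come from the JobConf, for example via setNumReduceTasks(2).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: mimics "one map task per split, N reduce tasks", not Hadoop code. */
class TaskPlanSketch {

    /** Builds task labels: one map task per split, then numReduceTasks reduce tasks. */
    static List<String> planTasks(int numSplits, int numReduceTasks) {
        List<String> tasks = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            tasks.add("map-" + i);      // hypothetical label, one per input split
        }
        for (int i = 0; i < numReduceTasks; i++) {
            tasks.add("reduce-" + i);   // hypothetical label, count set by the job
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Four splits and two reducers give six tasks in total.
        for (String task : planTasks(4, 2)) {
            System.out.println(task);
        }
    }
}
```

Note that the number of map tasks follows from the input (the split count), while the number of reduce tasks is a setting chosen by whoever configures the job.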

Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Fri, Sep 27, 2013 at 10:55 AM, Sai Sai <sa...@yahoo.in> wrote:

> Hi
> I have attached the anatomy of MR from definitive guide.
>
> In step 6 it says JT/Scheduler  retrieve  input splits computed by the
> client from hdfs.
>
> In the above line it refers to as the client computes input splits.
>
> 1. Why does the JT/Scheduler retrieve the input splits and what does it do.
> If it is retrieving the input split does this mean it goes to the block
> and reads each record
> and gets the record back to JT. If so this is a lot of data movement for
> large files.
> which is not data locality. so i m getting confused.
>
> 2. How does the client know how to calculate the input splits.
>
> Any help please.
> Thanks
> Sai
>

Re: Retrieve and compute input splits

Posted by Sonal Goyal <so...@gmail.com>.

The input splits are not copied, only the information on the location of
the splits is copied to the jobtracker so that it can assign tasktrackers
which are local to the split.

Check the Job Initialization section at
http://answers.oreilly.com/topic/459-anatomy-of-a-mapreduce-job-run-with-hadoop/

To create the list of tasks to run, the job scheduler first retrieves the
input splits computed by the JobClient from the shared filesystem (step 6).
It then creates one map task for each split. The number of reduce tasks to
create is determined by the mapred.reduce.tasks property in the JobConf,
which is set by the setNumReduceTasks() method, and the scheduler simply
creates this number of reduce tasks to be run. Tasks are given IDs at this
point.

Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Fri, Sep 27, 2013 at 10:55 AM, Sai Sai <sa...@yahoo.in> wrote:

> Hi
> I have attached the anatomy of MR from definitive guide.
>
> In step 6 it says JT/Scheduler  retrieve  input splits computed by the
> client from hdfs.
>
> In the above line it refers to as the client computes input splits.
>
> 1. Why does the JT/Scheduler retrieve the input splits and what does it do.
> If it is retrieving the input split does this mean it goes to the block
> and reads each record
> and gets the record back to JT. If so this is a lot of data movement for
> large files.
> which is not data locality. so i m getting confused.
>
> 2. How does the client know how to calculate the input splits.
>
> Any help please.
> Thanks
> Sai
>
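
A rough way to picture the job initialization step described above, as a
minimal plain-Java sketch. The names here (JobInitSketch, SplitInfo, the
/data/big.log path, the node names and the shortened task IDs) are made up
for illustration and are not Hadoop's own classes or ID format. The only
point is that the scheduler works from split metadata (path, offset, length,
preferred hosts) and never reads the records themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of job initialization; these are NOT Hadoop classes.
class JobInitSketch {

    // What the client stores for each split: a description of where the
    // data lives, never the records themselves.
    static final class SplitInfo {
        final String path;
        final long start;
        final long length;
        final List<String> hosts; // nodes assumed to hold a replica of the data

        SplitInfo(String path, long start, long length, List<String> hosts) {
            this.path = path;
            this.start = start;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // One map task per split, then the configured number of reduce tasks.
    // Task IDs are assigned at this point (simplified IDs, for illustration).
    static List<String> createTasks(List<SplitInfo> splits, int numReduceTasks) {
        List<String> tasks = new ArrayList<>();
        for (int i = 0; i < splits.size(); i++) {
            SplitInfo s = splits.get(i);
            tasks.add(String.format("m_%06d  %s[%d+%d]  prefers %s",
                    i, s.path, s.start, s.length, s.hosts));
        }
        for (int i = 0; i < numReduceTasks; i++) {
            tasks.add(String.format("r_%06d", i));
        }
        return tasks;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024; // assumed block size
        List<SplitInfo> splits = Arrays.asList(
                new SplitInfo("/data/big.log", 0L, block,
                        Arrays.asList("node1", "node2", "node3")),
                new SplitInfo("/data/big.log", block, block,
                        Arrays.asList("node2", "node4", "node5")));
        for (String task : createTasks(splits, 1)) {
            System.out.println(task);
        }
    }
}
```

The "prefers" hosts are what make data-local scheduling possible: the
records are only read later, by the map task running on (or near) one of
those nodes.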

Re: Retrieve and compute input splits

Posted by Sai Sai <sa...@yahoo.in>.

Hi,
I have attached the anatomy of an MR job run from the Definitive Guide.

Step 6 says that the JT/scheduler retrieves the input splits computed by the client from HDFS.

So, according to that line, it is the client that computes the input splits.


1. Why does the JT/scheduler retrieve the input splits, and what does it do with them?
If it retrieves the input splits, does that mean it goes to each block, reads every record,
and brings the records back to the JT? If so, that is a lot of data movement for large files,
which goes against data locality, so I am getting confused.

2. How does the client know how to calculate the input splits?

Any help would be appreciated.
Thanks,
Sai

Re: 2 Map tasks running for a small input file

Posted by Viji R <vi...@cloudera.com>.

Hi,

The default number of map tasks (the mapred.map.tasks property) is 2. You
can set mapred.map.tasks to 1 to avoid this.

Regards,
Viji

On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai <sa...@yahoo.in> wrote:
> Hi
> Here is the input file for the wordcount job:
> ******************
> Hi This is a simple test.
> Hi Hadoop how r u.
> Hello Hello.
> Hi Hi.
> Hadoop Hadoop Welcome.
> ******************
>
> After running the wordcount successfully
> here r the counters info:
>
> ***************
> Job Counters SLOTS_MILLIS_MAPS 0 0 8,386
> Launched reduce tasks 0 0 1
> Total time spent by all reduces waiting after reserving slots (ms) 0 0 0
> Total time spent by all maps waiting after reserving slots (ms) 0 0 0
> Launched map tasks 0 0 2
> Data-local map tasks 0 0 2
> SLOTS_MILLIS_REDUCES 0 0 9,199
> ***************
> My question why r there 2 launched map tasks when i have only a small file.
> Per my understanding it is only 1 block.
> and should be only 1 split.
> Then for each line a map computation should occur
> but it shows 2 map tasks.
> Please let me know.
> Thanks
> Sai
>
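
To see how a map-task hint of 2 can turn one tiny file into two splits, here
is a minimal plain-Java sketch of the split arithmetic. It is modelled on how
the old-API (org.apache.hadoop.mapred) FileInputFormat is usually described:
goal size = file length / hinted number of maps, split size =
max(minSize, min(goalSize, blockSize)), and a 10% slop on the last split.
Treat those details as assumptions and check the source for your Hadoop
version. As far as I recall, the new-API input formats ignore the hint. The
class name SplitCountSketch, the 88-byte length (my count of the sample file
with single-byte characters and Unix newlines) and the 64 MB block size are
illustrative.

```java
// Simplified model of the split arithmetic; not the Hadoop source itself.
class SplitCountSketch {

    // A trailing remainder up to 10% larger than splitSize stays in one split.
    static final double SPLIT_SLOP = 1.1;

    static long splitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits for a single non-empty file.
    static int countSplits(long fileLength, long blockSize, long minSize, int mapTasksHint) {
        if (fileLength <= 0 || blockSize <= 0 || minSize < 1) {
            throw new IllegalArgumentException("sketch only covers non-empty files");
        }
        long goalSize = fileLength / Math.max(1, mapTasksHint);
        long size = splitSize(goalSize, minSize, blockSize);
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / size > SPLIT_SLOP) {
            splits++;
            remaining -= size;
        }
        if (remaining != 0) {
            splits++; // whatever is left becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024; // assumed HDFS block size
        long tinyFile = 88L;            // assumed size of the sample input
        System.out.println("hint=2: " + countSplits(tinyFile, block, 1L, 2));
        System.out.println("hint=1: " + countSplits(tinyFile, block, 1L, 1));
    }
}
```

Under these assumptions the 88-byte file gives a goal size of 44 bytes with a
hint of 2, so it is cut into two 44-byte splits and two map tasks are
launched. With a hint of 1 the goal size is the whole file and there is a
single split.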

Re: 2 Map tasks running for a small input file

Posted by Sai Sai <sa...@yahoo.in>.

Hi
Here is the input file for the wordcount job:
******************

Hi This is a simple test.
Hi Hadoop how r u.
Hello Hello.
Hi Hi.
Hadoop Hadoop Welcome.
******************


After running the wordcount job successfully,
here are the counters:

***************
Job Counters                                                        Map  Reduce  Total
SLOTS_MILLIS_MAPS                                                     0       0  8,386
Launched reduce tasks                                                 0       0      1
Total time spent by all reduces waiting after reserving slots (ms)    0       0      0
Total time spent by all maps waiting after reserving slots (ms)       0       0      0
Launched map tasks                                                    0       0      2
Data-local map tasks                                                  0       0      2
SLOTS_MILLIS_REDUCES                                                  0       0  9,199
***************

My question: why are there 2 launched map tasks when I have only one small file?
As I understand it, the file is only 1 block,
so there should be only 1 split.
A map computation should then occur for each line,
but the counters show 2 map tasks.
Please let me know.
Thanks,
Sai
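
One thing that may help with the question above: a map task is not the same
as a map call. A task handles one split, while the map function is called
once per record (here, once per line) inside that task. The plain-Java
sketch below runs the sample input through a word count to show the
per-line calls. It is not the Hadoop WordCount job; the class name
WordCountSketch and its methods are made up for illustration, and it splits
on whitespace only, so "Hi" and "Hi." are counted as different words.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory word count; it only mimics the map/reduce flow.
class WordCountSketch {

    // "map": called once per input line, emits a (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    // "reduce": called once per distinct word, sums that word's values.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Stand-in for the shuffle: group the emitted values by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "Hi This is a simple test.",
                "Hi Hadoop how r u.",
                "Hello Hello.",
                "Hi Hi.",
                "Hadoop Hadoop Welcome.");
        System.out.println(run(input));
    }
}
```

With this input, map() is called five times (once per line) no matter how
many map tasks the lines are spread over.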

Re: Business Analysts in Hadoop World

Posted by Michael Aro <m....@gmail.com>.

Hi Vijay,

Scott Gnau of Teradata Labs mentioned something related at the recent
Hadoop Summit in San Jose. The title of his presentation was "Putting
Hadoop to Work in the Enterprise" and you can watch the video via this
link: http://hadoopsummit.org/san-jose/keynote-day1/. He talks about the
business analyst at around the 07:00 mark in the video. All the videos were great!

Mike.


On Fri, Jun 28, 2013 at 10:49 AM, <sa...@yahoo.com> wrote:

> Is Data scientist in hadoop same as BA in IT.
> Sent from BlackBerry® on Airtel
> ------------------------------
> From: Michael Forage <Mi...@livenation.co.uk>
> Date: Fri, 28 Jun 2013 13:57:24 +0000
> To: user@hadoop.apache.org<us...@hadoop.apache.org>
> ReplyTo: user@hadoop.apache.org
> Subject: RE: Business Analysts in Hadoop World
>
> Hi Vijay
>
> I’m afraid I’m not experienced or specialised enough in either Hadoop or
> the broader Big data industry to give any advice on career paths
>
> Obviously different organisations expect completely different levels of
> technical contribution from their Business Analysts but in my experience
> the ability to collect, evaluate and interpret business requirements into
> some kind of functional specification document is key. The responsibility
> for subsequent technical specifications based applied knowledge of the big
> data tools at disposal will probably sit with your Solution Architect.
> Often a BA wouldn’t care how a solution is implemented under the covers.
> However, in the same way that a BA may define user-flows in a use-case they
> could outline a conceptual data processing flow if they understand the
> source data and requirements well enough.
>
> There are so many new tools and technologies in this space already that,
> unless you have a specific requirement to meet, it can be pretty
> overwhelming. I’d just start by concentrating on getting an understanding
> of map reduce concepts. Sure, it always helps if you’ve the time to get
> some hands-on technical experience but that’s not a trivial undertaking
> from a standing start and it may be completely irrelevant for you in the
> long run as that’s not what a BA is paid to do.
>
> Sorry I can’t be more help
> Mike
>
> From: Vijaya Narayana Reddy Bhoomi Reddy [mailto:vijaya.bhoomi@huawei.com]
> Sent: 28 June 2013 14:10
> To: user@hadoop.apache.org
> Subject: RE: Business Analysts in Hadoop World
>
> Michael,
>
> Thanks for your advice. I am just confused because I could not see a clear
> career path for Business Analysts in the Hadoop world. May be because, it’s
> an evolving field or maybe I am not yet aware of the same despite reading
> some primary content on the internet. However, I firmly believe the Big
> Data space is going to be very big in the next years and I would like to be
> part of it and contribute. I would like to know from you more on the role
> and responsibilities a BA can perform in this space and the possible areas
> / technologies which I need myself to be prepared for.
>
> Thanks
> Vijay
>
> From: Michael Forage [mailto:Michael.Forage@livenation.co.uk<Mi...@livenation.co.uk>]
> Sent: Friday, June 28, 2013 5:23 PM
> To: user@hadoop.apache.org
> Subject: RE: Business Analysts in Hadoop World
>
> Hi Vijay
>
> My advice is to carefully consider the scope of the role you’re aiming for
>
> As a BA I expect that you’d be able to add value by understanding your
> business data processing challenges and turning them into specifications
> for map/reduce jobs. This doesn’t require you to have any Java coding
> skills as such, just a good handle of the map/reduce concepts. It also
> helps if you understand common use-cases associated with these technologies
> (as it’s really an ecosystem of related toolsets) as well as what they’re
> not so good for.
>
> This would allow you to contribute to solution design on behalf of the
> business but assumes you’re not concerned with the actual
> implementation/administration side of things. Definitely only bother
> re-learning Java if you’re going to be the one writing the code. You do
> need to have had experience of working with data of some kind (this is a
> data processing environment at the end of the day) but simply reading a
> decent Hadoop book and googling a few websites should give you a decent
> enough background as a BA
>
> Cheers
> Mike
>
> From: Lokesh Basu [mailto:lokesh.basu@gmail.com <lo...@gmail.com>]
> Sent: 28 June 2013 12:35
> To: user@hadoop.apache.org
> Subject: Re: Business Analysts in Hadoop World
>
> Dear Vijay,
>
> If you are a beginner in the open source project then I would recommend
> you to first get familiar with Java and some version control system and
> then go to this (http://wiki.apache.org/hadoop/HowToContribute) page.
>
> If you are not aware of the Hadoop project then you should go through some
> online text/videos to get the insight.
>
> Best of luck
>
> Lokesh Chandra Basu
> B. Tech
> Computer Science and Engineering
> Indian Institute of Technology, Roorkee
> India(GMT +5hr 30min)
>
> On Fri, Jun 28, 2013 at 4:50 PM, Vijaya Narayana Reddy Bhoomi Reddy <
> vijaya.bhoomi@huawei.com> wrote:
>
> Hi,
>
> I am just trying to get myself acquainted with Hadoop and other related
> technologies. I am very much fascinated with the potential of the Big Data
> world and hence would like to be part of it!!
>
> However, it has been a while I have done any coding. Earlier for a brief
> period of time during early days of my career, I have done some work in
> Java. All these days, I am working as a Business Analyst in the CRM space.
>
> · Before I start exploring Hadoop world, I would like to hear
> your thoughts on the following queries:
>
> · Being a business analyst, what would be the possible career
> opportunities in the Hadoop space?
>
> · Is it necessary to have a strong technical background before
> jumping into Hadoop? If so, which technologies need to be learnt primarily?
> Java, SQL etc?
>
> · What are the various certifications available in the Hadoop
> world? Are there any certifications for Business Analysts?
>
> Please let me know your valuable thoughts.
>
> Thanks
> Vijay
>
> Michael Forage | Solutions Architect - Insight Services
> Email: Michael.Forage@livenation.co.uk | Tel: +44 207 980 4362 | Mob: +44
> 7808 174404
> Address: 4 Pentonville Road | London | N1 9HF | United Kingdom
>
> Live Nation Limited, Registered Office: 2nd Floor, Regent Arcade House
> 19-25 Argyll Street, London, W1F 7TS. Company Number 03805556. Registered
> in England and Wales.
>
> This message is confidential and may be legally privileged or otherwise
> protected from disclosure. If you are not the intended recipient, please
> telephone or email the sender and delete this message and any attachment
> from your system; you must not copy or disclose the contents of this
> message or any attachment to any other person.
>

Re: Business Analysts in Hadoop World

Posted by sa...@yahoo.com.

Is a data scientist in Hadoop the same as a BA in IT?

Sent from BlackBerry® on Airtel

-----Original Message-----
From: Michael Forage <Mi...@livenation.co.uk>
Date: Fri, 28 Jun 2013 13:57:24 
To: user@hadoop.apache.org<us...@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: RE: Business Analysts in Hadoop World

Hi Vijay

I'm afraid I'm not experienced or specialised enough in either Hadoop or the broader Big data industry to give any advice on career paths

Obviously different organisations expect completely different levels of technical contribution from their Business Analysts but in my experience the ability to collect, evaluate and interpret business requirements into some kind of functional specification document is key. The responsibility for subsequent technical specifications based applied knowledge of the big data tools at disposal will probably sit with your Solution Architect. Often a BA wouldn't care how a solution is implemented under the covers. However, in the same way that a BA may define user-flows in a use-case they could outline a conceptual data processing flow if they understand the source data and requirements well enough.

There are so many new tools and technologies in this space already that, unless you have a specific requirement to meet, it can be pretty overwhelming. I'd just start by concentrating on getting an understanding of map reduce concepts. Sure, it always helps if you've the time to get some hands-on technical  experience but that's not a trivial undertaking from a standing start and it may be completely irrelevant for you in the long run as that's not what a BA is paid to do.

Sorry I can't be more help
Mike

From: Vijaya Narayana Reddy Bhoomi Reddy [mailto:vijaya.bhoomi@huawei.com]
Sent: 28 June 2013 14:10
To: user@hadoop.apache.org
Subject: RE: Business Analysts in Hadoop World

Michael,

Thanks for your advice. I am just confused because I could not see a clear career path for Business Analysts in the Hadoop world. May be because, it's an evolving field or maybe I am not yet aware of the same despite reading some primary content on the internet. However, I firmly believe the Big Data space is going to be very big in the next years and I would like to be part of it and contribute. I would like to know from you more on the role and responsibilities a BA can perform in this space and the possible areas / technologies which I need myself to be prepared for.

Thanks
Vijay

From: Michael Forage [mailto:Michael.Forage@livenation.co.uk]
Sent: Friday, June 28, 2013 5:23 PM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: Business Analysts in Hadoop World

Hi Vijay

My advice is to carefully consider the scope of the role you're aiming for

As a BA I expect that you'd be able to add value by understanding your business data processing challenges and turning them into specifications for map/reduce jobs. This doesn't require you to have any Java coding skills as such, just a good handle of the map/reduce concepts. It also helps if you understand common use-cases associated with these technologies (as it's really an ecosystem of related toolsets) as well as what they're not so good for.

This would allow you to contribute to solution design on behalf of the business but assumes you're not concerned with the actual implementation/administration side of things. Definitely only bother re-learning Java if you're going to be the one writing the code. You do need to have had experience of working with data of some kind (this is a data processing environment at the end of the day) but simply reading a decent Hadoop book and googling a few websites should give you a decent enough background as a BA

Cheers
Mike

From: Lokesh Basu [mailto:lokesh.basu@gmail.com]
Sent: 28 June 2013 12:35
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: Business Analysts in Hadoop World

Dear Vijay,

If you are a beginner in the open source project then I would recommend you to first get familiar with Java and some version control system and then go to this (http://wiki.apache.org/hadoop/HowToContribute) page.

If you are not aware of the Hadoop project then you should go through some online text/videos to get the insight.

Best of luck

Lokesh Chandra Basu
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India(GMT +5hr 30min)

On Fri, Jun 28, 2013 at 4:50 PM, Vijaya Narayana Reddy Bhoomi Reddy <vi...@huawei.com>> wrote:
Hi,

I am just trying to get myself acquainted with Hadoop and other related technologies. I am very much fascinated with the potential of the Big Data world and hence would like to be part of it!!
However, it has been a while I have done any coding. Earlier for a brief period of time during early days of my career, I have done some work in Java. All these days, I am working as a Business Analyst in the CRM space.

*         Before I start exploring Hadoop world, I would like to hear your thoughts on the following queries:

*         Being a business analyst, what would be the possible career opportunities in the Hadoop space?

*         Is it necessary to have a strong technical background before jumping into Hadoop? If so, which technologies need to be learnt primarily? Java, SQL etc?

*         What are the various certifications available in the Hadoop world? Are there any certifications for Business Analysts?

Please let me know your valuable thoughts.

Thanks
Vijay

Michael Forage | Solutions Architect - Insight Services
Email: Michael.Forage@livenation.co.uk<ma...@livenation.co.uk> | Tel: +44 207 980 4362 | Mob: +44 7808 174404
Address: 4 Pentonville Road | London | N1 9HF | United Kingdom

Live Nation Limited, Registered Office: 2nd Floor, Regent Arcade House 19-25 Argyll Street, London, W1F 7TS. Company Number 03805556. Registered in England and Wales.

This message is confidential and may be legally privileged or otherwise protected from disclosure. If you are not the intended recipient, please telephone or email the sender and delete this message and any attachment from your system; you must not copy or disclose the contents of this message or any attachment to any other person.

Re: Business Analysts in Hadoop World

Posted by sa...@yahoo.com.

Is a data scientist in Hadoop the same as a BA in IT?

-----Original Message-----
From: Michael Forage <Mi...@livenation.co.uk>
Date: Fri, 28 Jun 2013 13:57:24 
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: RE: Business Analysts in Hadoop World

Hi Vijay

I'm afraid I'm not experienced or specialised enough in either Hadoop or the broader Big Data industry to give any advice on career paths.

Obviously different organisations expect completely different levels of technical contribution from their Business Analysts, but in my experience the ability to collect, evaluate and interpret business requirements into some kind of functional specification document is key. The responsibility for subsequent technical specifications, based on applied knowledge of the big data tools available, will probably sit with your Solution Architect. Often a BA wouldn't care how a solution is implemented under the covers. However, in the same way that a BA may define user-flows in a use-case, they could outline a conceptual data processing flow if they understand the source data and requirements well enough.

There are so many new tools and technologies in this space already that, unless you have a specific requirement to meet, it can be pretty overwhelming. I'd just start by concentrating on getting an understanding of map/reduce concepts. Sure, it always helps if you have the time to get some hands-on technical experience, but that's not a trivial undertaking from a standing start, and it may be completely irrelevant for you in the long run as that's not what a BA is paid to do.

Sorry I can't be more help
Mike

From: Vijaya Narayana Reddy Bhoomi Reddy [mailto:vijaya.bhoomi@huawei.com]
Sent: 28 June 2013 14:10
To: user@hadoop.apache.org
Subject: RE: Business Analysts in Hadoop World

Michael,

Thanks for your advice. I am just confused because I could not see a clear career path for Business Analysts in the Hadoop world. Maybe that is because it's an evolving field, or maybe I am not yet aware of one despite reading some introductory content on the internet. However, I firmly believe the Big Data space is going to be very big in the coming years, and I would like to be part of it and contribute. I would like to hear more from you on the role and responsibilities a BA can take on in this space, and on the areas / technologies I need to prepare myself for.

Thanks
Vijay

From: Michael Forage [mailto:Michael.Forage@livenation.co.uk]
Sent: Friday, June 28, 2013 5:23 PM
To: user@hadoop.apache.org
Subject: RE: Business Analysts in Hadoop World

Hi Vijay

My advice is to carefully consider the scope of the role you're aiming for.

As a BA I expect that you'd be able to add value by understanding your business data processing challenges and turning them into specifications for map/reduce jobs. This doesn't require you to have any Java coding skills as such, just a good grasp of the map/reduce concepts. It also helps if you understand the common use-cases associated with these technologies (as it's really an ecosystem of related toolsets) as well as what they're not so good for.

This would allow you to contribute to solution design on behalf of the business, but it assumes you're not concerned with the actual implementation/administration side of things. Only bother re-learning Java if you're going to be the one writing the code. You do need some experience of working with data (this is a data processing environment at the end of the day), but simply reading a decent Hadoop book and googling a few websites should give you enough background as a BA.

Cheers
Mike

From: Lokesh Basu [mailto:lokesh.basu@gmail.com]
Sent: 28 June 2013 12:35
To: user@hadoop.apache.org
Subject: Re: Business Analysts in Hadoop World

Dear Vijay,

If you are new to open source projects, I would recommend that you first get familiar with Java and a version control system, and then go to this page: http://wiki.apache.org/hadoop/HowToContribute

If you are not yet familiar with the Hadoop project, you should go through some online articles and videos to get an overview.

Best of luck

Lokesh Chandra Basu
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India(GMT +5hr 30min)

On Fri, Jun 28, 2013 at 4:50 PM, Vijaya Narayana Reddy Bhoomi Reddy <vi...@huawei.com> wrote:
Hi,

I am trying to get acquainted with Hadoop and related technologies. I am fascinated by the potential of the Big Data world and would like to be part of it!
However, it has been a while since I have done any coding. Early in my career I did some work in Java for a brief period. These days I work as a Business Analyst in the CRM space.

Before I start exploring the Hadoop world, I would like to hear your thoughts on the following queries:

*         Being a business analyst, what would be the possible career opportunities in the Hadoop space?

*         Is it necessary to have a strong technical background before jumping into Hadoop? If so, which technologies need to be learnt primarily? Java, SQL etc?

*         What are the various certifications available in the Hadoop world? Are there any certifications for Business Analysts?

Please let me know your valuable thoughts.

Thanks
Vijay

RE: Business Analysts in Hadoop World

Posted by Michael Forage <Mi...@livenation.co.uk>.

Hi Vijay

I'm afraid I'm not experienced or specialised enough in either Hadoop or the broader Big Data industry to give any advice on career paths.

Obviously different organisations expect completely different levels of technical contribution from their Business Analysts, but in my experience the ability to collect, evaluate and interpret business requirements into some kind of functional specification document is key. The responsibility for subsequent technical specifications, based on applied knowledge of the big data tools available, will probably sit with your Solution Architect. Often a BA wouldn't care how a solution is implemented under the covers. However, in the same way that a BA may define user-flows in a use-case, they could outline a conceptual data processing flow if they understand the source data and requirements well enough.

There are so many new tools and technologies in this space already that, unless you have a specific requirement to meet, it can be pretty overwhelming. I'd just start by concentrating on getting an understanding of map/reduce concepts. Sure, it always helps if you have the time to get some hands-on technical experience, but that's not a trivial undertaking from a standing start, and it may be completely irrelevant for you in the long run as that's not what a BA is paid to do.

Sorry I can't be more help
Mike

From: Vijaya Narayana Reddy Bhoomi Reddy [mailto:vijaya.bhoomi@huawei.com]
Sent: 28 June 2013 14:10
To: user@hadoop.apache.org
Subject: RE: Business Analysts in Hadoop World

Michael,

Thanks for your advice. I am just confused because I could not see a clear career path for Business Analysts in the Hadoop world. Maybe that is because it's an evolving field, or maybe I am not yet aware of one despite reading some introductory content on the internet. However, I firmly believe the Big Data space is going to be very big in the coming years, and I would like to be part of it and contribute. I would like to hear more from you on the role and responsibilities a BA can take on in this space, and on the areas / technologies I need to prepare myself for.

Thanks
Vijay

From: Michael Forage [mailto:Michael.Forage@livenation.co.uk]
Sent: Friday, June 28, 2013 5:23 PM
To: user@hadoop.apache.org
Subject: RE: Business Analysts in Hadoop World

Hi Vijay

My advice is to carefully consider the scope of the role you're aiming for.

As a BA I expect that you'd be able to add value by understanding your business data processing challenges and turning them into specifications for map/reduce jobs. This doesn't require you to have any Java coding skills as such, just a good grasp of the map/reduce concepts. It also helps if you understand the common use-cases associated with these technologies (as it's really an ecosystem of related toolsets) as well as what they're not so good for.

This would allow you to contribute to solution design on behalf of the business, but it assumes you're not concerned with the actual implementation/administration side of things. Only bother re-learning Java if you're going to be the one writing the code. You do need some experience of working with data (this is a data processing environment at the end of the day), but simply reading a decent Hadoop book and googling a few websites should give you enough background as a BA.

Cheers
Mike

From: Lokesh Basu [mailto:lokesh.basu@gmail.com]
Sent: 28 June 2013 12:35
To: user@hadoop.apache.org
Subject: Re: Business Analysts in Hadoop World

Dear Vijay,

If you are new to open source projects, I would recommend that you first get familiar with Java and a version control system, and then go to this page: http://wiki.apache.org/hadoop/HowToContribute

If you are not yet familiar with the Hadoop project, you should go through some online articles and videos to get an overview.

Best of luck


Lokesh Chandra Basu
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India(GMT +5hr 30min)



On Fri, Jun 28, 2013 at 4:50 PM, Vijaya Narayana Reddy Bhoomi Reddy <vi...@huawei.com> wrote:
Hi,

I am trying to get acquainted with Hadoop and related technologies. I am fascinated by the potential of the Big Data world and would like to be part of it!
However, it has been a while since I have done any coding. Early in my career I did some work in Java for a brief period. These days I work as a Business Analyst in the CRM space.

Before I start exploring the Hadoop world, I would like to hear your thoughts on the following queries:


*         Being a business analyst, what would be the possible career opportunities in the Hadoop space?


*         Is it necessary to have a strong technical background before jumping into Hadoop? If so, which technologies need to be learnt primarily? Java, SQL etc?


*         What are the various certifications available in the Hadoop world? Are there any certifications for Business Analysts?

Please let me know your valuable thoughts.

Thanks
Vijay


RE: Business Analysts in Hadoop World

Posted by John Lilley <jo...@redpoint.net>.

Hadoop does not yet have an easy learning curve, so I'd recommend starting with Amazon Elastic MapReduce as an experimental platform for learning.
John

From: Vijaya Narayana Reddy Bhoomi Reddy [mailto:vijaya.bhoomi@huawei.com]
Sent: Friday, June 28, 2013 7:10 AM
To: user@hadoop.apache.org
Subject: RE: Business Analysts in Hadoop World

Michael,

Thanks for your advice. I am just confused because I could not see a clear career path for Business Analysts in the Hadoop world. Maybe that is because it's an evolving field, or maybe I am not yet aware of one despite reading some introductory content on the internet. However, I firmly believe the Big Data space is going to be very big in the coming years, and I would like to be part of it and contribute. I would like to hear more from you on the role and responsibilities a BA can take on in this space, and on the areas / technologies I need to prepare myself for.

Thanks
Vijay

From: Michael Forage [mailto:Michael.Forage@livenation.co.uk]
Sent: Friday, June 28, 2013 5:23 PM
To: user@hadoop.apache.org
Subject: RE: Business Analysts in Hadoop World

Hi Vijay

My advice is to carefully consider the scope of the role you're aiming for.

As a BA I expect that you'd be able to add value by understanding your business data processing challenges and turning them into specifications for map/reduce jobs. This doesn't require you to have any Java coding skills as such, just a good grasp of the map/reduce concepts. It also helps if you understand the common use-cases associated with these technologies (as it's really an ecosystem of related toolsets) as well as what they're not so good for.

This would allow you to contribute to solution design on behalf of the business, but it assumes you're not concerned with the actual implementation/administration side of things. Only bother re-learning Java if you're going to be the one writing the code. You do need some experience of working with data (this is a data processing environment at the end of the day), but simply reading a decent Hadoop book and googling a few websites should give you enough background as a BA.

Cheers
Mike

From: Lokesh Basu [mailto:lokesh.basu@gmail.com]
Sent: 28 June 2013 12:35
To: user@hadoop.apache.org
Subject: Re: Business Analysts in Hadoop World

Dear Vijay,

If you are new to open source projects, I would recommend first getting familiar with Java and a version control system, and then reading this page: http://wiki.apache.org/hadoop/HowToContribute

If you are not yet familiar with the Hadoop project, you should go through some online texts/videos first to get an overview.

Best of luck

Lokesh Chandra Basu
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India(GMT +5hr 30min)

On Fri, Jun 28, 2013 at 4:50 PM, Vijaya Narayana Reddy Bhoomi Reddy <vi...@huawei.com> wrote:
Hi,

I am just trying to get myself acquainted with Hadoop and other related technologies. I am very much fascinated by the potential of the Big Data world and hence would like to be part of it!
However, it has been a while since I have done any coding. For a brief period early in my career, I did some work in Java. Since then, I have been working as a Business Analyst in the CRM space.

Before I start exploring the Hadoop world, I would like to hear your thoughts on the following queries:

*         Being a business analyst, what would be the possible career opportunities in the Hadoop space?

*         Is it necessary to have a strong technical background before jumping into Hadoop? If so, which technologies need to be learnt first? Java, SQL, etc.?

*         What are the various certifications available in the Hadoop world? Are there any certifications for Business Analysts?

Please let me know your valuable thoughts.

Thanks
Vijay

Michael Forage | Solutions Architect - Insight Services
Email: Michael.Forage@livenation.co.uk | Tel: +44 207 980 4362 | Mob: +44 7808 174404
Address: 4 Pentonville Road | London | N1 9HF | United Kingdom

Live Nation Limited, Registered Office: 2nd Floor, Regent Arcade House 19-25 Argyll Street, London, W1F 7TS. Company Number 03805556. Registered in England and Wales.

RE: Business Analysts in Hadoop World

Posted by Michael Forage <Mi...@livenation.co.uk>.

Hi Vijay

I'm afraid I'm not experienced or specialised enough in either Hadoop or the broader Big Data industry to give any advice on career paths.

Obviously, different organisations expect completely different levels of technical contribution from their Business Analysts, but in my experience the ability to collect, evaluate and translate business requirements into some kind of functional specification document is key. The responsibility for subsequent technical specifications, based on applied knowledge of the big data tools at your disposal, will probably sit with your Solution Architect. Often a BA wouldn't care how a solution is implemented under the covers. However, in the same way that a BA may define user flows in a use case, they could outline a conceptual data processing flow if they understand the source data and requirements well enough.

There are so many new tools and technologies in this space already that, unless you have a specific requirement to meet, it can be pretty overwhelming. I'd just start by concentrating on getting an understanding of map/reduce concepts. Sure, it always helps if you have the time to get some hands-on technical experience, but that's not a trivial undertaking from a standing start, and it may be completely irrelevant for you in the long run, as that's not what a BA is paid to do.

Sorry I can't be more help
Mike

Michael Forage | Solutions Architect - Insight Services
Email: Michael.Forage@livenation.co.uk | Tel: +44 207 980 4362 | Mob: +44 7808 174404
Address: 4 Pentonville Road | London | N1 9HF | United Kingdom

Live Nation Limited, Registered Office: 2nd Floor, Regent Arcade House 19-25 Argyll Street, London, W1F 7TS. Company Number 03805556. Registered in England and Wales.

RE: Business Analysts in Hadoop World

Posted by Vijaya Narayana Reddy Bhoomi Reddy <vi...@huawei.com>.

Michael,

Thanks for your advice. I am just confused because I could not see a clear career path for Business Analysts in the Hadoop world. Maybe that is because it's an evolving field, or maybe I am not yet aware of one despite reading some primary content on the internet. However, I firmly believe the Big Data space is going to be very big in the next few years, and I would like to be part of it and contribute. I would like to hear more from you about the roles and responsibilities a BA can take on in this space and the areas / technologies I need to prepare myself for.

Thanks
Vijay

RE: Business Analysts in Hadoop World

Posted by Vijaya Narayana Reddy Bhoomi Reddy <vi...@huawei.com>.

Michael,

Thanks for your advice. I am just confused because I could not see a clear career path for Business Analysts in the Hadoop world. May be because, it's an evolving field or maybe I am not yet aware of the same despite reading some primary content on the internet. However, I firmly believe the Big Data space is going to be very big in the next years and I would like to be part of it and contribute. I would like to know from you more on the role and responsibilities a BA can perform in this space and the possible areas / technologies which I need myself to be prepared for.

Thanks
Vijay

From: Michael Forage [mailto:Michael.Forage@livenation.co.uk]
Sent: Friday, June 28, 2013 5:23 PM
To: user@hadoop.apache.org
Subject: RE: Business Analysts in Hadoop World

Hi Vijay

My advice is to carefully consider the scope of the role you're aiming for

As a BA I expect that you'd be able to add value by understanding your business data processing challenges and turning them into specifications for map/reduce jobs. This doesn't require you to have any Java coding skills as such, just a good handle of the map/reduce concepts. It also helps if you understand common use-cases associated with these technologies (as it's really an ecosystem of related toolsets) as well as what they're not so good for.

This would allow you to contribute to solution design on behalf of the business but assumes you're not concerned with the actual implementation/administration side of things. Definitely only bother re-learning Java if you're going to be the one writing the code. You do need to have had experience of working with data of some kind (this is a data processing environment at the end of the day) but simply reading a decent Hadoop book and googling a few websites should give you a decent enough background as a BA

Cheers
Mike

From: Lokesh Basu [mailto:lokesh.basu@gmail.com]
Sent: 28 June 2013 12:35
To: user@hadoop.apache.org
Subject: Re: Business Analysts in Hadoop World

Dear Vijay,

If you are new to open source projects, I would recommend that you first get familiar with Java and a version control system, and then go to this page (http://wiki.apache.org/hadoop/HowToContribute).

If you are not yet familiar with the Hadoop project, you should go through some online texts/videos to get an overview.

Best of luck

Lokesh Chandra Basu
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India(GMT +5hr 30min)

On Fri, Jun 28, 2013 at 4:50 PM, Vijaya Narayana Reddy Bhoomi Reddy <vi...@huawei.com> wrote:
Hi,

I am just trying to get myself acquainted with Hadoop and other related technologies. I am very much fascinated with the potential of the Big Data world and hence would like to be part of it!!
However, it has been a while I have done any coding. Earlier for a brief period of time during early days of my career, I have done some work in Java. All these days, I am working as a Business Analyst in the CRM space.

Before I start exploring Hadoop world, I would like to hear your thoughts on the following queries:

* Being a business analyst, what would be the possible career opportunities in the Hadoop space?

* Is it necessary to have a strong technical background before jumping into Hadoop? If so, which technologies need to be learnt primarily? Java, SQL etc?

* What are the various certifications available in the Hadoop world? Are there any certifications for Business Analysts?

Please let me know your valuable thoughts.

Thanks
Vijay

Michael Forage | Solutions Architect - Insight Services
Email: Michael.Forage@livenation.co.uk | Tel: +44 207 980 4362 | Mob: +44 7808 174404
Address: 4 Pentonville Road | London | N1 9HF | United Kingdom

Live Nation Limited, Registered Office: 2nd Floor, Regent Arcade House 19-25 Argyll Street, London, W1F 7TS. Company Number 03805556. Registered in England and Wales.


RE: Business Analysts in Hadoop World

Posted by Michael Forage <Mi...@livenation.co.uk>.

Hi Vijay

My advice is to carefully consider the scope of the role you're aiming for.

As a BA, I expect that you'd be able to add value by understanding your business data processing challenges and turning them into specifications for map/reduce jobs. This doesn't require you to have any Java coding skills as such, just a good handle on the map/reduce concepts. It also helps if you understand the common use cases associated with these technologies (it's really an ecosystem of related toolsets), as well as what they're not so good for.

This would allow you to contribute to solution design on behalf of the business, but it assumes you're not concerned with the actual implementation/administration side of things. Definitely only bother re-learning Java if you're going to be the one writing the code. You do need some experience of working with data of some kind (this is a data processing environment at the end of the day), but simply reading a decent Hadoop book and googling a few websites should give you a good enough background as a BA.

Cheers
Mike

From: Lokesh Basu [mailto:lokesh.basu@gmail.com]
Sent: 28 June 2013 12:35
To: user@hadoop.apache.org
Subject: Re: Business Analysts in Hadoop World

Dear Vijay,

If you are new to open source projects, I would recommend that you first get familiar with Java and a version control system, and then go to this page (http://wiki.apache.org/hadoop/HowToContribute).

If you are not yet familiar with the Hadoop project, you should go through some online texts/videos to get an overview.

Best of luck


Lokesh Chandra Basu
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India(GMT +5hr 30min)



On Fri, Jun 28, 2013 at 4:50 PM, Vijaya Narayana Reddy Bhoomi Reddy <vi...@huawei.com> wrote:
Hi,

I am just trying to get myself acquainted with Hadoop and other related technologies. I am very much fascinated with the potential of the Big Data world and hence would like to be part of it!!
However, it has been a while I have done any coding. Earlier for a brief period of time during early days of my career, I have done some work in Java. All these days, I am working as a Business Analyst in the CRM space.


Before I start exploring Hadoop world, I would like to hear your thoughts on the following queries:

* Being a business analyst, what would be the possible career opportunities in the Hadoop space?

* Is it necessary to have a strong technical background before jumping into Hadoop? If so, which technologies need to be learnt primarily? Java, SQL etc?

* What are the various certifications available in the Hadoop world? Are there any certifications for Business Analysts?

Please let me know your valuable thoughts.

Thanks
Vijay


Michael Forage | Solutions Architect - Insight Services
Email: Michael.Forage@livenation.co.uk | Tel: +44 207 980 4362 | Mob: +44 7808 174404
Address: 4 Pentonville Road | London | N1 9HF | United Kingdom

Live Nation Limited, Registered Office: 2nd Floor, Regent Arcade House 19-25 Argyll Street, London, W1F 7TS. Company Number 03805556. Registered in England and Wales.


Re: Business Analysts in Hadoop World

Posted by Lokesh Basu <lo...@gmail.com>.

Dear Vijay,

If you are new to open source projects, I would recommend that you first
get familiar with Java and a version control system, and then go to this
page (http://wiki.apache.org/hadoop/HowToContribute).

If you are not yet familiar with the Hadoop project, you should go through
some online texts/videos to get an overview.

Best of luck

*Lokesh Chandra Basu*
B. Tech
Computer Science and Engineering
Indian Institute of Technology, Roorkee
India(GMT +5hr 30min)

On Fri, Jun 28, 2013 at 4:50 PM, Vijaya Narayana Reddy Bhoomi Reddy <
vijaya.bhoomi@huawei.com> wrote:

> Hi,
>
> I am just trying to get myself acquainted with Hadoop and other related
> technologies. I am very much fascinated with the potential of the Big Data
> world and hence would like to be part of it!!
>
> However, it has been a while I have done any coding. Earlier for a brief
> period of time during early days of my career, I have done some work in
> Java. All these days, I am working as a Business Analyst in the CRM space.
>
> Before I start exploring Hadoop world, I would like to hear
> your thoughts on the following queries:
>
> * Being a business analyst, what would be the possible career
>   opportunities in the Hadoop space?
>
> * Is it necessary to have a strong technical background
>   before jumping into Hadoop? If so, which technologies need to be learnt
>   primarily? Java, SQL etc?
>
> * What are the various certifications available in the Hadoop
>   world? Are there any certifications for Business Analysts?
>
> Please let me know your valuable thoughts.
>
> Thanks
> Vijay
