Posted to user@hadoop.apache.org by Raghavendra Chandra <ra...@gmail.com> on 2014/11/01 16:45:33 UTC

Doubts in Map reduce programs

Hi There,

I have a couple of doubts about Hadoop. It would be really helpful if anyone
could answer these questions or, if they have already been answered somewhere,
share the link.

Below are my doubts:

1. How can I count the number of paragraphs in a text file using Java
MapReduce?

2. How can I count the number of sentences in a paragraph/file using Java
MapReduce?

Please let me know where I can find a list of MapReduce programs covering
different use cases.

Looking forward to your responses.

Re: Doubts in Map reduce programs

Posted by Pumudu ruhunage <pu...@gmail.com>.
Hi,

Hadoop itself ships with some great MapReduce samples. Have you seen them?
If you have Hadoop 2.2.0, go to {hadoop_base}/share/hadoop/mapreduce and you
will find a bunch of sample MapReduce programs. The directory may differ in
other versions of Hadoop.

Regards,
Pumudu

On 1 November 2014 21:23, Shahab Yunus <sh...@gmail.com> wrote:

> One way I can think of is to define your own InputFormat and RecordReader
> so that each record is a paragraph or a sentence. The reason is that, in
> the regular case, TextInputFormat (the default FileInputFormat) treats each
> line terminated by standard end-of-line characters as one record. Here, you
> want one paragraph per record instead of one line. Once you override the
> RecordReader, you control how a 'record' passed to the mapper is defined.
>
> Some starting points for defining and implementing your own RecordReader
> for FileInputFormat:
>
> http://bigdatacircus.com/2012/08/01/wordcount-with-custom-record-reader-of-textinputformat/
> http://www.infoq.com/articles/HadoopInputFormat
>
> http://hadoopi.wordpress.com/2013/05/31/custom-recordreader-processing-string-pattern-delimited-records/
>
> Regards,
> Shahab
>

Re: Doubts in Map reduce programs

Posted by Shahab Yunus <sh...@gmail.com>.
One way I can think of is to define your own InputFormat and RecordReader so
that each record is a paragraph or a sentence. The reason is that, in the
regular case, TextInputFormat (the default FileInputFormat) treats each line
terminated by standard end-of-line characters as one record. Here, you want
one paragraph per record instead of one line. Once you override the
RecordReader, you control how a 'record' passed to the mapper is defined.
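
To make the idea concrete, here is a minimal standalone sketch of the
record-boundary logic such a RecordReader would need. It deliberately uses no
Hadoop classes, so it runs as plain Java. The class name ParagraphRecords and
the rule that one or more blank lines separate paragraphs are assumptions made
for illustration, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch (no Hadoop classes): groups lines into paragraph "records"
// the way a custom RecordReader might, treating blank lines as separators.
class ParagraphRecords {

    // Splits text into paragraphs; one or more blank lines end a paragraph.
    static List<String> split(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : text.split("\\r?\\n", -1)) {
            if (line.trim().isEmpty()) {
                // Blank line: emit the paragraph collected so far, if any.
                if (current.length() > 0) {
                    records.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Non-blank line: append it to the current paragraph.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(line.trim());
            }
        }
        // The last paragraph may not be followed by a blank line.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "First paragraph, line one.\nStill the first paragraph.\n\n"
                + "Second paragraph.\n\n\nThird paragraph.";
        System.out.println(split(text).size()); // prints 3
    }
}
```

In a real job, roughly the same loop would sit inside the RecordReader's
nextKeyValue(), reading lines from the input split; the mapper could then emit
a count of 1 per record and the reducer could sum them. Some Hadoop releases
also let TextInputFormat take a custom delimiter through the
textinputformat.record.delimiter property, so it may be worth checking the
documentation for your version before writing a reader by hand.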

Some starting points for defining and implementing your own RecordReader for
FileInputFormat:
http://bigdatacircus.com/2012/08/01/wordcount-with-custom-record-reader-of-textinputformat/
http://www.infoq.com/articles/HadoopInputFormat
http://hadoopi.wordpress.com/2013/05/31/custom-recordreader-processing-string-pattern-delimited-records/
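
For the sentence question, once each record is a paragraph, the map step can
count sentence endings and the reduce step can sum the counts. Below is a
rough standalone sketch of that shape in plain Java, without Hadoop classes.
The class name SentenceCount and the rule "a sentence ends at ., ! or ?
followed by whitespace or end of text" are assumptions; the rule will miscount
abbreviations such as "e.g." and ignores text with no closing punctuation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch of the map and reduce logic for counting sentences.
class SentenceCount {

    // Assumed rule: a run of . ! ? followed by whitespace or end of text.
    private static final Pattern SENTENCE_END = Pattern.compile("[.!?]+(?=\\s|$)");

    // "Map" step: one paragraph record in, its sentence count out.
    static int map(String paragraph) {
        Matcher m = SENTENCE_END.matcher(paragraph);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // "Reduce" step: sum the per-paragraph counts into a total.
    static int reduce(List<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> paragraphs = Arrays.asList("Hi there. How are you?", "Fine, thanks!");
        List<Integer> counts = new ArrayList<>();
        for (String p : paragraphs) {
            counts.add(map(p));
        }
        System.out.println(reduce(counts)); // prints 3
    }
}
```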

Regards,
Shahab
