Posted to mapreduce-user@hadoop.apache.org by Sai Sai <sa...@yahoo.in> on 2013/02/23 13:52:46 UTC

Re: WordPairCount Mapreduce question.


Hello

I have a question about how MapReduce sorting works internally with multiple columns.

Below are my classes, which use the 2 columns of the input file given below.


1st question: About the hashCode method, we add "31 * word2.hashCode()" to word1's hash code. I am wondering why this is required. What does 31 refer to?


2nd question: If my input file had 3 columns instead of 2, how would you write the compare method? Also, if anyone can map this to a real-world scenario, that would be really helpful. (A sketch of a possible 3-column version follows the code excerpt below.)



    @Override
    public int compareTo(WordPairCountKey o) {
        int diff = word1.compareTo(o.word1);
        if (diff == 0) {
            diff = word2.compareTo(o.word2);
        }
        return diff;
    }
    
    @Override
    public int hashCode() {
        return word1.hashCode() + 31 * word2.hashCode();
    }
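
For the 2nd question, here is a minimal sketch (not from the original post; word3 is a hypothetical third String field) of how compareTo and hashCode could be extended to three columns, comparing field by field and falling through to the next column only on a tie:

    @Override
    public int compareTo(WordPairCountKey o) {
        int diff = word1.compareTo(o.word1);       // primary sort column
        if (diff == 0) {
            diff = word2.compareTo(o.word2);       // secondary sort column
        }
        if (diff == 0) {
            diff = word3.compareTo(o.word3);       // tertiary sort column (hypothetical field)
        }
        return diff;
    }

    @Override
    public int hashCode() {
        // 31 is the conventional odd-prime multiplier used in Java hash codes
        // (java.lang.String uses it too); multiplying the running result by 31
        // before adding the next field keeps keys like (a,b) and (b,a) from
        // always colliding.
        int result = word1.hashCode();
        result = 31 * result + word2.hashCode();
        result = 31 * result + word3.hashCode();
        return result;
    }

A real-world analogue for the multi-column compare is ordering addresses by (country, state, city): compare countries first, and only on a tie move on to state and then city.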

******************************

Here is my input file wordpair.txt

******************************

a    b
a    c
a    b
a    d
b    d
e    f
b    d
e    f
b    d

**********************************


Here is my WordPairObject:

*********************************

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Composite key of two words; compareTo drives the MapReduce sort order.
public class WordPairCountKey implements WritableComparable<WordPairCountKey> {

    private String word1;
    private String word2;

    @Override
    public int compareTo(WordPairCountKey o) {
        int diff = word1.compareTo(o.word1);
        if (diff == 0) {
            diff = word2.compareTo(o.word2);
        }
        return diff;
    }
    
    @Override
    public int hashCode() {
        return word1.hashCode() + 31 * word2.hashCode();
    }

    
    public String getWord1() {
        return word1;
    }

    public void setWord1(String word1) {
        this.word1 = word1;
    }

    public String getWord2() {
        return word2;
    }

    public void setWord2(String word2) {
        this.word2 = word2;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        word1 = in.readUTF();
        word2 = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word1);
        out.writeUTF(word2);
    }

    
    @Override
    public String toString() {
        return "[word1=" + word1 + ", word2=" + word2 + "]";
    }
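
    // Note (added, not in the original post): since hashCode() is overridden,
    // the usual Java convention is to also override equals() so the two stay
    // consistent. A minimal sketch:
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof WordPairCountKey)) return false;
        WordPairCountKey other = (WordPairCountKey) obj;
        return word1.equals(other.word1) && word2.equals(other.word2);
    }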

}

******************************

Any help will be really appreciated.
Thanks
Sai

Re: Increase the number of mappers in PM mode

Posted by Harsh J <ha...@cloudera.com>.
In MR2, to have more mappers executed per NM, your memory request for each
map should be set such that the NM's configured memory allowance can fit in
multiple requests. For example, if my NM memory is set to 16 GB assuming
just 1 NM in the cluster, and I submit a job with mapreduce.map.memory.mb and
yarn.app.mapreduce.am.resource.mb both set to 1 GB, then the NM can execute
15 maps in parallel consuming up to 1 GB memory each (while using the
remaining 1 GB for the AM to coordinate those executions).
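
As an illustration (not from the original mail; the jar name and paths below are placeholders), those two per-job requests can be passed on the command line of a ToolRunner-based job such as the stock wordcount example:

hadoop jar hadoop-mapreduce-examples.jar wordcount \
  -D mapreduce.map.memory.mb=1024 \
  -D yarn.app.mapreduce.am.resource.mb=1024 \
  /user/hadoop/input /user/hadoop/output

With the 16 GB NM from the example above, 15 such 1 GB map containers plus the 1 GB AM container fill the node.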


On Sat, Mar 16, 2013 at 10:16 AM, yypvsxf19870706 <yypvsxf19870706@gmail.com
> wrote:

> hi:
>    i think i have got it . Thank you.
>
> Sent from my iPhone
>
> On 2013-3-15, at 18:32, Zheyi RONG <ro...@gmail.com> wrote:
>
> Indeed you cannot explicitly set the number of mappers, but still you can
> gain some control over it, by setting mapred.max.split.size, or
> mapred.min.split.size.
>
> For example, if you have a file of 10GB (10737418240 B), you would like 10
> mappers, then each mapper has to deal with 1GB data.
> According to "splitsize = max(minimumSize, min(maximumSize, blockSize))",
> you can set mapred.min.split.size=1073741824 (1GB), i.e.
> $hadoop jar -Dmapred.min.split.size=1073741824 yourjar yourargs
>
> It is well explained in thread:
> http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop
> .
>
> Regards,
> Zheyi.
>
> On Fri, Mar 15, 2013 at 8:49 AM, YouPeng Yang <yy...@gmail.com>wrote:
>
>> s
>
>
>
>


-- 
Harsh J

Re: Increase the number of mappers in PM mode

Posted by yypvsxf19870706 <yy...@gmail.com>.
Hi:
   I think I have got it. Thank you.

Sent from my iPhone

On 2013-3-15, at 18:32, Zheyi RONG <ro...@gmail.com> wrote:

> Indeed you cannot explicitly set the number of mappers, but still you can gain some control over it, by setting mapred.max.split.size, or mapred.min.split.size.
> 
> For example, if you have a file of 10GB (10737418240 B), you would like 10 mappers, then each mapper has to deal with 1GB data.
> According to "splitsize = max(minimumSize, min(maximumSize, blockSize))", you can set mapred.min.split.size=1073741824 (1GB), i.e.    
> $hadoop jar -Dmapred.min.split.size=1073741824 yourjar yourargs
> 
> It is well explained in thread: http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop.
> 
> Regards,
> Zheyi.
> 
> On Fri, Mar 15, 2013 at 8:49 AM, YouPeng Yang <yy...@gmail.com> wrote:
>> s
> 
> 

Re: Increase the number of mappers in PM mode

Posted by Zheyi RONG <ro...@gmail.com>.
Indeed you cannot explicitly set the number of mappers, but you can still
gain some control over it by setting mapred.max.split.size or
mapred.min.split.size.

For example, if you have a file of 10 GB (10737418240 B) and you would like 10
mappers, then each mapper has to deal with 1 GB of data.
According to "splitsize = max(minimumSize, min(maximumSize, blockSize))",
you can set mapred.min.split.size=1073741824 (1GB), i.e.
$hadoop jar -Dmapred.min.split.size=1073741824 yourjar yourargs
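
To spell out the arithmetic behind that formula (the 128 MB block size and the Long.MAX_VALUE default for the maximum split size are assumptions, not values from this mail):

splitsize = max(1073741824, min(9223372036854775807, 134217728)) = 1073741824 bytes (1 GB)
number of map tasks ~= 10737418240 / 1073741824 = 10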

It is well explained in thread:
http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop.

Regards,
Zheyi.

On Fri, Mar 15, 2013 at 8:49 AM, YouPeng Yang <yy...@gmail.com>wrote:

> s

Re: Increase the number of mappers in PM mode

Posted by YouPeng Yang <yy...@gmail.com>.
HI:
  I found these interview questions by doing some googling:

Q29. How can you set an arbitrary number of mappers to be created for a job
in Hadoop?

This is a trick question. You cannot set it.

 >> The above test proves you cannot set an arbitrary number of mappers.

Q30. How can you set an arbitrary number of reducers to be created for a job
in Hadoop?

You can either do it programmatically by using the method setNumReduceTasks in
the JobConf class, or set it up as a configuration setting.
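
(For illustration only, not part of the original answer: a minimal sketch of the programmatic route using the old-API JobConf class named above; MyJob is a hypothetical driver class.)

import org.apache.hadoop.mapred.JobConf;

public class MyJob {
    public static void main(String[] args) {
        JobConf conf = new JobConf(MyJob.class);
        conf.setNumReduceTasks(2);   // programmatic equivalent of -D mapreduce.job.reduces=2
        // ... set mapper/reducer classes and input/output paths, then submit with JobClient.runJob(conf)
    }
}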


 I tested Q30, and it seems right.

 my logs:

[hadoop@Hadoop01 bin]$./hadoop  jar
 ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar
wordcount -D mapreduce.job.reduces=2  -D mapreduce.jobtracker.address=
10.167.14.221:50030 /user/hadoop/yyp/input /user/hadoop/yyp/output3

===================================

Job Counters

Launched map tasks=1

Launched reduce tasks=2 -----> it actually changed .

Rack-local map tasks=1

Total time spent by all maps in occupied slots (ms)=60356

Total time spent by all reduces in occupied slots (ms)=135224

============================




regards





2013/3/14 YouPeng Yang <yy...@gmail.com>

> Hi
>   the docs only have a property
> : mapreduce.input.fileinputformat.split.minsize (default value is 0)
>   does it matter?
>
>
>
> 2013/3/14 Zheyi RONG <ro...@gmail.com>
>
>> Have you considered change mapred.max.split.size ?
>> As in:
>> http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop
>>
>> Zheyi
>>
>>
>> On Thu, Mar 14, 2013 at 3:27 PM, YouPeng Yang <yy...@gmail.com>wrote:
>>
>>> Hi
>>>
>>>
>>>   I have done some tests in my  Pseudo Mode(CDH4.1.2)with MV2 yarn,and
>>>  :
>>>   According to the doc:
>>>   *mapreduce.jobtracker.address :*The host and port that the MapReduce
>>> job tracker runs at. If "local", then jobs are run in-process as a single
>>> map and reduce task.
>>>   *mapreduce.job.maps (default value is 2)* :The default number of map
>>> tasks per job. Ignored when mapreduce.jobtracker.address is "local".
>>>
>>>   I changed the mapreduce.jobtracker.address = Hadoop:50031.
>>>
>>>   And then run the wordcount examples:
>>>   hadoop jar  hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar wordcount
>>> input output
>>>
>>>   the output logs are as follows:
>>>         ....
>>>    Job Counters
>>> Launched map tasks=1
>>>  Launched reduce tasks=1
>>> Data-local map tasks=1
>>>  Total time spent by all maps in occupied slots (ms)=60336
>>> Total time spent by all reduces in occupied slots (ms)=63264
>>>      Map-Reduce Framework
>>> Map input records=5
>>>  Map output records=7
>>> Map output bytes=56
>>> Map output materialized bytes=76
>>>         ....
>>>
>>>  i seem to does not work.
>>>
>>>  I thought maybe my input file is small-just 5 records . is it right?
>>>
>>> regards
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> 2013/3/14 Sai Sai <sa...@yahoo.in>
>>>
>>>>
>>>>
>>>>  In Pseudo Mode where is the setting to increase the number of mappers
>>>> or is this not possible.
>>>> Thanks
>>>> Sai
>>>>
>>>
>>>
>>
>

Re: Increase the number of mappers in PM mode

Posted by YouPeng Yang <yy...@gmail.com>.
Hi,
  the docs only have the property
mapreduce.input.fileinputformat.split.minsize (default value is 0).
  Does it matter?



2013/3/14 Zheyi RONG <ro...@gmail.com>

> Have you considered change mapred.max.split.size ?
> As in:
> http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop
>
> Zheyi
>
>
> On Thu, Mar 14, 2013 at 3:27 PM, YouPeng Yang <yy...@gmail.com>wrote:
>
>> Hi
>>
>>
>>   I have done some tests in my  Pseudo Mode(CDH4.1.2)with MV2 yarn,and
>>  :
>>   According to the doc:
>>   *mapreduce.jobtracker.address :*The host and port that the MapReduce
>> job tracker runs at. If "local", then jobs are run in-process as a single
>> map and reduce task.
>>   *mapreduce.job.maps (default value is 2)* :The default number of map
>> tasks per job. Ignored when mapreduce.jobtracker.address is "local".
>>
>>   I changed the mapreduce.jobtracker.address = Hadoop:50031.
>>
>>   And then run the wordcount examples:
>>   hadoop jar  hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar wordcount
>> input output
>>
>>   the output logs are as follows:
>>         ....
>>    Job Counters
>> Launched map tasks=1
>>  Launched reduce tasks=1
>> Data-local map tasks=1
>>  Total time spent by all maps in occupied slots (ms)=60336
>> Total time spent by all reduces in occupied slots (ms)=63264
>>      Map-Reduce Framework
>> Map input records=5
>>  Map output records=7
>> Map output bytes=56
>> Map output materialized bytes=76
>>         ....
>>
>>  i seem to does not work.
>>
>>  I thought maybe my input file is small-just 5 records . is it right?
>>
>> regards
>>
>>
>>
>>
>>
>>
>>
>> 2013/3/14 Sai Sai <sa...@yahoo.in>
>>
>>>
>>>
>>>  In Pseudo Mode where is the setting to increase the number of mappers
>>> or is this not possible.
>>> Thanks
>>> Sai
>>>
>>
>>
>

Re: Increase the number of mappers in PM mode

Posted by Zheyi RONG <ro...@gmail.com>.
Have you considered changing mapred.max.split.size?
As in:
http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop

Zheyi

On Thu, Mar 14, 2013 at 3:27 PM, YouPeng Yang <yy...@gmail.com>wrote:

> Hi
>
>
>   I have done some tests in my  Pseudo Mode(CDH4.1.2)with MV2 yarn,and   :
>   According to the doc:
>   *mapreduce.jobtracker.address :*The host and port that the MapReduce
> job tracker runs at. If "local", then jobs are run in-process as a single
> map and reduce task.
>   *mapreduce.job.maps (default value is 2)* :The default number of map
> tasks per job. Ignored when mapreduce.jobtracker.address is "local".
>
>   I changed the mapreduce.jobtracker.address = Hadoop:50031.
>
>   And then run the wordcount examples:
>   hadoop jar  hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar wordcount
> input output
>
>   the output logs are as follows:
>         ....
>    Job Counters
> Launched map tasks=1
>  Launched reduce tasks=1
> Data-local map tasks=1
>  Total time spent by all maps in occupied slots (ms)=60336
> Total time spent by all reduces in occupied slots (ms)=63264
>      Map-Reduce Framework
> Map input records=5
>  Map output records=7
> Map output bytes=56
> Map output materialized bytes=76
>         ....
>
>  i seem to does not work.
>
>  I thought maybe my input file is small-just 5 records . is it right?
>
> regards
>
>
>
>
>
>
>
> 2013/3/14 Sai Sai <sa...@yahoo.in>
>
>>
>>
>>  In Pseudo Mode where is the setting to increase the number of mappers or
>> is this not possible.
>> Thanks
>> Sai
>>
>
>

Re: Increase the number of mappers in PM mode

Posted by YouPeng Yang <yy...@gmail.com>.
Hi


  I have done some tests in my Pseudo Mode (CDH4.1.2) with MRv2 YARN:
  According to the doc:
  *mapreduce.jobtracker.address :*The host and port that the MapReduce job
tracker runs at. If "local", then jobs are run in-process as a single map
and reduce task.
  *mapreduce.job.maps (default value is 2)* :The default number of map
tasks per job. Ignored when mapreduce.jobtracker.address is "local".

  I changed the mapreduce.jobtracker.address = Hadoop:50031.
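
(For reference, a sketch of that change as it might look in mapred-site.xml; the host name Hadoop and port 50031 are taken from the mail above.)

<property>
  <name>mapreduce.jobtracker.address</name>
  <value>Hadoop:50031</value>
</property>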

  And then run the wordcount examples:
  hadoop jar  hadoop-mapreduce-examples-2.0.0-cdh4.1.2.jar wordcount
input output

  the output logs are as follows:
        ....
   Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=60336
Total time spent by all reduces in occupied slots (ms)=63264
     Map-Reduce Framework
Map input records=5
Map output records=7
Map output bytes=56
Map output materialized bytes=76
        ....

 It seems it does not work.

 I thought maybe it is because my input file is small - just 5 records. Is that right?

regards







2013/3/14 Sai Sai <sa...@yahoo.in>

>
>
> In Pseudo Mode where is the setting to increase the number of mappers or
> is this not possible.
> Thanks
> Sai
>

Re: Increase the number of mappers in PM mode

Posted by Sai Sai <sa...@yahoo.in>.


In Pseudo Mode, where is the setting to increase the number of mappers, or is this not possible?
Thanks
Sai

Re: Block vs FileSplit vs record vs line

Posted by Sai Sai <sa...@yahoo.in>.
Just wondering if this is the right way to understand this:
A large file is split into multiple blocks, each block is split into multiple file splits, each file split has multiple records, and each record has multiple lines. Each line is processed by one instance of the mapper.
Any help is appreciated.
Thanks
Sai

RE: Unknown processes unable to terminate

Posted by Leo Leung <ll...@ddn.com>.
Hi Sai,

   The RunJar process is normally the result of someone or something running “hadoop jar <something>”
   (i.e:  org.apache.hadoop.util.RunJar  <something>)

   You probably want to find out who/what is running, with more detailed info, via ps -ef | grep RunJar
   <stop|start>-all.sh deals with HDFS / M/R specific processes only.   So it will not stop any other Java process reported by jps.
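
   (An illustrative sequence, using the PID 13082 from Sai's listing as a placeholder:)

   ps -ef | grep RunJar      # see who launched it and which jar it is running
   kill 13082                # try a normal TERM first
   kill -9 13082             # force-kill only if it ignores the TERM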

Cheers.


From: Sai Sai [mailto:saigraph@yahoo.in]
Sent: Monday, March 04, 2013 1:42 AM
To: user@hadoop.apache.org
Subject: Re: Unknown processes unable to terminate

I have a list of the following processes given below; I am trying to kill the process 13082 using:

kill 13082

It is not terminating RunJar.

I have done a stop-all.sh hoping it would stop all the processes, but it only stopped the Hadoop related processes.
I am just wondering if it is necessary to stop all other processes before starting the Hadoop process, and how to stop these other processes.

Here is the list of processes which are appearing:


30969 FileSystemCat
30877 FileSystemCat
5647 StreamCompressor
32200 DataNode
25015 Jps
2227 URLCat
5563 StreamCompressor
5398 StreamCompressor
13082 RunJar
32578 JobTracker
7215
385 TaskTracker
31884 NameNode
32489 SecondaryNameNode

Thanks
Sai

Re: Unknown processes unable to terminate

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Sai,

Are you fine with killing all those processes on this machine? If you need
ALL those processes to be killed, and if they are all Java processes,
you can use killall -9 java. That will kill ALL the Java processes under
this user.

JM

2013/3/4 shashwat shriparv <dw...@gmail.com>:
> You can you kill -9 13082
>
> Is there eclipse or netbeans project running, that may the this process..
>
>
>
> ∞
>
> Shashwat Shriparv
>
>
>
> On Mon, Mar 4, 2013 at 3:12 PM, Sai Sai <sa...@yahoo.in> wrote:
>>
>> I have a list of following processes given below, i am trying to kill the
>> process 13082 using:
>>
>> kill 13082
>>
>> Its not terminating RunJar.
>>
>> I have done a stop-all.sh hoping it would stop all the processes but only
>> stopped the hadoop related processes.
>> I am just wondering if it is necessary to stop all other processes before
>> starting the hadoop process and how to stop these other processes.
>>
>> Here is the list of processes which r appearing:
>>
>>
>> 30969 FileSystemCat
>> 30877 FileSystemCat
>> 5647 StreamCompressor
>> 32200 DataNode
>> 25015 Jps
>> 2227 URLCat
>> 5563 StreamCompressor
>> 5398 StreamCompressor
>> 13082 RunJar
>> 32578 JobTracker
>> 7215
>> 385 TaskTracker
>> 31884 NameNode
>> 32489 SecondaryNameNode
>>
>> Thanks
>> Sai
>
>

Re: Unknown processes unable to terminate

Posted by shashwat shriparv <dw...@gmail.com>.
You can use kill -9 13082

Is there an Eclipse or NetBeans project running? That may be this process.



∞
Shashwat Shriparv



On Mon, Mar 4, 2013 at 3:12 PM, Sai Sai <sa...@yahoo.in> wrote:

> I have a list of the following processes given below; I am trying to kill
> process 13082 using:
>
> kill 13082
>
> It's not terminating RunJar.
>
> I have run stop-all.sh hoping it would stop all of the processes, but it
> only stopped the Hadoop-related ones.
> I am just wondering whether it is necessary to stop these other processes
> before starting Hadoop again, and how to stop them.
>
> Here is the list of processes that appear (output of jps):
>
>
> 30969 FileSystemCat
> 30877 FileSystemCat
> 5647 StreamCompressor
> 32200 DataNode
> 25015 Jps
> 2227 URLCat
> 5563 StreamCompressor
> 5398 StreamCompressor
> 13082 RunJar
> 32578 JobTracker
> 7215
> 385 TaskTracker
> 31884 NameNode
> 32489 SecondaryNameNode
>
> Thanks
> Sai
>

Re: Block vs FileSplit vs record vs line

Posted by Sai Sai <sa...@yahoo.in>.
Just wondering if this is the right way to understand this:
A large file is split into multiple blocks, each block is split into multiple file splits, each file split has multiple records, and each record has multiple lines. Each line is processed by one instance of the mapper.
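
For example, would a minimal mapper like the sketch below receive one call to map() per line of its split? (This is just an illustration I put together, assuming the default TextInputFormat; the class name and the output key are placeholders.)

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch only: with TextInputFormat the framework hands one record
// (key = byte offset in the file, value = one line of text) to each
// map() call, and a single mapper task works through its whole split.
public class LineCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        context.write(new Text("lines"), ONE);
    }
}
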
Any help is appreciated.
Thanks
Sai

Re: Trying to copy file to Hadoop file system from a program

Posted by Nitin Pawar <ni...@gmail.com>.
Sai,

just use 127.0.0.1 in all the URIs you have. It is less complicated and
easily replaceable.
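
For instance, here is a minimal sketch (not the original HdpTest program; the local and HDFS paths and the 9000 port are the ones from the earlier mail) that lets the FileSystem factory resolve the loopback URI:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoopbackCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Let the FileSystem factory resolve the hdfs:// scheme instead of
        // instantiating DistributedFileSystem directly.
        FileSystem fs = FileSystem.get(new URI("hdfs://127.0.0.1:9000"), conf);
        fs.copyFromLocalFile(new Path("/home/kosmos/Work/input/wordpair.txt"),
                new Path("/input/raptor/trade1.txt"));
        fs.close();
    }
}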


On Sun, Feb 24, 2013 at 5:37 PM, sudhakara st <su...@gmail.com>wrote:

> Hi,
>
> Execute ifconfig to find the IP of the system
> and add a line in /etc/hosts:
> (your ip) ubuntu
>
> Then use the URI string: public static String fsURI = "hdfs://ubuntu:9000";
>
>
> On Sun, Feb 24, 2013 at 5:23 PM, Sai Sai <sa...@yahoo.in> wrote:
>
>> Many Thanks Nitin for your quick reply.
>>
>> Here's what I have in my hosts file; I am running in a VM, which I assume
>> is pseudo-distributed mode:
>>
>> *********************
>> 127.0.0.1    localhost.localdomain    localhost
>> #::1    ubuntu    localhost6.localdomain6    localhost6
>> #127.0.1.1    ubuntu
>> 127.0.0.1   ubuntu
>>
>> # The following lines are desirable for IPv6 capable hosts
>> ::1     localhost ip6-localhost ip6-loopback
>> fe00::0 ip6-localnet
>> ff00::0 ip6-mcastprefix
>> ff02::1 ip6-allnodes
>> ff02::2 ip6-allrouters
>> ff02::3 ip6-allhosts
>> *********************
>> In my masters file I have:
>> ubuntu
>> In my slaves file I have:
>> localhost
>> ***********************
>> My question is about the variable below:
>> public static String fsURI = "hdfs://master:9000";
>>
>> What would be the right value so I can connect to Hadoop successfully?
>> Please let me know if you need more info.
>> Thanks
>> Sai
>>
>>
>>
>>
>>
>>    ------------------------------
>> *From:* Nitin Pawar <ni...@gmail.com>
>> *To:* user@hadoop.apache.org; Sai Sai <sa...@yahoo.in>
>> *Sent:* Sunday, 24 February 2013 3:42 AM
>> *Subject:* Re: Trying to copy file to Hadoop file system from a program
>>
>> if you want to use master as your hostname then make such entry in your
>> /etc/hosts file
>>
>> or change the hdfs://master to hdfs://localhost
>>
>>
>> On Sun, Feb 24, 2013 at 5:10 PM, Sai Sai <sa...@yahoo.in> wrote:
>>
>>
>> Greetings,
>>
>> Below is the program I am trying to run and the exception I am getting:
>>  ***************************************
>> Test Start.....
>> java.net.UnknownHostException: unknown host: master
>>     at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
>>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1196)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1050)
>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>>     at $Proxy1.getProtocolVersion(Unknown Source)
>>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
>>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
>>     at
>> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
>>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
>>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
>>     at
>> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
>>     at kelly.hadoop.hive.test.HadoopTest.main(HadoopTest.java:54)
>>
>>
>> ********************
>>
>> public class HdpTest {
>>
>>     public static String fsURI = "hdfs://master:9000";
>>
>>
>>     public static void copyFileToDFS(FileSystem fs, String srcFile,
>>             String dstFile) throws IOException {
>>         try {
>>             System.out.println("Initialize copy...");
>>             URI suri = new URI(srcFile);
>>             URI duri = new URI(fsURI + "/" + dstFile);
>>             Path dst = new Path(duri.toString());
>>             Path src = new Path(suri.toString());
>>             System.out.println("Start copy...");
>>             fs.copyFromLocalFile(src, dst);
>>             System.out.println("End copy...");
>>         } catch (Exception e) {
>>             e.printStackTrace();
>>         }
>>     }
>>
>>     public static void main(String[] args) {
>>         try {
>>             System.out.println("Test Start.....");
>>             Configuration conf = new Configuration();
>>             DistributedFileSystem fs = new DistributedFileSystem();
>>             URI duri = new URI(fsURI);
>>             fs.initialize(duri, conf); // Here is where the exception occurs
>>             long start = 0, end = 0;
>>             start = System.nanoTime();
>>             //writing data from local to HDFS
>>             copyFileToDFS(fs, "/home/kosmos/Work/input/wordpair.txt",
>>                     "/input/raptor/trade1.txt");
>>             //Writing data from HDFS to Local
>> //             copyFileFromDFS(fs, "/input/raptor/trade1.txt",
>> "/home/kosmos/Work/input/wordpair1.txt");
>>             end = System.nanoTime();
>>             System.out.println("Total Execution times: " + (end - start));
>>             fs.close();
>>         } catch (Throwable t) {
>>             t.printStackTrace();
>>         }
>>     }
>> ******************************
>> I am trying to access this URL in Firefox:
>>  hdfs://master:9000
>>
>>  I get an error message that FF does not know how to display it.
>>
>>  I can successfully access my admin page:
>>
>>  http://localhost:50070/dfshealth.jsp
>>
>> Just wondering if anyone can give me any suggestions, your help will be
>> really appreciated.
>> Thanks
>> Sai
>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>>
>>
>
>
> --
>
> Regards,
> .....  Sudhakara.st
>
>



-- 
Nitin Pawar

Re: Trying to copy file to Hadoop file system from a program

Posted by sudhakara st <su...@gmail.com>.
Hi,

Execute ifconfig to find the IP of the system
and add a line in /etc/hosts:
(your ip) ubuntu

Then use the URI string: public static String fsURI = "hdfs://ubuntu:9000";
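
For example, the added /etc/hosts line would look like the one below (the address is only a placeholder; use whatever ifconfig reports for your machine):

192.168.1.50    ubuntu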

On Sun, Feb 24, 2013 at 5:23 PM, Sai Sai <sa...@yahoo.in> wrote:

> Many Thanks Nitin for your quick reply.
>
> Here's what I have in my hosts file; I am running in a VM, which I assume
> is pseudo-distributed mode:
>
> *********************
> 127.0.0.1    localhost.localdomain    localhost
> #::1    ubuntu    localhost6.localdomain6    localhost6
> #127.0.1.1    ubuntu
> 127.0.0.1   ubuntu
>
> # The following lines are desirable for IPv6 capable hosts
> ::1     localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> *********************
> In my masters file I have:
> ubuntu
> In my slaves file I have:
> localhost
> ***********************
> My question is about the variable below:
> public static String fsURI = "hdfs://master:9000";
>
> What would be the right value so I can connect to Hadoop successfully?
> Please let me know if you need more info.
> Thanks
> Sai
>
>
>
>
>
>   ------------------------------
> *From:* Nitin Pawar <ni...@gmail.com>
> *To:* user@hadoop.apache.org; Sai Sai <sa...@yahoo.in>
> *Sent:* Sunday, 24 February 2013 3:42 AM
> *Subject:* Re: Trying to copy file to Hadoop file system from a program
>
> if you want to use master as your hostname then make such entry in your
> /etc/hosts file
>
> or change the hdfs://master to hdfs://localhost
>
>
> On Sun, Feb 24, 2013 at 5:10 PM, Sai Sai <sa...@yahoo.in> wrote:
>
>
> Greetings,
>
> Below is the program I am trying to run and the exception I am getting:
> ***************************************
> Test Start.....
> java.net.UnknownHostException: unknown host: master
>     at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1196)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1050)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>     at $Proxy1.getProtocolVersion(Unknown Source)
>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
>     at
> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
>     at
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
>     at kelly.hadoop.hive.test.HadoopTest.main(HadoopTest.java:54)
>
>
> ********************
>
> public class HdpTest {
>
>     public static String fsURI = "hdfs://master:9000";
>
>
>     public static void copyFileToDFS(FileSystem fs, String srcFile,
>             String dstFile) throws IOException {
>         try {
>             System.out.println("Initialize copy...");
>             URI suri = new URI(srcFile);
>             URI duri = new URI(fsURI + "/" + dstFile);
>             Path dst = new Path(duri.toString());
>             Path src = new Path(suri.toString());
>             System.out.println("Start copy...");
>             fs.copyFromLocalFile(src, dst);
>             System.out.println("End copy...");
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
>
>     public static void main(String[] args) {
>         try {
>             System.out.println("Test Start.....");
>             Configuration conf = new Configuration();
>             DistributedFileSystem fs = new DistributedFileSystem();
>             URI duri = new URI(fsURI);
>             fs.initialize(duri, conf); // Here is where the exception occurs
>             long start = 0, end = 0;
>             start = System.nanoTime();
>             //writing data from local to HDFS
>             copyFileToDFS(fs, "/home/kosmos/Work/input/wordpair.txt",
>                     "/input/raptor/trade1.txt");
>             //Writing data from HDFS to Local
> //             copyFileFromDFS(fs, "/input/raptor/trade1.txt",
> "/home/kosmos/Work/input/wordpair1.txt");
>             end = System.nanoTime();
>             System.out.println("Total Execution times: " + (end - start));
>             fs.close();
>         } catch (Throwable t) {
>             t.printStackTrace();
>         }
>     }
> ******************************
> I am trying to access this URL in Firefox:
>  hdfs://master:9000
>
>  I get an error message that FF does not know how to display it.
>
>  I can successfully access my admin page:
>
>  http://localhost:50070/dfshealth.jsp
>
> Just wondering if anyone can give me any suggestions, your help will be
> really appreciated.
> Thanks
> Sai
>
>
>
>
> --
> Nitin Pawar
>
>
>


-- 

Regards,
.....  Sudhakara.st

Re: Trying to copy file to Hadoop file system from a program

Posted by Sai Sai <sa...@yahoo.in>.
Many Thanks Nitin for your quick reply.

Here's what I have in my hosts file; I am running in a VM, which I assume is pseudo-distributed mode:

*********************
127.0.0.1    localhost.localdomain    localhost
#::1    ubuntu    localhost6.localdomain6    localhost6
#127.0.1.1    ubuntu
127.0.0.1   ubuntu

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
*********************
In my masters file I have:
ubuntu
In my slaves file I have:
localhost
***********************
My question is about the variable below:
public static String fsURI = "hdfs://master:9000";

What would be the right value so I can connect to Hadoop successfully?
Please let me know if you need more info.
Thanks
Sai







________________________________
 From: Nitin Pawar <ni...@gmail.com>
To: user@hadoop.apache.org; Sai Sai <sa...@yahoo.in> 
Sent: Sunday, 24 February 2013 3:42 AM
Subject: Re: Trying to copy file to Hadoop file system from a program
 

if you want to use master as your hostname then make such entry in your /etc/hosts file 

or change the hdfs://master to hdfs://localhost 



On Sun, Feb 24, 2013 at 5:10 PM, Sai Sai <sa...@yahoo.in> wrote:


>
>Greetings,
>
>
>Below is the program I am trying to run and the exception I am getting:
>***************************************
>
>Test Start.....
>java.net.UnknownHostException: unknown host: master
>    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
>    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1196)
>    at org.apache.hadoop.ipc.Client.call(Client.java:1050)
>    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>    at $Proxy1.getProtocolVersion(Unknown Source)
>    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
>    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
>    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
>    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
>    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
>    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
>    at kelly.hadoop.hive.test.HadoopTest.main(HadoopTest.java:54)
>
>
>
>
>********************
>
>
>
>public class HdpTest {
>    
>    public static String fsURI = "hdfs://master:9000";
>
>    
>    public static void copyFileToDFS(FileSystem fs, String srcFile,
>            String dstFile) throws IOException {
>        try {
>            System.out.println("Initialize copy...");
>            URI suri = new URI(srcFile);
>            URI duri = new URI(fsURI + "/" + dstFile);
>            Path dst = new Path(duri.toString());
>            Path src = new Path(suri.toString());
>            System.out.println("Start copy...");
>            fs.copyFromLocalFile(src, dst);
>            System.out.println("End copy...");
>        } catch (Exception e) {
>            e.printStackTrace();
>        }
>    }
>
>    public static void main(String[] args) {
>        try {
>            System.out.println("Test Start.....");
>            Configuration conf = new Configuration();
>            DistributedFileSystem fs = new DistributedFileSystem();
>            URI duri = new URI(fsURI);
>            fs.initialize(duri, conf); // Here is where the exception occurs
>            long start = 0, end = 0;
>            start = System.nanoTime();
>            //writing data from local to HDFS
>            copyFileToDFS(fs, "/home/kosmos/Work/input/wordpair.txt",
>                    "/input/raptor/trade1.txt");
>            //Writing data from HDFS to Local
>//             copyFileFromDFS(fs, "/input/raptor/trade1.txt", "/home/kosmos/Work/input/wordpair1.txt");
>            end = System.nanoTime();
>            System.out.println("Total Execution times: " + (end - start));
>            fs.close();
>        } catch (Throwable t) {
>            t.printStackTrace();
>        }
>    }
>
>******************************
>I am trying to access this URL in Firefox:
>
>hdfs://master:9000
>
>
>I get an error message that FF does not know how to display it.
>
>
>I can successfully access my admin page:
>
>
>http://localhost:50070/dfshealth.jsp
>
>
>Just wondering if anyone can give me any suggestions; your help will be really appreciated.
>Thanks
>Sai
>
>
>


-- 
Nitin Pawar

Re: Trying to copy file to Hadoop file system from a program

Posted by Nitin Pawar <ni...@gmail.com>.
if you want to use master as your hostname then add an entry for it in your
/etc/hosts file

or change the hdfs://master to hdfs://localhost
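
For a single-VM pseudo-distributed setup like the one described in this
thread, such an /etc/hosts entry might look like the line below (just an
illustration of the idea, assuming everything runs on the local machine):

127.0.0.1   master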


On Sun, Feb 24, 2013 at 5:10 PM, Sai Sai <sa...@yahoo.in> wrote:

>
> Greetings,
>
> Below is the program i am trying to run and getting this exception:
> ***************************************
> Test Start.....
> java.net.UnknownHostException: unknown host: master
>     at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1196)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1050)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>     at $Proxy1.getProtocolVersion(Unknown Source)
>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
>     at
> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
>     at
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
>     at kelly.hadoop.hive.test.HadoopTest.main(HadoopTest.java:54)
>
>
> ********************
>
> public class HdpTest {
>
>     public static String fsURI = "hdfs://master:9000";
>
>
>     public static void copyFileToDFS(FileSystem fs, String srcFile,
>             String dstFile) throws IOException {
>         try {
>             System.out.println("Initialize copy...");
>             URI suri = new URI(srcFile);
>             URI duri = new URI(fsURI + "/" + dstFile);
>             Path dst = new Path(duri.toString());
>             Path src = new Path(suri.toString());
>             System.out.println("Start copy...");
>             fs.copyFromLocalFile(src, dst);
>             System.out.println("End copy...");
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
>
>     public static void main(String[] args) {
>         try {
>             System.out.println("Test Start.....");
>             Configuration conf = new Configuration();
>             DistributedFileSystem fs = new DistributedFileSystem();
>             URI duri = new URI(fsURI);
>             fs.initialize(duri, conf); // Here is the xception occuring
>             long start = 0, end = 0;
>             start = System.nanoTime();
>             //writing data from local to HDFS
>             copyFileToDFS(fs, "/home/kosmos/Work/input/wordpair.txt",
>                     "/input/raptor/trade1.txt");
>             //Writing data from HDFS to Local
> //             copyFileFromDFS(fs, "/input/raptor/trade1.txt",
> "/home/kosmos/Work/input/wordpair1.txt");
>             end = System.nanoTime();
>             System.out.println("Total Execution times: " + (end - start));
>             fs.close();
>         } catch (Throwable t) {
>             t.printStackTrace();
>         }
>     }
> ******************************
> I am trying to access in FireFox this url:
>  hdfs://master:9000
>
>  Get an error msg FF does not know how to display this message.
>
>  I can successfully access my admin page:
>
>  http://localhost:50070/dfshealth.jsp
>
> Just wondering if anyone can give me any suggestions, your help will be
> really appreciated.
> Thanks
> Sai
>
>


-- 
Nitin Pawar

Re: Trying to copy file to Hadoop file system from a program

Posted by Sai Sai <sa...@yahoo.in>.

Greetings,

Below is the program i am trying to run and getting this exception:
***************************************

Test Start.....
java.net.UnknownHostException: unknown host: master
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1196)
    at org.apache.hadoop.ipc.Client.call(Client.java:1050)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at kelly.hadoop.hive.test.HadoopTest.main(HadoopTest.java:54)



********************


import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class HdpTest {
    
    public static String fsURI = "hdfs://master:9000";

    
    public static void copyFileToDFS(FileSystem fs, String srcFile,
            String dstFile) throws IOException {
        try {
            System.out.println("Initialize copy...");
            URI suri = new URI(srcFile);
            URI duri = new URI(fsURI + "/" + dstFile);
            Path dst = new Path(duri.toString());
            Path src = new Path(suri.toString());
            System.out.println("Start copy...");
            fs.copyFromLocalFile(src, dst);
            System.out.println("End copy...");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println("Test Start.....");
            Configuration conf = new Configuration();
            DistributedFileSystem fs = new DistributedFileSystem();
            URI duri = new URI(fsURI);
            fs.initialize(duri, conf); // Here is where the exception occurs
            long start = 0, end = 0;
            start = System.nanoTime();
            //writing data from local to HDFS
            copyFileToDFS(fs, "/home/kosmos/Work/input/wordpair.txt",
                    "/input/raptor/trade1.txt");
            //Writing data from HDFS to Local
//             copyFileFromDFS(fs, "/input/raptor/trade1.txt", "/home/kosmos/Work/input/wordpair1.txt");
            end = System.nanoTime();
            System.out.println("Total Execution times: " + (end - start));
            fs.close();
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}
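
For reference, a shorter equivalent sketch (not the original program; it
assumes the NameNode is reachable at hdfs://localhost:9000, as suggested
elsewhere in this thread) that lets Hadoop pick the right FileSystem
implementation from the URI scheme instead of constructing
DistributedFileSystem by hand:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdpTestAlt { // hypothetical name, not part of the original code

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // FileSystem.get resolves the hdfs:// scheme to the right client
        // implementation and initializes it, so no explicit initialize()
        // call is needed.
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
        fs.copyFromLocalFile(new Path("/home/kosmos/Work/input/wordpair.txt"),
                new Path("/input/raptor/trade1.txt"));
        fs.close();
    }
}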

******************************
I am trying to access this URL in Firefox:

hdfs://master:9000

I get an error message that Firefox does not know how to display it.

I can successfully access my admin page:

http://localhost:50070/dfshealth.jsp

Just wondering if anyone can give me any suggestions, your help will be really appreciated.
Thanks
Sai

Re: WordPairCount Mapreduce question.

Posted by Harsh J <ha...@cloudera.com>.
Also noteworthy is that the performance gain from the byte-level compare
method can only be had iff the serialization format of the data is
comparable at the byte level. One such provider is Apache Avro:
http://avro.apache.org/docs/current/spec.html#order.

Most other implementations simply deserialize again from the
bytestream and then compare, which has a higher (or, regular) cost.
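
To make that concrete, here is a minimal sketch (not from the original
code) of how a byte-level comparator could be wired up for the
WordPairCountKey class quoted further down; the byte-comparability caveat
above still applies:

import org.apache.hadoop.io.WritableComparator;

// Assumes the WordPairCountKey class shown later in this thread.
public class WordPairCountKeyRawComparator extends WritableComparator {

    public WordPairCountKeyRawComparator() {
        super(WordPairCountKey.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Compare the serialized key bytes directly, skipping deserialization.
        // This only yields the same order as compareTo() when the serialized
        // form is byte-comparable; writeUTF's 2-byte length prefix means that
        // is not guaranteed here, which is exactly the caveat above.
        return compareBytes(b1, s1, l1, b2, s2, l2);
    }
}

In the driver it would then be registered with, e.g.,
job.setSortComparatorClass(WordPairCountKeyRawComparator.class) in the new
API.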

On Mon, Feb 25, 2013 at 1:44 PM, Mahesh Balija
<ba...@gmail.com> wrote:
> byte array comparison is for performance reasons only, but NOT the way you
> are thinking.
> This method comes from an interface called RawComparator which provides the
> prototype (public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
> int l2);) for this method.
> In the sorting phase where the keys are sorted, because of this
> implementation the records are read from the stream directly and sorted
> without the need to deserializing them into Objects.
>
> Best,
> Mahesh Balija,
> CalsoftLabs.
>
>
> On Sun, Feb 24, 2013 at 5:01 PM, Sai Sai <sa...@yahoo.in> wrote:
>>
>> Thanks Mahesh for your help.
>>
>> Wondering if u can provide some insight with the below compare method
>> using byte[] in the SecondarySort example:
>>
>> public static class Comparator extends WritableComparator {
>>         public Comparator() {
>>             super(URICountKey.class);
>>         }
>>
>>         public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
>> int l2) {
>>             return compareBytes(b1, s1, l1, b2, s2, l2);
>>         }
>>     }
>>
>> My question is in the below compare method that i have given we are
>> comparing word1/word2
>> which makes sense but what about this byte[] comparison, is it right in
>> assuming  it converts each objects word1/word2/word3 to byte[] and compares
>> them.
>> If so is it for performance reason it is done.
>> Could you please verify.
>> Thanks
>> Sai
>> ________________________________
>> From: Mahesh Balija <ba...@gmail.com>
>> To: user@hadoop.apache.org; Sai Sai <sa...@yahoo.in>
>> Sent: Saturday, 23 February 2013 5:23 AM
>> Subject: Re: WordPairCount Mapreduce question.
>>
>> Please check the in-line answers...
>>
>> On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai <sa...@yahoo.in> wrote:
>>
>>
>> Hello
>>
>> I have a question about how Mapreduce sorting works internally with
>> multiple columns.
>>
>> Below r my classes using 2 columns in an input file given below.
>>
>> 1st question: About the method hashCode, we r adding a "31 + ", i am
>> wondering why is this required. what does 31 refer to.
>>
>> This is how usually hashcode is calculated for any String instance
>> (s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]) where n stands for length of
>> the String. Since in your case you only have 2 chars then it will be
>> s[0] * 31^1 + s[1] * 31^0, i.e. 'a' * 31 + 'b'.
>>
>>
>>
>> 2nd question: what if my input file has 3 columns instead of 2 how would
>> you write a compare method and was wondering if anyone can map this to a
>> real world scenario it will be really helpful.
>>
>> you will extend the same approach for the third column,
>>  public int compareTo(WordPairCountKey o) {
>>         int diff = word1.compareTo(o.word1);
>>         if (diff == 0) {
>>             diff = word2.compareTo(o.word2);
>>             if(diff==0){
>>                  diff = word3.compareTo(o.word3);
>>             }
>>         }
>>         return diff;
>>     }
>>
>>
>>
>>
>>     @Override
>>     public int compareTo(WordPairCountKey o) {
>>         int diff = word1.compareTo(o.word1);
>>         if (diff == 0) {
>>             diff = word2.compareTo(o.word2);
>>         }
>>         return diff;
>>     }
>>
>>     @Override
>>     public int hashCode() {
>>         return word1.hashCode() + 31 * word2.hashCode();
>>     }
>>
>> ******************************
>>
>> Here is my input file wordpair.txt
>>
>> ******************************
>>
>> a    b
>> a    c
>> a    b
>> a    d
>> b    d
>> e    f
>> b    d
>> e    f
>> b    d
>>
>> **********************************
>>
>> Here is my WordPairObject:
>>
>> *********************************
>>
>> public class WordPairCountKey implements
>> WritableComparable<WordPairCountKey> {
>>
>>     private String word1;
>>     private String word2;
>>
>>     @Override
>>     public int compareTo(WordPairCountKey o) {
>>         int diff = word1.compareTo(o.word1);
>>         if (diff == 0) {
>>             diff = word2.compareTo(o.word2);
>>         }
>>         return diff;
>>     }
>>
>>     @Override
>>     public int hashCode() {
>>         return word1.hashCode() + 31 * word2.hashCode();
>>     }
>>
>>
>>     public String getWord1() {
>>         return word1;
>>     }
>>
>>     public void setWord1(String word1) {
>>         this.word1 = word1;
>>     }
>>
>>     public String getWord2() {
>>         return word2;
>>     }
>>
>>     public void setWord2(String word2) {
>>         this.word2 = word2;
>>     }
>>
>>     @Override
>>     public void readFields(DataInput in) throws IOException {
>>         word1 = in.readUTF();
>>         word2 = in.readUTF();
>>     }
>>
>>     @Override
>>     public void write(DataOutput out) throws IOException {
>>         out.writeUTF(word1);
>>         out.writeUTF(word2);
>>     }
>>
>>
>>     @Override
>>     public String toString() {
>>         return "[word1=" + word1 + ", word2=" + word2 + "]";
>>     }
>>
>> }
>>
>> ******************************
>>
>> Any help will be really appreciated.
>> Thanks
>> Sai
>>
>>
>>
>>
>



--
Harsh J

Re: WordPairCount Mapreduce question.

Posted by Mahesh Balija <ba...@gmail.com>.
byte array comparison is for performance reasons only, but NOT the way you
are thinking.
This method comes from an interface called RawComparator which provides the
prototype (public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
int l2);) for this method.
In the sorting phase, where the keys are sorted, this implementation lets
the records be read from the stream directly and sorted without the need to
deserialize them into Objects.
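
To illustrate the cost being avoided, here is a rough sketch (a
simplification, not the actual Hadoop source) of what a comparator has to
do when it cannot compare the raw bytes: deserialize both keys first and
only then fall back to compareTo():

// org.apache.hadoop.io.DataInputBuffer reads the serialized bytes back.
public int compareByDeserializing(byte[] b1, int s1, int l1,
                                  byte[] b2, int s2, int l2) throws IOException {
    WordPairCountKey key1 = new WordPairCountKey();
    WordPairCountKey key2 = new WordPairCountKey();

    DataInputBuffer buffer = new DataInputBuffer();
    buffer.reset(b1, s1, l1);
    key1.readFields(buffer);   // deserialize the first key
    buffer.reset(b2, s2, l2);
    key2.readFields(buffer);   // deserialize the second key

    return key1.compareTo(key2);
}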

Best,
Mahesh Balija,
CalsoftLabs.

On Sun, Feb 24, 2013 at 5:01 PM, Sai Sai <sa...@yahoo.in> wrote:

> Thanks Mahesh for your help.
>
> Wondering if u can provide some insight with the below compare method
> using byte[] in the SecondarySort example:
>
> public static class Comparator extends WritableComparator {
>         public Comparator() {
>             super(URICountKey.class);
>         }
>
>         public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
> int l2) {
>             return compareBytes(b1, s1, l1, b2, s2, l2);
>         }
>     }
>
> My question is in the below compare method that i have given we are
> comparing word1/word2
> which makes sense but what about this byte[] comparison, is it right in
> assuming  it converts each objects word1/word2/word3 to byte[] and compares
> them.
> If so is it for performance reason it is done.
> Could you please verify.
> Thanks
> Sai
>   ------------------------------
> *From:* Mahesh Balija <ba...@gmail.com>
> *To:* user@hadoop.apache.org; Sai Sai <sa...@yahoo.in>
> *Sent:* Saturday, 23 February 2013 5:23 AM
> *Subject:* Re: WordPairCount Mapreduce question.
>
> Please check the in-line answers...
>
> On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai <sa...@yahoo.in> wrote:
>
>
> Hello
>
> I have a question about how Mapreduce sorting works internally with
> multiple columns.
>
> Below r my classes using 2 columns in an input file given below.
>
> 1st question: About the method hashCode, we r adding a "31 + ", i am
> wondering why is this required. what does 31 refer to.
>
> This is how usually hashcode is calculated for any String instance
> (s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]) where n stands for length of
> the String. Since in your case you only have 2 chars then it will be
> s[0] * 31^1 + s[1] * 31^0, i.e. 'a' * 31 + 'b'.
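
A tiny added illustration of the same idea applied to the pair key itself
(made-up values, not from the original reply): 31 is just a small odd prime
used as a mixing multiplier, so the two fields are weighted differently and
("a","b") and ("b","a") normally get different hash codes:

String word1 = "a";
String word2 = "b";
// Same combination as WordPairCountKey.hashCode(): word2's hash is scaled by 31.
int pairHash = word1.hashCode() + 31 * word2.hashCode();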
>
>
>
> 2nd question: what if my input file has 3 columns instead of 2 how would
> you write a compare method and was wondering if anyone can map this to a
> real world scenario it will be really helpful.
>
> you will extend the same approach for the third column,
>  public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>             if(diff==0){
>                  diff = word3.compareTo(o.word3);
>             }
>          }
>         return diff;
>     }
>
>
>
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
> ******************************
>
> Here is my input file wordpair.txt
>
> ******************************
>
> a    b
> a    c
> a    b
> a    d
> b    d
> e    f
> b    d
> e    f
> b    d
>
> **********************************
>
> Here is my WordPairObject:
>
> *********************************
>
> public class WordPairCountKey implements
> WritableComparable<WordPairCountKey> {
>
>     private String word1;
>     private String word2;
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
>
>     public String getWord1() {
>         return word1;
>     }
>
>     public void setWord1(String word1) {
>         this.word1 = word1;
>     }
>
>     public String getWord2() {
>         return word2;
>     }
>
>     public void setWord2(String word2) {
>         this.word2 = word2;
>     }
>
>     @Override
>     public void readFields(DataInput in) throws IOException {
>         word1 = in.readUTF();
>         word2 = in.readUTF();
>     }
>
>     @Override
>     public void write(DataOutput out) throws IOException {
>         out.writeUTF(word1);
>         out.writeUTF(word2);
>     }
>
>
>     @Override
>     public String toString() {
>         return "[word1=" + word1 + ", word2=" + word2 + "]";
>     }
>
> }
>
> ******************************
>
> Any help will be really appreciated.
> Thanks
> Sai
>
>
>
>
>

Re: WordPairCount Mapreduce question.

Posted by Mahesh Balija <ba...@gmail.com>.
The byte array comparison is there for performance reasons only, but NOT in
the way you are thinking.
The method comes from an interface called RawComparator, which declares the
prototype public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
int l2).
In the sort phase, where the keys are ordered, this implementation lets the
records be read straight from the serialized stream and compared without
deserializing them back into objects.
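
For illustration, here is a minimal sketch of how such a raw comparator could
be wired up for a two-field key like WordPairCountKey (the class name
WordPairRawComparator is made up for this sketch; note the ordering caveat in
the comments):

import org.apache.hadoop.io.WritableComparator;

public class WordPairRawComparator extends WritableComparator {

    protected WordPairRawComparator() {
        super(WordPairCountKey.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Lexicographic comparison of the raw serialized key bytes. Because
        // writeUTF() prefixes each string with a 2-byte length, this byte
        // order is not guaranteed to match compareTo() on the words; it is
        // shown here only to illustrate the RawComparator mechanics.
        return compareBytes(b1, s1, l1, b2, s2, l2);
    }

    static {
        // Registration is usually done in a static block of the key class
        // itself; it is shown here for brevity.
        WritableComparator.define(WordPairCountKey.class, new WordPairRawComparator());
    }
}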

Best,
Mahesh Balija,
CalsoftLabs.

On Sun, Feb 24, 2013 at 5:01 PM, Sai Sai <sa...@yahoo.in> wrote:

> Thanks Mahesh for your help.
>
> Wondering if u can provide some insight with the below compare method
> using byte[] in the SecondarySort example:
>
> public static class Comparator extends WritableComparator {
>         public Comparator() {
>             super(URICountKey.class);
>         }
>
>         public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,
> int l2) {
>             return compareBytes(b1, s1, l1, b2, s2, l2);
>         }
>     }
>
> My question is in the below compare method that i have given we are
> comparing word1/word2
> which makes sense but what about this byte[] comparison, is it right in
> assuming  it converts each objects word1/word2/word3 to byte[] and compares
> them.
> If so is it for performance reason it is done.
> Could you please verify.
> Thanks
> Sai
>   ------------------------------
> *From:* Mahesh Balija <ba...@gmail.com>
> *To:* user@hadoop.apache.org; Sai Sai <sa...@yahoo.in>
> *Sent:* Saturday, 23 February 2013 5:23 AM
> *Subject:* Re: WordPairCount Mapreduce question.
>
> Please check the in-line answers...
>
> On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai <sa...@yahoo.in> wrote:
>
>
> Hello
>
> I have a question about how Mapreduce sorting works internally with
> multiple columns.
>
> Below r my classes using 2 columns in an input file given below.
>
> 1st question: About the method hashCode, we r adding a "31 + ", i am
> wondering why is this required. what does 31 refer to.
>
> This is how usually hashcode is calculated for any String instance
> (s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]) where n stands for length of
> the String. Since in your case you only have 2 chars then it will be a *
> 31^0 + b * 31^1.
>
>
>
> 2nd question: what if my input file has 3 columns instead of 2 how would
> you write a compare method and was wondering if anyone can map this to a
> real world scenario it will be really helpful.
>
> you will extend the same approach for the third column,
>  public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>             if(diff==0){
>                  diff = word3.compareTo(o.word3);
>             }
>          }
>         return diff;
>     }
>
>
>
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
> ******************************
>
> Here is my input file wordpair.txt
>
> ******************************
>
> a    b
> a    c
> a    b
> a    d
> b    d
> e    f
> b    d
> e    f
> b    d
>
> **********************************
>
> Here is my WordPairObject:
>
> *********************************
>
> public class WordPairCountKey implements
> WritableComparable<WordPairCountKey> {
>
>     private String word1;
>     private String word2;
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
>
>     public String getWord1() {
>         return word1;
>     }
>
>     public void setWord1(String word1) {
>         this.word1 = word1;
>     }
>
>     public String getWord2() {
>         return word2;
>     }
>
>     public void setWord2(String word2) {
>         this.word2 = word2;
>     }
>
>     @Override
>     public void readFields(DataInput in) throws IOException {
>         word1 = in.readUTF();
>         word2 = in.readUTF();
>     }
>
>     @Override
>     public void write(DataOutput out) throws IOException {
>         out.writeUTF(word1);
>         out.writeUTF(word2);
>     }
>
>
>     @Override
>     public String toString() {
>         return "[word1=" + word1 + ", word2=" + word2 + "]";
>     }
>
> }
>
> ******************************
>
> Any help will be really appreciated.
> Thanks
> Sai
>
>
>
>
>

Re: WordPairCount Mapreduce question.

Posted by Sai Sai <sa...@yahoo.in>.
Thanks Mahesh for your help.

Wondering if you can provide some insight into the compare method below, which works on byte[], from the SecondarySort example:

public static class Comparator extends WritableComparator {
    public Comparator() {
        super(URICountKey.class);
    }

    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return compareBytes(b1, s1, l1, b2, s2, l2);
    }
}


My question is: in the compareTo method that I have given below, we are comparing word1/word2, which makes sense.
But what about this byte[] comparison? Am I right in assuming it converts each object's word1/word2/word3 to byte[] and compares those?
If so, is it done for performance reasons?
Could you please verify.
Thanks
Sai


________________________________
 From: Mahesh Balija <ba...@gmail.com>
To: user@hadoop.apache.org; Sai Sai <sa...@yahoo.in> 
Sent: Saturday, 23 February 2013 5:23 AM
Subject: Re: WordPairCount Mapreduce question.
 

Please check the in-line answers...


On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai <sa...@yahoo.in> wrote:


>
>Hello
>
>
>I have a question about how Mapreduce sorting works internally with multiple columns.
>
>
>Below r my classes using 2 columns in an input file given below.
>
>
>
>1st question: About the method hashCode, we r adding a "31 + ", i am wondering why is this required. what does 31 refer to.
>
This is how usually hashcode is calculated for any String instance (s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]) where n stands for length of the String. Since in your case you only have 2 chars then it will be a * 31^0 + b * 31^1.
 


>
>2nd question: what if my input file has 3 columns instead of 2 how would you write a compare method and was wondering if anyone can map this to a real world scenario it will be really helpful.
>
you will extend the same approach for the third column,
 public int compareTo(WordPairCountKey o) {
        int diff = word1.compareTo(o.word1);
        if (diff == 0) {
            diff = word2.compareTo(o.word2);
            if(diff==0){
                 diff = word3.compareTo(o.word3);
            }
        }
        return diff;
    }
    

>
>
>
>    @Override
>    public int compareTo(WordPairCountKey o) {
>        int diff = word1.compareTo(o.word1);
>        if (diff == 0) {
>            diff = word2.compareTo(o.word2);
>        }
>        return diff;
>    }
>    
>    @Override
>    public int hashCode() {
>        return word1.hashCode() + 31 * word2.hashCode();
>    }
>
>
>******************************
>
>Here is my input file wordpair.txt
>
>******************************
>
>a    b
>a    c
>a    b
>a    d
>b    d
>e    f
>b    d
>e    f
>b    d
>
>**********************************
>
>
>Here is my WordPairObject:
>
>*********************************
>
>public class WordPairCountKey implements WritableComparable<WordPairCountKey> {
>
>    private String word1;
>    private String word2;
>
>    @Override
>    public int compareTo(WordPairCountKey o) {
>        int diff = word1.compareTo(o.word1);
>        if (diff == 0) {
>            diff = word2.compareTo(o.word2);
>        }
>        return diff;
>    }
>    
>    @Override
>    public int hashCode() {
>        return word1.hashCode() + 31 * word2.hashCode();
>    }
>
>    
>    public String getWord1() {
>        return word1;
>    }
>
>    public void setWord1(String word1) {
>        this.word1 = word1;
>    }
>
>    public String getWord2() {
>        return word2;
>    }
>
>    public void setWord2(String word2) {
>        this.word2 = word2;
>    }
>
>    @Override
>    public void readFields(DataInput in) throws IOException {
>        word1 = in.readUTF();
>        word2 = in.readUTF();
>    }
>
>    @Override
>    public void
 write(DataOutput out) throws IOException {
>        out.writeUTF(word1);
>        out.writeUTF(word2);
>    }
>
>    
>    @Override
>    public String toString() {
>        return "[word1=" + word1 + ", word2=" + word2 + "]";
>    }
>
>}
>
>******************************
>
>Any help will be really appreciated.
>Thanks
>Sai
>

Re: Trying to copy file to Hadoop file system from a program

Posted by Sai Sai <sa...@yahoo.in>.

Greetings,

Below is the program I am trying to run, and the exception it produces:
***************************************

Test Start.....
java.net.UnknownHostException: unknown host: master
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1196)
    at org.apache.hadoop.ipc.Client.call(Client.java:1050)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at kelly.hadoop.hive.test.HadoopTest.main(HadoopTest.java:54)



********************


import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class HdpTest {
    
    public static String fsURI = "hdfs://master:9000";

    
    public static void copyFileToDFS(FileSystem fs, String srcFile,
            String dstFile) throws IOException {
        try {
            System.out.println("Initialize copy...");
            URI suri = new URI(srcFile);
            URI duri = new URI(fsURI + "/" + dstFile);
            Path dst = new Path(duri.toString());
            Path src = new Path(suri.toString());
            System.out.println("Start copy...");
            fs.copyFromLocalFile(src, dst);
            System.out.println("End copy...");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println("Test Start.....");
            Configuration conf = new Configuration();
            DistributedFileSystem fs = new DistributedFileSystem();
            URI duri = new URI(fsURI);
            fs.initialize(duri, conf); // This is where the exception occurs
            long start = 0, end = 0;
            start = System.nanoTime();
            //writing data from local to HDFS
            copyFileToDFS(fs, "/home/kosmos/Work/input/wordpair.txt",
                    "/input/raptor/trade1.txt");
            //Writing data from HDFS to Local
//             copyFileFromDFS(fs, "/input/raptor/trade1.txt", "/home/kosmos/Work/input/wordpair1.txt");
            end = System.nanoTime();
            System.out.println("Total Execution times: " + (end - start));
            fs.close();
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}

******************************
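
For comparison, here is a minimal sketch of the same copy done through
FileSystem.get(), which looks up the NameNode from the URI and configuration.
It still assumes the host name "master" resolves from the machine running the
program (via DNS or an /etc/hosts entry); otherwise the NameNode's IP address
or fully qualified host name has to be used in the URI instead. The class name
HdfsCopySketch is made up for this sketch:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopySketch {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // FileSystem.get() returns the concrete DFS client for an hdfs:// URI;
        // "master" must resolve to the NameNode host for this to connect.
        FileSystem fs = FileSystem.get(new URI("hdfs://master:9000"), conf);

        fs.copyFromLocalFile(new Path("/home/kosmos/Work/input/wordpair.txt"),
                new Path("/input/raptor/trade1.txt"));
        fs.close();
    }
}
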
I am trying to access this URL in Firefox:

hdfs://master:9000

I get an error message saying that Firefox does not know how to display it.

I can successfully access my admin page:

http://localhost:50070/dfshealth.jsp

Just wondering if anyone can give me any suggestions; your help will be really appreciated.
Thanks
Sai

Re: WordPairCount Mapreduce question.

Posted by Mahesh Balija <ba...@gmail.com>.
Please check the in-line answers...

On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai <sa...@yahoo.in> wrote:

>
> Hello
>
> I have a question about how Mapreduce sorting works internally with
> multiple columns.
>
> Below r my classes using 2 columns in an input file given below.
>
> 1st question: About the method hashCode, we r adding a "31 + ", i am
> wondering why is this required. what does 31 refer to.
>
This is the convention used by String.hashCode(): for a String it is
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], where n is the length of the
String. Your key applies the same convention at the field level rather than
the character level: with a = word1.hashCode() and b = word2.hashCode(), the
result is a * 31^0 + b * 31^1, i.e. word1.hashCode() + 31 * word2.hashCode().
The 31 is not special to Hadoop; it is just an odd prime multiplier that
spreads the two contributions so the combined hashes collide less often (and
31 * x can be optimized to (x << 5) - x).
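
As a quick sanity check, here is a small stand-alone snippet (plain JDK only,
separate from the job code; the class name HashCodeDemo is made up for this
example) that shows the same pattern with the one-character words from the
input file:

public class HashCodeDemo {
    public static void main(String[] args) {
        String w1 = "a", w2 = "b";

        // For a one-character String, hashCode() is just the character's code point.
        System.out.println(w1.hashCode());                       // 97  ('a')
        System.out.println(w2.hashCode());                       // 98  ('b')

        // The pair key combines the two field hashes with the 31 multiplier,
        // mirroring word1.hashCode() + 31 * word2.hashCode() in WordPairCountKey.
        System.out.println(w1.hashCode() + 31 * w2.hashCode());  // 97 + 31 * 98 = 3135
    }
}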


>
> 2nd question: what if my input file has 3 columns instead of 2 how would
> you write a compare method and was wondering if anyone can map this to a
> real world scenario it will be really helpful.
>
You would extend the same approach to the third column:

public int compareTo(WordPairCountKey o) {
    int diff = word1.compareTo(o.word1);
    if (diff == 0) {
        diff = word2.compareTo(o.word2);
        if (diff == 0) {
            diff = word3.compareTo(o.word3);
        }
    }
    return diff;
}
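
If the key really does get a third column, the hashCode() is typically
extended with the same 31-multiplier pattern; a sketch, where word3 is the
hypothetical third field:

@Override
public int hashCode() {
    // Same 31-multiplier pattern as the two-field hashCode(), extended to
    // the hypothetical word3 field.
    int result = word1.hashCode();
    result = 31 * result + word2.hashCode();
    result = 31 * result + word3.hashCode();
    return result;
}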


>
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
> ******************************
>
> Here is my input file wordpair.txt
>
> ******************************
>
> a    b
> a    c
> a    b
> a    d
> b    d
> e    f
> b    d
> e    f
> b    d
>
> **********************************
>
> Here is my WordPairObject:
>
> *********************************
>
> public class WordPairCountKey implements
> WritableComparable<WordPairCountKey> {
>
>     private String word1;
>     private String word2;
>
>     @Override
>     public int compareTo(WordPairCountKey o) {
>         int diff = word1.compareTo(o.word1);
>         if (diff == 0) {
>             diff = word2.compareTo(o.word2);
>         }
>         return diff;
>     }
>
>     @Override
>     public int hashCode() {
>         return word1.hashCode() + 31 * word2.hashCode();
>     }
>
>
>     public String getWord1() {
>         return word1;
>     }
>
>     public void setWord1(String word1) {
>         this.word1 = word1;
>     }
>
>     public String getWord2() {
>         return word2;
>     }
>
>     public void setWord2(String word2) {
>         this.word2 = word2;
>     }
>
>     @Override
>     public void readFields(DataInput in) throws IOException {
>         word1 = in.readUTF();
>         word2 = in.readUTF();
>     }
>
>     @Override
>     public void write(DataOutput out) throws IOException {
>         out.writeUTF(word1);
>         out.writeUTF(word2);
>     }
>
>
>     @Override
>     public String toString() {
>         return "[word1=" + word1 + ", word2=" + word2 + "]";
>     }
>
> }
>
> ******************************
>
> Any help will be really appreciated.
> Thanks
> Sai
>
