Posted to common-user@hadoop.apache.org by Krzysztof Kucybała <kr...@softwaremind.pl> on 2006/05/23 14:39:32 UTC

Hadoop + WinXP + cygwin

Hi!

I am new to hadoop as well as cygwin, and as far as I know, you need to 
use cygwin in order to use hadoop on win. Unfortunately I'm not allowed 
to switch to linux or even use a linux machine to get the dfs running. 
Seems there's no other way but the cygwin-way, is there? Anyways, I was 
wondering, is there a way to get hadoop daemons running via cygwin, and 
then quit the latter? Cause I think I got the namenode and datanode 
running (how can I test that, by the way - other than by writing "ps" in 
cygwin? Does writing the address and port of the node in a browser and 
waiting for the outcome, which is a blank-but-not-error page, tell me 
anything about whether the dfs is configured correctly?), yet if I close 
cygwin, the daemons shut down too.

And there's another thing I'd like to ask. I'm writing a Java program 
that is supposed to connect to a dfs. As much as I've read the API docs, 
I suppose I should use the DistributedFileSystem class, shouldn't I? But 
what does creating its instance actually do? Create me a new filesystem 
or rather just a connection to an existing one? What I do is specify an 
InetSocketAddress and a Configuration. Can the configuration object be 
created using the hadoop-default.xml and hadoop-site.xml files?

I know these questions probably sound stupid, but still I'd really 
appreciate it if someone provided me with some answers. I'm a true 
beginner in the matters of hadoop and cygwin, and I'm also quite new to 
Java, so please - be gentle ;o)

Regards,
-- 
Krzysztof Kucybała
Software Mind | Where Quality Meets the Future

e-mail: krzysztof.kucybala@softwaremind.pl
skype: krzysztof.kucybala
tel./fax: +48-12 614 51 70
http://www.softwaremind.pl


Re: Hadoop + WinXP + cygwin

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Cool, so you are up and running.
By default you are in the /user/<user> directory, even if it has not been created yet.
I'm not sure this is documented but it's a feature.
Try
bin/hadoop dfs -copyFromLocal something something
and then
bin/hadoop dfs -ls
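
For instance (notes.txt here is just an illustrative file name):

	bin/hadoop dfs -copyFromLocal notes.txt notes.txt
	bin/hadoop dfs -ls

The relative destination resolves against your working directory, so 
the file should then show up in the listing as /user/<user>/notes.txt.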

Good luck with your exploration.

--Konst

Krzysztof Kucybała wrote:

> Thanks a lot :o) Still, some new questions emerged, which I'd like to 
> ask. I tried what you told me, only with lsr instead of ls -
>     
>     bin/hadoop dfs -lsr /
>
> Here's the result:
>     
>     /tmp    <dir>
>     /tmp/hadoop    <dir>
>     /tmp/hadoop/mapred    <dir>
>     /tmp/hadoop/mapred/system    <dir>
>
> So I assume the fs is up and running. But then I got an interesting 
> result in Java. Here's the code:
>
>     try {
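>       // imports: java.net.InetSocketAddress, java.io.IOException,
>       //   org.apache.hadoop.conf.Configuration, and DistributedFileSystem
>       //   (org.apache.hadoop.dfs in this release - an assumption)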
>       InetSocketAddress adr = new InetSocketAddress("127.0.0.1",2905);
>       Configuration conf = new Configuration();
>       DistributedFileSystem dfs = new DistributedFileSystem(adr,conf);
>       System.out.println("Working dir: " + dfs.getWorkingDirectory());
>       System.out.println("Size: " + dfs.getUsed());
>       System.out.println("Name: " + dfs.getName());
>     } catch (IOException e) {
>       e.printStackTrace();
>     }
>
> And the output:
>
> 060524 090137 parsing jar:file:/W:/lib/hadoop-0.2.1.jar!/hadoop-default.xml
> 060524 090138 Client connection to 127.0.0.1:2905: starting
> Working dir: /user/praktykant   // !!! Non-existent anywhere in the system
> Size: 0
> Name: localhost:2905
>
> So I'm curious... How come I get a directory that not only doesn't 
> exist on the dfs, but doesn't exist anywhere on my system either, nor 
> is it visible under cygwin. How? :o) By the way - in hadoop-site.xml I 
> changed fs.default.name to localhost:2905 and dfs.datanode.port to 
> 2906, but this is the hadoop-site.xml file I used when I called 
> start-all.sh in cygwin, whereas my Eclipse seems to be using the 
> configuration stored inside the jar file, doesn't it? Is there a way 
> to change that behaviour?
>
> Once again, many many thanks :o)
> Regards,
> Krzysztof Kucybała
>
Konstantin Shvachko wrote:
>
>> Krzysztof Kucybała wrote:
>>
>>> Hi!
>>>
>>> I am new to hadoop as well as cygwin, and as far as I know, you need 
>>> to use cygwin in order to use hadoop on win. Unfortunately I'm not 
>>> allowed to switch to linux or even use a linux machine to get the 
>>> dfs running. Seems there's no other way but the cygwin-way, is there? 
>>
>>
>> If you use dfs only, then you can replace one class, DF.java, with a 
>> universal version (see the attachment in
>> http://issues.apache.org/jira/browse/HADOOP-33)
>> and run the cluster without cygwin. I do.
>> If you are planning to use map/reduce, then cygwin is probably the 
>> best way, since you will want to start the job/task trackers using 
>> the hadoop scripts.
>>
>>> Anyways, I was wondering, is there a way to get hadoop daemons 
>>> running via cygwin, and then quit the latter? Cause I think I got 
>>> the namenode and datanode running (how can I test that, by the way - 
>>> other than by writing "ps" in cygwin?
>>
>>
>> Use bin/hadoop dfs -ls /
>> or other options. This is a command-line shell, run under cygwin.
>>
>>> Does writing the address and port of the node in a browser and 
>>> waiting for the outcome, which is a blank-but-not-error page, tell 
>>> me anything about whether the dfs is configured correctly?), yet if 
>>> I close cygwin, the daemons shut down too.
>>>
>>> And there's another thing I'd like to ask. I'm writing a Java 
>>> program that is supposed to connect to a dfs. As much as I've read 
>>> the API docs, I suppose I should use the DistributedFileSystem 
>>> class, shouldn't I? 
>>
>>
>> You should use the FileSystem class; see
>> org.apache.hadoop.examples
>> and test/org.apache.hadoop.fs
>>
>>> But what does creating its instance actually do? Create me a new 
>>> filesystem or rather just a connection to an existing one? What I do 
>>> is specify an InetSocketAddress and a Configuration. 
>>
>>
>> FileSystem.get( Configuration ) would do
>>
>>> Can the configuration object be created using the hadoop-default.xml 
>>> and hadoop-site.xml files?
>>
>>
>> This is the default behavior; the Configuration constructor reads the files.
>>
>>> I know these questions probably sound stupid, but still I'd really 
>>> appreciate it if someone provided me with some answers. I'm a true 
>>> beginner in the matters of hadoop and cygwin, and I'm also quite 
>>> new to Java, so please - be gentle ;o)
>>>
>>> Regards,
>>
>>
>>
>>
>
>
>


Re: Hadoop + WinXP + cygwin

Posted by Krzysztof Kucybała <kr...@softwaremind.pl>.
Thanks a lot :o) Still, some new questions emerged, which I'd like to 
ask. I tried what you told me, only with lsr instead of ls -
	
	bin/hadoop dfs -lsr /

Here's the result:
	
	/tmp	<dir>
	/tmp/hadoop	<dir>
	/tmp/hadoop/mapred	<dir>
	/tmp/hadoop/mapred/system	<dir>

So I assume the fs is up and running. But then I got an interesting 
result in Java. Here's the code:

     try {
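       // imports: java.net.InetSocketAddress, java.io.IOException,
       //   org.apache.hadoop.conf.Configuration, and DistributedFileSystem
       //   (org.apache.hadoop.dfs in this release - an assumption)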
       InetSocketAddress adr = new InetSocketAddress("127.0.0.1",2905);
       Configuration conf = new Configuration();
       DistributedFileSystem dfs = new DistributedFileSystem(adr,conf);
       System.out.println("Working dir: " + dfs.getWorkingDirectory());
       System.out.println("Size: " + dfs.getUsed());
       System.out.println("Name: " + dfs.getName());
     } catch (IOException e) {
       e.printStackTrace();
     }

And the output:

060524 090137 parsing jar:file:/W:/lib/hadoop-0.2.1.jar!/hadoop-default.xml
060524 090138 Client connection to 127.0.0.1:2905: starting
Working dir: /user/praktykant   // !!! Non-existent anywhere in the system
Size: 0
Name: localhost:2905

So I'm curious... How come I get a directory that not only doesn't 
exist on the dfs, but doesn't exist anywhere on my system either, nor 
is it visible under cygwin. How? :o) By the way - in hadoop-site.xml I 
changed fs.default.name to localhost:2905 and dfs.datanode.port to 
2906, but this is the hadoop-site.xml file I used when I called 
start-all.sh in cygwin, whereas my Eclipse seems to be using the 
configuration stored inside the jar file, doesn't it? Is there a way 
to change that behaviour?
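
For reference, here is what the two changed properties look like in my 
conf/hadoop-site.xml (a sketch, with the values mentioned above):

	<configuration>
	  <property>
	    <name>fs.default.name</name>
	    <value>localhost:2905</value>
	  </property>
	  <property>
	    <name>dfs.datanode.port</name>
	    <value>2906</value>
	  </property>
	</configuration>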

Once again, many many thanks :o)
Regards,
Krzysztof Kucybała

Konstantin Shvachko wrote:
> Krzysztof Kucybała wrote:
> 
>> Hi!
>>
>> I am new to hadoop as well as cygwin, and as far as I know, you need 
>> to use cygwin in order to use hadoop on win. Unfortunately I'm not 
>> allowed to switch to linux or even use a linux machine to get the dfs 
>> running. Seems there's no other way but the cygwin-way, is there? 
> 
> If you use dfs only, then you can replace one class, DF.java, with a 
> universal version (see the attachment in
> http://issues.apache.org/jira/browse/HADOOP-33)
> and run the cluster without cygwin. I do.
> If you are planning to use map/reduce, then cygwin is probably the 
> best way, since you will want to start the job/task trackers using 
> the hadoop scripts.
> 
>> Anyways, I was wondering, is there a way to get hadoop daemons running 
>> via cygwin, and then quit the latter? Cause I think I got the namenode 
>> and datanode running (how can I test that, by the way - other than by 
>> writing "ps" in cygwin?
> 
> Use bin/hadoop dfs -ls /
> or other options. This is a command-line shell, run under cygwin.
> 
>> Does writing the address and port of the node in a browser and waiting 
>> for the outcome, which is a blank-but-not-error page, tell me anything 
>> about whether the dfs is configured correctly?), yet if I close 
>> cygwin, the daemons shut down too.
>>
>> And there's another thing I'd like to ask. I'm writing a Java program 
>> that is supposed to connect to a dfs. As much as I've read the API 
>> docs, I suppose I should use the DistributedFileSystem class, 
>> shouldn't I? 
> 
> You should use the FileSystem class; see
> org.apache.hadoop.examples
> and test/org.apache.hadoop.fs
> 
>> But what does creating its instance actually do? Create me a new 
>> filesystem or rather just a connection to an existing one? What I do 
>> is specify an InetSocketAddress and a Configuration. 
> 
> FileSystem.get( Configuration ) would do
> 
>> Can the configuration object be created using the hadoop-default.xml 
>> and hadoop-site.xml files?
> 
> This is the default behavior; the Configuration constructor reads the files.
> 
>> I know these questions probably sound stupid, but still I'd really 
>> appreciate it if someone provided me with some answers. I'm a true 
>> beginner in the matters of hadoop and cygwin, and I'm also quite new 
>> to Java, so please - be gentle ;o)
>>
>> Regards,
> 
> 
> 

Re: Hadoop + WinXP + cygwin

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Krzysztof Kucybała wrote:

> Hi!
>
> I am new to hadoop as well as cygwin, and as far as I know, you need 
> to use cygwin in order to use hadoop on win. Unfortunately I'm not 
> allowed to switch to linux or even use a linux machine to get the dfs 
> running. Seems there's no other way but the cygwin-way, is there? 

If you use dfs only, then you can replace one class, DF.java, with a 
universal version (see the attachment in
http://issues.apache.org/jira/browse/HADOOP-33)
and run the cluster without cygwin. I do.
If you are planning to use map/reduce, then cygwin is probably the 
best way, since you will want to start the job/task trackers using 
the hadoop scripts.

> Anyways, I was wondering, is there a way to get hadoop daemons running 
> via cygwin, and then quit the latter? Cause I think I got the namenode 
> and datanode running (how can I test that, by the way - other than by 
> writing "ps" in cygwin?

Use bin/hadoop dfs -ls /
or other options. This is a command-line shell, run under cygwin.

> Does writing the address and port of the node in a browser and waiting 
> for the outcome, which is a blank-but-not-error page, tell me anything 
> about whether the dfs is configured correctly?), yet if I close 
> cygwin, the daemons shut down too.
>
> And there's another thing I'd like to ask. I'm writing a Java program 
> that is supposed to connect to a dfs. As much as I've read the API 
> docs, I suppose I should use the DistributedFileSystem class, 
> shouldn't I? 

You should use the FileSystem class; see
org.apache.hadoop.examples
and test/org.apache.hadoop.fs

> But what does creating its instance actually do? Create me a new 
> filesystem or rather just a connection to an existing one? What I do 
> is specify an InetSocketAddress and a Configuration. 

FileSystem.get( Configuration ) would do

> Can the configuration object be created using the hadoop-default.xml 
> and hadoop-site.xml files?

This is the default behavior; the Configuration constructor reads the files.
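
For illustration, here is a minimal sketch of that approach. The class 
name is made up, and the printed methods are simply the ones used 
earlier in this thread; it assumes your conf directory (with 
hadoop-site.xml) is on the classpath ahead of the hadoop jar:

     import java.io.IOException;
     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.FileSystem;

     public class DfsConnect {
       public static void main(String[] args) throws IOException {
         // Reads hadoop-default.xml and hadoop-site.xml from the
         // classpath; fs.default.name decides which filesystem you get.
         Configuration conf = new Configuration();
         // Returns a client handle to the existing filesystem;
         // it does not create a new filesystem.
         FileSystem fs = FileSystem.get(conf);
         System.out.println("Name: " + fs.getName());
         System.out.println("Working dir: " + fs.getWorkingDirectory());
       }
     }

Note that Configuration reads these files from the classpath, so when 
you run from an IDE, whichever hadoop-site.xml comes first on the 
classpath wins; if none is found, you get the defaults packed inside 
the jar.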

> I know these questions probably sound stupid, but still I'd really 
> appreciate if someone provided me with some answers. I'm a true 
> beginner  in the matters of hadoop and cygwin, and I'm also quite new 
> to Java, so please - be gentle ;o)
>
> Regards,