Posted to user@whirr.apache.org by Samarth Gupta <sa...@gmail.com> on 2013/10/29 06:10:46 UTC

Running whirr with VPC

Hi,

I have launched a cluster using Whirr and tried using the HDFS APIs to
transfer data to the started cluster. However, I got the following error:

"There are 1 datanode(s) running and 1 node(s) are excluded from the
operation"

which is the same issue as
http://stackoverflow.com/questions/14544055/copyfromlocalfile-doesnt-work-in-cdh4
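
For reference, the transfer was essentially a plain FileSystem copy along
the lines of this minimal sketch (the paths and the namenode address are
placeholders, not the actual values):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Contact the remote namenode (placeholder address).
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            // The namenode call succeeds; the error above appears when the
            // client then tries to write blocks directly to a datanode.
            fs.copyFromLocalFile(new Path("/tmp/sample.txt"),
                                 new Path("/user/samarth/sample.txt"));
        }
    }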


After setting up a VPC and VPN and launching the cluster through Whirr, I
faced the problem of "No default VPC assigned", for which I made a change
to the Whirr source code and passed the VPC ID while launching clusters.

I was able to start the cluster using that change, but faced trouble with:

1. The configuration scripts for installing Hadoop were unsuccessful.
2. The security groups API was throwing an error, since the VPC already had
an assigned security group and Whirr was trying to create one more.


I wanted to check whether anyone is using Whirr with AWS VPC and could help
me with the same.

Thanks,
Samarth

Re: Running whirr with VPC

Posted by Andrei Savu <sa...@gmail.com>.
You don't need a VPC with a hardware VPN if all you need is an easy way to
connect to the cluster from your local machine - a SOCKS proxy over SSH
will work just fine.

My recommendation would be to use standard EC2 and start a proxy on your
local machine. See the "Run a proxy" section in the guide:
http://whirr.apache.org/docs/0.8.2/quick-start-guide.html
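
Once the proxy from the guide is running, the HDFS client also has to be
told to route its RPC traffic through it. A minimal sketch, assuming the
proxy listens on localhost:6666 (the port used in the Whirr docs) and a
placeholder namenode address:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ProxiedHdfsCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Send all Hadoop RPC traffic through the SOCKS proxy started
            // by the hadoop-proxy.sh script that Whirr generates (assumed
            // here to be listening on localhost:6666).
            conf.set("hadoop.rpc.socket.factory.class.default",
                     "org.apache.hadoop.net.SocksSocketFactory");
            conf.set("hadoop.socks.server", "localhost:6666");
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            fs.copyFromLocalFile(new Path("/tmp/sample.txt"),
                                 new Path("/user/samarth/sample.txt"));
        }
    }

The same two properties can also be set in the client's core-site.xml
instead of in code.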

Also, you need to make sure you are running the same version of Hadoop on
your local machine as on the cluster.

Are you going to transfer large amounts of data? What's the end goal?

--
Andrei Savu


Re: Running whirr with VPC

Posted by Samarth Gupta <sa...@gmail.com>.
The only reason I had to use VPC was the same as
http://stackoverflow.com/questions/14544055/copyfromlocalfile-doesnt-work-in-cdh4

While using the HDFS API to transfer data, the namenode returns the private
IP of each datanode, so any operation that talks to a datanode directly
cannot reach it, since the client and the datanodes are not on the same
network. If I can get the transfer to HDFS working, I can do away with VPC.
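
One workaround for this class of problem is to make the client connect to
datanodes by hostname instead of the private IP the namenode hands back,
via the dfs.client.use.datanode.hostname option (HDFS-3150). A minimal
sketch, assuming a Hadoop version recent enough to support that option and
datanode hostnames that resolve from the client machine (the address and
paths are placeholders):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HostnameHdfsCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Ask the client to reach datanodes via their hostnames rather
            // than the private IPs reported by the namenode.
            conf.setBoolean("dfs.client.use.datanode.hostname", true);
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            fs.copyFromLocalFile(new Path("/tmp/sample.txt"),
                                 new Path("/user/samarth/sample.txt"));
        }
    }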

Thanks.



Re: Running whirr with VPC

Posted by Andrei Savu <sa...@gmail.com>.
Hi Samarth -


AFAIK we've never actually tested Whirr with VPC. I guess you can make it
work eventually, but you will probably need to make many changes.

Can you tell me a bit more about your use case? Why VPC and not standard
EC2?

--
Andrei Savu - https://www.linkedin.com/in/sandrei
