Posted to user@spark.apache.org by durga <du...@gmail.com> on 2014/07/24 00:26:27 UTC

persistent HDFS instance for cluster restarts/destroys

Hi All,
I have a question.

At my company, we are planning to use the spark-ec2 scripts to create a
cluster for us.

I understand that persistent HDFS keeps the HDFS data available across
cluster restarts.

My questions are:

1) What happens if I destroy and re-create the cluster? Do I lose the data?
    a) If I lose the data, is the only option to copy it to S3 and copy it
back after launching the new cluster? (Data transfer to and from S3 seems
costly.)
2) How would I add or remove machines in the cluster? I am asking about
cluster management.
Is there any place where Amazon lets me see the machines and perform the
add/remove operations?
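For reference, the S3 round trip described in question 1a is usually done
with hadoop distcp. This is only a minimal sketch; the bucket name and paths
are placeholders, and s3n:// is assumed as the S3 filesystem scheme used by
the Hadoop version that spark-ec2 installed at the time:

```shell
# Back up the persistent HDFS data to S3 before destroying the cluster.
# "my-backup-bucket" and the paths are placeholders.
hadoop distcp hdfs:///user/data s3n://my-backup-bucket/data

# After launching a new cluster, copy the data back into HDFS.
hadoop distcp s3n://my-backup-bucket/data hdfs:///user/data
```

distcp runs as a MapReduce job, so the copy is parallelized across the
cluster rather than funneled through a single machine.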

Thanks,
D.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/persistent-HDFS-instance-for-cluster-restarts-destroys-tp10551.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: persistent HDFS instance for cluster restarts/destroys

Posted by durga <du...@gmail.com>.
Thanks Mayur.
Is there any documentation/readme with a step-by-step process for adding or
deleting nodes?
Thanks,
D.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/persistent-HDFS-instance-for-cluster-restarts-destroys-tp10551p10565.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: persistent HDFS instance for cluster restarts/destroys

Posted by Mayur Rustagi <ma...@gmail.com>.
Yes, you lose the data.
You can add machines, but doing so will require you to restart the cluster.
Adding nodes is also a manual process.
Regards,
Mayur
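The distinction between stopping and destroying a cluster can be illustrated
with the spark-ec2 lifecycle commands themselves. This is a sketch only; the
key pair and cluster name are placeholders:

```shell
# stop/start preserves persistent HDFS, which lives on EBS volumes
# (ephemeral HDFS on instance storage is lost on stop).
./spark-ec2 -k my-keypair -i my-keypair.pem stop my-cluster
./spark-ec2 -k my-keypair -i my-keypair.pem start my-cluster

# destroy terminates the instances; the persistent HDFS data is lost.
./spark-ec2 -k my-keypair -i my-keypair.pem destroy my-cluster
```

The machines themselves are visible in the EC2 section of the AWS console,
grouped by the security groups spark-ec2 creates (e.g. my-cluster-master and
my-cluster-slaves), which answers the "where can I see the machines" part of
the question.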

On Wednesday, July 23, 2014, durga <du...@gmail.com> wrote:

> Hi All,
> I have a question.
>
> At my company, we are planning to use the spark-ec2 scripts to create a
> cluster for us.
>
> I understand that persistent HDFS keeps the HDFS data available across
> cluster restarts.
>
> My questions are:
>
> 1) What happens if I destroy and re-create the cluster? Do I lose the data?
>     a) If I lose the data, is the only option to copy it to S3 and copy it
> back after launching the new cluster? (Data transfer to and from S3 seems
> costly.)
> 2) How would I add or remove machines in the cluster? I am asking about
> cluster management.
> Is there any place where Amazon lets me see the machines and perform the
> add/remove operations?
>
> Thanks,
> D.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/persistent-HDFS-instance-for-cluster-restarts-destroys-tp10551.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


-- 
Sent from Gmail Mobile