Posted to user@mesos.apache.org by Sebastien Brennion <se...@leanbi.ch> on 2015/06/23 13:51:23 UTC

mesos be aware of hdfs location using spark

Hi,


- I would like to know if there is a way to make Mesos dispatch Spark jobs preferentially to instances that also host the HDFS service, to prevent all data from going over the network?


- What I'm also not sure about is how to handle the storage if HDFS is running in Mesos. What happens when hdfs_1 is moved from mesos_worker_1 to mesos_worker_2? Do all the data have to be copied? How is this handled?


Regards

Sébastien

RE: mesos be aware of hdfs location using spark

Posted by Sebastien Brennion <se...@leanbi.ch>.
Thank you very much!
It is much clearer to me now…


Re: mesos be aware of hdfs location using spark

Posted by haosdent <ha...@gmail.com>.
> -          Is there a way to ensure that there is always a Spark and an
hdfs instance on the same mesos worker ?
I don't think Mesos can ensure this right now.

> If they are on the same mesos worker, the two services would probably not
know they could talk to each other locally ?
Spark uses the HDFS client library to connect to HDFS. HDFS will detect the
fastest way to read the data, as long as your HDFS cluster is configured
correctly.

> Is it the way most of the people are using hdfs in mesos, or are they any
bestpractices to attach the same storage when the instance moves ?

I think this is not related to Mesos. Take an HDFS cluster with a
replication factor of 3: after you decommission a datanode, some blocks
drop to 2 replicas. The cluster still works as normal, because clients
retry against the other datanodes to get the data. In the background, HDFS
does some internal network copying to restore the replication factor of 3.
There may be some glitches during this time, but I don't think you need to
worry too much.
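The re-replication behaviour described above can be sketched with a toy model (plain Python, not real HDFS code; the block and datanode names are made up for illustration):

```python
# Toy model of HDFS block re-replication after a datanode is decommissioned.
# Illustrates why losing one node is safe with replication factor 3, and
# why only the lost replicas are copied, not all data.
REPLICATION_FACTOR = 3

def decommission(node, blocks):
    """Remove a datanode; the replicas it held are lost."""
    for replicas in blocks.values():
        replicas.discard(node)

def under_replicated(blocks):
    """Blocks the namenode would schedule for re-replication."""
    return [b for b, r in blocks.items() if len(r) < REPLICATION_FACTOR]

def re_replicate(blocks, live_nodes):
    """Copy each under-replicated block to another live datanode."""
    for b in under_replicated(blocks):
        for node in live_nodes:
            if node not in blocks[b] and len(blocks[b]) < REPLICATION_FACTOR:
                blocks[b].add(node)

# block id -> set of datanodes currently holding a replica
blocks = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn1", "dn3", "dn4"},
}
decommission("dn1", blocks)
print(under_replicated(blocks))   # both blocks are down to 2 replicas
re_replicate(blocks, ["dn2", "dn3", "dn4", "dn5"])
print(under_replicated(blocks))   # empty: factor restored by copying
```

The data stays readable throughout, because every block still has live replicas; the copying only tops the replica count back up to 3.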




-- 
Best Regards,
Haosdent Huang

Re: mesos be aware of hdfs location using spark

Posted by zhou weitao <zh...@gmail.com>.
2015-06-23 21:21 GMT+08:00 Sebastien Brennion <se...@leanbi.ch>:

> -          Is there a way to ensure, that there is always a spark and an
> hdfs instance on the same mesos worker ?
>
AFAIK, under the current 0.22:
   1. configure HDFS on Mesos slaves A, B, C
   2. set the "attributes" (see mesos-slave --help) on A, B, C to mark them
as HDFS nodes (or set a "role" for finer-grained constraints)
   3. constrain the Spark driver/framework to run jobs on A, B, C only. (I
didn't test this step, but maybe.)
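On the command line, those steps might look roughly like this (the hostnames, ZooKeeper URL, job file, and the attribute `hdfs:true` are all made-up examples; `spark.mesos.constraints` only appeared in later Spark releases, so treat this as a sketch, not a recipe for 0.22):

```shell
# 1.-2. Start the slaves that host HDFS datanodes with a marker attribute
#       (master URL and attribute value are hypothetical):
mesos-slave --master=zk://zk1:2181/mesos --attributes="hdfs:true"

# 3. Ask the Spark framework to accept offers only from slaves carrying
#    that attribute (via spark.mesos.constraints, in later Spark versions):
spark-submit --master mesos://zk://zk1:2181/mesos \
             --conf spark.mesos.constraints="hdfs:true" \
             my_job.py
```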


>  -          If they are on the same mesos worker, the two services would
> probably not know they could talk to each other locally ?
>
Mesos wouldn't be responsible for that; it's a Spark and HDFS concern.

>
>
> 2.  What I also not sure about, is how to handle the storage, if hdfs is
> running in mesos ? What happend when hdfs_1 is moved from mesos_worker_1 to
> mesos_worker_2, do all Data have to be copied ? How are handling this ?
>
> ->            I think unless you already have data replica in new hdfs
> datanode, otherwise hdfs would copy the block from other exist datanode.
>
> Do you mean, if the Data Node instance moves, it would copy all data ?
> Is it the way most of the people are using hdfs in mesos, or are they any
> bestpractices to attach the same storage when the instance moves ?
>
>
>
> *From:* haosdent [mailto:haosdent@gmail.com <ha...@gmail.com>]
> *Sent:* mardi 23 juin 2015 15:03
> *To:* user@mesos.apache.org
> *Subject:* Re: mesos be aware of hdfs location using spark
>
>
>
> By the way, I think you problems are related to HDFS, not related to
> mesos. You could send it to hdfs user email list.
>
>
>
> On Tue, Jun 23, 2015 at 9:01 PM, haosdent <ha...@gmail.com> wrote:
>
>  And for your this question:
>
> >on instances, that also contain hdfs service, to prevent all Data going
> over the network ?
>
> If you open HDFS Short-Circuit Local Reads, HDFS would auto read from
> local machine instead of read from network when the data exists in local
> machine.
>
>
>
> On Tue, Jun 23, 2015 at 8:58 PM, haosdent <ha...@gmail.com> wrote:
>
>  For your second question, I think unless you already have data replica
> in new hdfs datanode, otherwise hdfs would copy the block from other exist
> datanode.
>
>
>
> On Tue, Jun 23, 2015 at 7:51 PM, Sebastien Brennion <
> sebastien.brennion@leanbi.ch> wrote:
>
>  ​Hi,
>
>
>
> - I would like to know it there is a way to make mesos dispatch in
> priority spark jobs, on instances, that also contain hdfs service, to
> prevent all Data going over the network ?
>
>
>
> - What I also not sure about, is how to handle the storage, if hdfs is
> running in mesos ? What happend when hdfs_1 is moved from mesos_worker_1 to
> mesos_worker_2, do all Data have to be copied ? How are handling this ?
>
>
>
> Regards
>
> Sébastien
>
>
>
>
>
> --
>
> Best Regards,
>
> Haosdent Huang
>
>
>
>
>
> --
>
> Best Regards,
>
> Haosdent Huang
>
>
>
>
>
> --
>
> Best Regards,
>
> Haosdent Huang
>

RE: mesos be aware of hdfs location using spark

Posted by Sebastien Brennion <se...@leanbi.ch>.
Thank you for your answers… I’m new to both…

Sorry, sent too quickly…

I’m not sure I understand your answers…  I should probably try to reformulate my question…

If HDFS is in Mesos, and Spark too:
1.

-          Is there a way to ensure that there is always a Spark and an HDFS instance on the same Mesos worker?

-          If they are on the same Mesos worker, would the two services even know they could talk to each other locally?


2.  What I’m also not sure about is how to handle the storage if HDFS is running in Mesos. What happens when hdfs_1 is moved from mesos_worker_1 to mesos_worker_2? Do all the data have to be copied? How is this handled?

->            I think unless you already have a data replica on the new HDFS datanode, HDFS will copy the blocks from the other existing datanodes.

Do you mean that if the DataNode instance moves, it will copy all the data?
Is this the way most people are using HDFS in Mesos, or are there any best practices for attaching the same storage when the instance moves?



RE: mesos be aware of hdfs location using spark

Posted by Sebastien Brennion <se...@leanbi.ch>.
Thank you for your answers… I’m new to both…

I’m not sure I understand your answers…  I should probably try to reformulate my question…

If HDFS is in Mesos, and Spark too:
1.

-          Is there a way to ensure that there is always a Spark and an HDFS instance on the same Mesos worker?

-          If they are on the same Mesos worker, would the two services even know they could talk to each other locally?


2.  What I’m also not sure about is how to handle the storage if HDFS is running in Mesos. What happens when hdfs_1 is moved from mesos_worker_1 to mesos_worker_2? Do all the data have to be copied? How is this handled?


Re: mesos be aware of hdfs location using spark

Posted by haosdent <ha...@gmail.com>.
By the way, I think your problems are related to HDFS, not to Mesos.
You could send them to the HDFS user mailing list.




-- 
Best Regards,
Haosdent Huang

Re: mesos be aware of hdfs location using spark

Posted by haosdent <ha...@gmail.com>.
And for this question of yours:
>on instances, that also contain hdfs service, to prevent all Data going
over the network ?
If you enable HDFS Short-Circuit Local Reads, HDFS will automatically read
from the local machine instead of over the network whenever the data exists
on the local machine.
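For reference, short-circuit reads are switched on in hdfs-site.xml. The two property names below are the standard Hadoop 2.x ones; the socket path is just an example and must point to a location that exists on every datanode host:

```xml
<!-- hdfs-site.xml (sketch): enable short-circuit local reads -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```

Note that short-circuit reads also require the native libhadoop library to be available on the datanodes, since the feature uses Unix domain sockets.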




-- 
Best Regards,
Haosdent Huang

Re: mesos be aware of hdfs location using spark

Posted by haosdent <ha...@gmail.com>.
For your second question: I think unless you already have a data replica on
the new HDFS datanode, HDFS will copy the blocks over from the other
existing datanodes.




-- 
Best Regards,
Haosdent Huang