Posted to dev@mesos.apache.org by "Qinghe Jin (JIRA)" <ji...@apache.org> on 2012/06/05 04:54:22 UTC

[jira] [Created] (MESOS-204) enable Frameworks level disk IO bandwidth support

Qinghe Jin created MESOS-204:
--------------------------------

             Summary: enable Frameworks level disk IO bandwidth support
                 Key: MESOS-204
                 URL: https://issues.apache.org/jira/browse/MESOS-204
             Project: Mesos
          Issue Type: Brainstorming
            Reporter: Qinghe Jin


I am considering adding framework-level disk I/O bandwidth support to Mesos. It would behave like ionice on a single node, but across multiple nodes. I believe this kind of disk QoS support would make Mesos more user friendly. My initial idea is to let users set the I/O priority through an ioprio_set-like API when they submit their job. We may also need some tools to display disk I/O usage per node/container in the web UI.
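
As a rough sketch of what an ioprio_set-like setting could look like at job submission: the Linux ioprio value packs a scheduling class and a level into one integer, (class << 13) | level, with classes real-time (1), best-effort (2), and idle (3), and levels from 0 (highest) to 7 (lowest). The JobIOPriority type and its field names below are hypothetical and not part of Mesos; the code only computes the value and does not issue the system call.

```python
from dataclasses import dataclass

# Constants mirroring the Linux ioprio interface.
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_RT = 1    # real-time
IOPRIO_CLASS_BE = 2    # best-effort
IOPRIO_CLASS_IDLE = 3  # only served when the disk is otherwise idle

_VALID_CLASSES = (IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE)


@dataclass(frozen=True)
class JobIOPriority:
    """Hypothetical per-job I/O priority a framework might attach to a job."""
    io_class: int
    level: int = 4  # 0 is the highest priority within a class, 7 the lowest

    def to_ioprio(self) -> int:
        """Encode class and level the way ioprio_set expects them."""
        if self.io_class not in _VALID_CLASSES:
            raise ValueError(f"unknown I/O class: {self.io_class}")
        if not 0 <= self.level <= 7:
            raise ValueError(f"level must be in 0..7, got {self.level}")
        return (self.io_class << IOPRIO_CLASS_SHIFT) | self.level
```

A slave could then apply the encoded value to every process of the job's executor; how that value would travel from the framework to the slave is left open here.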

I have read the Mesos paper carefully and found a remark that disk and network bandwidth control will be supported in the future. In my opinion, disk I/O bandwidth control is the more feasible of the two because disk I/O is local to a node, whereas network bandwidth control may be harder because of the complexity of the network environment. But I don't know why nobody has added this feature yet. Is it useless? Not feasible? Or is there just no interest?

Since this feature may need a lot of work, I'd like to hear your opinions before I start on it. If you have any questions, suggestions, or ideas, please tell me; I would appreciate it very much. Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (MESOS-204) enable Frameworks level disk IO bandwidth support

Posted by "Charles Reiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MESOS-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289254#comment-13289254 ] 

Charles Reiss commented on MESOS-204:
-------------------------------------

Getting at least some local network isolation support is similar to getting disk IO isolation, and sometimes it might be all that's needed. It is not obvious what the right way to do network isolation is in general (there are some proposals in the literature), and it's not obvious how a user should specify their needs even in the purely local case.

The priority inversion problem I'm talking about is that, in most setups, any program can cause HDFS -- or most distributed storage systems -- to highly load the disks, even on machines on which that program is not running. Sometimes the program will be more important than what the DFS is competing with and this will not be a problem, but other times it will be less important. Since both cases happen, there's no "right" priority setting for the DFS, and there's not an obvious way that the kernel/Mesos/etc. can find out who's using the DFS at runtime.
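
To make the inversion concrete, here is a toy model (not how any real I/O scheduler works): each request carries the priority of the client that caused it, but if it goes through the DFS, the disk only sees the DFS daemon's priority. All names and the "lower number is more important" convention are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    client: str
    client_priority: int  # lower number = more important (assumed convention)
    via_dfs: bool         # True if the I/O is performed by the DFS daemon


def effective_priority(req: Request, dfs_priority: int) -> int:
    """Priority the local disk scheduler actually sees for this request."""
    return dfs_priority if req.via_dfs else req.client_priority


def service_order(requests, dfs_priority):
    """Clients in the order a strict-priority scheduler would serve them."""
    ranked = sorted(requests, key=lambda r: effective_priority(r, dfs_priority))
    return [r.client for r in ranked]


# DFS daemon set to high priority: a batch job reading through the DFS
# overtakes an important job doing local I/O.
case_a = [Request("important-local", 1, False), Request("batch-via-dfs", 6, True)]
print(service_order(case_a, dfs_priority=0))

# DFS daemon set to low priority: an important job reading through the DFS
# is stuck behind a local batch job.
case_b = [Request("important-via-dfs", 0, True), Request("batch-local", 5, False)]
print(service_order(case_b, dfs_priority=7))
```

Neither setting of dfs_priority gives the right order in both cases, which is the point made above.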


                

       

[jira] [Commented] (MESOS-204) enable Frameworks level disk IO bandwidth support

Posted by "Qinghe Jin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MESOS-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289224#comment-13289224 ] 

Qinghe Jin commented on MESOS-204:
----------------------------------

Thank you very much for your suggestion, Charles!

The cgroups blkio controller seems more powerful than ioprio_set, and I will consider using it. Yes, OOM is really nasty, but you said that Ben had started working on it, so I'd like to add the IO support first :D

Is the network really similar to disk? In my opinion, network prioritization is harder than disk IO prioritization because network traffic is point to point and depends on the network topology and the current traffic, which are beyond the control of a single node. On a single node we could probably set the network bandwidth or priority, but could we ensure that the application really gets it? After all, a node knows more about its disks than about the network, right?

Frankly speaking, I know little about priority inversion with HDFS. How does it happen in the real world? Could the OS use priority inheritance to avoid it?

Initially I assumed that IO bandwidth control was mainly for IO-bound applications of the same priority, and I had not taken priority inversion into account. If it really happens, I will pay more attention to it.

                

       

[jira] [Commented] (MESOS-204) enable Frameworks level disk IO bandwidth support

Posted by "Qinghe Jin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MESOS-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290002#comment-13290002 ] 

Qinghe Jin commented on MESOS-204:
----------------------------------

Hi Charles, if the application is running on top of a DFS, it's really hard for us to achieve QoS for it. And without a DFS, many frameworks cannot run, which makes IO bandwidth control meaningless. What bad news!

Actually, the priority inversion problem you mentioned may not be so serious if the DFS is only competing with other applications running on the same node. After all, they are all processes, and we can control their IO priorities. The really tough problem, as you mentioned, is that the DFS destroys the locality of IO access that I must rely on. If I implement IO isolation, it will only be useful when applications access the native file system. Are there any applications that behave like this? Maybe that's the true difference between IO isolation and CPU/memory isolation.
                

       

[jira] [Commented] (MESOS-204) enable Frameworks level disk IO bandwidth support

Posted by "Charles Reiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MESOS-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289108#comment-13289108 ] 

Charles Reiss commented on MESOS-204:
-------------------------------------

It would probably make more sense to use the cgroups blkio controller (http://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt), assuming you'll have a recent enough kernel. Network prioritization can probably be done similarly. This probably just hasn't been a big priority for people because there are nastier issues with isolation, e.g. MESOS-47.

For I/O isolation, one issue is that HDFS (or other DFS) will probably create priority inversion issues in most real installations. It's not clear how to deal with this problem.
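
A minimal sketch of driving the blkio controller from a slave, assuming a cgroup v1 hierarchy with the blkio subsystem already mounted and one cgroup directory per framework. The file names blkio.weight and blkio.throttle.read_bps_device and the 10 to 1000 weight range come from the kernel document linked above; the helper names and any directory layout are hypothetical.

```python
import os

# Weight range documented for blkio.weight in the kernel's blkio-controller.txt.
BLKIO_WEIGHT_MIN = 10
BLKIO_WEIGHT_MAX = 1000


def set_blkio_weight(cgroup_dir: str, weight: int) -> str:
    """Set the proportional disk share for every task in this cgroup."""
    if not BLKIO_WEIGHT_MIN <= weight <= BLKIO_WEIGHT_MAX:
        raise ValueError(f"weight must be in {BLKIO_WEIGHT_MIN}..{BLKIO_WEIGHT_MAX}")
    path = os.path.join(cgroup_dir, "blkio.weight")
    with open(path, "w") as f:
        f.write(str(weight))
    return path


def throttle_read_bps(cgroup_dir: str, major: int, minor: int, bytes_per_sec: int) -> str:
    """Cap read bandwidth on one block device, given as major:minor."""
    if bytes_per_sec < 0:
        raise ValueError("bytes_per_sec must be non-negative")
    path = os.path.join(cgroup_dir, "blkio.throttle.read_bps_device")
    with open(path, "w") as f:
        f.write(f"{major}:{minor} {bytes_per_sec}")
    return path
```

For example, set_blkio_weight("/sys/fs/cgroup/blkio/mesos/framework-a", 500) would give that (hypothetical) cgroup half the maximum weight; the real path depends on where the hierarchy is mounted, and writing to it normally requires root.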
                