Posted to dev@hawq.apache.org by zhangh43 <gi...@git.apache.org> on 2015/12/03 07:58:42 UTC

[GitHub] incubator-hawq pull request: HAWQ-210. Improve data locality by ca...

GitHub user zhangh43 opened a pull request:

    https://github.com/apache/incubator-hawq/pull/155

    HAWQ-210. Improve data locality by calculating the insert host.

    Currently, data locality is based on a heuristic greedy algorithm.
    It first considers continuous blocks for a vseg, then non-continuous blocks, and finally non-local blocks.
    However, a file may contain several continuous blocks while each vseg can process only one block because of the average size limit. In this case, the continuous blocks are assigned to different vsegs one by one, so they are effectively treated as non-continuous blocks.
    This improvement adds continuity information to help choose the right vseg during the non-continuous block allocation stage. The main idea is to go through the blocks in a file and find the host that holds the largest number of that file's blocks. We call this host the INSERT HOST. When assigning non-continuous blocks, we prefer the INSERT HOST over other hosts when all of them can read the block locally.
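
    The insert-host idea described above can be sketched roughly as follows. This is only an illustration in Python, not the code from the patch: the function names `find_insert_host` and `pick_host_for_block`, the list-of-replica-hosts input, and the tie-breaking rules are all assumptions made for the example.

```python
from collections import Counter


def find_insert_host(block_hosts):
    """Return the host holding the most blocks of one file.

    block_hosts is a list with one entry per block; each entry lists the
    hosts that store a replica of that block (hypothetical input shape).
    """
    counts = Counter()
    for hosts in block_hosts:
        # Count each host at most once per block.
        counts.update(set(hosts))
    if not counts:
        return None
    # Assumed tie-break: highest count first, then smallest host name.
    return min(counts, key=lambda h: (-counts[h], h))


def pick_host_for_block(local_hosts, insert_host):
    """Choose among hosts that can all read the block locally.

    The insert host is preferred when it is one of the local candidates;
    otherwise the first candidate is used (an assumed fallback).
    """
    if insert_host in local_hosts:
        return insert_host
    return local_hosts[0] if local_hosts else None


# Example: three blocks of one file and their replica hosts.
blocks = [["h1", "h2"], ["h2", "h3"], ["h2", "h1"]]
insert_host = find_insert_host(blocks)
chosen = pick_host_for_block(["h1", "h2"], insert_host)
```

    In this example `h2` stores a replica of every block, so it becomes the insert host and is preferred over `h1` when both can read a block locally.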

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhangh43/incubator-hawq hawq210

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/155.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #155
    
----
commit 1039f7a59f6d28ee8a258922384ec73768a04b45
Author: hubertzhang <hz...@pivotal.io>
Date:   2015-12-03T06:56:10Z

    HAWQ-210. Improve data locality by calculating the insert host.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: HAWQ-210. Improve data locality by ca...

Posted by yaoj2 <gi...@git.apache.org>.
Github user yaoj2 commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/155#issuecomment-161540301
  
    Looks good



[GitHub] incubator-hawq pull request: HAWQ-210. Improve data locality by ca...

Posted by zhangh43 <gi...@git.apache.org>.
Github user zhangh43 closed the pull request at:

    https://github.com/apache/incubator-hawq/pull/155



[GitHub] incubator-hawq pull request: HAWQ-210. Improve data locality by ca...

Posted by wengyanqing <gi...@git.apache.org>.
Github user wengyanqing commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/155#issuecomment-161539970
  
    looks good

