You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hawq.apache.org by hornn <gi...@git.apache.org> on 2016/01/19 19:43:08 UTC

[GitHub] incubator-hawq pull request: Hawq 317

GitHub user hornn opened a pull request:

    https://github.com/apache/incubator-hawq/pull/279

    Hawq 317

    Getting delegation token for PXF directly from pg_filespace_entry which stores the HDFS location of the HAWQ instance. 
    This is similar to how delegation is token is retrieved for regular HAWQ table.
    The advantage of this approach is that no special treatment is needed for HA or Isilon cases - the location of HDFS in the catalog table already reflects it. (In HA case the location will be the nameservice, and in Isilon it will be the location of the SmartConnect).
    
    @shivzone @sansanichfb

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hornn/incubator-hawq HAWQ-317

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/279.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #279
    
----
commit 4990ed0a88dc686adca6854a7ccaab3fbb958a1e
Author: Noa Horn <nh...@pivotal.io>
Date:   2016-01-13T21:55:38Z

    HAWQ-317. Get delegation token for PXF from dfs_url

commit ba8a88feee3d61b39102f7750d4731c5cfc03eba
Author: Noa Horn <nh...@pivotal.io>
Date:   2016-01-13T22:08:03Z

    HAWQ-317. whitespaces

commit 418568ce4e0beba3381eacdae0021dca381148e2
Author: Noa Horn <nh...@pivotal.io>
Date:   2016-01-14T22:27:02Z

    HAWQ-317. pointer error

commit ad8384c7c7b11fb8ca43efcb41d2d74396447dff
Author: Noa Horn <nh...@pivotal.io>
Date:   2016-01-19T00:03:34Z

    HAWQ-317. Use pg_filespace_entry to get hdfs location for delegation token

commit 58f7f17d2f99ad79c8d7d829f716567d1c73340d
Author: Noa Horn <nh...@pivotal.io>
Date:   2016-01-19T01:49:34Z

    HAWQ-317. cleanup and doc

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by shivzone <gi...@git.apache.org>.
Github user shivzone commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/279#issuecomment-172989559
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by shivzone <gi...@git.apache.org>.
Github user shivzone commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/279#discussion_r50161526
  
    --- Diff: src/backend/cdb/cdbquerycontextdispatching.c ---
    @@ -2872,31 +2872,27 @@ static char* GetExtTableFirstLocation(Datum *array)
      * Using host from uri, dispatch HDFS credentials to 
      * segments.
      *
    - * The function uses a hdfs uri in the form hdfs://host:port/ where
    - * port is hard-coded 8020. For HA the function uses hdfs://nameservice/
    + * The function uses a hdfs uri in the form hdfs://host:port/path where
    + * this value is taken from pg_filespace_entry which is populated
    + * based on hawq-site.xml's hawq_dfs_url entry.
      *
      * prepareDispatchedCatalogFileSystemCredential will store the token
    - * using port == 0 in HA case (otherwise the supplied port)
    - *
    - * TODO Get HDFS port from someplace else, currently hard coded
    + * using port == 0 in HA case (otherwise the supplied port).
      */
     static void AddFileSystemCredentialForPxfTable(char *uri)
     {
    -	StringInfoData hdfs_uri;
    +	char* dfs_address = NULL;
     
    -	initStringInfo(&hdfs_uri);
    -	GPHDUri *gphd_uri = parseGPHDUri(uri);
    -    if (gphd_uri->ha_nodes)
    -        appendStringInfo(&hdfs_uri, "hdfs://%s/", gphd_uri->ha_nodes->nameservice);
    -    else
    -        appendStringInfo(&hdfs_uri, "hdfs://%s:8020/", gphd_uri->host);
    +	if (!enable_secure_filesystem)
    --- End diff --
    
    we didn't have this check earlier ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by shivzone <gi...@git.apache.org>.
Github user shivzone commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/279#discussion_r50160999
  
    --- Diff: src/backend/access/external/pxfutils.c ---
    @@ -47,6 +52,24 @@ void port_to_str(char **port, int new_port)
     }
     
     /*
    + * get_hdfs_location_from_filespace
    + *
    + * Get hdfs location from pg_filespace_entry
    + * The returned path needs to be pfreed by the caller.
    + */
    +void get_hdfs_location_from_filespace(char** path)
    +{
    +	Assert(NULL != path);
    +	Oid dtsoid = get_database_dts(MyDatabaseId);
    --- End diff --
    
    what is MyDatabaseId and Where is it set ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by hornn <gi...@git.apache.org>.
Github user hornn commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/279#discussion_r50173781
  
    --- Diff: src/backend/cdb/cdbquerycontextdispatching.c ---
    @@ -2872,31 +2872,27 @@ static char* GetExtTableFirstLocation(Datum *array)
      * Using host from uri, dispatch HDFS credentials to 
      * segments.
      *
    - * The function uses a hdfs uri in the form hdfs://host:port/ where
    - * port is hard-coded 8020. For HA the function uses hdfs://nameservice/
    + * The function uses a hdfs uri in the form hdfs://host:port/path where
    + * this value is taken from pg_filespace_entry which is populated
    + * based on hawq-site.xml's hawq_dfs_url entry.
      *
      * prepareDispatchedCatalogFileSystemCredential will store the token
    - * using port == 0 in HA case (otherwise the supplied port)
    - *
    - * TODO Get HDFS port from someplace else, currently hard coded
    + * using port == 0 in HA case (otherwise the supplied port).
      */
     static void AddFileSystemCredentialForPxfTable(char *uri)
     {
    -	StringInfoData hdfs_uri;
    +	char* dfs_address = NULL;
     
    -	initStringInfo(&hdfs_uri);
    -	GPHDUri *gphd_uri = parseGPHDUri(uri);
    -    if (gphd_uri->ha_nodes)
    -        appendStringInfo(&hdfs_uri, "hdfs://%s/", gphd_uri->ha_nodes->nameservice);
    -    else
    -        appendStringInfo(&hdfs_uri, "hdfs://%s:8020/", gphd_uri->host);
    +	if (!enable_secure_filesystem)
    --- End diff --
    
    Surprisingly, no. The check was only done inside prepareDispatchedCatalogFileSystemCredential(), which this function calls.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by wangzw <gi...@git.apache.org>.
Github user wangzw commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/279#discussion_r50350767
  
    --- Diff: src/backend/access/external/pxfutils.c ---
    @@ -47,6 +52,24 @@ void port_to_str(char **port, int new_port)
     }
     
     /*
    + * get_hdfs_location_from_filespace
    + *
    + * Get hdfs location from pg_filespace_entry
    + * The returned path needs to be pfreed by the caller.
    + */
    +void get_hdfs_location_from_filespace(char** path)
    +{
    +	Assert(NULL != path);
    --- End diff --
    
    If I remember correctly a table can specify a table space that is different from the table space used by the database. Will it cause trouble with this code?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by hornn <gi...@git.apache.org>.
Github user hornn closed the pull request at:

    https://github.com/apache/incubator-hawq/pull/279


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by GodenYao <gi...@git.apache.org>.
Github user GodenYao commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/279#issuecomment-188397037
  
    Jemish is on paternity leave until Mar 1st. He will pick up the verification and update us.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by huor <gi...@git.apache.org>.
Github user huor commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/279#issuecomment-173779281
  
    Nice fix. It would be perfect if a commit message is provided.
    
    Here is the convention for commit message in [Contributing to HAWQ](https://cwiki.apache.org/confluence/display/HAWQ/Contributing+to+HAWQ). Also pasted below:
    ```
    # Make changes and commit to local. Please format commit message as follows and feel free to add additional comments.
    # <jira ticket number>. <Commit Message>
    # Example: HAWQ-1002. Add awesome feature
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by hornn <gi...@git.apache.org>.
Github user hornn commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/279#discussion_r50173636
  
    --- Diff: src/backend/access/external/pxfutils.c ---
    @@ -47,6 +52,24 @@ void port_to_str(char **port, int new_port)
     }
     
     /*
    + * get_hdfs_location_from_filespace
    + *
    + * Get hdfs location from pg_filespace_entry
    + * The returned path needs to be pfreed by the caller.
    + */
    +void get_hdfs_location_from_filespace(char** path)
    +{
    +	Assert(NULL != path);
    +	Oid dtsoid = get_database_dts(MyDatabaseId);
    --- End diff --
    
    It holds the OID of the current session's db. It is usually used to get the db's name or any other properties related to it, such as the filespace.
    It is defined in globals.c and set in postinit.c's InitPostgres() which is called when a session is opened (see postgres.c's PostgresMain()).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by shivzone <gi...@git.apache.org>.
Github user shivzone commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/279#issuecomment-189326238
  
    @hornn this patch is required to fix https://issues.apache.org/jira/browse/HAWQ-462
    We did the RCA for the hcatalog failure against secure cluster and this patch fixes that issue as well.
    Can you please merge it ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by shivzone <gi...@git.apache.org>.
Github user shivzone commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/279#discussion_r50160316
  
    --- Diff: src/backend/access/external/hd_work_mgr.c ---
    @@ -900,36 +900,30 @@ static void init_client_context(ClientContext *client_context)
      * is created. We cannot use that token because hd_work_mgr.c code gets 
      * executed before a portal is created.
      *
    - * The function uses a hdfs uri in the form of hdfs://host:port/
    - * where port is 8020. In the case of HA the function uses the form
    - * hdfs://nameservice/
    - *
    - * TODO 8020 is hard-coded. Must find the NameNode port somehow.
    + * The function uses a hdfs uri in the form of hdfs://host:port/path.
    --- End diff --
    
    hdfs://host:port/path or hdfs://nameservice/path


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by hornn <gi...@git.apache.org>.
Github user hornn commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/279#discussion_r50430125
  
    --- Diff: src/backend/access/external/pxfutils.c ---
    @@ -47,6 +52,24 @@ void port_to_str(char **port, int new_port)
     }
     
     /*
    + * get_hdfs_location_from_filespace
    + *
    + * Get hdfs location from pg_filespace_entry
    + * The returned path needs to be pfreed by the caller.
    + */
    +void get_hdfs_location_from_filespace(char** path)
    +{
    +	Assert(NULL != path);
    --- End diff --
    
    This code gets the file space of the current DB (MyDatabaseId), so even if a specific table is configured to a different file space it shouldn't affect it.
    Do you think it would be safer to get the file space of DB 'template1' instead of the current session's DB? I followed the examples in other places in the code to get the file space, and they either use the current table's DB or MyDatabaseId.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by cwelton <gi...@git.apache.org>.
Github user cwelton commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/279#issuecomment-185328317
  
    This PR has been open for 29 days, and has enough +1 votes to move forward.  Anything blocking it from being resolved?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by hornn <gi...@git.apache.org>.
Github user hornn commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/279#issuecomment-185330578
  
    Awaiting verification on Isilon cluster.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by shivzone <gi...@git.apache.org>.
Github user shivzone commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/279#issuecomment-185334905
  
    As Noa said we are waiting on Jemish's certification of this patch against a secure Isilon cluster.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request: Hawq 317

Posted by wangzw <gi...@git.apache.org>.
Github user wangzw commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/279#issuecomment-173775801
  
    LGTM +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---