Posted to issues@hawq.apache.org by xunzhang <gi...@git.apache.org> on 2016/09/01 13:09:18 UTC

[GitHub] incubator-hawq pull request #883: HAWQ-1029. Update hawqregister_help info.

GitHub user xunzhang opened a pull request:

    https://github.com/apache/incubator-hawq/pull/883

    HAWQ-1029. Update hawqregister_help info.

    cc @ictmalili 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xunzhang/incubator-hawq HAWQ-1029

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/883.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #883
    
----
commit 0958ca01221ecfdf819dda8887dddd7709835a05
Author: xunzhang <xu...@gmail.com>
Date:   2016-09-01T13:08:34Z

    HAWQ-1029. Update hawqregister_help info.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq pull request #883: HAWQ-1029. Update hawqregister_help info.

Posted by ictmalili <gi...@git.apache.org>.
Github user ictmalili commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/883#discussion_r80218098
  
    --- Diff: tools/doc/hawqregister_help ---
    @@ -37,10 +35,19 @@ The file(s) to be registered and the table in HAWQ must be in the
     same HDFS cluster.
     
     Use Case2:
    -User should be able to use hawq register to register table files into a new HAWQ cluster.
    -It is some kind of protecting against corruption from users' perspective.
    -Users use the last-known-good metadata to update the portion of catalog managing HDFS blocks.
    -The table files or dictionary should be backuped(such as using distcp) into the same path in the new HDFS setting.
    +Hawq register can register both AO and parquet format table, and the files to be registered are listed in the .yml configuration file.
    +This configuration file can be generated by hawq extract. Register through .yml configuration doesn’t require the table already exist,
    +since .yml file contains table schema already.
    +HAWQ register behaviors differently with different options: 
    + * If the table does not exist, hawq register will create table and do register. 
    + * If table already exist, hawq register will append the files to the existing table.
    + * If --force option specified, hawq register will erase existing catalog 
    +   table pg_aoseg.pg_aoseg_$relid/pg_aoseg.pg_paqseg_$relid data for the table and 
    +   re-register according to .yml configuration file definition. Note. If there are
    +   files under table directory which are not specified in .yml configuration file, it will throw error out.
    +Note. Without --force specified, if some file specified in .yml configuration file lie under the table directory, hawq register will throw error out.
    +Note. With --force option specified, if there are files under table directory which are not specified in .yml configuration file, hawq register will throw error out.
    +Note. For both the use cases of hawq register, if the table is hash distributed, hawq register just check the file number to be registered has to be integral multiple multiple times of this table’s bucket number, and check whether the distribution key specified in .yml configuration file is same as that of table. It does not check whether files are actually distributed by the key.
    --- End diff --
    
    We 



[GitHub] incubator-hawq issue #883: HAWQ-1029. Update hawqregister_help info.

Posted by xunzhang <gi...@git.apache.org>.
Github user xunzhang commented on the issue:

    https://github.com/apache/incubator-hawq/pull/883
  
    cc @ictmalili @linwen again



[GitHub] incubator-hawq pull request #883: HAWQ-1029. Update hawqregister_help info.

Posted by ictmalili <gi...@git.apache.org>.
Github user ictmalili commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/883#discussion_r80217715
  
    --- Diff: tools/doc/hawqregister_help ---
    @@ -37,10 +35,19 @@ The file(s) to be registered and the table in HAWQ must be in the
     same HDFS cluster.
     
     Use Case2:
    -User should be able to use hawq register to register table files into a new HAWQ cluster.
    -It is some kind of protecting against corruption from users' perspective.
    -Users use the last-known-good metadata to update the portion of catalog managing HDFS blocks.
    -The table files or dictionary should be backuped(such as using distcp) into the same path in the new HDFS setting.
    +Hawq register can register both AO and parquet format table, and the files to be registered are listed in the .yml configuration file.
    +This configuration file can be generated by hawq extract. Register through .yml configuration doesn’t require the table already exist,
    +since .yml file contains table schema already.
    +HAWQ register behaviors differently with different options: 
    + * If the table does not exist, hawq register will create table and do register. 
    + * If table already exist, hawq register will append the files to the existing table.
    + * If --force option specified, hawq register will erase existing catalog 
    +   table pg_aoseg.pg_aoseg_$relid/pg_aoseg.pg_paqseg_$relid data for the table and 
    +   re-register according to .yml configuration file definition. Note. If there are
    +   files under table directory which are not specified in .yml configuration file, it will throw error out.
    +Note. Without --force specified, if some file specified in .yml configuration file lie under the table directory, hawq register will throw error out.
    +Note. With --force option specified, if there are files under table directory which are not specified in .yml configuration file, hawq register will throw error out.
    +Note. For both the use cases of hawq register, if the table is hash distributed, hawq register just check the file number to be registered has to be integral multiple multiple times of this table’s bucket number, and check whether the distribution key specified in .yml configuration file is same as that of table. It does not check whether files are actually distributed by the key.
    --- End diff --
    
    Use Case 1 does not support hash distributed table register. 
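
    The option handling described in the quoted help text can be sketched as a small decision function. This is only an illustration of the documented rules, not HAWQ's implementation: the function name, its parameters, and the returned action strings are all hypothetical, and the behavior of --force when the table does not yet exist is an assumption, since the help text does not cover it.

```python
def plan_register(table_exists, force, yml_files, files_in_table_dir):
    """Pick the action described in the help text (hypothetical sketch).

    yml_files: file paths listed in the .yml configuration file.
    files_in_table_dir: file paths currently under the table directory.
    """
    yml = set(yml_files)
    on_disk = set(files_in_table_dir)

    if force:
        # With --force, every file under the table directory must be
        # listed in the .yml file; otherwise an error is raised.
        unlisted = on_disk - yml
        if unlisted:
            raise ValueError(f"files not listed in .yml: {sorted(unlisted)}")
        # Assumption: the same action applies whether or not the table exists.
        return "erase_catalog_and_reregister"

    # Without --force, a file named in the .yml file must not already
    # lie under the table directory.
    overlap = yml & on_disk
    if overlap:
        raise ValueError(f"files already under table directory: {sorted(overlap)}")

    return "append" if table_exists else "create_and_register"
```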



[GitHub] incubator-hawq pull request #883: HAWQ-1029. Update hawqregister_help info.

Posted by ictmalili <gi...@git.apache.org>.
Github user ictmalili commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/883#discussion_r80217825
  
    --- Diff: tools/doc/hawqregister_help ---
    @@ -37,10 +35,19 @@ The file(s) to be registered and the table in HAWQ must be in the
     same HDFS cluster.
     
     Use Case2:
    -User should be able to use hawq register to register table files into a new HAWQ cluster.
    -It is some kind of protecting against corruption from users' perspective.
    -Users use the last-known-good metadata to update the portion of catalog managing HDFS blocks.
    -The table files or dictionary should be backuped(such as using distcp) into the same path in the new HDFS setting.
    +Hawq register can register both AO and parquet format table, and the files to be registered are listed in the .yml configuration file.
    +This configuration file can be generated by hawq extract. Register through .yml configuration doesn’t require the table already exist,
    +since .yml file contains table schema already.
    +HAWQ register behaviors differently with different options: 
    + * If the table does not exist, hawq register will create table and do register. 
    + * If table already exist, hawq register will append the files to the existing table.
    + * If --force option specified, hawq register will erase existing catalog 
    +   table pg_aoseg.pg_aoseg_$relid/pg_aoseg.pg_paqseg_$relid data for the table and 
    +   re-register according to .yml configuration file definition. Note. If there are
    +   files under table directory which are not specified in .yml configuration file, it will throw error out.
    +Note. Without --force specified, if some file specified in .yml configuration file lie under the table directory, hawq register will throw error out.
    +Note. With --force option specified, if there are files under table directory which are not specified in .yml configuration file, hawq register will throw error out.
    +Note. For both the use cases of hawq register, if the table is hash distributed, hawq register just check the file number to be registered has to be integral multiple multiple times of this table’s bucket number, and check whether the distribution key specified in .yml configuration file is same as that of table. It does not check whether files are actually distributed by the key.
    --- End diff --
    
    "hawq register just check the file number to be registered has to be integral multiple multiple times of this table’s bucket number"  There are many useless words there. "integral" and "multiple"
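
    For reference, the check the sentence is trying to describe is simply "the number of files is a multiple of the bucket number, and the distribution keys match". A minimal sketch, assuming hypothetical names (this is not the code in hawq register):

```python
def check_hash_distributed(num_files, bucket_number, yml_dist_keys, table_dist_keys):
    """Validate a registration against a hash-distributed table (sketch).

    Mirrors the two checks named in the help text; it does not verify
    that the files are actually distributed by the key.
    """
    if bucket_number <= 0:
        raise ValueError("bucket number must be positive")

    # The file count must be a multiple of the table's bucket number.
    if num_files % bucket_number != 0:
        raise ValueError(
            f"{num_files} files is not a multiple of bucket number {bucket_number}"
        )

    # The distribution key in the .yml file must equal the table's key.
    if list(yml_dist_keys) != list(table_dist_keys):
        raise ValueError("distribution key in .yml differs from the table's")

    return True
```

    For example, 12 files against a bucket number of 6 passes, while 7 files does not.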



[GitHub] incubator-hawq pull request #883: HAWQ-1029. Update hawqregister_help info.

Posted by ictmalili <gi...@git.apache.org>.
Github user ictmalili commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/883#discussion_r80218089
  
    --- Diff: tools/doc/hawqregister_help ---
    @@ -110,14 +113,24 @@ table 'pg_aoseg.pg_paqseg_77160'.
     *****************************************************
     EXAMPLE FOR USAGE2
     *****************************************************
    -$ psql -c "drop table if exists table;"
    -$ psql -c "create table table(i int) with (appendonly=true, orientation=parquet) distributed by (i);"
    -$ psql -c "insert into table values(1), (2), (3);"
    -$ hawq extract -d postgres -o t.yml table
    -$ hawq register -d postgres -c t.yml newtable
    -In this example, suppose that "table" is a table in old HAWQ Cluster, user dump "t.yml" yaml file to
    -save the metadata of "table". To register the "newtable" in a new HAWQ Cluster, user run "hawq register"
    -to register the newtable with the given yaml file "t.yml".
    +This example shows hawq register functionality of hawq register according to yml configuration file.
    +Usually the yml configuration file is generated by hawq extract.
    +This example shows the life cycle of hawq extract and hawq register.
    +
    +Firstly, create a table and insert some data into it:
    +$ psql -c "create table paq1(a int, b varchar(10))with(appendonly=true, orientation=parquet);"
    +$ psql -c "insert into paq1 values(generate_series(1,1000), 'abcde');"
    +
    +Secondly, extract the table metadata information out:
    +$ hawq extract -o paq1.yml paq1
    +
    +Thirdly, register to new table paq2 identifying yml file:
    +$ hawq register --config paq1.yml paq2
    +
    +Finally, select the new table to look at whether the content has already been registered.
    +$ select count(*) from paq2;
    +
    +In the above example, the final result must be return 1000.
    --- End diff --
    
    change "must be" to "should"



[GitHub] incubator-hawq pull request #883: HAWQ-1029. Update hawqregister_help info.

Posted by xunzhang <gi...@git.apache.org>.
Github user xunzhang closed the pull request at:

    https://github.com/apache/incubator-hawq/pull/883



[GitHub] incubator-hawq issue #883: HAWQ-1029. Update hawqregister_help info.

Posted by xunzhang <gi...@git.apache.org>.
Github user xunzhang commented on the issue:

    https://github.com/apache/incubator-hawq/pull/883
  
    Merged into master, thanks.



[GitHub] incubator-hawq pull request #883: HAWQ-1029. Update hawqregister_help info.

Posted by ictmalili <gi...@git.apache.org>.
Github user ictmalili commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/883#discussion_r77279576
  
    --- Diff: tools/doc/hawqregister_help ---
    @@ -100,7 +100,7 @@ Assume the location of the database is 'hdfs://localhost:8020/hawq_default',
     tablespace id is '16385', database id is '16387', table filenode id is '77160',
     last file under the filenode numbered '7'.
     
    -$ hawq register postgres parquet_table hdfs://localhost:8020/temp/hive.paq
    +$ hawq register -d postgres -f hdfs://localhost:8020/temp/hive.paq parquet_table
    --- End diff --
    
    +1 for this change



[GitHub] incubator-hawq issue #883: HAWQ-1029. Update hawqregister_help info.

Posted by xunzhang <gi...@git.apache.org>.
Github user xunzhang commented on the issue:

    https://github.com/apache/incubator-hawq/pull/883
  
    also cc @radarwave 



[GitHub] incubator-hawq pull request #883: HAWQ-1029. Update hawqregister_help info.

Posted by ictmalili <gi...@git.apache.org>.
Github user ictmalili commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/883#discussion_r77279552
  
    --- Diff: tools/doc/hawqregister_help ---
    @@ -7,8 +7,8 @@ Usage2: Register parquet/ao table from laterst-sync-metadata in yaml format
     SYNOPSIS
     *****************************************************
     
    -Usage1: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f filepath] <tablename>
    -Usage2: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c config] <tablename>
    +Usage1: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-f filepath] [-e eof] <tablename>
    +Usage2: hawq register [-h hostname] [-p port] [-U username] [-d databasename] [-c config] [--force] [--repair] <tablename>
    --- End diff --
    
    Should we add more description about the --force and --repair params? 

