Posted to issues@hawq.apache.org by xunzhang <gi...@git.apache.org> on 2016/08/31 06:00:17 UTC

[GitHub] incubator-hawq pull request #878: HAWQ-1025.

GitHub user xunzhang opened a pull request:

    https://github.com/apache/incubator-hawq/pull/878

    HAWQ-1025. 

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xunzhang/incubator-hawq HAWQ-1025

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/878.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #878
    
----
commit b060a365c8d42e22afaaffca78affef7f4cd556f
Author: xunzhang <xu...@gmail.com>
Date:   2016-08-30T08:03:42Z

    HAWQ-1025. Add bucket number in the yaml file of hawq extract, modify to use actual eof for usage1.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq issue #878: HAWQ-1025. Implement issues in HAWQ-1025.

Posted by xunzhang <gi...@git.apache.org>.
Github user xunzhang commented on the issue:

    https://github.com/apache/incubator-hawq/pull/878
  
    cc @ictmalili @wcl14 @radarwave 



[GitHub] incubator-hawq issue #878: HAWQ-1025. Implement issues in HAWQ-1025.

Posted by ictmalili <gi...@git.apache.org>.
Github user ictmalili commented on the issue:

    https://github.com/apache/incubator-hawq/pull/878
  
    LGTM except the three comments. Thanks 🎱



[GitHub] incubator-hawq issue #878: HAWQ-1025. Implement issues in HAWQ-1025.

Posted by xunzhang <gi...@git.apache.org>.
Github user xunzhang commented on the issue:

    https://github.com/apache/incubator-hawq/pull/878
  
    Merged into master.



[GitHub] incubator-hawq pull request #878: HAWQ-1025. Implement issues in HAWQ-1025.

Posted by ictmalili <gi...@git.apache.org>.
Github user ictmalili commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/878#discussion_r77279326
  
    --- Diff: tools/bin/hawqregister ---
    @@ -327,50 +327,85 @@ def insert_metadata_into_database(dburl, databasename, tablename, seg_name, firs
         '''Insert the metadata into database'''
         try:
             query = "SET allow_system_table_mods='dml';"
    -        segno = firstsegno
    -        for eof in eofs:
    -            query += "insert into pg_aoseg.%s values(%d, %d, %d, %d);" % (seg_name, segno, eof, -1, -1)
    -            segno += 1
    +        query += 'insert into pg_aoseg.%s values(%d, %d, %d, %d)' % (seg_name, firstsegno, eofs[0], -1, -1)
    +        for k, eof in enumerate(eofs[1:]):
    +            query += ',(%d, %d, %d, %d)' % (firstsegno + k + 1, eof, -1, -1)
    +        query += ';'
             conn = dbconn.connect(dburl, True)
             rows = dbconn.execSQL(conn, query)
             conn.commit()
             conn.close()
         except DatabaseError, ex:
             logger.error('Failed to connect to database, this script can only be run when the database is up')
    -        move_files_in_hdfs(options.database, options.tablename, files, firstsegno, tabledir, False)
    +        move_files_in_hdfs(database, tablename, files, firstsegno, tabledir, False)
             sys.exit(1)
     
     
     if __name__ == '__main__':
    +
         parser = option_parser()
         options, args = parser.parse_args()
    -    if len(args) != 1 or (options.yml_config and options.filepath):
    +
    +    if len(args) != 1 or ((options.yml_config or options.force or options.repair) and options.filepath) or (options.force and options.repair):
             parser.print_help(sys.stderr)
             sys.exit(1)
         if local_ssh('hadoop', logger):
             logger.error('command "hadoop" is not available.')
             sys.exit(1)
     
    -    dburl = dbconn.DbURL(hostname=options.host, port=options.port, username=options.user, dbname=options.database)
    +    dburl = dbconn.DbURL(hostname = options.host, port = options.port, username = options.user, dbname = options.database)
         filepath, database, tablename = options.filepath, options.database, args[0]
     
    +    second_normal_mode, second_exist_mode, force_mode, repair_mode = False, False, False, False
         if options.yml_config: # Usage2
    -        fileformat, filepath, schema, distribution_policy, file_locations = option_parser_yml(options.yml_config)
    -        create_table(dburl, tablename, schema, fileformat, distribution_policy, file_locations)
    +        if options.force:
    +            force_mode = True
    +        elif options.repair:
    +            repair_mode = True
    +        else:
    +            second_normal_mode = True
    +        fileformat, files, sizes, schema, distribution_policy, file_locations, bucket_number = option_parser_yml(options.yml_config)
    +        filepath = files[0][:files[0].rfind('/')] if files else ''
    +        if distribution_policy.startswith('DISTRIBUTED BY'):
    +            if len(files) % bucket_number != 0:
    +                logger.error('Files to be registered must match the bucket number of hash table.')
    --- End diff --
    
    What about adding more information to describe "match"? For example, "be a multiple of".
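
To illustrate the suggestion, here is a minimal sketch of the same check with a more explicit message. It is not the code from the PR: `check_file_count` is a hypothetical helper, the message wording is only one possibility, and the guard against a non-positive bucket number is an added assumption (the diff only tests `len(files) % bucket_number != 0`).

```python
def check_file_count(num_files, bucket_number):
    """Return an error message if num_files is not a multiple of
    bucket_number, or None if the count is acceptable.

    Hypothetical helper for illustration; not part of hawqregister.
    """
    # Assumption: a bucket number of zero or less is treated as invalid.
    # This also avoids a ZeroDivisionError in the modulo below.
    if bucket_number <= 0:
        return 'Bucket number must be a positive integer, got %d.' % bucket_number
    # Same condition as in the diff, with the counts spelled out.
    if num_files % bucket_number != 0:
        return ('Number of files to be registered (%d) must be a multiple of '
                'the bucket number (%d) of the hash table.'
                % (num_files, bucket_number))
    return None
```

A caller could log the returned message with `logger.error(...)` and exit when it is not `None`.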



[GitHub] incubator-hawq pull request #878: HAWQ-1025. Implement issues in HAWQ-1025.

Posted by ictmalili <gi...@git.apache.org>.
Github user ictmalili commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/878#discussion_r77285478
  
    --- Diff: tools/bin/hawqregister ---
    @@ -297,8 +297,8 @@ def move_files_in_hdfs(databasename, tablename, files, firstsegno, tabledir, nor
         '''Move file(s) in src path into the folder correspoding to the target table'''
         if normal:
             segno = firstsegno
    -        for file in files:
    -            srcfile = file
    +        for f in files:
    +            srcfile = f
    --- End diff --
    
    Please disregard this; I was talking about the --force option implementation, which is out of scope for this PR.



[GitHub] incubator-hawq pull request #878: HAWQ-1025. Implement issues in HAWQ-1025.

Posted by ictmalili <gi...@git.apache.org>.
Github user ictmalili commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/878#discussion_r77279215
  
    --- Diff: tools/bin/hawqregister ---
    @@ -297,8 +297,8 @@ def move_files_in_hdfs(databasename, tablename, files, firstsegno, tabledir, nor
         '''Move file(s) in src path into the folder correspoding to the target table'''
         if normal:
             segno = firstsegno
    -        for file in files:
    -            srcfile = file
    +        for f in files:
    +            srcfile = f
    --- End diff --
    
    @xunzhang Have you considered the case where the src files are colocated with the dest files under the same folder?
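
For illustration, one way the colocation case raised here could be detected is sketched below. `is_colocated` is a hypothetical helper, not something in hawqregister, and it assumes both arguments are plain slash-separated HDFS paths without a scheme or host prefix; the paths in the usage note are made up.

```python
import posixpath


def is_colocated(srcfile, tabledir):
    """Return True if srcfile already sits directly inside tabledir.

    Hypothetical helper; assumes plain '/'-separated paths.
    """
    # Normalise both sides so trailing slashes and '.' segments do not matter.
    src_dir = posixpath.dirname(posixpath.normpath(srcfile))
    return src_dir == posixpath.normpath(tabledir)
```

For example, `is_colocated('/warehouse/t/1', '/warehouse/t/')` is true, while a file under `/tmp/data` is not colocated with `/warehouse/t`.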



[GitHub] incubator-hawq issue #878: HAWQ-1025. Implement issues in HAWQ-1025.

Posted by radarwave <gi...@git.apache.org>.
Github user radarwave commented on the issue:

    https://github.com/apache/incubator-hawq/pull/878
  
    +1



[GitHub] incubator-hawq pull request #878: HAWQ-1025. Implement issues in HAWQ-1025.

Posted by xunzhang <gi...@git.apache.org>.
Github user xunzhang commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/878#discussion_r77286236
  
    --- Diff: tools/bin/hawqregister ---
    @@ -327,50 +327,85 @@ def insert_metadata_into_database(dburl, databasename, tablename, seg_name, firs
         '''Insert the metadata into database'''
         try:
             query = "SET allow_system_table_mods='dml';"
    -        segno = firstsegno
    -        for eof in eofs:
    -            query += "insert into pg_aoseg.%s values(%d, %d, %d, %d);" % (seg_name, segno, eof, -1, -1)
    -            segno += 1
    +        query += 'insert into pg_aoseg.%s values(%d, %d, %d, %d)' % (seg_name, firstsegno, eofs[0], -1, -1)
    +        for k, eof in enumerate(eofs[1:]):
    +            query += ',(%d, %d, %d, %d)' % (firstsegno + k + 1, eof, -1, -1)
    +        query += ';'
             conn = dbconn.connect(dburl, True)
             rows = dbconn.execSQL(conn, query)
             conn.commit()
             conn.close()
         except DatabaseError, ex:
             logger.error('Failed to connect to database, this script can only be run when the database is up')
    -        move_files_in_hdfs(options.database, options.tablename, files, firstsegno, tabledir, False)
    +        move_files_in_hdfs(database, tablename, files, firstsegno, tabledir, False)
             sys.exit(1)
     
     
     if __name__ == '__main__':
    +
         parser = option_parser()
         options, args = parser.parse_args()
    -    if len(args) != 1 or (options.yml_config and options.filepath):
    +
    +    if len(args) != 1 or ((options.yml_config or options.force or options.repair) and options.filepath) or (options.force and options.repair):
             parser.print_help(sys.stderr)
             sys.exit(1)
         if local_ssh('hadoop', logger):
             logger.error('command "hadoop" is not available.')
             sys.exit(1)
     
    -    dburl = dbconn.DbURL(hostname=options.host, port=options.port, username=options.user, dbname=options.database)
    +    dburl = dbconn.DbURL(hostname = options.host, port = options.port, username = options.user, dbname = options.database)
         filepath, database, tablename = options.filepath, options.database, args[0]
     
    +    second_normal_mode, second_exist_mode, force_mode, repair_mode = False, False, False, False
         if options.yml_config: # Usage2
    -        fileformat, filepath, schema, distribution_policy, file_locations = option_parser_yml(options.yml_config)
    -        create_table(dburl, tablename, schema, fileformat, distribution_policy, file_locations)
    +        if options.force:
    +            force_mode = True
    +        elif options.repair:
    +            repair_mode = True
    +        else:
    +            second_normal_mode = True
    +        fileformat, files, sizes, schema, distribution_policy, file_locations, bucket_number = option_parser_yml(options.yml_config)
    +        filepath = files[0][:files[0].rfind('/')] if files else ''
    +        if distribution_policy.startswith('DISTRIBUTED BY'):
    +            if len(files) % bucket_number != 0:
    +                logger.error('Files to be registered must match the bucket number of hash table.')
    --- End diff --
    
    done
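
As a side note on the diff quoted above, the change from one `insert` per segment file to a single multi-row `insert` can be sketched in isolation as follows. This is a simplified illustration, not the PR's code: `build_aoseg_insert` is a hypothetical name, and returning an empty string for an empty `eofs` list is an added assumption (the quoted version indexes `eofs[0]`, which raises `IndexError` on an empty list).

```python
def build_aoseg_insert(seg_name, firstsegno, eofs):
    """Build one multi-row INSERT for pg_aoseg.<seg_name>.

    Segment numbers start at firstsegno and increase by one per eof.
    Hypothetical helper for illustration only.
    """
    # Assumption: with nothing to register, emit no statement at all.
    if not eofs:
        return ''
    rows = ['(%d, %d, %d, %d)' % (firstsegno + k, eof, -1, -1)
            for k, eof in enumerate(eofs)]
    return 'insert into pg_aoseg.%s values%s;' % (seg_name, ','.join(rows))
```

Joining the row tuples with commas produces the same statement text as the loop in the diff, without special-casing the first element.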



[GitHub] incubator-hawq pull request #878: HAWQ-1025. Implement issues in HAWQ-1025.

Posted by xunzhang <gi...@git.apache.org>.
Github user xunzhang closed the pull request at:

    https://github.com/apache/incubator-hawq/pull/878



[GitHub] incubator-hawq pull request #878: HAWQ-1025. Implement issues in HAWQ-1025.

Posted by xunzhang <gi...@git.apache.org>.
Github user xunzhang commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/878#discussion_r77286378
  
    --- Diff: src/test/feature/ManagementTool/test_hawq_register.cpp ---
    @@ -341,15 +341,30 @@ TEST_F(TestHawqRegister, TestIncorrectYaml) {
       EXPECT_EQ(1, Command::getCommandStatus("hawq register -d " + (string) HAWQ_DB + " -c " + filePath + "incorrect4.yml xx"));
       EXPECT_EQ(1, Command::getCommandStatus("hawq register -d " + (string) HAWQ_DB + " -c " + filePath + "incorrect5.yml xx"));
       EXPECT_EQ(1, Command::getCommandStatus("hawq register -d " + (string) HAWQ_DB + " -c " + filePath + "incorrect6.yml xx"));
    +  EXPECT_EQ(1, Command::getCommandStatus("hawq register -d " + (string) HAWQ_DB + " -c " + filePath + "incorrect8.yml xx"));
     }
     
    -TEST_F(TestHawqRegister, TestCreateExistedTable) {
    +TEST_F(TestHawqRegister, TestDismatchFileNumber) {
    +  SQLUtility util;
    +  string filePath = util.getTestRootPath() + "/ManagementTool/";
    +  EXPECT_EQ(1, Command::getCommandStatus("hawq register -d " + (string) HAWQ_DB + " -c " + filePath + "incorrect7.yml xx"));
    +}
    +
    +TEST_F(TestHawqRegister, TestUsage2Behavior2) {
       SQLUtility util;
       util.execute("drop table if exists t10;");
    -  util.execute("create table t10(i int) with (appendonly=true, orientation=row) distributed by (i);");
    --- End diff --
    
    Done. I modified the names of the test yml files and some of the cases. I think I will modify the table names of all cases in a following pull request.



[GitHub] incubator-hawq pull request #878: HAWQ-1025. Implement issues in HAWQ-1025.

Posted by ictmalili <gi...@git.apache.org>.
Github user ictmalili commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/878#discussion_r77278257
  
    --- Diff: src/test/feature/ManagementTool/test_hawq_register.cpp ---
    @@ -341,15 +341,30 @@ TEST_F(TestHawqRegister, TestIncorrectYaml) {
       EXPECT_EQ(1, Command::getCommandStatus("hawq register -d " + (string) HAWQ_DB + " -c " + filePath + "incorrect4.yml xx"));
       EXPECT_EQ(1, Command::getCommandStatus("hawq register -d " + (string) HAWQ_DB + " -c " + filePath + "incorrect5.yml xx"));
       EXPECT_EQ(1, Command::getCommandStatus("hawq register -d " + (string) HAWQ_DB + " -c " + filePath + "incorrect6.yml xx"));
    +  EXPECT_EQ(1, Command::getCommandStatus("hawq register -d " + (string) HAWQ_DB + " -c " + filePath + "incorrect8.yml xx"));
     }
     
    -TEST_F(TestHawqRegister, TestCreateExistedTable) {
    +TEST_F(TestHawqRegister, TestDismatchFileNumber) {
    +  SQLUtility util;
    +  string filePath = util.getTestRootPath() + "/ManagementTool/";
    +  EXPECT_EQ(1, Command::getCommandStatus("hawq register -d " + (string) HAWQ_DB + " -c " + filePath + "incorrect7.yml xx"));
    +}
    +
    +TEST_F(TestHawqRegister, TestUsage2Behavior2) {
       SQLUtility util;
       util.execute("drop table if exists t10;");
    -  util.execute("create table t10(i int) with (appendonly=true, orientation=row) distributed by (i);");
    --- End diff --
    
    Shall we rename the table names and test cases to meaningful names, instead of 1, 2, 3?


