You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@madlib.apache.org by orhankislal <gi...@git.apache.org> on 2018/05/24 22:30:25 UTC

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

GitHub user orhankislal opened a pull request:

    https://github.com/apache/madlib/pull/271

    Madpack: Make install, reinstall and upgrade atomic

    We now write all the necessary sql into one file, and run it once in a
    single session. The database's rollback will be useful to bring it back
    to original state in case of a failure.
    
    Co-authored-by: Rahul Iyer <ri...@apache.org>
    Co-authored-by: Orhan Kislal <ok...@pivotal.io>

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib feature/atomic_upgrade

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/271.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #271
    
----
commit 693f6ae59dbb65d56fe4ff95bac8fde198f3d04f
Author: Nandish Jayaram <nj...@...>
Date:   2018-05-18T23:13:28Z

    Madpack: Make install, reinstall and upgrade atomic
    
    We now write all the necessary sql into one file, and run it once in a
    single session. The database's rollback will be useful to bring it back
    to original state in case of a failure.
    
    Co-authored-by: Rahul Iyer <ri...@apache.org>
    Co-authored-by: Orhan Kislal <ok...@pivotal.io>

----


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192191872
  
    --- Diff: src/madpack/madpack.py ---
    @@ -238,6 +311,88 @@ def _run_sql_file(schema, maddir_mod_py, module, sqlfile,
         return retval
     # ------------------------------------------------------------------------------
     
    +def _run_sql_file(schema, sqlfile):
    +    """
    +        Run SQL file
    +            @param schema name of the target schema
    +            @param sqlfile name of the file to parse
    +    """
    +    # Run the SQL using DB command-line utility
    +    if portid in SUPPORTED_PORTS:
    +        sqlcmd = 'psql'
    +        # Test the DB cmd line utility
    +        std, err = subprocess.Popen(['which', sqlcmd], stdout=subprocess.PIPE,
    +                                    stderr=subprocess.PIPE).communicate()
    +        if not std:
    +            error_(this, "Command not found: %s" % sqlcmd, True)
    +
    +        runcmd = [sqlcmd, '-a',
    +                  '-v', 'ON_ERROR_STOP=1',
    +                  '-h', con_args['host'].split(':')[0],
    +                  '-p', con_args['host'].split(':')[1],
    +                  '-d', con_args['database'],
    +                  '-U', con_args['user'],
    +                  '--no-password',
    +                  '--single-transaction',
    +                  '-f', sqlfile]
    +        runenv = os.environ
    +        if 'password' in con_args:
    +            runenv["PGPASSWORD"] = con_args['password']
    +        runenv["PGOPTIONS"] = '-c client_min_messages=notice'
    +
    +    # Open log file
    +    logfile = sqlfile + '.log'
    +    try:
    +        log = open(logfile, 'w')
    +    except:
    +        error_(this, "Cannot create log file: %s" % logfile, False)
    +        raise Exception
    +
    +    # Run the SQL
    +    try:
    +        info_(this, "> ... executing " + sqlfile, verbose)
    +        info_(this, ' '.join(runcmd), verbose)
    +        retval = subprocess.call(runcmd, env=runenv, stdout=log, stderr=log)
    +    except:
    +        error_(this, "Failed executing %s" % sqlfile, False)
    +        raise Exception
    +    finally:
    +        log.close()
    +    # Check the exit status
    +    result = _parse_result_logfile(retval, logfile, sqlfile)
    +    return result
    +# ------------------------------------------------------------------------------
    +
    +def _parse_result_logfile(retval, logfile, sql_abspath,
    +                          sql_filename=None, module=None, milliseconds=None):
    +    """
    +    Function to parse the logfile and return if its content indicate a failure
    +    or success.
    +    """
    +    is_install_check_logfile = bool(sql_filename and module)
    +    # Check the exit status
    +    if retval != 0:
    +        result = 'FAIL'
    +        global keeplogs
    +        keeplogs = True
    +    # Since every single statement in the test file gets logged,
    +    # an empty log file indicates an empty or a failed test
    +    elif os.path.isfile(logfile) and os.path.getsize(logfile) > 0:
    +        result = 'PASS'
    +    # Otherwise
    +    else:
    +        result = 'ERROR'
    +
    +    if is_install_check_logfile:
    --- End diff --
    
    That would change the order of messages. If we want to preserve the order, then we would have to duplicate the code wherever this function gets called. 


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191608985
  
    --- Diff: src/madpack/madpack.py ---
    @@ -824,6 +873,246 @@ def parse_arguments():
         # Get the arguments
         return parser.parse_args()
     
    +def run_install_check(args, testcase):
    +    schema = args['schema']
    +    dbrev = args['dbrev']
    +    # 1) Compare OS and DB versions. Continue if OS = DB.
    +    if get_rev_num(dbrev) != get_rev_num(rev):
    +        _print_revs(rev, dbrev, con_args, schema)
    +        info_(this, "Versions do not match. Install-check stopped.", True)
    +        return
    +
    +    # Create install-check user
    +    test_user = ('madlib_' +
    +                 rev.replace('.', '').replace('-', '_') +
    +                 '_installcheck')
    +    try:
    +        _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), False)
    +    except:
    +        _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    +        _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), True)
    +    _internal_run_query("CREATE USER %s;" % (test_user), True)
    +
    +    _internal_run_query("GRANT USAGE ON SCHEMA %s TO %s;" % (schema, test_user), True)
    +
    +    # 2) Run test SQLs
    +    info_(this, "> Running test scripts for:", verbose)
    +
    +    caseset = (set([test.strip() for test in testcase.split(',')])
    +               if testcase != "" else set())
    +
    +    modset = {}
    +    for case in caseset:
    +        if case.find('/') > -1:
    +            [mod, algo] = case.split('/')
    +            if mod not in modset:
    +                modset[mod] = []
    +            if algo not in modset[mod]:
    +                modset[mod].append(algo)
    +        else:
    +            modset[case] = []
    +
    +    # Loop through all modules
    +    for moduleinfo in portspecs['modules']:
    +
    +        # Get module name
    +        module = moduleinfo['name']
    +
    +        # Skip if doesn't meet specified modules
    +        if modset is not None and len(modset) > 0 and module not in modset:
    +            continue
    +        # JIRA: MADLIB-1078 fix
    +        # Skip pmml during install-check (when run without the -t option).
    +        # We can still run install-check on pmml with '-t' option.
    +        if not modset and module in ['pmml']:
    +            continue
    +        info_(this, "> - %s" % module, verbose)
    +
    +        # Make a temp dir for this module (if doesn't exist)
    +        cur_tmpdir = tmpdir + '/' + module + '/test'  # tmpdir is a global variable
    +        _make_dir(cur_tmpdir)
    +
    +        # Find the Python module dir (platform specific or generic)
    +        if os.path.isdir(maddir + "/ports/" + portid + "/" + dbver + "/modules/" + module):
    +            maddir_mod_py = maddir + "/ports/" + portid + "/" + dbver + "/modules"
    +        else:
    +            maddir_mod_py = maddir + "/modules"
    +
    +        # Find the SQL module dir (platform specific or generic)
    +        if os.path.isdir(maddir + "/ports/" + portid + "/modules/" + module):
    +            maddir_mod_sql = maddir + "/ports/" + portid + "/modules"
    +        else:
    +            maddir_mod_sql = maddir + "/modules"
    +
    +        # Prepare test schema
    +        test_schema = "madlib_installcheck_%s" % (module)
    +        _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE; CREATE SCHEMA %s;" %
    +                            (test_schema, test_schema), True)
    +        _internal_run_query("GRANT ALL ON SCHEMA %s TO %s;" %
    +                            (test_schema, test_user), True)
    +
    +        # Switch to test user and prepare the search_path
    +        pre_sql = '-- Switch to test user:\n' \
    +                  'SET ROLE %s;\n' \
    +                  '-- Set SEARCH_PATH for install-check:\n' \
    +                  'SET search_path=%s,%s;\n' \
    +                  % (test_user, test_schema, schema)
    +
    +        # Loop through all test SQL files for this module
    +        sql_files = maddir_mod_sql + '/' + module + '/test/*.sql_in'
    +        for sqlfile in sorted(glob.glob(sql_files), reverse=True):
    +            algoname = os.path.basename(sqlfile).split('.')[0]
    +            # run only algo specified
    +            if (module in modset and modset[module] and
    +                    algoname not in modset[module]):
    +                continue
    +
    +            # Set file names
    +            tmpfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.tmp'
    +            logfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.log'
    +
    +            # If there is no problem with the SQL file
    +            milliseconds = 0
    +
    +            # Run the SQL
    +            run_start = datetime.datetime.now()
    +            retval = _run_sql_file_install_check(schema, maddir_mod_py,
    +                                                 module, sqlfile, tmpfile,
    +                                                 logfile, pre_sql)
    +            # Runtime evaluation
    +            run_end = datetime.datetime.now()
    +            milliseconds = round((run_end - run_start).seconds * 1000 +
    +                                 (run_end - run_start).microseconds / 1000)
    +
    +            # Check the exit status
    +            result = _parse_result_logfile(retval, logfile, tmpfile, sqlfile,
    +                                           module, milliseconds)
    +
    +        # Cleanup test schema for the module
    +        _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE;" % (test_schema), True)
    +
    +    # Drop install-check user
    +    _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    +    _internal_run_query("DROP USER %s;" % (test_user), True)
    +
    +def create_install_madlib_sqlfile(args, madpack_cmd, testcase):
    +    upgrade = args['upgrade']
    +    schema = args['schema']
    +    dbrev = args['dbrev']
    +    is_schema_in_db = args['is_schema_in_db']
    +    with open(args['output_filename'], 'a+') as output_filehandle:
    +        # COMMAND: uninstall/reinstall
    +        if madpack_cmd in ('uninstall', 'reinstall'):
    +            if get_rev_num(dbrev) == [0]:
    +                info_(this, "Nothing to uninstall. No version found in schema %s." % schema.upper(), True)
    +                return 1
    +
    +            # Find any potential data to lose
    +            affected_objects = _internal_run_query("""
    +                SELECT
    +                    n1.nspname AS schema,
    +                    relname AS relation,
    +                    attname AS column,
    +                    typname AS type
    +                FROM
    +                    pg_attribute a,
    +                    pg_class c,
    +                    pg_type t,
    +                    pg_namespace n,
    +                    pg_namespace n1
    +                WHERE
    +                    n.nspname = '%s'
    +                    AND t.typnamespace = n.oid
    +                    AND a.atttypid = t.oid
    +                    AND c.oid = a.attrelid
    +                    AND c.relnamespace = n1.oid
    +                    AND c.relkind = 'r'
    +                ORDER BY
    +                    n1.nspname, relname, attname, typname""" % schema.lower(), True)
    +
    +            info_(this, "*** Uninstalling MADlib ***", True)
    +            info_(this, "***********************************************************************************", True)
    +            info_(this, "* Schema %s and all database objects depending on it will be dropped!" % schema.upper(), True)
    +            if affected_objects:
    +                info_(this, "* If you continue the following data will be lost (schema : table.column : type):", True)
    +                for ao in affected_objects:
    +                    info_(this, '* - ' + ao['schema'] + ' : ' + ao['relation'] + '.' +
    +                          ao['column'] + ' : ' + ao['type'], True)
    +            info_(this, "***********************************************************************************", True)
    +            info_(this, "Would you like to continue? [Y/N]", True)
    +            go = raw_input('>>> ').upper()
    +            while (go not in ('Y', 'N', 'YES', 'NO')):
    +                go = raw_input('Yes or No >>> ').upper()
    +
    +            # 2) Do the uninstall/drop
    +            if go in ('N', 'NO'):
    +                info_(this, 'No problem. Nothing dropped.', True)
    +                return 1
    +            elif go in ('Y', 'YES'):
    +                try:
    +                    _write_to_file(output_filehandle,
    +                                   "DROP SCHEMA %s CASCADE;" % (schema), True)
    +                    is_schema_in_db = False
    +                except:
    +                    error_(this, "Cannot drop schema %s." % schema.upper(), True)
    +
    +            else:
    +                return 1
    +
    +        # COMMAND: install/reinstall
    +        if madpack_cmd in ('install', 'reinstall'):
    +            # Refresh MADlib version in DB, None for GP/PG
    +            if madpack_cmd == 'reinstall':
    +                info_(this, "Setting MADlib database version to be None for reinstall", verbose)
    +                dbrev = None
    +
    +            info_(this, "*** Installing MADlib ***", True)
    +
    +            # 1) Compare OS and DB versions.
    +            # noop if OS <= DB.
    +            _print_revs(rev, dbrev, con_args, schema)
    +            if is_rev_gte(get_rev_num(dbrev), get_rev_num(rev)):
    --- End diff --
    
    can we also rename the `dbrev` and `rev` variables to be reflective of their purpose ?


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192182069
  
    --- Diff: src/madpack/madpack.py ---
    @@ -95,6 +95,16 @@ def _internal_run_query(sql, show_error):
         return run_query(sql, con_args, show_error)
     # ------------------------------------------------------------------------------
     
    +def _write_to_file(handle, sql, show_error=False):
    +    handle.write(sql)
    +    handle.write('\n')
    +# ------------------------------------------------------------------------------
    +
    +
    +def _merge_to_file(handle, other_file, show_error=False):
    --- End diff --
    
    It was used very early in the story I guess, have removed it now.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192199353
  
    --- Diff: src/madpack/madpack.py ---
    @@ -824,6 +873,246 @@ def parse_arguments():
         # Get the arguments
         return parser.parse_args()
     
    +def run_install_check(args, testcase):
    +    schema = args['schema']
    +    dbrev = args['dbrev']
    +    # 1) Compare OS and DB versions. Continue if OS = DB.
    +    if get_rev_num(dbrev) != get_rev_num(rev):
    +        _print_revs(rev, dbrev, con_args, schema)
    +        info_(this, "Versions do not match. Install-check stopped.", True)
    +        return
    +
    +    # Create install-check user
    +    test_user = ('madlib_' +
    +                 rev.replace('.', '').replace('-', '_') +
    +                 '_installcheck')
    +    try:
    +        _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), False)
    +    except:
    +        _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    +        _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), True)
    +    _internal_run_query("CREATE USER %s;" % (test_user), True)
    +
    +    _internal_run_query("GRANT USAGE ON SCHEMA %s TO %s;" % (schema, test_user), True)
    +
    +    # 2) Run test SQLs
    +    info_(this, "> Running test scripts for:", verbose)
    +
    +    caseset = (set([test.strip() for test in testcase.split(',')])
    +               if testcase != "" else set())
    +
    +    modset = {}
    +    for case in caseset:
    +        if case.find('/') > -1:
    +            [mod, algo] = case.split('/')
    +            if mod not in modset:
    +                modset[mod] = []
    +            if algo not in modset[mod]:
    +                modset[mod].append(algo)
    +        else:
    +            modset[case] = []
    +
    +    # Loop through all modules
    +    for moduleinfo in portspecs['modules']:
    +
    +        # Get module name
    +        module = moduleinfo['name']
    +
    +        # Skip if doesn't meet specified modules
    +        if modset is not None and len(modset) > 0 and module not in modset:
    +            continue
    +        # JIRA: MADLIB-1078 fix
    +        # Skip pmml during install-check (when run without the -t option).
    +        # We can still run install-check on pmml with '-t' option.
    +        if not modset and module in ['pmml']:
    +            continue
    +        info_(this, "> - %s" % module, verbose)
    +
    +        # Make a temp dir for this module (if doesn't exist)
    +        cur_tmpdir = tmpdir + '/' + module + '/test'  # tmpdir is a global variable
    +        _make_dir(cur_tmpdir)
    +
    +        # Find the Python module dir (platform specific or generic)
    +        if os.path.isdir(maddir + "/ports/" + portid + "/" + dbver + "/modules/" + module):
    +            maddir_mod_py = maddir + "/ports/" + portid + "/" + dbver + "/modules"
    +        else:
    +            maddir_mod_py = maddir + "/modules"
    +
    +        # Find the SQL module dir (platform specific or generic)
    +        if os.path.isdir(maddir + "/ports/" + portid + "/modules/" + module):
    +            maddir_mod_sql = maddir + "/ports/" + portid + "/modules"
    +        else:
    +            maddir_mod_sql = maddir + "/modules"
    +
    +        # Prepare test schema
    +        test_schema = "madlib_installcheck_%s" % (module)
    +        _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE; CREATE SCHEMA %s;" %
    +                            (test_schema, test_schema), True)
    +        _internal_run_query("GRANT ALL ON SCHEMA %s TO %s;" %
    +                            (test_schema, test_user), True)
    +
    +        # Switch to test user and prepare the search_path
    +        pre_sql = '-- Switch to test user:\n' \
    +                  'SET ROLE %s;\n' \
    +                  '-- Set SEARCH_PATH for install-check:\n' \
    +                  'SET search_path=%s,%s;\n' \
    +                  % (test_user, test_schema, schema)
    +
    +        # Loop through all test SQL files for this module
    +        sql_files = maddir_mod_sql + '/' + module + '/test/*.sql_in'
    +        for sqlfile in sorted(glob.glob(sql_files), reverse=True):
    +            algoname = os.path.basename(sqlfile).split('.')[0]
    +            # run only algo specified
    +            if (module in modset and modset[module] and
    +                    algoname not in modset[module]):
    +                continue
    +
    +            # Set file names
    +            tmpfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.tmp'
    +            logfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.log'
    +
    +            # If there is no problem with the SQL file
    +            milliseconds = 0
    +
    +            # Run the SQL
    +            run_start = datetime.datetime.now()
    +            retval = _run_sql_file_install_check(schema, maddir_mod_py,
    +                                                 module, sqlfile, tmpfile,
    +                                                 logfile, pre_sql)
    +            # Runtime evaluation
    +            run_end = datetime.datetime.now()
    +            milliseconds = round((run_end - run_start).seconds * 1000 +
    +                                 (run_end - run_start).microseconds / 1000)
    +
    +            # Check the exit status
    +            result = _parse_result_logfile(retval, logfile, tmpfile, sqlfile,
    +                                           module, milliseconds)
    +
    +        # Cleanup test schema for the module
    +        _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE;" % (test_schema), True)
    +
    +    # Drop install-check user
    +    _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    +    _internal_run_query("DROP USER %s;" % (test_user), True)
    +
    +def create_install_madlib_sqlfile(args, madpack_cmd, testcase):
    +    upgrade = args['upgrade']
    +    schema = args['schema']
    +    dbrev = args['dbrev']
    +    is_schema_in_db = args['is_schema_in_db']
    +    with open(args['output_filename'], 'a+') as output_filehandle:
    +        # COMMAND: uninstall/reinstall
    --- End diff --
    
    We created new functions for uninstall and install/reinstall.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191581553
  
    --- Diff: src/madpack/madpack.py ---
    @@ -95,6 +95,16 @@ def _internal_run_query(sql, show_error):
         return run_query(sql, con_args, show_error)
     # ------------------------------------------------------------------------------
     
    +def _write_to_file(handle, sql, show_error=False):
    +    handle.write(sql)
    +    handle.write('\n')
    +# ------------------------------------------------------------------------------
    +
    +
    +def _merge_to_file(handle, other_file, show_error=False):
    +    with open(other_file) as other_handle:
    --- End diff --
    
    does the order matter here ? will this work as well `other_handle.write(handle.read())`. Maybe add some comments about this function


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191604335
  
    --- Diff: src/madpack/upgrade_util.py ---
    @@ -1299,18 +1303,19 @@ def _clean_function(self):
             pattern = re.compile(r"""CREATE(\s+)FUNCTION""", re.DOTALL | re.IGNORECASE)
             self._sql = re.sub(pattern, 'CREATE OR REPLACE FUNCTION', self._sql)
     
    -    def cleanup(self, sql):
    +    def cleanup(self, sql, algoname):
             """
             @brief Entry function for cleaning the sql script
             """
             self._sql = sql
    -        self._clean_comment()
    -        self._clean_type()
    -        self._clean_cast()
    -        self._clean_operator()
    -        self._clean_opclass()
    -        self._clean_aggregate()
    -        self._clean_function()
    +        if algoname not in self.get_change_handler().newmodule:
    --- End diff --
    
    can we switch the if logic to 
    ```
    if algoname in  self.get_change_handler().newmodule:
        return self._sql
    ```
    
    also it is not really clear why we need the if check in the first place. Can you explain it in a comment ?


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192193755
  
    --- Diff: src/madpack/madpack.py ---
    @@ -824,6 +873,246 @@ def parse_arguments():
         # Get the arguments
         return parser.parse_args()
     
    +def run_install_check(args, testcase):
    +    schema = args['schema']
    --- End diff --
    
    This is install check code which is not refactored in this commit, and should be done as part of a different commit. We only moved `install-check` specific code from `main` to this function, since `main` was very huge.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192190940
  
    --- Diff: src/madpack/madpack.py ---
    @@ -238,6 +311,88 @@ def _run_sql_file(schema, maddir_mod_py, module, sqlfile,
         return retval
     # ------------------------------------------------------------------------------
     
    +def _run_sql_file(schema, sqlfile):
    --- End diff --
    
    We refactored the code in `_run_sql_file_install_check` since it uses the same command as what is in `_run_m4_and_append`. 
    `_run_sql_file` has a different command, so we have left it as is.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191608639
  
    --- Diff: src/madpack/madpack.py ---
    @@ -824,6 +873,246 @@ def parse_arguments():
         # Get the arguments
         return parser.parse_args()
     
    +def run_install_check(args, testcase):
    +    schema = args['schema']
    +    dbrev = args['dbrev']
    +    # 1) Compare OS and DB versions. Continue if OS = DB.
    +    if get_rev_num(dbrev) != get_rev_num(rev):
    +        _print_revs(rev, dbrev, con_args, schema)
    +        info_(this, "Versions do not match. Install-check stopped.", True)
    +        return
    +
    +    # Create install-check user
    +    test_user = ('madlib_' +
    +                 rev.replace('.', '').replace('-', '_') +
    +                 '_installcheck')
    +    try:
    +        _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), False)
    +    except:
    +        _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    +        _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), True)
    +    _internal_run_query("CREATE USER %s;" % (test_user), True)
    +
    +    _internal_run_query("GRANT USAGE ON SCHEMA %s TO %s;" % (schema, test_user), True)
    +
    +    # 2) Run test SQLs
    +    info_(this, "> Running test scripts for:", verbose)
    +
    +    caseset = (set([test.strip() for test in testcase.split(',')])
    +               if testcase != "" else set())
    +
    +    modset = {}
    +    for case in caseset:
    +        if case.find('/') > -1:
    +            [mod, algo] = case.split('/')
    +            if mod not in modset:
    +                modset[mod] = []
    +            if algo not in modset[mod]:
    +                modset[mod].append(algo)
    +        else:
    +            modset[case] = []
    +
    +    # Loop through all modules
    +    for moduleinfo in portspecs['modules']:
    +
    +        # Get module name
    +        module = moduleinfo['name']
    +
    +        # Skip if doesn't meet specified modules
    +        if modset is not None and len(modset) > 0 and module not in modset:
    +            continue
    +        # JIRA: MADLIB-1078 fix
    +        # Skip pmml during install-check (when run without the -t option).
    +        # We can still run install-check on pmml with '-t' option.
    +        if not modset and module in ['pmml']:
    +            continue
    +        info_(this, "> - %s" % module, verbose)
    +
    +        # Make a temp dir for this module (if doesn't exist)
    +        cur_tmpdir = tmpdir + '/' + module + '/test'  # tmpdir is a global variable
    +        _make_dir(cur_tmpdir)
    +
    +        # Find the Python module dir (platform specific or generic)
    +        if os.path.isdir(maddir + "/ports/" + portid + "/" + dbver + "/modules/" + module):
    +            maddir_mod_py = maddir + "/ports/" + portid + "/" + dbver + "/modules"
    +        else:
    +            maddir_mod_py = maddir + "/modules"
    +
    +        # Find the SQL module dir (platform specific or generic)
    +        if os.path.isdir(maddir + "/ports/" + portid + "/modules/" + module):
    +            maddir_mod_sql = maddir + "/ports/" + portid + "/modules"
    +        else:
    +            maddir_mod_sql = maddir + "/modules"
    +
    +        # Prepare test schema
    +        test_schema = "madlib_installcheck_%s" % (module)
    +        _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE; CREATE SCHEMA %s;" %
    +                            (test_schema, test_schema), True)
    +        _internal_run_query("GRANT ALL ON SCHEMA %s TO %s;" %
    +                            (test_schema, test_user), True)
    +
    +        # Switch to test user and prepare the search_path
    +        pre_sql = '-- Switch to test user:\n' \
    +                  'SET ROLE %s;\n' \
    +                  '-- Set SEARCH_PATH for install-check:\n' \
    +                  'SET search_path=%s,%s;\n' \
    +                  % (test_user, test_schema, schema)
    +
    +        # Loop through all test SQL files for this module
    +        sql_files = maddir_mod_sql + '/' + module + '/test/*.sql_in'
    +        for sqlfile in sorted(glob.glob(sql_files), reverse=True):
    +            algoname = os.path.basename(sqlfile).split('.')[0]
    +            # run only algo specified
    +            if (module in modset and modset[module] and
    +                    algoname not in modset[module]):
    +                continue
    +
    +            # Set file names
    +            tmpfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.tmp'
    +            logfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.log'
    +
    +            # If there is no problem with the SQL file
    +            milliseconds = 0
    +
    +            # Run the SQL
    +            run_start = datetime.datetime.now()
    +            retval = _run_sql_file_install_check(schema, maddir_mod_py,
    +                                                 module, sqlfile, tmpfile,
    +                                                 logfile, pre_sql)
    +            # Runtime evaluation
    +            run_end = datetime.datetime.now()
    +            milliseconds = round((run_end - run_start).seconds * 1000 +
    +                                 (run_end - run_start).microseconds / 1000)
    +
    +            # Check the exit status
    +            result = _parse_result_logfile(retval, logfile, tmpfile, sqlfile,
    +                                           module, milliseconds)
    +
    +        # Cleanup test schema for the module
    +        _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE;" % (test_schema), True)
    +
    +    # Drop install-check user
    +    _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    +    _internal_run_query("DROP USER %s;" % (test_user), True)
    +
    +def create_install_madlib_sqlfile(args, madpack_cmd, testcase):
    +    upgrade = args['upgrade']
    +    schema = args['schema']
    +    dbrev = args['dbrev']
    +    is_schema_in_db = args['is_schema_in_db']
    +    with open(args['output_filename'], 'a+') as output_filehandle:
    +        # COMMAND: uninstall/reinstall
    +        if madpack_cmd in ('uninstall', 'reinstall'):
    +            if get_rev_num(dbrev) == [0]:
    +                info_(this, "Nothing to uninstall. No version found in schema %s." % schema.upper(), True)
    +                return 1
    +
    +            # Find any potential data to lose
    +            affected_objects = _internal_run_query("""
    +                SELECT
    +                    n1.nspname AS schema,
    +                    relname AS relation,
    +                    attname AS column,
    +                    typname AS type
    +                FROM
    +                    pg_attribute a,
    +                    pg_class c,
    +                    pg_type t,
    +                    pg_namespace n,
    +                    pg_namespace n1
    +                WHERE
    +                    n.nspname = '%s'
    +                    AND t.typnamespace = n.oid
    +                    AND a.atttypid = t.oid
    +                    AND c.oid = a.attrelid
    +                    AND c.relnamespace = n1.oid
    +                    AND c.relkind = 'r'
    +                ORDER BY
    +                    n1.nspname, relname, attname, typname""" % schema.lower(), True)
    +
    +            info_(this, "*** Uninstalling MADlib ***", True)
    +            info_(this, "***********************************************************************************", True)
    +            info_(this, "* Schema %s and all database objects depending on it will be dropped!" % schema.upper(), True)
    +            if affected_objects:
    +                info_(this, "* If you continue the following data will be lost (schema : table.column : type):", True)
    +                for ao in affected_objects:
    +                    info_(this, '* - ' + ao['schema'] + ' : ' + ao['relation'] + '.' +
    +                          ao['column'] + ' : ' + ao['type'], True)
    +            info_(this, "***********************************************************************************", True)
    +            info_(this, "Would you like to continue? [Y/N]", True)
    +            go = raw_input('>>> ').upper()
    +            while (go not in ('Y', 'N', 'YES', 'NO')):
    +                go = raw_input('Yes or No >>> ').upper()
    +
    +            # 2) Do the uninstall/drop
    +            if go in ('N', 'NO'):
    +                info_(this, 'No problem. Nothing dropped.', True)
    +                return 1
    +            elif go in ('Y', 'YES'):
    +                try:
    +                    _write_to_file(output_filehandle,
    +                                   "DROP SCHEMA %s CASCADE;" % (schema), True)
    +                    is_schema_in_db = False
    +                except:
    +                    error_(this, "Cannot drop schema %s." % schema.upper(), True)
    +
    +            else:
    +                return 1
    +
    +        # COMMAND: install/reinstall
    +        if madpack_cmd in ('install', 'reinstall'):
    +            # Refresh MADlib version in DB, None for GP/PG
    +            if madpack_cmd == 'reinstall':
    +                info_(this, "Setting MADlib database version to be None for reinstall", verbose)
    +                dbrev = None
    +
    +            info_(this, "*** Installing MADlib ***", True)
    +
    +            # 1) Compare OS and DB versions.
    +            # noop if OS <= DB.
    +            _print_revs(rev, dbrev, con_args, schema)
    +            if is_rev_gte(get_rev_num(dbrev), get_rev_num(rev)):
    --- End diff --
    
    Since the if and elif conditions are not directly related, can we refactor this as 
    
    ```python
    if dbrev is None:
        pass
    if is_rev_gte(get_rev_num(dbrev), get_rev_num(rev)):
        info_(this, "Current MADlib version already up to date.", True)
        return 1
    ```



---

[GitHub] madlib issue #271: Madpack: Make install, reinstall and upgrade atomic

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on the issue:

    https://github.com/apache/madlib/pull/271
  
    Thank you for the comments @kaknikhil , will address them.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191587755
  
    --- Diff: src/madpack/madpack.py ---
    @@ -131,10 +141,73 @@ def _get_relative_maddir(maddir, port):
             return maddir
     # ------------------------------------------------------------------------------
     
    +def _cleanup_comments_in_sqlfile(output_filename, upgrade):
    +    """
    +    @brief Remove comments in the sql script
    +    """
    +    if not upgrade:
    +        with open(output_filename, 'r+') as output_filehandle:
    +            full_sql = output_filehandle.read()
    +            pattern = re.compile(r"""(/\*(.|[\r\n])*?\*/)|(--(.*|[\r\n]))""")
    +            res = ''
    +            lines = re.split(r'[\r\n]+', full_sql)
    +            for line in lines:
    +                tmp = line
    +                if not tmp.strip().startswith("E'"):
    +                    line = re.sub(pattern, '', line)
    +                res += line + '\n'
    +            full_sql = res.strip()
    +            full_sql = re.sub(pattern, '', full_sql).strip()
    +        # Re-write the cleaned-up sql to a new file. Python does not let us
    --- End diff --
    
    can we move the new file creation and the renaming logic to a different function? This way the function will have a single responsibility of just cleaning the input.
    



---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191595833
  
    --- Diff: src/madpack/madpack.py ---
    @@ -559,71 +650,59 @@ def _db_rename_schema(from_schema, to_schema):
     # ------------------------------------------------------------------------------
     
     
    -def _db_create_schema(schema):
    +def _db_create_schema(schema, is_schema_in_db, filehandle):
         """
         Create schema
             @param from_schema name of the schema to rename
    +        @param is_schema_in_db flag to indicate if schema is already present
             @param to_schema new name for the schema
         """
     
    -    info_(this, "> Creating %s schema" % schema.upper(), True)
    --- End diff --
    
    Does it add value to keep this info statement ?


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192193089
  
    --- Diff: src/madpack/madpack.py ---
    @@ -559,71 +650,59 @@ def _db_rename_schema(from_schema, to_schema):
     # ------------------------------------------------------------------------------
     
     
    -def _db_create_schema(schema):
    +def _db_create_schema(schema, is_schema_in_db, filehandle):
         """
         Create schema
             @param from_schema name of the schema to rename
    +        @param is_schema_in_db flag to indicate if schema is already present
             @param to_schema new name for the schema
         """
     
    -    info_(this, "> Creating %s schema" % schema.upper(), True)
    --- End diff --
    
    This is useful and it was preserved in `main`.


---

[GitHub] madlib issue #271: Madpack: Make install, reinstall and upgrade atomic

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on the issue:

    https://github.com/apache/madlib/pull/271
  
    w/ @orhankislal  The reason we had to change the info messages that are printed during install/reinstall/upgrade is that we are now writing all the necessary SQL into a single file, and running it as an atomic operation. Previously, the messages were getting printed after individual modules were processed. If we keep the older info messages as is, it won't reflect what is happening underneath. Breaking the SQL file up to print messages will defeat the purpose of this PR.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/madlib/pull/271


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191592112
  
    --- Diff: src/madpack/madpack.py ---
    @@ -238,6 +311,88 @@ def _run_sql_file(schema, maddir_mod_py, module, sqlfile,
         return retval
     # ------------------------------------------------------------------------------
     
    +def _run_sql_file(schema, sqlfile):
    +    """
    +        Run SQL file
    +            @param schema name of the target schema
    +            @param sqlfile name of the file to parse
    +    """
    +    # Run the SQL using DB command-line utility
    +    if portid in SUPPORTED_PORTS:
    +        sqlcmd = 'psql'
    +        # Test the DB cmd line utility
    +        std, err = subprocess.Popen(['which', sqlcmd], stdout=subprocess.PIPE,
    +                                    stderr=subprocess.PIPE).communicate()
    +        if not std:
    +            error_(this, "Command not found: %s" % sqlcmd, True)
    +
    +        runcmd = [sqlcmd, '-a',
    +                  '-v', 'ON_ERROR_STOP=1',
    +                  '-h', con_args['host'].split(':')[0],
    +                  '-p', con_args['host'].split(':')[1],
    +                  '-d', con_args['database'],
    +                  '-U', con_args['user'],
    +                  '--no-password',
    +                  '--single-transaction',
    +                  '-f', sqlfile]
    +        runenv = os.environ
    +        if 'password' in con_args:
    +            runenv["PGPASSWORD"] = con_args['password']
    +        runenv["PGOPTIONS"] = '-c client_min_messages=notice'
    +
    +    # Open log file
    +    logfile = sqlfile + '.log'
    +    try:
    +        log = open(logfile, 'w')
    +    except:
    +        error_(this, "Cannot create log file: %s" % logfile, False)
    +        raise Exception
    +
    +    # Run the SQL
    +    try:
    +        info_(this, "> ... executing " + sqlfile, verbose)
    +        info_(this, ' '.join(runcmd), verbose)
    +        retval = subprocess.call(runcmd, env=runenv, stdout=log, stderr=log)
    +    except:
    +        error_(this, "Failed executing %s" % sqlfile, False)
    +        raise Exception
    +    finally:
    +        log.close()
    +    # Check the exit status
    +    result = _parse_result_logfile(retval, logfile, sqlfile)
    +    return result
    +# ------------------------------------------------------------------------------
    +
    +def _parse_result_logfile(retval, logfile, sql_abspath,
    +                          sql_filename=None, module=None, milliseconds=None):
    +    """
    +    Function to parse the logfile and return if its content indicate a failure
    +    or success.
    +    """
    +    is_install_check_logfile = bool(sql_filename and module)
    +    # Check the exit status
    +    if retval != 0:
    +        result = 'FAIL'
    +        global keeplogs
    +        keeplogs = True
    +    # Since every single statement in the test file gets logged,
    +    # an empty log file indicates an empty or a failed test
    +    elif os.path.isfile(logfile) and os.path.getsize(logfile) > 0:
    +        result = 'PASS'
    +    # Otherwise
    +    else:
    +        result = 'ERROR'
    +
    +    if is_install_check_logfile:
    --- End diff --
    
    can we move the `install_check` logic out of this function. The function already returns the `result`, we can use this `result` variable to print the install check output. 


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191581799
  
    --- Diff: src/madpack/madpack.py ---
    @@ -95,6 +95,16 @@ def _internal_run_query(sql, show_error):
         return run_query(sql, con_args, show_error)
     # ------------------------------------------------------------------------------
     
    +def _write_to_file(handle, sql, show_error=False):
    +    handle.write(sql)
    +    handle.write('\n')
    +# ------------------------------------------------------------------------------
    +
    +
    +def _merge_to_file(handle, other_file, show_error=False):
    +    with open(other_file) as other_handle:
    --- End diff --
    
    `show_error` arg is never used


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192174800
  
    --- Diff: src/madpack/upgrade_util.py ---
    @@ -1299,18 +1303,19 @@ def _clean_function(self):
             pattern = re.compile(r"""CREATE(\s+)FUNCTION""", re.DOTALL | re.IGNORECASE)
             self._sql = re.sub(pattern, 'CREATE OR REPLACE FUNCTION', self._sql)
     
    -    def cleanup(self, sql):
    +    def cleanup(self, sql, algoname):
    --- End diff --
    
    Good catch, this is dead code.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191607552
  
    --- Diff: src/madpack/madpack.py ---
    @@ -824,6 +873,246 @@ def parse_arguments():
         # Get the arguments
         return parser.parse_args()
     
    +def run_install_check(args, testcase):
    +    schema = args['schema']
    +    dbrev = args['dbrev']
    +    # 1) Compare OS and DB versions. Continue if OS = DB.
    +    if get_rev_num(dbrev) != get_rev_num(rev):
    +        _print_revs(rev, dbrev, con_args, schema)
    +        info_(this, "Versions do not match. Install-check stopped.", True)
    +        return
    +
    +    # Create install-check user
    +    test_user = ('madlib_' +
    +                 rev.replace('.', '').replace('-', '_') +
    +                 '_installcheck')
    +    try:
    +        _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), False)
    +    except:
    +        _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    +        _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), True)
    +    _internal_run_query("CREATE USER %s;" % (test_user), True)
    +
    +    _internal_run_query("GRANT USAGE ON SCHEMA %s TO %s;" % (schema, test_user), True)
    +
    +    # 2) Run test SQLs
    +    info_(this, "> Running test scripts for:", verbose)
    +
    +    caseset = (set([test.strip() for test in testcase.split(',')])
    +               if testcase != "" else set())
    +
    +    modset = {}
    +    for case in caseset:
    +        if case.find('/') > -1:
    +            [mod, algo] = case.split('/')
    +            if mod not in modset:
    +                modset[mod] = []
    +            if algo not in modset[mod]:
    +                modset[mod].append(algo)
    +        else:
    +            modset[case] = []
    +
    +    # Loop through all modules
    +    for moduleinfo in portspecs['modules']:
    +
    +        # Get module name
    +        module = moduleinfo['name']
    +
    +        # Skip if doesn't meet specified modules
    +        if modset is not None and len(modset) > 0 and module not in modset:
    +            continue
    +        # JIRA: MADLIB-1078 fix
    +        # Skip pmml during install-check (when run without the -t option).
    +        # We can still run install-check on pmml with '-t' option.
    +        if not modset and module in ['pmml']:
    +            continue
    +        info_(this, "> - %s" % module, verbose)
    +
    +        # Make a temp dir for this module (if doesn't exist)
    +        cur_tmpdir = tmpdir + '/' + module + '/test'  # tmpdir is a global variable
    +        _make_dir(cur_tmpdir)
    +
    +        # Find the Python module dir (platform specific or generic)
    +        if os.path.isdir(maddir + "/ports/" + portid + "/" + dbver + "/modules/" + module):
    +            maddir_mod_py = maddir + "/ports/" + portid + "/" + dbver + "/modules"
    +        else:
    +            maddir_mod_py = maddir + "/modules"
    +
    +        # Find the SQL module dir (platform specific or generic)
    +        if os.path.isdir(maddir + "/ports/" + portid + "/modules/" + module):
    +            maddir_mod_sql = maddir + "/ports/" + portid + "/modules"
    +        else:
    +            maddir_mod_sql = maddir + "/modules"
    +
    +        # Prepare test schema
    +        test_schema = "madlib_installcheck_%s" % (module)
    +        _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE; CREATE SCHEMA %s;" %
    +                            (test_schema, test_schema), True)
    +        _internal_run_query("GRANT ALL ON SCHEMA %s TO %s;" %
    +                            (test_schema, test_user), True)
    +
    +        # Switch to test user and prepare the search_path
    +        pre_sql = '-- Switch to test user:\n' \
    +                  'SET ROLE %s;\n' \
    +                  '-- Set SEARCH_PATH for install-check:\n' \
    +                  'SET search_path=%s,%s;\n' \
    +                  % (test_user, test_schema, schema)
    +
    +        # Loop through all test SQL files for this module
    +        sql_files = maddir_mod_sql + '/' + module + '/test/*.sql_in'
    +        for sqlfile in sorted(glob.glob(sql_files), reverse=True):
    +            algoname = os.path.basename(sqlfile).split('.')[0]
    +            # run only algo specified
    +            if (module in modset and modset[module] and
    +                    algoname not in modset[module]):
    +                continue
    +
    +            # Set file names
    +            tmpfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.tmp'
    +            logfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.log'
    +
    +            # If there is no problem with the SQL file
    +            milliseconds = 0
    +
    +            # Run the SQL
    +            run_start = datetime.datetime.now()
    +            retval = _run_sql_file_install_check(schema, maddir_mod_py,
    +                                                 module, sqlfile, tmpfile,
    +                                                 logfile, pre_sql)
    +            # Runtime evaluation
    +            run_end = datetime.datetime.now()
    +            milliseconds = round((run_end - run_start).seconds * 1000 +
    +                                 (run_end - run_start).microseconds / 1000)
    +
    +            # Check the exit status
    +            result = _parse_result_logfile(retval, logfile, tmpfile, sqlfile,
    +                                           module, milliseconds)
    +
    +        # Cleanup test schema for the module
    +        _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE;" % (test_schema), True)
    +
    +    # Drop install-check user
    +    _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    +    _internal_run_query("DROP USER %s;" % (test_user), True)
    +
    +def create_install_madlib_sqlfile(args, madpack_cmd, testcase):
    +    upgrade = args['upgrade']
    +    schema = args['schema']
    +    dbrev = args['dbrev']
    +    is_schema_in_db = args['is_schema_in_db']
    +    with open(args['output_filename'], 'a+') as output_filehandle:
    +        # COMMAND: uninstall/reinstall
    --- End diff --
    
    can we create separate functions for uninstall , reinstall/install and upgrade ?


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192181634
  
    --- Diff: src/madpack/madpack.py ---
    @@ -95,6 +95,16 @@ def _internal_run_query(sql, show_error):
         return run_query(sql, con_args, show_error)
     # ------------------------------------------------------------------------------
     
    +def _write_to_file(handle, sql, show_error=False):
    --- End diff --
    
    Moved this function to `madlib/utilities.py`.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192205268
  
    --- Diff: src/madpack/madpack.py ---
    @@ -987,275 +1276,42 @@ def main(argv):
                 error_(this, "Missing -p/--platform parameter.", True)
             if not con_args:
                 error_(this, "Unknown problem with database connection string: %s" % con_args, True)
    +    # ---------------- Completed "Get and validate arguments" -----------------
     
         # COMMAND: version
         if args.command[0] == 'version':
             _print_revs(rev, dbrev, con_args, schema)
     
    -    # COMMAND: uninstall/reinstall
    -    if args.command[0] in ('uninstall', 'reinstall'):
    -        if get_rev_num(dbrev) == [0]:
    -            info_(this, "Nothing to uninstall. No version found in schema %s." % schema.upper(), True)
    -            return
    -
    -        # Find any potential data to lose
    -        affected_objects = _internal_run_query("""
    -            SELECT
    -                n1.nspname AS schema,
    -                relname AS relation,
    -                attname AS column,
    -                typname AS type
    -            FROM
    -                pg_attribute a,
    -                pg_class c,
    -                pg_type t,
    -                pg_namespace n,
    -                pg_namespace n1
    -            WHERE
    -                n.nspname = '%s'
    -                AND t.typnamespace = n.oid
    -                AND a.atttypid = t.oid
    -                AND c.oid = a.attrelid
    -                AND c.relnamespace = n1.oid
    -                AND c.relkind = 'r'
    -            ORDER BY
    -                n1.nspname, relname, attname, typname""" % schema.lower(), True)
    -
    -        info_(this, "*** Uninstalling MADlib ***", True)
    -        info_(this, "***********************************************************************************", True)
    -        info_(this, "* Schema %s and all database objects depending on it will be dropped!" % schema.upper(), True)
    -        if affected_objects:
    -            info_(this, "* If you continue the following data will be lost (schema : table.column : type):", True)
    -            for ao in affected_objects:
    -                info_(this, '* - ' + ao['schema'] + ' : ' + ao['relation'] + '.' +
    -                      ao['column'] + ' : ' + ao['type'], True)
    -        info_(this, "***********************************************************************************", True)
    -        info_(this, "Would you like to continue? [Y/N]", True)
    -        go = raw_input('>>> ').upper()
    -        while go != 'Y' and go != 'N':
    -            go = raw_input('Yes or No >>> ').upper()
    -
    -        # 2) Do the uninstall/drop
    -        if go == 'N':
    -            info_(this, 'No problem. Nothing dropped.', True)
    -            return
    -
    -        elif go == 'Y':
    -            info_(this, "> dropping schema %s" % schema.upper(), verbose)
    -            try:
    -                _internal_run_query("DROP SCHEMA %s CASCADE;" % (schema), True)
    -            except:
    -                error_(this, "Cannot drop schema %s." % schema.upper(), True)
    -
    -            info_(this, 'Schema %s (and all dependent objects) has been dropped.' % schema.upper(), True)
    -            info_(this, 'MADlib uninstalled successfully.', True)
    -
    -        else:
    -            return
    -
    -    # COMMAND: install/reinstall
    -    if args.command[0] in ('install', 'reinstall'):
    -        # Refresh MADlib version in DB, None for GP/PG
    -        if args.command[0] == 'reinstall':
    -            print "Setting MADlib database version to be None for reinstall"
    -            dbrev = None
    -
    -        info_(this, "*** Installing MADlib ***", True)
    -
    -        # 1) Compare OS and DB versions.
    -        # noop if OS <= DB.
    -        _print_revs(rev, dbrev, con_args, schema)
    -        if is_rev_gte(get_rev_num(dbrev), get_rev_num(rev)):
    -            info_(this, "Current MADlib version already up to date.", True)
    -            return
    -        # proceed to create objects if nothing installed in DB
    -        elif dbrev is None:
    -            pass
    -        # error and refer to upgrade if OS > DB
    -        else:
    -            error_(this, """Aborting installation: existing MADlib version detected in {0} schema
    -                    To upgrade the {0} schema to MADlib v{1} please run the following command:
    -                    madpack upgrade -s {0} -p {2} [-c ...]
    -                    """.format(schema, rev, portid), True)
    -
    -        # 2) Run installation
    -        try:
    -            _plpy_check(py_min_ver)
    -            _db_install(schema, dbrev, args.testcase)
    -        except:
    -            error_(this, "MADlib installation failed.", True)
    -
    -    # COMMAND: upgrade
    -    if args.command[0] in ('upgrade', 'update'):
    -        info_(this, "*** Upgrading MADlib ***", True)
    -        dbrev = get_madlib_dbrev(con_args, schema)
    -
    -        # 1) Check DB version. If None, nothing to upgrade.
    -        if not dbrev:
    -            info_(this, "MADlib is not installed in {schema} schema and there "
    -                  "is nothing to upgrade. Please use install "
    -                  "instead.".format(schema=schema.upper()),
    -                  True)
    -            return
    -
    -        # 2) Compare OS and DB versions. Continue if OS > DB.
    -        _print_revs(rev, dbrev, con_args, schema)
    -        if is_rev_gte(get_rev_num(dbrev), get_rev_num(rev)):
    -            info_(this, "Current MADlib version is already up-to-date.", True)
    -            return
    -
    -        if float('.'.join(dbrev.split('.')[0:2])) < 1.0:
    -            info_(this, "The version gap is too large, upgrade is supported only for "
    -                  "packages greater than or equal to v1.0.", True)
    -            return
    -
    -        # 3) Run upgrade
    -        try:
    -            _plpy_check(py_min_ver)
    -            _db_upgrade(schema, dbrev)
    -        except Exception as e:
    -            # Uncomment the following lines when debugging
    -            print "Exception: " + str(e)
    -            print sys.exc_info()
    -            traceback.print_tb(sys.exc_info()[2])
    -            error_(this, "MADlib upgrade failed.", True)
    -
         # COMMAND: install-check
         if args.command[0] == 'install-check':
    -
    -        # 1) Compare OS and DB versions. Continue if OS = DB.
    -        if get_rev_num(dbrev) != get_rev_num(rev):
    -            _print_revs(rev, dbrev, con_args, schema)
    -            info_(this, "Versions do not match. Install-check stopped.", True)
    -            return
    -
    -        # Create install-check user
    -        test_user = ('madlib_' +
    -                     rev.replace('.', '').replace('-', '_') +
    -                     '_installcheck')
    +        run_install_check(locals(), args.testcase)
    +    else:
             try:
    -            _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), False)
    +            is_schema_in_db = _internal_run_query("SELECT schema_name FROM information_schema.schemata WHERE schema_name='%s';" % schema, True)
             except:
    -            _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    -            _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), True)
    -        _internal_run_query("CREATE USER %s;" % (test_user), True)
    -
    -        _internal_run_query("GRANT USAGE ON SCHEMA %s TO %s;" % (schema, test_user), True)
    -
    -        # 2) Run test SQLs
    -        info_(this, "> Running test scripts for:", verbose)
    -
    -        caseset = (set([test.strip() for test in args.testcase.split(',')])
    -                   if args.testcase != "" else set())
    -
    -        modset = {}
    -        for case in caseset:
    -            if case.find('/') > -1:
    -                [mod, algo] = case.split('/')
    -                if mod not in modset:
    -                    modset[mod] = []
    -                if algo not in modset[mod]:
    -                    modset[mod].append(algo)
    -            else:
    -                modset[case] = []
    -
    -        # Loop through all modules
    -        for moduleinfo in portspecs['modules']:
    -
    -            # Get module name
    -            module = moduleinfo['name']
    -
    -            # Skip if doesn't meet specified modules
    -            if modset is not None and len(modset) > 0 and module not in modset:
    -                continue
    -            # JIRA: MADLIB-1078 fix
    -            # Skip pmml during install-check (when run without the -t option).
    -            # We can still run install-check on pmml with '-t' option.
    -            if not modset and module in ['pmml']:
    -                continue
    -            info_(this, "> - %s" % module, verbose)
    -
    -            # Make a temp dir for this module (if doesn't exist)
    -            cur_tmpdir = tmpdir + '/' + module + '/test'  # tmpdir is a global variable
    -            _make_dir(cur_tmpdir)
    -
    -            # Find the Python module dir (platform specific or generic)
    -            if os.path.isdir(maddir + "/ports/" + portid + "/" + dbver + "/modules/" + module):
    -                maddir_mod_py = maddir + "/ports/" + portid + "/" + dbver + "/modules"
    +            error_(this, "Cannot validate if schema already exists.", True)
    +
    +        output_filename = tmpdir + "/madlib_{0}.sql".format(args.command[0])
    +        upgrade = False
    +        return_val = create_install_madlib_sqlfile(locals(), args.command[0], args.testcase)
    +        if return_val == 0:
    +            info_(this, "Installing MADlib modules...", True)
    +            _cleanup_comments_in_sqlfile(output_filename, upgrade)
    +            result = _run_sql_file(schema, output_filename)
    +
    +            if result == 'FAIL':
    +                info_(this, "MADlib {0} unsuccessful.".format(args.command[0]), True)
                 else:
    -                maddir_mod_py = maddir + "/modules"
    -
    -            # Find the SQL module dir (platform specific or generic)
    -            if os.path.isdir(maddir + "/ports/" + portid + "/modules/" + module):
    -                maddir_mod_sql = maddir + "/ports/" + portid + "/modules"
    -            else:
    -                maddir_mod_sql = maddir + "/modules"
    -
    -            # Prepare test schema
    -            test_schema = "madlib_installcheck_%s" % (module)
    -            _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE; CREATE SCHEMA %s;" %
    -                                (test_schema, test_schema), True)
    -            _internal_run_query("GRANT ALL ON SCHEMA %s TO %s;" %
    -                                (test_schema, test_user), True)
    -
    -            # Switch to test user and prepare the search_path
    -            pre_sql = '-- Switch to test user:\n' \
    -                      'SET ROLE %s;\n' \
    -                      '-- Set SEARCH_PATH for install-check:\n' \
    -                      'SET search_path=%s,%s;\n' \
    -                      % (test_user, test_schema, schema)
    -
    -            # Loop through all test SQL files for this module
    -            sql_files = maddir_mod_sql + '/' + module + '/test/*.sql_in'
    -            for sqlfile in sorted(glob.glob(sql_files), reverse=True):
    -                algoname = os.path.basename(sqlfile).split('.')[0]
    -                # run only algo specified
    -                if (module in modset and modset[module] and
    -                        algoname not in modset[module]):
    -                    continue
    -
    -                # Set file names
    -                tmpfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.tmp'
    -                logfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.log'
    -
    -                # If there is no problem with the SQL file
    -                milliseconds = 0
    -
    -                # Run the SQL
    -                run_start = datetime.datetime.now()
    -                retval = _run_sql_file(schema, maddir_mod_py, module,
    -                                       sqlfile, tmpfile, logfile, pre_sql)
    -                # Runtime evaluation
    -                run_end = datetime.datetime.now()
    -                milliseconds = round((run_end - run_start).seconds * 1000 +
    -                                     (run_end - run_start).microseconds / 1000)
    -
    -                # Check the exit status
    -                if retval != 0:
    -                    result = 'FAIL'
    -                    keeplogs = True
    -                # Since every single statement in the test file gets logged,
    -                # an empty log file indicates an empty or a failed test
    -                elif os.path.isfile(logfile) and os.path.getsize(logfile) > 0:
    -                    result = 'PASS'
    -                # Otherwise
    -                else:
    -                    result = 'ERROR'
    -
    -                # Output result
    -                print "TEST CASE RESULT|Module: " + module + \
    -                    "|" + os.path.basename(sqlfile) + "|" + result + \
    -                    "|Time: %d milliseconds" % (milliseconds)
    -
    -                if result == 'FAIL':
    -                    error_(this, "Failed executing %s" % tmpfile, False)
    -                    error_(this, "Check the log at %s" % logfile, False)
    -            # Cleanup test schema for the module
    -            _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE;" % (test_schema), True)
    -
    -        # Drop install-check user
    -        _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    -        _internal_run_query("DROP USER %s;" % (test_user), True)
    +                if args.command[0] != 'uninstall':
    +                    if args.command[0] == 'upgrade':
    +                        info_(this, "MADlib %s upgraded successfully in %s schema." % (str(rev), schema.upper()), True)
    +                    else:
    +                        info_(this, "> Created %s schema" % schema.upper(), True)
    --- End diff --
    
    Removed `upper()` function calls.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192204168
  
    --- Diff: src/madpack/madpack.py ---
    @@ -824,6 +873,246 @@ def parse_arguments():
         # Get the arguments
         return parser.parse_args()
     
    +def run_install_check(args, testcase):
    +    schema = args['schema']
    +    dbrev = args['dbrev']
    +    # 1) Compare OS and DB versions. Continue if OS = DB.
    +    if get_rev_num(dbrev) != get_rev_num(rev):
    +        _print_revs(rev, dbrev, con_args, schema)
    +        info_(this, "Versions do not match. Install-check stopped.", True)
    +        return
    +
    +    # Create install-check user
    +    test_user = ('madlib_' +
    +                 rev.replace('.', '').replace('-', '_') +
    +                 '_installcheck')
    +    try:
    +        _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), False)
    +    except:
    +        _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    +        _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), True)
    +    _internal_run_query("CREATE USER %s;" % (test_user), True)
    +
    +    _internal_run_query("GRANT USAGE ON SCHEMA %s TO %s;" % (schema, test_user), True)
    +
    +    # 2) Run test SQLs
    +    info_(this, "> Running test scripts for:", verbose)
    +
    +    caseset = (set([test.strip() for test in testcase.split(',')])
    +               if testcase != "" else set())
    +
    +    modset = {}
    +    for case in caseset:
    +        if case.find('/') > -1:
    +            [mod, algo] = case.split('/')
    +            if mod not in modset:
    +                modset[mod] = []
    +            if algo not in modset[mod]:
    +                modset[mod].append(algo)
    +        else:
    +            modset[case] = []
    +
    +    # Loop through all modules
    +    for moduleinfo in portspecs['modules']:
    +
    +        # Get module name
    +        module = moduleinfo['name']
    +
    +        # Skip if doesn't meet specified modules
    +        if modset is not None and len(modset) > 0 and module not in modset:
    +            continue
    +        # JIRA: MADLIB-1078 fix
    +        # Skip pmml during install-check (when run without the -t option).
    +        # We can still run install-check on pmml with '-t' option.
    +        if not modset and module in ['pmml']:
    +            continue
    +        info_(this, "> - %s" % module, verbose)
    +
    +        # Make a temp dir for this module (if doesn't exist)
    +        cur_tmpdir = tmpdir + '/' + module + '/test'  # tmpdir is a global variable
    +        _make_dir(cur_tmpdir)
    +
    +        # Find the Python module dir (platform specific or generic)
    +        if os.path.isdir(maddir + "/ports/" + portid + "/" + dbver + "/modules/" + module):
    +            maddir_mod_py = maddir + "/ports/" + portid + "/" + dbver + "/modules"
    +        else:
    +            maddir_mod_py = maddir + "/modules"
    +
    +        # Find the SQL module dir (platform specific or generic)
    +        if os.path.isdir(maddir + "/ports/" + portid + "/modules/" + module):
    +            maddir_mod_sql = maddir + "/ports/" + portid + "/modules"
    +        else:
    +            maddir_mod_sql = maddir + "/modules"
    +
    +        # Prepare test schema
    +        test_schema = "madlib_installcheck_%s" % (module)
    +        _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE; CREATE SCHEMA %s;" %
    +                            (test_schema, test_schema), True)
    +        _internal_run_query("GRANT ALL ON SCHEMA %s TO %s;" %
    +                            (test_schema, test_user), True)
    +
    +        # Switch to test user and prepare the search_path
    +        pre_sql = '-- Switch to test user:\n' \
    +                  'SET ROLE %s;\n' \
    +                  '-- Set SEARCH_PATH for install-check:\n' \
    +                  'SET search_path=%s,%s;\n' \
    +                  % (test_user, test_schema, schema)
    +
    +        # Loop through all test SQL files for this module
    +        sql_files = maddir_mod_sql + '/' + module + '/test/*.sql_in'
    +        for sqlfile in sorted(glob.glob(sql_files), reverse=True):
    +            algoname = os.path.basename(sqlfile).split('.')[0]
    +            # run only algo specified
    +            if (module in modset and modset[module] and
    +                    algoname not in modset[module]):
    +                continue
    +
    +            # Set file names
    +            tmpfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.tmp'
    +            logfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.log'
    +
    +            # If there is no problem with the SQL file
    +            milliseconds = 0
    +
    +            # Run the SQL
    +            run_start = datetime.datetime.now()
    +            retval = _run_sql_file_install_check(schema, maddir_mod_py,
    +                                                 module, sqlfile, tmpfile,
    +                                                 logfile, pre_sql)
    +            # Runtime evaluation
    +            run_end = datetime.datetime.now()
    +            milliseconds = round((run_end - run_start).seconds * 1000 +
    +                                 (run_end - run_start).microseconds / 1000)
    +
    +            # Check the exit status
    +            result = _parse_result_logfile(retval, logfile, tmpfile, sqlfile,
    +                                           module, milliseconds)
    +
    +        # Cleanup test schema for the module
    +        _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE;" % (test_schema), True)
    +
    +    # Drop install-check user
    +    _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    +    _internal_run_query("DROP USER %s;" % (test_user), True)
    +
    +def create_install_madlib_sqlfile(args, madpack_cmd, testcase):
    +    upgrade = args['upgrade']
    +    schema = args['schema']
    +    dbrev = args['dbrev']
    +    is_schema_in_db = args['is_schema_in_db']
    +    with open(args['output_filename'], 'a+') as output_filehandle:
    +        # COMMAND: uninstall/reinstall
    +        if madpack_cmd in ('uninstall', 'reinstall'):
    +            if get_rev_num(dbrev) == [0]:
    +                info_(this, "Nothing to uninstall. No version found in schema %s." % schema.upper(), True)
    +                return 1
    +
    +            # Find any potential data to lose
    +            affected_objects = _internal_run_query("""
    +                SELECT
    +                    n1.nspname AS schema,
    +                    relname AS relation,
    +                    attname AS column,
    +                    typname AS type
    +                FROM
    +                    pg_attribute a,
    +                    pg_class c,
    +                    pg_type t,
    +                    pg_namespace n,
    +                    pg_namespace n1
    +                WHERE
    +                    n.nspname = '%s'
    +                    AND t.typnamespace = n.oid
    +                    AND a.atttypid = t.oid
    +                    AND c.oid = a.attrelid
    +                    AND c.relnamespace = n1.oid
    +                    AND c.relkind = 'r'
    +                ORDER BY
    +                    n1.nspname, relname, attname, typname""" % schema.lower(), True)
    +
    +            info_(this, "*** Uninstalling MADlib ***", True)
    +            info_(this, "***********************************************************************************", True)
    +            info_(this, "* Schema %s and all database objects depending on it will be dropped!" % schema.upper(), True)
    +            if affected_objects:
    +                info_(this, "* If you continue the following data will be lost (schema : table.column : type):", True)
    +                for ao in affected_objects:
    +                    info_(this, '* - ' + ao['schema'] + ' : ' + ao['relation'] + '.' +
    +                          ao['column'] + ' : ' + ao['type'], True)
    +            info_(this, "***********************************************************************************", True)
    +            info_(this, "Would you like to continue? [Y/N]", True)
    +            go = raw_input('>>> ').upper()
    +            while (go not in ('Y', 'N', 'YES', 'NO')):
    +                go = raw_input('Yes or No >>> ').upper()
    +
    +            # 2) Do the uninstall/drop
    +            if go in ('N', 'NO'):
    +                info_(this, 'No problem. Nothing dropped.', True)
    +                return 1
    +            elif go in ('Y', 'YES'):
    +                try:
    +                    _write_to_file(output_filehandle,
    +                                   "DROP SCHEMA %s CASCADE;" % (schema), True)
    +                    is_schema_in_db = False
    +                except:
    +                    error_(this, "Cannot drop schema %s." % schema.upper(), True)
    +
    +            else:
    +                return 1
    +
    +        # COMMAND: install/reinstall
    +        if madpack_cmd in ('install', 'reinstall'):
    +            # Refresh MADlib version in DB, None for GP/PG
    +            if madpack_cmd == 'reinstall':
    +                info_(this, "Setting MADlib database version to be None for reinstall", verbose)
    +                dbrev = None
    +
    +            info_(this, "*** Installing MADlib ***", True)
    +
    +            # 1) Compare OS and DB versions.
    +            # noop if OS <= DB.
    +            _print_revs(rev, dbrev, con_args, schema)
    +            if is_rev_gte(get_rev_num(dbrev), get_rev_num(rev)):
    --- End diff --
    
    Keeping the `if elif`, but changed the order. Although the checks are on different variables, they are directly related.
    Renamed variables: `dbrev`->`db_madlib_ver`, and `rev`->`new_madlib_ver`.


---

[GitHub] madlib issue #271: Madpack: Make install, reinstall and upgrade atomic

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/271
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/492/



---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191580503
  
    --- Diff: src/madpack/utilities.py ---
    @@ -33,6 +33,23 @@
     this = os.path.basename(sys.argv[0])    # name of this script
     
     
    +class AtomicFileOpen:
    --- End diff --
    
    do we need this `AtomicFileOpen`class ? I don't think it's used anywhere


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191585342
  
    --- Diff: src/madpack/madpack.py ---
    @@ -131,10 +141,73 @@ def _get_relative_maddir(maddir, port):
             return maddir
     # ------------------------------------------------------------------------------
     
    +def _cleanup_comments_in_sqlfile(output_filename, upgrade):
    --- End diff --
    
    should this function be moved to the `ScriptCleaner` class ? There is some code duplication in `_clean_comment` function and this function. 


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191589054
  
    --- Diff: src/madpack/madpack.py ---
    @@ -131,10 +141,73 @@ def _get_relative_maddir(maddir, port):
             return maddir
     # ------------------------------------------------------------------------------
     
    +def _cleanup_comments_in_sqlfile(output_filename, upgrade):
    +    """
    +    @brief Remove comments in the sql script
    +    """
    +    if not upgrade:
    +        with open(output_filename, 'r+') as output_filehandle:
    +            full_sql = output_filehandle.read()
    +            pattern = re.compile(r"""(/\*(.|[\r\n])*?\*/)|(--(.*|[\r\n]))""")
    +            res = ''
    +            lines = re.split(r'[\r\n]+', full_sql)
    +            for line in lines:
    +                tmp = line
    +                if not tmp.strip().startswith("E'"):
    +                    line = re.sub(pattern, '', line)
    +                res += line + '\n'
    +            full_sql = res.strip()
    +            full_sql = re.sub(pattern, '', full_sql).strip()
    +        # Re-write the cleaned-up sql to a new file. Python does not let us
    +        # erase all the content of a file and rewrite the same file again.
    +        cleaned_output_filename = output_filename+'.tmp'
    +        with open(cleaned_output_filename, 'w') as output_filehandle:
    +            _write_to_file(output_filehandle, full_sql)
    +        # Move the cleaned output file to the old one.
    +        os.rename(cleaned_output_filename, output_filename)
    +
    +def _run_m4_and_append(schema, maddir_mod_py, module, sqlfile,
    +                       output_filehandle, pre_sql=None):
    +    """
    +    Function to process a sql file with M4.
    +    """
    +    # Check if the SQL file exists
    +    if not os.path.isfile(sqlfile):
    +        error_(this, "Missing module SQL file (%s)" % sqlfile, False)
    +        raise ValueError("Missing module SQL file (%s)" % sqlfile)
     
    -def _run_sql_file(schema, maddir_mod_py, module, sqlfile,
    -                  tmpfile, logfile, pre_sql, upgrade=False,
    -                  sc=None):
    +    # Prepare the file using M4
    +    try:
    +        # Add the before SQL
    +        if pre_sql:
    +            output_filehandle.writelines([pre_sql, '\n\n'])
    +        # Find the madpack dir (platform specific or generic)
    +        if os.path.isdir(maddir + "/ports/" + portid + "/" + dbver + "/madpack"):
    +            maddir_madpack = maddir + "/ports/" + portid + "/" + dbver + "/madpack"
    +        else:
    +            maddir_madpack = maddir + "/madpack"
    +        maddir_ext_py = maddir + "/lib/python"
    +
    +        m4args = ['m4',
    +                  '-P',
    +                  '-DMADLIB_SCHEMA=' + schema,
    +                  '-DPLPYTHON_LIBDIR=' + maddir_mod_py,
    +                  '-DEXT_PYTHON_LIBDIR=' + maddir_ext_py,
    +                  '-DMODULE_PATHNAME=' + maddir_lib,
    +                  '-DMODULE_NAME=' + module,
    +                  '-I' + maddir_madpack,
    +                  sqlfile]
    +
    +        info_(this, "> ... parsing: " + " ".join(m4args), verbose)
    +        output_filehandle.flush()
    +        subprocess.call(m4args, stdout=output_filehandle)
    +    except:
    +        error_(this, "Failed executing m4 on %s" % sqlfile, False)
    +        raise Exception
    +
    +def _run_sql_file_install_check(schema, maddir_mod_py, module, sqlfile,
    --- End diff --
    
    maybe rename this as `run_install_check_sql` and also explain in comments why install check needs it's own function for running sql 


---

[GitHub] madlib issue #271: Madpack: Make install, reinstall and upgrade atomic

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/271
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/482/



---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192175486
  
    --- Diff: src/madpack/upgrade_util.py ---
    @@ -1299,18 +1303,19 @@ def _clean_function(self):
             pattern = re.compile(r"""CREATE(\s+)FUNCTION""", re.DOTALL | re.IGNORECASE)
             self._sql = re.sub(pattern, 'CREATE OR REPLACE FUNCTION', self._sql)
     
    -    def cleanup(self, sql):
    +    def cleanup(self, sql, algoname):
             """
             @brief Entry function for cleaning the sql script
             """
             self._sql = sql
    -        self._clean_comment()
    -        self._clean_type()
    -        self._clean_cast()
    -        self._clean_operator()
    -        self._clean_opclass()
    -        self._clean_aggregate()
    -        self._clean_function()
    +        if algoname not in self.get_change_handler().newmodule:
    --- End diff --
    
    This if check was originally done in `madpack.py` before `cleanup()` function was called. We moved it to this function instead. Will add the comment, but keep the if logic as is, since we will anyway have to return at the end.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191607127
  
    --- Diff: src/madpack/madpack.py ---
    @@ -824,6 +873,246 @@ def parse_arguments():
         # Get the arguments
         return parser.parse_args()
     
    +def run_install_check(args, testcase):
    +    schema = args['schema']
    --- End diff --
    
    there is a decent chunk of code logic duplicated in this function and `_db_create_objects`. Can we reuse this code logic ?


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191587596
  
    --- Diff: src/madpack/madpack.py ---
    @@ -131,10 +141,73 @@ def _get_relative_maddir(maddir, port):
             return maddir
     # ------------------------------------------------------------------------------
     
    +def _cleanup_comments_in_sqlfile(output_filename, upgrade):
    +    """
    +    @brief Remove comments in the sql script
    +    """
    +    if not upgrade:
    +        with open(output_filename, 'r+') as output_filehandle:
    +            full_sql = output_filehandle.read()
    +            pattern = re.compile(r"""(/\*(.|[\r\n])*?\*/)|(--(.*|[\r\n]))""")
    +            res = ''
    +            lines = re.split(r'[\r\n]+', full_sql)
    +            for line in lines:
    +                tmp = line
    +                if not tmp.strip().startswith("E'"):
    +                    line = re.sub(pattern, '', line)
    +                res += line + '\n'
    +            full_sql = res.strip()
    +            full_sql = re.sub(pattern, '', full_sql).strip()
    +        # Re-write the cleaned-up sql to a new file. Python does not let us
    --- End diff --
    
    can we move the new file creation and the renaming logic to a different function? This way the function will have a single responsibility of just cleaning the input.
    



---

[GitHub] madlib issue #271: Madpack: Make install, reinstall and upgrade atomic

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/271
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/496/



---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191598228
  
    --- Diff: src/madpack/upgrade_util.py ---
    @@ -1299,18 +1303,19 @@ def _clean_function(self):
             pattern = re.compile(r"""CREATE(\s+)FUNCTION""", re.DOTALL | re.IGNORECASE)
             self._sql = re.sub(pattern, 'CREATE OR REPLACE FUNCTION', self._sql)
     
    -    def cleanup(self, sql):
    +    def cleanup(self, sql, algoname):
    --- End diff --
    
    line 269 in madpack.py calls `sql = sc.cleanup(sql)`. This will fail because cleanup needs another argument. I think we can remove the code from line 263 to 270 because _run_sql_file_install_check is ony called once and `upgrade` is always false 
    ```
        if upgrade:
            # get filename from complete path without the extension
            sub_module = os.path.splitext(os.path.basename(sqlfile))[0]
            info_(this, sub_module, verbose)
            if sub_module not in sc.get_change_handler().newmodule:
                sql = open(tmpfile).read()
                sql = sc.cleanup(sql)
                open(tmpfile, 'w').write(sql)
    ```



---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191589562
  
    --- Diff: src/madpack/madpack.py ---
    @@ -238,6 +311,88 @@ def _run_sql_file(schema, maddir_mod_py, module, sqlfile,
         return retval
     # ------------------------------------------------------------------------------
     
    +def _run_sql_file(schema, sqlfile):
    --- End diff --
    
    The three functions `_run_sql_file` , `_run_sql_file_install_check` and `_run_m4_and_append` have some code logic duplicated related to running m4 and running the sql file. Can we refactor this to not have any duplicated code ?


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191601387
  
    --- Diff: src/madpack/madpack.py ---
    @@ -987,275 +1276,42 @@ def main(argv):
                 error_(this, "Missing -p/--platform parameter.", True)
             if not con_args:
                 error_(this, "Unknown problem with database connection string: %s" % con_args, True)
    +    # ---------------- Completed "Get and validate arguments" -----------------
     
         # COMMAND: version
         if args.command[0] == 'version':
             _print_revs(rev, dbrev, con_args, schema)
     
    -    # COMMAND: uninstall/reinstall
    -    if args.command[0] in ('uninstall', 'reinstall'):
    -        if get_rev_num(dbrev) == [0]:
    -            info_(this, "Nothing to uninstall. No version found in schema %s." % schema.upper(), True)
    -            return
    -
    -        # Find any potential data to lose
    -        affected_objects = _internal_run_query("""
    -            SELECT
    -                n1.nspname AS schema,
    -                relname AS relation,
    -                attname AS column,
    -                typname AS type
    -            FROM
    -                pg_attribute a,
    -                pg_class c,
    -                pg_type t,
    -                pg_namespace n,
    -                pg_namespace n1
    -            WHERE
    -                n.nspname = '%s'
    -                AND t.typnamespace = n.oid
    -                AND a.atttypid = t.oid
    -                AND c.oid = a.attrelid
    -                AND c.relnamespace = n1.oid
    -                AND c.relkind = 'r'
    -            ORDER BY
    -                n1.nspname, relname, attname, typname""" % schema.lower(), True)
    -
    -        info_(this, "*** Uninstalling MADlib ***", True)
    -        info_(this, "***********************************************************************************", True)
    -        info_(this, "* Schema %s and all database objects depending on it will be dropped!" % schema.upper(), True)
    -        if affected_objects:
    -            info_(this, "* If you continue the following data will be lost (schema : table.column : type):", True)
    -            for ao in affected_objects:
    -                info_(this, '* - ' + ao['schema'] + ' : ' + ao['relation'] + '.' +
    -                      ao['column'] + ' : ' + ao['type'], True)
    -        info_(this, "***********************************************************************************", True)
    -        info_(this, "Would you like to continue? [Y/N]", True)
    -        go = raw_input('>>> ').upper()
    -        while go != 'Y' and go != 'N':
    -            go = raw_input('Yes or No >>> ').upper()
    -
    -        # 2) Do the uninstall/drop
    -        if go == 'N':
    -            info_(this, 'No problem. Nothing dropped.', True)
    -            return
    -
    -        elif go == 'Y':
    -            info_(this, "> dropping schema %s" % schema.upper(), verbose)
    -            try:
    -                _internal_run_query("DROP SCHEMA %s CASCADE;" % (schema), True)
    -            except:
    -                error_(this, "Cannot drop schema %s." % schema.upper(), True)
    -
    -            info_(this, 'Schema %s (and all dependent objects) has been dropped.' % schema.upper(), True)
    -            info_(this, 'MADlib uninstalled successfully.', True)
    -
    -        else:
    -            return
    -
    -    # COMMAND: install/reinstall
    -    if args.command[0] in ('install', 'reinstall'):
    -        # Refresh MADlib version in DB, None for GP/PG
    -        if args.command[0] == 'reinstall':
    -            print "Setting MADlib database version to be None for reinstall"
    -            dbrev = None
    -
    -        info_(this, "*** Installing MADlib ***", True)
    -
    -        # 1) Compare OS and DB versions.
    -        # noop if OS <= DB.
    -        _print_revs(rev, dbrev, con_args, schema)
    -        if is_rev_gte(get_rev_num(dbrev), get_rev_num(rev)):
    -            info_(this, "Current MADlib version already up to date.", True)
    -            return
    -        # proceed to create objects if nothing installed in DB
    -        elif dbrev is None:
    -            pass
    -        # error and refer to upgrade if OS > DB
    -        else:
    -            error_(this, """Aborting installation: existing MADlib version detected in {0} schema
    -                    To upgrade the {0} schema to MADlib v{1} please run the following command:
    -                    madpack upgrade -s {0} -p {2} [-c ...]
    -                    """.format(schema, rev, portid), True)
    -
    -        # 2) Run installation
    -        try:
    -            _plpy_check(py_min_ver)
    -            _db_install(schema, dbrev, args.testcase)
    -        except:
    -            error_(this, "MADlib installation failed.", True)
    -
    -    # COMMAND: upgrade
    -    if args.command[0] in ('upgrade', 'update'):
    -        info_(this, "*** Upgrading MADlib ***", True)
    -        dbrev = get_madlib_dbrev(con_args, schema)
    -
    -        # 1) Check DB version. If None, nothing to upgrade.
    -        if not dbrev:
    -            info_(this, "MADlib is not installed in {schema} schema and there "
    -                  "is nothing to upgrade. Please use install "
    -                  "instead.".format(schema=schema.upper()),
    -                  True)
    -            return
    -
    -        # 2) Compare OS and DB versions. Continue if OS > DB.
    -        _print_revs(rev, dbrev, con_args, schema)
    -        if is_rev_gte(get_rev_num(dbrev), get_rev_num(rev)):
    -            info_(this, "Current MADlib version is already up-to-date.", True)
    -            return
    -
    -        if float('.'.join(dbrev.split('.')[0:2])) < 1.0:
    -            info_(this, "The version gap is too large, upgrade is supported only for "
    -                  "packages greater than or equal to v1.0.", True)
    -            return
    -
    -        # 3) Run upgrade
    -        try:
    -            _plpy_check(py_min_ver)
    -            _db_upgrade(schema, dbrev)
    -        except Exception as e:
    -            # Uncomment the following lines when debugging
    -            print "Exception: " + str(e)
    -            print sys.exc_info()
    -            traceback.print_tb(sys.exc_info()[2])
    -            error_(this, "MADlib upgrade failed.", True)
    -
         # COMMAND: install-check
         if args.command[0] == 'install-check':
    -
    -        # 1) Compare OS and DB versions. Continue if OS = DB.
    -        if get_rev_num(dbrev) != get_rev_num(rev):
    -            _print_revs(rev, dbrev, con_args, schema)
    -            info_(this, "Versions do not match. Install-check stopped.", True)
    -            return
    -
    -        # Create install-check user
    -        test_user = ('madlib_' +
    -                     rev.replace('.', '').replace('-', '_') +
    -                     '_installcheck')
    +        run_install_check(locals(), args.testcase)
    +    else:
             try:
    -            _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), False)
    +            is_schema_in_db = _internal_run_query("SELECT schema_name FROM information_schema.schemata WHERE schema_name='%s';" % schema, True)
             except:
    -            _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    -            _internal_run_query("DROP USER IF EXISTS %s;" % (test_user), True)
    -        _internal_run_query("CREATE USER %s;" % (test_user), True)
    -
    -        _internal_run_query("GRANT USAGE ON SCHEMA %s TO %s;" % (schema, test_user), True)
    -
    -        # 2) Run test SQLs
    -        info_(this, "> Running test scripts for:", verbose)
    -
    -        caseset = (set([test.strip() for test in args.testcase.split(',')])
    -                   if args.testcase != "" else set())
    -
    -        modset = {}
    -        for case in caseset:
    -            if case.find('/') > -1:
    -                [mod, algo] = case.split('/')
    -                if mod not in modset:
    -                    modset[mod] = []
    -                if algo not in modset[mod]:
    -                    modset[mod].append(algo)
    -            else:
    -                modset[case] = []
    -
    -        # Loop through all modules
    -        for moduleinfo in portspecs['modules']:
    -
    -            # Get module name
    -            module = moduleinfo['name']
    -
    -            # Skip if doesn't meet specified modules
    -            if modset is not None and len(modset) > 0 and module not in modset:
    -                continue
    -            # JIRA: MADLIB-1078 fix
    -            # Skip pmml during install-check (when run without the -t option).
    -            # We can still run install-check on pmml with '-t' option.
    -            if not modset and module in ['pmml']:
    -                continue
    -            info_(this, "> - %s" % module, verbose)
    -
    -            # Make a temp dir for this module (if doesn't exist)
    -            cur_tmpdir = tmpdir + '/' + module + '/test'  # tmpdir is a global variable
    -            _make_dir(cur_tmpdir)
    -
    -            # Find the Python module dir (platform specific or generic)
    -            if os.path.isdir(maddir + "/ports/" + portid + "/" + dbver + "/modules/" + module):
    -                maddir_mod_py = maddir + "/ports/" + portid + "/" + dbver + "/modules"
    +            error_(this, "Cannot validate if schema already exists.", True)
    +
    +        output_filename = tmpdir + "/madlib_{0}.sql".format(args.command[0])
    +        upgrade = False
    +        return_val = create_install_madlib_sqlfile(locals(), args.command[0], args.testcase)
    +        if return_val == 0:
    +            info_(this, "Installing MADlib modules...", True)
    +            _cleanup_comments_in_sqlfile(output_filename, upgrade)
    +            result = _run_sql_file(schema, output_filename)
    +
    +            if result == 'FAIL':
    +                info_(this, "MADlib {0} unsuccessful.".format(args.command[0]), True)
                 else:
    -                maddir_mod_py = maddir + "/modules"
    -
    -            # Find the SQL module dir (platform specific or generic)
    -            if os.path.isdir(maddir + "/ports/" + portid + "/modules/" + module):
    -                maddir_mod_sql = maddir + "/ports/" + portid + "/modules"
    -            else:
    -                maddir_mod_sql = maddir + "/modules"
    -
    -            # Prepare test schema
    -            test_schema = "madlib_installcheck_%s" % (module)
    -            _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE; CREATE SCHEMA %s;" %
    -                                (test_schema, test_schema), True)
    -            _internal_run_query("GRANT ALL ON SCHEMA %s TO %s;" %
    -                                (test_schema, test_user), True)
    -
    -            # Switch to test user and prepare the search_path
    -            pre_sql = '-- Switch to test user:\n' \
    -                      'SET ROLE %s;\n' \
    -                      '-- Set SEARCH_PATH for install-check:\n' \
    -                      'SET search_path=%s,%s;\n' \
    -                      % (test_user, test_schema, schema)
    -
    -            # Loop through all test SQL files for this module
    -            sql_files = maddir_mod_sql + '/' + module + '/test/*.sql_in'
    -            for sqlfile in sorted(glob.glob(sql_files), reverse=True):
    -                algoname = os.path.basename(sqlfile).split('.')[0]
    -                # run only algo specified
    -                if (module in modset and modset[module] and
    -                        algoname not in modset[module]):
    -                    continue
    -
    -                # Set file names
    -                tmpfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.tmp'
    -                logfile = cur_tmpdir + '/' + os.path.basename(sqlfile) + '.log'
    -
    -                # If there is no problem with the SQL file
    -                milliseconds = 0
    -
    -                # Run the SQL
    -                run_start = datetime.datetime.now()
    -                retval = _run_sql_file(schema, maddir_mod_py, module,
    -                                       sqlfile, tmpfile, logfile, pre_sql)
    -                # Runtime evaluation
    -                run_end = datetime.datetime.now()
    -                milliseconds = round((run_end - run_start).seconds * 1000 +
    -                                     (run_end - run_start).microseconds / 1000)
    -
    -                # Check the exit status
    -                if retval != 0:
    -                    result = 'FAIL'
    -                    keeplogs = True
    -                # Since every single statement in the test file gets logged,
    -                # an empty log file indicates an empty or a failed test
    -                elif os.path.isfile(logfile) and os.path.getsize(logfile) > 0:
    -                    result = 'PASS'
    -                # Otherwise
    -                else:
    -                    result = 'ERROR'
    -
    -                # Output result
    -                print "TEST CASE RESULT|Module: " + module + \
    -                    "|" + os.path.basename(sqlfile) + "|" + result + \
    -                    "|Time: %d milliseconds" % (milliseconds)
    -
    -                if result == 'FAIL':
    -                    error_(this, "Failed executing %s" % tmpfile, False)
    -                    error_(this, "Check the log at %s" % logfile, False)
    -            # Cleanup test schema for the module
    -            _internal_run_query("DROP SCHEMA IF EXISTS %s CASCADE;" % (test_schema), True)
    -
    -        # Drop install-check user
    -        _internal_run_query("DROP OWNED BY %s CASCADE;" % (test_user), True)
    -        _internal_run_query("DROP USER %s;" % (test_user), True)
    +                if args.command[0] != 'uninstall':
    +                    if args.command[0] == 'upgrade':
    +                        info_(this, "MADlib %s upgraded successfully in %s schema." % (str(rev), schema.upper()), True)
    +                    else:
    +                        info_(this, "> Created %s schema" % schema.upper(), True)
    --- End diff --
    
    why do we need to print `schema.upper()` ? This comment applies to all the other places where we print `schem.upper()` instead of just `schema` 


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192184979
  
    --- Diff: src/madpack/madpack.py ---
    @@ -131,10 +141,73 @@ def _get_relative_maddir(maddir, port):
             return maddir
     # ------------------------------------------------------------------------------
     
    +def _cleanup_comments_in_sqlfile(output_filename, upgrade):
    --- End diff --
    
    We refactored the code, and moved the actual regex related stuff to `madpack/utilities.py`.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191588071
  
    --- Diff: src/madpack/madpack.py ---
    @@ -131,10 +141,73 @@ def _get_relative_maddir(maddir, port):
             return maddir
     # ------------------------------------------------------------------------------
     
    +def _cleanup_comments_in_sqlfile(output_filename, upgrade):
    +    """
    +    @brief Remove comments in the sql script
    +    """
    +    if not upgrade:
    +        with open(output_filename, 'r+') as output_filehandle:
    +            full_sql = output_filehandle.read()
    +            pattern = re.compile(r"""(/\*(.|[\r\n])*?\*/)|(--(.*|[\r\n]))""")
    +            res = ''
    +            lines = re.split(r'[\r\n]+', full_sql)
    +            for line in lines:
    +                tmp = line
    +                if not tmp.strip().startswith("E'"):
    +                    line = re.sub(pattern, '', line)
    +                res += line + '\n'
    +            full_sql = res.strip()
    +            full_sql = re.sub(pattern, '', full_sql).strip()
    +        # Re-write the cleaned-up sql to a new file. Python does not let us
    +        # erase all the content of a file and rewrite the same file again.
    +        cleaned_output_filename = output_filename+'.tmp'
    +        with open(cleaned_output_filename, 'w') as output_filehandle:
    +            _write_to_file(output_filehandle, full_sql)
    +        # Move the cleaned output file to the old one.
    +        os.rename(cleaned_output_filename, output_filename)
    +
    +def _run_m4_and_append(schema, maddir_mod_py, module, sqlfile,
    +                       output_filehandle, pre_sql=None):
    +    """
    +    Function to process a sql file with M4.
    +    """
    +    # Check if the SQL file exists
    +    if not os.path.isfile(sqlfile):
    +        error_(this, "Missing module SQL file (%s)" % sqlfile, False)
    --- End diff --
    
    why do we need to call `error_`, isn't `ValueError` enough ?


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191603261
  
    --- Diff: src/madpack/madpack.py ---
    @@ -131,10 +141,73 @@ def _get_relative_maddir(maddir, port):
             return maddir
     # ------------------------------------------------------------------------------
     
    +def _cleanup_comments_in_sqlfile(output_filename, upgrade):
    +    """
    +    @brief Remove comments in the sql script
    +    """
    +    if not upgrade:
    +        with open(output_filename, 'r+') as output_filehandle:
    +            full_sql = output_filehandle.read()
    +            pattern = re.compile(r"""(/\*(.|[\r\n])*?\*/)|(--(.*|[\r\n]))""")
    +            res = ''
    +            lines = re.split(r'[\r\n]+', full_sql)
    +            for line in lines:
    +                tmp = line
    +                if not tmp.strip().startswith("E'"):
    +                    line = re.sub(pattern, '', line)
    +                res += line + '\n'
    +            full_sql = res.strip()
    +            full_sql = re.sub(pattern, '', full_sql).strip()
    +        # Re-write the cleaned-up sql to a new file. Python does not let us
    +        # erase all the content of a file and rewrite the same file again.
    +        cleaned_output_filename = output_filename+'.tmp'
    +        with open(cleaned_output_filename, 'w') as output_filehandle:
    +            _write_to_file(output_filehandle, full_sql)
    +        # Move the cleaned output file to the old one.
    +        os.rename(cleaned_output_filename, output_filename)
    +
    +def _run_m4_and_append(schema, maddir_mod_py, module, sqlfile,
    +                       output_filehandle, pre_sql=None):
    +    """
    +    Function to process a sql file with M4.
    +    """
    +    # Check if the SQL file exists
    +    if not os.path.isfile(sqlfile):
    +        error_(this, "Missing module SQL file (%s)" % sqlfile, False)
    +        raise ValueError("Missing module SQL file (%s)" % sqlfile)
     
    -def _run_sql_file(schema, maddir_mod_py, module, sqlfile,
    -                  tmpfile, logfile, pre_sql, upgrade=False,
    -                  sc=None):
    +    # Prepare the file using M4
    +    try:
    +        # Add the before SQL
    +        if pre_sql:
    +            output_filehandle.writelines([pre_sql, '\n\n'])
    +        # Find the madpack dir (platform specific or generic)
    +        if os.path.isdir(maddir + "/ports/" + portid + "/" + dbver + "/madpack"):
    +            maddir_madpack = maddir + "/ports/" + portid + "/" + dbver + "/madpack"
    +        else:
    +            maddir_madpack = maddir + "/madpack"
    +        maddir_ext_py = maddir + "/lib/python"
    +
    +        m4args = ['m4',
    +                  '-P',
    +                  '-DMADLIB_SCHEMA=' + schema,
    +                  '-DPLPYTHON_LIBDIR=' + maddir_mod_py,
    +                  '-DEXT_PYTHON_LIBDIR=' + maddir_ext_py,
    +                  '-DMODULE_PATHNAME=' + maddir_lib,
    +                  '-DMODULE_NAME=' + module,
    +                  '-I' + maddir_madpack,
    +                  sqlfile]
    +
    +        info_(this, "> ... parsing: " + " ".join(m4args), verbose)
    +        output_filehandle.flush()
    +        subprocess.call(m4args, stdout=output_filehandle)
    +    except:
    +        error_(this, "Failed executing m4 on %s" % sqlfile, False)
    +        raise Exception
    +
    +def _run_sql_file_install_check(schema, maddir_mod_py, module, sqlfile,
    +                                tmpfile, logfile, pre_sql, upgrade=False,
    +                                sc=None):
    --- End diff --
    
    we don't really need the two optional params since `_run_sql_file_install_check` is only called once.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191581891
  
    --- Diff: src/madpack/madpack.py ---
    @@ -95,6 +95,16 @@ def _internal_run_query(sql, show_error):
         return run_query(sql, con_args, show_error)
     # ------------------------------------------------------------------------------
     
    +def _write_to_file(handle, sql, show_error=False):
    --- End diff --
    
    `show_error `arg is never used


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192186350
  
    --- Diff: src/madpack/madpack.py ---
    @@ -131,10 +141,73 @@ def _get_relative_maddir(maddir, port):
             return maddir
     # ------------------------------------------------------------------------------
     
    +def _cleanup_comments_in_sqlfile(output_filename, upgrade):
    +    """
    +    @brief Remove comments in the sql script
    +    """
    +    if not upgrade:
    +        with open(output_filename, 'r+') as output_filehandle:
    +            full_sql = output_filehandle.read()
    +            pattern = re.compile(r"""(/\*(.|[\r\n])*?\*/)|(--(.*|[\r\n]))""")
    +            res = ''
    +            lines = re.split(r'[\r\n]+', full_sql)
    +            for line in lines:
    +                tmp = line
    +                if not tmp.strip().startswith("E'"):
    +                    line = re.sub(pattern, '', line)
    +                res += line + '\n'
    +            full_sql = res.strip()
    +            full_sql = re.sub(pattern, '', full_sql).strip()
    +        # Re-write the cleaned-up sql to a new file. Python does not let us
    +        # erase all the content of a file and rewrite the same file again.
    +        cleaned_output_filename = output_filename+'.tmp'
    +        with open(cleaned_output_filename, 'w') as output_filehandle:
    +            _write_to_file(output_filehandle, full_sql)
    +        # Move the cleaned output file to the old one.
    +        os.rename(cleaned_output_filename, output_filename)
    +
    +def _run_m4_and_append(schema, maddir_mod_py, module, sqlfile,
    +                       output_filehandle, pre_sql=None):
    +    """
    +    Function to process a sql file with M4.
    +    """
    +    # Check if the SQL file exists
    +    if not os.path.isfile(sqlfile):
    +        error_(this, "Missing module SQL file (%s)" % sqlfile, False)
    --- End diff --
    
    We removed the error message from `ValueError`, while we still print the message from `error_()`.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191586806
  
    --- Diff: src/madpack/madpack.py ---
    @@ -131,10 +141,73 @@ def _get_relative_maddir(maddir, port):
             return maddir
     # ------------------------------------------------------------------------------
     
    +def _cleanup_comments_in_sqlfile(output_filename, upgrade):
    --- End diff --
    
    This function does more than just cleaning up the comments. It also creates a new file with the cleaned contents. Maybe rename the function to reflect this ?
    



---

[GitHub] madlib issue #271: Madpack: Make install, reinstall and upgrade atomic

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on the issue:

    https://github.com/apache/madlib/pull/271
  
    A few other observations from madpack output
    
    1. While uninstalling , this message is printed on the console’
    `madpack.py: INFO : Installing MADlib modules` We should instead say that we are uninstalling modules.
    
    1. In case of failure, the output looks something like `MADlib upgrade unsuccessful.` It would be nice to also say that everything was rolled back.
    
    1. I also noticed that the output messages have changed a bit for install/uninstall/reinstall and upgrade
    
    old output 
    ```
    madpack.py: INFO : Installing MADlib into MADLIB schema...
    madpack.py: INFO : > Creating MADLIB schema
    madpack.py: INFO : > Creating MADLIB.MigrationHistory table
    madpack.py: INFO : > Writing version info in MigrationHistory table
    madpack.py: INFO : > Creating objects for modules:
    madpack.py: INFO : > - array_ops
    madpack.py: INFO : > - bayes
    ...
    ...
    madpack.py: INFO : > - validation
    madpack.py: INFO : MADlib 1.13 installed successfully in MADLIB schema.
    ``` 
    
    new output
    ```
    madpack.py: INFO : Testing PL/Python environment...
    madpack.py: INFO : > Creating language PL/Python...
    madpack.py: INFO : > PL/Python environment OK (version: 2.7.14)
    madpack.py: INFO : > Preparing objects for modules:
    madpack.py: INFO : > - array_ops
    madpack.py: INFO : > - bayes
    ...
    ...
    madpack.py: INFO : > - validation
    madpack.py: INFO : Installing MADlib modules...
    madpack.py: INFO : > Created MADLIB schema
    madpack.py: INFO : > Created MADLIB.MigrationHistory table
    madpack.py: INFO : > Wrote version info in MigrationHistory table
    madpack.py: INFO : MADlib 1.15-dev installed successfully in MADLIB schema.
    ```
    I think the previous message was better because all the modules were printed after `Installing MADlib into MADLIB schema` instead of after `Preparing objects for modules`. And with the new output`Installing MADlib modules...` is followed by `Created madlib schema` which looks a bit weird.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191609335
  
    --- Diff: src/madpack/madpack.py ---
    @@ -537,9 +629,8 @@ def _db_upgrade(schema, dbrev):
         ch.drop_changed_udf()
         ch.drop_changed_udt()  # assume dependent udf for udt does not change
         ch.drop_traininginfo_4dt()  # used types: oid, text, integer, float
    -    _db_create_objects(schema, None, True, sc)
    -
    -    info_(this, "MADlib %s upgraded successfully in %s schema." % (str(rev), schema.upper()), True)
    +    _db_create_objects(schema, filehandle, True, sc)
    --- End diff --
    
    can we rewrite this as 
    ```python
    _db_create_objects(schema, filehandle, upgrade=True, sc=sc)
    ```


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by njayaram2 <gi...@git.apache.org>.
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r192178673
  
    --- Diff: src/madpack/utilities.py ---
    @@ -33,6 +33,23 @@
     this = os.path.basename(sys.argv[0])    # name of this script
     
     
    +class AtomicFileOpen:
    --- End diff --
    
    It's not used, will remove it.


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191580904
  
    --- Diff: src/madpack/madpack.py ---
    @@ -95,6 +95,16 @@ def _internal_run_query(sql, show_error):
         return run_query(sql, con_args, show_error)
     # ------------------------------------------------------------------------------
     
    +def _write_to_file(handle, sql, show_error=False):
    --- End diff --
    
    There's another `_write_to_file` function in upgrade_util.py, can we delete one of them ?


---

[GitHub] madlib pull request #271: Madpack: Make install, reinstall and upgrade atomi...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/271#discussion_r191604722
  
    --- Diff: src/madpack/madpack.py ---
    @@ -95,6 +95,16 @@ def _internal_run_query(sql, show_error):
         return run_query(sql, con_args, show_error)
     # ------------------------------------------------------------------------------
     
    +def _write_to_file(handle, sql, show_error=False):
    +    handle.write(sql)
    +    handle.write('\n')
    +# ------------------------------------------------------------------------------
    +
    +
    +def _merge_to_file(handle, other_file, show_error=False):
    --- End diff --
    
    no one calls the  `_merge_to_file` function


---