You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@madlib.apache.org by orhankislal <gi...@git.apache.org> on 2018/09/04 12:52:29 UTC

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

GitHub user orhankislal opened a pull request:

    https://github.com/apache/madlib/pull/318

    Madpack: Add a script for automating changelist creation

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib madpack/auto-changelist

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/318.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #318
    
----
commit 28528b21ecb9b53d03de7683ff7e8db2bb409675
Author: Orhan Kislal <ok...@...>
Date:   2018-09-04T12:50:50Z

    Madpack: Add a script for automating changelist creation

----


---

[GitHub] madlib issue #318: Madpack: Add a script for automating changelist creation

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/318
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/679/



---

[GitHub] madlib issue #318: Madpack: Add a script for automating changelist creation

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/318
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/681/



---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r217842978
  
    --- Diff: src/madpack/create_changelist.py ---
    @@ -0,0 +1,239 @@
    +#!/usr/bin/python
    +# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +
    +# Create changelist for any two branches/tags
    +
    +# Prequisites:
    +# The old version has to be installed in the "madlib_old_vers" schema
    +# The new version has to be installed in the "madlib" (default) schema
    +# Two branches/tags must exist locally (run 'git fetch' to ensure you have the latest version)
    +# The current branch does not matter
    +
    +# Usage (must be executed in the src/madpack directory):
    +# python create_changelist.py <database name> <old version branch> <new version branch> <changelist filename>
    +# If you are using the master branch, plase make sure to edit the branch/tag in the output file
    +
    +# Example (should be equivalent to changelist_1.13_1.14.yaml):
    +# python create_changelist.py madlib rel/v1.13 rel/v1.14 chtest1.yaml
    +
    +import sys
    +import os
    +
    +database = sys.argv[1]
    +old_vers = sys.argv[2]
    +new_vers = sys.argv[3]
    +ch_filename = sys.argv[4]
    +
    +if os.path.exists(ch_filename):
    +    print "{0} already exists".format(ch_filename)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -l > /dev/null""".format(database))
    +if err1 != 0:
    +    print "Database {0} does not exist".format(database)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -c "select madlib_old_vers.version()" > /dev/null
    +                 """.format(database))
    +if err1 != 0:
    +    print "MADlib is not installed in the madlib_old_vers schema. Please refer to the Prequisites."
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -c "select madlib.version()" > /dev/null
    +                 """.format(database))
    +if err1 != 0:
    +    print "MADlib is not installed in the madlib schema. Please refer to the Prequisites."
    +    raise SystemExit
    +
    +print "Creating changelist {0}".format(ch_filename)
    +os.system("rm -f /tmp/madlib_tmp_nm.txt /tmp/madlib_tmp_udf.txt /tmp/madlib_tmp_udt.txt")
    +try:
    +    # Find the new modules using the git diff
    +    err1 = os.system("git diff {old_vers} {new_vers} --name-only --diff-filter=A > /tmp/madlib_tmp_nm.txt".format(**locals()))
    +    if err1 != 0:
    +        print "Git diff failed. Please ensure that branches/tags are fetched."
    +        raise SystemExit
    +
    +    f = open("/tmp/madlib_tmp_cl.yaml", "w")
    +    f.write(
    +"""# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +""")
    +
    +    f.write(
    +    """
    +# Changelist for MADlib version {old_vers} to {new_vers}
    +
    +# This file contains all changes that were introduced in a new version of
    +# MADlib. This changelist is used by the upgrade script to detect what objects
    +# should be upgraded (while retaining all other objects from the previous version)
    +
    +# New modules (actually .sql_in files) added in upgrade version
    +# For these files the sql_in code is retained as is with the functions in the
    +# file installed on the upgrade version. All other files (that don't have
    +# updates), are cleaned up to remove object replacements
    +""".format(**locals()))
    +
    +    # Find the new .sql_in files that are not in test folders
    +    f.write("new module:\n")
    +    with open('/tmp/madlib_tmp_nm.txt') as fp:
    +        for line in fp:
    +            if 'sql_in' in line and '/test/' not in line:
    +                 f.write('    ' + line.split('/')[5].split('.')[0]+':\n')
    +
    +    # Find the changed types and keep a list for future use
    +    os.system("psql {0} -f diff_udt.sql > /tmp/madlib_tmp_udt.txt".format(database))
    +
    +    f.write("\n# Changes in the types (UDT) including removal and modification\n")
    +    f.write("udt:\n")
    +    udt_list=[]
    +    with open('/tmp/madlib_tmp_udt.txt') as fp:
    +        for line in fp:
    +           if 'UDT' in line and '[]' not in line:
    +                ch_type = line.split('|')[1].strip()
    +                udt_list.append(ch_type)
    +                f.write('    ' + ch_type +":\n")
    +
    +    # Find the list of UDFs and UDAs
    +    # There are two main sources for these lists.
    +    # 1. The functions that actually got changed
    +    # 2. The functions that depend on a changed type
    +
    +    # We will keep two lists (for UDF and UDA) and fill them as we parse the
    +    # output of diff functions
    +
    +    udf_list=[]
    +    uda_list=[]
    +    current_list = udf_list
    --- End diff --
    
    Do we really need the current_list variable? The code will look a bit simpler if we just append to the right list whenever necessary. 


---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r217842739
  
    --- Diff: src/madpack/create_changelist.py ---
    @@ -0,0 +1,239 @@
    +#!/usr/bin/python
    +# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +
    +# Create changelist for any two branches/tags
    +
    +# Prequisites:
    +# The old version has to be installed in the "madlib_old_vers" schema
    +# The new version has to be installed in the "madlib" (default) schema
    +# Two branches/tags must exist locally (run 'git fetch' to ensure you have the latest version)
    +# The current branch does not matter
    +
    +# Usage (must be executed in the src/madpack directory):
    +# python create_changelist.py <database name> <old version branch> <new version branch> <changelist filename>
    +# If you are using the master branch, plase make sure to edit the branch/tag in the output file
    +
    +# Example (should be equivalent to changelist_1.13_1.14.yaml):
    +# python create_changelist.py madlib rel/v1.13 rel/v1.14 chtest1.yaml
    +
    +import sys
    +import os
    +
    +database = sys.argv[1]
    +old_vers = sys.argv[2]
    +new_vers = sys.argv[3]
    +ch_filename = sys.argv[4]
    +
    +if os.path.exists(ch_filename):
    +    print "{0} already exists".format(ch_filename)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -l > /dev/null""".format(database))
    +if err1 != 0:
    +    print "Database {0} does not exist".format(database)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -c "select madlib_old_vers.version()" > /dev/null
    +                 """.format(database))
    +if err1 != 0:
    +    print "MADlib is not installed in the madlib_old_vers schema. Please refer to the Prequisites."
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -c "select madlib.version()" > /dev/null
    +                 """.format(database))
    +if err1 != 0:
    +    print "MADlib is not installed in the madlib schema. Please refer to the Prequisites."
    +    raise SystemExit
    +
    +print "Creating changelist {0}".format(ch_filename)
    +os.system("rm -f /tmp/madlib_tmp_nm.txt /tmp/madlib_tmp_udf.txt /tmp/madlib_tmp_udt.txt")
    +try:
    +    # Find the new modules using the git diff
    +    err1 = os.system("git diff {old_vers} {new_vers} --name-only --diff-filter=A > /tmp/madlib_tmp_nm.txt".format(**locals()))
    +    if err1 != 0:
    +        print "Git diff failed. Please ensure that branches/tags are fetched."
    +        raise SystemExit
    +
    +    f = open("/tmp/madlib_tmp_cl.yaml", "w")
    +    f.write(
    +"""# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +""")
    +
    +    f.write(
    +    """
    +# Changelist for MADlib version {old_vers} to {new_vers}
    +
    +# This file contains all changes that were introduced in a new version of
    +# MADlib. This changelist is used by the upgrade script to detect what objects
    +# should be upgraded (while retaining all other objects from the previous version)
    +
    +# New modules (actually .sql_in files) added in upgrade version
    +# For these files the sql_in code is retained as is with the functions in the
    +# file installed on the upgrade version. All other files (that don't have
    +# updates), are cleaned up to remove object replacements
    +""".format(**locals()))
    +
    +    # Find the new .sql_in files that are not in test folders
    +    f.write("new module:\n")
    +    with open('/tmp/madlib_tmp_nm.txt') as fp:
    +        for line in fp:
    +            if 'sql_in' in line and '/test/' not in line:
    +                 f.write('    ' + line.split('/')[5].split('.')[0]+':\n')
    +
    +    # Find the changed types and keep a list for future use
    +    os.system("psql {0} -f diff_udt.sql > /tmp/madlib_tmp_udt.txt".format(database))
    +
    +    f.write("\n# Changes in the types (UDT) including removal and modification\n")
    +    f.write("udt:\n")
    +    udt_list=[]
    +    with open('/tmp/madlib_tmp_udt.txt') as fp:
    +        for line in fp:
    +           if 'UDT' in line and '[]' not in line:
    --- End diff --
    
    Can we add a comment explaining the condition in this if check ?


---

[GitHub] madlib issue #318: Madpack: Add a script for automating changelist creation

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/318
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/668/



---

[GitHub] madlib issue #318: Madpack: Add a script for automating changelist creation

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on the issue:

    https://github.com/apache/madlib/pull/318
  
    I ran the utility with 1.13 and 1.14 as the release tags and noticed that the output file created by this utility isn't the same as the one checked in `changelist_1.13_1.14.yaml`. For ex, all the summary functions are missing from the udf section. Also the udt section has an extra entry `mlp_step_result[]:`


---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by orhankislal <gi...@git.apache.org>.
Github user orhankislal commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r216566555
  
    --- Diff: src/madpack/create_changelist.py ---
    @@ -0,0 +1,132 @@
    +#!/usr/bin/python
    +# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +
    +# Create changelist for any two branches
    +
    +# Prequisites:
    +# The old version has to be installed in the "madlib_old_vers" schema
    +# The new version has to be installed in the "madlib" (default) schema
    +# Two branches must exist locally (run 'git fetch' to ensure you have the latest version)
    --- End diff --
    
    They can be branches as well as tags.


---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r216443032
  
    --- Diff: src/madpack/create_changelist.py ---
    @@ -0,0 +1,132 @@
    +#!/usr/bin/python
    +# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +
    +# Create changelist for any two branches
    +
    +# Prequisites:
    +# The old version has to be installed in the "madlib_old_vers" schema
    --- End diff --
    
    Consider adding a sanity check to test if both the schemas exist. The utility does print an error message but still creates a changelist file which may be misleading. This is true for any failure. Ideally we shouldn't create the changelist file if there was a failure. 
    



---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by orhankislal <gi...@git.apache.org>.
Github user orhankislal commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r217361584
  
    --- Diff: src/madpack/create_changelist.py ---
    @@ -0,0 +1,229 @@
    +#!/usr/bin/python
    +# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +
    +# Create changelist for any two branches/tags
    +
    +# Prequisites:
    +# The old version has to be installed in the "madlib_old_vers" schema
    +# The new version has to be installed in the "madlib" (default) schema
    +# Two branches/tags must exist locally (run 'git fetch' to ensure you have the latest version)
    +# The current branch does not matter
    +
    +# Usage (must be executed in the src/madpack directory):
    +# python create_changelist.py <database name> <old version branch> <new version branch> <changelist filename>
    +
    +# Example (should be equivalent to changelist_1.13_1.14.yaml):
    +# python create_changelist.py madlib rel/v1.13 rel/v1.14 chtest1.yaml
    +
    +import sys
    +import os
    +
    +database = sys.argv[1]
    +old_vers = sys.argv[2]
    +new_vers = sys.argv[3]
    +ch_filename = sys.argv[4]
    +
    +if os.path.exists(ch_filename):
    +    print "{0} already exists".format(ch_filename)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -l > /dev/null""".format(database))
    +if err1 != 0:
    +    print "Database {0} does not exist".format(old_vers)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -c "select madlib_old_vers.version()" > /dev/null
    +                 """.format(database))
    +if err1 != 0:
    +    print "Schema {0} does not exist".format(old_vers)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -c "select madlib.version()" > /dev/null
    +                 """.format(database))
    +if err1 != 0:
    +    print "Schema {0} does not exist".format(new_vers)
    +    raise SystemExit
    +
    --- End diff --
    
    That would be tricky for branches. `madlib.version()` gives a <tag-commit count-commit tag> on a branch (and just a tag on tags). We can use `git rev-parse --short HEAD` to get the commit tag but it seems complicating the code for handholding the developer that uses it. 


---

[GitHub] madlib issue #318: Madpack: Add a script for automating changelist creation

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/318
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/673/



---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r217143101
  
    --- Diff: src/madpack/create_changelist.py ---
    @@ -0,0 +1,229 @@
    +#!/usr/bin/python
    +# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +
    +# Create changelist for any two branches/tags
    +
    +# Prequisites:
    +# The old version has to be installed in the "madlib_old_vers" schema
    +# The new version has to be installed in the "madlib" (default) schema
    +# Two branches/tags must exist locally (run 'git fetch' to ensure you have the latest version)
    +# The current branch does not matter
    +
    +# Usage (must be executed in the src/madpack directory):
    +# python create_changelist.py <database name> <old version branch> <new version branch> <changelist filename>
    +
    +# Example (should be equivalent to changelist_1.13_1.14.yaml):
    +# python create_changelist.py madlib rel/v1.13 rel/v1.14 chtest1.yaml
    +
    +import sys
    +import os
    +
    +database = sys.argv[1]
    +old_vers = sys.argv[2]
    +new_vers = sys.argv[3]
    +ch_filename = sys.argv[4]
    +
    +if os.path.exists(ch_filename):
    +    print "{0} already exists".format(ch_filename)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -l > /dev/null""".format(database))
    +if err1 != 0:
    +    print "Database {0} does not exist".format(old_vers)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -c "select madlib_old_vers.version()" > /dev/null
    +                 """.format(database))
    +if err1 != 0:
    +    print "Schema {0} does not exist".format(old_vers)
    --- End diff --
    
    This error message prints the branch name instead of the schema name. Same with the previous and next error checks. Maybe we should change the var name to be more descriptive


---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r216443402
  
    --- Diff: src/madpack/create_changelist.py ---
    @@ -0,0 +1,132 @@
    +#!/usr/bin/python
    +# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +
    +# Create changelist for any two branches
    +
    +# Prequisites:
    +# The old version has to be installed in the "madlib_old_vers" schema
    +# The new version has to be installed in the "madlib" (default) schema
    +# Two branches must exist locally (run 'git fetch' to ensure you have the latest version)
    --- End diff --
    
    Can we expand on this comment ? Also mention that the old/new version branches should be the release tags and not branches.
    



---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/madlib/pull/318


---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r216438294
  
    --- Diff: src/madpack/diff_udf.sql ---
    @@ -142,9 +142,12 @@ DROP TABLE IF EXISTS functions_madlib_new_version;
     SELECT get_functions('madlib_old_vers');
     
     SELECT
    +    type,
         --'\t-' || name || ':' || '\n\t\t-rettype: ' || retype || '\n\t\t-argument: ' || argtypes
    -    '    - ' || name || ':' || '\n        rettype: ' || retype || '\n        argument: ' || argtypes AS "Dropped UDFs"
    -    , type
    +    '    - ' || name || ':' AS "Dropped UDF part1",
    --- End diff --
    
    I think it would be helpful to add comments in the commit description explaining the reasoning behind modifying the `diff_udf.sql` and `diff_udt.sql`. Just looking at these changes, it's hard to know why for ex `rettype` was removed and why we need `part1` at the end


---

[GitHub] madlib issue #318: Madpack: Add a script for automating changelist creation

Posted by orhankislal <gi...@git.apache.org>.
Github user orhankislal commented on the issue:

    https://github.com/apache/madlib/pull/318
  
    @kaknikhil I checked the 1.11 -> 1.12 scenario. The tool is missing the`tree_train` and `forest_train` entries. The change seems to be the removal of `surrogate_params` and the addition of `null_handling_params`. Since both of them are `text` type, it does not get picked up by the `diff_udf.sql` script. 


---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r217152555
  
    --- Diff: src/madpack/create_changelist.py ---
    @@ -0,0 +1,229 @@
    +#!/usr/bin/python
    +# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +
    +# Create changelist for any two branches/tags
    +
    +# Prequisites:
    +# The old version has to be installed in the "madlib_old_vers" schema
    +# The new version has to be installed in the "madlib" (default) schema
    +# Two branches/tags must exist locally (run 'git fetch' to ensure you have the latest version)
    +# The current branch does not matter
    +
    +# Usage (must be executed in the src/madpack directory):
    +# python create_changelist.py <database name> <old version branch> <new version branch> <changelist filename>
    +
    +# Example (should be equivalent to changelist_1.13_1.14.yaml):
    +# python create_changelist.py madlib rel/v1.13 rel/v1.14 chtest1.yaml
    +
    +import sys
    +import os
    +
    +database = sys.argv[1]
    +old_vers = sys.argv[2]
    +new_vers = sys.argv[3]
    +ch_filename = sys.argv[4]
    +
    +if os.path.exists(ch_filename):
    +    print "{0} already exists".format(ch_filename)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -l > /dev/null""".format(database))
    +if err1 != 0:
    +    print "Database {0} does not exist".format(old_vers)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -c "select madlib_old_vers.version()" > /dev/null
    +                 """.format(database))
    +if err1 != 0:
    +    print "Schema {0} does not exist".format(old_vers)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -c "select madlib.version()" > /dev/null
    +                 """.format(database))
    +if err1 != 0:
    +    print "Schema {0} does not exist".format(new_vers)
    +    raise SystemExit
    +
    --- End diff --
    
    Can we add another error check to ensure that the `madlib_old_vers.schema()` is the same as `old_vers branch/tag`? This does assume that the branch/tag will always be a release tag.


---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by orhankislal <gi...@git.apache.org>.
Github user orhankislal commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r216567704
  
    --- Diff: src/madpack/diff_udf.sql ---
    @@ -142,9 +142,12 @@ DROP TABLE IF EXISTS functions_madlib_new_version;
     SELECT get_functions('madlib_old_vers');
     
     SELECT
    +    type,
         --'\t-' || name || ':' || '\n\t\t-rettype: ' || retype || '\n\t\t-argument: ' || argtypes
    -    '    - ' || name || ':' || '\n        rettype: ' || retype || '\n        argument: ' || argtypes AS "Dropped UDFs"
    -    , type
    +    '    - ' || name || ':' AS "Dropped UDF part1",
    --- End diff --
    
    `rettype` was not removed, I just changed the column names to `Dropped UDF part1` format so that I can easily parse them. The newline characters were just complicating the output for no reason.


---

[GitHub] madlib issue #318: Madpack: Add a script for automating changelist creation

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on the issue:

    https://github.com/apache/madlib/pull/318
  
    @orhankislal The new changes look good. Apart from the 2 minor comments LGTM +1 


---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by orhankislal <gi...@git.apache.org>.
Github user orhankislal commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r217358090
  
    --- Diff: src/madpack/diff_udf.sql ---
    @@ -11,71 +11,13 @@ RETURNS text AS $$
     $$ LANGUAGE plpythonu;
     
     
    -CREATE OR REPLACE FUNCTION get_functions(schema_name text)
    +CREATE OR REPLACE FUNCTION get_functions(table_name text, schema_name text,
    +                                         type_filter text)
     RETURNS VOID AS
     $$
         import plpy
         plpy.execute("""
    -        CREATE TABLE functions_madlib_new_version AS
    -        SELECT
    -            "schema", "name", filter_schema("retype", 'madlib') retype,
    -            filter_schema("argtypes", 'madlib') argtypes, "type"
    -        FROM
    -        (
    -
    -            SELECT n.nspname as "schema",
    --- End diff --
    
    It was duplicated code. I just added another parameter to the function and reused it.


---

[GitHub] madlib issue #318: Madpack: Add a script for automating changelist creation

Posted by orhankislal <gi...@git.apache.org>.
Github user orhankislal commented on the issue:

    https://github.com/apache/madlib/pull/318
  
    Thanks for the comments @kaknikhil. I addressed them and added the support for return type based dependencies. It would be great if you could take another look at the latest version. 


---

[GitHub] madlib issue #318: Madpack: Add a script for automating changelist creation

Posted by orhankislal <gi...@git.apache.org>.
Github user orhankislal commented on the issue:

    https://github.com/apache/madlib/pull/318
  
    Thanks for the review @kaknikhil .
    
    1. We had an issue with the knn help messages during one of the releases because they didn't show up on the `diff_udf.sql` output. IIRC, the problem was that the same function was converted from a pure SQL function to a plpython function. I don't have a quick solution to identify such cases.
    2. This indentation should be OK.
    3. We can check if the `new_vers` tag/branch exists and use `master` in its place. Alternatively, we can change the comment to avoid using the new version. It does not have any functional value.


---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r217227453
  
    --- Diff: src/madpack/diff_udf.sql ---
    @@ -11,71 +11,13 @@ RETURNS text AS $$
     $$ LANGUAGE plpythonu;
     
     
    -CREATE OR REPLACE FUNCTION get_functions(schema_name text)
    +CREATE OR REPLACE FUNCTION get_functions(table_name text, schema_name text,
    +                                         type_filter text)
     RETURNS VOID AS
     $$
         import plpy
         plpy.execute("""
    -        CREATE TABLE functions_madlib_new_version AS
    -        SELECT
    -            "schema", "name", filter_schema("retype", 'madlib') retype,
    -            filter_schema("argtypes", 'madlib') argtypes, "type"
    -        FROM
    -        (
    -
    -            SELECT n.nspname as "schema",
    --- End diff --
    
    Why was this piece of code removed ? 


---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by orhankislal <gi...@git.apache.org>.
Github user orhankislal commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r217953051
  
    --- Diff: src/madpack/create_changelist.py ---
    @@ -0,0 +1,239 @@
    +#!/usr/bin/python
    +# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +
    +# Create changelist for any two branches/tags
    +
    +# Prequisites:
    +# The old version has to be installed in the "madlib_old_vers" schema
    +# The new version has to be installed in the "madlib" (default) schema
    +# Two branches/tags must exist locally (run 'git fetch' to ensure you have the latest version)
    +# The current branch does not matter
    +
    +# Usage (must be executed in the src/madpack directory):
    +# python create_changelist.py <database name> <old version branch> <new version branch> <changelist filename>
    +# If you are using the master branch, plase make sure to edit the branch/tag in the output file
    +
    +# Example (should be equivalent to changelist_1.13_1.14.yaml):
    +# python create_changelist.py madlib rel/v1.13 rel/v1.14 chtest1.yaml
    +
    +import sys
    +import os
    +
    +database = sys.argv[1]
    +old_vers = sys.argv[2]
    +new_vers = sys.argv[3]
    +ch_filename = sys.argv[4]
    +
    +if os.path.exists(ch_filename):
    +    print "{0} already exists".format(ch_filename)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -l > /dev/null""".format(database))
    +if err1 != 0:
    +    print "Database {0} does not exist".format(database)
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -c "select madlib_old_vers.version()" > /dev/null
    +                 """.format(database))
    +if err1 != 0:
    +    print "MADlib is not installed in the madlib_old_vers schema. Please refer to the Prequisites."
    +    raise SystemExit
    +
    +err1 = os.system("""psql {0} -c "select madlib.version()" > /dev/null
    +                 """.format(database))
    +if err1 != 0:
    +    print "MADlib is not installed in the madlib schema. Please refer to the Prequisites."
    +    raise SystemExit
    +
    +print "Creating changelist {0}".format(ch_filename)
    +os.system("rm -f /tmp/madlib_tmp_nm.txt /tmp/madlib_tmp_udf.txt /tmp/madlib_tmp_udt.txt")
    +try:
    +    # Find the new modules using the git diff
    +    err1 = os.system("git diff {old_vers} {new_vers} --name-only --diff-filter=A > /tmp/madlib_tmp_nm.txt".format(**locals()))
    +    if err1 != 0:
    +        print "Git diff failed. Please ensure that branches/tags are fetched."
    +        raise SystemExit
    +
    +    f = open("/tmp/madlib_tmp_cl.yaml", "w")
    +    f.write(
    +"""# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +""")
    +
    +    f.write(
    +    """
    +# Changelist for MADlib version {old_vers} to {new_vers}
    +
    +# This file contains all changes that were introduced in a new version of
    +# MADlib. This changelist is used by the upgrade script to detect what objects
    +# should be upgraded (while retaining all other objects from the previous version)
    +
    +# New modules (actually .sql_in files) added in upgrade version
    +# For these files the sql_in code is retained as is with the functions in the
    +# file installed on the upgrade version. All other files (that don't have
    +# updates), are cleaned up to remove object replacements
    +""".format(**locals()))
    +
    +    # Find the new .sql_in files that are not in test folders
    +    f.write("new module:\n")
    +    with open('/tmp/madlib_tmp_nm.txt') as fp:
    +        for line in fp:
    +            if 'sql_in' in line and '/test/' not in line:
    +                 f.write('    ' + line.split('/')[5].split('.')[0]+':\n')
    +
    +    # Find the changed types and keep a list for future use
    +    os.system("psql {0} -f diff_udt.sql > /tmp/madlib_tmp_udt.txt".format(database))
    +
    +    f.write("\n# Changes in the types (UDT) including removal and modification\n")
    +    f.write("udt:\n")
    +    udt_list=[]
    +    with open('/tmp/madlib_tmp_udt.txt') as fp:
    +        for line in fp:
    +           if 'UDT' in line and '[]' not in line:
    +                ch_type = line.split('|')[1].strip()
    +                udt_list.append(ch_type)
    +                f.write('    ' + ch_type +":\n")
    +
    +    # Find the list of UDFs and UDAs
    +    # There are two main sources for these lists.
    +    # 1. The functions that actually got changed
    +    # 2. The functions that depend on a changed type
    +
    +    # We will keep two lists (for UDF and UDA) and fill them as we parse the
    +    # output of diff functions
    +
    +    udf_list=[]
    +    uda_list=[]
    +    current_list = udf_list
    --- End diff --
    
    We don't "need" the current list but it helps us to reduce the number of copy/pasted lines.


---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r216438799
  
    --- Diff: src/madpack/diff_udt.sql ---
    @@ -119,5 +119,5 @@ FROM
     WHERE old_vers.typrelid <> 0; -- 0 means base type
     
     SELECT
    -    array_upper(detect_changed_types('types_common'), 1) AS N,
    -    detect_changed_types('types_common') AS "Changed UDTs";
    +    --array_upper(detect_changed_types('types_common'), 1) AS N,
    --- End diff --
    
    consider removing the comment if not necessary 


---

[GitHub] madlib issue #318: Madpack: Add a script for automating changelist creation

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on the issue:

    https://github.com/apache/madlib/pull/318
  
    I tested the utility with the new changes but I am still reviewing the code.Here are my observations so far:
    
    1. I noticed that there were a few indentation changes. I am assuming that this will not affect upgrade testing.
    1. We will run this utility before tagging the new release. So for ex if the previous release was 1.15 and the new release is 1.16, then the utility will be run when the version is 1.16-dev on the master branch. In this case, how should the utility be run? Should it be run against the master branch for the latest release? This is important because the branch name is used for the description message `Changelist for MADlib version old_branch/tag to new_branch/tag`



---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r216443112
  
    --- Diff: src/madpack/create_changelist.py ---
    @@ -0,0 +1,132 @@
    +#!/usr/bin/python
    +# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +
    +# Create changelist for any two branches
    +
    +# Prequisites:
    +# The old version has to be installed in the "madlib_old_vers" schema
    +# The new version has to be installed in the "madlib" (default) schema
    +# Two branches must exist locally (run 'git fetch' to ensure you have the latest version)
    --- End diff --
    
    Also mention that this utility must be run from the src/madpack directory otherwise it will fail.
    



---

[GitHub] madlib pull request #318: Madpack: Add a script for automating changelist cr...

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/318#discussion_r216444473
  
    --- Diff: src/madpack/create_changelist.py ---
    @@ -0,0 +1,132 @@
    +#!/usr/bin/python
    +# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +
    +# Create changelist for any two branches
    +
    +# Prequisites:
    +# The old version has to be installed in the "madlib_old_vers" schema
    +# The new version has to be installed in the "madlib" (default) schema
    +# Two branches must exist locally (run 'git fetch' to ensure you have the latest version)
    +
    +# Usage:
    +# python create_changelist.py <changelist filename> <old version branch> <new version branch>
    +
    +# Example (should be equivalent to changelist_1.13_1.14.yaml):
    +# python create_changelist.py chtest1.yaml rel/v1.13 rel/v1.14
    +
    +import sys
    +import os
    +
    +ch_filename = sys.argv[1]
    +old_vers = sys.argv[2]
    +new_vers = sys.argv[3]
    +
    +if os.path.exists(ch_filename):
    +    print "{0} already exists".format(ch_filename)
    +    raise SystemExit
    +print "Creating changelist {0}".format(ch_filename)
    +os.system("rm -f /tmp/madlib_tmp_nm.txt /tmp/madlib_tmp_udf.txt /tmp/madlib_tmp_udt.txt")
    +f = open(ch_filename, "w")
    +f.write(
    +"""# ------------------------------------------------------------------------------
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#   http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied.  See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +# ------------------------------------------------------------------------------
    +""")
    +
    +f.write(
    +"""
    +# Changelist for MADlib version {old_vers} to {new_vers}
    +
    +# This file contains all changes that were introduced in a new version of
    +# MADlib. This changelist is used by the upgrade script to detect what objects
    +# should be upgraded (while retaining all other objects from the previous version)
    +
    +# New modules (actually .sql_in files) added in upgrade version
    +# For these files the sql_in code is retained as is with the functions in the
    +# file installed on the upgrade version. All other files (that don't have
    +# updates), are cleaned up to remove object replacements
    +""".format(**locals()))
    +
    +os.system("git diff {old_vers} {new_vers} --name-only --diff-filter=A > /tmp/madlib_tmp_nm.txt".format(**locals()))
    +
    +f.write("new module:\n")
    +with open('/tmp/madlib_tmp_nm.txt') as fp:
    +    for line in fp:
    +        if 'sql_in' in line and '/test/' not in line:
    +             f.write('    ' + line.split('/')[5].split('.')[0]+':\n')
    +
    +os.system("psql madlib -f diff_udf.sql > /tmp/madlib_tmp_udf.txt")
    --- End diff --
    
    This assumes that the dbname would always be `madlib`. We should either parametrize the dbname (default to madlib) or add this to the pre requisites. I vote for the former 


---