You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@madlib.apache.org by "orhankislal (via GitHub)" <gi...@apache.org> on 2023/02/13 16:36:37 UTC

[GitHub] [madlib] orhankislal opened a new pull request, #594: WCC: Add warm start

orhankislal opened a new pull request, #594:
URL: https://github.com/apache/madlib/pull/594

   WCC creates a large number of subtransactions which may cause system performance degredation in some cases. This cpmmit adds a parameter to limit the number of iterations it runs as well as another one to continue from the incomplete state.
   
   <!--  
   
   Thanks for sending a pull request!  Here are some tips for you:
   1. Refer to this link for contribution guidelines https://cwiki.apache.org/confluence/display/MADLIB/Contribution+Guidelines
   2. Please Provide the Module Name, a JIRA Number and a short description about your changes.
   -->
   
   - [ ] Add the module name, JIRA# to PR/commit and description.
   - [ ] Add tests for the change. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@madlib.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [madlib] orhankislal merged pull request #594: WCC: Add warm start

Posted by "orhankislal (via GitHub)" <gi...@apache.org>.
orhankislal merged PR #594:
URL: https://github.com/apache/madlib/pull/594


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@madlib.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [madlib] kaknikhil commented on a diff in pull request #594: WCC: Add warm start

Posted by "kaknikhil (via GitHub)" <gi...@apache.org>.
kaknikhil commented on code in PR #594:
URL: https://github.com/apache/madlib/pull/594#discussion_r1114882309


##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -47,14 +48,20 @@ from graph_utils import validate_output_and_summary_tables
 
 def validate_wcc_args(schema_madlib, vertex_table, vertex_id, edge_table,
                       edge_params, out_table, out_table_summary,
-                      grouping_cols_list, module_name):
+                      grouping_cols_list, warm_start, out_table_message, module_name):
     """
     Function to validate input parameters for wcc
     """
     validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
-                          out_table, module_name)
-    _assert(not table_exists(out_table_summary),
-            "Graph {module_name}: Output summary table already exists!".format(**locals()))
+                          out_table, module_name, warm_start)
+    if not warm_start:
+        _assert(not table_exists(out_table_summary),

Review Comment:
   Maybe we could also assert that the msg table doesn't exist for non warm start scenarios ?



##########
src/ports/postgres/modules/graph/graph_utils.py_in:
##########
@@ -74,14 +74,18 @@ def validate_output_and_summary_tables(model_out_table, module_name,
                 "Graph WCC: Output table {0} already exists.".format(out_table))
 
 def validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
-                          out_table, func_name, **kwargs):
+                          out_table, func_name, warm_start = False, **kwargs):
     """
     Validates graph tables (vertex and edge) as well as the output table.
     """
     _assert(out_table and out_table.strip().lower() not in ('null', ''),
-            "Graph {func_name}: Invalid output table name!".format(**locals()))
-    _assert(not table_exists(out_table),
-            "Graph {func_name}: Output table already exists!".format(**locals()))
+                "Graph {func_name}: Invalid output table name!".format(**locals()))
+    if warm_start:
+        _assert(table_exists(out_table),

Review Comment:
   Shouldn't we also assert that the msg table exists ?



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -47,14 +48,20 @@ from graph_utils import validate_output_and_summary_tables
 
 def validate_wcc_args(schema_madlib, vertex_table, vertex_id, edge_table,
                       edge_params, out_table, out_table_summary,
-                      grouping_cols_list, module_name):
+                      grouping_cols_list, warm_start, out_table_message, module_name):
     """
     Function to validate input parameters for wcc
     """
     validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
-                          out_table, module_name)
-    _assert(not table_exists(out_table_summary),
-            "Graph {module_name}: Output summary table already exists!".format(**locals()))
+                          out_table, module_name, warm_start)
+    if not warm_start:
+        _assert(not table_exists(out_table_summary),
+                "Graph {module_name}: Output summary table already exists!".format(**locals()))
+    else:
+        _assert(table_exists(out_table_summary),
+                "Graph {module_name}: Output summary table is missing for warm start!".format(**locals()))
+        _assert(table_exists(out_table_message),
+                "Graph {module_name}: Output message table is missing for warm start! Either the wcc was completed in the last run or the table got dropped/renamed.".format(**locals()))

Review Comment:
   s/the wcc/wcc



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -211,6 +227,14 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     else:
         edge_inverse = edge_table
 
+    if warm_start:
+        out_table_sql = ""
+        msg_sql = ""
+        if vertex_type != "BIGINT[]" and vertex_id_in and vertex_id_in != 'id':

Review Comment:
   We do the same check later again, I'm assuming we need both ?



##########
src/ports/postgres/modules/graph/test/wcc.sql_in:
##########
@@ -276,3 +276,12 @@ SELECT graph_wcc_num_cpts(
 SELECT assert(relative_error(num_components, 3) < 0.00001,
         'Weakly Connected Components: Incorrect largest component value.'
     ) FROM count_table WHERE user_id1=1;
+
+DROP TABLE IF EXISTS wcc_warm_start_out, wcc_warm_start_out_summary;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', 'user_id', 2);

Review Comment:
   1. Can there be any issues if the user upgrades madlib and then uses the new function with warm start set to true on the previous out tables ? We might have to explicitly call this out as well
   2. We should also add a test case for this scenario (This is based on the assumption that it takes 5 iterations to update all the nodes)
       * Run 2 iterations without warm start
       * Run 2 more with warm start set to true
       * Run 1 or 2 more with warm start set to true.
       For the first two runs, we should at the very least assert that nodes_to_update > 0 and for the final run, we should assert the contents of the out table and also that nodes_to_update = 0
   3. There are still tables leftover after each warm start wcc run(with or without grouping). I think it might be a good idea to add a test to make sure that there aren't any extra tables left behind
   4. Another scenario to consider:
       * Let's say it takes 10 iterations to update all the nodes in the graph
       * User runs wcc with iteration limit of 2
       * User runs wcc with iteration limit of 3 and warm start
       * User runs wcc with iteration limit of 3 and warm start but removes the grouping col
   What is the ideal behavior here ? (same applies if we start with no grouping col and then start using one before all the nodes have been updated)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@madlib.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [madlib] orhankislal commented on a diff in pull request #594: WCC: Add warm start

Posted by "orhankislal (via GitHub)" <gi...@apache.org>.
orhankislal commented on code in PR #594:
URL: https://github.com/apache/madlib/pull/594#discussion_r1110251401


##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -211,6 +221,18 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     else:
         edge_inverse = edge_table
 
+    if warm_start:
+        new_update_sql = """

Review Comment:
   I'm not sure about that. Let's say it got deleted in one of the branches by accident. The code might compile and just work incorrectly, I would rather get a simple error that is easy to fix.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@madlib.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [madlib] orhankislal commented on a diff in pull request #594: WCC: Add warm start

Posted by "orhankislal (via GitHub)" <gi...@apache.org>.
orhankislal commented on code in PR #594:
URL: https://github.com/apache/madlib/pull/594#discussion_r1114972371


##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -211,6 +227,14 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     else:
         edge_inverse = edge_table
 
+    if warm_start:
+        out_table_sql = ""
+        msg_sql = ""
+        if vertex_type != "BIGINT[]" and vertex_id_in and vertex_id_in != 'id':

Review Comment:
   yes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@madlib.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [madlib] orhankislal commented on a diff in pull request #594: WCC: Add warm start

Posted by "orhankislal (via GitHub)" <gi...@apache.org>.
orhankislal commented on code in PR #594:
URL: https://github.com/apache/madlib/pull/594#discussion_r1114953481


##########
src/ports/postgres/modules/graph/graph_utils.py_in:
##########
@@ -74,14 +74,18 @@ def validate_output_and_summary_tables(model_out_table, module_name,
                 "Graph WCC: Output table {0} already exists.".format(out_table))
 
 def validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
-                          out_table, func_name, **kwargs):
+                          out_table, func_name, warm_start = False, **kwargs):
     """
     Validates graph tables (vertex and edge) as well as the output table.
     """
     _assert(out_table and out_table.strip().lower() not in ('null', ''),
-            "Graph {func_name}: Invalid output table name!".format(**locals()))
-    _assert(not table_exists(out_table),
-            "Graph {func_name}: Output table already exists!".format(**locals()))
+                "Graph {func_name}: Invalid output table name!".format(**locals()))
+    if warm_start:
+        _assert(table_exists(out_table),

Review Comment:
   We do it in the wcc specific function, this is generic graph utility.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@madlib.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [madlib] kaknikhil commented on a diff in pull request #594: WCC: Add warm start

Posted by "kaknikhil (via GitHub)" <gi...@apache.org>.
kaknikhil commented on code in PR #594:
URL: https://github.com/apache/madlib/pull/594#discussion_r1107670385


##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -47,14 +48,18 @@ from graph_utils import validate_output_and_summary_tables
 
 def validate_wcc_args(schema_madlib, vertex_table, vertex_id, edge_table,
                       edge_params, out_table, out_table_summary,
-                      grouping_cols_list, module_name):
+                      grouping_cols_list, warm_start, module_name):
     """
     Function to validate input parameters for wcc
     """
     validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
-                          out_table, module_name)
-    _assert(not table_exists(out_table_summary),
-            "Graph {module_name}: Output summary table already exists!".format(**locals()))
+                          out_table, module_name, warm_start)
+    if not warm_start:

Review Comment:
   Shouldn't the summary table validation also happen in `validate_graph_coding` ?



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -211,6 +221,18 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     else:
         edge_inverse = edge_table
 
+    if warm_start:

Review Comment:
   1. I think it will be a good idea to explain the workflow for wcc when warm_start is set to true vs when it's set to false. This could be added to the commit message, PR description and also as a comment in the python file. 
   2. Should we also update the design doc and user docs ?



##########
src/ports/postgres/modules/graph/wcc.sql_in:
##########
@@ -115,6 +117,17 @@ weakly connected components are generated for all data
 (single graph).
 @note Expressions are not currently supported for 'grouping_cols'.</dd>
 
+<dt>iteration_limit (optional)</dt>
+<dd>INTEGER, default: NULL. Maximum number of iterations to run wcc. This

Review Comment:
   We should explicitly call out all the tables that get created for various scenarios of iteration limit and nodes_to_update



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -211,6 +221,18 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     else:
         edge_inverse = edge_table
 
+    if warm_start:
+        new_update_sql = """
+            CREATE TABLE {newupdate} AS SELECT * FROM {out_table};
+        """.format(**locals())
+        msg_sql = """
+            CREATE TABLE {message} AS SELECT * FROM {out_table_message};
+        """.format(**locals())
+        if vertex_type != "BIGINT[]" and vertex_id_in and vertex_id_in != 'id':
+            new_update_sql += """

Review Comment:
   Maybe use `.format` instead of `+`. Also applies to other places where we use `+`



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -366,44 +406,61 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
         # found in the current iteration.
         with SetGUC("dev_opt_unsafe_truncate_in_subtransaction", "on"):
 
-            plpy.execute(loop_sql.format(**locals()))
-
-            if grouping_cols:
-                nodes_to_update = plpy.execute("""
-                                    SELECT SUM(cnt) AS cnt_sum
-                                    FROM (
-                                        SELECT COUNT(*) AS cnt
-                                        FROM {toupdate}
-                                        GROUP BY {grouping_cols}
-                                    ) t
-                    """.format(**locals()))[0]["cnt_sum"]
-            else:
-                nodes_to_update = plpy.execute("""
-                                    SELECT COUNT(*) AS cnt FROM {toupdate}
-                                """.format(**locals()))[0]["cnt"]
+            nodes_to_update = plpy.execute(loop_sql.format(**locals()))[0]["cnt_sum"]
+            iteration_counter += 1
+
 
     if not is_platform_pg():
         # Drop intermediate table created for Greenplum
         plpy.execute("DROP TABLE IF EXISTS {0}".format(edge_inverse))
 
-    rename_table(schema_madlib, newupdate, out_table)
-    if vertex_type != "BIGINT[]" and vertex_id_in and vertex_id_in != 'id':
-        plpy.execute("ALTER TABLE {out_table} RENAME COLUMN id TO {vertex_id_in}".format(**locals()))
+    if not warm_start:
+        rename_table(schema_madlib, newupdate, out_table)
+        if vertex_type != "BIGINT[]" and vertex_id_in and vertex_id_in != 'id':
+            plpy.execute("ALTER TABLE {out_table} RENAME COLUMN id TO {vertex_id_in}".format(**locals()))
+    else:
+        plpy.execute("""
+            TRUNCATE TABLE {out_table};

Review Comment:
   Is there a performance reason for doing a truncate and insert rather than drop and rename ? If yes, we should add that as a comment here 



##########
src/ports/postgres/modules/graph/test/wcc.sql_in:
##########
@@ -276,3 +276,12 @@ SELECT graph_wcc_num_cpts(
 SELECT assert(relative_error(num_components, 3) < 0.00001,
         'Weakly Connected Components: Incorrect largest component value.'
     ) FROM count_table WHERE user_id1=1;
+
+DROP TABLE IF EXISTS wcc_warm_start_out, wcc_warm_start_out_summary;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', 'user_id', 2);

Review Comment:
   I think there's a minor bug. Consider the following scenario
   
   Add data
   ```
   test=# select * from "EDGE";
    src_node | dest_node | user_id
   ----------+-----------+---------
           1 |         2 |       1
           1 |         3 |       1
           6 |         3 |       1
           5 |         4 |       1
           5 |         8 |       1
           2 |         3 |       1
           7 |         6 |       1
           8 |         4 |       1
   (8 rows)
   
   Time: 4.080 ms
   test=# select * from vertex ;
    vertex_id
   -----------
            1
            2
            3
            4
            7
            8
            5
            6
   (8 rows)
   ```
   
   Run with no grouping col but an iteration limit of 1
   ```
   SELECT madlib.weakly_connected_components('vertex','vertex_id','"EDGE"','src=src_node,dest=dest_node','wcc_out', NULL, 1);
   ```
   
   Now without dropping any tables, run the same query again with warm start set to true
   ```
   SELECT madlib.weakly_connected_components('vertex','vertex_id','"EDGE"','src=src_node,dest=dest_node','wcc_out', NULL, 1, TRUE);
   ERROR:  spiexceptions.DuplicateTable: relation "wcc_out_message" already exists
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "weakly_connected_components", line 21, in <module>
       return wcc.wcc(**globals())
     PL/Python function "weakly_connected_components", line 432, in wcc
     PL/Python function "weakly_connected_components", line 1212, in rename_table
     PL/Python function "weakly_connected_components", line 1261, in __do_rename_and_get_new_name
   PL/Python function "weakly_connected_components"
   ```
   The interesting thing is that if you set the iteration limit to 4 for the second wcc query, it does not error out. I think that's because it takes 5 iterations for the `nodes_to_update` to become 0. 
   
   We should also dd a test case for this scenario  (This is based on the assumption that it takes 5 iteration to update all the nodes)
   1. Run 2 iterations without warm start
   2. Run 2 more with warm start set to true 
   3. Run 1 or 2 more with warm start set to true.
   For the first two runs, we should at the very least assert that nodes_to_update > 0 and for the final run, we should assert the contents of the out table and also that nodes_to_update = 0
   



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -211,6 +221,18 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     else:
         edge_inverse = edge_table
 
+    if warm_start:
+        new_update_sql = """

Review Comment:
   I think we should initialize all these sql related variables to "" outside the if check. That way we won't ever run into `variable not defined` issues



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -211,6 +221,18 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     else:
         edge_inverse = edge_table
 
+    if warm_start:
+        new_update_sql = """
+            CREATE TABLE {newupdate} AS SELECT * FROM {out_table};
+        """.format(**locals())
+        msg_sql = """
+            CREATE TABLE {message} AS SELECT * FROM {out_table_message};
+        """.format(**locals())
+        if vertex_type != "BIGINT[]" and vertex_id_in and vertex_id_in != 'id':

Review Comment:
   Just curious, why do we need this if check ? Why is there a need to rename to `{vertex_id}` ?



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -90,6 +96,8 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     edge_params = extract_keyvalue_params(
         edge_args, params_types, default_args)
 
+    if iteration_limit is None or iteration_limit == 0:

Review Comment:
   Do we also need to prevent against <0 values ?



##########
src/ports/postgres/modules/graph/test/wcc.sql_in:
##########
@@ -276,3 +276,12 @@ SELECT graph_wcc_num_cpts(
 SELECT assert(relative_error(num_components, 3) < 0.00001,
         'Weakly Connected Components: Incorrect largest component value.'
     ) FROM count_table WHERE user_id1=1;
+
+DROP TABLE IF EXISTS wcc_warm_start_out, wcc_warm_start_out_summary;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', 'user_id', 2);

Review Comment:
   Other observations:
   1. There is a table that doesn't get dropped if grouping cols is passed in
   2. Run without an iteration limit so that eventually there are no nodes to update
   ```
   SELECT madlib.weakly_connected_components('vertex','vertex_id','"EDGE"','src=src_node,dest=dest_node','wcc_out', 'user_id',0,TRUE);
   ```
   Now run with warm start set to true
   ```
   SELECT madlib.weakly_connected_components('vertex','vertex_id','"EDGE"','src=src_node,dest=dest_node','wcc_out', 'user_id',2,TRUE);
   ERROR:  spiexceptions.UndefinedTable: relation "wcc_out_message" does not exist
   LINE 2: ...age15774726_1676600954_3919632__ AS SELECT * FROM wcc_out_me...
                                                                ^
   QUERY:
               CREATE TABLE __madlib_temp_message15774726_1676600954_3919632__ AS SELECT * FROM wcc_out_message;
   
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "weakly_connected_components", line 21, in <module>
       return wcc.wcc(**globals())
     PL/Python function "weakly_connected_components", line 340, in wcc
   PL/Python function "weakly_connected_components"
   ```
   Instead of this exception, we could print a message along the lines of "We have already updated all the nodes, nothing to do"
   3. If the user passes in -1 as the iteration_limit, the `nodes_to_update` column in out_summary table is always 1 which doesn't seem right.



##########
src/ports/postgres/modules/graph/graph_utils.py_in:
##########
@@ -74,14 +74,18 @@ def validate_output_and_summary_tables(model_out_table, module_name,
                 "Graph WCC: Output table {0} already exists.".format(out_table))
 
 def validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
-                          out_table, func_name, **kwargs):
+                          out_table, func_name, warm_start = False, **kwargs):
     """
     Validates graph tables (vertex and edge) as well as the output table.
     """
     _assert(out_table and out_table.strip().lower() not in ('null', ''),
-            "Graph {func_name}: Invalid output table name!".format(**locals()))
-    _assert(not table_exists(out_table),
-            "Graph {func_name}: Output table already exists!".format(**locals()))
+                "Graph {func_name}: Invalid output table name!".format(**locals()))
+    if not warm_start:

Review Comment:
   Flipping this if check might improve code readability but I'll leave it up to you to decide



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -158,20 +166,21 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     out_table_summary = ''
     if out_table:
         out_table_summary = add_postfix(out_table, "_summary")
+        out_table_message = add_postfix(out_table, "_message")
     grouping_cols_list = split_quoted_delimited_str(grouping_cols)
     validate_wcc_args(schema_madlib, vertex_table, vertex_id, edge_table,
                       edge_params, out_table, out_table_summary,
-                      grouping_cols_list, 'Weakly Connected Components')
+                      grouping_cols_list, warm_start, 'Weakly Connected Components')
 
     vertex_view_sql = vertex_view_sql.format(**locals())
-    plpy.execute(vertex_view_sql)
 
-    sql = """
+    edge_view_sql = """
         CREATE VIEW {edge_view} AS
         SELECT {src} AS src, {dest} AS dest {grouping_sql}
-        FROM {edge_table}
+        FROM {edge_table};
         """.format(**locals())
-    plpy.execute(sql)
+
+    plpy.execute(vertex_view_sql + edge_view_sql)

Review Comment:
   We should add a comment explaining why we need to run all these sqls in the same plpy.execute. 



##########
src/ports/postgres/modules/graph/test/wcc.sql_in:
##########
@@ -276,3 +276,12 @@ SELECT graph_wcc_num_cpts(
 SELECT assert(relative_error(num_components, 3) < 0.00001,
         'Weakly Connected Components: Incorrect largest component value.'
     ) FROM count_table WHERE user_id1=1;
+
+DROP TABLE IF EXISTS wcc_warm_start_out, wcc_warm_start_out_summary;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', 'user_id', 2);

Review Comment:
   Test improvements/notes:
   1. This is a test for wcc with warm start and grouping cols right ? Shouldn't we also add a test with warm start and no grouping cols ?
   2. We should also add an assert for the output table and the summary table. That will help us in catching regressions(if any)
   3. Can there be any issues if the user upgrades madlib and then uses the new function with warm start set to true on the previous out tables ? We might have to explicitly call this out as well



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -235,77 +257,92 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
         join_grouping_cols = _check_groups(subq, distinct_grp_table, grouping_cols_list)
         group_by_clause_newupdate = ('{0}, {1}.{2}'.format(subq_prefixed_grouping_cols,
                                                            subq, vertex_id))
+        select_grouping_cols = ',' + subq_prefixed_grouping_cols
+
+        if not warm_start:
+            new_update_sql = """

Review Comment:
   We should also mention in the commit message the reason behind combining all these sqls into one plpy execute



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -366,44 +406,61 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
         # found in the current iteration.
         with SetGUC("dev_opt_unsafe_truncate_in_subtransaction", "on"):
 
-            plpy.execute(loop_sql.format(**locals()))
-
-            if grouping_cols:
-                nodes_to_update = plpy.execute("""
-                                    SELECT SUM(cnt) AS cnt_sum
-                                    FROM (
-                                        SELECT COUNT(*) AS cnt
-                                        FROM {toupdate}
-                                        GROUP BY {grouping_cols}
-                                    ) t
-                    """.format(**locals()))[0]["cnt_sum"]
-            else:
-                nodes_to_update = plpy.execute("""
-                                    SELECT COUNT(*) AS cnt FROM {toupdate}
-                                """.format(**locals()))[0]["cnt"]
+            nodes_to_update = plpy.execute(loop_sql.format(**locals()))[0]["cnt_sum"]
+            iteration_counter += 1
+
 
     if not is_platform_pg():
         # Drop intermediate table created for Greenplum
         plpy.execute("DROP TABLE IF EXISTS {0}".format(edge_inverse))
 
-    rename_table(schema_madlib, newupdate, out_table)
-    if vertex_type != "BIGINT[]" and vertex_id_in and vertex_id_in != 'id':
-        plpy.execute("ALTER TABLE {out_table} RENAME COLUMN id TO {vertex_id_in}".format(**locals()))
+    if not warm_start:
+        rename_table(schema_madlib, newupdate, out_table)
+        if vertex_type != "BIGINT[]" and vertex_id_in and vertex_id_in != 'id':
+            plpy.execute("ALTER TABLE {out_table} RENAME COLUMN id TO {vertex_id_in}".format(**locals()))
+    else:
+        plpy.execute("""
+            TRUNCATE TABLE {out_table};
+            INSERT INTO {out_table}
+            SELECT
+                {vertex_id} AS {vertex_id_in},
+                {component_id}
+                {comma_grouping_cols}
+            FROM {newupdate};

Review Comment:
   I'm probably missing something but don't we also need to drop the newupdate table when warm_start is set to true ? We probably are but just wanted to make sure



##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -235,77 +257,92 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
         join_grouping_cols = _check_groups(subq, distinct_grp_table, grouping_cols_list)
         group_by_clause_newupdate = ('{0}, {1}.{2}'.format(subq_prefixed_grouping_cols,
                                                            subq, vertex_id))
+        select_grouping_cols = ',' + subq_prefixed_grouping_cols
+
+        if not warm_start:
+            new_update_sql = """
+                CREATE TABLE {newupdate} AS
+                SELECT {subq}.{vertex_id},
+                        CAST({BIGINT_MAX} AS BIGINT) AS {component_id}
+                        {select_grouping_cols}
+                FROM {distinct_grp_table} INNER JOIN (
+                    SELECT {grouping_cols_comma} {src} AS {vertex_id}
+                    FROM {edge_table}
+                    UNION
+                    SELECT {grouping_cols_comma} {dest} AS {vertex_id}
+                    FROM {edge_inverse}
+                ) {subq}
+                ON {join_grouping_cols}
+                GROUP BY {group_by_clause_newupdate}
+                {distribution};
+            """.format(**locals())
+            msg_sql = """
+                CREATE TABLE {message} AS
+                SELECT {vertex_table}.{vertex_id},
+                        CAST({vertex_table}.{single_id} AS BIGINT) AS {component_id}
+                        {comma_grouping_cols}
+                FROM {newupdate} INNER JOIN {vertex_table}
+                ON {vertex_table}.{vertex_id} = {newupdate}.{vertex_id}
+                {distribution};
+            """.format(**locals())
+
+        distinct_grp_sql = """
+            CREATE TABLE {distinct_grp_table} AS
+            SELECT DISTINCT {grouping_cols} FROM {edge_table};
 
-        grp_sql = """
-                CREATE TABLE {distinct_grp_table} AS
-                SELECT DISTINCT {grouping_cols} FROM {edge_table};
-            """
-        plpy.execute(grp_sql.format(**locals()))
-
-        prep_sql = """
-            CREATE TABLE {newupdate} AS
-            SELECT {subq}.{vertex_id},
-                    CAST({BIGINT_MAX} AS BIGINT) AS {component_id}
-                    {select_grouping_cols}
-            FROM {distinct_grp_table} INNER JOIN (
-                SELECT {grouping_cols_comma} {src} AS {vertex_id}
-                FROM {edge_table}
-                UNION
-                SELECT {grouping_cols_comma} {dest} AS {vertex_id}
-                FROM {edge_inverse}
-            ) {subq}
-            ON {join_grouping_cols}
-            GROUP BY {group_by_clause_newupdate}
-            {distribution};
-
-            DROP TABLE IF EXISTS {distinct_grp_table};
-
-        """.format(select_grouping_cols=',' + subq_prefixed_grouping_cols,
-                   **locals())
-        plpy.execute(prep_sql)
-
-        message_sql = """
-            CREATE TABLE {message} AS
-            SELECT {vertex_table}.{vertex_id},
-                    CAST({vertex_table}.{single_id} AS BIGINT) AS {component_id}
-                    {comma_grouping_cols}
-            FROM {newupdate} INNER JOIN {vertex_table}
-            ON {vertex_table}.{vertex_id} = {newupdate}.{vertex_id}
-            {distribution};
         """
-        plpy.execute(message_sql.format(**locals()))
+
+        nodes_to_update_sql = """
+            SELECT SUM(cnt) AS cnt_sum
+            FROM (
+                SELECT COUNT(*) AS cnt
+                FROM {toupdate}
+                GROUP BY {grouping_cols}
+                ) t
+        """.format(**locals())
     else:
-        prep_sql = """
-            CREATE TABLE {newupdate} AS
-            SELECT {vertex_id}, CAST({BIGINT_MAX} AS BIGINT) AS {component_id}
-            FROM {vertex_table}
-            {distribution};
-
-            CREATE TABLE {message} AS
-            SELECT {vertex_id}, CAST({single_id} AS BIGINT) AS {component_id}
-            FROM {vertex_table}
-            {distribution};
-        """
-        plpy.execute(prep_sql.format(**locals()))
-
-    oldupdate_sql = """
-            CREATE TABLE {oldupdate} AS
-            SELECT {message}.{vertex_id},
-                    MIN({message}.{component_id}) AS {component_id}
-                    {comma_grouping_cols}
-            FROM {message}
-            GROUP BY {grouping_cols_comma} {vertex_id}
-            LIMIT 0
-            {distribution};
-    """
-    plpy.execute(oldupdate_sql.format(**locals()))
+        if not warm_start:
+            msg_sql = """
+                CREATE TABLE {message} AS
+                SELECT {vertex_id}, CAST({single_id} AS BIGINT) AS {component_id}
+                FROM {vertex_table}
+                {distribution};
+            """.format(**locals())
+            new_update_sql = """
+                CREATE TABLE {newupdate} AS
+                SELECT {vertex_id}, CAST({BIGINT_MAX} AS BIGINT) AS {component_id}
+                FROM {vertex_table}
+                {distribution};
+            """.format(**locals())
+
+        nodes_to_update_sql = """
+            SELECT COUNT(*) AS cnt_sum FROM {toupdate}
+        """.format(**locals())
 
-    toupdate_sql = """
-            CREATE TABLE {toupdate} AS
-            SELECT * FROM {oldupdate}
-            {distribution};
-        """
-    plpy.execute(toupdate_sql.format(**locals()))
+    old_update_sql = """
+        CREATE TABLE {oldupdate} AS
+        SELECT {message}.{vertex_id},
+                MIN({message}.{component_id}) AS {component_id}
+                {comma_grouping_cols}
+        FROM {message}
+        GROUP BY {grouping_cols_comma} {vertex_id}
+        LIMIT 0
+        {distribution};
+    """
+    to_update_sql = """
+        CREATE TABLE {toupdate} AS
+        SELECT * FROM {oldupdate}
+        {distribution};
+    """
+    if is_platform_pg or not is_platform_gp6_or_up():

Review Comment:
   We should also explain the need for this if check. Also I think it might be easier to read the code if we flip the if check to
   ```
   if is_platform_gp6_or_up():
       plpy.execute((distinct_grp_sql + new_update_sql + msg_sql + old_update_sql + to_update_sql).format(**locals()))
   else:
       .....
   ```
   This should also achieve the same thing right ?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@madlib.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [madlib] orhankislal commented on a diff in pull request #594: WCC: Add warm start

Posted by "orhankislal (via GitHub)" <gi...@apache.org>.
orhankislal commented on code in PR #594:
URL: https://github.com/apache/madlib/pull/594#discussion_r1110235678


##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -47,14 +48,18 @@ from graph_utils import validate_output_and_summary_tables
 
 def validate_wcc_args(schema_madlib, vertex_table, vertex_id, edge_table,
                       edge_params, out_table, out_table_summary,
-                      grouping_cols_list, module_name):
+                      grouping_cols_list, warm_start, module_name):
     """
     Function to validate input parameters for wcc
     """
     validate_graph_coding(vertex_table, vertex_id, edge_table, edge_params,
-                          out_table, module_name)
-    _assert(not table_exists(out_table_summary),
-            "Graph {module_name}: Output summary table already exists!".format(**locals()))
+                          out_table, module_name, warm_start)
+    if not warm_start:

Review Comment:
   It should, but that requires changes in the rest of the graph modules since `validate_graph_coding` is common.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@madlib.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [madlib] orhankislal commented on a diff in pull request #594: WCC: Add warm start

Posted by "orhankislal (via GitHub)" <gi...@apache.org>.
orhankislal commented on code in PR #594:
URL: https://github.com/apache/madlib/pull/594#discussion_r1110248891


##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -211,6 +221,18 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
     else:
         edge_inverse = edge_table
 
+    if warm_start:
+        new_update_sql = """
+            CREATE TABLE {newupdate} AS SELECT * FROM {out_table};
+        """.format(**locals())
+        msg_sql = """
+            CREATE TABLE {message} AS SELECT * FROM {out_table_message};
+        """.format(**locals())
+        if vertex_type != "BIGINT[]" and vertex_id_in and vertex_id_in != 'id':

Review Comment:
   This was necessary to support multiple column identifiers. If the user specifies id1 and id2 as vertex_ids, we create a new column [id1,id2] and give it a name. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@madlib.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [madlib] kaknikhil commented on a diff in pull request #594: WCC: Add warm start

Posted by "kaknikhil (via GitHub)" <gi...@apache.org>.
kaknikhil commented on code in PR #594:
URL: https://github.com/apache/madlib/pull/594#discussion_r1115121420


##########
src/ports/postgres/modules/graph/test/wcc.sql_in:
##########
@@ -276,3 +276,82 @@ SELECT graph_wcc_num_cpts(
 SELECT assert(relative_error(num_components, 3) < 0.00001,
         'Weakly Connected Components: Incorrect largest component value.'
     ) FROM count_table WHERE user_id1=1;
+
+-- Warm Start
+
+-- Without grouping
+DROP TABLE IF EXISTS wcc_non_warm_start_out, wcc_non_warm_start_out_summary;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_non_warm_start_out');
+
+DROP TABLE IF EXISTS wcc_warm_start_out, wcc_warm_start_out_summary, wcc_warm_start_out_message;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', NULL, 1);
+
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start summary table.')
+FROM wcc_warm_start_out_summary;
+SELECT assert(nodes_to_update > 0,
+        'Weakly Connected Components: Warm start incorrect nodes_to_update.'
+    ) FROM wcc_warm_start_out_summary;
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start message table.')
+FROM wcc_warm_start_out_message;
+
+SELECT assert(nodes_to_update > 0,

Review Comment:
   This seems to be doing the exact same thing as line 291. Did we mean to query a different table here ?



##########
src/ports/postgres/modules/graph/test/wcc.sql_in:
##########
@@ -276,3 +276,82 @@ SELECT graph_wcc_num_cpts(
 SELECT assert(relative_error(num_components, 3) < 0.00001,
         'Weakly Connected Components: Incorrect largest component value.'
     ) FROM count_table WHERE user_id1=1;
+
+-- Warm Start
+
+-- Without grouping
+DROP TABLE IF EXISTS wcc_non_warm_start_out, wcc_non_warm_start_out_summary;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_non_warm_start_out');
+
+DROP TABLE IF EXISTS wcc_warm_start_out, wcc_warm_start_out_summary, wcc_warm_start_out_message;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', NULL, 1);
+
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start summary table.')
+FROM wcc_warm_start_out_summary;
+SELECT assert(nodes_to_update > 0,
+        'Weakly Connected Components: Warm start incorrect nodes_to_update.'
+    ) FROM wcc_warm_start_out_summary;
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start message table.')
+FROM wcc_warm_start_out_message;
+
+SELECT assert(nodes_to_update > 0,
+        'Weakly Connected Components: Incorrect nodes to update count.'
+    ) FROM wcc_warm_start_out_summary;
+
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', NULL, 1, True);
+
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start summary table.')
+FROM wcc_warm_start_out_summary;
+SELECT assert(nodes_to_update > 0,
+        'Weakly Connected Components: Warm start incorrect nodes_to_update.'
+    ) FROM wcc_warm_start_out_summary;
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start message table.')
+FROM wcc_warm_start_out_message;
+
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', NULL, 2, True);
+
+SELECT assert(nodes_to_update = 0,
+        'Weakly Connected Components: Warm start incorrect nodes_to_update.'
+    ) FROM wcc_warm_start_out_summary;
+
+SELECT assert(count(*) < 0.00001, 'Weakly Connected Components: Different warm start result.')
+FROM wcc_non_warm_start_out w1, wcc_warm_start_out w2
+WHERE w1.id = w2.id AND w1.component_id != w2.component_id;
+
+SELECT assert(relative_error(count(*), 4) < 0.00001,
+        'Weakly Connected Components: Warm start incorrect component_id.'
+    ) FROM wcc_warm_start_out
+WHERE component_id = 992147483657;
+
+-- With grouping
+DROP TABLE IF EXISTS wcc_non_warm_start_out, wcc_non_warm_start_out_summary;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_non_warm_start_out', 'user_id');
+
+DROP TABLE IF EXISTS wcc_warm_start_out, wcc_warm_start_out_summary, wcc_warm_start_out_message;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', 'user_id', 2);
+
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start summary table.')
+FROM wcc_warm_start_out_summary;
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start message table.')
+FROM wcc_warm_start_out_message;
+
+SELECT assert(nodes_to_update > 0,
+        'Weakly Connected Components: Incorrect nodes to update count.'
+    ) FROM wcc_warm_start_out_summary;
+
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', 'user_id', 2, True);
+
+SELECT assert(count(*) < 0.00001, 'Weakly Connected Components: Different warm start result.')
+FROM wcc_non_warm_start_out w1, wcc_warm_start_out w2
+WHERE w1.id = w2.id AND w1.user_id = w2.user_id AND w1.component_id != w2.component_id;
+
+SELECT assert(relative_error(count(0), 4) < 0.00001,

Review Comment:
   This should be count(*) and not count(0), right ? count(0) will always return 1



##########
src/ports/postgres/modules/graph/test/wcc.sql_in:
##########
@@ -276,3 +276,82 @@ SELECT graph_wcc_num_cpts(
 SELECT assert(relative_error(num_components, 3) < 0.00001,
         'Weakly Connected Components: Incorrect largest component value.'
     ) FROM count_table WHERE user_id1=1;
+
+-- Warm Start
+
+-- Without grouping
+DROP TABLE IF EXISTS wcc_non_warm_start_out, wcc_non_warm_start_out_summary;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_non_warm_start_out');
+
+DROP TABLE IF EXISTS wcc_warm_start_out, wcc_warm_start_out_summary, wcc_warm_start_out_message;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', NULL, 1);
+
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start summary table.')
+FROM wcc_warm_start_out_summary;
+SELECT assert(nodes_to_update > 0,
+        'Weakly Connected Components: Warm start incorrect nodes_to_update.'
+    ) FROM wcc_warm_start_out_summary;
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start message table.')
+FROM wcc_warm_start_out_message;
+
+SELECT assert(nodes_to_update > 0,
+        'Weakly Connected Components: Incorrect nodes to update count.'
+    ) FROM wcc_warm_start_out_summary;
+
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', NULL, 1, True);
+
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start summary table.')
+FROM wcc_warm_start_out_summary;
+SELECT assert(nodes_to_update > 0,
+        'Weakly Connected Components: Warm start incorrect nodes_to_update.'
+    ) FROM wcc_warm_start_out_summary;
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start message table.')
+FROM wcc_warm_start_out_message;
+
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', NULL, 2, True);
+
+SELECT assert(nodes_to_update = 0,
+        'Weakly Connected Components: Warm start incorrect nodes_to_update.'
+    ) FROM wcc_warm_start_out_summary;
+
+SELECT assert(count(*) < 0.00001, 'Weakly Connected Components: Different warm start result.')
+FROM wcc_non_warm_start_out w1, wcc_warm_start_out w2
+WHERE w1.id = w2.id AND w1.component_id != w2.component_id;
+
+SELECT assert(relative_error(count(*), 4) < 0.00001,
+        'Weakly Connected Components: Warm start incorrect component_id.'
+    ) FROM wcc_warm_start_out
+WHERE component_id = 992147483657;
+
+-- With grouping
+DROP TABLE IF EXISTS wcc_non_warm_start_out, wcc_non_warm_start_out_summary;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_non_warm_start_out', 'user_id');
+
+DROP TABLE IF EXISTS wcc_warm_start_out, wcc_warm_start_out_summary, wcc_warm_start_out_message;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', 'user_id', 2);
+
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start summary table.')
+FROM wcc_warm_start_out_summary;
+SELECT assert(count(*) > 0, 'Weakly Connected Components: Empty warm start message table.')
+FROM wcc_warm_start_out_message;
+
+SELECT assert(nodes_to_update > 0,
+        'Weakly Connected Components: Incorrect nodes to update count.'
+    ) FROM wcc_warm_start_out_summary;
+
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', 'user_id', 2, True);
+
+SELECT assert(count(*) < 0.00001, 'Weakly Connected Components: Different warm start result.')

Review Comment:
   Should this instead be `count(*) = 0`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@madlib.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [madlib] orhankislal commented on a diff in pull request #594: WCC: Add warm start

Posted by "orhankislal (via GitHub)" <gi...@apache.org>.
orhankislal commented on code in PR #594:
URL: https://github.com/apache/madlib/pull/594#discussion_r1110257271


##########
src/ports/postgres/modules/graph/wcc.py_in:
##########
@@ -366,44 +406,61 @@ def wcc(schema_madlib, vertex_table, vertex_id, edge_table, edge_args,
         # found in the current iteration.
         with SetGUC("dev_opt_unsafe_truncate_in_subtransaction", "on"):
 
-            plpy.execute(loop_sql.format(**locals()))
-
-            if grouping_cols:
-                nodes_to_update = plpy.execute("""
-                                    SELECT SUM(cnt) AS cnt_sum
-                                    FROM (
-                                        SELECT COUNT(*) AS cnt
-                                        FROM {toupdate}
-                                        GROUP BY {grouping_cols}
-                                    ) t
-                    """.format(**locals()))[0]["cnt_sum"]
-            else:
-                nodes_to_update = plpy.execute("""
-                                    SELECT COUNT(*) AS cnt FROM {toupdate}
-                                """.format(**locals()))[0]["cnt"]
+            nodes_to_update = plpy.execute(loop_sql.format(**locals()))[0]["cnt_sum"]
+            iteration_counter += 1
+
 
     if not is_platform_pg():
         # Drop intermediate table created for Greenplum
         plpy.execute("DROP TABLE IF EXISTS {0}".format(edge_inverse))
 
-    rename_table(schema_madlib, newupdate, out_table)
-    if vertex_type != "BIGINT[]" and vertex_id_in and vertex_id_in != 'id':
-        plpy.execute("ALTER TABLE {out_table} RENAME COLUMN id TO {vertex_id_in}".format(**locals()))
+    if not warm_start:
+        rename_table(schema_madlib, newupdate, out_table)
+        if vertex_type != "BIGINT[]" and vertex_id_in and vertex_id_in != 'id':
+            plpy.execute("ALTER TABLE {out_table} RENAME COLUMN id TO {vertex_id_in}".format(**locals()))
+    else:
+        plpy.execute("""
+            TRUNCATE TABLE {out_table};

Review Comment:
   Creating multiple tables in each iteration causes catalog bloat.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@madlib.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [madlib] orhankislal commented on a diff in pull request #594: WCC: Add warm start

Posted by "orhankislal (via GitHub)" <gi...@apache.org>.
orhankislal commented on code in PR #594:
URL: https://github.com/apache/madlib/pull/594#discussion_r1114957314


##########
src/ports/postgres/modules/graph/test/wcc.sql_in:
##########
@@ -276,3 +276,12 @@ SELECT graph_wcc_num_cpts(
 SELECT assert(relative_error(num_components, 3) < 0.00001,
         'Weakly Connected Components: Incorrect largest component value.'
     ) FROM count_table WHERE user_id1=1;
+
+DROP TABLE IF EXISTS wcc_warm_start_out, wcc_warm_start_out_summary;
+SELECT weakly_connected_components('v2',NULL,'e2',NULL,'wcc_warm_start_out', 'user_id', 2);

Review Comment:
   1. That will error as expected since the message table do not exist for them.
   2. Added
   3. Fixed
   4. Added a check to make sure the same set of arguments are passed for warm start (except num of iterations)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@madlib.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org