You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2021/02/26 12:10:32 UTC

[GitHub] [madlib] orhankislal opened a new pull request #554: DL: Check if the owner of the object table is a superuser

orhankislal opened a new pull request #554:
URL: https://github.com/apache/madlib/pull/554


   <!--  
   
   Thanks for sending a pull request!  Here are some tips for you:
   1. Refer to this link for contribution guidelines https://cwiki.apache.org/confluence/display/MADLIB/Contribution+Guidelines
   2. Please Provide the Module Name, a JIRA Number and a short description about your changes.
   -->
   
   - [ ] Add the module name, JIRA# to PR/commit and description.
   - [ ] Add tests for the change. 
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] fmcquillan99 commented on pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
fmcquillan99 commented on pull request #554:
URL: https://github.com/apache/madlib/pull/554#issuecomment-790190171


   Also if I prepend with schema
   ```
   cur.execute("DROP TABLE IF EXISTS madlib.custom_function_table")
   cur.execute("SELECT madlib.load_custom_function('madlib.custom_function_table',  %s,'squared_error', 'squared error')", [p2.Binary(pb_squared_error)])
   cur.execute("SELECT madlib.load_custom_function('madlib.custom_function_table',  %s,'rmse', 'root mean square error')", [p2.Binary(pb_rmse)])
   conn.commit()
   ```
   I get this error
   ```
   ---------------------------------------------------------------------------
   InternalError_                            Traceback (most recent call last)
   <ipython-input-5-c77ecf220203> in <module>()
        21 # call load function
        22 cur.execute("DROP TABLE IF EXISTS madlib.custom_function_table")
   ---> 23 cur.execute("SELECT madlib.load_custom_function('madlib.custom_function_table',  %s,'squared_error', 'squared error')", [p2.Binary(pb_squared_error)])
        24 cur.execute("SELECT madlib.load_custom_function('madlib.custom_function_table',  %s,'rmse', 'root mean square error')", [p2.Binary(pb_rmse)])
        25 conn.commit()
   
   InternalError_: plpy.Error: Incorrect table name (madlib."madlib.custom_function_table") provided! Table name should be of the form: <schema name>.<table name> (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "load_custom_function", line 21, in <module>
       madlib_keras_custom_function.load_custom_function(**globals())
     PL/Python function "load_custom_function", line 42, in wrapper
     PL/Python function "load_custom_function", line 121, in load_custom_function
   PL/Python function "load_custom_function"
   ```
   so I am not sure how to use this utility or what the rules are.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] orhankislal commented on pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
orhankislal commented on pull request #554:
URL: https://github.com/apache/madlib/pull/554#issuecomment-790402878


   > It look like it created the `custom_function_table` in the madlib schema even though I was working in the public schema. So the `DROP TABLE IF EXISTS custom_function_table` did not work. If I change it to `DROP TABLE IF EXISTS madlib.custom_function_table` then it does work.
   > 
   > Is that the intended behavior?
   
   Yes, this is the intended behavior. The custom_function_table is created in the madlib schema, it doesn't matter what your search_path looks like. Your code tries to drop the non-existent `public.custom_function_table` and then tries to load the `squared_error` function a second time on the `madlib.custom_function_table` table which fails as expected. 
   Since you are an admin, you are allowed to drop tables from the `madlib` schema.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] reductionista commented on a change in pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
reductionista commented on a change in pull request #554:
URL: https://github.com/apache/madlib/pull/554#discussion_r586978514



##########
File path: src/ports/postgres/modules/utilities/utilities.py_in
##########
@@ -775,6 +775,17 @@ def is_superuser(user):
     return plpy.execute("SELECT rolsuper FROM pg_catalog.pg_roles "\
                         "WHERE rolname = '{0}'".format(user))[0]['rolsuper']
 
+def get_table_owner(schema_table):
+
+    split_table = schema_table.split(".",1)
+    schema = split_table[0]

Review comment:
       I think we need to be more careful here.
   
   Let's say there is a custom function table an admin created called `madlib.custom_functions`
   
   I think I see a loophole in the way `get_table_ower` is implemented which allows any ordinary user to gain admin access.
   
   Steps:
   1.  User creates a table in public schema named "madlib.custom_functions.haha", filling it with their own malicious custom functions.
   2.  User sets search_path=madlib,public
   3.  Malicious user passes object_table='madlib.custom_functions.haha' to MADlib function
   
   Seems like this would pass the security check, and then proceed to load the custom functions from the user's table instead of the admin's table.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] orhankislal commented on pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
orhankislal commented on pull request #554:
URL: https://github.com/apache/madlib/pull/554#issuecomment-790406426


   > Also if I prepend with schema to the table in the load utility
   > ...
   > I get this error
   > ...
   > so I am not sure how to use this utility or what the rules are.
   
   The rule is to just give a table name without any schema. We append the madlib schema and create the table there. If you want to delete a function, there is `delete_custom_function` with the same rule (no schema in the table name). This will actually drop the table if you remove every custom function therein. If you don't want to delete the functions one-by-one and want to completely get rid of the table right away, then you'll have to do it manually by using the madlib schema. If that seems confusing, we can add a new function to drop the whole table as well. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] orhankislal merged pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
orhankislal merged pull request #554:
URL: https://github.com/apache/madlib/pull/554


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] reductionista commented on a change in pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
reductionista commented on a change in pull request #554:
URL: https://github.com/apache/madlib/pull/554#discussion_r587858209



##########
File path: src/ports/postgres/modules/utilities/utilities.py_in
##########
@@ -775,6 +775,17 @@ def is_superuser(user):
     return plpy.execute("SELECT rolsuper FROM pg_catalog.pg_roles "\
                         "WHERE rolname = '{0}'".format(user))[0]['rolsuper']
 
+def get_table_owner(schema_table):
+
+    split_table = schema_table.split(".",1)
+    schema = split_table[0]

Review comment:
       Ok, that makes sense!  I thought I remembered us only prepending madlib if they didn't already specify it... must have misremembered.  I'm not worried about the case where the admin installs madlib to a schema with a . in it, that seems like enough of a bizarre situation we can ignore it.
   
   If we do want this to be a general utility function, then we might want to let postgres do the splitting into schema + table name rather than hoping we're doing it in exactly the same way it's done internally.  But I don't know off the top of my head how to do that (pg_catalog.pg_class?)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] fmcquillan99 edited a comment on pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
fmcquillan99 edited a comment on pull request #554:
URL: https://github.com/apache/madlib/pull/554#issuecomment-790190171


   Also if I prepend with schema to the table in the load utility
   ```
   cur.execute("DROP TABLE IF EXISTS madlib.custom_function_table")
   cur.execute("SELECT madlib.load_custom_function('madlib.custom_function_table',  %s,'squared_error', 'squared error')", [p2.Binary(pb_squared_error)])
   cur.execute("SELECT madlib.load_custom_function('madlib.custom_function_table',  %s,'rmse', 'root mean square error')", [p2.Binary(pb_rmse)])
   conn.commit()
   ```
   I get this error
   ```
   ---------------------------------------------------------------------------
   InternalError_                            Traceback (most recent call last)
   <ipython-input-5-c77ecf220203> in <module>()
        21 # call load function
        22 cur.execute("DROP TABLE IF EXISTS madlib.custom_function_table")
   ---> 23 cur.execute("SELECT madlib.load_custom_function('madlib.custom_function_table',  %s,'squared_error', 'squared error')", [p2.Binary(pb_squared_error)])
        24 cur.execute("SELECT madlib.load_custom_function('madlib.custom_function_table',  %s,'rmse', 'root mean square error')", [p2.Binary(pb_rmse)])
        25 conn.commit()
   
   InternalError_: plpy.Error: Incorrect table name (madlib."madlib.custom_function_table") provided! Table name should be of the form: <schema name>.<table name> (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "load_custom_function", line 21, in <module>
       madlib_keras_custom_function.load_custom_function(**globals())
     PL/Python function "load_custom_function", line 42, in wrapper
     PL/Python function "load_custom_function", line 121, in load_custom_function
   PL/Python function "load_custom_function"
   ```
   so I am not sure how to use this utility or what the rules are.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] fmcquillan99 commented on pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
fmcquillan99 commented on pull request #554:
URL: https://github.com/apache/madlib/pull/554#issuecomment-790183101


   If I run this 1x it works
   ```
   # import database connector psycopg2 and create connection cursor
   import psycopg2 as p2
   conn = p2.connect('postgresql://gpadmin@localhost:8000/madlib')
   cur = conn.cursor()
   
   # import Dill and define functions
   import dill
   
   # custom loss
   def squared_error(y_true, y_pred):
       import tensorflow.keras.backend as K
       return K.square(y_pred - y_true)
   pb_squared_error=dill.dumps(squared_error)
   
   # custom metric
   def rmse(y_true, y_pred):
       import tensorflow.keras.backend as K
       return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
   pb_rmse=dill.dumps(rmse)
   
   # call load function
   cur.execute("DROP TABLE IF EXISTS custom_function_table")
   cur.execute("SELECT madlib.load_custom_function('custom_function_table',  %s,'squared_error', 'squared error')", [p2.Binary(pb_squared_error)])
   cur.execute("SELECT madlib.load_custom_function('custom_function_table',  %s,'rmse', 'root mean square error')", [p2.Binary(pb_rmse)])
   conn.commit()
   ```
   But if I run it a second time I get this error:
   ```
   ---------------------------------------------------------------------------
   InternalError_                            Traceback (most recent call last)
   <ipython-input-20-03935b6aea23> in <module>()
        21 # call load function
        22 cur.execute("DROP TABLE IF EXISTS custom_function_table")
   ---> 23 cur.execute("SELECT madlib.load_custom_function('custom_function_table',  %s,'squared_error', 'squared error')", [p2.Binary(pb_squared_error)])
        24 cur.execute("SELECT madlib.load_custom_function('custom_function_table',  %s,'rmse', 'root mean square error')", [p2.Binary(pb_rmse)])
        25 conn.commit()
   
   InternalError_: plpy.Error: Function 'squared_error' already exists in madlib.custom_function_table (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "load_custom_function", line 21, in <module>
       madlib_keras_custom_function.load_custom_function(**globals())
     PL/Python function "load_custom_function", line 42, in wrapper
     PL/Python function "load_custom_function", line 120, in load_custom_function
   PL/Python function "load_custom_function"
   ```
   
   It look like it created the `custom_function_table` in the madlib schema even though I was working in the public schema. So the `DROP TABLE IF EXISTS custom_function_table`  did not work.  If I change it to `DROP TABLE IF EXISTS madlib.custom_function_table` then it does work.
   
   Is that the intended behavior?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] reductionista commented on a change in pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
reductionista commented on a change in pull request #554:
URL: https://github.com/apache/madlib/pull/554#discussion_r586993101



##########
File path: src/ports/postgres/modules/utilities/utilities.py_in
##########
@@ -775,6 +775,17 @@ def is_superuser(user):
     return plpy.execute("SELECT rolsuper FROM pg_catalog.pg_roles "\
                         "WHERE rolname = '{0}'".format(user))[0]['rolsuper']
 
+def get_table_owner(schema_table):
+
+    split_table = schema_table.split(".",1)
+    schema = split_table[0]

Review comment:
       Oh, I guess these wouldn't actually work, because of the ,1 argument passed to split.
   But still, it seems risky... maybe we should think it through carefully and add some more test cases to make sure.
   
   What happens if the user table in the public schema is named "madlib.custom_functions"?
   
   Some users may have permission to create their own schemas, but not have full admin access.  They could create a schema named "madlib.custom" and a table named "_functions"... seems like that might slip through, but I'd have to look at it more to be sure.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] reductionista edited a comment on pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
reductionista edited a comment on pull request #554:
URL: https://github.com/apache/madlib/pull/554#issuecomment-790936348


   > The rule is to just give a table name without any schema. We append the madlib schema and create the table there. If you want > to delete a function, there is `delete_custom_function` with the same rule (no schema in the table name). This will actually drop > the table if you remove every custom function therein. If you don't want to delete the functions one-by-one and want to 
   > completely get rid of the table right away, then you'll have to do it manually by using the madlib schema. If that seems
   > confusing, we can add a new function to drop the whole table as well.
   
   I still think the ideal behavior would be to let both the admin and the user pass in any path to any table in any schema they want, either fully qualified or relative to their search_path.  I think this would be simpler and probably less risky, as well as allowing the admin more flexibility in where they want to have the table stored.
   
   But if it's easier to leave it the way it is for this release, then I'm fine with that... as long as we're sure there aren't any loopholes.  I think the main thing we need to be sure of, in either case, is that the table we check permissions on is the same one we read from--so let's make sure any modification of what they pass in happens before the check rather than after.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] reductionista commented on a change in pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
reductionista commented on a change in pull request #554:
URL: https://github.com/apache/madlib/pull/554#discussion_r586978514



##########
File path: src/ports/postgres/modules/utilities/utilities.py_in
##########
@@ -775,6 +775,17 @@ def is_superuser(user):
     return plpy.execute("SELECT rolsuper FROM pg_catalog.pg_roles "\
                         "WHERE rolname = '{0}'".format(user))[0]['rolsuper']
 
+def get_table_owner(schema_table):
+
+    split_table = schema_table.split(".",1)
+    schema = split_table[0]

Review comment:
       I think we need to be more careful here.
   
   Let's say there is a custom function table an admin created called `madlib.custom_functions`
   
   I think I see a loophole in the way `get_table_owner` is implemented which allows any ordinary user to gain admin access.
   
   Steps:
   1.  User creates a table in public schema named "madlib.custom_functions.haha", filling it with their own malicious custom functions.
   2.  User sets search_path=madlib,public
   3.  Malicious user passes object_table='madlib.custom_functions.haha' to MADlib function
   
   Seems like this would pass the security check, and then proceed to load the custom functions from the user's table instead of the admin's table.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] reductionista commented on a change in pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
reductionista commented on a change in pull request #554:
URL: https://github.com/apache/madlib/pull/554#discussion_r586993101



##########
File path: src/ports/postgres/modules/utilities/utilities.py_in
##########
@@ -775,6 +775,17 @@ def is_superuser(user):
     return plpy.execute("SELECT rolsuper FROM pg_catalog.pg_roles "\
                         "WHERE rolname = '{0}'".format(user))[0]['rolsuper']
 
+def get_table_owner(schema_table):
+
+    split_table = schema_table.split(".",1)
+    schema = split_table[0]

Review comment:
       Oh, I guess these steps wouldn't actually work, because of the ,1 argument passed to split.
   But still, it seems risky... maybe we should think it through carefully and add some more test cases to make sure.
   
   What happens if the user table in the public schema is named "madlib.custom_functions"?
   
   Some users may have permission to create their own schemas, but not have full admin access.  They could create a schema named "madlib.custom" and a table named "_functions"... seems like that might slip through, but I'd have to look at it more to be sure.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] reductionista edited a comment on pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
reductionista edited a comment on pull request #554:
URL: https://github.com/apache/madlib/pull/554#issuecomment-790936348


   
   > The rule is to just give a table name without any schema. We append the madlib schema and create the table there. If you want > to delete a function, there is `delete_custom_function` with the same rule (no schema in the table name). This will actually drop > the table if you remove every custom function therein. If you don't want to delete the functions one-by-one and want to 
   > completely get rid of the table right away, then you'll have to do it manually by using the madlib schema. If that seems
   > confusing, we can add a new function to drop the whole table as well.
   
   I still think the ideal behavior would be to let both the admin and the user pass in any path to any table in any schema they want, either fully qualified or relative to their search_path.  I think this would be simpler and probably less risky, as well as allowing the admin more flexibility in where they want to have the table stored.
   
   But if it's easier to leave it the way it is for this release, then I'm fine with that... as long as we're sure there aren't any loopholes.  I think the main thing we need to be sure of, in either case, is that the table we check permissions on is the same one we write to--so let's make sure any modification of what they pass in happens before the check rather than after.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] reductionista commented on pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
reductionista commented on pull request #554:
URL: https://github.com/apache/madlib/pull/554#issuecomment-790936348


   > > Also if I prepend with schema to the table in the load utility
   > > ...
   > > I get this error
   > > ...
   > > so I am not sure how to use this utility or what the rules are.
   > 
   > The rule is to just give a table name without any schema. We append the madlib schema and create the table there. If you want to delete a function, there is `delete_custom_function` with the same rule (no schema in the table name). This will actually drop the table if you remove every custom function therein. If you don't want to delete the functions one-by-one and want to completely get rid of the table right away, then you'll have to do it manually by using the madlib schema. If that seems confusing, we can add a new function to drop the whole table as well.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [madlib] orhankislal commented on a change in pull request #554: DL: Check if the owner of the object table is a superuser

Posted by GitBox <gi...@apache.org>.
orhankislal commented on a change in pull request #554:
URL: https://github.com/apache/madlib/pull/554#discussion_r587272084



##########
File path: src/ports/postgres/modules/utilities/utilities.py_in
##########
@@ -775,6 +775,17 @@ def is_superuser(user):
     return plpy.execute("SELECT rolsuper FROM pg_catalog.pg_roles "\
                         "WHERE rolname = '{0}'".format(user))[0]['rolsuper']
 
+def get_table_owner(schema_table):
+
+    split_table = schema_table.split(".",1)
+    schema = split_table[0]

Review comment:
       There are two aspects at play. The accuracy of the `get_table_owner` utility function and how it is used in `load_custom_function`. 
   
   Let's start with the second point. `load_custom_function` appends the madlib schema to the given parameter no matter what it looks like (madlib_keras_custom_function.py_in, line 71).  When we call the `get_table_owner`, our own prefixed schema name is split and the users' `object_table` argument is tested. This means the user cannot trick us with any schema name because we don't accept any and we don't parse their input, we directly look at the catalog to see if we have an exact match for the table name.
   
   Getting back to the first point, the accuracy of the `get_table_owner` function. If some other future function tries to use this (which is possible since this is a generic utility function, not DL specific) there might be a way to break it since we might not have put the same amount of control on the input. I guess they might have a schema name like "as.df" which will mess up the split. Please note that the existing `get_schema` utility function uses a similar split. 
   
   Considering these two points, I cannot think of an issue for `load_custom_function` unless the dbadmin installed madlib to a schema with `.` in its name. madpack does not allow usage of `.` in the schema name but an admin can rename the `madlib` schema once it is installed. I don't know if madlib even works in this state but I don't think we can account for any arbitrary manipulations of our installation.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org