You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/03/08 21:55:54 UTC

[GitHub] [superset] john-bodley commented on a change in pull request #19055: feat: helper functions for RLS

john-bodley commented on a change in pull request #19055:
URL: https://github.com/apache/superset/pull/19055#discussion_r822090110



##########
File path: superset/sql_parse.py
##########
@@ -458,3 +460,178 @@ def validate_filter_clause(clause: str) -> None:
                 )
     if open_parens > 0:
         raise QueryClauseValidationException("Unclosed parenthesis in filter clause")
+
+
+def has_table_query(statement: Statement) -> bool:
+    """
+    Return if a stament has a query reading from a table.
+
+        >>> has_table_query(sqlparse.parse("COUNT(*)")[0])
+        False
+        >>> has_table_query(sqlparse.parse("SELECT * FROM table")[0])
+        True
+
+    Note that queries reading from constant values return false:
+
+        >>> has_table_query(sqlparse.parse("SELECT * FROM (SELECT 1)")[0])
+        False
+
+    """
+    seen_source = False
+    tokens = statement.tokens[:]
+    while tokens:
+        token = tokens.pop(0)
+        if isinstance(token, TokenList):
+            tokens.extend(token.tokens)
+
+        if token.ttype == Keyword and token.value.lower() in ("from", "join"):

Review comment:
       ```suggestion
           if imt(token, m=[(Keyword, "FROM"), (Keyword, "JOIN")]):
   ```
   
   Note this would require the `from sqlparse.utils import imt` import.

##########
File path: superset/sql_parse.py
##########
@@ -458,3 +460,178 @@ def validate_filter_clause(clause: str) -> None:
                 )
     if open_parens > 0:
         raise QueryClauseValidationException("Unclosed parenthesis in filter clause")
+
+
+def has_table_query(statement: Statement) -> bool:

Review comment:
       @betodealmeida there's also [this](https://github.com/andialbrecht/sqlparse/blob/master/examples/extract_table_names.py) example which has logic for identifying tables.

##########
File path: superset/sql_parse.py
##########
@@ -458,3 +460,178 @@ def validate_filter_clause(clause: str) -> None:
                 )
     if open_parens > 0:
         raise QueryClauseValidationException("Unclosed parenthesis in filter clause")
+
+
+def has_table_query(statement: Statement) -> bool:
+    """
+    Return if a stament has a query reading from a table.
+
+        >>> has_table_query(sqlparse.parse("COUNT(*)")[0])
+        False
+        >>> has_table_query(sqlparse.parse("SELECT * FROM table")[0])
+        True
+
+    Note that queries reading from constant values return false:
+
+        >>> has_table_query(sqlparse.parse("SELECT * FROM (SELECT 1)")[0])
+        False
+
+    """
+    seen_source = False
+    tokens = statement.tokens[:]
+    while tokens:

Review comment:
       You likely could just do `for token in stmt.flatten():` and remove the logic from lines 483–485.

##########
File path: superset/sql_parse.py
##########
@@ -458,3 +460,178 @@ def validate_filter_clause(clause: str) -> None:
                 )
     if open_parens > 0:
         raise QueryClauseValidationException("Unclosed parenthesis in filter clause")
+
+
+def has_table_query(statement: Statement) -> bool:
+    """
+    Return if a stament has a query reading from a table.
+
+        >>> has_table_query(sqlparse.parse("COUNT(*)")[0])
+        False
+        >>> has_table_query(sqlparse.parse("SELECT * FROM table")[0])
+        True
+
+    Note that queries reading from constant values return false:
+
+        >>> has_table_query(sqlparse.parse("SELECT * FROM (SELECT 1)")[0])
+        False
+
+    """
+    seen_source = False
+    tokens = statement.tokens[:]
+    while tokens:
+        token = tokens.pop(0)
+        if isinstance(token, TokenList):
+            tokens.extend(token.tokens)
+
+        if token.ttype == Keyword and token.value.lower() in ("from", "join"):
+            seen_source = True
+        elif seen_source and (
+            isinstance(token, sqlparse.sql.Identifier) or token.ttype == Keyword
+        ):
+            return True
+        elif seen_source and token.ttype not in (Whitespace, Punctuation):
+            seen_source = False
+
+    return False
+
+
+def add_table_name(rls: TokenList, table: str) -> None:
+    """
+    Modify a RLS expression ensuring columns are fully qualified.
+    """
+    tokens = rls.tokens[:]
+    while tokens:

Review comment:
       You likely could use `flatten` here. It uses a generator so likely a copy should be made given you're mutating the tokens, i.e., 
   
   ```python
   for token in list(rls.flatten()):
       if imt(token, i=Identifier) and token.get_parent_name() is None:
           ...
   ```

##########
File path: superset/sql_parse.py
##########
@@ -458,3 +460,178 @@ def validate_filter_clause(clause: str) -> None:
                 )
     if open_parens > 0:
         raise QueryClauseValidationException("Unclosed parenthesis in filter clause")
+
+
+def has_table_query(statement: Statement) -> bool:
+    """
+    Return if a stament has a query reading from a table.
+
+        >>> has_table_query(sqlparse.parse("COUNT(*)")[0])
+        False
+        >>> has_table_query(sqlparse.parse("SELECT * FROM table")[0])
+        True
+
+    Note that queries reading from constant values return false:
+
+        >>> has_table_query(sqlparse.parse("SELECT * FROM (SELECT 1)")[0])
+        False
+
+    """
+    seen_source = False
+    tokens = statement.tokens[:]
+    while tokens:
+        token = tokens.pop(0)
+        if isinstance(token, TokenList):
+            tokens.extend(token.tokens)
+
+        if token.ttype == Keyword and token.value.lower() in ("from", "join"):
+            seen_source = True
+        elif seen_source and (

Review comment:
       The challenge here is there's no strong connection to ensure that the consecutive (or near consecutive) tokens are those which are being identified here. I guess the question is how robust do we want this logic. The proposed solution may well we suffice.
   
   The _correct_ way of doing this is more of a tree traversal (as opposed to a flattened list) where one checks the next token (which could be a group) from the `FROM` or `JOIN` keyword and iterate from there.
   
   My sense is that can likely be addressed later. We probably need to cleanup the `sqlparse` logic to junk it completely in favor of something else given that it seems like the package is somewhat on life support.

##########
File path: superset/sql_parse.py
##########
@@ -458,3 +460,178 @@ def validate_filter_clause(clause: str) -> None:
                 )
     if open_parens > 0:
         raise QueryClauseValidationException("Unclosed parenthesis in filter clause")
+
+
+def has_table_query(statement: Statement) -> bool:
+    """
+    Return if a stament has a query reading from a table.
+
+        >>> has_table_query(sqlparse.parse("COUNT(*)")[0])
+        False
+        >>> has_table_query(sqlparse.parse("SELECT * FROM table")[0])
+        True
+
+    Note that queries reading from constant values return false:
+
+        >>> has_table_query(sqlparse.parse("SELECT * FROM (SELECT 1)")[0])
+        False
+
+    """
+    seen_source = False
+    tokens = statement.tokens[:]
+    while tokens:
+        token = tokens.pop(0)
+        if isinstance(token, TokenList):
+            tokens.extend(token.tokens)
+
+        if token.ttype == Keyword and token.value.lower() in ("from", "join"):
+            seen_source = True
+        elif seen_source and (
+            isinstance(token, sqlparse.sql.Identifier) or token.ttype == Keyword
+        ):
+            return True
+        elif seen_source and token.ttype not in (Whitespace, Punctuation):
+            seen_source = False
+
+    return False
+
+
+def add_table_name(rls: TokenList, table: str) -> None:

Review comment:
       Should there be specific unit tests—docstring might be suffice—for this function?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org