You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2020/05/14 16:22:15 UTC
[GitHub] [incubator-superset] etr2460 commented on a change in pull request #9734: Implement csv upload configuration func for the schema enforcement

etr2460 commented on a change in pull request #9734:
URL: https://github.com/apache/incubator-superset/pull/9734#discussion_r425259829



##########
File path: superset/config.py
##########
@@ -586,11 +586,24 @@ class CeleryConfig:  # pylint: disable=too-few-public-methods
 # The directory within the bucket specified above that will
 # contain all the external tables
 CSV_TO_HIVE_UPLOAD_DIRECTORY = "EXTERNAL_HIVE_TABLES/"
+# Function that creates upload directory dynamically based on the
+# database used, user and schema provided.
+CSV_TO_HIVE_UPLOAD_DIRECTORY_FUNC: Callable[
+    ["Database", "models.User", str], Optional[str]
+] = lambda database, user, schema: CSV_TO_HIVE_UPLOAD_DIRECTORY
 
 # The namespace within hive where the tables created from
 # uploading CSVs will be stored.
+# Make it a func, e.g. /hive/warehouse/<dbname>.db/<tablename>

Review comment:
       Not sure what this comment means, if this config var should be a func, then maybe we should type it as such?

##########
File path: superset/config.py
##########
@@ -586,11 +586,24 @@ class CeleryConfig:  # pylint: disable=too-few-public-methods
 # The directory within the bucket specified above that will
 # contain all the external tables
 CSV_TO_HIVE_UPLOAD_DIRECTORY = "EXTERNAL_HIVE_TABLES/"
+# Function that creates upload directory dynamically based on the
+# database used, user and schema provided.
+CSV_TO_HIVE_UPLOAD_DIRECTORY_FUNC: Callable[
+    ["Database", "models.User", str], Optional[str]
+] = lambda database, user, schema: CSV_TO_HIVE_UPLOAD_DIRECTORY
 
 # The namespace within hive where the tables created from
 # uploading CSVs will be stored.
+# Make it a func, e.g. /hive/warehouse/<dbname>.db/<tablename>
 UPLOADED_CSV_HIVE_NAMESPACE = None
 
+# database
+ALLOWED_USER_CSV_SCHEMA_FUNC: Callable[
+    ["Database", "models.User"], List[str]
+] = lambda database, user: [
+    UPLOADED_CSV_HIVE_NAMESPACE  # type: ignore

Review comment:
       I think we could remove the type ignore here if you define a type above too

##########
File path: superset/models/core.py
##########
@@ -603,6 +603,11 @@ def get_foreign_keys(
     def get_schema_access_for_csv_upload(  # pylint: disable=invalid-name
         self,
     ) -> List[str]:
+        if hasattr(g, "user"):
+            allowed_databases = config["ALLOWED_USER_CSV_SCHEMA_FUNC"](self, g.user)
+            if allowed_databases:
+                return allowed_databases

Review comment:
       Should we be returning a union of this and the array from `extra`?

##########
File path: superset/db_engine_specs/hive.py
##########
@@ -179,6 +171,7 @@ def convert_to_hive_type(col_type: str) -> str:
             bucket_path,
             os.path.join(upload_prefix, table_name, os.path.basename(filename)),
         )
+        # TODO(bkyryliuk): support other cvs delimiters

Review comment:
       typo: `other csv delimiters` 
   
   although with CSV meaning "comma separated values", this might be better as `support other delimiters`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org