Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/03/17 04:10:05 UTC

[GitHub] [airflow] pohek321 opened a new pull request #22331: Add import_notebook method to databricks hook

pohek321 opened a new pull request #22331:
URL: https://github.com/apache/airflow/pull/22331


   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #22330 
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829275367



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})

Review comment:
       Looks like `_do_api_call` returns `response.json`, so I would take that value and return it at the end: either the success response or `RESOURCE_ALREADY_EXISTS`, rather than returning the path the user already has because they passed it in via parameters. That addresses @eladkal's return question at the same time.
   
   I think "catch" was the wrong word. Rather than catching the exception and running something else in an `except` block, what I mean is saving that output and somehow reporting it to the user.
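
A minimal sketch of that suggestion, assuming `_do_api_call` hands back the decoded JSON response as described above (all other names come from the diff):

```python
        # upload notebook and keep whatever the API answers with
        response = self._do_api_call(WORKSPACE_IMPORT_ENDPOINT, json)

        # surface the outcome to the caller (per the suggestion above: the success
        # payload, or error details such as RESOURCE_ALREADY_EXISTS) instead of
        # echoing back the path the caller already passed in
        return response
```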







[GitHub] [airflow] alexott commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
alexott commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r840396480



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,50 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, dbfs_path: str, raw_code: str, language: str, overwrite: bool = True):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs

Review comment:
       This sentence is incorrect - notebooks aren't stored on DBFS, they are stored in the workspace. There is a separate API for working with DBFS.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,50 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, dbfs_path: str, raw_code: str, language: str, overwrite: bool = True):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param dbfs_path: String path on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        # encode notebook
+        encoded_bytes = base64.b64encode(raw_code.encode("utf-8"))
+        encoded_str = str(encoded_bytes, "utf-8")
+
+        # create parent directory if not exists
+        path_parts = dbfs_path.split('/')
+        path_parts.pop(0)
+        path_parts = path_parts[:-1]
+
+        path = ''
+        for part in path_parts:
+            path += f'/{part}'
+        #TODO: Add warning if already exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': path})

Review comment:
       `MKDIRS` can also return `RESOURCE_ALREADY_EXISTS` - see [docs](https://docs.databricks.com/dev-tools/api/latest/workspace.html#mkdirs)

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,50 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, dbfs_path: str, raw_code: str, language: str, overwrite: bool = True):

Review comment:
       Add a format option - besides `SOURCE`, notebooks could be imported as `HTML`, `DBC`, etc. In that case, `language` should be optional.
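
A rough signature along those lines, as a standalone sketch only; the parameter names (`workspace_path`, `fmt`) are illustrative, not the final API:

```python
from typing import Optional


def import_notebook(
    workspace_path: str,
    raw_code: str,
    language: Optional[str] = None,   # only meaningful when importing SOURCE
    overwrite: bool = True,
    fmt: str = 'SOURCE',              # 'SOURCE', 'HTML', 'JUPYTER', or 'DBC'
) -> None:
    """Sketch of the suggested interface; the real method would live on the hook."""
    if fmt.upper() == 'SOURCE' and language is None:
        raise ValueError("language is required when format is 'SOURCE'")
```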

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,50 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, dbfs_path: str, raw_code: str, language: str, overwrite: bool = True):

Review comment:
       Rename `dbfs_path` to `workspace_path`







[GitHub] [airflow] pohek321 commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
pohek321 commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829397551



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):

Review comment:
       Added and tested. Works great!







[GitHub] [airflow] pohek321 commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
pohek321 commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829560266



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',
+            'content': encodedStr,
+            'language': language,
+            'overwrite': str(overwrite).lower(),
+            'format': format
+        }
+        self._do_api_call(WORKSPACE_IMPORT_ENDPOINT, json)
+
+        return f'/Shared/airflow/{notebook_name}'

Review comment:
       The Databricks API endpoints return nothing if the call is successful. I'm assuming we want to return nothing at all instead of the DBFS path, and then just handle the rest of the potential errors that could be raised from the initial API call.







[GitHub] [airflow] alex-astronomer commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829203137



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',
+            'content': encodedStr,
+            'language': language,
+            'overwrite': str(overwrite).lower(),
+            'format': format
+        }
+        self._do_api_call(WORKSPACE_IMPORT_ENDPOINT, json)
+
+        return f'/Shared/airflow/{notebook_name}'

Review comment:
       I believe this function would become more testable and more useful if we returned the response of the API call. If we return errors when the directory or file already exists, that would be useful information for the caller to get back from the function.







[GitHub] [airflow] potiuk commented on pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#issuecomment-1085755424


   checks + docs failing





[GitHub] [airflow] eladkal commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829198970



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")

Review comment:
       Same question/concern







[GitHub] [airflow] potiuk commented on pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#issuecomment-1079947112


   docs failing





[GitHub] [airflow] pohek321 commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
pohek321 commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829562804



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',
+            'content': encodedStr,
+            'language': language,
+            'overwrite': str(overwrite).lower(),

Review comment:
       Added exception handling for this scenario.







[GitHub] [airflow] alex-astronomer commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829197490



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',
+            'content': encodedStr,
+            'language': language,
+            'overwrite': str(overwrite).lower(),

Review comment:
       If overwrite is false and the file exists the import endpoint will return a [RESOURCE_ALREADY_EXISTS error](https://docs.databricks.com/dev-tools/api/latest/workspace.html#import).  Maybe worth catching and printing an INFO message letting the user know that their file was not uploaded.
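
A sketch of how that could look inside the method from the diff, assuming `_do_api_call` raises `AirflowException` (from `airflow.exceptions`) carrying the Databricks error body when a request fails; that error channel is an assumption here, not something confirmed in this thread:

```python
        # upload notebook; with overwrite=False an existing object makes the
        # import endpoint answer with RESOURCE_ALREADY_EXISTS
        try:
            self._do_api_call(WORKSPACE_IMPORT_ENDPOINT, json)
        except AirflowException as err:  # assumed error channel, see note above
            if 'RESOURCE_ALREADY_EXISTS' in str(err):
                self.log.info(
                    "Notebook %s already exists and overwrite=False; the file was not uploaded",
                    json['path'],
                )
            else:
                raise
```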

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})

Review comment:
       This will return an [error if the directory already exists](https://docs.databricks.com/dev-tools/api/latest/workspace.html#mkdirs).  Might be worth catching that error and printing out a DEBUG log saying that the directory already exists.
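
The same pattern could cover the mkdirs call, again under the assumption that a failed call surfaces as `AirflowException` with the error body:

```python
        # create parent directory if it does not already exist
        try:
            self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
        except AirflowException as err:  # assumed error channel
            if 'RESOURCE_ALREADY_EXISTS' in str(err):
                self.log.debug("Directory /Shared/airflow already exists; skipping mkdirs")
            else:
                raise
```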

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")

Review comment:
       Nitpick here, but snake case might be more appropriate according to [Style Guide](https://peps.python.org/pep-0008/#function-and-variable-names)







[GitHub] [airflow] alex-astronomer commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829198860



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})

Review comment:
       This will be more broadly applicable if the user is able to specify a path to create.







[GitHub] [airflow] eladkal commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829198826



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")

Review comment:
       What happens if you pass an unsupported language to Databricks? Will the API tell you that the language is not supported?
   
   I would prefer not to maintain our own list of languages, because if new languages are supported in the future, users will be blocked from using them due to enforcement on the Airflow side.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',
+            'content': encodedStr,
+            'language': language,
+            'overwrite': str(overwrite).lower(),
+            'format': format
+        }
+        self._do_api_call(WORKSPACE_IMPORT_ENDPOINT, json)
+
+        return f'/Shared/airflow/{notebook_name}'

Review comment:
       I don't know Databricks, but this return is strange to me. The user provided `notebook_name` as a parameter to this function, so what is the benefit of returning this string?







[GitHub] [airflow] pohek321 commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
pohek321 commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829474103



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")

Review comment:
       I simply removed the `raise ValueError` to see what the error would be when an unsupported language is sent to the API call, and I just get back the following error: `"error_code":"INVALID_PARAMETER_VALUE"`. @eladkal, is this sufficient, or do we want to do further handling?







[GitHub] [airflow] alex-astronomer commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829398587



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -379,20 +379,27 @@ def import_notebook(self, notebook_name: str, raw_code: str, language: str, over
             raise ValueError(f"results: format must be one of the following: {str(format_options)}")
 
         # encode notebook
-        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
-        encodedStr = str(encodedBytes, "utf-8")
+        encoded_bytes = base64.b64encode(raw_code.encode("utf-8"))
+        encoded_str = str(encoded_bytes, "utf-8")
 
         # create parent directory if not exists
-        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+        path_parts = dbfs_path.split('/')
+        path_parts.pop(0)
+        path_parts = path_parts[:-1]
+
+        path = ''
+        for part in path_parts:
+            path += f'/{part}'
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': path})

Review comment:
       Would be worth testing how this section behaves on different machines. I know the os.path lib does a lot of work standardizing this, and I've had problems in the past with different file systems misbehaving when manipulating paths without the os.path tools.
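
For reference, the parent directory can also be derived with the standard library's path helpers; `posixpath` keeps the forward-slash semantics of Databricks workspace paths regardless of the local OS. A small standalone sketch with an illustrative path:

```python
import posixpath

# illustrative value; in the hook this would be the path passed by the caller
workspace_path = "/Shared/airflow/my_notebook"

# dirname returns the parent directory without manual splitting and re-joining
parent_dir = posixpath.dirname(workspace_path)
print(parent_dir)  # -> /Shared/airflow
```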







[GitHub] [airflow] pohek321 commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
pohek321 commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r833703847



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -379,20 +379,27 @@ def import_notebook(self, notebook_name: str, raw_code: str, language: str, over
             raise ValueError(f"results: format must be one of the following: {str(format_options)}")
 
         # encode notebook
-        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
-        encodedStr = str(encodedBytes, "utf-8")
+        encoded_bytes = base64.b64encode(raw_code.encode("utf-8"))
+        encoded_str = str(encoded_bytes, "utf-8")
 
         # create parent directory if not exists
-        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+        path_parts = dbfs_path.split('/')
+        path_parts.pop(0)
+        path_parts = path_parts[:-1]
+
+        path = ''
+        for part in path_parts:
+            path += f'/{part}'
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': path})

Review comment:
       Marking as resolved.







[GitHub] [airflow] alexott commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
alexott commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r840784243



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})

Review comment:
       `RESOURCE_ALREADY_EXISTS` will be returned if you're trying to create a directory with the same name as existing non-directory object (notebook, for example).







[GitHub] [airflow] pohek321 commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
pohek321 commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829272016



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})

Review comment:
       So, I thought this same thing based on Databricks docs, but I've run it on a pre-existing directory and it doesn't fail. Should I still add the catch?










[GitHub] [airflow] alex-astronomer removed a comment on pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
alex-astronomer removed a comment on pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#issuecomment-1070992330


   Can we think of some sort of unit testing to do here? Honestly, probably not, given that we would just be testing the Databricks API calls at that point, but if you start doing some error handling when something already exists, that might be worth testing.
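
If error handling does get added, a test could stub out the HTTP layer entirely. A rough pytest-style sketch, assuming the method keeps the signature shown in the diffs in this thread; the test name and values are illustrative:

```python
from unittest import mock

from airflow.providers.databricks.hooks.databricks import DatabricksHook


def test_import_notebook_calls_mkdirs_and_import():
    hook = DatabricksHook()
    # stub the API layer so no real workspace (or Airflow connection) is needed
    with mock.patch.object(hook, "_do_api_call", return_value={}) as do_call:
        hook.import_notebook("my_notebook", "print('hello')", "PYTHON")
    # the implementation in the diff makes two calls: mkdirs, then workspace import
    assert do_call.call_count == 2
```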





[GitHub] [airflow] alex-astronomer commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829195376



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):

Review comment:
       Maybe instead of a notebook name we let the user define a notebook path, so that they can define the directory structure on the FS. See the comment above in the json section as well.







[GitHub] [airflow] alex-astronomer commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829194925



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',

Review comment:
       Maybe remove the hard-coded `/Shared/airflow` here, so that users get the opportunity to define a path for themselves if their directory conventions are different.
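
In other words, something in the direction of the following sketch, where `workspace_path` is a hypothetical caller-supplied parameter (matching the rename suggested elsewhere in this thread):

```python
        # upload notebook to a caller-chosen location instead of a hard-coded one
        json = {
            'path': workspace_path,  # e.g. "/Users/someone/etl_notebook" (illustrative)
            'content': encodedStr,
            'language': language,
            'overwrite': str(overwrite).lower(),
            'format': format,
        }
        self._do_api_call(WORKSPACE_IMPORT_ENDPOINT, json)
```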

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):

Review comment:
       Maybe instead of a notebook name we let the user define a notebook path, so that they can define the directory structure on the FS. See the comment below in the json section as well.










[GitHub] [airflow] pohek321 commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
pohek321 commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829275469



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")

Review comment:
       Done.







[GitHub] [airflow] pohek321 commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
pohek321 commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829610102



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -379,20 +379,27 @@ def import_notebook(self, notebook_name: str, raw_code: str, language: str, over
             raise ValueError(f"results: format must be one of the following: {str(format_options)}")
 
         # encode notebook
-        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
-        encodedStr = str(encodedBytes, "utf-8")
+        encoded_bytes = base64.b64encode(raw_code.encode("utf-8"))
+        encoded_str = str(encoded_bytes, "utf-8")
 
         # create parent directory if not exists
-        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+        path_parts = dbfs_path.split('/')
+        path_parts.pop(0)
+        path_parts = path_parts[:-1]
+
+        path = ''
+        for part in path_parts:
+            path += f'/{part}'
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': path})

Review comment:
       I think we might have a disconnect here. This isn't using the os.path lib; it's just string manipulation. That being said, would it still need to be tested on different machines?
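       Since workspace paths on Databricks always use forward slashes regardless of the worker's OS, the standard library's `posixpath` module (which always joins and splits on `/`) would give the parent directory without any manual splitting. A minimal sketch, with a hypothetical `dbfs_path` value for illustration:

```python
import posixpath

# Hypothetical input for illustration; in the PR this value comes from the caller.
dbfs_path = "/Shared/my_team/etl/my_notebook"

# posixpath always uses '/', so this behaves the same on Linux, macOS and Windows workers.
parent_dir = posixpath.dirname(dbfs_path)
print(parent_dir)  # /Shared/my_team/etl
```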




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] pohek321 commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
pohek321 commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829397250



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',

Review comment:
       Added and tested. Works great!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] pohek321 commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
pohek321 commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829563280



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")

Review comment:
       Added exception handling that refers users to the Databricks docs for the list of acceptable values, so that we don't have to maintain that list ourselves.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] pohek321 commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
pohek321 commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829272016



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})

Review comment:
       So, I thought this same thing based on Databricks docs, but I've run it and it doesn't fail. Should I still add the catch?

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})

Review comment:
       So, I thought this same thing based on Databricks docs, but I've run it on a pre-existing directory and it doesn't fail. Should I still add the catch?

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")

Review comment:
       Done.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',

Review comment:
       Added and tested. Works great!

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):

Review comment:
       Added and tested. Works great!

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")

Review comment:
       I simply removed the `raise ValueError` to see what the error would be in the event that an unsupported language is sent to the API call; I just get back the following error: `"error_code":"INVALID_PARAMETER_VALUE"`. @eladkal, is this sufficient, or do we want to do further handling?
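       One way to avoid maintaining our own list while still giving a readable failure would be to let the API do the validation and only rewrap the error. A rough sketch, assuming `_do_api_call` raises `AirflowException` for non-2xx responses (as it does for the other endpoints in this hook); the helper name is made up, and `endpoint` would be the `WORKSPACE_IMPORT_ENDPOINT` constant added in this PR:

```python
from airflow.exceptions import AirflowException


def import_notebook_payload(hook, payload, endpoint):
    """Send the workspace import call and rewrap parameter errors with a docs pointer."""
    try:
        return hook._do_api_call(endpoint, payload)
    except AirflowException as err:
        if "INVALID_PARAMETER_VALUE" in str(err):
            raise AirflowException(
                "Databricks rejected the import parameters (language/format). See "
                "https://docs.databricks.com/dev-tools/api/latest/workspace.html#import "
                f"for accepted values. Original error: {err}"
            ) from err
        raise
```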

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',
+            'content': encodedStr,
+            'language': language,
+            'overwrite': str(overwrite).lower(),
+            'format': format
+        }
+        self._do_api_call(WORKSPACE_IMPORT_ENDPOINT, json)
+
+        return f'/Shared/airflow/{notebook_name}'

Review comment:
       The Databricks API endpoints return nothing if the call is successful. I'm assuming that we want to just return nothing at all instead of the DBFS path, and then handle the rest of the potential errors that could be raised from the initial API call.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})

Review comment:
       Despite what the docs say, the `RESOURCE_ALREADY_EXISTS` error doesn't appear when I re-run the task. Nothing gets returned from the API endpoint.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',
+            'content': encodedStr,
+            'language': language,
+            'overwrite': str(overwrite).lower(),

Review comment:
       Added exception handling for this scenario.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")

Review comment:
       Added exception handling that refers users to the Databricks docs for the list of acceptable values, so that we don't have to maintain that list ourselves.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")

Review comment:
       After reading the docs, if a language is specified, then format should always be 'SOURCE'. Updated the method to follow this recommendation.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -379,20 +379,27 @@ def import_notebook(self, notebook_name: str, raw_code: str, language: str, over
             raise ValueError(f"results: format must be one of the following: {str(format_options)}")
 
         # encode notebook
-        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
-        encodedStr = str(encodedBytes, "utf-8")
+        encoded_bytes = base64.b64encode(raw_code.encode("utf-8"))
+        encoded_str = str(encoded_bytes, "utf-8")
 
         # create parent directory if not exists
-        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+        path_parts = dbfs_path.split('/')
+        path_parts.pop(0)
+        path_parts = path_parts[:-1]
+
+        path = ''
+        for part in path_parts:
+            path += f'/{part}'
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': path})

Review comment:
       I think we might have a disconnect here. This isn't using the os.path lib; it's just string manipulation. That being said, would it still need to be tested on different machines?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r831183683



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -379,20 +379,27 @@ def import_notebook(self, notebook_name: str, raw_code: str, language: str, over
             raise ValueError(f"results: format must be one of the following: {str(format_options)}")
 
         # encode notebook
-        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
-        encodedStr = str(encodedBytes, "utf-8")
+        encoded_bytes = base64.b64encode(raw_code.encode("utf-8"))
+        encoded_str = str(encoded_bytes, "utf-8")
 
         # create parent directory if not exists
-        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+        path_parts = dbfs_path.split('/')
+        path_parts.pop(0)
+        path_parts = path_parts[:-1]
+
+        path = ''
+        for part in path_parts:
+            path += f'/{part}'
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': path})

Review comment:
       Yeah. I think DBFS paths on Databricks are "standardized" on `/`, so there is no need to use pathlib for those.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#issuecomment-1070305458


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature, add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] pohek321 commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
pohek321 commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829562040



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})

Review comment:
       Despite what the docs say, the `RESOURCE_ALREADY_EXISTS` error doesn't appear when I re-run the task. Nothing gets returned from the API endpoint.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] pohek321 commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
pohek321 commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829564016



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")

Review comment:
       After reading the docs, if a language is specified, then format should always be 'SOURCE'. Updated the method to follow this recommendation.
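       In code that rule can be as small as the following; a sketch of the behaviour described above, with a made-up helper name rather than the PR's actual implementation:

```python
from typing import Optional


def resolve_format(language: Optional[str], requested_format: str = "SOURCE") -> str:
    """If an explicit language is given, force SOURCE format; otherwise honour the request."""
    return "SOURCE" if language else requested_format


assert resolve_format("PYTHON", "DBC") == "SOURCE"
assert resolve_format(None, "DBC") == "DBC"
```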




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
eladkal commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829198970



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")

Review comment:
       Same question/concern

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")

Review comment:
       What happens if you pass an unsupported language to Databricks? Will the API tell you that the language is not supported?
   
       I would prefer not to maintain our own list of languages, because it means that if new languages are supported in the future, users will be blocked from using them due to enforcement on the Airflow side.
   

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',
+            'content': encodedStr,
+            'language': language,
+            'overwrite': str(overwrite).lower(),
+            'format': format
+        }
+        self._do_api_call(WORKSPACE_IMPORT_ENDPOINT, json)
+
+        return f'/Shared/airflow/{notebook_name}'

Review comment:
       I don't know Databricks, but this return looks strange to me.
   The user provided `notebook_name` as a parameter to this function. What is the benefit of returning this string?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] alex-astronomer commented on a change in pull request #22331: Add import_notebook method to databricks hook

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on a change in pull request #22331:
URL: https://github.com/apache/airflow/pull/22331#discussion_r829194925



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',

Review comment:
       Maybe remove the hard-coded `/Shared/airflow` prefix here, so that users get the opportunity to define a path for themselves if their directory conventions are different.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):

Review comment:
       Maybe instead of a notebook name we let the user define a full notebook path, so that they can control the directory structure on the FS. See the comment below in the json section as well.
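       For example, a call site could then look something like this. This is a sketch only: the `dbfs_path` parameter name follows the later revision discussed in this PR, and the workspace path shown is made up.

```python
from airflow.providers.databricks.hooks.databricks import DatabricksHook

# Assumes a 'databricks_default' connection is configured in Airflow.
hook = DatabricksHook()

# The caller controls the full directory structure instead of a fixed /Shared/airflow prefix.
hook.import_notebook(
    dbfs_path="/Shared/my_team/etl/my_notebook",
    raw_code="print('hello from Airflow')",
    language="PYTHON",
)
```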

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',
+            'content': encodedStr,
+            'language': language,
+            'overwrite': str(overwrite).lower(),

Review comment:
       If overwrite is false and the file exists, the import endpoint will return a [RESOURCE_ALREADY_EXISTS error](https://docs.databricks.com/dev-tools/api/latest/workspace.html#import). Maybe worth catching it and logging an INFO message letting the user know that their file was not uploaded.
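       A rough sketch of that, assuming `_do_api_call` raises an `AirflowException` whose message contains the Databricks `error_code`; the helper name and its arguments are made up for illustration, and `endpoint` would be the import endpoint constant added in this PR:

```python
import logging

from airflow.exceptions import AirflowException

log = logging.getLogger(__name__)


def import_or_skip(hook, payload, endpoint):
    """Upload the notebook, but only log an INFO message if it already exists."""
    try:
        return hook._do_api_call(endpoint, payload)
    except AirflowException as err:
        if "RESOURCE_ALREADY_EXISTS" in str(err):
            log.info("Notebook %s already exists and overwrite is false; skipping upload.", payload["path"])
            return None
        raise
```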

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})

Review comment:
       This will return an [error if the directory already exists](https://docs.databricks.com/dev-tools/api/latest/workspace.html#mkdirs).  Might be worth catching that error and printing out a DEBUG log saying that the directory already exists.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")

Review comment:
       Nitpick here, but snake_case might be more appropriate according to the [Style Guide](https://peps.python.org/pep-0008/#function-and-variable-names)

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})

Review comment:
       This will be more applicable if the user is able to specify a path to create.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+
+        # upload notebook
+        json = {
+            'path': f'/Shared/airflow/{notebook_name}',
+            'content': encodedStr,
+            'language': language,
+            'overwrite': str(overwrite).lower(),
+            'format': format
+        }
+        self._do_api_call(WORKSPACE_IMPORT_ENDPOINT, json)
+
+        return f'/Shared/airflow/{notebook_name}'

Review comment:
       I believe that this function would become more testable and more useful if we returned the response of the API call. If we return errors when the directory or file already exists, that would be useful to surface as the function's return value.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):
+        """
+        Import a local notebook from Airflow into Databricks FS. Notebooks saved to /Shared/airflow dbfs
+
+        Utility function to call the ``2.0/workspace/import`` endpoint.
+
+        :param notebook_name: String name of notebook on Databricks FS
+        :param raw_code: String of non-encoded code
+        :param language: Use one of the following strings 'SCALA', 'PYTHON', 'SQL', OR 'R'
+        :param overwrite: Boolean flag specifying whether to overwrite existing object. It is true by default
+        :return: full dbfs notebook path
+        """
+        #enforce language options
+        language_options = ['SCALA', 'PYTHON', 'SQL', 'R']
+        if language.upper() not in language_options:
+            raise ValueError(f"results: language must be one of the following: {str(language_options)}")
+
+        # enforce format options
+        format_options = ['SOURCE', 'HTML', 'JUPYTER', 'DBC']
+        if format.upper() not in format_options:
+            raise ValueError(f"results: format must be one of the following: {str(format_options)}")
+
+        # encode notebook
+        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
+        encodedStr = str(encodedBytes, "utf-8")
+
+        # create parent directory if not exists
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})

Review comment:
       Looks like `_do_api_call` returns `response.json`, so I would take that value and return it at the end (either success or `RESOURCE_ALREADY_EXISTS`), rather than returning the path that the user already has because they passed it in via parameters. That addresses @eladkal's return question at the same time.
   
       I think "catch" was the wrong word. Rather than catching the exception and running something else in an `except` block, what I mean is saving that output and somehow reporting it to the user.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -352,3 +355,44 @@ def get_repo_by_path(self, path: str) -> Optional[str]:
             return str(result['object_id'])
 
         return None
+
+    def import_notebook(self, notebook_name: str, raw_code: str, language: str, overwrite: bool = True, format: str = 'SOURCE'):

Review comment:
       Maybe instead of a notebook name we let the user define a full notebook path, so that they can control the directory structure on the FS. See the comment above in the json section as well.

##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -379,20 +379,27 @@ def import_notebook(self, notebook_name: str, raw_code: str, language: str, over
             raise ValueError(f"results: format must be one of the following: {str(format_options)}")
 
         # encode notebook
-        encodedBytes = base64.b64encode(raw_code.encode("utf-8"))
-        encodedStr = str(encodedBytes, "utf-8")
+        encoded_bytes = base64.b64encode(raw_code.encode("utf-8"))
+        encoded_str = str(encoded_bytes, "utf-8")
 
         # create parent directory if not exists
-        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': "/Shared/airflow"})
+        path_parts = dbfs_path.split('/')
+        path_parts.pop(0)
+        path_parts = path_parts[:-1]
+
+        path = ''
+        for part in path_parts:
+            path += f'/{part}'
+        self._do_api_call(WORKSPACE_MKDIR_ENDPOINT, {'path': path})

Review comment:
       It would be worth testing how this section behaves on different machines. I know the os.path lib does a lot of work standardizing this, and I've had problems in the past with different file systems misbehaving when manipulating paths without os.path tools.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org