Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/22 12:48:54 UTC

[GitHub] [airflow] aru-trackunit opened a new issue, #25232: enable_elastic_disk property incorrectly mapped when making a request to Databricks

aru-trackunit opened a new issue, #25232:
URL: https://github.com/apache/airflow/issues/25232

   ### Apache Airflow version
   
   2.2.2
   
   ### What happened
   
   When using `apache-airflow-providers-databricks` version 2.2.0, I send a request to Databricks to submit a job run.
   https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsCreate -> `api/2.0/jobs/runs/submit`
   
   Databricks expects a boolean for the `enable_elastic_disk` property, but `apache-airflow-providers-databricks` sends a string.
   
   ```
   new_cluster = {
       "autoscale": {"min_workers": 2, "max_workers": 5},
       "spark_version": "10.4.x-scala2.12",
       "aws_attributes": {
           "first_on_demand": 1,
           "availability": "SPOT_WITH_FALLBACK",
           "zone_id": "auto",
           "spot_bid_price_percent": 100,
       },
       "enable_elastic_disk": True,  # for some reason this property is not picked up by Databricks
       "driver_node_type_id": "r5a.large",
       "node_type_id": "c5a.xlarge",
       "cluster_source": "JOB",
   }
   ```
   
   The `enable_elastic_disk` property is then not set on the Databricks side. I also sent the same request to Databricks from Postman, and there the property was set to `true`, which means the problem is not on the Databricks side.
   ```
   {
       "name": "test",
       "tasks": [
           {
               "task_key": "test-task-key",
               "notebook_task": {
                   "notebook_path": "path_to_notebook"
               },
               "new_cluster": {
                   "autoscale": {"min_workers": 1, "max_workers": 2},
                   "cluster_name": "",
                   "spark_version": "10.4.x-scala2.12",    
                   "aws_attributes": {
                       "first_on_demand": 1,
                       "availability": "SPOT_WITH_FALLBACK",
                       "zone_id": "auto",        
                       "spot_bid_price_percent": 100                    
                   },
                   "driver_node_type_id": "r5a.large",
                   "node_type_id": "c5a.xlarge",
                   "enable_elastic_disk": true,                     
                   "cluster_source": "JOB"                
               }
           }
       ]    
   }
   ```
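
For comparison, a short sketch of how the boolean and string variants differ once serialized. It assumes the request body is produced with Python's standard `json` module, which is an assumption about the transport and not something confirmed in this report. A boolean becomes the JSON literal `true`, while the string `'True'` becomes a quoted JSON string:

```python
import json

# Boolean value, as in the Postman request: serialized as the JSON literal true.
as_bool = json.dumps({"enable_elastic_disk": True})

# String value, as described in this report: serialized as a quoted JSON string.
as_str = json.dumps({"enable_elastic_disk": "True"})

print(as_bool)  # {"enable_elastic_disk": true}
print(as_str)   # {"enable_elastic_disk": "True"}
```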
   
   I tried to track down the problem, and it appears to be caused by the line linked below. Before that line executes, `enable_elastic_disk` is the boolean `True`; afterwards it is the string `'True'`, which Databricks does not parse.
   https://github.com/apache/airflow/blob/1cb16d5588306fcb7177486dc60c1974ea3034d4/airflow/providers/databricks/operators/databricks.py#L381
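
A minimal sketch of how such a coercion can turn a boolean into a string. The helper `coerce_to_str` and its `keep_bools` flag are hypothetical and are not the provider's actual code; the sketch only assumes that the linked line recursively stringifies numeric values. Because `bool` is a subclass of `int` in Python, an `isinstance(value, (int, float))` check also matches `True` unless booleans are handled first:

```python
def coerce_to_str(content, keep_bools=False):
    """Hypothetical recursive coercion of numeric leaves to strings."""
    if isinstance(content, str):
        return content
    if keep_bools and isinstance(content, bool):
        # Checked before the int branch, because bool is a subclass of int.
        return content
    if isinstance(content, (int, float)):
        # Without the branch above, True ends up here and becomes 'True'.
        return str(content)
    if isinstance(content, (list, tuple)):
        return [coerce_to_str(item, keep_bools) for item in content]
    if isinstance(content, dict):
        return {key: coerce_to_str(value, keep_bools) for key, value in content.items()}
    raise TypeError(f"Unsupported type: {type(content)!r}")


cluster = {"enable_elastic_disk": True, "autoscale": {"min_workers": 2}}
coerced = coerce_to_str(cluster)
preserved = coerce_to_str(cluster, keep_bools=True)
print(coerced)
print(preserved)
```

With `keep_bools=True` the flag stays a boolean while other numeric values are still stringified, which is one possible shape for a fix under the assumption above.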
   
   
   ### What you think should happen instead
   
   When `enable_elastic_disk` is set, it should be propagated to Databricks, but it is not.
   
   ### How to reproduce
   
   Try to run:
   
   ```
   from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator
   
   new_cluster = {
       "autoscale": {"min_workers": 2, "max_workers": 5},
       "spark_version": "10.4.x-scala2.12",
       "aws_attributes": {
           "first_on_demand": 1,
           "availability": "SPOT_WITH_FALLBACK",
           "zone_id": "auto",
           "spot_bid_price_percent": 100,
       },
       "enable_elastic_disk": True,  # for some reason this property is not picked up by Databricks
       "driver_node_type_id": "r5a.large",
       "node_type_id": "c5a.xlarge",
       "cluster_source": "JOB",
   }
   
   # `env` is assumed to be defined elsewhere in the DAG file.
   notebook_task = {
       "notebook_path": "/Repos/path_to_notebook/main_asset_information",
       "base_parameters": {"env": env},
   }
   
   
   asset_information = DatabricksSubmitRunOperator(
       task_id="task_id",
       databricks_conn_id="databricks",
       new_cluster=new_cluster,
       notebook_task=notebook_task,
   )
   ```
   
   Make sure an Airflow connection named `databricks` is configured.
   
   After the run has been submitted, check whether the property is set on the Databricks side by calling this endpoint:
   `https://DATABRICKS_HOST/api/2.1/jobs/runs/get?run_id=123`
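
The check above could be scripted roughly as follows. This is a sketch under assumptions: the host and token are placeholders, authentication with a bearer token is assumed, and the function names are made up for illustration. The search walks the whole response rather than relying on a particular response layout:

```python
import urllib.request


def build_get_run_request(host, token, run_id):
    """Build (but do not send) a GET request for the runs/get endpoint."""
    url = f"https://{host}/api/2.1/jobs/runs/get?run_id={run_id}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def find_elastic_disk_values(payload):
    """Collect every enable_elastic_disk value found anywhere in the response."""
    found = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            if key == "enable_elastic_disk":
                found.append(value)
            else:
                found.extend(find_elastic_disk_values(value))
    elif isinstance(payload, list):
        for item in payload:
            found.extend(find_elastic_disk_values(item))
    return found


# Sending the request needs a real workspace, so it is left commented out:
# import json
# with urllib.request.urlopen(build_get_run_request(host, token, 123)) as resp:
#     print(find_elastic_disk_values(json.load(resp)))
```

An empty result would suggest that the property did not reach Databricks; a list containing `True` would suggest that it did.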
   
   ### Operating System
   
   MWAA
   
   ### Versions of Apache Airflow Providers
   
   `apache-airflow-providers-databricks` in version 2.2.0
   
   ### Deployment
   
   MWAA
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   The problem is permanent and reproducible. It would be great if the fix could also be released for an earlier provider version, for example as `2.2.1`, because I am not sure when AWS will upgrade to the latest Airflow code, and I am also not sure whether installing a newer version of the Databricks provider on Airflow `2.2.2` would cause issues.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] jgr-trackunit commented on issue #25232: enable_elastic_disk property incorrectly mapped when making a request to Databricks

Posted by GitBox <gi...@apache.org>.
jgr-trackunit commented on issue #25232:
URL: https://github.com/apache/airflow/issues/25232#issuecomment-1193648498

   Hi, I will provide a fix for that issue.




[GitHub] [airflow] jgr-trackunit commented on issue #25232: enable_elastic_disk property incorrectly mapped when making a request to Databricks

Posted by GitBox <gi...@apache.org>.
jgr-trackunit commented on issue #25232:
URL: https://github.com/apache/airflow/issues/25232#issuecomment-1194358160

   @potiuk Could you add me to the contributors list? I can't push my local dev branch.
   Thanks in advance.




[GitHub] [airflow] jgr-trackunit commented on issue #25232: enable_elastic_disk property incorrectly mapped when making a request to Databricks

Posted by GitBox <gi...@apache.org>.
jgr-trackunit commented on issue #25232:
URL: https://github.com/apache/airflow/issues/25232#issuecomment-1199085960

   Hi @potiuk, Hi @alexott,
   The PR has been created: https://github.com/apache/airflow/pull/25394 , could you take a look at it?




[GitHub] [airflow] alexott commented on issue #25232: enable_elastic_disk property incorrectly mapped when making a request to Databricks

Posted by GitBox <gi...@apache.org>.
alexott commented on issue #25232:
URL: https://github.com/apache/airflow/issues/25232#issuecomment-1194360328

   @jgr-trackunit you need to create your own fork and create a PR from it




[GitHub] [airflow] potiuk commented on issue #25232: enable_elastic_disk property incorrectly mapped when making a request to Databricks

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25232:
URL: https://github.com/apache/airflow/issues/25232#issuecomment-1192937939

   I marked it as "good first issue" but that's as much as I can do. Same with "earlier version" - if there is someone who commits to cherry-picking the fix and preparing an earlier version of the provider, they are free to make a PR, so there must be someone who will take care of it. The most certain way is to just roll your sleeves up and do it. See https://github.com/apache/airflow#release-process-for-providers




[GitHub] [airflow] potiuk commented on issue #25232: enable_elastic_disk property incorrectly mapped when making a request to Databricks

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25232:
URL: https://github.com/apache/airflow/issues/25232#issuecomment-1192936430

   Basically - things get implemented here when someone implements them. Airflow is created by > 2100 people (most of them like you - users), so if you want to make sure a problem is fixed in a timely manner, the best way is to make a PR - otherwise it will have to wait for someone to pick it up and implement it.




[GitHub] [airflow] potiuk closed issue #25232: enable_elastic_disk property incorrectly mapped when making a request to Databricks

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #25232: enable_elastic_disk property incorrectly mapped when making a request to Databricks
URL: https://github.com/apache/airflow/issues/25232




[GitHub] [airflow] potiuk commented on issue #25232: enable_elastic_disk property incorrectly mapped when making a request to Databricks

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25232:
URL: https://github.com/apache/airflow/issues/25232#issuecomment-1192935336

   Maybe you can provide a PR fixing that? There are Databricks people here (for example @alexott) who can review and double-check it. Shall we assign it to you, @aru-trackunit?




[GitHub] [airflow] potiuk commented on issue #25232: enable_elastic_disk property incorrectly mapped when making a request to Databricks

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25232:
URL: https://github.com/apache/airflow/issues/25232#issuecomment-1193882569

   > Hi, I will provide a fix for that issue.
   
   Cool!




[GitHub] [airflow] boring-cyborg[bot] commented on issue #25232: enable_elastic_disk property incorrectly mapped when making a request to Databricks

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #25232:
URL: https://github.com/apache/airflow/issues/25232#issuecomment-1192539222

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] potiuk commented on issue #25232: enable_elastic_disk property incorrectly mapped when making a request to Databricks

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25232:
URL: https://github.com/apache/airflow/issues/25232#issuecomment-1194557467

   Yep. See https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst for all details about contribution (including the need to create a fork).

