You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/08 20:18:37 UTC

[GitHub] [airflow] dstandish opened a new pull request, #25613: User must supply the % in uri pattern

dstandish opened a new pull request, #25613:
URL: https://github.com/apache/airflow/pull/25613

   This way you can do one-sided `ilike`, or exact match.
   
   Note this differs from how it's done with dag_id, but this is a different endpoint so I don't think it matters.  The other behavior is not good because it unnecessarily limits the functionality for users.
   
   cc @blag 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
dstandish commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1209776812

   I don't really care too much either way .... But wanted to at least put this up for discussion because it just seems more logistical to me.  To me the point here is not supporting equality -- that's a side effect. The point is just allowing flexibility for a user that is trying to query datasets.  I don't really think it matters how the dag endpoint behaves ... It's a different endpoint and the docs can clarify the behavior.  But I'm no expert in API design.
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
dstandish commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1210150184

   > I’m not a fan of “pattern” meaning different things either, but do we have an efficient way to implement these things and unify the syntax? Implementing the glob syntax means we need to fetch all the entries from the database and iterate on them in Python code.
   
   well, i would assume that @blag's suggestion would be to convert `*` to `%` and then just run `ilike`
   
   i think loading to memory is a non-starter.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
dstandish commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1209781677

   > Since APIs really shouldn't remove features, I kind of think we should wait for a user to request this before implementing it.
   
   If we were to add it later it would be a breaking change


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
uranusjr commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1210145707

   I’m not a fan of “pattern” meaning different things either, but do we have an efficient way to implement these things and unify the syntax? Implementing the glob syntax means we need to fetch all the entries from the database and iterate on them in Python code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] jedcunningham commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1209888825

   > I don't really think it matters how the dag endpoint behaves ... It's a different endpoint and the docs can clarify the behavior.
   
   This is my biggest objection, personally. Having `{whatever}_pattern`'s behave differently just seems confusing 🤷‍♂️


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
uranusjr commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1210098097

   > > How do we handle compatibility issues for this?
   > What do you mean?
   
   For users using this endpoint, what do they need to do to keep their usage to work? They now wouldn’t add the `%` prefix and suffix, but will need to.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on code in PR #25613:
URL: https://github.com/apache/airflow/pull/25613#discussion_r942123509


##########
airflow/www/static/js/types/api-generated.ts:
##########
@@ -3604,7 +3604,10 @@ export interface operations {
          * *New in version 2.1.0*
          */
         order_by?: components["parameters"]["OrderBy"];
-        /** If set, only return datasets with uris matching this pattern. */
+        /**
+         * If set, only return datasets with URIs matching this pattern using a string pattern
+         * to be evaluated as a SQL-compliant case-insensive LIKE expression.

Review Comment:
   ```suggestion
            * to be evaluated as a SQL-compliant case-insensitive LIKE expression.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
dstandish commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1209488463

   > How do we handle compatibility issues for this?
   
   What do you mean?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] blag commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
blag commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1209769001

   I think I'm against this, but I can be argued otherwise.
   
   I'm a little skeptical that giving users the flexibility to specify the, effectively, globbing pattern is going to be super useful over just doing a containment search like the existing code does. Since APIs really shouldn't remove features, I kind of think we should wait for a user to request this before implementing it.
   
   However, the `_pattern` query param suffix indicates that the user should supply a _pattern_ for a URI, and the existing code just has them specifying a URI fragment, so that's not 100% ideal.
   
   But I'm also not a huge fan of putting SQL syntax into query parameters, or making this inconsistent with how [`dag_id_pattern` works](https://github.com/apache/airflow/blob/c954ce2126430ea3ebb56ac0a51a4762fb6d27c3/airflow/api_connexion/endpoints/dag_endpoint.py#L85). I would probably implement something more similar to globbing, and convert `*` at the beginning and/or end of the uri_pattern to `%` on the backend.
   
   I do think we should support exact matches though. What if we did something like how google works, and had more fuzzy matching on by default (eg: the current behavior), but allowed users to specify the URL in quotes for an exact match?
   
   Example:
   ```
   GET /api/v1/datasets?uri_pattern=s3  # Get all datasets that resolve to an s3 bucket (kinda sorta)
   
   GET /api/v1/datasets?uri_pattern=%22s3://...%22  # Get the dataset with URI exactly matching `s3://...`
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
dstandish commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1210031952

   > In practice, I'm skeptical that it really will. The only thing it might break is searches that users do where the search term itself is wrapped in quotes, and I suspect that that's already unlikely in URIs.
   > 
   > @dstandish What do you see it breaking?
   
   Oh I thought you were talking about removing percent later :) never mind


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bbovenzi commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
bbovenzi commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1211044196

   So the need for me was to be able to look up a dataset by its uri. Hence an exact match, this way worked but I think there's a better approach.
   
   This endpoint should stay as-is (hence why this PR is closed), and here `uri_pattern` is a filter
   But, we should change `/datasets/{id}` to `/datasets/{uri}` and do an exact match there. We do have precedence for this. Pool lookup is by `pool_name`, not id. I think we should avoid ever exposing numerical ids in the REST API or the UI.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bbovenzi commented on a diff in pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
bbovenzi commented on code in PR #25613:
URL: https://github.com/apache/airflow/pull/25613#discussion_r941589863


##########
airflow/api_connexion/endpoints/dataset_endpoint.py:
##########
@@ -70,7 +70,7 @@ def get_datasets(
     total_entries = session.query(func.count(Dataset.id)).scalar()
     query = session.query(Dataset)
     if uri_pattern:
-        query = query.filter(Dataset.uri.ilike(f"%{uri_pattern}%"))
+        query = query.filter(Dataset.uri.ilike(uri_pattern))

Review Comment:
   Could we add a description in the API yaml docs to tell users to add `%` for less strict equality lookups?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on a diff in pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
dstandish commented on code in PR #25613:
URL: https://github.com/apache/airflow/pull/25613#discussion_r941641148


##########
airflow/api_connexion/endpoints/dataset_endpoint.py:
##########
@@ -70,7 +70,7 @@ def get_datasets(
     total_entries = session.query(func.count(Dataset.id)).scalar()
     query = session.query(Dataset)
     if uri_pattern:
-        query = query.filter(Dataset.uri.ilike(f"%{uri_pattern}%"))
+        query = query.filter(Dataset.uri.ilike(uri_pattern))

Review Comment:
   i added some description for the API... please take a look



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
dstandish commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1210952644

   OK i close this one now since sounds like we will go another direction....


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
uranusjr commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1209077823

   How do we handle compatibility issues for this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] blag commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
blag commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1210032680

   No worries, apologies if I didn't explain what I was thinking very well. Cheers! :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ephraimbuddy commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1210296411

   Won't it be better to have a different query param as suggested by @jedcunningham [here](https://github.com/apache/airflow/pull/25411#pullrequestreview-1056035246), if we want exact match? Having users include `%` doesn't sound like a good UX. Converting `%` to `*` sounds ok but I vote to have this endpoint work like the `dag_id_pattern` instead


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] blag commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
blag commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1210929951

   > well, i would assume that @blag's suggestion would be to convert * to % and then just run ilike
   
   Yes, this. 👆
   
   But like I said, I think we should use a different query parameter for exact matches.
   
   In talking with @ashb, he was thinking that for the `/dataset/<dataset_id_or_uri>` endpoint, we could try to coerce `dataset_id_or_uri` to an integer, and if successful, grab the dataset with that `dataset_id`. If the coercion isn't successful, treat `dataset_id_or_uri` like a string and use exact matching to grab the dataset with that `uri`.
   
   The only issue with that is if users create datasets named after integers. In which case it will be impossible to query for them them by exact matching their `uri` in the API. But OTOH, it seems like a Bad Idea to name datasets as integers, so users probably won't do that in the first place.
   
   Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] blag commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
blag commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1210024220

   > > Since APIs really shouldn't remove features, I kind of think we should wait for a user to request this before implementing it.
   > 
   > If we were to add it later it would be a breaking change
   
   In practice, I'm skeptical that it really will. The only thing it might break is searches that users do where the search term itself is wrapped in quotes, and I suspect that that's already unlikely in URIs.
   
   @dstandish What do you see it breaking?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on code in PR #25613:
URL: https://github.com/apache/airflow/pull/25613#discussion_r942122925


##########
airflow/api_connexion/openapi/v1.yaml:
##########
@@ -1654,7 +1654,8 @@ paths:
             type: string
           required: false
           description: |
-            If set, only return datasets with uris matching this pattern.
+            If set, only return datasets with URIs matching this pattern using a string pattern
+            to be evaluated as a SQL-compliant case-insensive LIKE expression.

Review Comment:
   ```suggestion
               to be evaluated as a SQL-compliant case-insensitive LIKE expression.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
dstandish commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1210124087

   > > > How do we handle compatibility issues for this?
   > > 
   > > 
   > > What do you mean?
   > 
   > For users using this endpoint, what do they need to do to keep their usage to work? They now wouldn’t add the `%` prefix and suffix, but will need to.
   
   Ah i understand what you mean.  Well the thing is, this endpoint has never been in a release -- it's new in 2.4, so there is no backcompat issue.  It's just a question of us deciding what's the better behavior.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish commented on pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
dstandish commented on PR #25613:
URL: https://github.com/apache/airflow/pull/25613#issuecomment-1210131859

   so, given that backcompat is non-issue ... do you have an opinion on what's the right behavior? because we have mixed opinions on the team :) jed feels one way, i feel another, drew has a third way, and brent just wants his data :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] dstandish closed pull request #25613: User must supply the % in uri pattern

Posted by GitBox <gi...@apache.org>.
dstandish closed pull request #25613: User must supply the % in uri pattern
URL: https://github.com/apache/airflow/pull/25613


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org