You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/25 07:25:00 UTC

[GitHub] [airflow] erparas opened a new issue #15001: S3MultipleKeysSensor operator

erparas opened a new issue #15001:
URL: https://github.com/apache/airflow/issues/15001


   **Description**
   
   Currently we have an operator, S3KeySensor which polls for the given prefix in the bucket. At times, there is need to poll for multiple prefixes in given bucket in one go. To have that - I propose to have a S3MultipleKeysSensor, which would poll for multiple prefixes in the given bucket in one go.
   
   **Use case / motivation**
   
   To make it easier for users to multiple S3 prefixes in a given bucket.
   
   **Are you willing to submit a PR?**
   
   Yes, I have an implementation ready for that.
   
   **Related Issues**
   
   NA
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed issue #15001: S3MultipleKeysSensor operator

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #15001:
URL: https://github.com/apache/airflow/issues/15001


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on issue #15001: S3MultipleKeysSensor operator

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #15001:
URL: https://github.com/apache/airflow/issues/15001#issuecomment-806509789


   Must we have a new sensor for it?
   Couldn't the current [S3PrefixSensor](https://github.com/apache/airflow/blob/775ee51d0e58aeab5d29683dd2ff21b8c9057095/airflow/providers/amazon/aws/sensors/s3_prefix.py#L36) be enhanced by changing `prefix` to accept `str` or `List[str]` that way it also backward compatible.
   
   WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] SevakAvetGD commented on issue #15001: S3MultipleKeysSensor operator

Posted by GitBox <gi...@apache.org>.
SevakAvetGD commented on issue #15001:
URL: https://github.com/apache/airflow/issues/15001#issuecomment-818151138


   I'd more towards modifying existing operator and making it accept list of prefixes.
   Also, the expected behaviour is that poke() will return True only if all keys exist, right?
   Would that be useful to have sort of any(..) (when at least one key exist) instead of all(..)?
   
   Also, this can be implemented as such:
    ```
   return all(hook.check_for_key(key, self.bucket_name) for key in key_list)
    ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] AmarEL commented on issue #15001: S3MultipleKeysSensor operator

Posted by GitBox <gi...@apache.org>.
AmarEL commented on issue #15001:
URL: https://github.com/apache/airflow/issues/15001#issuecomment-807891419


   We created the same sensor here to look for specific keys in the s3.
   In our case, since the prefix was different, we did it on top of the `check_for_key` from s3 hook class (The same used by the S3KeySensor).
   ```
           key_list = context['task_instance'].xcom_pull(task_ids='get_s3_keys_of_extracted_tables', key='s3_parquet_keys')
           hook = S3Hook(aws_conn_id=self.aws_conn_id, verify=self.verify)
           for key in key_list:
               self.log.info('Poking for key: s3://%s/%s', self.bucket_name, key)
               if not hook.check_for_key(key, self.bucket_name):
                   return False
   ```
   
   Make `S3KeySensor` to accept str or List[str] would work too.
   Btw, apply this behavior for the three current sensor for s3 would be nice


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] AmarEL edited a comment on issue #15001: S3MultipleKeysSensor operator

Posted by GitBox <gi...@apache.org>.
AmarEL edited a comment on issue #15001:
URL: https://github.com/apache/airflow/issues/15001#issuecomment-807891419


   We created the same sensor here to look for specific keys in the s3.
   In our case, since the prefix was different, we did it on top of the `check_for_key` from the s3 hook class (The same used by the S3KeySensor).
   ```
           key_list = context['task_instance'].xcom_pull(task_ids='get_s3_keys_of_extracted_tables', key='s3_parquet_keys')
           hook = S3Hook(aws_conn_id=self.aws_conn_id, verify=self.verify)
           for key in key_list:
               self.log.info('Poking for key: s3://%s/%s', self.bucket_name, key)
               if not hook.check_for_key(key, self.bucket_name):
                   return False
           return True
   ```
   
   Make the `S3KeySensor` accept str or List[str] would work too.
   Btw, apply this behavior for the three current sensors for s3 would be nice.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] SevakAvetGD commented on issue #15001: S3MultipleKeysSensor operator

Posted by GitBox <gi...@apache.org>.
SevakAvetGD commented on issue #15001:
URL: https://github.com/apache/airflow/issues/15001#issuecomment-818150046


   I'd more towards modifying existing operator and making it accept list of prefixes.
   Also, the expected behaviour is that poke() will return True only if all keys exist, right?
   Would that be useful to have sort of any(..) (when at least one key exist) instead of all(..)?
   
   Also, this can be implemented as such:
    ```
   return all(hook.check_for_key(key, self.bucket_name) for key in key_list)
    ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] AmarEL edited a comment on issue #15001: S3MultipleKeysSensor operator

Posted by GitBox <gi...@apache.org>.
AmarEL edited a comment on issue #15001:
URL: https://github.com/apache/airflow/issues/15001#issuecomment-807891419


   We created the same sensor here to look for specific keys in the s3.
   In our case, since the prefix was different, we did it on top of the `check_for_key` from s3 hook class (The same used by the S3KeySensor).
   ```
           key_list = context['task_instance'].xcom_pull(task_ids='get_s3_keys_of_extracted_tables', key='s3_parquet_keys')
           hook = S3Hook(aws_conn_id=self.aws_conn_id, verify=self.verify)
           for key in key_list:
               self.log.info('Poking for key: s3://%s/%s', self.bucket_name, key)
               if not hook.check_for_key(key, self.bucket_name):
                   return False
           return True
   ```
   
   Make `S3KeySensor` to accept str or List[str] would work too.
   Btw, apply this behavior for the three current sensor for s3 would be nice


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #15001: S3MultipleKeysSensor operator

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #15001:
URL: https://github.com/apache/airflow/issues/15001#issuecomment-806426030


   Thanks for opening your first issue here! Be sure to follow the issue template!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] SevakAvetGD removed a comment on issue #15001: S3MultipleKeysSensor operator

Posted by GitBox <gi...@apache.org>.
SevakAvetGD removed a comment on issue #15001:
URL: https://github.com/apache/airflow/issues/15001#issuecomment-818150046


   I'd more towards modifying existing operator and making it accept list of prefixes.
   Also, the expected behaviour is that poke() will return True only if all keys exist, right?
   Would that be useful to have sort of any(..) (when at least one key exist) instead of all(..)?
   
   Also, this can be implemented as such:
    ```
   return all(hook.check_for_key(key, self.bucket_name) for key in key_list)
    ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] anaynayak commented on issue #15001: S3MultipleKeysSensor operator

Posted by GitBox <gi...@apache.org>.
anaynayak commented on issue #15001:
URL: https://github.com/apache/airflow/issues/15001#issuecomment-934527384


   I can pick this up.  Was thinking of the same approach as suggested by @SevakAvet 
   
   For the choice between `any` and `all` I believe it will depend on the use case. I can either get started with a simple `all` based PR or accept a Callable[[Dict[str, bool]],bool] to let the consumer decide based on the use case. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] SevakAvet commented on issue #15001: S3MultipleKeysSensor operator

Posted by GitBox <gi...@apache.org>.
SevakAvet commented on issue #15001:
URL: https://github.com/apache/airflow/issues/15001#issuecomment-818188105


   I'd more towards modifying existing operator and making it accept list of prefixes.
   Also, the expected behaviour is that poke() will return True only if all keys exist, right?
   Would that be useful to have sort of any(..) (when at least one key exist) instead of all(..)?
   
   Also, this can be implemented as such:
    ```
   return all(hook.check_for_key(key, self.bucket_name) for key in key_list)
    ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] SevakAvetGD removed a comment on issue #15001: S3MultipleKeysSensor operator

Posted by GitBox <gi...@apache.org>.
SevakAvetGD removed a comment on issue #15001:
URL: https://github.com/apache/airflow/issues/15001#issuecomment-818151138


   I'd more towards modifying existing operator and making it accept list of prefixes.
   Also, the expected behaviour is that poke() will return True only if all keys exist, right?
   Would that be useful to have sort of any(..) (when at least one key exist) instead of all(..)?
   
   Also, this can be implemented as such:
    ```
   return all(hook.check_for_key(key, self.bucket_name) for key in key_list)
    ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org