Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/10 13:44:31 UTC
[GitHub] [airflow] hyangminj opened a new issue, #28830: DynamoDB
hyangminj opened a new issue, #28830:
URL: https://github.com/apache/airflow/issues/28830
### Description
Airflow provides the Amazon DynamoDB to Amazon S3 transfer operator documented below.
https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/operators/transfer/dynamodb_to_s3.html
Most data engineers build their "export DDB data to S3" pipelines using an export from within the point-in-time recovery (PITR) window.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.export_table_to_point_in_time
I would appreciate it if Airflow offered this as a native feature.
### Use case/motivation
My daily batch job exports its data with the PITR option. All of its tasks are written with apache-airflow-providers-amazon except the "export_table_to_point_in_time" task.
The "export_table_to_point_in_time" task is the only one that uses the Python operator. I would like to be able to implement every task with the apache-airflow-providers-amazon library.
### Related issues
_No response_
### Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] utkarsharma2 commented on issue #28830: Export DynamoDB table to S3 with PITR
Posted by "utkarsharma2 (via GitHub)" <gi...@apache.org>.
utkarsharma2 commented on issue #28830:
URL: https://github.com/apache/airflow/issues/28830#issuecomment-1498441140
@eladkal @Taragolis I would like to work on this.
[GitHub] [airflow] hyangminj commented on issue #28830: Export DynamoDB table to S3 with PITR
Posted by GitBox <gi...@apache.org>.
hyangminj commented on issue #28830:
URL: https://github.com/apache/airflow/issues/28830#issuecomment-1380375817
Sure, I will create separate PRs for the operator and the sensors.
[GitHub] [airflow] potiuk closed issue #28830: Export DynamoDB table to S3 with PITR
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #28830: Export DynamoDB table to S3 with PITR
URL: https://github.com/apache/airflow/issues/28830
[GitHub] [airflow] hyangminj commented on issue #28830: Export DynamoDB table to S3 with PITR
Posted by GitBox <gi...@apache.org>.
hyangminj commented on issue #28830:
URL: https://github.com/apache/airflow/issues/28830#issuecomment-1378813813
Your code and explanation would be best practice.
[GitHub] [airflow] boring-cyborg[bot] commented on issue #28830: DynamoDB
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #28830:
URL: https://github.com/apache/airflow/issues/28830#issuecomment-1377307061
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] Taragolis commented on issue #28830: export DDB table to s3 with pitr
Posted by GitBox <gi...@apache.org>.
Taragolis commented on issue #28830:
URL: https://github.com/apache/airflow/issues/28830#issuecomment-1377370315
Feel free to make a PR.
[GitHub] [airflow] potiuk commented on issue #28830: Export DynamoDB table to S3 with PITR
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #28830:
URL: https://github.com/apache/airflow/issues/28830#issuecomment-1498673419
Feel free
[GitHub] [airflow] hyangminj commented on issue #28830: Export DynamoDB table to S3 with PITR
Posted by GitBox <gi...@apache.org>.
hyangminj commented on issue #28830:
URL: https://github.com/apache/airflow/issues/28830#issuecomment-1378766119
Thank you for your information.
However, the export job is not the final task in some data pipelines. It might be the first step of a DAG that analyzes the data.
So a waiting stage (sensors) might be needed.
For example, if you create an EMR job using [Create the Job Flow](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/operators/emr.html#create-the-job-flow), you need the "Wait on an Amazon EMR job flow state" or "Wait on an Amazon EMR step state" methods.
Anyway, I will prepare a PR for this feature and ask questions if needed.
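As a rough illustration of the waiting step described above, here is a minimal sketch that polls the boto3 `describe_export` call until the export finishes. The function name `wait_for_export` and its parameters are hypothetical and not part of the Amazon provider; `client` is assumed to be a boto3 DynamoDB client.

```python
import time


def wait_for_export(client, export_arn, poke_interval=30.0, max_attempts=60):
    """Poll `describe_export` until the export leaves the IN_PROGRESS state.

    Hypothetical helper, not an Airflow API. `client` is assumed to be a
    boto3 DynamoDB client; `export_arn` identifies a running export.
    """
    for _ in range(max_attempts):
        response = client.describe_export(ExportArn=export_arn)
        description = response["ExportDescription"]
        status = description["ExportStatus"]
        if status == "COMPLETED":
            return description
        if status == "FAILED":
            raise RuntimeError(f"Export {export_arn} failed")
        # Still IN_PROGRESS: wait before checking again.
        time.sleep(poke_interval)
    raise TimeoutError(f"Export {export_arn} did not finish in time")
```

The export ARN is available in the response of `export_table_to_point_in_time` under `response["ExportDescription"]["ExportArn"]`. A real sensor would likely move the status check into a `poke` method rather than looping with `time.sleep`.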
[GitHub] [airflow] Taragolis commented on issue #28830: Export DynamoDB table to S3 with PITR
Posted by GitBox <gi...@apache.org>.
Taragolis commented on issue #28830:
URL: https://github.com/apache/airflow/issues/28830#issuecomment-1379058219
You could create multiple PRs one by one, each implementing a separate part: one for the operator, one for the sensors. This is handy, especially if it is your first contribution: less time for review, less time for change requests.
[GitHub] [airflow] Taragolis commented on issue #28830: export DDB table to s3 with pitr
Posted by GitBox <gi...@apache.org>.
Taragolis commented on issue #28830:
URL: https://github.com/apache/airflow/issues/28830#issuecomment-1377388529
BTW, the hooks provided by the Amazon provider are basically just wrappers around `boto3` clients / resources.
So you can always access all `boto3` client methods through the hook:
```python
from datetime import datetime

from airflow.providers.amazon.aws.hooks.dynamodb import DynamoDBHook

hook = DynamoDBHook(aws_conn_id="awesome-connection-id", region_name="us-east-1")
# DynamoDBHook creates a `resource` as its high-level client,
# but you can reach the regular client through `meta.client`.
# This could potentially be unified in the future, see: https://github.com/apache/airflow/discussions/28560
client = hook.conn.meta.client
# The "string" values below are placeholders to replace with real values.
client.export_table_to_point_in_time(
    TableArn="string",
    ExportTime=datetime(2015, 1, 1),
    ClientToken="string",
    S3Bucket="string",
    S3BucketOwner="string",
    S3Prefix="string",
    S3SseAlgorithm="AES256",  # or "KMS"
    S3SseKmsKeyId="string",  # only relevant when S3SseAlgorithm is "KMS"
    ExportFormat="DYNAMODB_JSON",  # or "ION"
)
```