You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/20 23:49:48 UTC

[GitHub] [airflow] gmontanola opened a new pull request #10437: Fixes MySQLToS3 float to int conversion

gmontanola opened a new pull request #10437:
URL: https://github.com/apache/airflow/pull/10437


   The conversion wasn't being applied because the `astype` method needs to be used in a `pd.Series` object. This means all integer columns that have **NULL** values will be parsed as floats and the resulting file will have float formated numbers.
   
   Also, the Int64Dtype must be passed as an **instance** in order for it to work as intended.
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] stale[bot] commented on pull request #10437: Fixes MySQLToS3 float to int conversion

Posted by GitBox <gi...@apache.org>.
stale[bot] commented on pull request #10437:
URL: https://github.com/apache/airflow/pull/10437#issuecomment-706789481


   This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #10437: Fixes MySQLToS3 float to int conversion

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #10437:
URL: https://github.com/apache/airflow/pull/10437


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] gmontanola commented on pull request #10437: Fixes MySQLToS3 float to int conversion

Posted by GitBox <gi...@apache.org>.
gmontanola commented on pull request #10437:
URL: https://github.com/apache/airflow/pull/10437#issuecomment-711440948


   @feluelle 
   
   I added a simple test to check the **integer conversion**.
   I'm open to _feedback_ or improvements. I'm really not used to unit testing (but you love to improve :blue_heart:)
   
   ![image](https://user-images.githubusercontent.com/35972814/96388782-88e38e80-1181-11eb-97fa-52d173de76e1.png)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] feluelle commented on a change in pull request #10437: Fixes MySQLToS3 float to int conversion

Posted by GitBox <gi...@apache.org>.
feluelle commented on a change in pull request #10437:
URL: https://github.com/apache/airflow/pull/10437#discussion_r477173414



##########
File path: airflow/providers/amazon/aws/transfers/mysql_to_s3.py
##########
@@ -105,7 +105,8 @@ def _fix_int_dtypes(self, df):
                 notna_series = df[col].dropna().values
                 if np.isclose(notna_series, notna_series.astype(int)).all():
                     # set to dtype that retains integers and supports NaNs
-                    df[col] = np.where(df[col].isnull(), None, df[col]).astype(pd.Int64Dtype)
+                    df[col] = np.where(df[col].isnull(), None, df[col])
+                    df[col] = df[col].astype(pd.Int64Dtype())

Review comment:
       Don't we have a test for that? Maybe can you add one to see how it should look like after this?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #10437: Fixes MySQLToS3 float to int conversion

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10437:
URL: https://github.com/apache/airflow/pull/10437#issuecomment-711443444


   [The Workflow run](https://github.com/apache/airflow/actions/runs/314405054) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] gmontanola edited a comment on pull request #10437: Fixes MySQLToS3 float to int conversion

Posted by GitBox <gi...@apache.org>.
gmontanola edited a comment on pull request #10437:
URL: https://github.com/apache/airflow/pull/10437#issuecomment-711440948


   @feluelle 
   
   I added a simple test to check the **integer conversion**.
   I'm open to _feedback_ or improvements to the code. 
   
   Really not used to unit testing (but I would love to improve :blue_heart:), thanks for the opportunity :+1: 
   
   ![image](https://user-images.githubusercontent.com/35972814/96388782-88e38e80-1181-11eb-97fa-52d173de76e1.png)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] gmontanola commented on a change in pull request #10437: Fixes MySQLToS3 float to int conversion

Posted by GitBox <gi...@apache.org>.
gmontanola commented on a change in pull request #10437:
URL: https://github.com/apache/airflow/pull/10437#discussion_r477242112



##########
File path: airflow/providers/amazon/aws/transfers/mysql_to_s3.py
##########
@@ -105,7 +105,8 @@ def _fix_int_dtypes(self, df):
                 notna_series = df[col].dropna().values
                 if np.isclose(notna_series, notna_series.astype(int)).all():
                     # set to dtype that retains integers and supports NaNs
-                    df[col] = np.where(df[col].isnull(), None, df[col]).astype(pd.Int64Dtype)
+                    df[col] = np.where(df[col].isnull(), None, df[col])
+                    df[col] = df[col].astype(pd.Int64Dtype())

Review comment:
       The test for this operator **doesn't** cover this method (`_fix_int_dtypes`). So I'll add one.
   
   Thanks for the input!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] gmontanola edited a comment on pull request #10437: Fixes MySQLToS3 float to int conversion

Posted by GitBox <gi...@apache.org>.
gmontanola edited a comment on pull request #10437:
URL: https://github.com/apache/airflow/pull/10437#issuecomment-711440948


   @feluelle 
   
   I added a simple test to check the **integer conversion**.
   I'm open to _feedback_ or improvements. I'm really not used to unit testing (but I would love to improve :blue_heart:)
   
   ![image](https://user-images.githubusercontent.com/35972814/96388782-88e38e80-1181-11eb-97fa-52d173de76e1.png)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] HaloKo4 commented on a change in pull request #10437: Fixes MySQLToS3 float to int conversion

Posted by GitBox <gi...@apache.org>.
HaloKo4 commented on a change in pull request #10437:
URL: https://github.com/apache/airflow/pull/10437#discussion_r504236662



##########
File path: airflow/providers/amazon/aws/transfers/mysql_to_s3.py
##########
@@ -105,7 +105,8 @@ def _fix_int_dtypes(self, df):
                 notna_series = df[col].dropna().values
                 if np.isclose(notna_series, notna_series.astype(int)).all():
                     # set to dtype that retains integers and supports NaNs
-                    df[col] = np.where(df[col].isnull(), None, df[col]).astype(pd.Int64Dtype)
+                    df[col] = np.where(df[col].isnull(), None, df[col])
+                    df[col] = df[col].astype(pd.Int64Dtype())

Review comment:
       @gmontanola are you planning to finish it? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] gmontanola commented on a change in pull request #10437: Fixes MySQLToS3 float to int conversion

Posted by GitBox <gi...@apache.org>.
gmontanola commented on a change in pull request #10437:
URL: https://github.com/apache/airflow/pull/10437#discussion_r504240367



##########
File path: airflow/providers/amazon/aws/transfers/mysql_to_s3.py
##########
@@ -105,7 +105,8 @@ def _fix_int_dtypes(self, df):
                 notna_series = df[col].dropna().values
                 if np.isclose(notna_series, notna_series.astype(int)).all():
                     # set to dtype that retains integers and supports NaNs
-                    df[col] = np.where(df[col].isnull(), None, df[col]).astype(pd.Int64Dtype)
+                    df[col] = np.where(df[col].isnull(), None, df[col])
+                    df[col] = df[col].astype(pd.Int64Dtype())

Review comment:
       I totally forgot this. Well, I guess adding this test should be pretty straight forward. I'll try to add this this week. Are you willing to add this by yourself or found a better a way to fix the problem?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #10437: Fixes MySQLToS3 float to int conversion

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10437:
URL: https://github.com/apache/airflow/pull/10437#issuecomment-711443480


   [The Workflow run](https://github.com/apache/airflow/actions/runs/314405131) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org