Posted to commits@airflow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/05/03 11:06:00 UTC

[jira] [Commented] (AIRFLOW-6481) SalesforceHook attempts to use .str accessor on object dtype

    [ https://issues.apache.org/jira/browse/AIRFLOW-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098361#comment-17098361 ] 

ASF GitHub Bot commented on AIRFLOW-6481:
-----------------------------------------

potiuk commented on pull request #7703:
URL: https://github.com/apache/airflow/pull/7703#issuecomment-623092386


   We are trying to minimise cherry-picking now. Next week, hopefully, we are going to release backport operators, so you will be able to install the airflow-backported-salesforce operators and use the operators from the new providers package in the previous 1.10.* Airflow (in parallel with the operators from 1.10). Would that be a better solution @jeffolsi?




> SalesforceHook attempts to use .str accessor on object dtype
> ------------------------------------------------------------
>
>                 Key: AIRFLOW-6481
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6481
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>    Affects Versions: 1.10.7
>            Reporter: Teddy Hartanto
>            Assignee: Teddy Hartanto
>            Priority: Minor
>             Fix For: 2.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I've searched through Airflow's issues and couldn't find any report regarding this. I wonder if I'm the only one who's facing this? 
> {noformat}
> pandas version: 0.24.2{noformat}
> *Bug description*
> I'm using the SalesforceHook to fetch data from Salesforce and I encountered this exception:
> {code:java}
> AttributeError: ('Can only use .str accessor with string values, which use np.object_ dtype in pandas', ...)
> {code}
> The root of the problem is that some objects in Salesforce have a column with a compound data type. E.g., a User's address is returned as a Python dict:
> {code:java}
> <class 'dict'>: {'city': None, 'country': 'my', 'geocodeAccuracy': None, 'latitude': None, 'longitude': None, 'postalCode': None, 'state': None, 'street': None}{code}
> The problematic code is here:
> {code:python}
> if fmt == "csv":
>     # there are also a ton of newline objects
>     # that mess up our ability to write to csv
>     # we remove these newlines so that the output is a valid CSV format
>     self.log.info("Cleaning data and writing to CSV")
>     possible_strings = df.columns[df.dtypes == "object"]
>     df[possible_strings] = df[possible_strings].apply(
>         lambda x: x.str.replace("\r\n", "")
>     )
>     df[possible_strings] = df[possible_strings].apply(
>         lambda x: x.str.replace("\n", "")
>     )
>     # write the dataframe
>     df.to_csv(filename, index=False)
> {code}
> Because a Series containing Python dicts also has dtype object, such columns are assumed to be "possible_strings". Then, when .str is called on that Series, the exception is thrown.
> To fix it, we could explicitly cast the object columns to string, as follows:
> {code:python}
> if fmt == "csv":
>     ...
>     df[possible_strings] = df[possible_strings].astype(str).apply(
>         lambda x: x.str.replace("\r\n", "")
>     )
>     df[possible_strings] = df[possible_strings].astype(str).apply(
>         lambda x: x.str.replace("\n", "")
>     )
> {code}
> I've tested this and it works for me. Could somebody help me verify that the type conversion is indeed needed? If yes, I'm keen to submit a PR to fix this with a unit test included.
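
For anyone who wants to try the proposed cast outside of Airflow, here is a minimal standalone sketch. The helper name strip_newlines and the sample data are hypothetical and are not part of SalesforceHook; the sketch only assumes pandas is installed. It casts each object-dtype column to str before using the .str accessor, which is the idea suggested in the issue description.

```python
import pandas as pd


def strip_newlines(df):
    """Return a copy of df with CR/LF removed from every object-dtype column.

    Hypothetical helper for illustration; not the actual SalesforceHook code.
    """
    cleaned = df.copy()
    # Decide up front which columns to touch, as the hook does.
    object_columns = cleaned.columns[cleaned.dtypes == "object"]
    for col in object_columns:
        # Cast first so dicts and other non-string values become strings,
        # then remove newlines literally (regex=False avoids regex parsing).
        cleaned[col] = (
            cleaned[col]
            .astype(str)
            .str.replace("\r\n", "", regex=False)
            .str.replace("\n", "", regex=False)
        )
    return cleaned


# Made-up sample: "Address" mimics a Salesforce compound field held as a dict.
sample = pd.DataFrame(
    {
        "Name": ["Ann\r\nLee", "Bob\n"],
        "Address": [{"country": "my"}, {"country": "sg"}],
    },
    dtype=object,
)
print(strip_newlines(sample))
```

One trade-off to keep in mind: after the cast, a dict ends up in the CSV as its Python repr, and depending on the pandas version, missing values may be written as literal text such as "None". Whether that is acceptable depends on how the CSV is consumed downstream.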


