Posted to issues@spark.apache.org by "Bjørn Jørgensen (Jira)" <ji...@apache.org> on 2021/09/12 11:03:00 UTC

[jira] [Updated] (SPARK-36728) Can't create datetime object from anything other than year column Pyspark - koalas

     [ https://issues.apache.org/jira/browse/SPARK-36728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bjørn Jørgensen updated SPARK-36728:
------------------------------------
    Attachment: pyspark_date.txt

> Can't create datetime object from anything other than year column Pyspark - koalas
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-36728
>                 URL: https://issues.apache.org/jira/browse/SPARK-36728
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Bjørn Jørgensen
>            Priority: Major
>         Attachments: pyspark_date.txt
>
>
> If I create a datetime object, it must be built from columns named year, month and day; columns with any other names fail.
>  
> df = ps.DataFrame({'year': [2015, 2016],
>                    'month': [2, 3],
>                    'day': [4, 5],
>                    'hour': [2, 3],
>                    'minute': [10, 30],
>                    'second': [21, 25]})
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 6 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
> dtypes: int64(6)
> df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
> df.info()
> <class 'pyspark.pandas.frame.DataFrame'>
> Int64Index: 2 entries, 1 to 0
> Data columns (total 7 columns):
>  #   Column  Non-Null Count  Dtype
> ---  ------  --------------  -----
>  0   year    2 non-null      int64
>  1   month   2 non-null      int64
>  2   day     2 non-null      int64
>  3   hour    2 non-null      int64
>  4   minute  2 non-null      int64
>  5   second  2 non-null      int64
>  6   date    2 non-null      datetime64
> dtypes: datetime64(1), int64(6)
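>
> For reference, plain pandas has the same column-name requirement: to_datetime only assembles a datetime from a DataFrame whose columns use the recognized unit names (year, month, day, and optionally hour, minute, second). A minimal sketch of that behaviour, assuming pandas is installed alongside PySpark:
>
> import pandas as pd
> pdf = pd.DataFrame({'year': [2015, 2016],
>                     'month': [2, 3],
>                     'day': [4, 5]})
> pd.to_datetime(pdf)  # works: the columns use the recognized unit names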
> df_test = ps.DataFrame({'testyear': [2015, 2016],
>                         'testmonth': [2, 3],
>                         'testday': [4, 5],
>                         'hour': [2, 3],
>                         'minute': [10, 30],
>                         'second': [21, 25]})
> df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> /tmp/ipykernel_73/904491906.py in <module>
> ----> 1 df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
>
> /opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)
>   11853             return self.loc[:, key]
>   11854         elif is_list_like(key):
> > 11855             return self.loc[:, list(key)]
>   11856         raise NotImplementedError(key)
>   11857
>
> /opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)
>     476                 returns_series,
>     477                 series_name,
> --> 478             ) = self._select_cols(cols_sel)
>     479
>     480             if cond is None and limit is None and returns_series:
>
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, missing_keys)
>     322             return self._select_cols_else(cols_sel, missing_keys)
>     323         elif is_list_like(cols_sel):
> --> 324             return self._select_cols_by_iterable(cols_sel, missing_keys)
>     325         else:
>     326             return self._select_cols_else(cols_sel, missing_keys)
>
> /opt/spark/python/pyspark/pandas/indexing.py in _select_cols_by_iterable(self, cols_sel, missing_keys)
>    1352                 if not found:
>    1353                     if missing_keys is None:
> -> 1354                         raise KeyError("['{}'] not in index".format(name_like_string(key)))
>    1355                     else:
>    1356                         missing_keys.append(key)
> KeyError: "['testyear'] not in index"
> df_test
>    testyear  testmonth  testday  hour  minute  second
> 0      2015          2        4     2      10      21
> 1      2016          3        5     3      30      25
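>
> A possible workaround until this is addressed: rename the columns to the unit names to_datetime recognizes before converting. A minimal sketch, assuming df_test is defined as above and only the renamed selection is passed to the conversion:
>
> unit_names = {'testyear': 'year', 'testmonth': 'month', 'testday': 'day'}
> renamed = df_test[['testyear', 'testmonth', 'testday']].rename(columns=unit_names)
> # If the assignment complains about combining different frames,
> # ps.set_option('compute.ops_on_diff_frames', True) may be needed first.
> df_test['date'] = ps.to_datetime(renamed)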



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org