Posted to issues@spark.apache.org by "Bjørn Jørgensen (Jira)" <ji...@apache.org> on 2021/09/12 11:01:00 UTC
[jira] [Created] (SPARK-36728) Can't create datetime object from anything other than year column Pyspark - koalas
Bjørn Jørgensen created SPARK-36728:
---------------------------------------
Summary: Can't create datetime object from anything other than year column Pyspark - koalas
Key: SPARK-36728
URL: https://issues.apache.org/jira/browse/SPARK-36728
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.3.0
Reporter: Bjørn Jørgensen
If I create a datetime object from DataFrame columns, the columns must be named 'year', 'month', 'day', and so on.
df = ps.DataFrame({'year': [2015, 2016], 'month': [2, 3], 'day': [4, 5], 'hour': [2, 3], 'minute': [10, 30], 'second': [21, 25]})
df.info()
<class 'pyspark.pandas.frame.DataFrame'>
Int64Index: 2 entries, 1 to 0
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   year    2 non-null      int64
 1   month   2 non-null      int64
 2   day     2 non-null      int64
 3   hour    2 non-null      int64
 4   minute  2 non-null      int64
 5   second  2 non-null      int64
dtypes: int64(6)
df['date'] = ps.to_datetime(df[['year', 'month', 'day']])
df.info()
<class 'pyspark.pandas.frame.DataFrame'>
Int64Index: 2 entries, 1 to 0
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   year    2 non-null      int64
 1   month   2 non-null      int64
 2   day     2 non-null      int64
 3   hour    2 non-null      int64
 4   minute  2 non-null      int64
 5   second  2 non-null      int64
 6   date    2 non-null      datetime64
dtypes: datetime64(1), int64(6)
df_test = ps.DataFrame({'testyear': [2015, 2016], 'testmonth': [2, 3], 'testday': [4, 5], 'hour': [2, 3], 'minute': [10, 30], 'second': [21, 25]})
df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_73/904491906.py in <module>
----> 1 df_test['date'] = ps.to_datetime(df[['testyear', 'testmonth', 'testday']])

/opt/spark/python/pyspark/pandas/frame.py in __getitem__(self, key)
  11853             return self.loc[:, key]
  11854         elif is_list_like(key):
> 11855             return self.loc[:, list(key)]
  11856         raise NotImplementedError(key)
  11857

/opt/spark/python/pyspark/pandas/indexing.py in __getitem__(self, key)
    476             returns_series,
    477             series_name,
--> 478         ) = self._select_cols(cols_sel)
    479
    480         if cond is None and limit is None and returns_series:

/opt/spark/python/pyspark/pandas/indexing.py in _select_cols(self, cols_sel, missing_keys)
    322             return self._select_cols_else(cols_sel, missing_keys)
    323         elif is_list_like(cols_sel):
--> 324             return self._select_cols_by_iterable(cols_sel, missing_keys)
    325         else:
    326             return self._select_cols_else(cols_sel, missing_keys)

/opt/spark/python/pyspark/pandas/indexing.py in _select_cols_by_iterable(self, cols_sel, missing_keys)
   1352         if not found:
   1353             if missing_keys is None:
-> 1354                 raise KeyError("['{}'] not in index".format(name_like_string(key)))
   1355             else:
   1356                 missing_keys.append(key)
KeyError: "['testyear'] not in index"
df_test
   testyear  testmonth  testday  hour  minute  second
0      2015          2        4     2      10      21
1      2016          3        5     3      30      25
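For comparison (not from the original report): plain pandas behaves the same way. pandas.to_datetime can only assemble datetimes from a DataFrame whose columns carry the unit names it recognizes ('year', 'month', 'day', ...), so one possible workaround is to rename the columns before the call. A minimal pandas sketch of that workaround, under the assumption that this parity is the intended behavior:

```python
import pandas as pd

# Columns with non-standard names, mirroring the df_test from the report
df_test = pd.DataFrame({'testyear': [2015, 2016],
                        'testmonth': [2, 3],
                        'testday': [4, 5]})

# pd.to_datetime(df) requires unit-named columns (year/month/day),
# so rename them first before assembling the datetimes
renamed = df_test.rename(columns={'testyear': 'year',
                                  'testmonth': 'month',
                                  'testday': 'day'})
dates = pd.to_datetime(renamed)
print(dates)
```

The same rename-first pattern works with pyspark.pandas once the frame uses the recognized column names.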
--
This message was sent by Atlassian Jira
(v8.3.4#803005)