Posted to user@spark.apache.org by ayan guha <gu...@gmail.com> on 2017/07/30 13:34:04 UTC

OrderedDict to DF

Hi

I have an OrderedDict in Python, and I would like to convert it to a
DataFrame with the columns in the same order.

from collections import OrderedDict

# Two records whose keys are already in the column order I need
rows = [
    OrderedDict([
        (u'MID', 15784879),
        (u'START_DATE', u'1983-06-16 00:00:00'),
        (u'END_DATE', u'1984-01-31 00:00:00'),
        (u'AUDIT_ID', u'16994174'),
        (u'AUDIT_TIMESTAMP', u'2011-05-19 14:01:16.761979000 +10:00'),
    ]),
    OrderedDict([
        (u'MID', 15784879),
        (u'START_DATE', u'1984-02-01 00:00:00'),
        (u'END_DATE', u'1995-10-09 00:00:00'),
        (u'AUDIT_ID', u'16994174'),
        (u'AUDIT_TIMESTAMP', u'2011-05-19 14:01:16.760966000 +10:00'),
    ]),
]

print(rows)

# `spark` is assumed to be an existing SparkSession (e.g. in the pyspark shell)
df = spark.sparkContext.parallelize(rows).toDF()

df.printSchema()

[OrderedDict([(u'MID', 15784879), (u'START_DATE', u'1983-06-16 00:00:00'),
(u'END_DATE', u'1984-01-31 00:00:00'), (u'AUDIT_ID', u'16994174'),
(u'AUDIT_TIMESTAMP', u'2011-05-19 14:01:16.761979000 +10:00')]),
OrderedDict([(u'MID', 15784879), (u'START_DATE', u'1984-02-01 00:00:00'),
(u'END_DATE', u'1995-10-09 00:00:00'), (u'AUDIT_ID', u'16994174'),
(u'AUDIT_TIMESTAMP', u'2011-05-19 14:01:16.760966000 +10:00')])]
root
|-- AUDIT_ID: string (nullable = true)
|-- AUDIT_TIMESTAMP: string (nullable = true)
|-- END_DATE: string (nullable = true)
|-- MID: long (nullable = true)
|-- START_DATE: string (nullable = true)

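As the schema shows, the columns come back sorted alphabetically rather
than in the OrderedDict's order. Is there any way to keep the original
order?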

I can choose whether to use an OrderedDict or a normal dict, but preserving
the column order is a requirement. Any help would be great!
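
One direction that might work, as a minimal sketch only: turn each
OrderedDict into a tuple and pass the key order explicitly as the column
names, assuming every record has the same keys as the first one. The
helper name to_rows_and_columns is hypothetical, `spark` is assumed to be
an existing SparkSession, and the sample records are shortened to three
fields. The Spark calls are left commented out so the snippet runs
without a cluster.

```python
from collections import OrderedDict


def to_rows_and_columns(records):
    """Split a list of OrderedDicts into (rows, columns), keeping key order.

    Hypothetical helper; assumes every record has the keys of the first one.
    """
    if not records:
        return [], []
    # Column order is taken from the first record's key order
    columns = list(records[0].keys())
    # Build each row by looking up values in that fixed column order
    rows = [tuple(rec[col] for col in columns) for rec in records]
    return rows, columns


records = [
    OrderedDict([
        (u'MID', 15784879),
        (u'START_DATE', u'1983-06-16 00:00:00'),
        (u'END_DATE', u'1984-01-31 00:00:00'),
    ]),
    OrderedDict([
        (u'MID', 15784879),
        (u'START_DATE', u'1984-02-01 00:00:00'),
        (u'END_DATE', u'1995-10-09 00:00:00'),
    ]),
]

rows, columns = to_rows_and_columns(records)

# Assumption: `spark` is an existing SparkSession.
# df = spark.createDataFrame(rows, columns)
# df.printSchema()
```

Since tuples carry no names, the column order here comes only from the
list passed as the schema, so nothing should get re-sorted by key.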

-- 
Best Regards,
Ayan Guha