
Posted to issues@spark.apache.org by "AIT OUFKIR (JIRA)" <ji...@apache.org> on 2018/02/23 14:00:00 UTC

[jira] [Updated] (SPARK-23495) Creating a json file using a dataframe Generates an issue

     [ https://issues.apache.org/jira/browse/SPARK-23495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

AIT OUFKIR updated SPARK-23495:
-------------------------------
    Summary: Creating a json file using a dataframe Generates an issue  (was: Creating a json file using a dataframe creates an issue)

> Creating a json file using a dataframe Generates an issue
> ---------------------------------------------------------
>
>                 Key: SPARK-23495
>                 URL: https://issues.apache.org/jira/browse/SPARK-23495
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: AIT OUFKIR
>            Priority: Major
>             Fix For: 2.1.0
>
>
> The issue happens when trying to create a JSON file from a DataFrame (see the code below):
> catis = ["CAT1","CAT2"]
> constis = ["CONST1","CONST2","CONST3"]
> datis = ["DAT1","DATE2","DATE3"]
> dictis = {'A':1, 'B':2}
> dummis = ['dum1','dumm2','dumm3']
> fifis = {'fifi1':1, 'fifi2':2, 'fifi3':3}
> khikhis = ['khikhi1','khikhi12','khikhi3','khikhi4']
> metadata_dump = dict(cati=catis, consti=constis, dati=datis, dicti=dictis, khikhi=khikhis, dummi=dummis, fifi=fifis)
> md = sqlContext.createDataFrame([metadata_dump]).collect()
> metadata = sqlContext.createDataFrame(md,['cati', 'consti', 'dati', 'dicti','khikhi', 'dummi', 'fifi'])
> metadata_path = "/mypath"
> metadata.write.mode('overwrite').json(metadata_path)
> This gives the following result:
> {"cati":["CAT1","CAT2"]
> ,"consti":["CONST1","CONST2","CONST3"]
> ,"dati":["DAT1","DATE2","DATE3"]
> ,"dicti":{"A":1,"B":2}
> ,"khikhi":["dum1","dumm2","dumm3"]
> ,"dummi":{"fifi2":2,"fifi3":3,"fifi1":1}
> ,"fifi":["khikhi1","khikhi12","khikhi3","khikhi4"]}
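> This is wrong: the khikhi, dummi and fifi values are shifted (khikhi holds the dummis list, dummi holds the fifis dict, and fifi holds the khikhis list).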
>  
> When I reorder metadata_dump so that the fifis dict is not its last entry, I get the correct result:
> {
> "cati":["CAT1","CAT2"]
> ,"consti":["CONST1","CONST2","CONST3"]
> ,"dati":["DAT1","DATE2","DATE3"]
> ,"dicti":{"A":1,"B":2}
> ,"dummi":["dum1","dumm2","dumm3"]
> ,"fifi":{"fifi2":2,"fifi3":3,"fifi1":1}
> ,"khikhi":["khikhi1","khikhi12","khikhi3","khikhi4"]
> }
>  
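The shifted values in the first result can be reproduced without Spark. The sketch below rests on two assumptions about PySpark 2.x that the report itself does not state: when a schema is inferred from a dict, the columns come out in sorted key order; and when a plain list of names is passed as the `schema` argument of `createDataFrame`, the inferred columns are renamed by position rather than matched by name. `infer_columns` and `rename_positionally` are hypothetical helpers that only model those two steps.

```python
# Spark-free simulation of two ASSUMED PySpark 2.x behaviours:
#   1. schema inference from a dict orders columns by sorted key name;
#   2. a list of names passed as `schema` renames the inferred columns by position.
# infer_columns and rename_positionally are hypothetical helpers, not Spark APIs.

def infer_columns(record):
    """Return (name, value) pairs in sorted key order, like the assumed dict inference."""
    return sorted(record.items())

def rename_positionally(columns, names):
    """Attach each new name to the value at the same position; old names are ignored."""
    return dict(zip(names, (value for _, value in columns)))

metadata_dump = dict(
    cati=["CAT1", "CAT2"],
    consti=["CONST1", "CONST2", "CONST3"],
    dati=["DAT1", "DATE2", "DATE3"],
    dicti={"A": 1, "B": 2},
    khikhi=["khikhi1", "khikhi12", "khikhi3", "khikhi4"],
    dummi=["dum1", "dumm2", "dumm3"],
    fifi={"fifi1": 1, "fifi2": 2, "fifi3": 3},
)

# The names list used in the second createDataFrame call of the report.
names = ["cati", "consti", "dati", "dicti", "khikhi", "dummi", "fifi"]
shifted = rename_positionally(infer_columns(metadata_dump), names)
print(shifted["khikhi"])  # ['dum1', 'dumm2', 'dumm3']

# Supplying the names in sorted order keeps every value under its own key.
fixed = rename_positionally(infer_columns(metadata_dump), sorted(names))
```

If these assumptions hold, the names list in the report places khikhi, dummi, fifi where the sorted columns are dummi, fifi, khikhi, which yields exactly the shift in the first result; listing the names in sorted order keeps each value under its own key.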



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
