Posted to reviews@spark.apache.org by stephend-realitymine <gi...@git.apache.org> on 2015/10/23 10:16:31 UTC

[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

GitHub user stephend-realitymine opened a pull request:

    https://github.com/apache/spark/pull/9245

    [SPARK-10947] [SQL] With schema inference from JSON into a Dataframe, add option to infer all primitive object types as strings

    
    Currently, when a schema is inferred from a JSON file using sqlContext.read.json, the primitive object types are inferred as string, long, boolean, etc.
    
    However, JSON itself does not enforce types, so an inferred type can be too specific, which can cause problems when merging DataFrame schemas.
    
    This pull request adds the option "primitivesAsString" to the JSON DataFrameReader. When set to true (the default is false), all primitive values are inferred as strings.
    
    Below is an example usage of this new functionality.
    ```
    val jsonDf = sqlContext.read.option("primitivesAsString", "true").json(primitiveFieldAndType)
    
    scala> jsonDf.printSchema()
    root
    |-- bigInteger: string (nullable = true)
    |-- boolean: string (nullable = true)
    |-- double: string (nullable = true)
    |-- integer: string (nullable = true)
    |-- long: string (nullable = true)
    |-- null: string (nullable = true)
    |-- string: string (nullable = true)
    ```
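
To make the schema-merging motivation concrete, here is a minimal sketch (not part of the pull request) that infers schemas for two batches whose fields carry different JSON types. The object name, the helper `inferSchema`, and the sample records `batchA` and `batchB` are hypothetical, and the sketch assumes a Spark build that includes the "primitivesAsString" option and a local master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.StructType

object PrimitivesAsStringSketch {
  // Hypothetical sample records: the same fields carry different JSON types per batch.
  val batchA = Seq("""{"id": 1, "active": true}""")
  val batchB = Seq("""{"id": "a1", "active": "yes"}""")

  // Infer a schema for one batch, optionally forcing all primitives to strings.
  def inferSchema(sqlContext: SQLContext, records: Seq[String], asString: Boolean): StructType = {
    val rdd = sqlContext.sparkContext.parallelize(records)
    sqlContext.read.option("primitivesAsString", asString.toString).json(rdd).schema
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is good enough for illustration.
    val conf = new SparkConf().setAppName("primitivesAsString-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    try {
      // Default inference: batch A yields boolean/long, batch B yields strings,
      // so the two schemas are expected to differ.
      val defaultsMatch =
        inferSchema(sqlContext, batchA, asString = false) ==
          inferSchema(sqlContext, batchB, asString = false)
      println(s"schemas match with default inference: $defaultsMatch")

      // With the option enabled, every primitive is inferred as a string,
      // so the two schemas are expected to agree.
      val stringsMatch =
        inferSchema(sqlContext, batchA, asString = true) ==
          inferSchema(sqlContext, batchB, asString = true)
      println(s"schemas match with primitivesAsString: $stringsMatch")
    } finally {
      sc.stop()
    }
  }
}
```

The trade-off is that numeric and boolean values arrive as strings, so any typed processing needs an explicit cast afterwards.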


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/RealityMineLtd/spark stephend-primitivesAsString

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9245.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9245
    
----
commit 79b68a886a3a6e324e682709af346e599b3efd57
Author: RealityMine Ltd Coordinator <re...@users.noreply.github.com>
Date:   2015-10-06T11:16:13Z

    Merge pull request #1 from apache/master
    
    Resyncing to master

commit ec74be6f92ef4940bacb9455989c473ed3a1539a
Author: Stephen De Gennaro <st...@realitymine.com>
Date:   2015-10-15T12:19:34Z

    SPARK-10947 Added option to json schema primativesAsString when true will infer primative types as strings

commit 8a879c80a87e1bf9fe17fd58b18bce36294bc17b
Author: Ewan Leith <ew...@realitymine.com>
Date:   2015-10-22T16:07:39Z

    Fixing spelling of primitive from primative

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

Posted by stephend-realitymine <gi...@git.apache.org>.
GitHub user stephend-realitymine closed the pull request at:

    https://github.com/apache/spark/pull/9245




[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

Posted by stephend-realitymine <gi...@git.apache.org>.
GitHub user stephend-realitymine commented on the pull request:

    https://github.com/apache/spark/pull/9245#issuecomment-150523833
  
    I am going to close this pull request and recreate it from a rebased master.




[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9245#issuecomment-150506040
  
    Can one of the admins verify this patch?

