Posted to user@spark.apache.org by Yeikel <em...@yeikel.com> on 2019/02/22 22:15:43 UTC

How can I parse an "unnamed" json array present in a column?

I have an "unnamed" json array stored in a *column*.  

The format is the following:

column name: news

Data:

[
  {
    "source": "source1",
    "name": "News site1"
  },
  {
    "source": "source2",
    "name": "News site2"
  }
]


Ideally, I'd like to parse it as:

news ARRAY<struct<source:string, name:string>>

I've tried the following:

import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._

val entry = scala.io.Source.fromFile("1.txt").mkString

val ds = Seq(entry).toDF("news")

val schema = Array(new StructType().add("name", StringType).add("source", StringType))

ds.select(from_json($"news", schema) as "news_parsed").show(false)

But this is not allowed:

found   : Array[org.apache.spark.sql.types.StructType]
required: org.apache.spark.sql.types.StructType


I also tried passing the following schema:

val schema = StructType(new StructType().add("name", StringType).add("source", StringType))

But this only parsed the first record:

+--------------------+
|news_parsed         |
+--------------------+
|[News site1,source1]|
+--------------------+


I am aware that if I fix the JSON like this:

{
  "news": [
    {
      "source": "source1",
      "name": "News site1"
    },
    {
      "source": "source2",
      "name": "News site2"
    }
  ]
}

The parsing works as expected, but I would like to avoid doing that if
possible.

Another approach that I can think of is to map over it and parse it using a
third-party library like Gson, but I am not sure if this is any better
than fixing the JSON beforehand.
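
Something like this, roughly (untested; the News case class and the use of
Gson and as[String] here are just placeholders for the idea):

import com.google.gson.Gson

case class News(source: String, name: String)

// parse each row's raw array string with Gson inside map()
// (relies on the import spark.implicits._ above for the encoders)
val parsed = ds.as[String].map { raw =>
  new Gson().fromJson(raw, classOf[Array[News]])
}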

I am running Spark 2.1





RE: How can I parse an "unnamed" json array present in a column?

Posted by em...@yeikel.com.
Individual columns are small, but the table contains millions of rows with this problem. I am probably overthinking this, and I will implement the workaround for now.

Thank you for your help, @Magnus Nilsson. I will try that for now and wait for the upgrade.

 


Re: How can I parse an "unnamed" json array present in a column?

Posted by Magnus Nilsson <ma...@kth.se>.
Well, I'm guessing the file is small enough so you don't have any memory
issues.

If you're using Spark to read the file, use the org.apache.spark.sql.functions.concat
function. If you use plain Scala, use the String method concat.

import org.apache.spark.sql.functions.{concat, from_json, lit}

val prepend = """{"next":"""
val append = "}"

// concat takes Columns, so the literal strings need to be wrapped in lit()
df.select(concat(lit(prepend), $"rawStringFromFileColumn", lit(append)) as "values")

or

val df = Seq(prepend.concat(rawStringFromFile).concat(append)).toDF("values")

val df2 = df.select(from_json($"values", schema))

Haven't tried it, might be a comma wrong somewhere.

Extracting the function, compiling it and using in your own library you
mean? I wouldn't even bother looking into it to be honest. Might work,
might not. Might have to pull in half of spark to compile it, might not. Is
it really worth your time investigating when the workaround is so simple
and you know it will be fixed once you upgrade to a newer Spark version?
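
For the schema itself on 2.1, the root then has to be a struct whose single
field matches the prepended attribute ("next" above). Untested sketch:

import org.apache.spark.sql.types._

// root struct wrapping the array, matching the prepended {"next": ... }
val schema = new StructType().add("next",
  ArrayType(new StructType().add("name", StringType).add("source", StringType)))

val df2 = df.select(from_json($"values", schema) as "parsed")
df2.select($"parsed.next" as "news").show(false)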



RE: How can I parse an "unnamed" json array present in a column?

Posted by em...@yeikel.com.
Unfortunately, I can't change the source system, so changing the JSON at runtime is the best I can do right now.

Is there any preferred way to modify the String other than a UDF or a map on the string?

At the moment I am modifying it and returning a generic type "T", so I can use the same UDF for many different JSONs that have the same issue.

Also, is there any advantage (if possible) to extracting the function from the original source code and running it on an older version of Spark?
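
For reference, the wrapping UDF I mean is essentially this (names are
placeholders):

import org.apache.spark.sql.functions.udf

// wrap the unnamed top-level array under a root attribute
val wrap = udf((raw: String) => s"""{"root":$raw}""")
val wrapped = df.withColumn("news_wrapped", wrap($"news"))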

 

 


Re: How can I parse an "unnamed" json array present in a column?

Posted by Magnus Nilsson <ma...@kth.se>.
That's a bummer. If you're unable to upgrade to Spark 2.3+, your best bet is
probably to prepend/append the JSON-array string and locate the JSON array
as the value of a root attribute in a JSON document (as in your first
workaround). I mean, it's such an easy and safe fix, you can still do it
even if you stream the file.

Even better, make the source system create a JSON Lines file instead of a
JSON array if possible.
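
That is one JSON object per line, which Spark's JSON reader handles
natively, e.g.:

{"source": "source1", "name": "News site1"}
{"source": "source2", "name": "News site2"}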

When I use Datasets (Tungsten) I basically try to stay there and use the
available column functions unless I have no choice but to serialize and run
custom advanced calculations/parsing. In your case, just modifying the
string and using the tested from_json function beats the available
alternatives if you ask me.



RE: How can I parse an "unnamed" json array present in a column?

Posted by em...@yeikel.com.
What you suggested works in Spark 2.3, but in the version that I am using (2.1) it produces the following compile error:

found   : org.apache.spark.sql.types.ArrayType
required: org.apache.spark.sql.types.StructType
       ds.select(from_json($"news", schema) as "news_parsed").show(false)

Is it viable/possible to export a function from 2.3 to 2.1? What other options do I have?

Thank you.

 

 


Re: How can I parse an "unnamed" json array present in a column?

Posted by Magnus Nilsson <ma...@kth.se>.
Use org.apache.spark.sql.types.ArrayType instead of a Scala Array as the
root type when you define the schema and it will work.
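
Roughly (untested):

import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

// ArrayType at the root of the schema; note that from_json only accepts
// an ArrayType on newer Spark versions (2.3 works, per this thread; 2.1
// only takes a StructType)
val schema = ArrayType(
  new StructType().add("name", StringType).add("source", StringType))

ds.select(from_json($"news", schema) as "news_parsed").show(false)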

Regards,

Magnus
