Posted to users@zeppelin.apache.org by Ryan <fr...@gmail.com> on 2015/09/28 16:29:36 UTC

Help with loading a CSV using Spark-SQL & Spark-CSV

Hi,

In a Zeppelin notebook, I am trying to load a CSV using the spark-csv
package by Databricks. I am running Zeppelin on the Hortonworks sandbox.
Unfortunately, the methods I have been trying have not been working.

My latest attempt is:
%dep
z.load("com.databricks:spark-csv_2.10:1.2.0")
%spark
val crimeData = "hdfs://sandbox.hortonworks.com:8020/user/root/data/crime_incidents_2013_CSV.csv"
sqlContext.load("hdfs://sandbox.hortonworks.com:8020/user/root/data/crime_incidents_2013_CSV.csv",
  Map("path" -> crimeData, "header" -> "true")).registerTempTable("crimes")

This is the error I receive:
<console>:16: error: not found: value sqlContext
sqlContext.load("hdfs://sandbox.hortonworks.com:8020/user/root/data/crime_incidents_2013_CSV.csv",
  Map("path" -> crimeData, "header" -> "true")).registerTempTable("crimes")
^
<console>:12: error: not found: value %
%spark
^
Thank you for any help in advance,
Ryan

Re: Help with loading a CSV using Spark-SQL & Spark-CSV

Posted by Felix Cheung <fe...@hotmail.com>.
Do you have the %spark line in the middle of a notebook paragraph ("box")? It should only appear at the very beginning of a paragraph.
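In other words, each interpreter directive has to be the first line of its own paragraph. A layout along these lines (a sketch reusing the artifact and path from this thread) avoids the "not found: value %" error:

```scala
// Zeppelin paragraph 1 -- dependency loading (must run before the Spark interpreter starts):
%dep
z.load("com.databricks:spark-csv_2.10:1.2.0")

// Zeppelin paragraph 2 -- a separate notebook paragraph, with %spark on its first line:
%spark
val crimeData = "hdfs://sandbox.hortonworks.com:8020/user/root/data/crime_incidents_2013_CSV.csv"
sqlc.load("com.databricks.spark.csv", Map("path" -> crimeData, "header" -> "true"))
  .registerTempTable("crimes")
```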



Re: Help with loading a CSV using Spark-SQL & Spark-CSV

Posted by Alexander Bezzubov <ab...@nflabs.com>.
Hi,

It's really hard to say more without looking at the logs of the Zeppelin
server and the Spark interpreter in your case.

The way you are doing it seems right, and I have had no problems using it
exactly the same way to read CSV, except that I never used %spark
explicitly; I always made sure that Spark is the first interpreter binding
on the list (you can drag and drop bindings to reorder), so it becomes the
default and there is no need to type %spark.

Can you try that out? If it still does not work, it would be best to create
an issue in JIRA with the logs attached. It might also be worth posting a
link there to the particular notebook, e.g. by using something like
https://www.zeppelinhub.com/viewer/, to verify the paragraph structure.

Hope this helps!
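Once the CSV loads and the temp table from the original post is registered, it can be queried from the same Spark interpreter. A small sketch (the table name "crimes" comes from this thread):

```scala
// Query the temp table registered earlier via registerTempTable("crimes")
val crimes = sqlc.sql("SELECT COUNT(*) FROM crimes")
crimes.show()  // display the result in the paragraph output
```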

--
Kind regards,
Alexander.

Re: Help with loading a CSV using Spark-SQL & Spark-CSV

Posted by Ryan <fr...@gmail.com>.
Any updates on this? Or are there any tutorials that successfully integrate
spark-csv into Zeppelin? If I can rule out the code as the problem, I can
start looking into the install to see what's going wrong.

Thanks,
Ryan


Re: Help with loading a CSV using Spark-SQL & Spark-CSV

Posted by Ryan <fr...@gmail.com>.
Hi Alex,

Thank you for getting back to me!

The tutorial code was a bit confusing and made it seem like sqlContext was
the proper variable to use:
// Zeppelin creates and injects sc (SparkContext) and sqlContext
(HiveContext or SqlContext)

I tried what you suggested, but I am still getting similar errors. Here is
the code I tried:

%dep
z.reset()
z.load("com.databricks:spark-csv_2.10:1.2.0")

%spark
val crimeData = "hdfs://sandbox.hortonworks.com:8020/user/root/data/crime_incidents_2013_CSV.csv"
sqlc.load("com.databricks.spark.csv", Map("path" -> crimeData, "header" -> "true"))
  .registerTempTable("crimes")

The %spark interpreter is bound in the settings. I clicked save again to
make sure, then ran it again. I am getting this error:

<console>:17: error: not found: value sqlc
sqlc.load("com.databricks.spark.csv", Map("path" -> crimeData, "header" -> "true")).registerTempTable("crimes")
^
<console>:13: error: not found: value %
%spark
Could it be something to do with my Zeppelin installation? The tutorial
code ran without any issues though.

Thanks!
Ryan




Re: Help with loading a CSV using Spark-SQL & Spark-CSV

Posted by Alexander Bezzubov <bz...@apache.org>.
Hi,

thank you for your interest in Zeppelin!

A couple of things I noticed: as you probably already know, the %dep and
%spark parts should always be in separate paragraphs.

%spark already exposes the SQL context through the `sqlc` variable, so you
should use sqlc.load("...") instead.

And of course, to be able to use the %spark interpreter in the notebook,
you need to make sure it is bound (cog button, at the top right).
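Putting those pieces together, a corrected %spark paragraph might look like the sketch below (the HDFS path is the one from the original post; spark-csv still has to be loaded in a separate %dep paragraph beforehand):

```scala
%spark
// Zeppelin injects the SQL context as `sqlc` in Spark paragraphs
val crimeData = "hdfs://sandbox.hortonworks.com:8020/user/root/data/crime_incidents_2013_CSV.csv"
// The first argument names the data source; the file path goes in the options map
sqlc.load("com.databricks.spark.csv", Map("path" -> crimeData, "header" -> "true"))
  .registerTempTable("crimes")
```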

Hope this helps!

--
Kind regards,
Alex

