Posted to user@spark.apache.org by Curtis Burkhalter <cu...@gmail.com> on 2017/06/06 14:45:45 UTC

problem initiating spark context with pyspark

Hello all,

I'm new to Spark and I'm trying to interact with it using PySpark. I'm
using the prebuilt version of Spark 2.1.1, and when I go to the command
line and run 'bin\pyspark' I hit initialization problems and get the
following message:

C:\spark\spark-2.1.1-bin-hadoop2.7> bin\pyspark
Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/06/06 10:30:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/06 10:30:21 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/06/06 10:30:21 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Traceback (most recent call last):
  File "C:\spark\spark-2.1.1-bin-hadoop2.7\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\spark\spark-2.1.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o22.sessionState.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
        at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
        at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
        at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
        ... 13 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
        at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
        at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
        at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
        at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
        at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
        at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
        at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
        ... 18 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
        ... 26 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
        at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:66)
        ... 31 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
        at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
        ... 39 more
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
        at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
        at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
        ... 40 more


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\spark\spark-2.1.1-bin-hadoop2.7\bin\..\python\pyspark\shell.py", line 43, in <module>
    spark = SparkSession.builder\
  File "C:\spark\spark-2.1.1-bin-hadoop2.7\python\pyspark\sql\session.py", line 179, in getOrCreate
    session._jsparkSession.sessionState().conf().setConfString(key, value)
  File "C:\spark\spark-2.1.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
  File "C:\spark\spark-2.1.1-bin-hadoop2.7\python\pyspark\sql\utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: "Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
>>>

Any help with what might be going wrong here would be greatly appreciated.

Best
-- 
Curtis Burkhalter
Postdoctoral Research Associate, National Audubon Society

https://sites.google.com/site/curtisburkhalter/

Re: problem initiating spark context with pyspark

Posted by Gourav Sengupta <go...@gmail.com>.
Generally I try to make the best of the amount of memory my system has for
computation. It might help to see how much memory Windows takes just to run
itself and then compare that with Ubuntu or any other Linux, Unix, or
Solaris system.

But I am not quite sure of the use case, of course.

Regards,
Gourav Sengupta

On Sat, Jun 10, 2017 at 11:29 PM, Felix Cheung <fe...@hotmail.com>
wrote:

> Curtis, assuming you are running a somewhat recent Windows version, you
> would not have access to C:\tmp in your command example:
>
> winutils.exe ls -F C:\tmp\hive
>
> Try changing the path to one under your user directory.
>
> Running Spark on Windows should work :)
>

Re: problem initiating spark context with pyspark

Posted by Felix Cheung <fe...@hotmail.com>.
Curtis, assuming you are running a somewhat recent Windows version, you would not have access to C:\tmp in your command example:

winutils.exe ls -F C:\tmp\hive

Try changing the path to one under your user directory.

Running Spark on Windows should work :)
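
As a concrete illustration of that suggestion, here is a minimal sketch of starting the session from plain Python with the scratch and warehouse directories pointed at a user-writable location. The paths are only examples, and whether this Spark build actually forwards hive.exec.scratchdir to its embedded Hive client this way is an assumption, not something confirmed in this thread:

from pyspark.sql import SparkSession

# Hypothetical user-writable locations; substitute your own profile path.
user_scratch = "C:/Users/curtis/tmp/hive"
user_warehouse = "C:/Users/curtis/spark-warehouse"

spark = (SparkSession.builder
         .appName("windows-smoke-test")
         # Keep the SQL warehouse out of the machine-wide \tmp.
         .config("spark.sql.warehouse.dir", user_warehouse)
         # Assumption: this key reaches the embedded Hive client.
         .config("hive.exec.scratchdir", user_scratch)
         .getOrCreate())

print(spark.range(10).count())  # quick sanity check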


Re: problem initiating spark context with pyspark

Posted by Marco Mistroni <mm...@gmail.com>.
Ha... it's a one-off. I run Spark on Ubuntu and Docker on Windows; I
don't think Spark and Windows are best friends. 😀

On Jun 10, 2017 6:36 PM, "Gourav Sengupta" <go...@gmail.com>
wrote:

> seeing for the very first time someone try SPARK on Windows :)

Re: problem initiating spark context with pyspark

Posted by Gourav Sengupta <go...@gmail.com>.
seeing for the very first time someone try SPARK on Windows :)

On Thu, Jun 8, 2017 at 8:38 PM, Marco Mistroni <mm...@gmail.com> wrote:

> Try this link:
>
> http://letstalkspark.blogspot.co.uk/2016/02/getting-started-with-spark-on-window-64.html
>
> It helped me when I had similar problems on Windows.
>
> hth

Re: problem initiating spark context with pyspark

Posted by Marco Mistroni <mm...@gmail.com>.
Try this link:

http://letstalkspark.blogspot.co.uk/2016/02/getting-started-with-spark-on-window-64.html

It helped me when I had similar problems on Windows.

hth
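
For anyone skimming, those guides generally come down to making sure winutils.exe is reachable through HADOOP_HOME and that \tmp\hive exists and is writable before PySpark starts. Here is a rough sketch of those checks from Python; the C:\winutils location is an assumption, so adjust it to wherever winutils.exe actually lives:

import os
import subprocess

# Assumed install location (C:\winutils\bin\winutils.exe); HADOOP_HOME
# must be set before Spark's JVM is launched.
os.environ["HADOOP_HOME"] = r"C:\winutils"
winutils = r"C:\winutils\bin\winutils.exe"

# Make sure the Hive scratch dir exists, open up its permissions as
# suggested earlier in the thread, then verify what winutils reports.
os.makedirs(r"C:\tmp\hive", exist_ok=True)
subprocess.run([winutils, "chmod", "777", r"C:\tmp\hive"], check=True)
subprocess.run([winutils, "ls", "-F", r"C:\tmp\hive"], check=True)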


Re: problem initiating spark context with pyspark

Posted by Curtis Burkhalter <cu...@gmail.com>.
Thanks Doc, I saw this on another board yesterday, so I've tried it by
first going to the directory where I've stored winutils.exe and then, as an
admin, running the command that you suggested. I get this exception when
checking the permissions:

C:\winutils\bin>winutils.exe ls -F C:\tmp\hive
FindFileOwnerAndPermission error (1789): The trust relationship between
this workstation and the primary domain failed.

I'm fairly new to the command line and to working out what the different
exceptions mean. Do you have any advice on what this error means and how I
might go about fixing it?

Thanks again
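
(For anyone hitting the same wall: the message above is Windows error 1789,
a machine-to-domain authentication failure, so winutils cannot look up the
file owner; it is a Windows trust problem rather than a Spark one. A hedged
workaround while the domain issue is unresolved is to point Hive's scratch
directory at a local folder you control instead of /tmp/hive. The sketch
below is not the fix confirmed in this thread; it assumes Spark 2.1.x, that
C:\hive-scratch is a hypothetical folder you create beforehand, and that
the spark.hadoop.* prefix forwards the property to the Hive client:)

from pyspark.sql import SparkSession

# Minimal sketch, assuming C:/hive-scratch exists and is writable:
# build a session whose Hive scratch dir is a local folder instead of
# the default /tmp/hive that winutils cannot inspect here.
spark = (SparkSession.builder
         .appName("hive-scratch-workaround")
         .config("spark.hadoop.hive.exec.scratchdir", "C:/hive-scratch")
         .enableHiveSupport()
         .getOrCreate())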


On Wed, Jun 7, 2017 at 9:51 AM, Doc Dwarf <do...@gmail.com> wrote:

> Hi Curtis,
>
> I believe on Windows, the following command needs to be executed (you will
> need winutils installed):
>
> D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
>
>
>
> On 6 June 2017 at 09:45, Curtis Burkhalter <cu...@gmail.com>
> wrote:
>
>> [original message quoted in full; stack trace snipped — root cause:
>> /tmp/hive on HDFS should be writable, current permissions rw-rw-rw-]
>
>


-- 
Curtis Burkhalter
Postdoctoral Research Associate, National Audubon Society

https://sites.google.com/site/curtisburkhalter/

Re: problem initiating spark context with pyspark

Posted by Doc Dwarf <do...@gmail.com>.
Hi Curtis,

I believe on Windows, the following command needs to be executed (you will
need winutils installed):

D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
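
(On a C:-drive layout like the one in this thread, that would look like the
lines below; the HADOOP_HOME value and the C:\winutils location are
assumptions to adjust, and the ls -F call just confirms the new permissions
afterwards:)

set HADOOP_HOME=C:\winutils
C:\winutils\bin\winutils.exe chmod 777 C:\tmp\hive
C:\winutils\bin\winutils.exe ls -F C:\tmp\hive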



On 6 June 2017 at 09:45, Curtis Burkhalter <cu...@gmail.com>
wrote:

> [original message quoted in full; stack trace snipped — root cause:
> /tmp/hive on HDFS should be writable, current permissions rw-rw-rw-]