Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2018/02/05 14:04:00 UTC

[jira] [Commented] (SPARK-23338) Spark unable to run on HDP deployed Azure Blob File System

    [ https://issues.apache.org/jira/browse/SPARK-23338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352410#comment-16352410 ] 

Sean Owen commented on SPARK-23338:
-----------------------------------

This all shows an error from Azure APIs, and ultimately a failure from the Azure blob store. This doesn't sound Spark-related.

> Spark unable to run on HDP deployed Azure Blob File System
> ----------------------------------------------------------
>
>                 Key: SPARK-23338
>                 URL: https://issues.apache.org/jira/browse/SPARK-23338
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell
>    Affects Versions: 2.2.0
>         Environment: HDP 2.6.0.3
> Spark2 2.2.0
> HDFS 2.7.3
> CentOS 7.1
>            Reporter: Subhankar
>            Priority: Major
>              Labels: Azure, BLOB, HDP, azureblob, hadoop, hive, spark
>
> Hello,
> We are unable to run Spark on the BLOB storage file system deployed on HDP.
> Spark fails to start, with errors related to HiveSessionState, HiveExternalCatalog and various Azure file storage exceptions.
> Please advise if you have any suggestion to address this, or whether this exercise is futile and Spark cannot run on BLOB storage after all.
> Thanks in advance.
>  
> Detailed Description:
>  
> h5. *We are unable to access Spark/Spark2 when we change the file system storage from HDFS to WASB. We are using the HDP 2.6 platform and running Hadoop 2.7.3. All other services are working fine.*
> I have set the following configurations:
> *HDFS*:
> core-site-
> fs.defaultFS = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net
> fs.AbstractFileSystem.wasb.impl = org.apache.hadoop.fs.azure.Wasb
> fs.AbstractFileSystem.wasbs.impl = org.apache.hadoop.fs.azure.Wasbs
> fs.azure.selfthrottling.read.factor = 1.0
> fs.azure.selfthrottling.write.factor = 1.0
> fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net = KEY
> spark.hadoop.fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net = KEY
> *SPARK2:*
> spark.eventLog.dir = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/
> spark.history.fs.logDirectory = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/
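> For reference, the same account-key setting can also be applied programmatically when building a SparkSession. The Scala sketch below is illustrative only: the container, account and key values are the placeholders used above, and it assumes the hadoop-azure and azure-storage jars are already on the driver classpath.
>
> import org.apache.spark.sql.SparkSession
>
> // Placeholder values taken from the configuration above.
> val container = "CONTAINER"
> val account = "STORAGE_ACCOUNT_NAME"
> val key = "KEY"
>
> val spark = SparkSession.builder()
>   .appName("wasb-smoke-test")
>   // Properties prefixed with "spark.hadoop." are copied onto the Hadoop
>   // Configuration, mirroring the fs.azure.account.key.* entry in core-site.xml.
>   .config(s"spark.hadoop.fs.azure.account.key.$account.blob.core.windows.net", key)
>   .getOrCreate()
>
> // Write and read a small dataset to check that the WASB connector and
> // credentials work independently of Hive initialisation.
> val path = s"wasb://$container@$account.blob.core.windows.net/tmp/wasb-check"
> spark.range(10).write.mode("overwrite").parquet(path)
> spark.read.parquet(path).show()
>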
> Despite multiple attempts with alternative configurations, the *spark-shell* command yields the results below:
> $ spark-shell
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
> at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:983)
> at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
> at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
> at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
> at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
> at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
> at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
> ... 47 elided
> Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:980)
> ... 58 more
> Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
> at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:176)
> at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
> at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
> at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
> at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
> at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
> at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
> ... 63 more
> Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:173)
> ... 71 more
> Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
> at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
> at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
> ... 76 more
> Caused by: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
> at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
> at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
> ... 84 more
> Caused by: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
> at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2027)
> at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2081)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
> at org.apache.hadoop.fs.azure.NativeAzureFileSystem.conditionalRedoFolderRename(NativeAzureFileSystem.java:2137)
> at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2104)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
> at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:596)
> at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
> at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
> ... 85 more
> Caused by: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
> at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113)
> at org.apache.hadoop.fs.azure.StorageInterfaceImpl$WrappingIterator.hasNext(StorageInterfaceImpl.java:130)
> at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2006)
> ... 93 more
> Caused by: com.microsoft.azure.storage.StorageException: The server encountered an unknown failure: OK
> at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:101)
> at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:199)
> at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109)
> ... 95 more
> Caused by: java.lang.ClassCastException: org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast to org.apache.xerces.xni.parser.XMLParserConfiguration
> at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
> at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.<init>(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
> at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParser(Unknown Source)
> at com.microsoft.azure.storage.core.Utility.getSAXParser(Utility.java:668)
> at com.microsoft.azure.storage.blob.BlobListHandler.getBlobList(BlobListHandler.java:72)
> at com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1284)
> at com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1248)
> at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:146)
> ... 96 more
> <console>:14: error: not found: value spark
> import spark.implicits._
> ^
> <console>:14: error: not found: value spark
> import spark.sql
>  
>  
>  
> Any help resolving the above would be greatly appreciated. It may be that we have missed an important HDFS or Spark configuration, as a result of which Spark cannot locate certain JARs and is incompatible with the BLOB storage.
> Kindly assist!
> PS: I have made sure the required azure-storage and hadoop-azure jars are available in the Spark and Hadoop lib folders. I have also tried specifying them explicitly when starting spark-shell (see the sketch below), but to no effect.
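> For illustration, an explicit spark-shell invocation of that kind might look like the following. This is only a sketch: the jar paths and versions are assumptions and must be adjusted to the actual HDP 2.6 / Hadoop 2.7.3 installation.
>
> $ spark-shell \
>     --jars /usr/hdp/current/hadoop-client/lib/hadoop-azure-<version>.jar,/usr/hdp/current/hadoop-client/lib/azure-storage-<version>.jar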
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org