Posted to user@spark.apache.org by Ajay Chander <it...@gmail.com> on 2016/11/07 21:37:47 UTC
Access_Remote_Kerberized_Cluster_Through_Spark
Hi Everyone,
I am trying to develop a simple codebase on my machine to read data from a
secured Hadoop cluster. We have a development cluster which is secured
through Kerberos, and I want to run a Spark job from my IntelliJ to read
some sample data from the cluster. Has anyone done this before? Can you
point me to some sample examples?
I understand that, if we want to talk to a secured cluster, we need to have a
keytab and principal. I tried using them through
UserGroupInformation.loginUserFromKeytab
and SparkHadoopUtil.get.loginUserFromKeytab, but so far no luck.
I have been trying to do this for quite a while. Please let me know if
you need more info. Thanks
Regards,
Ajay
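
For readers skimming the thread: the pattern that eventually worked in the
replies below boils down to the following minimal sketch. The principal,
keytab path, and HDFS paths are placeholders, and the cluster's
core-site.xml / hdfs-site.xml are assumed to be on the classpath:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

object KerberizedHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Tell the Hadoop client to authenticate via Kerberos before any
    // FileSystem call is made.
    conf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(conf)
    // Log in from a keytab; both values are placeholders.
    UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab")

    // Any subsequent HDFS access runs as the logged-in principal.
    val fs = FileSystem.get(conf)
    fs.listStatus(new Path("/user/user")).foreach(s => println(s.getPath))
  }
}

The order matters: setConfiguration must see a Kerberos-enabled
configuration before loginUserFromKeytab is called, or the login is treated
as a simple (non-secure) login.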
Re: Access_Remote_Kerberized_Cluster_Through_Spark
Posted by KhajaAsmath Mohammed <md...@gmail.com>.
Hi Ajay,
I was able to resolve it by adding the YARN user principal. Here is the
complete code.
def main(args: Array[String]) {
  // Create Spark context with Spark configuration
  val cmdLine = Parse.commandLine(args)
  val configFile = cmdLine.getOptionValue("c")
  val propertyConfiguration = new PropertyConfiguration()
  val props = propertyConfiguration.get(configFile)
  val fs = com.yourcompany.telematics.fs.FileSystem.getHdfsFileSystem(props)
  val sparkConfig = propertyConfiguration.initConfiguration()
  val sc = new SparkContext(sparkConfig)
  val configuration: org.apache.hadoop.conf.Configuration = sc.hadoopConfiguration

  System.setProperty("javax.security.auth.useSubjectCredsOnly", "true")
  System.setProperty("java.security.krb5.conf", "C:\\krb5.conf")
  System.setProperty("sun.security.krb5.debug", "true")

  configuration.set("hadoop.security.authentication", "kerberos")
  configuration.set("hdfs.namenode.kerberos.principal", "hdfs/_HOST@AD.yourcompany.COM")
  configuration.set("hdfs.datanode.kerberos.principal.pattern", "hdfs/*@AD.yourcompany.COM")
  configuration.set("hdfs.master.kerberos.principal", "hdfs/*@AD.yourcompany.COM")
  configuration.set("yarn.nodemanager.principal", "yarn/*@AD.yourcompany.COM")
  configuration.set("yarn.resourcemanager.principal", "yarn/*@AD.yourcompany.COM")

  // Note the trailing separator; without it the file names are
  // concatenated onto "conf" (e.g. "confcore-site.xml") and never found.
  val hadoopConf = "C:\\devtools\\hadoop\\hadoop-2.2.0\\hadoop-2.2.0\\conf\\"
  configuration.addResource(new Path(hadoopConf + "core-site.xml"))
  configuration.addResource(new Path(hadoopConf + "hdfs-site.xml"))
  configuration.addResource(new Path(hadoopConf + "mapred-site.xml"))
  configuration.addResource(new Path(hadoopConf + "yarn-site.xml"))
  configuration.addResource(new Path(hadoopConf + "hadoop-policy.xml"))

  UserGroupInformation.setConfiguration(configuration)
  UserGroupInformation.loginUserFromKeytab("va_dflt@AD.yourcompany.COM", "C:\\va_dflt.keytab")

  // Read in the text file and split each line into words
  val lineRdd = sc.textFile("hdfs://XXXXXX:8020/user/yyy1k78/vehscanxmltext")
  val tokenized = lineRdd.flatMap(_.split(" "))

  // Count the occurrences of each word
  val wordCounts = tokenized.map((_, 1)).reduceByKey(_ + _)

  // Optionally filter out words with fewer than threshold occurrences:
  // val threshold = args(1).toInt
  // val filtered = wordCounts.filter(_._2 >= threshold)

  System.out.println(wordCounts.collect().mkString(", "))
}
}
Thanks,
Asmath.
On Wed, Nov 9, 2016 at 7:44 PM, Ajay Chander <it...@gmail.com> wrote:
Re: Access_Remote_Kerberized_Cluster_Through_Spark
Posted by Ajay Chander <it...@gmail.com>.
Hi Everyone,
I am still trying to figure this one out. I am stuck with this error:
"java.io.IOException: Can't get Master Kerberos principal for use as
renewer". Below is my code. Can any of you please provide any insights on
this? Thanks for your time.
import java.io.File

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.deploy.SparkHadoopUtil
import org.apache.spark.{SparkConf, SparkContext}

object SparkHdfs {

  def main(args: Array[String]): Unit = {

    System.setProperty("java.security.krb5.conf",
      new File("src\\main\\files\\krb5.conf").getAbsolutePath)
    System.setProperty("sun.security.krb5.debug", "true")

    val sparkConf = new SparkConf().setAppName("SparkHdfs").setMaster("local")
    val sc = new SparkContext(sparkConf)

    // Loading remote cluster configurations.
    // Note: addResource(String) treats its argument as a classpath resource
    // name, so absolute file paths must be wrapped in a Path to be read
    // from disk.
    sc.hadoopConfiguration.addResource(new Path(new File("src\\main\\files\\core-site.xml").getAbsolutePath))
    sc.hadoopConfiguration.addResource(new Path(new File("src\\main\\files\\hdfs-site.xml").getAbsolutePath))
    sc.hadoopConfiguration.addResource(new Path(new File("src\\main\\files\\mapred-site.xml").getAbsolutePath))
    sc.hadoopConfiguration.addResource(new Path(new File("src\\main\\files\\yarn-site.xml").getAbsolutePath))
    sc.hadoopConfiguration.addResource(new Path(new File("src\\main\\files\\ssl-client.xml").getAbsolutePath))
    sc.hadoopConfiguration.addResource(new Path(new File("src\\main\\files\\topology.map").getAbsolutePath))

    val conf = new Configuration()
    // Loading remote cluster configurations
    conf.addResource(new Path(new File("src\\main\\files\\core-site.xml").getAbsolutePath))
    conf.addResource(new Path(new File("src\\main\\files\\hdfs-site.xml").getAbsolutePath))
    conf.addResource(new Path(new File("src\\main\\files\\mapred-site.xml").getAbsolutePath))
    conf.addResource(new Path(new File("src\\main\\files\\yarn-site.xml").getAbsolutePath))
    conf.addResource(new Path(new File("src\\main\\files\\ssl-client.xml").getAbsolutePath))
    conf.addResource(new Path(new File("src\\main\\files\\topology.map").getAbsolutePath))

    conf.set("hadoop.security.authentication", "Kerberos")

    UserGroupInformation.setConfiguration(conf)

    UserGroupInformation.loginUserFromKeytab("myusr@INTERNAL.COMPANY.COM",
      new File("src\\main\\files\\myusr.keytab").getAbsolutePath)

    // SparkHadoopUtil.get.loginUserFromKeytab("tsadusr@INTERNAL.IMSGLOBAL.COM",
    //   new File("src\\main\\files\\tsadusr.keytab").getAbsolutePath)
    // Getting this error: java.io.IOException: Can't get Master Kerberos principal for use as renewer

    sc.textFile("hdfs://vm1.comp.com:8020/user/myusr/temp/file1").collect().foreach(println)
    // Getting this error: java.io.IOException: Can't get Master Kerberos principal for use as renewer
  }
}
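
A likely explanation for the error in the code above: when Spark obtains
HDFS delegation tokens, it asks the Hadoop configuration for the resource
manager's Kerberos principal to use as the token renewer; if yarn-site.xml
was never actually loaded (addResource with a plain String is looked up on
the classpath, not the filesystem), that principal is missing and
TokenCache throws exactly this IOException. A hedged sketch of the settings
that resolved it elsewhere in this thread (realm and hosts are
placeholders):

// Set on the same Configuration that Spark uses for HDFS access.
sc.hadoopConfiguration.set("yarn.resourcemanager.principal", "yarn/_HOST@EXAMPLE.COM")
sc.hadoopConfiguration.set("yarn.nodemanager.principal", "yarn/_HOST@EXAMPLE.COM")

Loading the cluster's real yarn-site.xml (which already contains these
keys) achieves the same thing and is less error-prone than hard-coding.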
On Mon, Nov 7, 2016 at 9:42 PM, Ajay Chander <it...@gmail.com> wrote:
Re: Access_Remote_Kerberized_Cluster_Through_Spark
Posted by Ajay Chander <it...@gmail.com>.
Did anyone use
https://www.codatlas.com/github.com/apache/spark/HEAD/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
to interact with secured Hadoop from Spark?
Thanks,
Ajay
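
For reference, SparkHadoopUtil.get.loginUserFromKeytab is a thin wrapper
over the same UserGroupInformation call; presumably it was being invoked
along these lines (principal and keytab are placeholders, and a
Kerberos-enabled configuration must already have been passed to
UserGroupInformation.setConfiguration):

import org.apache.spark.deploy.SparkHadoopUtil

// Delegates to UserGroupInformation.loginUserFromKeytab under the hood, so
// hadoop.security.authentication must already be set to "kerberos" in the
// active Hadoop configuration for the login to take effect.
SparkHadoopUtil.get.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab")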