Posted to dev@carbondata.apache.org by dylan <dy...@163.com> on 2018/03/12 12:02:57 UTC
carbondata load all stored data to tmp dir
hi guys:
I am using CarbonData 1.3 and Spark 2.2.1 in standalone mode.
I start the CarbonThriftServer like this:
/bin/spark-submit \
  --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
  $SPARK_HOME/carbonlib/carbondata_2.11-1.3.0-shade-hadoop2.7.2.jar \
  hdfs://nameservice1/hive/carbon/store
I get this log: Downloading hdfs://nameservice1/hive/carbon/store to
/tmp/tmp6465512979544197326/hive/carbon/store.
This downloads the entire carbon store to the tmp directory. If my carbon
store is very large, this takes a lot of boot time and my temporary
directory fills up; each start creates a new temporary directory.
Was it designed this way, or is my configuration wrong?
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Re: carbondata load all stored data to tmp dir
Posted by dylan <dy...@163.com>.
hello babulal:
I found the problem: I started spark-submit with the wrong command,
using --jars.
Thank you very much for your answer; it solved my problem.
thanks!
Re: carbondata load all stored data to tmp dir
Posted by BabuLal <ba...@gmail.com>.
Hi dylan
As per your console log, this error occurs when spark-submit is started
with the wrong command while providing resources (jars/files). I tried the
command below and got the same error as you (I passed the --jars option,
with the store location last, after a space).
root@master /opt/sparkrelease/spark-2.2.1-bin-hadoop2.7 # bin/spark-submit \
  --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
  --jars $SPARK_HOME/carbonlib/carbondata_2.11-1.3.0-shade-hadoop2.7.2.jar \
  hdfs://master:9000/carbonstore
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Downloading hdfs://master:9000/carbonstore to /tmp/tmp1358150251291982356/carbonstore.
Exception in thread "main" java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
Spark has the code below, in core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
(method downloadFile), to do this resource localization:
<http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t203/downloadFileMethod.png>
private[deploy] def downloadFile(path: String, hadoopConf: HadoopConfiguration): String = {
  require(path != null, "path cannot be null.")
  val uri = Utils.resolveURI(path)
  uri.getScheme match {
    case "file" | "local" =>
      path
    case _ =>
      val fs = FileSystem.get(uri, hadoopConf)
      val tmpFile = new File(Files.createTempDirectory("tmp").toFile, uri.getPath)
      // scalastyle:off println
      printStream.println(s"Downloading ${uri.toString} to ${tmpFile.getAbsolutePath}.")
      // scalastyle:on println
      fs.copyToLocalFile(new Path(uri), new Path(tmpFile.getAbsolutePath))
      Utils.resolveURI(tmpFile.getAbsolutePath).toString
  }
}
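The scheme match above is what triggers the copy: anything that is not a file:// or local:// URI (or a bare local path) is downloaded into a fresh temp directory, with no size check. A minimal standalone sketch of that check (needsLocalDownload is a made-up helper of mine, not a Spark API) behaves like this:

```scala
import java.net.URI

// Minimal sketch of the scheme check inside SparkSubmit#downloadFile.
// Hypothetical helper for illustration only.
def needsLocalDownload(path: String): Boolean =
  Option(new URI(path).getScheme) match {
    case Some("file") | Some("local") | None => false // already local, used as-is
    case _ => true // e.g. hdfs:// is copied into a fresh /tmp/tmpNNN... directory
  }

println(needsLocalDownload("hdfs://master:9000/carbonstore")) // true
println(needsLocalDownload("/opt/carbonlib/carbondata.jar"))  // false
```

So whatever argument spark-submit decides is a remote resource gets localized in full, however large it is.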
And this method is called only in the case below (client deploy mode):
if (deployMode == CLIENT) {
  val hadoopConf = conf.getOrElse(new HadoopConfiguration())
  localPrimaryResource = Option(args.primaryResource).map(downloadFile(_, hadoopConf)).orNull
  localJars = Option(args.jars).map(downloadFileList(_, hadoopConf)).orNull
  localPyFiles = Option(args.pyFiles).map(downloadFileList(_, hadoopConf)).orNull
  localFiles = Option(args.files).map(downloadFileList(_, hadoopConf)).orNull
}
<http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t203/callingMethods.jpg>
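In other words, args.primaryResource is the first bare (non-option) argument of spark-submit. With --jars in the command, the carbondata JAR is consumed as that option's value, so the HDFS store path becomes the primary resource and gets downloaded. A toy parser (illustrative only, not Spark's real argument handling in SparkSubmitArguments) shows the effect:

```scala
// Toy sketch: the first bare argument becomes the primary resource.
// Hypothetical parser for illustration, not Spark code.
def primaryResource(args: List[String]): Option[String] = args match {
  case opt :: _ :: tail if opt.startsWith("--") => primaryResource(tail) // option + value
  case resource :: _                            => Some(resource)        // first bare arg
  case Nil                                      => None
}

// With --jars, the store path is the first bare argument:
println(primaryResource(List("--class", "CarbonThriftServer",
  "--jars", "carbondata.jar", "hdfs://master:9000/carbonstore"))) // Some(hdfs://master:9000/carbonstore)

// Without --jars, the JAR itself is the primary resource:
println(primaryResource(List("--class", "CarbonThriftServer",
  "carbondata.jar", "hdfs://master:9000/carbonstore"))) // Some(carbondata.jar)
```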
So please check your command for starting CarbonThriftServer, or send me
the exact command.
Thanks
Babu
Re: carbondata load all stored data to tmp dir
Posted by dylan <dy...@163.com>.
hello babulal:
thanks for your reply.
1. My spark-defaults.conf is:
spark.executor.extraJavaOptions -Dcarbon.properties.filepath=/home/spark-2.2.1-bin-hadoop2.7/conf/carbon.properties
spark.driver.extraJavaOptions   -Dcarbon.properties.filepath=/home/spark-2.2.1-bin-hadoop2.7/conf/carbon.properties
2. Console log:
18/03/13 19:12:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/03/13 19:12:51 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Downloading hdfs://nameservice1/hive/carbon/store to /tmp/tmp3188425816613265318/hive/carbon/store.
The download operation continues for a long time, until all the data has
been downloaded to the tmp directory.
Re: carbondata load all stored data to tmp dir
Posted by BabuLal <ba...@gmail.com>.
Hi dylan
I have verified your scenario in my setup, and it works fine without
downloading the store to the local /tmp location. The command below is
used to start the Thriftserver, and the Carbon store is NOT copied to /tmp.
bin/spark-submit \
  --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
  /opt/sparkrelease/spark-2.2.1-bin-hadoop2.7/carbonlib/carbondata_2.11-1.3.0-shade-hadoop2.7.2.jar \
  hdfs://master:9000/carbonstore
Can you please provide the details below so we can analyze the issue further:
1. spark-defaults.conf under <SPARK-HOME>/conf
2. driver logs (the console log when starting the thriftserver)
Thanks
Babu