Posted to dev@carbondata.apache.org by dylan <dy...@163.com> on 2018/03/12 12:02:57 UTC
carbondata load all stored data to tmp dir
hi guys:
I am using CarbonData 1.3 and Spark 2.2.1 in standalone mode.
I start the CarbonThriftServer like this:
/bin/spark-submit \
  --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
  $SPARK_HOME/carbonlib/carbondata_2.11-1.3.0-shade-hadoop2.7.2.jar \
  hdfs://nameservice1/hive/carbon/store
I get this log: Downloading hdfs://nameservice1/hive/carbon/store to
/tmp/tmp6465512979544197326/hive/carbon/store.
This downloads the entire carbon store to the tmp directory. If my carbon
store is very large, this takes a lot of boot time and my temporary
directory fills up; each start creates a new temporary directory.
Was it designed this way, or is my configuration wrong?
--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
Re: carbondata load all stored data to tmp dir
Posted by dylan <dy...@163.com>.
hello babulal:
I found the problem: I started spark-submit with the wrong command,
using --jars.
Thank you very much for your answer; it solved my problem.
thanks!
Re: carbondata load all stored data to tmp dir
Posted by BabuLal <ba...@gmail.com>.
Hi dylan
As per your console log, this error occurs when spark-submit is started
with the wrong command while providing resources (jars/files). I tried the
command below and got the same error as you (I passed the --jars option,
with the store location last, after a space).
root@master /opt/sparkrelease/spark-2.2.1-bin-hadoop2.7 # bin/spark-submit \
  --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
  --jars $SPARK_HOME/carbonlib/carbondata_2.11-1.3.0-shade-hadoop2.7.2.jar \
  hdfs://master:9000/carbonstore
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Downloading hdfs://master:9000/carbonstore to /tmp/tmp1358150251291982356/carbonstore.
Exception in thread "main" java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
Spark has the code below, in core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
(method downloadFile), to do this resource localization:
<http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t203/downloadFileMethod.png>
private[deploy] def downloadFile(path: String, hadoopConf: HadoopConfiguration): String = {
  require(path != null, "path cannot be null.")
  val uri = Utils.resolveURI(path)
  uri.getScheme match {
    case "file" | "local" =>
      path
    case _ =>
      val fs = FileSystem.get(uri, hadoopConf)
      val tmpFile = new File(Files.createTempDirectory("tmp").toFile, uri.getPath)
      // scalastyle:off println
      printStream.println(s"Downloading ${uri.toString} to ${tmpFile.getAbsolutePath}.")
      // scalastyle:on println
      fs.copyToLocalFile(new Path(uri), new Path(tmpFile.getAbsolutePath))
      Utils.resolveURI(tmpFile.getAbsolutePath).toString
  }
}
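The scheme match above is what triggers the copy: anything that is not a file:// or local:// URI (or a bare local path) is downloaded into a fresh temp directory, with no size check. A minimal standalone sketch of that check (needsLocalDownload is a made-up helper of mine, not a Spark API) behaves like this:

```scala
import java.net.URI

// Minimal sketch of the scheme check inside SparkSubmit#downloadFile.
// Hypothetical helper for illustration only.
def needsLocalDownload(path: String): Boolean =
  Option(new URI(path).getScheme) match {
    case Some("file") | Some("local") | None => false // already local, used as-is
    case _ => true // e.g. hdfs:// is copied into a fresh /tmp/tmpNNN... directory
  }

println(needsLocalDownload("hdfs://master:9000/carbonstore")) // true
println(needsLocalDownload("/opt/carbonlib/carbondata.jar"))  // false
```

So whatever argument spark-submit decides is a remote resource gets localized in full, however large it is.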
And this method is called only in the case below (client deploy mode):
if (deployMode == CLIENT) {
  val hadoopConf = conf.getOrElse(new HadoopConfiguration())
  localPrimaryResource = Option(args.primaryResource).map(downloadFile(_, hadoopConf)).orNull
  localJars = Option(args.jars).map(downloadFileList(_, hadoopConf)).orNull
  localPyFiles = Option(args.pyFiles).map(downloadFileList(_, hadoopConf)).orNull
  localFiles = Option(args.files).map(downloadFileList(_, hadoopConf)).orNull
}
<http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/file/t203/callingMethods.jpg>
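In other words, args.primaryResource is the first bare (non-option) argument of spark-submit. With --jars in the command, the carbondata JAR is consumed as that option's value, so the HDFS store path becomes the primary resource and gets downloaded. A toy parser (illustrative only, not Spark's real argument handling in SparkSubmitArguments) shows the effect:

```scala
// Toy sketch: the first bare argument becomes the primary resource.
// Hypothetical parser for illustration, not Spark code.
def primaryResource(args: List[String]): Option[String] = args match {
  case opt :: _ :: tail if opt.startsWith("--") => primaryResource(tail) // option + value
  case resource :: _                            => Some(resource)        // first bare arg
  case Nil                                      => None
}

// With --jars, the store path is the first bare argument:
println(primaryResource(List("--class", "CarbonThriftServer",
  "--jars", "carbondata.jar", "hdfs://master:9000/carbonstore"))) // Some(hdfs://master:9000/carbonstore)

// Without --jars, the JAR itself is the primary resource:
println(primaryResource(List("--class", "CarbonThriftServer",
  "carbondata.jar", "hdfs://master:9000/carbonstore"))) // Some(carbondata.jar)
```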
So please check your command for starting CarbonThriftServer, or send me
the exact command.
Thanks
Babu
Re: carbondata load all stored data to tmp dir
Posted by dylan <dy...@163.com>.
hello babulal:
thanks for your reply.
1. My spark-defaults.conf is:
spark.executor.extraJavaOptions -Dcarbon.properties.filepath=/home/spark-2.2.1-bin-hadoop2.7/conf/carbon.properties
spark.driver.extraJavaOptions   -Dcarbon.properties.filepath=/home/spark-2.2.1-bin-hadoop2.7/conf/carbon.properties
2. Console log:
18/03/13 19:12:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/03/13 19:12:51 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Downloading hdfs://nameservice1/hive/carbon/store to /tmp/tmp3188425816613265318/hive/carbon/store.
The download operation continues for a long time, until all the data has
been downloaded to the tmp directory.
Re: carbondata load all stored data to tmp dir
Posted by BabuLal <ba...@gmail.com>.
Hi dylan
I have verified your scenario in my setup, and it works fine without
downloading the store to the local /tmp location. The command below is
used to start the Thriftserver, and the Carbon store is NOT copied to /tmp.
bin/spark-submit \
  --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
  /opt/sparkrelease/spark-2.2.1-bin-hadoop2.7/carbonlib/carbondata_2.11-1.3.0-shade-hadoop2.7.2.jar \
  hdfs://master:9000/carbonstore
Can you please provide the details below so we can analyze the issue further:
1. spark-defaults.conf under <SPARK-HOME>/conf
2. driver logs (the console log when starting the thriftserver)
Thanks
Babu