Posted to user@spark.apache.org by Ashish Mittal <as...@hotwaxsystems.com> on 2019/10/08 13:58:10 UTC
Problem of how to retrieve file from HDFS
Hi,
I am trying to store a CSV file in HDFS and retrieve it again. I have
successfully stored the CSV file in HDFS using LinearRegressionModel in Spark
with Java, but I cannot retrieve it from HDFS. How can I retrieve the CSV file
from HDFS?
Code:
SparkSession sparkSession = SparkSession.builder()
        .appName("JavaSparkModelWithHadoopHDFSExample")
        .master("local[2]")
        .getOrCreate();
SQLContext sqlContext = new SQLContext(sparkSession);
VectorAssembler assembler = new VectorAssembler();
assembler.setInputCols(new String[] { "MONTH_1", "MONTH_2", "MONTH_3",
        "MONTH_4", "MONTH_5", "MONTH_6" })
        .setOutputCol("features");
Dataset<Row> rowDataSet = sqlContext.read()
        .format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load("hdfs://localhost:9000/user/hadoop/inpit/data/history.csv");
rowDataSet.show();
rowDataSet.printSchema();
Dataset<Row> vectorDataSet = assembler.transform(rowDataSet).drop("CUST_ID");
vectorDataSet.show();
LinearRegression lr = new LinearRegression()
        .setMaxIter(10)
        .setRegParam(0.3)
        .setElasticNetParam(0.8)
        .setFeaturesCol("features")
        .setLabelCol("CLV");
lr.setPredictionCol("prediction");
LinearRegressionModel lrModel = lr.fit(vectorDataSet);
lrModel.write().overwrite()
        .save("hdfs://localhost:9000/user/hadoop/inpit/data/history.csv");
This code stores the CSV file successfully, but I don't know how to retrieve
the CSV file from HDFS. Please help me.
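For reference, here is a minimal sketch of what reading back from HDFS might look like, under some assumptions. As far as I understand, lrModel.write().save(...) writes a model directory (metadata plus Parquet data) rather than a CSV file, and with overwrite() on the same path as the input it would likely replace history.csv. The sketch therefore assumes the model was saved to a separate path. The class name ReadBackFromHdfs and the model path below are hypothetical and are not part of my current code.

```java
import org.apache.spark.ml.regression.LinearRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadBackFromHdfs {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ReadBackFromHdfs")
                .master("local[2]")
                .getOrCreate();

        // Input CSV path, as used in the code above.
        String csvPath = "hdfs://localhost:9000/user/hadoop/inpit/data/history.csv";
        // Hypothetical path: assumes the model was saved somewhere other than the CSV.
        String modelPath = "hdfs://localhost:9000/user/hadoop/inpit/model/clv-lr";

        // Reading a CSV file from HDFS uses the same reader call as for loading it.
        Dataset<Row> history = spark.read()
                .format("csv")
                .option("header", "true")
                .option("inferSchema", "true")
                .load(csvPath);
        history.show();

        // A saved model is read back with the static load method, not the CSV reader.
        LinearRegressionModel model = LinearRegressionModel.load(modelPath);
        System.out.println("Coefficients: " + model.coefficients());
        System.out.println("Intercept: " + model.intercept());

        spark.stop();
    }
}
```

This only works if the matching save call wrote to modelPath, for example lrModel.write().overwrite().save(modelPath), so that the CSV at csvPath is left untouched.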
Thanks & Regards,
Ashish Mittal