Posted to user@spark.apache.org by Ashish Mittal <as...@hotwaxsystems.com> on 2019/10/08 13:58:10 UTC

Problem retrieving a file from HDFS

Hi,
I am trying to store a CSV file in HDFS and retrieve it again. I have
successfully stored the CSV file in HDFS using LinearRegressionModel in Spark
with Java, but I cannot retrieve it. How can I retrieve the CSV file from
HDFS?
Code:
        SparkSession sparkSession = SparkSession.builder()
                .appName("JavaSparkModelWithHadoopHDFSExample")
                .master("local[2]")
                .getOrCreate();
        SQLContext sqlContext = new SQLContext(sparkSession);

        VectorAssembler assembler = new VectorAssembler();
        assembler.setInputCols(new String[] { "MONTH_1", "MONTH_2", "MONTH_3",
                        "MONTH_4", "MONTH_5", "MONTH_6" })
                .setOutputCol("features");

        Dataset<Row> rowDataSet = sqlContext.read().format("csv")
                .option("header", "true")
                .option("inferSchema", "true")
                .load("hdfs://localhost:9000/user/hadoop/inpit/data/history.csv");
        rowDataSet.show();
        rowDataSet.printSchema();

        Dataset<Row> vectorDataSet = assembler.transform(rowDataSet).drop("CUST_ID");
        vectorDataSet.show();

        LinearRegression lr = new LinearRegression()
                .setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8)
                .setFeaturesCol("features").setLabelCol("CLV");
        lr.setPredictionCol("prediction");

        LinearRegressionModel lrModel = lr.fit(vectorDataSet);

        lrModel.write().overwrite()
                .save("hdfs://localhost:9000/user/hadoop/inpit/data/history.csv");

This code successfully stores the CSV file, but I don't know how to retrieve
the CSV file from HDFS. Please help me.

Thanks & Regards,
Ashish Mittal