Posted to user@spark.apache.org by Divya Gehlot <di...@gmail.com> on 2016/02/22 05:45:28 UTC
Error: Type mismatch error when passing hdfs file path to spark-csv load method
Hi,
I am trying to dynamically create DataFrames by reading the subdirectories
under a parent directory.
My code looks like this:
> import org.apache.spark._
> import org.apache.spark.sql._
> val hadoopConf = new org.apache.hadoop.conf.Configuration()
> val hdfsConn = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("hdfs://xxx.xx.xx.xxx:8020"), hadoopConf)
> hdfsConn.listStatus(new org.apache.hadoop.fs.Path("/TestDivya/Spark/ParentDir/")).foreach { fileStatus =>
>   val filePathName = fileStatus.getPath().toString()
>   val fileName = fileStatus.getPath().getName().toLowerCase()
>   var df = "df" + fileName
>   df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load(filePathName)
> }
I am getting the error below:
> <console>:35: error: type mismatch;
>  found   : org.apache.spark.sql.DataFrame
>  required: String
>        df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load(filePathName)
Am I missing something?
Would really appreciate the help.
Thanks,
Divya
Re: Error: Type mismatch error when passing hdfs file path to spark-csv load method
Posted by Jonathan Kelly <jo...@gmail.com>.
On the line preceding the one the compiler is complaining about (which
doesn't actually have a problem in itself), you declare df as
"df" + fileName, making it a String. Then you try to assign a DataFrame to
df, but its type is already fixed as String, so the assignment fails. I
don't quite understand your intent with that previous line, but I'm
guessing you didn't mean to assign a String to df. In Scala you can't
build variable names dynamically at runtime; if you want one DataFrame
per subdirectory, collect them into a Map keyed by file name instead.
~ Jonathan
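[For the archive: a minimal sketch of the Map-based approach, assuming a spark-shell session where sqlContext is already in scope and the spark-csv package is on the classpath; the HDFS URI and paths are the placeholders from the original post.]

```scala
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.DataFrame

val hadoopConf = new org.apache.hadoop.conf.Configuration()
val hdfsConn = FileSystem.get(new URI("hdfs://xxx.xx.xx.xxx:8020"), hadoopConf)

// Build a Map from lowercased file name to its DataFrame, instead of
// trying to construct variable names like "df" + fileName at runtime.
val dfsByName: Map[String, DataFrame] =
  hdfsConn.listStatus(new Path("/TestDivya/Spark/ParentDir/")).map { fileStatus =>
    val filePathName = fileStatus.getPath().toString()
    val fileName = fileStatus.getPath().getName().toLowerCase()
    fileName -> sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load(filePathName)
  }.toMap

// Look up any subdirectory's DataFrame by name afterwards, e.g.:
// dfsByName("somesubdir").show()
```

Each element of the Array returned by listStatus is mapped to a (name, DataFrame) tuple, and toMap turns the collection of tuples into the lookup table.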