Posted to user@spark.apache.org by Divya Gehlot <di...@gmail.com> on 2016/02/22 05:45:28 UTC

Error: Type mismatch error when passing HDFS file path to spark-csv load method

Hi,
I am trying to dynamically create DataFrames by reading the subdirectories
under a parent directory.

My code looks like:

> import org.apache.spark._
> import org.apache.spark.sql._
> val hadoopConf = new org.apache.hadoop.conf.Configuration()
> val hdfsConn = org.apache.hadoop.fs.FileSystem.get(new
> java.net.URI("hdfs://xxx.xx.xx.xxx:8020"), hadoopConf)
> hdfsConn.listStatus(new
> org.apache.hadoop.fs.Path("/TestDivya/Spark/ParentDir/")).foreach{
> fileStatus =>
>    val filePathName = fileStatus.getPath().toString()
>    val fileName = fileStatus.getPath().getName().toLowerCase()
>    var df =  "df"+fileName
>    df =
> sqlContext.read.format("com.databricks.spark.csv").option("header",
> "true").option("inferSchema", "true").load(filePathName)
> }


I am getting the below error:

> <console>:35: error: type mismatch;
>  found   : org.apache.spark.sql.DataFrame
>  required: String
>                  df =
> sqlContext.read.format("com.databricks.spark.csv").option("header",
> "true").option("inferSchema", "true").load(filePathName)


Am I missing something?

Would really appreciate the help.


Thanks,
Divya

Re: Error: Type mismatch error when passing HDFS file path to spark-csv load method

Posted by Jonathan Kelly <jo...@gmail.com>.
On the line preceding the one the compiler is complaining about (which
doesn't actually have a problem in itself), you declare df as
"df" + fileName, which makes it a String. On the next line you then try to
assign a DataFrame to df, but its type is already fixed as String. I don't
quite understand your intent with that previous line, but I'm guessing you
didn't mean to assign a String to df.
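If the intent is to keep one DataFrame per subdirectory under a name like
"df" + fileName, one approach is to collect them into a Map keyed by that
generated name (a Scala var can't change its type at runtime, but a Map can
hold many DataFrames). A minimal sketch, reusing the hdfsConn and sqlContext
values from your snippet:

```scala
import scala.collection.mutable
import org.apache.spark.sql.DataFrame

// One DataFrame per subdirectory, keyed by the generated name,
// instead of reusing a single String-typed variable.
val dfByName = mutable.Map[String, DataFrame]()

hdfsConn.listStatus(new org.apache.hadoop.fs.Path("/TestDivya/Spark/ParentDir/"))
  .foreach { fileStatus =>
    val filePathName = fileStatus.getPath().toString()
    val fileName = fileStatus.getPath().getName().toLowerCase()
    dfByName("df" + fileName) =
      sqlContext.read.format("com.databricks.spark.csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load(filePathName)
  }

// Look up a DataFrame later by its generated name, e.g.:
// dfByName("dfsomefile").show()
```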

~ Jonathan