Posted to dev@spark.apache.org by Srabasti Banerjee <sr...@ymail.com.INVALID> on 2018/08/31 03:52:11 UTC

Spark Streaming : Multiple sources found for csv : Error

Hi,
I am trying to run the code below to read a file as a DataFrame on a stream (for Spark Streaming). The code was developed in the Eclipse IDE, the schemas are defined appropriately, and I am running a thin jar on the server, but I get the error below. I tried suggestions found by researching similar "spark.read.option.schema.csv" errors on the internet, with no success.
I am thinking this could be a bug, as the corresponding changes might not have been made for the readStream option? Has anybody encountered a similar issue with Spark Streaming?
Looking forward to hearing your response(s)!
Thanks,
Srabasti Banerjee

Error
Exception in thread "main" java.lang.RuntimeException: Multiple sources found for csv (com.databricks.spark.csv.DefaultSource15, org.apache.spark.sql.execution.datasources.csv.CSVFileFormat), please specify the fully qualified class name.

Code:
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).csv("server_path") //does not resolve error
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).format("com.databricks.spark.csv").csv("server_path") //does not resolve error
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).csv("server_path") //does not resolve error
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).format("org.apache.spark.sql.execution.datasources.csv").csv("server_path") //does not resolve error
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat").csv("server_path") //does not resolve error
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).format("com.databricks.spark.csv.DefaultSource15").csv("server_path") //does not resolve error
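The "Multiple sources found" error comes from Spark's data source lookup finding two providers registered under the short name "csv". A diagnostic sketch (not from the original thread; it assumes Spark 2.x and a spark-shell started with the same classpath as the application) that lists every registered source:

```scala
import java.util.ServiceLoader
import scala.collection.JavaConverters._
import org.apache.spark.sql.sources.DataSourceRegister

// Enumerate every data source provider registered on the classpath via the
// same ServiceLoader mechanism Spark's lookup uses. Two entries reporting
// short name "csv" would confirm the conflict named in the exception.
val providers = ServiceLoader.load(classOf[DataSourceRegister]).asScala
providers.foreach { p =>
  println(s"${p.shortName()} -> ${p.getClass.getName}")
}
```

If both com.databricks.spark.csv.DefaultSource15 and org.apache.spark.sql.execution.datasources.csv.CSVFileFormat show up, the external spark-csv package is on the classpath alongside the built-in source.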
    

    

Re: Spark Streaming : Multiple sources found for csv : Error

Posted by Srabasti Banerjee <sr...@ymail.com.INVALID>.
Great that we are already discussing/working to fix the issue. Happy to help if I can :-)

Any workarounds that we can use for now?
Please note I am not invoking any additional packages while running spark submit on the thin jar.
Thanks,
Srabasti Banerjee





   On Thursday, 30 August, 2018, 9:02:11 PM GMT-7, Hyukjin Kwon <gu...@gmail.com> wrote:  
 
 Yea, this is exactly what I have been worried about with the recent changes (discussed in https://issues.apache.org/jira/browse/SPARK-24924). See https://github.com/apache/spark/pull/17916. This should be fine in later Spark versions.

FYI, +Wechen and Dongjoon. I want to add Thomas Graves and Gengliang Wang too but can't find their email addresses.

Re: Spark Streaming : Multiple sources found for csv : Error

Posted by Hyukjin Kwon <gu...@gmail.com>.
Yea, this is exactly what I have been worried about with the recent changes
(discussed in https://issues.apache.org/jira/browse/SPARK-24924).
See https://github.com/apache/spark/pull/17916. This should be fine in
later Spark versions.

FYI, +Wechen and Dongjoon
I want to add Thomas Graves and Gengliang Wang too but can't find their
email addresses.


Re: Spark Streaming : Multiple sources found for csv : Error

Posted by Srabasti Banerjee <sr...@ymail.com.INVALID>.
Hi Jörn,
Do you have suggestions as to how to do that? 

The conflicting packages are being picked up by default from pom.xml. I am not invoking any additional packages when running spark-submit on the thin jar.

Thanks,
Srabasti Banerjee
   On Thursday, 30 August, 2018, 9:45:36 PM GMT-7, Jörn Franke <jo...@gmail.com> wrote:  
 
 Can’t you remove the dependency on the Databricks CSV data source? Spark has had CSV support built in for some versions now, so it is not needed.

Re: Spark Streaming : Multiple sources found for csv : Error

Posted by Jörn Franke <jo...@gmail.com>.
Can’t you remove the dependency on the Databricks CSV data source? Spark has had CSV support built in for some versions now, so it is not needed.
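A sketch of what that removal could look like (not from the original thread; it assumes the conflict is pulled in transitively under the coordinates com.databricks:spark-csv_2.11, which should be verified with `mvn dependency:tree` or the sbt equivalent before applying). For an sbt build:

```scala
// build.sbt (fragment) -- exclude the external spark-csv source everywhere
// it appears transitively, so only the built-in
// org.apache.spark.sql.execution.datasources.csv.CSVFileFormat stays
// registered for the "csv" short name.
excludeDependencies += ExclusionRule("com.databricks", "spark-csv_2.11")
```

Maven users would instead add an <exclusion> for the same artifact on whichever dependency in pom.xml pulls it in, as reported by dependency:tree.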
