Posted to issues@spark.apache.org by "Satyajit varma (JIRA)" <ji...@apache.org> on 2017/07/07 17:11:00 UTC
[jira] [Commented] (SPARK-20597) KafkaSourceProvider falls back on path as synonym for topic

    [ https://issues.apache.org/jira/browse/SPARK-20597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078395#comment-16078395 ] 

Satyajit varma commented on SPARK-20597:
----------------------------------------

Hi [~jlaskowski],

I am almost done with the change required by this ticket (SPARK-20597), and I would like to confirm a few things before I submit the PR.

1. In the ticket you say: "What seems a quite interesting option is to support start(path: String) as the least precedence option in which path would designate the default topic when no other options are used." Were you referring only to option("topic", "topic_name"), or also to other options such as option("checkpointLocation", ...)?

I am asking because the following line fails when no checkpoint location has been provided:

     df.writeStream.format("kafka").start("topic")

It throws "org.apache.spark.sql.AnalysisException: checkpointLocation must be specified either through option("checkpointLocation", ...) or SparkSession.conf.set("spark.sql.streaming.checkpointLocation", ...);".
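This error is separate from the path-as-topic question: the query needs a checkpoint location either way. The sketch below is a simplified, hypothetical model of that check, assuming the precedence is the writer option first and the session-level SQL conf second. Only the two keys are taken from the error message; the object and method names are illustrative, and it leaves out details such as the subdirectory Spark may add under the session-level location.

```scala
// Simplified, hypothetical model of the checkpoint-location check behind the
// AnalysisException above; not Spark's implementation. Only the two keys come
// from the error message; the object and method names are illustrative.
object CheckpointResolution {
  val OptionKey = "checkpointLocation"
  val ConfKey = "spark.sql.streaming.checkpointLocation"

  /** Right(location) if a checkpoint location is configured, Left(error) otherwise. */
  def resolve(writerOptions: Map[String, String],
              sessionConf: Map[String, String]): Either[String, String] =
    writerOptions.get(OptionKey)          // the per-query option is checked first
      .orElse(sessionConf.get(ConfKey))   // then the session-wide SQL conf
      .toRight(
        s"""checkpointLocation must be specified either through option("$OptionKey", ...) """ +
          s"""or SparkSession.conf.set("$ConfKey", ...)""")
}
```

So with either option("checkpointLocation", ...) on the writer or spark.sql.streaming.checkpointLocation set on the session, the checkpoint error should no longer occur, and the path-as-topic behaviour can be tested on its own.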


2. Please find below the code I am using to get this working (KafkaSourceProvider.scala, line 145):
// Use the "topic" option as the default topic if it is present; otherwise
// fall back on the "path" option (the argument passed to start(path)).
// Whichever value is picked is trimmed.
val defaultTopic = parameters.get(TOPIC_OPTION_KEY)
  .orElse(parameters.get(PATH_OPTION_KEY))
  .map(_.trim)

Let me know if this looks okay, or if I am missing any edge cases or anything else I should take care of.
I am trying to be very careful, and since I am a newbie, I would appreciate the experts' feedback on this approach or anything else.

If this looks good, I can apply the same change in the createRelation method (KafkaSourceProvider.scala, line 163), test it with the topic column option (our other scenario to test), and submit the PR immediately.
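For the topic column scenario, one way to keep path at the lowest precedence, as the ticket asks, is to resolve the topic per row in this order: the topic option, then the row's topic column, then path. Spark's Kafka integration guide states that the topic option overrides the topic column; placing path after the column is the assumption here. The sketch below models that ordering with plain Options (hypothetical names, not Spark code).

```scala
// Hypothetical per-row topic resolution, assuming path is the lowest-precedence
// fallback as proposed in SPARK-20597. Not Spark's implementation.
object RowTopic {
  /**
   * @param topicOption value of the "topic" writer option, if set
   * @param rowTopic    value of the row's "topic" column (None if the column
   *                    is absent or null)
   * @param path        argument passed to start(path), if any
   */
  def resolve(topicOption: Option[String],
              rowTopic: Option[String],
              path: Option[String]): Option[String] =
    topicOption.map(_.trim)        // 1. the "topic" option overrides everything
      .orElse(rowTopic)            // 2. then the row's topic column
      .orElse(path.map(_.trim))    // 3. path only when nothing else is set
}
```

Under this ordering, if path is folded into the same defaultTopic value as the topic option and that value is then used the way the topic option is today, path would override the topic column; keeping the two values separate avoids that.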

Regards,
Satyajit.

> KafkaSourceProvider falls back on path as synonym for topic
> -----------------------------------------------------------
>
>                 Key: SPARK-20597
>                 URL: https://issues.apache.org/jira/browse/SPARK-20597
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: Jacek Laskowski
>            Priority: Trivial
>              Labels: starter
>
> # {{KafkaSourceProvider}} supports {{topic}} option that sets the Kafka topic to save a DataFrame's rows to
> # {{KafkaSourceProvider}} can use {{topic}} column to assign rows to Kafka topics for writing
> What seems a quite interesting option is to support {{start(path: String)}} as the least precedence option in which {{path}} would designate the default topic when no other options are used.
> {code}
> df.writeStream.format("kafka").start("topic")
> {code}
> See http://apache-spark-developers-list.1001551.n3.nabble.com/KafkaSourceProvider-Why-topic-option-and-column-without-reverting-to-path-as-the-least-priority-td21458.html for discussion



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
