Posted to user@spark.apache.org by 十六夜涙 <cr...@qq.com> on 2014/12/09 07:42:20 UTC

spark broadcast unavailable

Hi all
    In my Spark application, I load a csv file and map the data to a Map variable for later use on the driver node, then broadcast it. Everything works fine until a java.io.FileNotFoundException occurs; the console log tells me the broadcast is unavailable. I googled this problem: it says Spark will clean up the broadcast, but there is a solution where the author mentions re-broadcasting. I followed that approach and wrote some exception-handling code with `try`/`catch`. After compiling and submitting the jar, I faced another problem: it shows "task not serializable".
So here I have three options:
1. find the right way to persist the broadcast,
2. solve the "task not serializable" problem when re-broadcasting the variable,
3. save the data to some kind of database, although I would prefer to keep the data in memory.


Here is a code snippet:

  val esRdd = kafkaDStreams.flatMap(_.split("\\n"))
    .map {
      case esregex(datetime, time_request) =>
        var ipInfo: Array[String] = Array.empty
        try {
          // b is the Broadcast variable created earlier on the driver
          ipInfo = Utils.getIpInfo(client_ip, b.value)
        } catch {
          case e: java.io.FileNotFoundException =>
            // re-broadcast on failure; after adding this the job
            // reports "task not serializable"
            val b = Utils.load(sc, ip_lib_path)
            ipInfo = Utils.getIpInfo(client_ip, b.value)
        }
    }

Re: spark broadcast unavailable

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You cannot pass the sc object (*val b = Utils.load(sc, ip_lib_path)*) inside
a map function, and that's why the serialization exception is popping up
(sc is not serializable). You can try Tachyon's cache if you want to persist
the data in memory more or less permanently.
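For illustration, here is a rough sketch of that idea (IpLookup, loadIpLib,
lookups and the csv layout are made up for the example; only the role of
Utils.load from your snippet is kept): hold the Broadcast handle in a
driver-side variable and do any re-broadcast inside transform(), whose body
runs on the driver once per batch, so sc is never captured by the map closure:

  import org.apache.spark.SparkContext
  import org.apache.spark.broadcast.Broadcast
  import org.apache.spark.streaming.dstream.DStream

  object IpLookup {
    // Driver-side handle to the current broadcast (type assumed for the example)
    @volatile private var ipLib: Broadcast[Map[String, String]] = _

    // Hypothetical loader standing in for Utils.load: reads the csv on the
    // driver and broadcasts it as a Map
    private def loadIpLib(sc: SparkContext, path: String): Broadcast[Map[String, String]] =
      sc.broadcast(
        sc.textFile(path)
          .map { line => val cols = line.split(",", 2); (cols(0), cols(1)) }
          .collect().toMap)

    // Driver-side refresh: drop the stale copy and re-broadcast. Safe here
    // because it never runs inside an executor-side closure
    def refresh(sc: SparkContext, path: String): Unit = {
      if (ipLib != null) ipLib.unpersist()
      ipLib = loadIpLib(sc, path)
    }

    def lookups(lines: DStream[String], path: String): DStream[String] =
      lines.transform { rdd =>            // this block runs on the driver
        if (ipLib == null) refresh(rdd.sparkContext, path)
        val b = ipLib                     // capture only the Broadcast handle
        rdd.map(ip => b.value.getOrElse(ip, "unknown"))
      }
  }

The map closure now only captures b, which is serializable; if the broadcast
files ever disappear again you can call refresh() from the driver (e.g. at the
top of a foreachRDD) instead of from inside the map.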

Thanks
Best Regards
