Posted to user@spark.apache.org by Ian Wilkinson <ia...@me.com> on 2014/12/03 16:31:24 UTC

Providing query dsl to Elasticsearch for Spark (2.1.0.Beta3)

Hi,

I'm trying the Elasticsearch support for Spark (2.1.0.Beta3).

In the following I provide the query (as query DSL):


import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object TryES {
  val sparkConf = new SparkConf().setAppName("Campaigns")
  sparkConf.set("es.nodes", "<es_cluster>:9200")
  sparkConf.set("es.nodes.discovery", "false")
  val sc = new SparkContext(sparkConf)

  def main(args: Array[String]) {
    // Query DSL passed to esRDD as an inline JSON string.
    val query = """{
      "query": {
         ...
      }
    }"""
    val campaigns = sc.esRDD("<resource>", query)
    campaigns.count()
  }
}


However, when I submit this (using spark-1.1.0-bin-hadoop2.4),
I hit the following exception:

14/12/03 14:55:27 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/12/03 14:55:27 INFO scheduler.DAGScheduler: Failed to run count at TryES.scala:<...>
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot open stream for resource "{
   "query": {
       ...
   }
}
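
(The "Cannot open stream for resource" wording makes me suspect the string is being
interpreted as a path to an external query file rather than as inline DSL. For
comparison, a URI-style query would look like the sketch below; the q=* term is only
a placeholder, not my actual filter.)

    val campaigns = sc.esRDD("<resource>", "?q=*")
    campaigns.count()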


Is the query DSL supported with esRDD, or am I missing something
more fundamental?
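
For completeness, another route I believe es-hadoop offers is supplying the query
through the es.query configuration property instead of the esRDD argument. A minimal
sketch of that variant follows; the match_all body is only a stand-in, not the real
campaign query:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._

    object TryESQueryConf {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("Campaigns")
        conf.set("es.nodes", "<es_cluster>:9200")
        conf.set("es.nodes.discovery", "false")
        // Query supplied via configuration; esRDD then only needs the resource.
        // match_all is a placeholder, not the actual campaign query.
        conf.set("es.query", """{ "query": { "match_all": {} } }""")
        val sc = new SparkContext(conf)

        val campaigns = sc.esRDD("<resource>")
        println(campaigns.count())
      }
    }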

Huge thanks,
ian


Re: Providing query dsl to Elasticsearch for Spark (2.1.0.Beta3)

Posted by Ian Wilkinson <ia...@me.com>.
Quick follow-up: this works sweetly with spark-1.1.1-bin-hadoop2.4.
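
For reference, if I read the API right, esRDD yields pairs of document id and field
map, so a quick peek at the data beyond count() looks roughly like this (reusing sc
and query from the snippet in my first mail; the "name" field is hypothetical):

    val campaigns = sc.esRDD("<resource>", query)   // RDD[(String, Map[String, AnyRef])]
    campaigns.take(5).foreach { case (id, fields) =>
      println(s"$id -> ${fields.getOrElse("name", "<missing>")}")
    }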


