Posted to user@spark.apache.org by Ian Wilkinson <ia...@me.com> on 2014/12/03 16:31:24 UTC
Providing query dsl to Elasticsearch for Spark (2.1.0.Beta3)
Hi,
I'm trying out the Elasticsearch support for Spark (2.1.0.Beta3).
In the following I provide the query as query DSL:
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object TryES {
  val sparkConf = new SparkConf().setAppName("Campaigns")
  sparkConf.set("es.nodes", "<es_cluster>:9200")
  sparkConf.set("es.nodes.discovery", "false")
  val sc = new SparkContext(sparkConf)

  def main(args: Array[String]) {
    val query = """"{
      "query": {
        ...
      }
    }
    """
    val campaigns = sc.esRDD("<resource>", query)
    campaigns.count()
  }
}
However, when I submit this (using spark-1.1.0-bin-hadoop2.4),
I am hitting the following exception:
14/12/03 14:55:27 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/12/03 14:55:27 INFO scheduler.DAGScheduler: Failed to run count at TryES.scala:<...>
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot open stream for resource "{
"query": {
...
}
}
Is the query DSL supported with esRDD, or am I missing something
more fundamental?
Huge thanks,
ian
Re: Providing query dsl to Elasticsearch for Spark (2.1.0.Beta3)
Posted by Ian Wilkinson <ia...@me.com>.
Quick follow-up: this works sweetly with spark-1.1.1-bin-hadoop2.4.
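For reference, here is a minimal sketch of the shape I'm aiming for. The match_all body is just a placeholder (my real query is elided above), and I'm assuming the string needs to be plain JSON starting with "{" for es-hadoop to pick it up as inline query DSL rather than try to open it as a resource:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object TryESSketch {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Campaigns")
    sparkConf.set("es.nodes", "<es_cluster>:9200")
    sparkConf.set("es.nodes.discovery", "false")
    val sc = new SparkContext(sparkConf)

    // Plain triple-quoted JSON: the string starts with '{', so it should be
    // read as inline query DSL (a leading '?' would mean a URI query instead).
    val query = """{
      "query": { "match_all": {} }
    }"""

    val campaigns = sc.esRDD("<resource>", query)
    println(campaigns.count())
  }
}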
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org