Posted to users@zeppelin.apache.org by Oren Shani <os...@iucc.ac.il> on 2016/04/11 15:11:00 UTC
pyspark and elasticsearch - accessing ES fields
Hi All,
I connected pyspark under Zeppelin to my Elasticsearch DB and I am able to do this:
%pyspark
es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={"es.resource": "logstash-uni-*"})
es_rdd.toDF().registerTempTable("elk")
and then
%sql select * from elk
What I get is a table with just two columns. One is some object ID, I guess, and the other is a string containing a mapping of all the fields in the ES record to their values ( " Map(@timestamp -> 2016-03-16T14:31:12.861Z, host -> ..." ).
My question is: how do I create a Spark table, or even just a Python object (probably a dict), that lets me access each field separately?
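For what it's worth, here is a sketch of one approach I am considering. Each record in the RDD is a (doc_id, field_map) pair, and pyspark appears to convert the LinkedMapWritable value into a plain Python dict, so dropping the key and building a Row per document should give one column per field. The snippet below simulates this on made-up sample records (the field values and IDs are placeholders, not real data); the commented lines show what the equivalent Spark calls would look like:

```python
# Simulated RDD contents: (doc_id, field_map) pairs, with the
# LinkedMapWritable already converted to a Python dict by pyspark.
records = [
    ("doc-1", {"@timestamp": "2016-03-16T14:31:12.861Z", "host": "web01"}),
    ("doc-2", {"@timestamp": "2016-03-16T14:32:05.120Z", "host": "web02"}),
]

# Keep only the field maps; on the real RDD this would be:
#   docs = es_rdd.map(lambda kv: kv[1])
docs = [fields for _, fields in records]

# Each document is now an ordinary dict with per-field access.
print(docs[0]["host"])          # web01
print(docs[1]["@timestamp"])    # 2016-03-16T14:32:05.120Z

# To get a DataFrame with one column per field, something like:
#   from pyspark.sql import Row
#   df = es_rdd.map(lambda kv: Row(**kv[1])).toDF()
#   df.registerTempTable("elk")
# should then make `select host, `@timestamp` from elk` work.
```

I have not verified the Row(**fields) step against the real index, so I'd appreciate corrections if the converters behave differently.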
Thanks,
Oren