Posted to user@spark.apache.org by Conconscious <co...@gmail.com> on 2018/01/22 13:43:10 UTC
Spark querying C* in Scala
Hi list,
I have a Cassandra table with two columns: id bigint and kafka text.
My goal is to read only the kafka column (which contains JSON) and infer its
schema.
I have this skeleton code (not working):
sc.stop
import org.apache.spark._
import com.datastax.spark._
import org.apache.spark.sql.functions.get_json_object
import org.apache.spark.sql.functions.to_json
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
val conf = new SparkConf(true)
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.cassandra.auth.username", "cassandra")
  .set("spark.cassandra.auth.password", "cassandra")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.sql("SELECT kafka from table1")
df.printSchema()
I think I have several problems: the keyspace is missing, the table is not
recognized, and the schema is certainly not going to be inferred from the
text column.
I have a working solution for JSON files, but I can't "translate" it
to Cassandra:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("Spark SQL basic example").getOrCreate()
// The implicits can only be imported once `spark` exists
import spark.implicits._
val redf = spark.read.json("/usr/local/spark/examples/cqlsh_r.json")
redf.printSchema
redf.count
redf.show
redf.createOrReplaceTempView("clicks")
val clicksDF = spark.sql("SELECT * FROM clicks")
clicksDF.show()
My Spark version is 2.2.1 and Cassandra version is 3.11.1
Thanks in advance
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Spark querying C* in Scala
Posted by Conconscious <co...@gmail.com>.
Hi list,
val dfs = spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "cluster" -> "helloCassandra",
    "spark.cassandra.connection.host" -> "127.0.0.1",
    "spark.input.fetch.size_in_rows" -> "100000",
    "spark.cassandra.input.consistency.level" -> "ONE",
    "table" -> "cliks",
    "pushdown" -> "true",
    "keyspace" -> "test",
    "header" -> "false",
    "inferSchema" -> "true"))
  .load()
  .select("kafka")
dfs.printSchema()
Is there any way to get this schema as JSON?
Thanks in advance
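One possible approach, sketched under assumptions rather than tested against this cluster: Spark 2.2 can infer a schema from a Dataset[String] of JSON documents, and the resulting StructType can be printed as JSON or passed to from_json. The sample rows, the local master and the app name below are hypothetical stand-ins; in practice dfs would be the DataFrame loaded from Cassandra above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .appName("infer-json-schema")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the DataFrame read from Cassandra (id bigint, kafka text).
val dfs = Seq(
  (1L, """{"user":"alice","clicks":3}"""),
  (2L, """{"user":"bob","clicks":5}""")
).toDF("id", "kafka")

// Infer a schema by reading the JSON strings as a Dataset[String] (available since Spark 2.2.0).
val inferred = spark.read.json(dfs.select("kafka").as[String]).schema

// The inferred schema itself, serialized as JSON.
println(inferred.prettyJson)

// Parse the text column into a struct column using the inferred schema.
val parsed = dfs.select(from_json(col("kafka"), inferred).as("kafka"))
parsed.printSchema()
```

Note that inference scans the JSON strings once, so on a large table it may be worth inferring from a sample and reusing the schema.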
On 22-01-2018 17:51, Sathish Kumaran Vairavelu wrote:
> You have to register the Cassandra table in Spark as a DataFrame
>
>
> https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md
>
>
> Thanks
>
> Sathish
Re: Spark querying C* in Scala
Posted by Sathish Kumaran Vairavelu <vs...@gmail.com>.
You have to register the Cassandra table in Spark as a DataFrame
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md
Thanks
Sathish
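A minimal sketch of what that registration could look like, assuming the spark-cassandra-connector is on the classpath and reusing the host and credentials from the original post. The keyspace "test" and table "cliks" are taken from the follow-up message; the view name "table1" matches the original query, and the app name is made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: connection settings copied from the original post.
val spark = SparkSession.builder()
  .appName("cassandra-temp-view")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .config("spark.cassandra.auth.username", "cassandra")
  .config("spark.cassandra.auth.password", "cassandra")
  .getOrCreate()

// Load the Cassandra table through the connector's data source,
// naming the keyspace and table explicitly.
val table = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "cliks"))
  .load()

// Register the DataFrame under the name the SQL query refers to.
table.createOrReplaceTempView("table1")

// The plain SQL query can now resolve the table.
val df = spark.sql("SELECT kafka FROM table1")
df.printSchema()
```

The kafka column will still show up as a string here; parsing it as JSON is a separate step.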