Posted to user@spark.apache.org by Conconscious <co...@gmail.com> on 2018/01/22 13:43:10 UTC

Spark querying C* in Scala

Hi list,

I have a Cassandra table with two fields: id bigint, kafka text

My goal is to read only the kafka field (which contains JSON) and infer
its schema

I have this skeleton code (not working):

sc.stop
import org.apache.spark._
import com.datastax.spark._
import org.apache.spark.sql.functions.get_json_object

import org.apache.spark.sql.functions.to_json
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

val conf = new SparkConf(true)
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.cassandra.auth.username", "cassandra")
  .set("spark.cassandra.auth.password", "cassandra")
val sc = new SparkContext(conf)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.sql("SELECT kafka from table1")
df.printSchema()

I think I have at least two problems: the keyspace is missing, the table
is not recognized, and it is certainly not going to infer the schema from
the text field.

I have a working solution for JSON files, but I can't "translate" it to
Cassandra:

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("Spark SQL basic example").getOrCreate()
import spark.implicits._ // must come after spark is defined
val redf = spark.read.json("/usr/local/spark/examples/cqlsh_r.json")
redf.printSchema
redf.count
redf.show
redf.createOrReplaceTempView("clicks")
val clicksDF = spark.sql("SELECT * FROM clicks")
clicksDF.show()
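
What I am aiming for on the Cassandra side would look roughly like this
(an untested sketch; "test" / "table1" are placeholder keyspace and table
names):

// Rough idea (untested): read only the kafka column from Cassandra
// ("test" / "table1" are placeholders), then let spark.read.json infer
// the schema from the raw JSON strings.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Cassandra JSON inference")
  .config("spark.cassandra.connection.host", "127.0.0.1")
  .getOrCreate()
import spark.implicits._

val kafkaStrings = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "table1"))
  .load()
  .select("kafka")
  .as[String] // Dataset[String] of raw JSON documents

// Spark 2.2+ can infer a schema directly from a Dataset[String]
val inferred = spark.read.json(kafkaStrings)
inferred.printSchema()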

My Spark version is 2.2.1 and Cassandra version is 3.11.1

Thanks in advance





Re: Spark querying C* in Scala

Posted by Conconscious <co...@gmail.com>.
Hi list,

val dfs = spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "cluster" -> "helloCassandra",
    "spark.cassandra.connection.host" -> "127.0.0.1",
    "spark.cassandra.input.fetch.size_in_rows" -> "100000",
    "spark.cassandra.input.consistency.level" -> "ONE",
    "table" -> "cliks",
    "pushdown" -> "true",
    "keyspace" -> "test"))
  .load()
  .select("kafka")

("header" and "inferSchema" are CSV reader options; they have no effect
on the Cassandra source, so I dropped them.)

dfs.printSchema()

Is there any way to get this schema as JSON?
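
I suspect something along these lines would work (untested; it assumes
the kafka strings parse as JSON):

// A DataFrame schema is a StructType; StructType can serialize itself.
println(dfs.schema.json)       // compact JSON
println(dfs.schema.prettyJson) // pretty-printed

// For the schema of the JSON *inside* the kafka column, infer it first:
import spark.implicits._
val inferred = spark.read.json(dfs.as[String])
println(inferred.schema.prettyJson)

// The JSON form can be turned back into a schema later with:
// org.apache.spark.sql.types.DataType.fromJson(dfs.schema.json)
//   .asInstanceOf[org.apache.spark.sql.types.StructType]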

Thanks in advance

On 22-01-2018 17:51, Sathish Kumaran Vairavelu wrote:
> You have to register the Cassandra table in Spark as a DataFrame:
>
> https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md


Re: Spark querying C* in Scala

Posted by Sathish Kumaran Vairavelu <vs...@gmail.com>.
You have to register the Cassandra table in Spark as a DataFrame:


https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md
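
Something along these lines (assuming a SparkSession named spark with the
connector on the classpath; "test" / "table1" are placeholder keyspace and
table names):

// Minimal registration sketch: expose the Cassandra table to Spark SQL.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "table1"))
  .load()
df.createOrReplaceTempView("table1")

// Plain SQL now works against the registered view
spark.sql("SELECT kafka FROM table1").printSchema()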


Thanks

Sathish
On Mon, Jan 22, 2018 at 7:43 AM Conconscious <co...@gmail.com> wrote:

> Hi list,
>
> I have a Cassandra table with two fields: id bigint, kafka text
>
> My goal is to read only the kafka field (which contains JSON) and infer
> its schema
> [...]