Posted to user@spark.apache.org by bonnahu <bo...@gmail.com> on 2014/12/02 22:26:04 UTC

Using SparkSQL to query Hbase entity takes very long time

Hi all,
I am new to Spark and am trying to run a SparkSQL query on an HBase
entity. For an entity with about 4000 rows, the query takes about 12 seconds.
Is this expected? Is there any way to speed up the query?

Here is the code snippet:


SparkConf sparkConf = new SparkConf()
        .setMaster("spark://serverUrl:port")
        .setAppName("Javasparksqltest");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
Configuration hbase_conf = HBaseConfiguration.create();
hbase_conf.set("hbase.zookeeper.quorum", serverList);
hbase_conf.set("hbase.regionserver.port", "60020");
hbase_conf.set("hbase.master", master_url);
hbase_conf.set(TableInputFormat.INPUT_TABLE, entityName);
JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
jsc.newAPIHadoopRDD(hbase_conf,
                TableInputFormat.class, ImmutableBytesWritable.class,
                Result.class).cache();
// Generate the schema based on the string of schema
final List<StructField> keyFields = new ArrayList<StructField>();
for (String fieldName : this.getSchemaString().split(",")) {
    keyFields.add(DataType.createStructField(fieldName, DataType.StringType, true));
}
StructType schema = DataType.createStructType(keyFields);
JavaRDD<Row> rowRDD = hBaseRDD.map(
    new Function<Tuple2<ImmutableBytesWritable, Result>, Row>() {
        public Row call(Tuple2<ImmutableBytesWritable, Result> re) throws Exception {
            // Inside the anonymous class, `this` is the Function itself,
            // so the enclosing class's method is called without it.
            return createRow(re, getSchemaString());
        }
    });
				
JavaSQLContext sqlContext = new JavaSQLContext(jsc);
// Apply the schema to the RDD.
JavaSchemaRDD schemaRDD = sqlContext.applySchema(rowRDD, schema);
schemaRDD.registerTempTable("queryEntity");
// Query the temp table registered above.
JavaSchemaRDD retRDD = sqlContext.sql("SELECT * FROM queryEntity WHERE name = 'Spark'");
logger.info("retRDD count is " + retRDD.count());

thanks
		



