Posted to user@spark.apache.org by SparkUser6 <al...@gmail.com> on 2018/05/18 00:21:26 UTC

Getting Data From HBase using Spark is Extremely Slow

I have written four lines of a simple Spark program to process data in a
Phoenix table:

    // Get data from the Phoenix table: select col from table
    queryString = getQueryFullString();

    JavaPairRDD<NullWritable, TestWritable> phRDD = jsc.newAPIHadoopRDD(
            configuration,
            PhoenixInputFormat.class,
            NullWritable.class,
            TestWritable.class);

    // Goal is to scan all the data
    JavaRDD<Long> rdd = phRDD.map(
            new Function<Tuple2<NullWritable, TestWritable>, Long>() {
                @Override
                public Long call(Tuple2<NullWritable, TestWritable> tuple)
                        throws Exception {
                    return 1L;
                }
            });
    System.out.println(rdd.count());

This program takes 2 hours to process 2 million records. Can anyone help me
understand what is wrong?
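
To put those numbers in perspective, here is a rough back-of-the-envelope
throughput calculation. The class and method names are made up for
illustration, and the inputs are only the approximate figures quoted above
(about 2 million records in about 2 hours), not measured values.

```java
public class ScanThroughput {

    // Average number of records processed per second over the whole job.
    public static double recordsPerSecond(long records, long elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        return (double) records / elapsedSeconds;
    }

    public static void main(String[] args) {
        long records = 2_000_000L;          // approximate row count from the run
        long elapsedSeconds = 2L * 60 * 60; // roughly 2 hours, in seconds
        System.out.printf("%.1f records/second%n",
                recordsPerSecond(records, elapsedSeconds));
    }
}
```

That works out to roughly 278 records per second on average for the whole
job, which is why I suspect something is wrong.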
