Posted to user@spark.apache.org by beeshma r <be...@gmail.com> on 2016/01/20 00:09:49 UTC

Is an HBase scan really needed through Get? (HBase+Solr+Spark)

Hi

I am trying to integrate HBase, Solr, and Spark. Solr indexes all the
documents from HBase through hbase-indexer, and through Spark I manipulate
the datasets. The thing is, after getting the SolrDocuments back from the
Solr query, each document already contains the row key and the row values,
so I already have the row keys and their corresponding values directly.

My question is: is it really necessary to scan the HBase table once again
with a Get, using the row key from the SolrDocument?

Example code (the table name, Solr URL, and query string are left blank
here, as in the original):

HTable table = new HTable(conf, "");
List<Get> list = new ArrayList<Get>();
String url = " ";
SolrServer server = new HttpSolrServer(url);
SolrQuery query = new SolrQuery(" ");
query.setStart(0);
query.setRows(10);
QueryResponse response = server.query(query);
SolrDocumentList docs = response.getResults();
for (SolrDocument doc : docs) {
    // Build one Get per Solr document, keyed by its "rowkey" field
    Get get = new Get(Bytes.toBytes((String) doc.getFieldValue("rowkey")));
    list.add(get);
}

Result[] res = table.get(list); // Is this really needed? It takes extra time to scan, right?

I got this piece of code from
http://www.programering.com/a/MTM5kDMwATI.html

Please correct me if anything is wrong :)

Thanks
Beesh

Re: Is an HBase scan really needed through Get? (HBase+Solr+Spark)

Posted by beeshma r <be...@gmail.com>.
Thanks Ted :)

So if everything gets indexed from HBase into Solr, there is no need to
hit the region servers once again.
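That conclusion can be sketched in runnable form. This is a minimal illustration only, using a plain Map as a stand-in for a SolrDocument (the field names "rowkey" and "value" are hypothetical); note it only works if the row values are *stored* in Solr, not merely indexed:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SolrOnlyRead {
    public static void main(String[] args) {
        // Stand-in for a SolrDocumentList: each Map plays the role of a SolrDocument.
        List<Map<String, Object>> docs = new ArrayList<>();
        Map<String, Object> doc = new HashMap<>();
        doc.put("rowkey", "r1");
        doc.put("value", "v1");
        docs.add(doc);

        // If the row values are stored in Solr, read them directly from the
        // documents -- no HBase Get, no region-server round trip.
        for (Map<String, Object> d : docs) {
            String rowkey = (String) d.get("rowkey");
            String value = (String) d.get("value");
            System.out.println(rowkey + " -> " + value);
        }
    }
}
```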


Thanks
Beesh



Re: Is an HBase scan really needed through Get? (HBase+Solr+Spark)

Posted by Ted Yu <yu...@gmail.com>.
get(List<Get> gets) will call:

      Object[] r1 = batch((List) gets);

where batch() would do:

    AsyncRequestFuture ars = multiAp.submitAll(pool, tableName, actions, null, results);

    ars.waitUntilDone();

multiAp is an AsyncProcess.

In short, the client would access the region servers for the results.


FYI
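The submit-all-then-wait flow described above can be mimicked with a plain ExecutorService. This is only an illustration of the pattern, not HBase's actual AsyncProcess; fetchFromRegionServer is a hypothetical stand-in for the real region-server lookup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchSketch {
    // Hypothetical stand-in for a region-server point lookup.
    static String fetchFromRegionServer(String rowkey) {
        return "value-of-" + rowkey;
    }

    public static void main(String[] args) throws Exception {
        List<String> gets = List.of("r1", "r2", "r3");

        // Roughly what AsyncProcess does: submit all actions to a pool...
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> futures = new ArrayList<>();
        for (String rowkey : gets) {
            futures.add(pool.submit(() -> fetchFromRegionServer(rowkey)));
        }

        // ...then wait until all are done (cf. ars.waitUntilDone()).
        String[] results = new String[futures.size()];
        for (int i = 0; i < results.length; i++) {
            results[i] = futures.get(i).get();
        }
        pool.shutdown();

        for (String r : results) {
            System.out.println(r);
        }
    }
}
```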


Re: Is an HBase scan really needed through Get? (HBase+Solr+Spark)

Posted by ayan guha <gu...@gmail.com>.
It is not scanning HBase. What it is doing is looping through your list
of row keys and fetching the data for each one at a time.

Example: your Solr result has 5 records, with row keys R1...R5.
Then list will be [R1, R2, ..., R5]

Then table.get(list) will do something like:

res = []
for k in list:
    v = getFromHbaseWithRowKey(k)   # just for illustration, there is no such function :)
    res.add(v)
return res
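The pseudocode above can also be written as runnable Java. This is purely an illustration: a HashMap stands in for the HBase table, and the point is that each key triggers one point lookup, not a table scan:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiGetIllustration {
    public static void main(String[] args) {
        // A Map standing in for the HBase table (hypothetical data).
        Map<String, String> hbase = new HashMap<>();
        hbase.put("R1", "v1");
        hbase.put("R2", "v2");
        hbase.put("R5", "v5");

        // The row keys that came back from Solr.
        List<String> keys = List.of("R1", "R2", "R5");

        // What table.get(list) amounts to: one point lookup per row key.
        List<String> res = new ArrayList<>();
        for (String k : keys) {
            res.add(hbase.get(k));   // point lookup by row key, no scan
        }
        System.out.println(res);
    }
}
```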



-- 
Best Regards,
Ayan Guha