Posted to user@spark.apache.org by beeshma r <be...@gmail.com> on 2016/01/20 00:09:49 UTC
Is HBase Scan really needed through Get? (HBase+Solr+Spark)
Hi,

I am trying to integrate HBase, Solr, and Spark. Solr indexes all the
documents from HBase through hbase-indexer, and I use Spark to manipulate
the datasets. The thing is, the SolrDocuments returned by a Solr query
already contain the row key and the row values, so I get the row keys and
their corresponding values directly.

My question is: is it really necessary to hit the HBase table again, via a
Get built from the row key of each SolrDocument?
example code (the table name, Solr URL, and query string are left blank as
placeholders):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

// Open the HBase table.
HTable table = new HTable(conf, "");
List<Get> list = new ArrayList<Get>();

// Query Solr for the first 10 documents.
String url = " ";
SolrServer server = new HttpSolrServer(url);
SolrQuery query = new SolrQuery(" ");
query.setStart(0);
query.setRows(10);
QueryResponse response = server.query(query);
SolrDocumentList docs = response.getResults();

// Build one Get per row key found in the Solr results.
for (SolrDocument doc : docs) {
    Get get = new Get(Bytes.toBytes((String) doc.getFieldValue("rowkey")));
    list.add(get);
}

// Is this step really needed? It goes back to HBase, which costs extra
// time, right?
Result[] res = table.get(list);
I got this piece of code from
http://www.programering.com/a/MTM5kDMwATI.html

Please correct me if anything is wrong :)
Thanks
Beesh
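[Editor's note: if the columns needed are also stored fields in Solr
(hbase-indexer can store values alongside the index), they can be read
straight off each SolrDocument, skipping the HBase round trip. A minimal
sketch, with plain Maps standing in for SolrDocuments; the field names
"rowkey" and "rowvalue" are hypothetical:]

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DirectFromSolr {

    // Pull rowkey -> rowvalue pairs straight out of the query results,
    // with no second trip to HBase.
    static Map<String, String> extract(List<Map<String, Object>> docs) {
        Map<String, String> byKey = new LinkedHashMap<>();
        for (Map<String, Object> doc : docs) {
            byKey.put((String) doc.get("rowkey"), (String) doc.get("rowvalue"));
        }
        return byKey;
    }

    public static void main(String[] args) {
        // Simulated Solr results: each "document" already carries the row
        // key and the stored value, as hbase-indexer put them there.
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(Map.of("rowkey", "row1", "rowvalue", "alpha"));
        docs.add(Map.of("rowkey", "row2", "rowvalue", "beta"));

        System.out.println(extract(docs));  // {row1=alpha, row2=beta}
    }
}
```

The trade-off: if a field is only indexed (not stored) in the Solr schema,
its value is absent from the query response and the HBase Get is still
required.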
Re: Is HBase Scan really needed through Get? (HBase+Solr+Spark)
Posted by beeshma r <be...@gmail.com>.
Thanks Ted :)

So if everything gets indexed from HBase into Solr, there is no need to go
back to the region servers again.
Thanks
Beesh
On Wed, Jan 20, 2016 at 5:05 AM, Ted Yu <yu...@gmail.com> wrote:
Re: Is HBase Scan really needed through Get? (HBase+Solr+Spark)
Posted by Ted Yu <yu...@gmail.com>.
get(List<Get> gets) will call:

Object[] r1 = batch((List) gets);

where batch() would do:

AsyncRequestFuture ars = multiAp.submitAll(pool, tableName, actions, null, results);
ars.waitUntilDone();

multiAp is an AsyncProcess.

In short, the client does contact the region servers to fetch the results.
FYI
On Tue, Jan 19, 2016 at 3:28 PM, ayan guha <gu...@gmail.com> wrote:
Re: Is HBase Scan really needed through Get? (HBase+Solr+Spark)
Posted by ayan guha <gu...@gmail.com>.
It is not scanning HBase. What it does is loop through your list of row
keys and fetch the data for each one.

Ex: your Solr result has 5 records, with row keys R1...R5.
Then list will be [R1, R2, ..., R5]

Then table.get(list) will do something like:

res = []
for k in list:
    v = getFromHbaseWithRowKey(k)  # just for illustration, there is no such function :)
    res.add(v)
return res
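[Editor's note: the pseudocode above, translated into runnable Java with a
plain HashMap standing in for the HBase table. getFromHbaseWithRowKey is
still a made-up name; a real multi-get also batches the lookups per region
server rather than issuing them strictly one at a time:]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiGetSketch {

    // Stand-in for the HBase table: row key -> value.
    static final Map<String, String> TABLE = new HashMap<>();

    // Hypothetical single-row lookup from the pseudocode above.
    static String getFromHbaseWithRowKey(String key) {
        return TABLE.get(key);
    }

    // What table.get(list) amounts to: one result per row key, returned in
    // the same order as the input list.
    static List<String> multiGet(List<String> keys) {
        List<String> res = new ArrayList<>();
        for (String k : keys) {
            res.add(getFromHbaseWithRowKey(k));
        }
        return res;
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 5; i++) {
            TABLE.put("R" + i, "value" + i);
        }
        System.out.println(multiGet(Arrays.asList("R1", "R3", "R5")));
        // [value1, value3, value5]
    }
}
```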
On Wed, Jan 20, 2016 at 10:09 AM, beeshma r <be...@gmail.com> wrote:
--
Best Regards,
Ayan Guha