Posted to user@accumulo.apache.org by David Medinets <da...@gmail.com> on 2014/05/14 21:19:14 UTC

Pagination in Accumulo (also D4M Data Explorer!)

While working on the D4M Data Explorer (
https://github.com/medined/D4M_Schema) web application, I started thinking
about how to paginate. The results are at
https://github.com/medined/D4M_Schema/blob/master/docs/pagination.md. I'll
reproduce only the first paragraph below since I don't want you to miss out
on my cool images.

- First Paragraph -
Pagination in Accumulo is not simple. Pages are not deterministic since the
data can be constantly changing. Also, authorization levels can change the
number of returned results. Another consideration is that Accumulo tables
can only be scanned forwards and not backwards. With these factors in mind,
I am implementing the following technique. I hope the community can point
out flaws and provide improvements.
---
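
To make the constraints in that paragraph concrete, here is a minimal sketch of forward-only pagination that remembers the last row key returned and resumes just after it. It is not the technique from the linked pagination.md; it is plain Python over a sorted list standing in for a table. The names fetch_page, table, auths, and after_key are hypothetical, and a visibility is simplified to a single label rather than a full visibility expression.

```python
from bisect import bisect_right


def fetch_page(table, auths, page_size, after_key=None):
    """Return (entries, last_key) for one page of a forward-only scan.

    table: list of (row_key, visibility, value) tuples sorted by row_key,
           a stand-in for a sorted table (row keys assumed unique).
    auths: set of visibility labels the caller is allowed to read.
    after_key: last row key of the previous page, or None for the first page.
    """
    keys = [row for row, _, _ in table]
    # Resume strictly after the previous page's last key.
    start = 0 if after_key is None else bisect_right(keys, after_key)
    page = []
    for row, visibility, value in table[start:]:
        if visibility not in auths:
            continue  # hidden entries do not count toward the page size
        page.append((row, value))
        if len(page) == page_size:
            break
    last_key = page[-1][0] if page else after_key
    return page, last_key


table = [
    ("a", "public", 1),
    ("b", "secret", 2),
    ("c", "public", 3),
    ("d", "public", 4),
    ("e", "secret", 5),
]
page1, cursor = fetch_page(table, {"public"}, 2)
page2, cursor = fetch_page(table, {"public"}, 2, cursor)
```

Because the cursor is a row key rather than a page number, rows inserted or deleted between requests shift what a page contains but never cause a row to be skipped or repeated. The same call with more authorizations returns different pages, which is why page numbers are not stable across users.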

Several people helped to review the page via private emails. Thanks! If you
want credit, please step up and identify yourself.

p.s. - The D4M Data Explorer is not too exciting just yet. But it should
get better. If you're interested, I'd be glad for any help.

Re: Pagination in Accumulo (also D4M Data Explorer!)

Posted by David Medinets <da...@gmail.com>.

Josh, this morning I woke up and remembered that I wrote
http://affy.blogspot.com/2012/11/how-can-i-use-reverse-sort-on-generic.html
about 18 months ago. I can easily add a reverse index in order to extend
the D4M schema.
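
For anyone following along, here is a rough sketch of one common way to build such a reverse index: store each row key under its byte-wise complement, so that an ordinary forward scan over the index walks the original keys in descending order. This is not necessarily the approach from the blog post above. The function reverse_key and the fixed width of 16 bytes are hypothetical, and the sketch assumes row keys contain no NUL bytes and fit within that width.

```python
def reverse_key(row_key: str, width: int = 16) -> bytes:
    """Map a row key to bytes that sort in the opposite order.

    Keys are padded with NUL bytes to a fixed width so that short and long
    keys compare consistently, then every byte is complemented.
    """
    raw = row_key.encode("utf-8")
    if len(raw) > width:
        raise ValueError("row key longer than the fixed width")
    padded = raw.ljust(width, b"\x00")
    return bytes(0xFF - b for b in padded)


keys = ["apple", "banana", "cherry", "app"]
# Each index entry is (reversed key, original key). Sorting stands in for
# the table's own sort order; a forward scan reads entries in this order.
reverse_index = sorted((reverse_key(k), k) for k in keys)
descending = [original for _, original in reverse_index]
```

Paging backwards through the main table then becomes paging forwards through the index, at the cost of writing every row key twice.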

I'm glad to see that reverse scanning is possible in HBase.


On Thu, May 15, 2014 at 8:43 PM, Josh Elser <jo...@gmail.com> wrote:

> Reverse scanning isn't necessarily infeasible: https://issues.apache.org/
> jira/browse/HBASE-4811
>
> This might be something cool that could be implemented to make this sort
> of thing easier.
>
> The pagination isolation you mention in Approach B is interesting. I'm
> curious as to how cloning tables would work to get you this. I imagine
> for a highly active system (read and write) this would start to break down
> pretty quickly.
>
> I think having a "loose" interpretation of the actual page you're on would
> be easiest to implement without omitting data (your "next?"). This would be
> a bit easier with a reverse scanner. The "page" then becomes a bit of a
> guess -- really, it's just a guess at the section of records being viewed
> (e.g. 20% behind and 80% forward). That might make for a more honest
> pagination view instead of explicitly listing page numbers which you know
> will change.
>
> Have you considered creating a reverse row index to get around the lack of
> a reverse scanner?

Re: Pagination in Accumulo (also D4M Data Explorer!)

Posted by Josh Elser <jo...@gmail.com>.

Reverse scanning isn't necessarily infeasible: 
https://issues.apache.org/jira/browse/HBASE-4811

This might be something cool that could be implemented to make this sort 
of thing easier.

The pagination isolation you mention in Approach B is interesting. I'm
curious as to how cloning tables would work to get you this. I imagine
for a highly active system (read and write) this would start to break 
down pretty quickly.

I think having a "loose" interpretation of the actual page you're on 
would be easiest to implement without omitting data (your "next?"). This 
would be a bit easier with a reverse scanner. The "page" then becomes a 
bit of a guess -- really, it's just a guess at the section of records 
being viewed (e.g. 20% behind and 80% forward). That might make for a 
more honest pagination view instead of explicitly listing page numbers 
which you know will change.
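
As a rough illustration of that "loose" position, here is a small sketch that guesses how far through the table the current row sits by comparing it against a sorted list of boundary keys (for example, a table's split points). The names approximate_position and describe_position are hypothetical, and the guess is only reasonable under the assumption that the ranges between boundaries hold similar amounts of data.

```python
from bisect import bisect_right


def approximate_position(split_points, current_key):
    """Estimate the fraction of the table that lies before current_key.

    split_points: sorted row keys dividing the table into
                  len(split_points) + 1 ranges, assumed similar in size.
    The result counts only ranges that are completely behind current_key.
    """
    ranges = len(split_points) + 1
    passed = bisect_right(split_points, current_key)
    return passed / ranges


def describe_position(split_points, current_key):
    """Turn the estimate into a label for the pagination control."""
    behind = round(approximate_position(split_points, current_key) * 100)
    return f"about {behind}% behind, {100 - behind}% ahead"


label = describe_position(["e", "j", "o", "t"], "f")
```

With four boundaries and a current row of "f", one of five ranges is behind the reader, so the label reads "about 20% behind, 80% ahead". It is a guess, but one that stays roughly right as rows come and go.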

Have you considered creating a reverse row index to get around the lack 
of a reverse scanner?
