Posted to user@hbase.apache.org by Vijay Ganesan <vi...@scaligent.com> on 2013/01/25 05:58:58 UTC

Pagination with HBase - getting previous page of data

I'm displaying rows from an HBase table in a data grid UI. The grid shows 25
rows at a time, i.e. it is paginated, and the user can click Next/Previous to
page through the data 25 rows at a time. I can implement Next easily by
setting an org.apache.hadoop.hbase.filter.PageFilter and setting startRow on
the org.apache.hadoop.hbase.client.Scan to the row key of the first row of
the next batch, which is sent to the UI along with the previous batch.
However, I can't do the same for Previous. I can set endRow on the Scan to
the row key of the last row of the previous batch, but since HBase scans
always run in the forward direction, there is no way to set a PageFilter that
returns the 25 rows ending at a particular row. The only option seems to be
to fetch *all* rows up to the end row and discard all but the last 25 in the
caller, which seems very inefficient. Any ideas on how this can be done
efficiently?
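
In code, my Next direction looks roughly like this (the table variable,
startRow and the page size of 25 are simplified for illustration, not my
actual grid code):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PageFilter;

    // Fetch one page of 25 rows starting at startRow, plus one extra row
    // whose key becomes the startRow of the Next page.
    Scan scan = new Scan(startRow);
    scan.setFilter(new PageFilter(26));  // page size + 1; a per-region hint
    scan.setCaching(26);                 // keep the number of RPCs low
    ResultScanner scanner = table.getScanner(scan);
    List<Result> page = new ArrayList<Result>();
    byte[] nextPageStart = null;
    for (Result r : scanner) {
        if (page.size() < 25) {
            page.add(r);
        } else {
            nextPageStart = r.getRow();  // where the Next page begins
            break;
        }
    }
    scanner.close();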

-- 
-Vijay

Re: Pagination with HBase - getting previous page of data

Posted by anil gupta <an...@gmail.com>.
On Sun, Feb 3, 2013 at 8:07 AM, Anoop John <an...@gmail.com> wrote:

> >lets say for a scan setCaching is
> 10 and scan is done across two regions. 9 Results(satisfying the filter)
> are in Region1 and 10 Results(satisfying the filter) are in Region2. Then
> will this scan return 19 (9+10) results?
>
> @Anil.
> No it will return 10 results only not 19. The client here takes into
> account the no# of results got from previous region. But a filter is
> different. The filter has no logic to do at the client side. It fully
> executed at server side. This is the way it is designed. Personally I would
> prefer to do the pagination by app alone by using plain scan with caching
> (to avoid so many RPCs) and app level logic.
>
@Anoop: Nice, that's why I too try to stick to simple Scans and keep the
pagination logic in the application. :)
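
A rough sketch of the application-level bookkeeping I mean, assuming the
application remembers the start row key of every page it has already served
(HTable usage, page size and names are illustrative):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    // pageStarts.get(n) holds the first row key of page n. "Previous" is
    // then just another forward scan, starting from
    // pageStarts.get(currentPage - 1).
    List<byte[]> pageStarts = new ArrayList<byte[]>();

    List<Result> fetchPage(HTable table, byte[] pageStartRow, int pageSize)
            throws IOException {
        Scan scan = new Scan(pageStartRow);
        scan.setCaching(pageSize);        // bound the number of RPCs per page
        ResultScanner scanner = table.getScanner(scan);
        List<Result> page = new ArrayList<Result>();
        try {
            for (Result r : scanner) {
                page.add(r);
                if (page.size() == pageSize) {
                    break;                // stop client-side; no PageFilter needed
                }
            }
        } finally {
            scanner.close();
        }
        return page;
    }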

>
> -Anoop-
>
> On Sat, Feb 2, 2013 at 1:32 PM, anil gupta <an...@gmail.com> wrote:
>
> > Hi Anoop,
> >
> > Please find my reply inline.
> >
> > Thanks,
> > Anil
> >
> > On Wed, Jan 30, 2013 at 3:31 AM, Anoop Sam John <an...@huawei.com>
> > wrote:
> >
> > > @Anil
> > >
> > > >I could not understand that why it goes to multiple regionservers in
> > > parallel. Why it cannot guarantee results <= page size( my guess: due
> to
> > > multiple RS scans)? If you have used it then maybe you can explain the
> > > behaviour?
> > >
> > > Scan from client side never go to multiple RS in parallel. Scan from
> > > HTable API will be sequential with one region after the other. For
> every
> > > region it will open up scanner in the RS and do next() calls. The
> filter
> > > will be instantiated at server side per region level ...
> > >
> > > When u need 100 rows in the page and you created a Scan at client side
> > > with the filter and suppose there are 2 regions, 1st the scanner is
> > opened
> > > at for region1 and scan is happening. It will ensure that max 100 rows
> > will
> > > be retrieved from that region.  But when the region boundary crosses
> and
> > > client automatically open up scanner for the region2, there also it
> will
> > > pass filter with max 100 rows and so from there also max 100 rows can
> > > come..  So over all at the client side we can not guartee that the scan
> > > created will only scan 100 rows as a whole from the table.
> > >
> >
> > I agree with other people on this email chain that the 2nd region should
> > only return (100 - no. of rows returned by Region1), if possible.
> >
> > When the region boundary crosses and client automatically open up scanner
> > for the region2, why doesnt the scanner in Region2 knows that some of the
> > rows are already fetched by region1. Do you mean to say that by default,
> > for a scan spanning multiple regions, every region has it's own count of
> > no.of rows that its going to return? i.e. lets say for a scan setCaching
> is
> > 10 and scan is done across two regions. 9 Results(satisfying the filter)
> > are in Region1 and 10 Results(satisfying the filter) are in Region2. Then
> > will this scan return 19 (9+10) results?
> >
> > >
> > > I think I am making it clear.   I have not PageFilter at all.. I am
> just
> > > explaining as per the knowledge on scan flow and the general filter
> > usage.
> > >
> > > "This is because the filter is applied separately on different region
> > > servers. It does however optimize the scan of individual HRegions by
> > making
> > > sure that the page size is never exceeded locally. "
> > >
> > > I guess it need to be saying that   "This is because the filter is
> > applied
> > > separately on different regions".
> > >
> > > -Anoop-
> > >
> > > ________________________________________
> > > From: anil gupta [anilgupta84@gmail.com]
> > > Sent: Wednesday, January 30, 2013 1:33 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: Pagination with HBase - getting previous page of data
> > >
> > > Hi Mohammad,
> > >
> > > You are most welcome to join the discussion. I have never used
> PageFilter
> > > so i don't really have concrete input.
> > > I had a look at
> > >
> > >
> >
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
> > > I could not understand that why it goes to multiple regionservers in
> > > parallel. Why it cannot guarantee results <= page size( my guess: due
> to
> > > multiple RS scans)? If you have used it then maybe you can explain the
> > > behaviour?
> > >
> > > Thanks,
> > > Anil
> > >
> > > On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <do...@gmail.com>
> > > wrote:
> > >
> > > > I'm kinda hesitant to put my leg in between the pros ;)But, does it
> > sound
> > > > sane to use PageFilter for both rows and columns and having some
> > > additional
> > > > logic to handle the 'nth' page logic?It'll help us in both kind of
> > > paging.
> > > >
> > > > On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
> > > > jean-marc@spaggiari.org>
> > > > wrote:
> > > > > Hi Anil,
> > > > >
> > > > > I think it really depend on the way you want to use the pagination.
> > > > >
> > > > > Do you need to be able to jump to page X? Are you ok if you miss a
> > > > > line or 2? Is your data growing fastly? Or slowly? Is it ok if your
> > > > > page indexes are a day old? Do you need to paginate over 300
> colums?
> > > > > Or just 1? Do you need to always have the exact same number of
> > entries
> > > > > in each page?
> > > > >
> > > > > For my usecase I need to be able to jump to the page X and I don't
> > > > > have any content. I have hundred of millions lines. Only the rowkey
> > > > > matter for me and I'm fine if sometime I have 50 entries displayed,
> > > > > and sometime only 45. So I'm thinking about calculating which row
> is
> > > > > the first one for each page, and store that separatly. Then I just
> > > > > need to run the MR daily.
> > > > >
> > > > > It's not a perfect solution I agree, but this might do the job for
> > me.
> > > > > I'm totally open to all other idea which might do the job to.
> > > > >
> > > > > JM
> > > > >
> > > > > 2013/1/29, anil gupta <an...@gmail.com>:
> > > > >> Yes, your suggested solution only works on RowKey based
> pagination.
> > It
> > > > will
> > > > >> fail when you start filtering on the basis of columns.
> > > > >>
> > > > >> Still, i would say it's comparatively easier to maintain this at
> > > > >> Application level rather than creating tables for pagination.
> > > > >>
> > > > >> What if you have 300 columns in your schema. Will you create 300
> > > tables?
> > > > >> What about handling of pagination when filtering is done based on
> > > > multiple
> > > > >> columns ("and" and "or" conditions)?
> > > > >>
> > > > >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> > > > >> jean-marc@spaggiari.org> wrote:
> > > > >>
> > > > >>> No, no killer solution here ;)
> > > > >>>
> > > > >>> But I'm still thinking about that because I might have to
> implement
> > > > >>> some pagination options soon...
> > > > >>>
> > > > >>> As you are saying, it's only working on the row-key, but if you
> > want
> > > > >>> to do the same-thing on non-rowkey, you might have to create a
> > > > >>> secondary index table...
> > > > >>>
> > > > >>> JM
> > > > >>>
> > > > >>> 2013/1/27, anil gupta <an...@gmail.com>:
> > > > >>> > That's alright..I thought that you have come-up with a killer
> > > > solution.
> > > > >>> So,
> > > > >>> > got curious to hear your ideas. ;)
> > > > >>> > It seems like your below mentioned solution will not work on
> > > > filtering
> > > > >>> > on
> > > > >>> > non row-key columns since when you are deciding the page
> numbers
> > > you
> > > > >>> > are
> > > > >>> > only considering rowkey.
> > > > >>> >
> > > > >>> > Thanks,
> > > > >>> > Anil
> > > > >>> >
> > > > >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> > > > >>> > jean-marc@spaggiari.org> wrote:
> > > > >>> >
> > > > >>> >> Hi Anil,
> > > > >>> >>
> > > > >>> >> I don't have a solution. I never tought about that ;) But I
> was
> > > > >>> >> thinking about something like you create a 2nd table where you
> > > place
> > > > >>> >> the raw number (4 bytes) then the raw key. You go directly to
> a
> > > > >>> >> specific page, you query by the number, found the key, and you
> > > know
> > > > >>> >> where to start you scan in the main table.
> > > > >>> >>
> > > > >>> >> The issue is properly the number for each lines since with a
> MR
> > > you
> > > > >>> >> don't know where you are from the beginning. But you can built
> > > > >>> >> something where you store the line number from the beginning
> of
> > > the
> > > > >>> >> region, then when all regions are parsed you can reconstruct
> the
> > > > total
> > > > >>> >> numbering... That should work...
> > > > >>> >>
> > > > >>> >> JM
> > > > >>> >>
> > > > >>> >> 2013/1/25, anil gupta <an...@gmail.com>:
> > > > >>> >> > Inline...
> > > > >>> >> >
> > > > >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> > > > >>> >> > jean-marc@spaggiari.org> wrote:
> > > > >>> >> >
> > > > >>> >> >> Hi Anil,
> > > > >>> >> >>
> > > > >>> >> >> The issue is that all the other sub-sequent page start
> should
> > > be
> > > > >>> moved
> > > > >>> >> >> too...
> > > > >>> >> >>
> > > > >>> >> > Yes, this is a possibility. Hence the Developer has to take
> > care
> > > > of
> > > > >>> >> > this
> > > > >>> >> > case. It might also be possible that the pageSize is not a
> > hard
> > > > >>> >> > limit
> > > > >>> >> > on
> > > > >>> >> > number of results(more like a hint or suggestion on size). I
> > > would
> > > > >>> >> > say
> > > > >>> >> > it
> > > > >>> >> > varies by use case.
> > > > >>> >> >
> > > > >>> >> >>
> > > > >>> >> >> so if you want to jump directly to page n, you might be
> > totally
> > > > >>> >> >> shifted because of all the data inserted in the meantime...
> > > > >>> >> >>
> > > > >>> >> >> If you want a real complete pagination feature, you might
> > want
> > > to
> > > > >>> have
> > > > >>> >> >> a coproccessor or a MR updating another table refering to
> the
> > > > >>> >> >> pages....
> > > > >>> >> >>
> > > > >>> >> > Well, the solution depends on the use case. I will be doing
> > > > >>> >> > pagination
> > > > >
> > > >
> > > > --
> > > > Warm Regards,
> > > > Tariq
> > > > https://mtariq.jux.com/
> > > > cloudfront.blogspot.com
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Anil Gupta
> > >
> >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>



-- 
Thanks & Regards,
Anil Gupta

Re: Pagination with HBase - getting previous page of data

Posted by anil gupta <an...@gmail.com>.
Inline...
On Sun, Feb 3, 2013 at 9:25 AM, Toby Lazar <tl...@gmail.com> wrote:

> Quick question - if you perform the pagination client-side and just
> call scanner.iterator().next()
> to get to the necessary results, doesn't this add unecessary network
> traffic of the unused results?


Anil: It depends on the solution. If 95% of your scans are limited to a
single region, then there won't be unnecessary network I/O.

>  If you want results 100-120, does the
> client need to first read results 1-100 over the network?


Anil: If you do a simple scan and you want results 100-120, then I would say
yes. You might avoid that by using the pagination filter or by writing a
custom filter or coprocessor. As I mentioned earlier in this thread, we won't
be allowing the user to jump to 100-120 directly, so the user first has to go
through results 1-100. Hence I will know the row key of the 100th result, and
that row key becomes my startKey for the 100-120 scan. So, no unnecessary
network I/O.
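
As a small illustration of that continuation (variable names are
hypothetical): the row key remembered from the 100th result becomes the
start row of the next scan, and appending a single zero byte makes the start
effectively exclusive so the 100th row is not returned twice:

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    // lastRowKey = row key of the 100th result, remembered by the application
    byte[] exclusiveStart = Bytes.add(lastRowKey, new byte[] { 0x00 });
    Scan nextPage = new Scan(exclusiveStart);
    nextPage.setCaching(20);   // we only want results 101-120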

>  Couldn't a
> filter help prevent some of that unneeded traffic?  Or, is the data only
> transferred when inspecting the result object?
>

Anil: Filters might help reduce unnecessary traffic. It all depends on your
use case.

>
> Thanks,
>
> Toby
> On Sun, Feb 3, 2013 at 11:07 AM, Anoop John <an...@gmail.com> wrote:
>
> > >lets say for a scan setCaching is
> > 10 and scan is done across two regions. 9 Results(satisfying the filter)
> > are in Region1 and 10 Results(satisfying the filter) are in Region2. Then
> > will this scan return 19 (9+10) results?
> >
> > @Anil.
> > No it will return 10 results only not 19. The client here takes into
> > account the no# of results got from previous region. But a filter is
> > different. The filter has no logic to do at the client side. It fully
> > executed at server side. This is the way it is designed. Personally I
> would
> > prefer to do the pagination by app alone by using plain scan with caching
> > (to avoid so many RPCs) and app level logic.
> >
> > -Anoop-
> >
> > On Sat, Feb 2, 2013 at 1:32 PM, anil gupta <an...@gmail.com>
> wrote:
> >
> > > Hi Anoop,
> > >
> > > Please find my reply inline.
> > >
> > > Thanks,
> > > Anil
> > >
> > > On Wed, Jan 30, 2013 at 3:31 AM, Anoop Sam John <an...@huawei.com>
> > > wrote:
> > >
> > > > @Anil
> > > >
> > > > >I could not understand that why it goes to multiple regionservers in
> > > > parallel. Why it cannot guarantee results <= page size( my guess: due
> > to
> > > > multiple RS scans)? If you have used it then maybe you can explain
> the
> > > > behaviour?
> > > >
> > > > Scan from client side never go to multiple RS in parallel. Scan from
> > > > HTable API will be sequential with one region after the other. For
> > every
> > > > region it will open up scanner in the RS and do next() calls. The
> > filter
> > > > will be instantiated at server side per region level ...
> > > >
> > > > When u need 100 rows in the page and you created a Scan at client
> side
> > > > with the filter and suppose there are 2 regions, 1st the scanner is
> > > opened
> > > > at for region1 and scan is happening. It will ensure that max 100
> rows
> > > will
> > > > be retrieved from that region.  But when the region boundary crosses
> > and
> > > > client automatically open up scanner for the region2, there also it
> > will
> > > > pass filter with max 100 rows and so from there also max 100 rows can
> > > > come..  So over all at the client side we can not guartee that the
> scan
> > > > created will only scan 100 rows as a whole from the table.
> > > >
> > >
> > > I agree with other people on this email chain that the 2nd region
> should
> > > only return (100 - no. of rows returned by Region1), if possible.
> > >
> > > When the region boundary crosses and client automatically open up
> scanner
> > > for the region2, why doesnt the scanner in Region2 knows that some of
> the
> > > rows are already fetched by region1. Do you mean to say that by
> default,
> > > for a scan spanning multiple regions, every region has it's own count
> of
> > > no.of rows that its going to return? i.e. lets say for a scan
> setCaching
> > is
> > > 10 and scan is done across two regions. 9 Results(satisfying the
> filter)
> > > are in Region1 and 10 Results(satisfying the filter) are in Region2.
> Then
> > > will this scan return 19 (9+10) results?
> > >
> > > >
> > > > I think I am making it clear.   I have not PageFilter at all.. I am
> > just
> > > > explaining as per the knowledge on scan flow and the general filter
> > > usage.
> > > >
> > > > "This is because the filter is applied separately on different region
> > > > servers. It does however optimize the scan of individual HRegions by
> > > making
> > > > sure that the page size is never exceeded locally. "
> > > >
> > > > I guess it need to be saying that   "This is because the filter is
> > > applied
> > > > separately on different regions".
> > > >
> > > > -Anoop-
> > > >
> > > > ________________________________________
> > > > From: anil gupta [anilgupta84@gmail.com]
> > > > Sent: Wednesday, January 30, 2013 1:33 PM
> > > > To: user@hbase.apache.org
> > > > Subject: Re: Pagination with HBase - getting previous page of data
> > > >
> > > > Hi Mohammad,
> > > >
> > > > You are most welcome to join the discussion. I have never used
> > PageFilter
> > > > so i don't really have concrete input.
> > > > I had a look at
> > > >
> > > >
> > >
> >
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
> > > > I could not understand that why it goes to multiple regionservers in
> > > > parallel. Why it cannot guarantee results <= page size( my guess: due
> > to
> > > > multiple RS scans)? If you have used it then maybe you can explain
> the
> > > > behaviour?
> > > >
> > > > Thanks,
> > > > Anil
> > > >
> > > > On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <do...@gmail.com>
> > > > wrote:
> > > >
> > > > > I'm kinda hesitant to put my leg in between the pros ;)But, does it
> > > sound
> > > > > sane to use PageFilter for both rows and columns and having some
> > > > additional
> > > > > logic to handle the 'nth' page logic?It'll help us in both kind of
> > > > paging.
> > > > >
> > > > > On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
> > > > > jean-marc@spaggiari.org>
> > > > > wrote:
> > > > > > Hi Anil,
> > > > > >
> > > > > > I think it really depend on the way you want to use the
> pagination.
> > > > > >
> > > > > > Do you need to be able to jump to page X? Are you ok if you miss
> a
> > > > > > line or 2? Is your data growing fastly? Or slowly? Is it ok if
> your
> > > > > > page indexes are a day old? Do you need to paginate over 300
> > colums?
> > > > > > Or just 1? Do you need to always have the exact same number of
> > > entries
> > > > > > in each page?
> > > > > >
> > > > > > For my usecase I need to be able to jump to the page X and I
> don't
> > > > > > have any content. I have hundred of millions lines. Only the
> rowkey
> > > > > > matter for me and I'm fine if sometime I have 50 entries
> displayed,
> > > > > > and sometime only 45. So I'm thinking about calculating which row
> > is
> > > > > > the first one for each page, and store that separatly. Then I
> just
> > > > > > need to run the MR daily.
> > > > > >
> > > > > > It's not a perfect solution I agree, but this might do the job
> for
> > > me.
> > > > > > I'm totally open to all other idea which might do the job to.
> > > > > >
> > > > > > JM
> > > > > >
> > > > > > 2013/1/29, anil gupta <an...@gmail.com>:
> > > > > >> Yes, your suggested solution only works on RowKey based
> > pagination.
> > > It
> > > > > will
> > > > > >> fail when you start filtering on the basis of columns.
> > > > > >>
> > > > > >> Still, i would say it's comparatively easier to maintain this at
> > > > > >> Application level rather than creating tables for pagination.
> > > > > >>
> > > > > >> What if you have 300 columns in your schema. Will you create 300
> > > > tables?
> > > > > >> What about handling of pagination when filtering is done based
> on
> > > > > multiple
> > > > > >> columns ("and" and "or" conditions)?
> > > > > >>
> > > > > >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> > > > > >> jean-marc@spaggiari.org> wrote:
> > > > > >>
> > > > > >>> No, no killer solution here ;)
> > > > > >>>
> > > > > >>> But I'm still thinking about that because I might have to
> > implement
> > > > > >>> some pagination options soon...
> > > > > >>>
> > > > > >>> As you are saying, it's only working on the row-key, but if you
> > > want
> > > > > >>> to do the same-thing on non-rowkey, you might have to create a
> > > > > >>> secondary index table...
> > > > > >>>
> > > > > >>> JM
> > > > > >>>
> > > > > >>> 2013/1/27, anil gupta <an...@gmail.com>:
> > > > > >>> > That's alright..I thought that you have come-up with a killer
> > > > > solution.
> > > > > >>> So,
> > > > > >>> > got curious to hear your ideas. ;)
> > > > > >>> > It seems like your below mentioned solution will not work on
> > > > > filtering
> > > > > >>> > on
> > > > > >>> > non row-key columns since when you are deciding the page
> > numbers
> > > > you
> > > > > >>> > are
> > > > > >>> > only considering rowkey.
> > > > > >>> >
> > > > > >>> > Thanks,
> > > > > >>> > Anil
> > > > > >>> >
> > > > > >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> > > > > >>> > jean-marc@spaggiari.org> wrote:
> > > > > >>> >
> > > > > >>> >> Hi Anil,
> > > > > >>> >>
> > > > > >>> >> I don't have a solution. I never tought about that ;) But I
> > was
> > > > > >>> >> thinking about something like you create a 2nd table where
> you
> > > > place
> > > > > >>> >> the raw number (4 bytes) then the raw key. You go directly
> to
> > a
> > > > > >>> >> specific page, you query by the number, found the key, and
> you
> > > > know
> > > > > >>> >> where to start you scan in the main table.
> > > > > >>> >>
> > > > > >>> >> The issue is properly the number for each lines since with a
> > MR
> > > > you
> > > > > >>> >> don't know where you are from the beginning. But you can
> built
> > > > > >>> >> something where you store the line number from the beginning
> > of
> > > > the
> > > > > >>> >> region, then when all regions are parsed you can reconstruct
> > the
> > > > > total
> > > > > >>> >> numbering... That should work...
> > > > > >>> >>
> > > > > >>> >> JM
> > > > > >>> >>
> > > > > >>> >> 2013/1/25, anil gupta <an...@gmail.com>:
> > > > > >>> >> > Inline...
> > > > > >>> >> >
> > > > > >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> > > > > >>> >> > jean-marc@spaggiari.org> wrote:
> > > > > >>> >> >
> > > > > >>> >> >> Hi Anil,
> > > > > >>> >> >>
> > > > > >>> >> >> The issue is that all the other sub-sequent page start
> > should
> > > > be
> > > > > >>> moved
> > > > > >>> >> >> too...
> > > > > >>> >> >>
> > > > > >>> >> > Yes, this is a possibility. Hence the Developer has to
> take
> > > care
> > > > > of
> > > > > >>> >> > this
> > > > > >>> >> > case. It might also be possible that the pageSize is not a
> > > hard
> > > > > >>> >> > limit
> > > > > >>> >> > on
> > > > > >>> >> > number of results(more like a hint or suggestion on
> size). I
> > > > would
> > > > > >>> >> > say
> > > > > >>> >> > it
> > > > > >>> >> > varies by use case.
> > > > > >>> >> >
> > > > > >>> >> >>
> > > > > >>> >> >> so if you want to jump directly to page n, you might be
> > > totally
> > > > > >>> >> >> shifted because of all the data inserted in the
> meantime...
> > > > > >>> >> >>
> > > > > >>> >> >> If you want a real complete pagination feature, you might
> > > want
> > > > to
> > > > > >>> have
> > > > > >>> >> >> a coproccessor or a MR updating another table refering to
> > the
> > > > > >>> >> >> pages....
> > > > > >>> >> >>
> > > > > >>> >> > Well, the solution depends on the use case. I will be
> doing
> > > > > >>> >> > pagination
> > > > > >
> > > > >
> > > > > --
> > > > > Warm Regards,
> > > > > Tariq
> > > > > https://mtariq.jux.com/
> > > > > cloudfront.blogspot.com
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks & Regards,
> > > > Anil Gupta
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Anil Gupta
> > >
> >
>



-- 
Thanks & Regards,
Anil Gupta

Re: Pagination with HBase - getting previous page of data

Posted by Toby Lazar <tl...@gmail.com>.
Quick question - if you perform the pagination client-side and just call
scanner.iterator().next() to skip ahead to the results you need, doesn't
this add unnecessary network traffic for the unused results?  If you want
results 100-120, does the client need to first read results 1-100 over the
network?  Couldn't a filter help prevent some of that unneeded traffic?  Or
is the data only transferred when inspecting the Result object?
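
For concreteness, the pattern I have in mind is roughly this (the table
variable and the 100-120 window are made up for illustration):

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    Scan scan = new Scan();
    scan.setCaching(50);                // rows transferred per RPC
    ResultScanner scanner = table.getScanner(scan);
    int i = 0;
    for (Result r : scanner) {
        i++;
        if (i < 100) continue;          // do rows 1-99 cross the network just to be skipped?
        if (i > 120) break;
        // render results 100-120
    }
    scanner.close();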

Thanks,

Toby
On Sun, Feb 3, 2013 at 11:07 AM, Anoop John <an...@gmail.com> wrote:

> >lets say for a scan setCaching is
> 10 and scan is done across two regions. 9 Results(satisfying the filter)
> are in Region1 and 10 Results(satisfying the filter) are in Region2. Then
> will this scan return 19 (9+10) results?
>
> @Anil.
> No it will return 10 results only not 19. The client here takes into
> account the no# of results got from previous region. But a filter is
> different. The filter has no logic to do at the client side. It fully
> executed at server side. This is the way it is designed. Personally I would
> prefer to do the pagination by app alone by using plain scan with caching
> (to avoid so many RPCs) and app level logic.
>
> -Anoop-
>
> On Sat, Feb 2, 2013 at 1:32 PM, anil gupta <an...@gmail.com> wrote:
>
> > Hi Anoop,
> >
> > Please find my reply inline.
> >
> > Thanks,
> > Anil
> >
> > On Wed, Jan 30, 2013 at 3:31 AM, Anoop Sam John <an...@huawei.com>
> > wrote:
> >
> > > @Anil
> > >
> > > >I could not understand that why it goes to multiple regionservers in
> > > parallel. Why it cannot guarantee results <= page size( my guess: due
> to
> > > multiple RS scans)? If you have used it then maybe you can explain the
> > > behaviour?
> > >
> > > Scan from client side never go to multiple RS in parallel. Scan from
> > > HTable API will be sequential with one region after the other. For
> every
> > > region it will open up scanner in the RS and do next() calls. The
> filter
> > > will be instantiated at server side per region level ...
> > >
> > > When u need 100 rows in the page and you created a Scan at client side
> > > with the filter and suppose there are 2 regions, 1st the scanner is
> > opened
> > > at for region1 and scan is happening. It will ensure that max 100 rows
> > will
> > > be retrieved from that region.  But when the region boundary crosses
> and
> > > client automatically open up scanner for the region2, there also it
> will
> > > pass filter with max 100 rows and so from there also max 100 rows can
> > > come..  So over all at the client side we can not guartee that the scan
> > > created will only scan 100 rows as a whole from the table.
> > >
> >
> > I agree with other people on this email chain that the 2nd region should
> > only return (100 - no. of rows returned by Region1), if possible.
> >
> > When the region boundary crosses and client automatically open up scanner
> > for the region2, why doesnt the scanner in Region2 knows that some of the
> > rows are already fetched by region1. Do you mean to say that by default,
> > for a scan spanning multiple regions, every region has it's own count of
> > no.of rows that its going to return? i.e. lets say for a scan setCaching
> is
> > 10 and scan is done across two regions. 9 Results(satisfying the filter)
> > are in Region1 and 10 Results(satisfying the filter) are in Region2. Then
> > will this scan return 19 (9+10) results?
> >
> > >
> > > I think I am making it clear.   I have not PageFilter at all.. I am
> just
> > > explaining as per the knowledge on scan flow and the general filter
> > usage.
> > >
> > > "This is because the filter is applied separately on different region
> > > servers. It does however optimize the scan of individual HRegions by
> > making
> > > sure that the page size is never exceeded locally. "
> > >
> > > I guess it need to be saying that   "This is because the filter is
> > applied
> > > separately on different regions".
> > >
> > > -Anoop-
> > >
> > > ________________________________________
> > > From: anil gupta [anilgupta84@gmail.com]
> > > Sent: Wednesday, January 30, 2013 1:33 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: Pagination with HBase - getting previous page of data
> > >
> > > Hi Mohammad,
> > >
> > > You are most welcome to join the discussion. I have never used
> PageFilter
> > > so i don't really have concrete input.
> > > I had a look at
> > >
> > >
> >
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
> > > I could not understand that why it goes to multiple regionservers in
> > > parallel. Why it cannot guarantee results <= page size( my guess: due
> to
> > > multiple RS scans)? If you have used it then maybe you can explain the
> > > behaviour?
> > >
> > > Thanks,
> > > Anil
> > >
> > > On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <do...@gmail.com>
> > > wrote:
> > >
> > > > I'm kinda hesitant to put my leg in between the pros ;)But, does it
> > sound
> > > > sane to use PageFilter for both rows and columns and having some
> > > additional
> > > > logic to handle the 'nth' page logic?It'll help us in both kind of
> > > paging.
> > > >
> > > > On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
> > > > jean-marc@spaggiari.org>
> > > > wrote:
> > > > > Hi Anil,
> > > > >
> > > > > I think it really depend on the way you want to use the pagination.
> > > > >
> > > > > Do you need to be able to jump to page X? Are you ok if you miss a
> > > > > line or 2? Is your data growing fastly? Or slowly? Is it ok if your
> > > > > page indexes are a day old? Do you need to paginate over 300
> colums?
> > > > > Or just 1? Do you need to always have the exact same number of
> > entries
> > > > > in each page?
> > > > >
> > > > > For my usecase I need to be able to jump to the page X and I don't
> > > > > have any content. I have hundred of millions lines. Only the rowkey
> > > > > matter for me and I'm fine if sometime I have 50 entries displayed,
> > > > > and sometime only 45. So I'm thinking about calculating which row
> is
> > > > > the first one for each page, and store that separatly. Then I just
> > > > > need to run the MR daily.
> > > > >
> > > > > It's not a perfect solution I agree, but this might do the job for
> > me.
> > > > > I'm totally open to all other idea which might do the job to.
> > > > >
> > > > > JM
> > > > >
> > > > > 2013/1/29, anil gupta <an...@gmail.com>:
> > > > >> Yes, your suggested solution only works on RowKey based
> pagination.
> > It
> > > > will
> > > > >> fail when you start filtering on the basis of columns.
> > > > >>
> > > > >> Still, i would say it's comparatively easier to maintain this at
> > > > >> Application level rather than creating tables for pagination.
> > > > >>
> > > > >> What if you have 300 columns in your schema. Will you create 300
> > > tables?
> > > > >> What about handling of pagination when filtering is done based on
> > > > multiple
> > > > >> columns ("and" and "or" conditions)?
> > > > >>
> > > > >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> > > > >> jean-marc@spaggiari.org> wrote:
> > > > >>
> > > > >>> No, no killer solution here ;)
> > > > >>>
> > > > >>> But I'm still thinking about that because I might have to
> implement
> > > > >>> some pagination options soon...
> > > > >>>
> > > > >>> As you are saying, it's only working on the row-key, but if you
> > want
> > > > >>> to do the same-thing on non-rowkey, you might have to create a
> > > > >>> secondary index table...
> > > > >>>
> > > > >>> JM
> > > > >>>
> > > > >>> 2013/1/27, anil gupta <an...@gmail.com>:
> > > > >>> > That's alright..I thought that you have come-up with a killer
> > > > solution.
> > > > >>> So,
> > > > >>> > got curious to hear your ideas. ;)
> > > > >>> > It seems like your below mentioned solution will not work on
> > > > filtering
> > > > >>> > on
> > > > >>> > non row-key columns since when you are deciding the page
> numbers
> > > you
> > > > >>> > are
> > > > >>> > only considering rowkey.
> > > > >>> >
> > > > >>> > Thanks,
> > > > >>> > Anil
> > > > >>> >
> > > > >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> > > > >>> > jean-marc@spaggiari.org> wrote:
> > > > >>> >
> > > > >>> >> Hi Anil,
> > > > >>> >>
> > > > >>> >> I don't have a solution. I never tought about that ;) But I
> was
> > > > >>> >> thinking about something like you create a 2nd table where you
> > > place
> > > > >>> >> the raw number (4 bytes) then the raw key. You go directly to
> a
> > > > >>> >> specific page, you query by the number, found the key, and you
> > > know
> > > > >>> >> where to start you scan in the main table.
> > > > >>> >>
> > > > >>> >> The issue is properly the number for each lines since with a
> MR
> > > you
> > > > >>> >> don't know where you are from the beginning. But you can built
> > > > >>> >> something where you store the line number from the beginning
> of
> > > the
> > > > >>> >> region, then when all regions are parsed you can reconstruct
> the
> > > > total
> > > > >>> >> numbering... That should work...
> > > > >>> >>
> > > > >>> >> JM
> > > > >>> >>
> > > > >>> >> 2013/1/25, anil gupta <an...@gmail.com>:
> > > > >>> >> > Inline...
> > > > >>> >> >
> > > > >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> > > > >>> >> > jean-marc@spaggiari.org> wrote:
> > > > >>> >> >
> > > > >>> >> >> Hi Anil,
> > > > >>> >> >>
> > > > >>> >> >> The issue is that all the other sub-sequent page start
> should
> > > be
> > > > >>> moved
> > > > >>> >> >> too...
> > > > >>> >> >>
> > > > >>> >> > Yes, this is a possibility. Hence the Developer has to take
> > care
> > > > of
> > > > >>> >> > this
> > > > >>> >> > case. It might also be possible that the pageSize is not a
> > hard
> > > > >>> >> > limit
> > > > >>> >> > on
> > > > >>> >> > number of results(more like a hint or suggestion on size). I
> > > would
> > > > >>> >> > say
> > > > >>> >> > it
> > > > >>> >> > varies by use case.
> > > > >>> >> >
> > > > >>> >> >>
> > > > >>> >> >> so if you want to jump directly to page n, you might be
> > totally
> > > > >>> >> >> shifted because of all the data inserted in the meantime...
> > > > >>> >> >>
> > > > >>> >> >> If you want a real complete pagination feature, you might
> > want
> > > to
> > > > >>> have
> > > > >>> >> >> a coproccessor or a MR updating another table refering to
> the
> > > > >>> >> >> pages....
> > > > >>> >> >>
> > > > >>> >> > Well, the solution depends on the use case. I will be doing
> > > > >>> >> > pagination
> > > > >
> > > >
> > > > --
> > > > Warm Regards,
> > > > Tariq
> > > > https://mtariq.jux.com/
> > > > cloudfront.blogspot.com
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Anil Gupta
> > >
> >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>

Re: Pagination with HBase - getting previous page of data

Posted by Anoop John <an...@gmail.com>.
>lets say for a scan setCaching is
10 and scan is done across two regions. 9 Results(satisfying the filter)
are in Region1 and 10 Results(satisfying the filter) are in Region2. Then
will this scan return 19 (9+10) results?

@Anil.
No, it will return only 10 results, not 19. The client takes into account
the number of results already returned by the previous region. But a filter
is different: a filter has no client-side logic; it is executed entirely on
the server side. This is the way it is designed. Personally I would prefer
to do the pagination in the app alone, using a plain scan with caching (to
avoid too many RPCs) and app-level logic.
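
To make the difference concrete, a rough sketch (table name and numbers are
illustrative): setCaching only controls how many rows come back per RPC,
while a PageFilter limit is applied independently inside each region, so the
client must still stop itself once it has enough rows:

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PageFilter;

    Scan scan = new Scan();
    scan.setCaching(10);                 // rows fetched per RPC, not a cap on total results
    scan.setFilter(new PageFilter(10));  // enforced per region; two regions may yield up to 20
    ResultScanner scanner = table.getScanner(scan);
    int count = 0;
    for (Result r : scanner) {
        if (++count > 10) {
            break;                       // client-side guard against the per-region overshoot
        }
        // process r
    }
    scanner.close();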

-Anoop-

On Sat, Feb 2, 2013 at 1:32 PM, anil gupta <an...@gmail.com> wrote:

> Hi Anoop,
>
> Please find my reply inline.
>
> Thanks,
> Anil
>
> On Wed, Jan 30, 2013 at 3:31 AM, Anoop Sam John <an...@huawei.com>
> wrote:
>
> > @Anil
> >
> > >I could not understand that why it goes to multiple regionservers in
> > parallel. Why it cannot guarantee results <= page size( my guess: due to
> > multiple RS scans)? If you have used it then maybe you can explain the
> > behaviour?
> >
> > Scan from client side never go to multiple RS in parallel. Scan from
> > HTable API will be sequential with one region after the other. For every
> > region it will open up scanner in the RS and do next() calls. The filter
> > will be instantiated at server side per region level ...
> >
> > When u need 100 rows in the page and you created a Scan at client side
> > with the filter and suppose there are 2 regions, 1st the scanner is
> opened
> > at for region1 and scan is happening. It will ensure that max 100 rows
> will
> > be retrieved from that region.  But when the region boundary crosses and
> > client automatically open up scanner for the region2, there also it will
> > pass filter with max 100 rows and so from there also max 100 rows can
> > come..  So over all at the client side we can not guartee that the scan
> > created will only scan 100 rows as a whole from the table.
> >
>
> I agree with other people on this email chain that the 2nd region should
> only return (100 - no. of rows returned by Region1), if possible.
>
> When the region boundary crosses and client automatically open up scanner
> for the region2, why doesnt the scanner in Region2 knows that some of the
> rows are already fetched by region1. Do you mean to say that by default,
> for a scan spanning multiple regions, every region has it's own count of
> no.of rows that its going to return? i.e. lets say for a scan setCaching is
> 10 and scan is done across two regions. 9 Results(satisfying the filter)
> are in Region1 and 10 Results(satisfying the filter) are in Region2. Then
> will this scan return 19 (9+10) results?
>
> >
> > I think I am making it clear.   I have not PageFilter at all.. I am just
> > explaining as per the knowledge on scan flow and the general filter
> usage.
> >
> > "This is because the filter is applied separately on different region
> > servers. It does however optimize the scan of individual HRegions by
> making
> > sure that the page size is never exceeded locally. "
> >
> > I guess it need to be saying that   "This is because the filter is
> applied
> > separately on different regions".
> >
> > -Anoop-
> >
> > ________________________________________
> > From: anil gupta [anilgupta84@gmail.com]
> > Sent: Wednesday, January 30, 2013 1:33 PM
> > To: user@hbase.apache.org
> > Subject: Re: Pagination with HBase - getting previous page of data
> >
> > Hi Mohammad,
> >
> > You are most welcome to join the discussion. I have never used PageFilter
> > so i don't really have concrete input.
> > I had a look at
> >
> >
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
> > I could not understand that why it goes to multiple regionservers in
> > parallel. Why it cannot guarantee results <= page size( my guess: due to
> > multiple RS scans)? If you have used it then maybe you can explain the
> > behaviour?
> >
> > Thanks,
> > Anil
> >
> > On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <do...@gmail.com>
> > wrote:
> >
> > > I'm kinda hesitant to put my leg in between the pros ;)But, does it
> sound
> > > sane to use PageFilter for both rows and columns and having some
> > additional
> > > logic to handle the 'nth' page logic?It'll help us in both kind of
> > paging.
> > >
> > > On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
> > > jean-marc@spaggiari.org>
> > > wrote:
> > > > Hi Anil,
> > > >
> > > > I think it really depend on the way you want to use the pagination.
> > > >
> > > > Do you need to be able to jump to page X? Are you ok if you miss a
> > > > line or 2? Is your data growing fastly? Or slowly? Is it ok if your
> > > > page indexes are a day old? Do you need to paginate over 300 colums?
> > > > Or just 1? Do you need to always have the exact same number of
> entries
> > > > in each page?
> > > >
> > > > For my usecase I need to be able to jump to the page X and I don't
> > > > have any content. I have hundred of millions lines. Only the rowkey
> > > > matter for me and I'm fine if sometime I have 50 entries displayed,
> > > > and sometime only 45. So I'm thinking about calculating which row is
> > > > the first one for each page, and store that separatly. Then I just
> > > > need to run the MR daily.
> > > >
> > > > It's not a perfect solution I agree, but this might do the job for
> me.
> > > > I'm totally open to all other idea which might do the job to.
> > > >
> > > > JM
> > > >
> > > > 2013/1/29, anil gupta <an...@gmail.com>:
> > > >> Yes, your suggested solution only works on RowKey based pagination.
> It
> > > will
> > > >> fail when you start filtering on the basis of columns.
> > > >>
> > > >> Still, i would say it's comparatively easier to maintain this at
> > > >> Application level rather than creating tables for pagination.
> > > >>
> > > >> What if you have 300 columns in your schema. Will you create 300
> > tables?
> > > >> What about handling of pagination when filtering is done based on
> > > multiple
> > > >> columns ("and" and "or" conditions)?
> > > >>
> > > >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> > > >> jean-marc@spaggiari.org> wrote:
> > > >>
> > > >>> No, no killer solution here ;)
> > > >>>
> > > >>> But I'm still thinking about that because I might have to implement
> > > >>> some pagination options soon...
> > > >>>
> > > >>> As you are saying, it's only working on the row-key, but if you
> want
> > > >>> to do the same-thing on non-rowkey, you might have to create a
> > > >>> secondary index table...
> > > >>>
> > > >>> JM
> > > >>>
> > > >>> 2013/1/27, anil gupta <an...@gmail.com>:
> > > >>> > That's alright..I thought that you have come-up with a killer
> > > solution.
> > > >>> So,
> > > >>> > got curious to hear your ideas. ;)
> > > >>> > It seems like your below mentioned solution will not work on
> > > filtering
> > > >>> > on
> > > >>> > non row-key columns since when you are deciding the page numbers
> > you
> > > >>> > are
> > > >>> > only considering rowkey.
> > > >>> >
> > > >>> > Thanks,
> > > >>> > Anil
> > > >>> >
> > > >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> > > >>> > jean-marc@spaggiari.org> wrote:
> > > >>> >
> > > >>> >> Hi Anil,
> > > >>> >>
> > > >>> >> I don't have a solution. I never tought about that ;) But I was
> > > >>> >> thinking about something like you create a 2nd table where you
> > place
> > > >>> >> the raw number (4 bytes) then the raw key. You go directly to a
> > > >>> >> specific page, you query by the number, found the key, and you
> > know
> > > >>> >> where to start you scan in the main table.
> > > >>> >>
> > > >>> >> The issue is properly the number for each lines since with a MR
> > you
> > > >>> >> don't know where you are from the beginning. But you can built
> > > >>> >> something where you store the line number from the beginning of
> > the
> > > >>> >> region, then when all regions are parsed you can reconstruct the
> > > total
> > > >>> >> numbering... That should work...
> > > >>> >>
> > > >>> >> JM
> > > >>> >>
> > > >>> >> 2013/1/25, anil gupta <an...@gmail.com>:
> > > >>> >> > Inline...
> > > >>> >> >
> > > >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> > > >>> >> > jean-marc@spaggiari.org> wrote:
> > > >>> >> >
> > > >>> >> >> Hi Anil,
> > > >>> >> >>
> > > >>> >> >> The issue is that all the other sub-sequent page start should
> > be
> > > >>> moved
> > > >>> >> >> too...
> > > >>> >> >>
> > > >>> >> > Yes, this is a possibility. Hence the Developer has to take
> care
> > > of
> > > >>> >> > this
> > > >>> >> > case. It might also be possible that the pageSize is not a
> hard
> > > >>> >> > limit
> > > >>> >> > on
> > > >>> >> > number of results(more like a hint or suggestion on size). I
> > would
> > > >>> >> > say
> > > >>> >> > it
> > > >>> >> > varies by use case.
> > > >>> >> >
> > > >>> >> >>
> > > >>> >> >> so if you want to jump directly to page n, you might be
> totally
> > > >>> >> >> shifted because of all the data inserted in the meantime...
> > > >>> >> >>
> > > >>> >> >> If you want a real complete pagination feature, you might
> want
> > to
> > > >>> have
> > > >>> >> >> a coproccessor or a MR updating another table refering to the
> > > >>> >> >> pages....
> > > >>> >> >>
> > > >>> >> > Well, the solution depends on the use case. I will be doing
> > > >>> >> > pagination
> > > >
> > >
> > > --
> > > Warm Regards,
> > > Tariq
> > > https://mtariq.jux.com/
> > > cloudfront.blogspot.com
> > >
> >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>

Re: Pagination with HBase - getting previous page of data

Posted by anil gupta <an...@gmail.com>.
Hi Anoop,

Please find my reply inline.

Thanks,
Anil

On Wed, Jan 30, 2013 at 3:31 AM, Anoop Sam John <an...@huawei.com> wrote:

> @Anil
>
> >I could not understand that why it goes to multiple regionservers in
> parallel. Why it cannot guarantee results <= page size( my guess: due to
> multiple RS scans)? If you have used it then maybe you can explain the
> behaviour?
>
> Scan from client side never go to multiple RS in parallel. Scan from
> HTable API will be sequential with one region after the other. For every
> region it will open up scanner in the RS and do next() calls. The filter
> will be instantiated at server side per region level ...
>
> When u need 100 rows in the page and you created a Scan at client side
> with the filter and suppose there are 2 regions, 1st the scanner is opened
> at for region1 and scan is happening. It will ensure that max 100 rows will
> be retrieved from that region.  But when the region boundary crosses and
> client automatically open up scanner for the region2, there also it will
> pass filter with max 100 rows and so from there also max 100 rows can
> come..  So over all at the client side we can not guartee that the scan
> created will only scan 100 rows as a whole from the table.
>

I agree with the other people on this email chain that the 2nd region
should, if possible, only return (100 - no. of rows returned by Region1)
rows.

When the region boundary is crossed and the client automatically opens a
scanner for Region2, why doesn't the scanner in Region2 know that some of
the rows were already fetched from Region1? Do you mean that, by default,
for a scan spanning multiple regions, every region keeps its own count of
the number of rows it is going to return? I.e., let's say setCaching is 10
for a scan that runs across two regions, with 9 results (satisfying the
filter) in Region1 and 10 results (satisfying the filter) in Region2. Will
this scan then return 19 (9 + 10) results?

>
> I think I am making it clear.   I have not PageFilter at all.. I am just
> explaining as per the knowledge on scan flow and the general filter usage.
>
> "This is because the filter is applied separately on different region
> servers. It does however optimize the scan of individual HRegions by making
> sure that the page size is never exceeded locally. "
>
> I guess it need to be saying that   "This is because the filter is applied
> separately on different regions".
>
> -Anoop-
>
> ________________________________________
> From: anil gupta [anilgupta84@gmail.com]
> Sent: Wednesday, January 30, 2013 1:33 PM
> To: user@hbase.apache.org
> Subject: Re: Pagination with HBase - getting previous page of data
>
> Hi Mohammad,
>
> You are most welcome to join the discussion. I have never used PageFilter
> so i don't really have concrete input.
> I had a look at
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
> I could not understand that why it goes to multiple regionservers in
> parallel. Why it cannot guarantee results <= page size( my guess: due to
> multiple RS scans)? If you have used it then maybe you can explain the
> behaviour?
>
> Thanks,
> Anil
>
> On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <do...@gmail.com>
> wrote:
>
> > I'm kinda hesitant to put my leg in between the pros ;)But, does it sound
> > sane to use PageFilter for both rows and columns and having some
> additional
> > logic to handle the 'nth' page logic?It'll help us in both kind of
> paging.
> >
> > On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org>
> > wrote:
> > > Hi Anil,
> > >
> > > I think it really depend on the way you want to use the pagination.
> > >
> > > Do you need to be able to jump to page X? Are you ok if you miss a
> > > line or 2? Is your data growing fastly? Or slowly? Is it ok if your
> > > page indexes are a day old? Do you need to paginate over 300 colums?
> > > Or just 1? Do you need to always have the exact same number of entries
> > > in each page?
> > >
> > > For my usecase I need to be able to jump to the page X and I don't
> > > have any content. I have hundred of millions lines. Only the rowkey
> > > matter for me and I'm fine if sometime I have 50 entries displayed,
> > > and sometime only 45. So I'm thinking about calculating which row is
> > > the first one for each page, and store that separatly. Then I just
> > > need to run the MR daily.
> > >
> > > It's not a perfect solution I agree, but this might do the job for me.
> > > I'm totally open to all other idea which might do the job to.
> > >
> > > JM
> > >
> > > 2013/1/29, anil gupta <an...@gmail.com>:
> > >> Yes, your suggested solution only works on RowKey based pagination. It
> > will
> > >> fail when you start filtering on the basis of columns.
> > >>
> > >> Still, i would say it's comparatively easier to maintain this at
> > >> Application level rather than creating tables for pagination.
> > >>
> > >> What if you have 300 columns in your schema. Will you create 300
> tables?
> > >> What about handling of pagination when filtering is done based on
> > multiple
> > >> columns ("and" and "or" conditions)?
> > >>
> > >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> > >> jean-marc@spaggiari.org> wrote:
> > >>
> > >>> No, no killer solution here ;)
> > >>>
> > >>> But I'm still thinking about that because I might have to implement
> > >>> some pagination options soon...
> > >>>
> > >>> As you are saying, it's only working on the row-key, but if you want
> > >>> to do the same-thing on non-rowkey, you might have to create a
> > >>> secondary index table...
> > >>>
> > >>> JM
> > >>>
> > >>> 2013/1/27, anil gupta <an...@gmail.com>:
> > >>> > That's alright..I thought that you have come-up with a killer
> > solution.
> > >>> So,
> > >>> > got curious to hear your ideas. ;)
> > >>> > It seems like your below mentioned solution will not work on
> > filtering
> > >>> > on
> > >>> > non row-key columns since when you are deciding the page numbers
> you
> > >>> > are
> > >>> > only considering rowkey.
> > >>> >
> > >>> > Thanks,
> > >>> > Anil
> > >>> >
> > >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> > >>> > jean-marc@spaggiari.org> wrote:
> > >>> >
> > >>> >> Hi Anil,
> > >>> >>
> > >>> >> I don't have a solution. I never tought about that ;) But I was
> > >>> >> thinking about something like you create a 2nd table where you
> place
> > >>> >> the raw number (4 bytes) then the raw key. You go directly to a
> > >>> >> specific page, you query by the number, found the key, and you
> know
> > >>> >> where to start you scan in the main table.
> > >>> >>
> > >>> >> The issue is properly the number for each lines since with a MR
> you
> > >>> >> don't know where you are from the beginning. But you can built
> > >>> >> something where you store the line number from the beginning of
> the
> > >>> >> region, then when all regions are parsed you can reconstruct the
> > total
> > >>> >> numbering... That should work...
> > >>> >>
> > >>> >> JM
> > >>> >>
> > >>> >> 2013/1/25, anil gupta <an...@gmail.com>:
> > >>> >> > Inline...
> > >>> >> >
> > >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> > >>> >> > jean-marc@spaggiari.org> wrote:
> > >>> >> >
> > >>> >> >> Hi Anil,
> > >>> >> >>
> > >>> >> >> The issue is that all the other sub-sequent page start should
> be
> > >>> moved
> > >>> >> >> too...
> > >>> >> >>
> > >>> >> > Yes, this is a possibility. Hence the Developer has to take care
> > of
> > >>> >> > this
> > >>> >> > case. It might also be possible that the pageSize is not a hard
> > >>> >> > limit
> > >>> >> > on
> > >>> >> > number of results(more like a hint or suggestion on size). I
> would
> > >>> >> > say
> > >>> >> > it
> > >>> >> > varies by use case.
> > >>> >> >
> > >>> >> >>
> > >>> >> >> so if you want to jump directly to page n, you might be totally
> > >>> >> >> shifted because of all the data inserted in the meantime...
> > >>> >> >>
> > >>> >> >> If you want a real complete pagination feature, you might want
> to
> > >>> have
> > >>> >> >> a coproccessor or a MR updating another table refering to the
> > >>> >> >> pages....
> > >>> >> >>
> > >>> >> > Well, the solution depends on the use case. I will be doing
> > >>> >> > pagination
> > >
> >
> > --
> > Warm Regards,
> > Tariq
> > https://mtariq.jux.com/
> > cloudfront.blogspot.com
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>



-- 
Thanks & Regards,
Anil Gupta

RE: Pagination with HBase - getting previous page of data

Posted by Anoop Sam John <an...@huawei.com>.
JM,

>100 rows from the 2nd region is using extra time and resources. Why
not ask for only the number of missing lines?

These are things that need to be controlled by the scanning app. It can
control the pagination well enough without using the PageFilter, I guess.
What do you say?


-Anoop-
________________________________________
From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
Sent: Wednesday, January 30, 2013 5:48 PM
To: user@hbase.apache.org
Subject: Re: Pagination with HBase - getting previous page of data

Hi Anoop,

So does it mean the scanner can send back at most LIMIT*2-1 lines? Reading
100 rows from the 2nd region uses extra time and resources. Why not ask
for only the number of missing lines?

JM

2013/1/30, Anoop Sam John <an...@huawei.com>:
> @Anil
>
>>I could not understand that why it goes to multiple regionservers in
> parallel. Why it cannot guarantee results <= page size( my guess: due to
> multiple RS scans)? If you have used it then maybe you can explain the
> behaviour?
>
> Scan from client side never go to multiple RS in parallel. Scan from HTable
> API will be sequential with one region after the other. For every region it
> will open up scanner in the RS and do next() calls. The filter will be
> instantiated at server side per region level ...
>
> When u need 100 rows in the page and you created a Scan at client side with
> the filter and suppose there are 2 regions, 1st the scanner is opened at for
> region1 and scan is happening. It will ensure that max 100 rows will be
> retrieved from that region.  But when the region boundary crosses and client
> automatically open up scanner for the region2, there also it will pass
> filter with max 100 rows and so from there also max 100 rows can come..  So
> over all at the client side we can not guartee that the scan created will
> only scan 100 rows as a whole from the table.
>
> I think I am making it clear.   I have not PageFilter at all.. I am just
> explaining as per the knowledge on scan flow and the general filter usage.
>
> "This is because the filter is applied separately on different region
> servers. It does however optimize the scan of individual HRegions by making
> sure that the page size is never exceeded locally. "
>
> I guess it need to be saying that   "This is because the filter is applied
> separately on different regions".
>
> -Anoop-
>
> ________________________________________
> From: anil gupta [anilgupta84@gmail.com]
> Sent: Wednesday, January 30, 2013 1:33 PM
> To: user@hbase.apache.org
> Subject: Re: Pagination with HBase - getting previous page of data
>
> Hi Mohammad,
>
> You are most welcome to join the discussion. I have never used PageFilter
> so i don't really have concrete input.
> I had a look at
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
> I could not understand that why it goes to multiple regionservers in
> parallel. Why it cannot guarantee results <= page size( my guess: due to
> multiple RS scans)? If you have used it then maybe you can explain the
> behaviour?
>
> Thanks,
> Anil
>
> On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <do...@gmail.com> wrote:
>
>> I'm kinda hesitant to put my leg in between the pros ;)But, does it sound
>> sane to use PageFilter for both rows and columns and having some
>> additional
>> logic to handle the 'nth' page logic?It'll help us in both kind of
>> paging.
>>
>> On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
>> jean-marc@spaggiari.org>
>> wrote:
>> > Hi Anil,
>> >
>> > I think it really depend on the way you want to use the pagination.
>> >
>> > Do you need to be able to jump to page X? Are you ok if you miss a
>> > line or 2? Is your data growing fastly? Or slowly? Is it ok if your
>> > page indexes are a day old? Do you need to paginate over 300 colums?
>> > Or just 1? Do you need to always have the exact same number of entries
>> > in each page?
>> >
>> > For my usecase I need to be able to jump to the page X and I don't
>> > have any content. I have hundred of millions lines. Only the rowkey
>> > matter for me and I'm fine if sometime I have 50 entries displayed,
>> > and sometime only 45. So I'm thinking about calculating which row is
>> > the first one for each page, and store that separatly. Then I just
>> > need to run the MR daily.
>> >
>> > It's not a perfect solution I agree, but this might do the job for me.
>> > I'm totally open to all other idea which might do the job to.
>> >
>> > JM
>> >
>> > 2013/1/29, anil gupta <an...@gmail.com>:
>> >> Yes, your suggested solution only works on RowKey based pagination. It
>> will
>> >> fail when you start filtering on the basis of columns.
>> >>
>> >> Still, i would say it's comparatively easier to maintain this at
>> >> Application level rather than creating tables for pagination.
>> >>
>> >> What if you have 300 columns in your schema. Will you create 300
>> >> tables?
>> >> What about handling of pagination when filtering is done based on
>> multiple
>> >> columns ("and" and "or" conditions)?
>> >>
>> >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
>> >> jean-marc@spaggiari.org> wrote:
>> >>
>> >>> No, no killer solution here ;)
>> >>>
>> >>> But I'm still thinking about that because I might have to implement
>> >>> some pagination options soon...
>> >>>
>> >>> As you are saying, it's only working on the row-key, but if you want
>> >>> to do the same-thing on non-rowkey, you might have to create a
>> >>> secondary index table...
>> >>>
>> >>> JM
>> >>>
>> >>> 2013/1/27, anil gupta <an...@gmail.com>:
>> >>> > That's alright..I thought that you have come-up with a killer
>> solution.
>> >>> So,
>> >>> > got curious to hear your ideas. ;)
>> >>> > It seems like your below mentioned solution will not work on
>> filtering
>> >>> > on
>> >>> > non row-key columns since when you are deciding the page numbers
>> >>> > you
>> >>> > are
>> >>> > only considering rowkey.
>> >>> >
>> >>> > Thanks,
>> >>> > Anil
>> >>> >
>> >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
>> >>> > jean-marc@spaggiari.org> wrote:
>> >>> >
>> >>> >> Hi Anil,
>> >>> >>
>> >>> >> I don't have a solution. I never tought about that ;) But I was
>> >>> >> thinking about something like you create a 2nd table where you
>> >>> >> place
>> >>> >> the raw number (4 bytes) then the raw key. You go directly to a
>> >>> >> specific page, you query by the number, found the key, and you
>> >>> >> know
>> >>> >> where to start you scan in the main table.
>> >>> >>
>> >>> >> The issue is properly the number for each lines since with a MR
>> >>> >> you
>> >>> >> don't know where you are from the beginning. But you can built
>> >>> >> something where you store the line number from the beginning of
>> >>> >> the
>> >>> >> region, then when all regions are parsed you can reconstruct the
>> total
>> >>> >> numbering... That should work...
>> >>> >>
>> >>> >> JM
>> >>> >>
>> >>> >> 2013/1/25, anil gupta <an...@gmail.com>:
>> >>> >> > Inline...
>> >>> >> >
>> >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
>> >>> >> > jean-marc@spaggiari.org> wrote:
>> >>> >> >
>> >>> >> >> Hi Anil,
>> >>> >> >>
>> >>> >> >> The issue is that all the other sub-sequent page start should
>> >>> >> >> be
>> >>> moved
>> >>> >> >> too...
>> >>> >> >>
>> >>> >> > Yes, this is a possibility. Hence the Developer has to take care
>> of
>> >>> >> > this
>> >>> >> > case. It might also be possible that the pageSize is not a hard
>> >>> >> > limit
>> >>> >> > on
>> >>> >> > number of results(more like a hint or suggestion on size). I
>> >>> >> > would
>> >>> >> > say
>> >>> >> > it
>> >>> >> > varies by use case.
>> >>> >> >
>> >>> >> >>
>> >>> >> >> so if you want to jump directly to page n, you might be totally
>> >>> >> >> shifted because of all the data inserted in the meantime...
>> >>> >> >>
>> >>> >> >> If you want a real complete pagination feature, you might want
>> >>> >> >> to
>> >>> have
>> >>> >> >> a coproccessor or a MR updating another table refering to the
>> >>> >> >> pages....
>> >>> >> >>
>> >>> >> > Well, the solution depends on the use case. I will be doing
>> >>> >> > pagination
>> >
>>
>> --
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta

Re: Pagination with HBase - getting previous page of data

Posted by Asaf Mesika <as...@gmail.com>.
Here are my thoughts on this matter:

1. If you define setCaching(numOfRows) on the scan object, you can
check before each call to make sure you haven't passed your page limit,
so you won't get to the point where you retrieve pageSize results from
each region.

2. I think it's o.k. for the UI to present a certain point in time in the
database and offer paging on that. You can achieve that by taking the
current timestamp (System.currentTimeMillis()) and forcing the results
returned to be no later than that time by using scan.setTimeRange(0,
currentTime). If you save currentTime and send it back with the results
to the UI, it can keep sending it to the backend, thus ensuring you're
viewing that point in time. If rows keep being inserted, their timestamps
will be greater, thus not displayed.
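
A rough sketch of point 2 (not tested; plain HTable client, and the table
name below is only a placeholder): take the time once when the grid is
first opened, then pass it back with every page request.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SnapshotPaging {
  // snapshotTime is taken once when the user opens the grid and is sent
  // back by the UI with every page request, so every page sees the same
  // point-in-time view of the table.
  public static void printPage(HTable table, byte[] startRow,
      long snapshotTime, int pageSize) throws IOException {
    Scan scan = new Scan(startRow);
    scan.setTimeRange(0, snapshotTime);  // ignore cells written after the snapshot
    scan.setCaching(pageSize);           // never fetch more than one page per RPC
    ResultScanner scanner = table.getScanner(scan);
    int count = 0;
    try {
      for (Result r : scanner) {
        if (++count > pageSize) break;   // enforce the page limit on the client
        System.out.println(Bytes.toStringBinary(r.getRow()));
      }
    } finally {
      scanner.close();
    }
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");       // placeholder table name
    long snapshotTime = System.currentTimeMillis();   // the "point in time"
    printPage(table, new byte[0], snapshotTime, 25);  // first page of 25 rows
    table.close();
  }
}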


On Wed, Jan 30, 2013 at 2:42 PM, Toby Lazar <tl...@gmail.com> wrote:

> Sounds like if you had 1000 regions, each with 99 rows, and you asked
> for 100 that you'd get back 99,000. My guess is that a Filter is
> serialized once and that is sent successively to each region and that
> it isn't updated between regions.  Don't think doing that would be too
> easy.
>
> Toby
>
> On 1/30/13, Jean-Marc Spaggiari <je...@spaggiari.org> wrote:
> > Hi Anoop,
> >
> > So does it mean the scanner can send back LIMIT*2-1 lines max? Reading
> > 100 rows from the 2nd region is using extra time and resources. Why
> > not ask for only the number of missing lines?
> >
> > JM
> >
> > 2013/1/30, Anoop Sam John <an...@huawei.com>:
> >> @Anil
> >>
> >>>I could not understand that why it goes to multiple regionservers in
> >> parallel. Why it cannot guarantee results <= page size( my guess: due to
> >> multiple RS scans)? If you have used it then maybe you can explain the
> >> behaviour?
> >>
> >> Scan from client side never go to multiple RS in parallel. Scan from
> >> HTable
> >> API will be sequential with one region after the other. For every region
> >> it
> >> will open up scanner in the RS and do next() calls. The filter will be
> >> instantiated at server side per region level ...
> >>
> >> When u need 100 rows in the page and you created a Scan at client side
> >> with
> >> the filter and suppose there are 2 regions, 1st the scanner is opened at
> >> for
> >> region1 and scan is happening. It will ensure that max 100 rows will be
> >> retrieved from that region.  But when the region boundary crosses and
> >> client
> >> automatically open up scanner for the region2, there also it will pass
> >> filter with max 100 rows and so from there also max 100 rows can come..
> >> So
> >> over all at the client side we can not guartee that the scan created
> will
> >> only scan 100 rows as a whole from the table.
> >>
> >> I think I am making it clear.   I have not PageFilter at all.. I am just
> >> explaining as per the knowledge on scan flow and the general filter
> >> usage.
> >>
> >> "This is because the filter is applied separately on different region
> >> servers. It does however optimize the scan of individual HRegions by
> >> making
> >> sure that the page size is never exceeded locally. "
> >>
> >> I guess it need to be saying that   "This is because the filter is
> >> applied
> >> separately on different regions".
> >>
> >> -Anoop-
> >>
> >> ________________________________________
> >> From: anil gupta [anilgupta84@gmail.com]
> >> Sent: Wednesday, January 30, 2013 1:33 PM
> >> To: user@hbase.apache.org
> >> Subject: Re: Pagination with HBase - getting previous page of data
> >>
> >> Hi Mohammad,
> >>
> >> You are most welcome to join the discussion. I have never used
> PageFilter
> >> so i don't really have concrete input.
> >> I had a look at
> >>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
> >> I could not understand that why it goes to multiple regionservers in
> >> parallel. Why it cannot guarantee results <= page size( my guess: due to
> >> multiple RS scans)? If you have used it then maybe you can explain the
> >> behaviour?
> >>
> >> Thanks,
> >> Anil
> >>
> >> On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <do...@gmail.com>
> >> wrote:
> >>
> >>> I'm kinda hesitant to put my leg in between the pros ;)But, does it
> >>> sound
> >>> sane to use PageFilter for both rows and columns and having some
> >>> additional
> >>> logic to handle the 'nth' page logic?It'll help us in both kind of
> >>> paging.
> >>>
> >>> On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
> >>> jean-marc@spaggiari.org>
> >>> wrote:
> >>> > Hi Anil,
> >>> >
> >>> > I think it really depend on the way you want to use the pagination.
> >>> >
> >>> > Do you need to be able to jump to page X? Are you ok if you miss a
> >>> > line or 2? Is your data growing fastly? Or slowly? Is it ok if your
> >>> > page indexes are a day old? Do you need to paginate over 300 colums?
> >>> > Or just 1? Do you need to always have the exact same number of
> entries
> >>> > in each page?
> >>> >
> >>> > For my usecase I need to be able to jump to the page X and I don't
> >>> > have any content. I have hundred of millions lines. Only the rowkey
> >>> > matter for me and I'm fine if sometime I have 50 entries displayed,
> >>> > and sometime only 45. So I'm thinking about calculating which row is
> >>> > the first one for each page, and store that separatly. Then I just
> >>> > need to run the MR daily.
> >>> >
> >>> > It's not a perfect solution I agree, but this might do the job for
> me.
> >>> > I'm totally open to all other idea which might do the job to.
> >>> >
> >>> > JM
> >>> >
> >>> > 2013/1/29, anil gupta <an...@gmail.com>:
> >>> >> Yes, your suggested solution only works on RowKey based pagination.
> >>> >> It
> >>> will
> >>> >> fail when you start filtering on the basis of columns.
> >>> >>
> >>> >> Still, i would say it's comparatively easier to maintain this at
> >>> >> Application level rather than creating tables for pagination.
> >>> >>
> >>> >> What if you have 300 columns in your schema. Will you create 300
> >>> >> tables?
> >>> >> What about handling of pagination when filtering is done based on
> >>> multiple
> >>> >> columns ("and" and "or" conditions)?
> >>> >>
> >>> >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> >>> >> jean-marc@spaggiari.org> wrote:
> >>> >>
> >>> >>> No, no killer solution here ;)
> >>> >>>
> >>> >>> But I'm still thinking about that because I might have to implement
> >>> >>> some pagination options soon...
> >>> >>>
> >>> >>> As you are saying, it's only working on the row-key, but if you
> want
> >>> >>> to do the same-thing on non-rowkey, you might have to create a
> >>> >>> secondary index table...
> >>> >>>
> >>> >>> JM
> >>> >>>
> >>> >>> 2013/1/27, anil gupta <an...@gmail.com>:
> >>> >>> > That's alright..I thought that you have come-up with a killer
> >>> solution.
> >>> >>> So,
> >>> >>> > got curious to hear your ideas. ;)
> >>> >>> > It seems like your below mentioned solution will not work on
> >>> filtering
> >>> >>> > on
> >>> >>> > non row-key columns since when you are deciding the page numbers
> >>> >>> > you
> >>> >>> > are
> >>> >>> > only considering rowkey.
> >>> >>> >
> >>> >>> > Thanks,
> >>> >>> > Anil
> >>> >>> >
> >>> >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> >>> >>> > jean-marc@spaggiari.org> wrote:
> >>> >>> >
> >>> >>> >> Hi Anil,
> >>> >>> >>
> >>> >>> >> I don't have a solution. I never tought about that ;) But I was
> >>> >>> >> thinking about something like you create a 2nd table where you
> >>> >>> >> place
> >>> >>> >> the raw number (4 bytes) then the raw key. You go directly to a
> >>> >>> >> specific page, you query by the number, found the key, and you
> >>> >>> >> know
> >>> >>> >> where to start you scan in the main table.
> >>> >>> >>
> >>> >>> >> The issue is properly the number for each lines since with a MR
> >>> >>> >> you
> >>> >>> >> don't know where you are from the beginning. But you can built
> >>> >>> >> something where you store the line number from the beginning of
> >>> >>> >> the
> >>> >>> >> region, then when all regions are parsed you can reconstruct the
> >>> total
> >>> >>> >> numbering... That should work...
> >>> >>> >>
> >>> >>> >> JM
> >>> >>> >>
> >>> >>> >> 2013/1/25, anil gupta <an...@gmail.com>:
> >>> >>> >> > Inline...
> >>> >>> >> >
> >>> >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> >>> >>> >> > jean-marc@spaggiari.org> wrote:
> >>> >>> >> >
> >>> >>> >> >> Hi Anil,
> >>> >>> >> >>
> >>> >>> >> >> The issue is that all the other sub-sequent page start should
> >>> >>> >> >> be
> >>> >>> moved
> >>> >>> >> >> too...
> >>> >>> >> >>
> >>> >>> >> > Yes, this is a possibility. Hence the Developer has to take
> >>> >>> >> > care
> >>> of
> >>> >>> >> > this
> >>> >>> >> > case. It might also be possible that the pageSize is not a
> hard
> >>> >>> >> > limit
> >>> >>> >> > on
> >>> >>> >> > number of results(more like a hint or suggestion on size). I
> >>> >>> >> > would
> >>> >>> >> > say
> >>> >>> >> > it
> >>> >>> >> > varies by use case.
> >>> >>> >> >
> >>> >>> >> >>
> >>> >>> >> >> so if you want to jump directly to page n, you might be
> >>> >>> >> >> totally
> >>> >>> >> >> shifted because of all the data inserted in the meantime...
> >>> >>> >> >>
> >>> >>> >> >> If you want a real complete pagination feature, you might
> want
> >>> >>> >> >> to
> >>> >>> have
> >>> >>> >> >> a coproccessor or a MR updating another table refering to the
> >>> >>> >> >> pages....
> >>> >>> >> >>
> >>> >>> >> > Well, the solution depends on the use case. I will be doing
> >>> >>> >> > pagination
> >>> >
> >>>
> >>> --
> >>> Warm Regards,
> >>> Tariq
> >>> https://mtariq.jux.com/
> >>> cloudfront.blogspot.com
> >>>
> >>
> >>
> >>
> >> --
> >> Thanks & Regards,
> >> Anil Gupta
> >
>
> --
> Sent from my mobile device
>

Re: Pagination with HBase - getting previous page of data

Posted by Toby Lazar <tl...@gmail.com>.
Sounds like if you had 1000 regions, each with 99 rows, and you asked
for 100, you'd get back 99,000. My guess is that a Filter is serialized
once and then sent successively to each region, and that it isn't
updated between regions. I don't think doing that would be too easy.

Toby

On 1/30/13, Jean-Marc Spaggiari <je...@spaggiari.org> wrote:
> Hi Anoop,
>
> So does it mean the scanner can send back LIMIT*2-1 lines max? Reading
> 100 rows from the 2nd region is using extra time and resources. Why
> not ask for only the number of missing lines?
>
> JM
>
> 2013/1/30, Anoop Sam John <an...@huawei.com>:
>> @Anil
>>
>>>I could not understand that why it goes to multiple regionservers in
>> parallel. Why it cannot guarantee results <= page size( my guess: due to
>> multiple RS scans)? If you have used it then maybe you can explain the
>> behaviour?
>>
>> Scan from client side never go to multiple RS in parallel. Scan from
>> HTable
>> API will be sequential with one region after the other. For every region
>> it
>> will open up scanner in the RS and do next() calls. The filter will be
>> instantiated at server side per region level ...
>>
>> When u need 100 rows in the page and you created a Scan at client side
>> with
>> the filter and suppose there are 2 regions, 1st the scanner is opened at
>> for
>> region1 and scan is happening. It will ensure that max 100 rows will be
>> retrieved from that region.  But when the region boundary crosses and
>> client
>> automatically open up scanner for the region2, there also it will pass
>> filter with max 100 rows and so from there also max 100 rows can come..
>> So
>> over all at the client side we can not guartee that the scan created will
>> only scan 100 rows as a whole from the table.
>>
>> I think I am making it clear.   I have not PageFilter at all.. I am just
>> explaining as per the knowledge on scan flow and the general filter
>> usage.
>>
>> "This is because the filter is applied separately on different region
>> servers. It does however optimize the scan of individual HRegions by
>> making
>> sure that the page size is never exceeded locally. "
>>
>> I guess it need to be saying that   "This is because the filter is
>> applied
>> separately on different regions".
>>
>> -Anoop-
>>
>> ________________________________________
>> From: anil gupta [anilgupta84@gmail.com]
>> Sent: Wednesday, January 30, 2013 1:33 PM
>> To: user@hbase.apache.org
>> Subject: Re: Pagination with HBase - getting previous page of data
>>
>> Hi Mohammad,
>>
>> You are most welcome to join the discussion. I have never used PageFilter
>> so i don't really have concrete input.
>> I had a look at
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
>> I could not understand that why it goes to multiple regionservers in
>> parallel. Why it cannot guarantee results <= page size( my guess: due to
>> multiple RS scans)? If you have used it then maybe you can explain the
>> behaviour?
>>
>> Thanks,
>> Anil
>>
>> On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <do...@gmail.com>
>> wrote:
>>
>>> I'm kinda hesitant to put my leg in between the pros ;)But, does it
>>> sound
>>> sane to use PageFilter for both rows and columns and having some
>>> additional
>>> logic to handle the 'nth' page logic?It'll help us in both kind of
>>> paging.
>>>
>>> On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
>>> jean-marc@spaggiari.org>
>>> wrote:
>>> > Hi Anil,
>>> >
>>> > I think it really depend on the way you want to use the pagination.
>>> >
>>> > Do you need to be able to jump to page X? Are you ok if you miss a
>>> > line or 2? Is your data growing fastly? Or slowly? Is it ok if your
>>> > page indexes are a day old? Do you need to paginate over 300 colums?
>>> > Or just 1? Do you need to always have the exact same number of entries
>>> > in each page?
>>> >
>>> > For my usecase I need to be able to jump to the page X and I don't
>>> > have any content. I have hundred of millions lines. Only the rowkey
>>> > matter for me and I'm fine if sometime I have 50 entries displayed,
>>> > and sometime only 45. So I'm thinking about calculating which row is
>>> > the first one for each page, and store that separatly. Then I just
>>> > need to run the MR daily.
>>> >
>>> > It's not a perfect solution I agree, but this might do the job for me.
>>> > I'm totally open to all other idea which might do the job to.
>>> >
>>> > JM
>>> >
>>> > 2013/1/29, anil gupta <an...@gmail.com>:
>>> >> Yes, your suggested solution only works on RowKey based pagination.
>>> >> It
>>> will
>>> >> fail when you start filtering on the basis of columns.
>>> >>
>>> >> Still, i would say it's comparatively easier to maintain this at
>>> >> Application level rather than creating tables for pagination.
>>> >>
>>> >> What if you have 300 columns in your schema. Will you create 300
>>> >> tables?
>>> >> What about handling of pagination when filtering is done based on
>>> multiple
>>> >> columns ("and" and "or" conditions)?
>>> >>
>>> >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
>>> >> jean-marc@spaggiari.org> wrote:
>>> >>
>>> >>> No, no killer solution here ;)
>>> >>>
>>> >>> But I'm still thinking about that because I might have to implement
>>> >>> some pagination options soon...
>>> >>>
>>> >>> As you are saying, it's only working on the row-key, but if you want
>>> >>> to do the same-thing on non-rowkey, you might have to create a
>>> >>> secondary index table...
>>> >>>
>>> >>> JM
>>> >>>
>>> >>> 2013/1/27, anil gupta <an...@gmail.com>:
>>> >>> > That's alright..I thought that you have come-up with a killer
>>> solution.
>>> >>> So,
>>> >>> > got curious to hear your ideas. ;)
>>> >>> > It seems like your below mentioned solution will not work on
>>> filtering
>>> >>> > on
>>> >>> > non row-key columns since when you are deciding the page numbers
>>> >>> > you
>>> >>> > are
>>> >>> > only considering rowkey.
>>> >>> >
>>> >>> > Thanks,
>>> >>> > Anil
>>> >>> >
>>> >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
>>> >>> > jean-marc@spaggiari.org> wrote:
>>> >>> >
>>> >>> >> Hi Anil,
>>> >>> >>
>>> >>> >> I don't have a solution. I never tought about that ;) But I was
>>> >>> >> thinking about something like you create a 2nd table where you
>>> >>> >> place
>>> >>> >> the raw number (4 bytes) then the raw key. You go directly to a
>>> >>> >> specific page, you query by the number, found the key, and you
>>> >>> >> know
>>> >>> >> where to start you scan in the main table.
>>> >>> >>
>>> >>> >> The issue is properly the number for each lines since with a MR
>>> >>> >> you
>>> >>> >> don't know where you are from the beginning. But you can built
>>> >>> >> something where you store the line number from the beginning of
>>> >>> >> the
>>> >>> >> region, then when all regions are parsed you can reconstruct the
>>> total
>>> >>> >> numbering... That should work...
>>> >>> >>
>>> >>> >> JM
>>> >>> >>
>>> >>> >> 2013/1/25, anil gupta <an...@gmail.com>:
>>> >>> >> > Inline...
>>> >>> >> >
>>> >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
>>> >>> >> > jean-marc@spaggiari.org> wrote:
>>> >>> >> >
>>> >>> >> >> Hi Anil,
>>> >>> >> >>
>>> >>> >> >> The issue is that all the other sub-sequent page start should
>>> >>> >> >> be
>>> >>> moved
>>> >>> >> >> too...
>>> >>> >> >>
>>> >>> >> > Yes, this is a possibility. Hence the Developer has to take
>>> >>> >> > care
>>> of
>>> >>> >> > this
>>> >>> >> > case. It might also be possible that the pageSize is not a hard
>>> >>> >> > limit
>>> >>> >> > on
>>> >>> >> > number of results(more like a hint or suggestion on size). I
>>> >>> >> > would
>>> >>> >> > say
>>> >>> >> > it
>>> >>> >> > varies by use case.
>>> >>> >> >
>>> >>> >> >>
>>> >>> >> >> so if you want to jump directly to page n, you might be
>>> >>> >> >> totally
>>> >>> >> >> shifted because of all the data inserted in the meantime...
>>> >>> >> >>
>>> >>> >> >> If you want a real complete pagination feature, you might want
>>> >>> >> >> to
>>> >>> have
>>> >>> >> >> a coproccessor or a MR updating another table refering to the
>>> >>> >> >> pages....
>>> >>> >> >>
>>> >>> >> > Well, the solution depends on the use case. I will be doing
>>> >>> >> > pagination
>>> >
>>>
>>> --
>>> Warm Regards,
>>> Tariq
>>> https://mtariq.jux.com/
>>> cloudfront.blogspot.com
>>>
>>
>>
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>

-- 
Sent from my mobile device

Re: Pagination with HBase - getting previous page of data

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Anoop,

So does it mean the scanner can send back LIMIT*2-1 lines max? Reading
100 rows from the 2nd region uses extra time and resources. Why not ask
for only the number of missing lines?

JM

2013/1/30, Anoop Sam John <an...@huawei.com>:
> @Anil
>
>>I could not understand that why it goes to multiple regionservers in
> parallel. Why it cannot guarantee results <= page size( my guess: due to
> multiple RS scans)? If you have used it then maybe you can explain the
> behaviour?
>
> Scan from client side never go to multiple RS in parallel. Scan from HTable
> API will be sequential with one region after the other. For every region it
> will open up scanner in the RS and do next() calls. The filter will be
> instantiated at server side per region level ...
>
> When u need 100 rows in the page and you created a Scan at client side with
> the filter and suppose there are 2 regions, 1st the scanner is opened at for
> region1 and scan is happening. It will ensure that max 100 rows will be
> retrieved from that region.  But when the region boundary crosses and client
> automatically open up scanner for the region2, there also it will pass
> filter with max 100 rows and so from there also max 100 rows can come..  So
> over all at the client side we can not guartee that the scan created will
> only scan 100 rows as a whole from the table.
>
> I think I am making it clear.   I have not PageFilter at all.. I am just
> explaining as per the knowledge on scan flow and the general filter usage.
>
> "This is because the filter is applied separately on different region
> servers. It does however optimize the scan of individual HRegions by making
> sure that the page size is never exceeded locally. "
>
> I guess it need to be saying that   "This is because the filter is applied
> separately on different regions".
>
> -Anoop-
>
> ________________________________________
> From: anil gupta [anilgupta84@gmail.com]
> Sent: Wednesday, January 30, 2013 1:33 PM
> To: user@hbase.apache.org
> Subject: Re: Pagination with HBase - getting previous page of data
>
> Hi Mohammad,
>
> You are most welcome to join the discussion. I have never used PageFilter
> so i don't really have concrete input.
> I had a look at
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
> I could not understand that why it goes to multiple regionservers in
> parallel. Why it cannot guarantee results <= page size( my guess: due to
> multiple RS scans)? If you have used it then maybe you can explain the
> behaviour?
>
> Thanks,
> Anil
>
> On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <do...@gmail.com> wrote:
>
>> I'm kinda hesitant to put my leg in between the pros ;)But, does it sound
>> sane to use PageFilter for both rows and columns and having some
>> additional
>> logic to handle the 'nth' page logic?It'll help us in both kind of
>> paging.
>>
>> On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
>> jean-marc@spaggiari.org>
>> wrote:
>> > Hi Anil,
>> >
>> > I think it really depend on the way you want to use the pagination.
>> >
>> > Do you need to be able to jump to page X? Are you ok if you miss a
>> > line or 2? Is your data growing fastly? Or slowly? Is it ok if your
>> > page indexes are a day old? Do you need to paginate over 300 colums?
>> > Or just 1? Do you need to always have the exact same number of entries
>> > in each page?
>> >
>> > For my usecase I need to be able to jump to the page X and I don't
>> > have any content. I have hundred of millions lines. Only the rowkey
>> > matter for me and I'm fine if sometime I have 50 entries displayed,
>> > and sometime only 45. So I'm thinking about calculating which row is
>> > the first one for each page, and store that separatly. Then I just
>> > need to run the MR daily.
>> >
>> > It's not a perfect solution I agree, but this might do the job for me.
>> > I'm totally open to all other idea which might do the job to.
>> >
>> > JM
>> >
>> > 2013/1/29, anil gupta <an...@gmail.com>:
>> >> Yes, your suggested solution only works on RowKey based pagination. It
>> will
>> >> fail when you start filtering on the basis of columns.
>> >>
>> >> Still, i would say it's comparatively easier to maintain this at
>> >> Application level rather than creating tables for pagination.
>> >>
>> >> What if you have 300 columns in your schema. Will you create 300
>> >> tables?
>> >> What about handling of pagination when filtering is done based on
>> multiple
>> >> columns ("and" and "or" conditions)?
>> >>
>> >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
>> >> jean-marc@spaggiari.org> wrote:
>> >>
>> >>> No, no killer solution here ;)
>> >>>
>> >>> But I'm still thinking about that because I might have to implement
>> >>> some pagination options soon...
>> >>>
>> >>> As you are saying, it's only working on the row-key, but if you want
>> >>> to do the same-thing on non-rowkey, you might have to create a
>> >>> secondary index table...
>> >>>
>> >>> JM
>> >>>
>> >>> 2013/1/27, anil gupta <an...@gmail.com>:
>> >>> > That's alright..I thought that you have come-up with a killer
>> solution.
>> >>> So,
>> >>> > got curious to hear your ideas. ;)
>> >>> > It seems like your below mentioned solution will not work on
>> filtering
>> >>> > on
>> >>> > non row-key columns since when you are deciding the page numbers
>> >>> > you
>> >>> > are
>> >>> > only considering rowkey.
>> >>> >
>> >>> > Thanks,
>> >>> > Anil
>> >>> >
>> >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
>> >>> > jean-marc@spaggiari.org> wrote:
>> >>> >
>> >>> >> Hi Anil,
>> >>> >>
>> >>> >> I don't have a solution. I never tought about that ;) But I was
>> >>> >> thinking about something like you create a 2nd table where you
>> >>> >> place
>> >>> >> the raw number (4 bytes) then the raw key. You go directly to a
>> >>> >> specific page, you query by the number, found the key, and you
>> >>> >> know
>> >>> >> where to start you scan in the main table.
>> >>> >>
>> >>> >> The issue is properly the number for each lines since with a MR
>> >>> >> you
>> >>> >> don't know where you are from the beginning. But you can built
>> >>> >> something where you store the line number from the beginning of
>> >>> >> the
>> >>> >> region, then when all regions are parsed you can reconstruct the
>> total
>> >>> >> numbering... That should work...
>> >>> >>
>> >>> >> JM
>> >>> >>
>> >>> >> 2013/1/25, anil gupta <an...@gmail.com>:
>> >>> >> > Inline...
>> >>> >> >
>> >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
>> >>> >> > jean-marc@spaggiari.org> wrote:
>> >>> >> >
>> >>> >> >> Hi Anil,
>> >>> >> >>
>> >>> >> >> The issue is that all the other sub-sequent page start should
>> >>> >> >> be
>> >>> moved
>> >>> >> >> too...
>> >>> >> >>
>> >>> >> > Yes, this is a possibility. Hence the Developer has to take care
>> of
>> >>> >> > this
>> >>> >> > case. It might also be possible that the pageSize is not a hard
>> >>> >> > limit
>> >>> >> > on
>> >>> >> > number of results(more like a hint or suggestion on size). I
>> >>> >> > would
>> >>> >> > say
>> >>> >> > it
>> >>> >> > varies by use case.
>> >>> >> >
>> >>> >> >>
>> >>> >> >> so if you want to jump directly to page n, you might be totally
>> >>> >> >> shifted because of all the data inserted in the meantime...
>> >>> >> >>
>> >>> >> >> If you want a real complete pagination feature, you might want
>> >>> >> >> to
>> >>> have
>> >>> >> >> a coproccessor or a MR updating another table refering to the
>> >>> >> >> pages....
>> >>> >> >>
>> >>> >> > Well, the solution depends on the use case. I will be doing
>> >>> >> > pagination
>> >
>>
>> --
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta

RE: Pagination with HBase - getting previous page of data

Posted by Anoop Sam John <an...@huawei.com>.
@Anil

>I could not understand that why it goes to multiple regionservers in
parallel. Why it cannot guarantee results <= page size( my guess: due to
multiple RS scans)? If you have used it then maybe you can explain the
behaviour?

A scan from the client side never goes to multiple RSs in parallel. A scan through the HTable API is sequential, one region after the other. For every region it will open up a scanner in the RS and do next() calls. The filter will be instantiated at the server side, per region...

When you need 100 rows in the page and you create a Scan at the client side with the filter, and suppose there are 2 regions: first the scanner is opened for region1 and the scan happens. It will ensure that at most 100 rows are retrieved from that region.  But when the region boundary is crossed and the client automatically opens up a scanner for region2, the filter is passed there too with max 100 rows, and so from there also max 100 rows can come..  So over all, at the client side, we can not guarantee that the scan created will only scan 100 rows as a whole from the table.

I think I am making it clear.   I have not used PageFilter at all.. I am just explaining as per my knowledge of the scan flow and general filter usage.

"This is because the filter is applied separately on different region servers. It does however optimize the scan of individual HRegions by making sure that the page size is never exceeded locally. "

I guess it needs to say:   "This is because the filter is applied separately on different regions".
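
Something along these lines (only a sketch, 0.94-style API, with a placeholder table name and start row) is what I mean by app-level control: even if you keep the PageFilter as a per-region optimization, the hard page limit has to be enforced by the caller.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PageScanExample {
  public static void main(String[] args) throws IOException {
    int pageSize = 100;
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");            // placeholder table name
    Scan scan = new Scan(Bytes.toBytes("page-start-row")); // placeholder start row
    scan.setFilter(new PageFilter(pageSize));  // limits rows per region only
    scan.setCaching(pageSize);                 // at most one page per RPC
    ResultScanner scanner = table.getScanner(scan);
    int count = 0;
    try {
      for (Result r : scanner) {
        // the regions together may hand back more than pageSize rows,
        // so the hard limit has to live here, on the client
        if (++count > pageSize) break;
        System.out.println(Bytes.toStringBinary(r.getRow()));
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}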

-Anoop-

________________________________________
From: anil gupta [anilgupta84@gmail.com]
Sent: Wednesday, January 30, 2013 1:33 PM
To: user@hbase.apache.org
Subject: Re: Pagination with HBase - getting previous page of data

Hi Mohammad,

You are most welcome to join the discussion. I have never used PageFilter
so i don't really have concrete input.
I had a look at
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
I could not understand that why it goes to multiple regionservers in
parallel. Why it cannot guarantee results <= page size( my guess: due to
multiple RS scans)? If you have used it then maybe you can explain the
behaviour?

Thanks,
Anil

On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <do...@gmail.com> wrote:

> I'm kinda hesitant to put my leg in between the pros ;)But, does it sound
> sane to use PageFilter for both rows and columns and having some additional
> logic to handle the 'nth' page logic?It'll help us in both kind of paging.
>
> On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org>
> wrote:
> > Hi Anil,
> >
> > I think it really depend on the way you want to use the pagination.
> >
> > Do you need to be able to jump to page X? Are you ok if you miss a
> > line or 2? Is your data growing fastly? Or slowly? Is it ok if your
> > page indexes are a day old? Do you need to paginate over 300 colums?
> > Or just 1? Do you need to always have the exact same number of entries
> > in each page?
> >
> > For my usecase I need to be able to jump to the page X and I don't
> > have any content. I have hundred of millions lines. Only the rowkey
> > matter for me and I'm fine if sometime I have 50 entries displayed,
> > and sometime only 45. So I'm thinking about calculating which row is
> > the first one for each page, and store that separatly. Then I just
> > need to run the MR daily.
> >
> > It's not a perfect solution I agree, but this might do the job for me.
> > I'm totally open to all other idea which might do the job to.
> >
> > JM
> >
> > 2013/1/29, anil gupta <an...@gmail.com>:
> >> Yes, your suggested solution only works on RowKey based pagination. It
> will
> >> fail when you start filtering on the basis of columns.
> >>
> >> Still, i would say it's comparatively easier to maintain this at
> >> Application level rather than creating tables for pagination.
> >>
> >> What if you have 300 columns in your schema. Will you create 300 tables?
> >> What about handling of pagination when filtering is done based on
> multiple
> >> columns ("and" and "or" conditions)?
> >>
> >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> >> jean-marc@spaggiari.org> wrote:
> >>
> >>> No, no killer solution here ;)
> >>>
> >>> But I'm still thinking about that because I might have to implement
> >>> some pagination options soon...
> >>>
> >>> As you are saying, it's only working on the row-key, but if you want
> >>> to do the same-thing on non-rowkey, you might have to create a
> >>> secondary index table...
> >>>
> >>> JM
> >>>
> >>> 2013/1/27, anil gupta <an...@gmail.com>:
> >>> > That's alright..I thought that you have come-up with a killer
> solution.
> >>> So,
> >>> > got curious to hear your ideas. ;)
> >>> > It seems like your below mentioned solution will not work on
> filtering
> >>> > on
> >>> > non row-key columns since when you are deciding the page numbers you
> >>> > are
> >>> > only considering rowkey.
> >>> >
> >>> > Thanks,
> >>> > Anil
> >>> >
> >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> >>> > jean-marc@spaggiari.org> wrote:
> >>> >
> >>> >> Hi Anil,
> >>> >>
> >>> >> I don't have a solution. I never tought about that ;) But I was
> >>> >> thinking about something like you create a 2nd table where you place
> >>> >> the raw number (4 bytes) then the raw key. You go directly to a
> >>> >> specific page, you query by the number, found the key, and you know
> >>> >> where to start you scan in the main table.
> >>> >>
> >>> >> The issue is properly the number for each lines since with a MR you
> >>> >> don't know where you are from the beginning. But you can built
> >>> >> something where you store the line number from the beginning of the
> >>> >> region, then when all regions are parsed you can reconstruct the
> total
> >>> >> numbering... That should work...
> >>> >>
> >>> >> JM
> >>> >>
> >>> >> 2013/1/25, anil gupta <an...@gmail.com>:
> >>> >> > Inline...
> >>> >> >
> >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> >>> >> > jean-marc@spaggiari.org> wrote:
> >>> >> >
> >>> >> >> Hi Anil,
> >>> >> >>
> >>> >> >> The issue is that all the other sub-sequent page start should be
> >>> moved
> >>> >> >> too...
> >>> >> >>
> >>> >> > Yes, this is a possibility. Hence the Developer has to take care
> of
> >>> >> > this
> >>> >> > case. It might also be possible that the pageSize is not a hard
> >>> >> > limit
> >>> >> > on
> >>> >> > number of results(more like a hint or suggestion on size). I would
> >>> >> > say
> >>> >> > it
> >>> >> > varies by use case.
> >>> >> >
> >>> >> >>
> >>> >> >> so if you want to jump directly to page n, you might be totally
> >>> >> >> shifted because of all the data inserted in the meantime...
> >>> >> >>
> >>> >> >> If you want a real complete pagination feature, you might want to
> >>> have
> >>> >> >> a coproccessor or a MR updating another table refering to the
> >>> >> >> pages....
> >>> >> >>
> >>> >> > Well, the solution depends on the use case. I will be doing
> >>> >> > pagination
> >
>
> --
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>



--
Thanks & Regards,
Anil Gupta

Re: Pagination with HBase - getting previous page of data

Posted by anil gupta <an...@gmail.com>.
Hi Mohammad,

You are most welcome to join the discussion. I have never used PageFilter
so I don't really have concrete input.
I had a look at
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
I could not understand why it goes to multiple regionservers in
parallel. Why can it not guarantee results <= page size (my guess: due to
multiple RS scans)? If you have used it then maybe you can explain the
behaviour?

Thanks,
Anil

On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <do...@gmail.com> wrote:

> I'm kinda hesitant to put my leg in between the pros ;)But, does it sound
> sane to use PageFilter for both rows and columns and having some additional
> logic to handle the 'nth' page logic?It'll help us in both kind of paging.
>
> On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org>
> wrote:
> > Hi Anil,
> >
> > I think it really depend on the way you want to use the pagination.
> >
> > Do you need to be able to jump to page X? Are you ok if you miss a
> > line or 2? Is your data growing fastly? Or slowly? Is it ok if your
> > page indexes are a day old? Do you need to paginate over 300 colums?
> > Or just 1? Do you need to always have the exact same number of entries
> > in each page?
> >
> > For my usecase I need to be able to jump to the page X and I don't
> > have any content. I have hundred of millions lines. Only the rowkey
> > matter for me and I'm fine if sometime I have 50 entries displayed,
> > and sometime only 45. So I'm thinking about calculating which row is
> > the first one for each page, and store that separatly. Then I just
> > need to run the MR daily.
> >
> > It's not a perfect solution I agree, but this might do the job for me.
> > I'm totally open to all other idea which might do the job to.
> >
> > JM
> >
> > 2013/1/29, anil gupta <an...@gmail.com>:
> >> Yes, your suggested solution only works on RowKey based pagination. It
> will
> >> fail when you start filtering on the basis of columns.
> >>
> >> Still, i would say it's comparatively easier to maintain this at
> >> Application level rather than creating tables for pagination.
> >>
> >> What if you have 300 columns in your schema. Will you create 300 tables?
> >> What about handling of pagination when filtering is done based on
> multiple
> >> columns ("and" and "or" conditions)?
> >>
> >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> >> jean-marc@spaggiari.org> wrote:
> >>
> >>> No, no killer solution here ;)
> >>>
> >>> But I'm still thinking about that because I might have to implement
> >>> some pagination options soon...
> >>>
> >>> As you are saying, it's only working on the row-key, but if you want
> >>> to do the same-thing on non-rowkey, you might have to create a
> >>> secondary index table...
> >>>
> >>> JM
> >>>
> >>> 2013/1/27, anil gupta <an...@gmail.com>:
> >>> > That's alright..I thought that you have come-up with a killer
> solution.
> >>> So,
> >>> > got curious to hear your ideas. ;)
> >>> > It seems like your below mentioned solution will not work on
> filtering
> >>> > on
> >>> > non row-key columns since when you are deciding the page numbers you
> >>> > are
> >>> > only considering rowkey.
> >>> >
> >>> > Thanks,
> >>> > Anil
> >>> >
> >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> >>> > jean-marc@spaggiari.org> wrote:
> >>> >
> >>> >> Hi Anil,
> >>> >>
> >>> >> I don't have a solution. I never tought about that ;) But I was
> >>> >> thinking about something like you create a 2nd table where you place
> >>> >> the raw number (4 bytes) then the raw key. You go directly to a
> >>> >> specific page, you query by the number, found the key, and you know
> >>> >> where to start you scan in the main table.
> >>> >>
> >>> >> The issue is properly the number for each lines since with a MR you
> >>> >> don't know where you are from the beginning. But you can built
> >>> >> something where you store the line number from the beginning of the
> >>> >> region, then when all regions are parsed you can reconstruct the
> total
> >>> >> numbering... That should work...
> >>> >>
> >>> >> JM
> >>> >>
> >>> >> 2013/1/25, anil gupta <an...@gmail.com>:
> >>> >> > Inline...
> >>> >> >
> >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> >>> >> > jean-marc@spaggiari.org> wrote:
> >>> >> >
> >>> >> >> Hi Anil,
> >>> >> >>
> >>> >> >> The issue is that all the other sub-sequent page start should be
> >>> moved
> >>> >> >> too...
> >>> >> >>
> >>> >> > Yes, this is a possibility. Hence the Developer has to take care
> of
> >>> >> > this
> >>> >> > case. It might also be possible that the pageSize is not a hard
> >>> >> > limit
> >>> >> > on
> >>> >> > number of results(more like a hint or suggestion on size). I would
> >>> >> > say
> >>> >> > it
> >>> >> > varies by use case.
> >>> >> >
> >>> >> >>
> >>> >> >> so if you want to jump directly to page n, you might be totally
> >>> >> >> shifted because of all the data inserted in the meantime...
> >>> >> >>
> >>> >> >> If you want a real complete pagination feature, you might want to
> >>> have
> >>> >> >> a coproccessor or a MR updating another table refering to the
> >>> >> >> pages....
> >>> >> >>
> >>> >> > Well, the solution depends on the use case. I will be doing
> >>> >> > pagination
> >
>
> --
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>



-- 
Thanks & Regards,
Anil Gupta

Re: Pagination with HBase - getting previous page of data

Posted by Mohammad Tariq <do...@gmail.com>.
I'm kinda hesitant to put my leg in between the pros ;) But does it sound
sane to use PageFilter for both rows and columns, with some additional
logic to handle the 'nth' page? It'll help us with both kinds of paging.
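
Roughly what I have in mind (just a sketch, untested; I'm assuming
ColumnPaginationFilter for the column side, and the parameter names below
are placeholders):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.PageFilter;

public class RowAndColumnPaging {
  // Page over rows and columns at the same time: at most pageRows rows per
  // region (PageFilter) and, per row, at most pageCols columns starting at
  // colOffset (ColumnPaginationFilter). The 'nth page' bookkeeping (which
  // start row and column offset belong to page n) stays in the application.
  public static Scan pagedScan(byte[] startRow, int pageRows,
      int pageCols, int colOffset) {
    Scan scan = new Scan(startRow);
    FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filters.addFilter(new PageFilter(pageRows));
    filters.addFilter(new ColumnPaginationFilter(pageCols, colOffset));
    scan.setFilter(filters);
    scan.setCaching(pageRows);
    return scan;
  }
}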

On Wednesday, January 30, 2013, Jean-Marc Spaggiari <je...@spaggiari.org>
wrote:
> Hi Anil,
>
> I think it really depend on the way you want to use the pagination.
>
> Do you need to be able to jump to page X? Are you ok if you miss a
> line or 2? Is your data growing fastly? Or slowly? Is it ok if your
> page indexes are a day old? Do you need to paginate over 300 colums?
> Or just 1? Do you need to always have the exact same number of entries
> in each page?
>
> For my usecase I need to be able to jump to the page X and I don't
> have any content. I have hundred of millions lines. Only the rowkey
> matter for me and I'm fine if sometime I have 50 entries displayed,
> and sometime only 45. So I'm thinking about calculating which row is
> the first one for each page, and store that separatly. Then I just
> need to run the MR daily.
>
> It's not a perfect solution I agree, but this might do the job for me.
> I'm totally open to all other idea which might do the job to.
>
> JM
>
> 2013/1/29, anil gupta <an...@gmail.com>:
>> Yes, your suggested solution only works on RowKey based pagination. It
will
>> fail when you start filtering on the basis of columns.
>>
>> Still, i would say it's comparatively easier to maintain this at
>> Application level rather than creating tables for pagination.
>>
>> What if you have 300 columns in your schema. Will you create 300 tables?
>> What about handling of pagination when filtering is done based on
multiple
>> columns ("and" and "or" conditions)?
>>
>> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
>> jean-marc@spaggiari.org> wrote:
>>
>>> No, no killer solution here ;)
>>>
>>> But I'm still thinking about that because I might have to implement
>>> some pagination options soon...
>>>
>>> As you are saying, it's only working on the row-key, but if you want
>>> to do the same-thing on non-rowkey, you might have to create a
>>> secondary index table...
>>>
>>> JM
>>>
>>> 2013/1/27, anil gupta <an...@gmail.com>:
>>> > That's alright..I thought that you have come-up with a killer
solution.
>>> So,
>>> > got curious to hear your ideas. ;)
>>> > It seems like your below mentioned solution will not work on filtering
>>> > on
>>> > non row-key columns since when you are deciding the page numbers you
>>> > are
>>> > only considering rowkey.
>>> >
>>> > Thanks,
>>> > Anil
>>> >
>>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
>>> > jean-marc@spaggiari.org> wrote:
>>> >
>>> >> Hi Anil,
>>> >>
>>> >> I don't have a solution. I never tought about that ;) But I was
>>> >> thinking about something like you create a 2nd table where you place
>>> >> the raw number (4 bytes) then the raw key. You go directly to a
>>> >> specific page, you query by the number, found the key, and you know
>>> >> where to start you scan in the main table.
>>> >>
>>> >> The issue is properly the number for each lines since with a MR you
>>> >> don't know where you are from the beginning. But you can built
>>> >> something where you store the line number from the beginning of the
>>> >> region, then when all regions are parsed you can reconstruct the
total
>>> >> numbering... That should work...
>>> >>
>>> >> JM
>>> >>
>>> >> 2013/1/25, anil gupta <an...@gmail.com>:
>>> >> > Inline...
>>> >> >
>>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
>>> >> > jean-marc@spaggiari.org> wrote:
>>> >> >
>>> >> >> Hi Anil,
>>> >> >>
>>> >> >> The issue is that all the other sub-sequent page start should be
>>> moved
>>> >> >> too...
>>> >> >>
>>> >> > Yes, this is a possibility. Hence the Developer has to take care of
>>> >> > this
>>> >> > case. It might also be possible that the pageSize is not a hard
>>> >> > limit
>>> >> > on
>>> >> > number of results(more like a hint or suggestion on size). I would
>>> >> > say
>>> >> > it
>>> >> > varies by use case.
>>> >> >
>>> >> >>
>>> >> >> so if you want to jump directly to page n, you might be totally
>>> >> >> shifted because of all the data inserted in the meantime...
>>> >> >>
>>> >> >> If you want a real complete pagination feature, you might want to
>>> have
>>> >> >> a coproccessor or a MR updating another table refering to the
>>> >> >> pages....
>>> >> >>
>>> >> > Well, the solution depends on the use case. I will be doing
>>> >> > pagination
>

-- 
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

Re: Pagination with HBase - getting previous page of data

Posted by anil gupta <an...@gmail.com>.
Hi Jean,

Please find my reply inline.

On Tue, Jan 29, 2013 at 1:40 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Anil,
>
> I think it really depend on the way you want to use the pagination.
>
Absolutely true!

>
> Do you need to be able to jump to page X? Are you ok if you miss a
> line or 2? Is your data growing fastly? Or slowly? Is it ok if your
> page indexes are a day old? Do you need to paginate over 300 colums?
> Or just 1? Do you need to always have the exact same number of entries
> in each page?
>
No, I don't need to be able to jump to page X.
I don't think that missing lines will be acceptable. I need to filter the
rows on non-rowkey attributes. It won't be ok if my page indexes are 1 day
old. I need to paginate on the basis of various filters based on columns
or (and) rowkey. So, the number of combinations is quite large.
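
For the kind of filtering I mean, here is a rough sketch (untested; I'm
assuming SingleColumnValueFilter, and the family/qualifier names are
placeholders) of filtering on a non-rowkey column while still cutting the
page off at the client:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ColumnFilteredPage {
  // One page of rows whose column cf:status equals the given value,
  // starting from startRow; the page size is enforced on the client.
  public static List<Result> fetchPage(HTable table, byte[] startRow,
      byte[] statusValue, int pageSize) throws IOException {
    Scan scan = new Scan(startRow);
    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("cf"), Bytes.toBytes("status"),   // placeholder family/qualifier
        CompareFilter.CompareOp.EQUAL, statusValue);
    filter.setFilterIfMissing(true);   // skip rows that don't have the column at all
    scan.setFilter(filter);
    scan.setCaching(pageSize);
    List<Result> rows = new ArrayList<Result>(pageSize);
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        rows.add(r);
        if (rows.size() == pageSize) break;  // page full; remember r.getRow() for "Next"
      }
    } finally {
      scanner.close();
    }
    return rows;
  }
}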

>
> For my usecase I need to be able to jump to the page X and I don't
> have any content. I have hundred of millions lines. Only the rowkey
> matter for me and I'm fine if sometime I have 50 entries displayed,
> and sometime only 45. So I'm thinking about calculating which row is
> the first one for each page, and store that separatly. Then I just
> need to run the MR daily.
>
hmm..yeah, it might work for you.

>
> It's not a perfect solution I agree, but this might do the job for me.
> I'm totally open to all other idea which might do the job to.
>
There is no such thing as a "perfect" solution. If the implementation is able
to fulfill your business needs, then go for it.

>
> JM
>
> 2013/1/29, anil gupta <an...@gmail.com>:
> > Yes, your suggested solution only works on RowKey based pagination. It
> will
> > fail when you start filtering on the basis of columns.
> >
> > Still, i would say it's comparatively easier to maintain this at
> > Application level rather than creating tables for pagination.
> >
> > What if you have 300 columns in your schema. Will you create 300 tables?
> > What about handling of pagination when filtering is done based on
> multiple
> > columns ("and" and "or" conditions)?
> >
> > On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> >> No, no killer solution here ;)
> >>
> >> But I'm still thinking about that because I might have to implement
> >> some pagination options soon...
> >>
> >> As you are saying, it's only working on the row-key, but if you want
> >> to do the same-thing on non-rowkey, you might have to create a
> >> secondary index table...
> >>
> >> JM
> >>
> >> 2013/1/27, anil gupta <an...@gmail.com>:
> >> > That's alright..I thought that you have come-up with a killer
> solution.
> >> So,
> >> > got curious to hear your ideas. ;)
> >> > It seems like your below mentioned solution will not work on filtering
> >> > on
> >> > non row-key columns since when you are deciding the page numbers you
> >> > are
> >> > only considering rowkey.
> >> >
> >> > Thanks,
> >> > Anil
> >> >
> >> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> >> > jean-marc@spaggiari.org> wrote:
> >> >
> >> >> Hi Anil,
> >> >>
> >> >> I don't have a solution. I never tought about that ;) But I was
> >> >> thinking about something like you create a 2nd table where you place
> >> >> the raw number (4 bytes) then the raw key. You go directly to a
> >> >> specific page, you query by the number, found the key, and you know
> >> >> where to start you scan in the main table.
> >> >>
> >> >> The issue is properly the number for each lines since with a MR you
> >> >> don't know where you are from the beginning. But you can built
> >> >> something where you store the line number from the beginning of the
> >> >> region, then when all regions are parsed you can reconstruct the
> total
> >> >> numbering... That should work...
> >> >>
> >> >> JM
> >> >>
> >> >> 2013/1/25, anil gupta <an...@gmail.com>:
> >> >> > Inline...
> >> >> >
> >> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> >> >> > jean-marc@spaggiari.org> wrote:
> >> >> >
> >> >> >> Hi Anil,
> >> >> >>
> >> >> >> The issue is that all the other sub-sequent page start should be
> >> moved
> >> >> >> too...
> >> >> >>
> >> >> > Yes, this is a possibility. Hence the Developer has to take care of
> >> >> > this
> >> >> > case. It might also be possible that the pageSize is not a hard
> >> >> > limit
> >> >> > on
> >> >> > number of results(more like a hint or suggestion on size). I would
> >> >> > say
> >> >> > it
> >> >> > varies by use case.
> >> >> >
> >> >> >>
> >> >> >> so if you want to jump directly to page n, you might be totally
> >> >> >> shifted because of all the data inserted in the meantime...
> >> >> >>
> >> >> >> If you want a real complete pagination feature, you might want to
> >> have
> >> >> >> a coproccessor or a MR updating another table refering to the
> >> >> >> pages....
> >> >> >>
> >> >> > Well, the solution depends on the use case. I will be doing
> >> >> > pagination
> >> >> > in
> >> >> > HBase for a restful service but till now i am unable to find any
> >> reason
> >> >> why
> >> >> > this cant be done at application level.
> >> >> > Are you suggesting to use MR for paging in HBase? If yes, how?
> >> >> > How would you use another table for pagination?what would you store
> >> >> > in
> >> >> the
> >> >> > extra table?
> >> >> >
> >> >> >>
> >> >> >> JM
> >> >> >>
> >> >> >> 2013/1/25, anil gupta <an...@gmail.com>:
> >> >> >> > Hi Vijay,
> >> >> >> >
> >> >> >> > I've done paging in HBase by using Scan only(no pagination
> >> >> >> > filter)
> >> >> >> > as
> >> >> >> > Mohammed has explained. However it was just an experimental
> >> >> >> > stuff.
> >> >> >> > It
> >> >> >> works
> >> >> >> > but Jean raised a very good point.
> >> >> >> > Find my answer inline to fix the problem that Jean reported.
> >> >> >> >
> >> >> >> >
> >> >> >> > On Fri, Jan 25, 2013 at 4:38 AM, Jean-Marc Spaggiari <
> >> >> >> > jean-marc@spaggiari.org> wrote:
> >> >> >> >
> >> >> >> >> Hi Vijay,
> >> >> >> >>
> >> >> >> >> If, while the user os scrolling forward, you store the key of
> >> >> >> >> each
> >> >> >> >> page, then you will be able to go back to a specific page, and
> >> jump
> >> >> >> >> forward back up to where he was.
> >> >> >> >>
> >> >> >> >> The only issue is that, if while the user is scrolling the
> >> >> >> >> table,
> >> >> >> >> someone insert a row between the last of a page, and the first
> >> >> >> >> of
> >> >> >> >> the
> >> >> >> >> next page, you will never see this row.
> >> >> >> >>
> >> >> >> >> Let's take this exemaple.
> >> >> >> >>
> >> >> >> >> You have 10 items per page.
> >> >> >> >>
> >> >> >> >> 010 020 030 040 050 060 070 080 090 100 is the first page.
> >> >> >> >> 110 120 130 140 150 160 170 180 190 200 is the second one.
> >> >> >> >>
> >> >> >> >> Now, if someone insert 101... If will be just after 100 and
> >> >> >> >> before
> >> >> >> >> 110.
> >> >> >> >>
> >> >> >> > Anil: Instead of scanning from 010 to 100, scan from 010 to 110.
> >> >> >> > Then
> >> >> >> > we
> >> >> >> > wont have this problem. So, i mean to say that
> >> >> >> > startRow(firstRowKeyofPage(N)) and
> >> >> >> > stopRow(firstRowKeyofPage(N+1)).
> >> >> >> > This
> >> >> >> > would fix it. Also, in that case number of results might exceed
> >> >> >> > the
> >> >> >> > pageSize. So you might need to handle this logic.
> >> >> >> >
> >> >> >> >>
> >> >> >> >> When you will display 10 rows starting at 010 you will stop
> just
> >> >> >> >> before 101... And for the next page you will start at 110...
> And
> >> >> >> >> 101
> >> >> >> >> will never be displayed...
> >> >> >> >>
> >> >> >> >> HTH
> >> >> >> >>
> >> >> >> >> JM
> >> >> >> >>
> >> >> >> >> 2013/1/25, Mohammad Tariq <do...@gmail.com>:
> >> >> >> >> > Hello sir,
> >> >> >> >> >
> >> >> >> >> >       While paging through, store the startkey of the current
> >> >> >> >> > page
> >> >> >> >> > of
> >> >> >> >> > 25
> >> >> >> >> > rows
> >> >> >> >> > in a separate byte[]. Now, if you want to come back to this
> >> >> >> >> > page
> >> >> >> >> > when
> >> >> >> >> > you
> >> >> >> >> > are at the next page do a range query where  startkey would
> be
> >> >> >> >> > the
> >> >> >> >> > rowkey
> >> >> >> >> > you had stored earlier and the endkey would be the
> startrowkey
> >> >> >> >> > of
> >> >> >> >>  current
> >> >> >> >> > page. You have to store just one rowkey each time you show a
> >> page
> >> >> >> using
> >> >> >> >> > which you could come back to this page when you are at the
> >> >> >> >> > next
> >> >> >> >> > page.
> >> >> >> >> >
> >> >> >> >> > However, this approach will fail in a case where your user
> >> >> >> >> > would
> >> >> >> >> > like
> >> >> >> >> > to
> >> >> >> >> go
> >> >> >> >> > to a particular previous page.
> >> >> >> >> >
> >> >> >> >> > Warm Regards,
> >> >> >> >> > Tariq
> >> >> >> >> > https://mtariq.jux.com/
> >> >> >> >> > cloudfront.blogspot.com
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan
> >> >> >> >> > <vi...@scaligent.com>
> >> >> >> >> > wrote:
> >> >> >> >> >
> >> >> >> >> >> I'm displaying rows of data from a HBase table in a data
> grid
> >> >> >> >> >> UI.
> >> >> >> >> >> The
> >> >> >> >> >> grid
> >> >> >> >> >> shows 25 rows at a time i.e. it is paginated. User can click
> >> >> >> >> >> on
> >> >> >> >> >> Next/Previous to paginate through the data 25 rows at a
> time.
> >> >> >> >> >> I
> >> >> can
> >> >> >> >> >> implement Next easily by setting a HBase
> >> >> >> >> >> org.apache.hadoop.hbase.filter.PageFilter and setting
> >> >> >> >> >> startRow
> >> >> >> >> >> on
> >> >> >> >> >> the
> >> >> >> >> >> org.apache.hadoop.hbase.client.Scan to be the row id of the
> >> next
> >> >> >> >> >> batch's
> >> >> >> >> >> row that is sent to the UI with the previous batch. However,
> >> >> >> >> >> I
> >> >> >> >> >> can't
> >> >> >> >> seem
> >> >> >> >> >> to be able to do the same with Previous. I can set the
> endRow
> >> on
> >> >> >> >> >> the
> >> >> >> >> Scan
> >> >> >> >> >> to be the row id of the last row of the previous batch but
> >> since
> >> >> >> HBase
> >> >> >> >> >> Scans are always in the forward direction, there is no way
> to
> >> >> >> >> >> set
> >> >> a
> >> >> >> >> >> PageFilter that can get 25 rows ending at a particular row.
> >> >> >> >> >> The
> >> >> >> >> >> only
> >> >> >> >> >> option
> >> >> >> >> >> seems to be to get *all* rows up to the end row and filter
> >> >> >> >> >> out
> >> >> >> >> >> all
> >> >> >> but
> >> >> >> >> >> the
> >> >> >> >> >> last 25 in the caller, which seems very inefficient. Any
> >> >> >> >> >> ideas
> >> >> >> >> >> on
> >> >> >> >> >> how
> >> >> >> >> >> this
> >> >> >> >> >> can be done efficiently?
> >> >> >> >> >>
> >> >> >> >> >> --
> >> >> >> >> >> -Vijay
> >> >> >> >> >>
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> > Thanks & Regards,
> >> >> >> > Anil Gupta
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Thanks & Regards,
> >> >> > Anil Gupta
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks & Regards,
> >> > Anil Gupta
> >> >
> >>
> >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>



-- 
Thanks & Regards,
Anil Gupta

Re: Pagination with HBase - getting previous page of data

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Anil,

I think it really depends on the way you want to use the pagination.

Do you need to be able to jump to page X? Are you OK if you miss a
line or two? Is your data growing quickly, or slowly? Is it OK if your
page indexes are a day old? Do you need to paginate over 300 columns?
Or just one? Do you need to always have the exact same number of entries
on each page?

For my use case I need to be able to jump to page X and I don't
have any content. I have hundreds of millions of lines. Only the rowkey
matters for me and I'm fine if sometimes I have 50 entries displayed,
and sometimes only 45. So I'm thinking about calculating which row is
the first one for each page, and storing that separately. Then I just
need to run the MR job daily.

It's not a perfect solution, I agree, but this might do the job for me.
I'm totally open to any other ideas which might do the job too.
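
Roughly, the lookup side of that idea could look like the sketch below. The
"pageindex" table, its "p:startRow" column and the page size are just
placeholders, and this is an untested sketch of the plain 0.94-style client
API, not something I have running:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class JumpToPage {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable pageIndex = new HTable(conf, "pageindex"); // hypothetical: page number -> first row key
    HTable mainTable = new HTable(conf, "mytable");   // hypothetical data table
    int page = 42;       // page the user jumped to
    int pageSize = 50;   // rows per page

    // 1. Look up the precomputed first row key of the requested page.
    Result idx = pageIndex.get(new Get(Bytes.toBytes(page)));
    byte[] startRow = idx.getValue(Bytes.toBytes("p"), Bytes.toBytes("startRow"));

    // 2. Scan the main table from that key and stop after one page of rows.
    Scan scan = new Scan(startRow);
    scan.setCaching(pageSize);
    ResultScanner scanner = mainTable.getScanner(scan);
    int count = 0;
    for (Result row : scanner) {
      System.out.println(Bytes.toStringBinary(row.getRow()));
      if (++count >= pageSize) break;   // client-side page limit
    }
    scanner.close();
    pageIndex.close();
    mainTable.close();
  }
}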

JM

2013/1/29, anil gupta <an...@gmail.com>:
> Yes, your suggested solution only works on RowKey based pagination. It will
> fail when you start filtering on the basis of columns.
>
> Still, i would say it's comparatively easier to maintain this at
> Application level rather than creating tables for pagination.
>
> What if you have 300 columns in your schema. Will you create 300 tables?
> What about handling of pagination when filtering is done based on multiple
> columns ("and" and "or" conditions)?
>
> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
>> No, no killer solution here ;)
>>
>> But I'm still thinking about that because I might have to implement
>> some pagination options soon...
>>
>> As you are saying, it's only working on the row-key, but if you want
>> to do the same-thing on non-rowkey, you might have to create a
>> secondary index table...
>>
>> JM
>>
>> 2013/1/27, anil gupta <an...@gmail.com>:
>> > That's alright..I thought that you have come-up with a killer solution.
>> So,
>> > got curious to hear your ideas. ;)
>> > It seems like your below mentioned solution will not work on filtering
>> > on
>> > non row-key columns since when you are deciding the page numbers you
>> > are
>> > only considering rowkey.
>> >
>> > Thanks,
>> > Anil
>> >
>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
>> > jean-marc@spaggiari.org> wrote:
>> >
>> >> Hi Anil,
>> >>
>> >> I don't have a solution. I never tought about that ;) But I was
>> >> thinking about something like you create a 2nd table where you place
>> >> the raw number (4 bytes) then the raw key. You go directly to a
>> >> specific page, you query by the number, found the key, and you know
>> >> where to start you scan in the main table.
>> >>
>> >> The issue is properly the number for each lines since with a MR you
>> >> don't know where you are from the beginning. But you can built
>> >> something where you store the line number from the beginning of the
>> >> region, then when all regions are parsed you can reconstruct the total
>> >> numbering... That should work...
>> >>
>> >> JM
>> >>
>> >> 2013/1/25, anil gupta <an...@gmail.com>:
>> >> > Inline...
>> >> >
>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
>> >> > jean-marc@spaggiari.org> wrote:
>> >> >
>> >> >> Hi Anil,
>> >> >>
>> >> >> The issue is that all the other sub-sequent page start should be
>> moved
>> >> >> too...
>> >> >>
>> >> > Yes, this is a possibility. Hence the Developer has to take care of
>> >> > this
>> >> > case. It might also be possible that the pageSize is not a hard
>> >> > limit
>> >> > on
>> >> > number of results(more like a hint or suggestion on size). I would
>> >> > say
>> >> > it
>> >> > varies by use case.
>> >> >
>> >> >>
>> >> >> so if you want to jump directly to page n, you might be totally
>> >> >> shifted because of all the data inserted in the meantime...
>> >> >>
>> >> >> If you want a real complete pagination feature, you might want to
>> have
>> >> >> a coproccessor or a MR updating another table refering to the
>> >> >> pages....
>> >> >>
>> >> > Well, the solution depends on the use case. I will be doing
>> >> > pagination
>> >> > in
>> >> > HBase for a restful service but till now i am unable to find any
>> reason
>> >> why
>> >> > this cant be done at application level.
>> >> > Are you suggesting to use MR for paging in HBase? If yes, how?
>> >> > How would you use another table for pagination?what would you store
>> >> > in
>> >> the
>> >> > extra table?
>> >> >
>> >> >>
>> >> >> JM
>> >> >>
>> >> >> 2013/1/25, anil gupta <an...@gmail.com>:
>> >> >> > Hi Vijay,
>> >> >> >
>> >> >> > I've done paging in HBase by using Scan only(no pagination
>> >> >> > filter)
>> >> >> > as
>> >> >> > Mohammed has explained. However it was just an experimental
>> >> >> > stuff.
>> >> >> > It
>> >> >> works
>> >> >> > but Jean raised a very good point.
>> >> >> > Find my answer inline to fix the problem that Jean reported.
>> >> >> >
>> >> >> >
>> >> >> > On Fri, Jan 25, 2013 at 4:38 AM, Jean-Marc Spaggiari <
>> >> >> > jean-marc@spaggiari.org> wrote:
>> >> >> >
>> >> >> >> Hi Vijay,
>> >> >> >>
>> >> >> >> If, while the user os scrolling forward, you store the key of
>> >> >> >> each
>> >> >> >> page, then you will be able to go back to a specific page, and
>> jump
>> >> >> >> forward back up to where he was.
>> >> >> >>
>> >> >> >> The only issue is that, if while the user is scrolling the
>> >> >> >> table,
>> >> >> >> someone insert a row between the last of a page, and the first
>> >> >> >> of
>> >> >> >> the
>> >> >> >> next page, you will never see this row.
>> >> >> >>
>> >> >> >> Let's take this exemaple.
>> >> >> >>
>> >> >> >> You have 10 items per page.
>> >> >> >>
>> >> >> >> 010 020 030 040 050 060 070 080 090 100 is the first page.
>> >> >> >> 110 120 130 140 150 160 170 180 190 200 is the second one.
>> >> >> >>
>> >> >> >> Now, if someone insert 101... If will be just after 100 and
>> >> >> >> before
>> >> >> >> 110.
>> >> >> >>
>> >> >> > Anil: Instead of scanning from 010 to 100, scan from 010 to 110.
>> >> >> > Then
>> >> >> > we
>> >> >> > wont have this problem. So, i mean to say that
>> >> >> > startRow(firstRowKeyofPage(N)) and
>> >> >> > stopRow(firstRowKeyofPage(N+1)).
>> >> >> > This
>> >> >> > would fix it. Also, in that case number of results might exceed
>> >> >> > the
>> >> >> > pageSize. So you might need to handle this logic.
>> >> >> >
>> >> >> >>
>> >> >> >> When you will display 10 rows starting at 010 you will stop just
>> >> >> >> before 101... And for the next page you will start at 110... And
>> >> >> >> 101
>> >> >> >> will never be displayed...
>> >> >> >>
>> >> >> >> HTH
>> >> >> >>
>> >> >> >> JM
>> >> >> >>
>> >> >> >> 2013/1/25, Mohammad Tariq <do...@gmail.com>:
>> >> >> >> > Hello sir,
>> >> >> >> >
>> >> >> >> >       While paging through, store the startkey of the current
>> >> >> >> > page
>> >> >> >> > of
>> >> >> >> > 25
>> >> >> >> > rows
>> >> >> >> > in a separate byte[]. Now, if you want to come back to this
>> >> >> >> > page
>> >> >> >> > when
>> >> >> >> > you
>> >> >> >> > are at the next page do a range query where  startkey would be
>> >> >> >> > the
>> >> >> >> > rowkey
>> >> >> >> > you had stored earlier and the endkey would be the startrowkey
>> >> >> >> > of
>> >> >> >>  current
>> >> >> >> > page. You have to store just one rowkey each time you show a
>> page
>> >> >> using
>> >> >> >> > which you could come back to this page when you are at the
>> >> >> >> > next
>> >> >> >> > page.
>> >> >> >> >
>> >> >> >> > However, this approach will fail in a case where your user
>> >> >> >> > would
>> >> >> >> > like
>> >> >> >> > to
>> >> >> >> go
>> >> >> >> > to a particular previous page.
>> >> >> >> >
>> >> >> >> > Warm Regards,
>> >> >> >> > Tariq
>> >> >> >> > https://mtariq.jux.com/
>> >> >> >> > cloudfront.blogspot.com
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan
>> >> >> >> > <vi...@scaligent.com>
>> >> >> >> > wrote:
>> >> >> >> >
>> >> >> >> >> I'm displaying rows of data from a HBase table in a data grid
>> >> >> >> >> UI.
>> >> >> >> >> The
>> >> >> >> >> grid
>> >> >> >> >> shows 25 rows at a time i.e. it is paginated. User can click
>> >> >> >> >> on
>> >> >> >> >> Next/Previous to paginate through the data 25 rows at a time.
>> >> >> >> >> I
>> >> can
>> >> >> >> >> implement Next easily by setting a HBase
>> >> >> >> >> org.apache.hadoop.hbase.filter.PageFilter and setting
>> >> >> >> >> startRow
>> >> >> >> >> on
>> >> >> >> >> the
>> >> >> >> >> org.apache.hadoop.hbase.client.Scan to be the row id of the
>> next
>> >> >> >> >> batch's
>> >> >> >> >> row that is sent to the UI with the previous batch. However,
>> >> >> >> >> I
>> >> >> >> >> can't
>> >> >> >> seem
>> >> >> >> >> to be able to do the same with Previous. I can set the endRow
>> on
>> >> >> >> >> the
>> >> >> >> Scan
>> >> >> >> >> to be the row id of the last row of the previous batch but
>> since
>> >> >> HBase
>> >> >> >> >> Scans are always in the forward direction, there is no way to
>> >> >> >> >> set
>> >> a
>> >> >> >> >> PageFilter that can get 25 rows ending at a particular row.
>> >> >> >> >> The
>> >> >> >> >> only
>> >> >> >> >> option
>> >> >> >> >> seems to be to get *all* rows up to the end row and filter
>> >> >> >> >> out
>> >> >> >> >> all
>> >> >> but
>> >> >> >> >> the
>> >> >> >> >> last 25 in the caller, which seems very inefficient. Any
>> >> >> >> >> ideas
>> >> >> >> >> on
>> >> >> >> >> how
>> >> >> >> >> this
>> >> >> >> >> can be done efficiently?
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >> -Vijay
>> >> >> >> >>
>> >> >> >> >
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Thanks & Regards,
>> >> >> > Anil Gupta
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Thanks & Regards,
>> >> > Anil Gupta
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Thanks & Regards,
>> > Anil Gupta
>> >
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>

Re: Pagination with HBase - getting previous page of data

Posted by anil gupta <an...@gmail.com>.
Yes, your suggested solution only works for rowkey-based pagination. It will
fail when you start filtering on the basis of columns.

Still, I would say it's comparatively easier to maintain this at the
application level rather than creating tables for pagination.

What if you have 300 columns in your schema? Will you create 300 tables?
What about handling pagination when filtering is done on multiple
columns ("and" and "or" conditions)?

On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> No, no killer solution here ;)
>
> But I'm still thinking about that because I might have to implement
> some pagination options soon...
>
> As you are saying, it's only working on the row-key, but if you want
> to do the same-thing on non-rowkey, you might have to create a
> secondary index table...
>
> JM
>
> 2013/1/27, anil gupta <an...@gmail.com>:
> > That's alright..I thought that you have come-up with a killer solution.
> So,
> > got curious to hear your ideas. ;)
> > It seems like your below mentioned solution will not work on filtering on
> > non row-key columns since when you are deciding the page numbers you are
> > only considering rowkey.
> >
> > Thanks,
> > Anil
> >
> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> >> Hi Anil,
> >>
> >> I don't have a solution. I never tought about that ;) But I was
> >> thinking about something like you create a 2nd table where you place
> >> the raw number (4 bytes) then the raw key. You go directly to a
> >> specific page, you query by the number, found the key, and you know
> >> where to start you scan in the main table.
> >>
> >> The issue is properly the number for each lines since with a MR you
> >> don't know where you are from the beginning. But you can built
> >> something where you store the line number from the beginning of the
> >> region, then when all regions are parsed you can reconstruct the total
> >> numbering... That should work...
> >>
> >> JM
> >>
> >> 2013/1/25, anil gupta <an...@gmail.com>:
> >> > Inline...
> >> >
> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> >> > jean-marc@spaggiari.org> wrote:
> >> >
> >> >> Hi Anil,
> >> >>
> >> >> The issue is that all the other sub-sequent page start should be
> moved
> >> >> too...
> >> >>
> >> > Yes, this is a possibility. Hence the Developer has to take care of
> >> > this
> >> > case. It might also be possible that the pageSize is not a hard limit
> >> > on
> >> > number of results(more like a hint or suggestion on size). I would say
> >> > it
> >> > varies by use case.
> >> >
> >> >>
> >> >> so if you want to jump directly to page n, you might be totally
> >> >> shifted because of all the data inserted in the meantime...
> >> >>
> >> >> If you want a real complete pagination feature, you might want to
> have
> >> >> a coproccessor or a MR updating another table refering to the
> >> >> pages....
> >> >>
> >> > Well, the solution depends on the use case. I will be doing pagination
> >> > in
> >> > HBase for a restful service but till now i am unable to find any
> reason
> >> why
> >> > this cant be done at application level.
> >> > Are you suggesting to use MR for paging in HBase? If yes, how?
> >> > How would you use another table for pagination?what would you store in
> >> the
> >> > extra table?
> >> >
> >> >>
> >> >> JM
> >> >>
> >> >> 2013/1/25, anil gupta <an...@gmail.com>:
> >> >> > Hi Vijay,
> >> >> >
> >> >> > I've done paging in HBase by using Scan only(no pagination filter)
> >> >> > as
> >> >> > Mohammed has explained. However it was just an experimental stuff.
> >> >> > It
> >> >> works
> >> >> > but Jean raised a very good point.
> >> >> > Find my answer inline to fix the problem that Jean reported.
> >> >> >
> >> >> >
> >> >> > On Fri, Jan 25, 2013 at 4:38 AM, Jean-Marc Spaggiari <
> >> >> > jean-marc@spaggiari.org> wrote:
> >> >> >
> >> >> >> Hi Vijay,
> >> >> >>
> >> >> >> If, while the user os scrolling forward, you store the key of each
> >> >> >> page, then you will be able to go back to a specific page, and
> jump
> >> >> >> forward back up to where he was.
> >> >> >>
> >> >> >> The only issue is that, if while the user is scrolling the table,
> >> >> >> someone insert a row between the last of a page, and the first of
> >> >> >> the
> >> >> >> next page, you will never see this row.
> >> >> >>
> >> >> >> Let's take this exemaple.
> >> >> >>
> >> >> >> You have 10 items per page.
> >> >> >>
> >> >> >> 010 020 030 040 050 060 070 080 090 100 is the first page.
> >> >> >> 110 120 130 140 150 160 170 180 190 200 is the second one.
> >> >> >>
> >> >> >> Now, if someone insert 101... If will be just after 100 and before
> >> >> >> 110.
> >> >> >>
> >> >> > Anil: Instead of scanning from 010 to 100, scan from 010 to 110.
> >> >> > Then
> >> >> > we
> >> >> > wont have this problem. So, i mean to say that
> >> >> > startRow(firstRowKeyofPage(N)) and stopRow(firstRowKeyofPage(N+1)).
> >> >> > This
> >> >> > would fix it. Also, in that case number of results might exceed the
> >> >> > pageSize. So you might need to handle this logic.
> >> >> >
> >> >> >>
> >> >> >> When you will display 10 rows starting at 010 you will stop just
> >> >> >> before 101... And for the next page you will start at 110... And
> >> >> >> 101
> >> >> >> will never be displayed...
> >> >> >>
> >> >> >> HTH
> >> >> >>
> >> >> >> JM
> >> >> >>
> >> >> >> 2013/1/25, Mohammad Tariq <do...@gmail.com>:
> >> >> >> > Hello sir,
> >> >> >> >
> >> >> >> >       While paging through, store the startkey of the current
> >> >> >> > page
> >> >> >> > of
> >> >> >> > 25
> >> >> >> > rows
> >> >> >> > in a separate byte[]. Now, if you want to come back to this page
> >> >> >> > when
> >> >> >> > you
> >> >> >> > are at the next page do a range query where  startkey would be
> >> >> >> > the
> >> >> >> > rowkey
> >> >> >> > you had stored earlier and the endkey would be the startrowkey
> >> >> >> > of
> >> >> >>  current
> >> >> >> > page. You have to store just one rowkey each time you show a
> page
> >> >> using
> >> >> >> > which you could come back to this page when you are at the next
> >> >> >> > page.
> >> >> >> >
> >> >> >> > However, this approach will fail in a case where your user would
> >> >> >> > like
> >> >> >> > to
> >> >> >> go
> >> >> >> > to a particular previous page.
> >> >> >> >
> >> >> >> > Warm Regards,
> >> >> >> > Tariq
> >> >> >> > https://mtariq.jux.com/
> >> >> >> > cloudfront.blogspot.com
> >> >> >> >
> >> >> >> >
> >> >> >> > On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan
> >> >> >> > <vi...@scaligent.com>
> >> >> >> > wrote:
> >> >> >> >
> >> >> >> >> I'm displaying rows of data from a HBase table in a data grid
> >> >> >> >> UI.
> >> >> >> >> The
> >> >> >> >> grid
> >> >> >> >> shows 25 rows at a time i.e. it is paginated. User can click on
> >> >> >> >> Next/Previous to paginate through the data 25 rows at a time. I
> >> can
> >> >> >> >> implement Next easily by setting a HBase
> >> >> >> >> org.apache.hadoop.hbase.filter.PageFilter and setting startRow
> >> >> >> >> on
> >> >> >> >> the
> >> >> >> >> org.apache.hadoop.hbase.client.Scan to be the row id of the
> next
> >> >> >> >> batch's
> >> >> >> >> row that is sent to the UI with the previous batch. However, I
> >> >> >> >> can't
> >> >> >> seem
> >> >> >> >> to be able to do the same with Previous. I can set the endRow
> on
> >> >> >> >> the
> >> >> >> Scan
> >> >> >> >> to be the row id of the last row of the previous batch but
> since
> >> >> HBase
> >> >> >> >> Scans are always in the forward direction, there is no way to
> >> >> >> >> set
> >> a
> >> >> >> >> PageFilter that can get 25 rows ending at a particular row. The
> >> >> >> >> only
> >> >> >> >> option
> >> >> >> >> seems to be to get *all* rows up to the end row and filter out
> >> >> >> >> all
> >> >> but
> >> >> >> >> the
> >> >> >> >> last 25 in the caller, which seems very inefficient. Any ideas
> >> >> >> >> on
> >> >> >> >> how
> >> >> >> >> this
> >> >> >> >> can be done efficiently?
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >> -Vijay
> >> >> >> >>
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Thanks & Regards,
> >> >> > Anil Gupta
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks & Regards,
> >> > Anil Gupta
> >> >
> >>
> >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>



-- 
Thanks & Regards,
Anil Gupta

Re: Pagination with HBase - getting previous page of data

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
No, no killer solution here ;)

But I'm still thinking about that because I might have to implement
some pagination options soon...

As you are saying, it's only working on the row key, but if you want
to do the same thing on non-rowkey columns, you might have to create a
secondary index table...
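
For the non-rowkey case, one common layout is to make the index row key
"column value + main row key", so a prefix scan on the value gives you the
matching main-table keys in order and can be paginated with start/stop rows as
usual. A hypothetical sketch of writing one such index entry (the table and
column names are made up):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteIndexEntry {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable index = new HTable(conf, "mytable_by_status");   // hypothetical index table

    byte[] mainRowKey  = Bytes.toBytes("row-000042");        // key in the main table
    byte[] columnValue = Bytes.toBytes("ACTIVE");            // indexed column value

    // Index row key = column value + main row key, so all rows with the same
    // value sort together in the index table.
    byte[] indexRowKey = Bytes.add(columnValue, mainRowKey);
    Put put = new Put(indexRowKey);
    put.add(Bytes.toBytes("f"), Bytes.toBytes("mainKey"), mainRowKey);
    index.put(put);
    index.close();
  }
}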

JM

2013/1/27, anil gupta <an...@gmail.com>:
> That's alright..I thought that you have come-up with a killer solution. So,
> got curious to hear your ideas. ;)
> It seems like your below mentioned solution will not work on filtering on
> non row-key columns since when you are deciding the page numbers you are
> only considering rowkey.
>
> Thanks,
> Anil
>
> On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
>> Hi Anil,
>>
>> I don't have a solution. I never tought about that ;) But I was
>> thinking about something like you create a 2nd table where you place
>> the raw number (4 bytes) then the raw key. You go directly to a
>> specific page, you query by the number, found the key, and you know
>> where to start you scan in the main table.
>>
>> The issue is properly the number for each lines since with a MR you
>> don't know where you are from the beginning. But you can built
>> something where you store the line number from the beginning of the
>> region, then when all regions are parsed you can reconstruct the total
>> numbering... That should work...
>>
>> JM
>>
>> 2013/1/25, anil gupta <an...@gmail.com>:
>> > Inline...
>> >
>> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
>> > jean-marc@spaggiari.org> wrote:
>> >
>> >> Hi Anil,
>> >>
>> >> The issue is that all the other sub-sequent page start should be moved
>> >> too...
>> >>
>> > Yes, this is a possibility. Hence the Developer has to take care of
>> > this
>> > case. It might also be possible that the pageSize is not a hard limit
>> > on
>> > number of results(more like a hint or suggestion on size). I would say
>> > it
>> > varies by use case.
>> >
>> >>
>> >> so if you want to jump directly to page n, you might be totally
>> >> shifted because of all the data inserted in the meantime...
>> >>
>> >> If you want a real complete pagination feature, you might want to have
>> >> a coproccessor or a MR updating another table refering to the
>> >> pages....
>> >>
>> > Well, the solution depends on the use case. I will be doing pagination
>> > in
>> > HBase for a restful service but till now i am unable to find any reason
>> why
>> > this cant be done at application level.
>> > Are you suggesting to use MR for paging in HBase? If yes, how?
>> > How would you use another table for pagination?what would you store in
>> the
>> > extra table?
>> >
>> >>
>> >> JM
>> >>
>> >> 2013/1/25, anil gupta <an...@gmail.com>:
>> >> > Hi Vijay,
>> >> >
>> >> > I've done paging in HBase by using Scan only(no pagination filter)
>> >> > as
>> >> > Mohammed has explained. However it was just an experimental stuff.
>> >> > It
>> >> works
>> >> > but Jean raised a very good point.
>> >> > Find my answer inline to fix the problem that Jean reported.
>> >> >
>> >> >
>> >> > On Fri, Jan 25, 2013 at 4:38 AM, Jean-Marc Spaggiari <
>> >> > jean-marc@spaggiari.org> wrote:
>> >> >
>> >> >> Hi Vijay,
>> >> >>
>> >> >> If, while the user os scrolling forward, you store the key of each
>> >> >> page, then you will be able to go back to a specific page, and jump
>> >> >> forward back up to where he was.
>> >> >>
>> >> >> The only issue is that, if while the user is scrolling the table,
>> >> >> someone insert a row between the last of a page, and the first of
>> >> >> the
>> >> >> next page, you will never see this row.
>> >> >>
>> >> >> Let's take this exemaple.
>> >> >>
>> >> >> You have 10 items per page.
>> >> >>
>> >> >> 010 020 030 040 050 060 070 080 090 100 is the first page.
>> >> >> 110 120 130 140 150 160 170 180 190 200 is the second one.
>> >> >>
>> >> >> Now, if someone insert 101... If will be just after 100 and before
>> >> >> 110.
>> >> >>
>> >> > Anil: Instead of scanning from 010 to 100, scan from 010 to 110.
>> >> > Then
>> >> > we
>> >> > wont have this problem. So, i mean to say that
>> >> > startRow(firstRowKeyofPage(N)) and stopRow(firstRowKeyofPage(N+1)).
>> >> > This
>> >> > would fix it. Also, in that case number of results might exceed the
>> >> > pageSize. So you might need to handle this logic.
>> >> >
>> >> >>
>> >> >> When you will display 10 rows starting at 010 you will stop just
>> >> >> before 101... And for the next page you will start at 110... And
>> >> >> 101
>> >> >> will never be displayed...
>> >> >>
>> >> >> HTH
>> >> >>
>> >> >> JM
>> >> >>
>> >> >> 2013/1/25, Mohammad Tariq <do...@gmail.com>:
>> >> >> > Hello sir,
>> >> >> >
>> >> >> >       While paging through, store the startkey of the current
>> >> >> > page
>> >> >> > of
>> >> >> > 25
>> >> >> > rows
>> >> >> > in a separate byte[]. Now, if you want to come back to this page
>> >> >> > when
>> >> >> > you
>> >> >> > are at the next page do a range query where  startkey would be
>> >> >> > the
>> >> >> > rowkey
>> >> >> > you had stored earlier and the endkey would be the startrowkey
>> >> >> > of
>> >> >>  current
>> >> >> > page. You have to store just one rowkey each time you show a page
>> >> using
>> >> >> > which you could come back to this page when you are at the next
>> >> >> > page.
>> >> >> >
>> >> >> > However, this approach will fail in a case where your user would
>> >> >> > like
>> >> >> > to
>> >> >> go
>> >> >> > to a particular previous page.
>> >> >> >
>> >> >> > Warm Regards,
>> >> >> > Tariq
>> >> >> > https://mtariq.jux.com/
>> >> >> > cloudfront.blogspot.com
>> >> >> >
>> >> >> >
>> >> >> > On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan
>> >> >> > <vi...@scaligent.com>
>> >> >> > wrote:
>> >> >> >
>> >> >> >> I'm displaying rows of data from a HBase table in a data grid
>> >> >> >> UI.
>> >> >> >> The
>> >> >> >> grid
>> >> >> >> shows 25 rows at a time i.e. it is paginated. User can click on
>> >> >> >> Next/Previous to paginate through the data 25 rows at a time. I
>> can
>> >> >> >> implement Next easily by setting a HBase
>> >> >> >> org.apache.hadoop.hbase.filter.PageFilter and setting startRow
>> >> >> >> on
>> >> >> >> the
>> >> >> >> org.apache.hadoop.hbase.client.Scan to be the row id of the next
>> >> >> >> batch's
>> >> >> >> row that is sent to the UI with the previous batch. However, I
>> >> >> >> can't
>> >> >> seem
>> >> >> >> to be able to do the same with Previous. I can set the endRow on
>> >> >> >> the
>> >> >> Scan
>> >> >> >> to be the row id of the last row of the previous batch but since
>> >> HBase
>> >> >> >> Scans are always in the forward direction, there is no way to
>> >> >> >> set
>> a
>> >> >> >> PageFilter that can get 25 rows ending at a particular row. The
>> >> >> >> only
>> >> >> >> option
>> >> >> >> seems to be to get *all* rows up to the end row and filter out
>> >> >> >> all
>> >> but
>> >> >> >> the
>> >> >> >> last 25 in the caller, which seems very inefficient. Any ideas
>> >> >> >> on
>> >> >> >> how
>> >> >> >> this
>> >> >> >> can be done efficiently?
>> >> >> >>
>> >> >> >> --
>> >> >> >> -Vijay
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Thanks & Regards,
>> >> > Anil Gupta
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Thanks & Regards,
>> > Anil Gupta
>> >
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>

Re: Pagination with HBase - getting previous page of data

Posted by anil gupta <an...@gmail.com>.
That's alright. I thought that you had come up with a killer solution, so I
got curious to hear your ideas. ;)
It seems like your solution mentioned below will not work for filtering on
non-rowkey columns, since when you are deciding the page numbers you are
only considering the rowkey.

Thanks,
Anil

On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Anil,
>
> I don't have a solution. I never tought about that ;) But I was
> thinking about something like you create a 2nd table where you place
> the raw number (4 bytes) then the raw key. You go directly to a
> specific page, you query by the number, found the key, and you know
> where to start you scan in the main table.
>
> The issue is properly the number for each lines since with a MR you
> don't know where you are from the beginning. But you can built
> something where you store the line number from the beginning of the
> region, then when all regions are parsed you can reconstruct the total
> numbering... That should work...
>
> JM
>
> 2013/1/25, anil gupta <an...@gmail.com>:
> > Inline...
> >
> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> >> Hi Anil,
> >>
> >> The issue is that all the other sub-sequent page start should be moved
> >> too...
> >>
> > Yes, this is a possibility. Hence the Developer has to take care of this
> > case. It might also be possible that the pageSize is not a hard limit on
> > number of results(more like a hint or suggestion on size). I would say it
> > varies by use case.
> >
> >>
> >> so if you want to jump directly to page n, you might be totally
> >> shifted because of all the data inserted in the meantime...
> >>
> >> If you want a real complete pagination feature, you might want to have
> >> a coproccessor or a MR updating another table refering to the
> >> pages....
> >>
> > Well, the solution depends on the use case. I will be doing pagination in
> > HBase for a restful service but till now i am unable to find any reason
> why
> > this cant be done at application level.
> > Are you suggesting to use MR for paging in HBase? If yes, how?
> > How would you use another table for pagination?what would you store in
> the
> > extra table?
> >
> >>
> >> JM
> >>
> >> 2013/1/25, anil gupta <an...@gmail.com>:
> >> > Hi Vijay,
> >> >
> >> > I've done paging in HBase by using Scan only(no pagination filter) as
> >> > Mohammed has explained. However it was just an experimental stuff. It
> >> works
> >> > but Jean raised a very good point.
> >> > Find my answer inline to fix the problem that Jean reported.
> >> >
> >> >
> >> > On Fri, Jan 25, 2013 at 4:38 AM, Jean-Marc Spaggiari <
> >> > jean-marc@spaggiari.org> wrote:
> >> >
> >> >> Hi Vijay,
> >> >>
> >> >> If, while the user os scrolling forward, you store the key of each
> >> >> page, then you will be able to go back to a specific page, and jump
> >> >> forward back up to where he was.
> >> >>
> >> >> The only issue is that, if while the user is scrolling the table,
> >> >> someone insert a row between the last of a page, and the first of the
> >> >> next page, you will never see this row.
> >> >>
> >> >> Let's take this exemaple.
> >> >>
> >> >> You have 10 items per page.
> >> >>
> >> >> 010 020 030 040 050 060 070 080 090 100 is the first page.
> >> >> 110 120 130 140 150 160 170 180 190 200 is the second one.
> >> >>
> >> >> Now, if someone insert 101... If will be just after 100 and before
> >> >> 110.
> >> >>
> >> > Anil: Instead of scanning from 010 to 100, scan from 010 to 110. Then
> >> > we
> >> > wont have this problem. So, i mean to say that
> >> > startRow(firstRowKeyofPage(N)) and stopRow(firstRowKeyofPage(N+1)).
> >> > This
> >> > would fix it. Also, in that case number of results might exceed the
> >> > pageSize. So you might need to handle this logic.
> >> >
> >> >>
> >> >> When you will display 10 rows starting at 010 you will stop just
> >> >> before 101... And for the next page you will start at 110... And 101
> >> >> will never be displayed...
> >> >>
> >> >> HTH
> >> >>
> >> >> JM
> >> >>
> >> >> 2013/1/25, Mohammad Tariq <do...@gmail.com>:
> >> >> > Hello sir,
> >> >> >
> >> >> >       While paging through, store the startkey of the current page
> >> >> > of
> >> >> > 25
> >> >> > rows
> >> >> > in a separate byte[]. Now, if you want to come back to this page
> >> >> > when
> >> >> > you
> >> >> > are at the next page do a range query where  startkey would be the
> >> >> > rowkey
> >> >> > you had stored earlier and the endkey would be the startrowkey  of
> >> >>  current
> >> >> > page. You have to store just one rowkey each time you show a page
> >> using
> >> >> > which you could come back to this page when you are at the next
> >> >> > page.
> >> >> >
> >> >> > However, this approach will fail in a case where your user would
> >> >> > like
> >> >> > to
> >> >> go
> >> >> > to a particular previous page.
> >> >> >
> >> >> > Warm Regards,
> >> >> > Tariq
> >> >> > https://mtariq.jux.com/
> >> >> > cloudfront.blogspot.com
> >> >> >
> >> >> >
> >> >> > On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan
> >> >> > <vi...@scaligent.com>
> >> >> > wrote:
> >> >> >
> >> >> >> I'm displaying rows of data from a HBase table in a data grid UI.
> >> >> >> The
> >> >> >> grid
> >> >> >> shows 25 rows at a time i.e. it is paginated. User can click on
> >> >> >> Next/Previous to paginate through the data 25 rows at a time. I
> can
> >> >> >> implement Next easily by setting a HBase
> >> >> >> org.apache.hadoop.hbase.filter.PageFilter and setting startRow on
> >> >> >> the
> >> >> >> org.apache.hadoop.hbase.client.Scan to be the row id of the next
> >> >> >> batch's
> >> >> >> row that is sent to the UI with the previous batch. However, I
> >> >> >> can't
> >> >> seem
> >> >> >> to be able to do the same with Previous. I can set the endRow on
> >> >> >> the
> >> >> Scan
> >> >> >> to be the row id of the last row of the previous batch but since
> >> HBase
> >> >> >> Scans are always in the forward direction, there is no way to set
> a
> >> >> >> PageFilter that can get 25 rows ending at a particular row. The
> >> >> >> only
> >> >> >> option
> >> >> >> seems to be to get *all* rows up to the end row and filter out all
> >> but
> >> >> >> the
> >> >> >> last 25 in the caller, which seems very inefficient. Any ideas on
> >> >> >> how
> >> >> >> this
> >> >> >> can be done efficiently?
> >> >> >>
> >> >> >> --
> >> >> >> -Vijay
> >> >> >>
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks & Regards,
> >> > Anil Gupta
> >> >
> >>
> >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>



-- 
Thanks & Regards,
Anil Gupta

Re: Pagination with HBase - getting previous page of data

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Anil,

I don't have a solution. I never thought about that ;) But I was
thinking about something like this: you create a 2nd table where you place
the row number (4 bytes) then the row key. To go directly to a
specific page, you query by the number, find the key, and you know
where to start your scan in the main table.

The issue is probably the numbering of each line, since with an MR job you
don't know where you are relative to the beginning. But you can build
something where you store the line number from the beginning of the
region, then when all regions are parsed you can reconstruct the total
numbering... That should work...
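
Ignoring the per-region MR bookkeeping for a moment, a simplified
single-client sketch of building that number-to-key table could look like the
code below; storing one entry per page instead of one per row, and all the
table and column names, are assumptions for the example:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class BuildPageIndex {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable mainTable = new HTable(conf, "mytable");     // hypothetical data table
    HTable pageIndex = new HTable(conf, "pageindex");   // hypothetical index table
    int pageSize = 25;

    Scan scan = new Scan();        // plain full scan, no filter
    scan.setCaching(1000);
    ResultScanner scanner = mainTable.getScanner(scan);
    long rowNumber = 0;
    for (Result r : scanner) {
      if (rowNumber % pageSize == 0) {
        int pageNumber = (int) (rowNumber / pageSize);
        // index row key = 4-byte page number, value = first row key of that page
        Put put = new Put(Bytes.toBytes(pageNumber));
        put.add(Bytes.toBytes("p"), Bytes.toBytes("startRow"), r.getRow());
        pageIndex.put(put);
      }
      rowNumber++;
    }
    scanner.close();
    mainTable.close();
    pageIndex.close();
  }
}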

JM

2013/1/25, anil gupta <an...@gmail.com>:
> Inline...
>
> On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
>> Hi Anil,
>>
>> The issue is that all the other sub-sequent page start should be moved
>> too...
>>
> Yes, this is a possibility. Hence the Developer has to take care of this
> case. It might also be possible that the pageSize is not a hard limit on
> number of results(more like a hint or suggestion on size). I would say it
> varies by use case.
>
>>
>> so if you want to jump directly to page n, you might be totally
>> shifted because of all the data inserted in the meantime...
>>
>> If you want a real complete pagination feature, you might want to have
>> a coproccessor or a MR updating another table refering to the
>> pages....
>>
> Well, the solution depends on the use case. I will be doing pagination in
> HBase for a restful service but till now i am unable to find any reason why
> this cant be done at application level.
> Are you suggesting to use MR for paging in HBase? If yes, how?
> How would you use another table for pagination?what would you store in the
> extra table?
>
>>
>> JM
>>
>> 2013/1/25, anil gupta <an...@gmail.com>:
>> > Hi Vijay,
>> >
>> > I've done paging in HBase by using Scan only(no pagination filter) as
>> > Mohammed has explained. However it was just an experimental stuff. It
>> works
>> > but Jean raised a very good point.
>> > Find my answer inline to fix the problem that Jean reported.
>> >
>> >
>> > On Fri, Jan 25, 2013 at 4:38 AM, Jean-Marc Spaggiari <
>> > jean-marc@spaggiari.org> wrote:
>> >
>> >> Hi Vijay,
>> >>
>> >> If, while the user os scrolling forward, you store the key of each
>> >> page, then you will be able to go back to a specific page, and jump
>> >> forward back up to where he was.
>> >>
>> >> The only issue is that, if while the user is scrolling the table,
>> >> someone insert a row between the last of a page, and the first of the
>> >> next page, you will never see this row.
>> >>
>> >> Let's take this exemaple.
>> >>
>> >> You have 10 items per page.
>> >>
>> >> 010 020 030 040 050 060 070 080 090 100 is the first page.
>> >> 110 120 130 140 150 160 170 180 190 200 is the second one.
>> >>
>> >> Now, if someone insert 101... If will be just after 100 and before
>> >> 110.
>> >>
>> > Anil: Instead of scanning from 010 to 100, scan from 010 to 110. Then
>> > we
>> > wont have this problem. So, i mean to say that
>> > startRow(firstRowKeyofPage(N)) and stopRow(firstRowKeyofPage(N+1)).
>> > This
>> > would fix it. Also, in that case number of results might exceed the
>> > pageSize. So you might need to handle this logic.
>> >
>> >>
>> >> When you will display 10 rows starting at 010 you will stop just
>> >> before 101... And for the next page you will start at 110... And 101
>> >> will never be displayed...
>> >>
>> >> HTH
>> >>
>> >> JM
>> >>
>> >> 2013/1/25, Mohammad Tariq <do...@gmail.com>:
>> >> > Hello sir,
>> >> >
>> >> >       While paging through, store the startkey of the current page
>> >> > of
>> >> > 25
>> >> > rows
>> >> > in a separate byte[]. Now, if you want to come back to this page
>> >> > when
>> >> > you
>> >> > are at the next page do a range query where  startkey would be the
>> >> > rowkey
>> >> > you had stored earlier and the endkey would be the startrowkey  of
>> >>  current
>> >> > page. You have to store just one rowkey each time you show a page
>> using
>> >> > which you could come back to this page when you are at the next
>> >> > page.
>> >> >
>> >> > However, this approach will fail in a case where your user would
>> >> > like
>> >> > to
>> >> go
>> >> > to a particular previous page.
>> >> >
>> >> > Warm Regards,
>> >> > Tariq
>> >> > https://mtariq.jux.com/
>> >> > cloudfront.blogspot.com
>> >> >
>> >> >
>> >> > On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan
>> >> > <vi...@scaligent.com>
>> >> > wrote:
>> >> >
>> >> >> I'm displaying rows of data from a HBase table in a data grid UI.
>> >> >> The
>> >> >> grid
>> >> >> shows 25 rows at a time i.e. it is paginated. User can click on
>> >> >> Next/Previous to paginate through the data 25 rows at a time. I can
>> >> >> implement Next easily by setting a HBase
>> >> >> org.apache.hadoop.hbase.filter.PageFilter and setting startRow on
>> >> >> the
>> >> >> org.apache.hadoop.hbase.client.Scan to be the row id of the next
>> >> >> batch's
>> >> >> row that is sent to the UI with the previous batch. However, I
>> >> >> can't
>> >> seem
>> >> >> to be able to do the same with Previous. I can set the endRow on
>> >> >> the
>> >> Scan
>> >> >> to be the row id of the last row of the previous batch but since
>> HBase
>> >> >> Scans are always in the forward direction, there is no way to set a
>> >> >> PageFilter that can get 25 rows ending at a particular row. The
>> >> >> only
>> >> >> option
>> >> >> seems to be to get *all* rows up to the end row and filter out all
>> but
>> >> >> the
>> >> >> last 25 in the caller, which seems very inefficient. Any ideas on
>> >> >> how
>> >> >> this
>> >> >> can be done efficiently?
>> >> >>
>> >> >> --
>> >> >> -Vijay
>> >> >>
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Thanks & Regards,
>> > Anil Gupta
>> >
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>

Re: Pagination with HBase - getting previous page of data

Posted by anil gupta <an...@gmail.com>.
Inline...

On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Anil,
>
> The issue is that all the other sub-sequent page start should be moved
> too...
>
Yes, this is a possibility. Hence the developer has to take care of this
case. It might also be possible that the pageSize is not a hard limit on the
number of results (more like a hint or suggestion on size). I would say it
varies by use case.

>
> so if you want to jump directly to page n, you might be totally
> shifted because of all the data inserted in the meantime...
>
> If you want a real complete pagination feature, you might want to have
> a coproccessor or a MR updating another table refering to the
> pages....
>
Well, the solution depends on the use case. I will be doing pagination in
HBase for a RESTful service, but so far I am unable to find any reason why
this can't be done at the application level.
Are you suggesting using MR for paging in HBase? If yes, how?
How would you use another table for pagination? What would you store in the
extra table?
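
For what it's worth, the application-level version I have in mind is roughly
the stateless "next token" pattern sketched below: fetch pageSize + 1 rows and
hand the extra row key back to the REST client as the start of the next page.
The class and table name are hypothetical; it is only a sketch:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class RestPager {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // hypothetical table
    byte[] startRow = Bytes.toBytes("");          // first call: empty token = start of table
    int pageSize = 25;

    Scan scan = new Scan(startRow);
    scan.setCaching(pageSize + 1);                // fetch one extra row for the token
    ResultScanner scanner = table.getScanner(scan);
    List<Result> page = new ArrayList<Result>();
    byte[] nextToken = null;
    for (Result r : scanner) {
      if (page.size() < pageSize) {
        page.add(r);                 // rows to return to the REST client
      } else {
        nextToken = r.getRow();      // first row of the next page, returned as the "next" token
        break;
      }
    }
    scanner.close();
    table.close();
    System.out.println("rows=" + page.size() + " next="
        + (nextToken == null ? "none" : Bytes.toStringBinary(nextToken)));
  }
}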

>
> JM
>
> 2013/1/25, anil gupta <an...@gmail.com>:
> > Hi Vijay,
> >
> > I've done paging in HBase by using Scan only(no pagination filter) as
> > Mohammed has explained. However it was just an experimental stuff. It
> works
> > but Jean raised a very good point.
> > Find my answer inline to fix the problem that Jean reported.
> >
> >
> > On Fri, Jan 25, 2013 at 4:38 AM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> >> Hi Vijay,
> >>
> >> If, while the user os scrolling forward, you store the key of each
> >> page, then you will be able to go back to a specific page, and jump
> >> forward back up to where he was.
> >>
> >> The only issue is that, if while the user is scrolling the table,
> >> someone insert a row between the last of a page, and the first of the
> >> next page, you will never see this row.
> >>
> >> Let's take this exemaple.
> >>
> >> You have 10 items per page.
> >>
> >> 010 020 030 040 050 060 070 080 090 100 is the first page.
> >> 110 120 130 140 150 160 170 180 190 200 is the second one.
> >>
> >> Now, if someone insert 101... If will be just after 100 and before 110.
> >>
> > Anil: Instead of scanning from 010 to 100, scan from 010 to 110. Then we
> > wont have this problem. So, i mean to say that
> > startRow(firstRowKeyofPage(N)) and stopRow(firstRowKeyofPage(N+1)). This
> > would fix it. Also, in that case number of results might exceed the
> > pageSize. So you might need to handle this logic.
> >
> >>
> >> When you will display 10 rows starting at 010 you will stop just
> >> before 101... And for the next page you will start at 110... And 101
> >> will never be displayed...
> >>
> >> HTH
> >>
> >> JM
> >>
> >> 2013/1/25, Mohammad Tariq <do...@gmail.com>:
> >> > Hello sir,
> >> >
> >> >       While paging through, store the startkey of the current page of
> >> > 25
> >> > rows
> >> > in a separate byte[]. Now, if you want to come back to this page when
> >> > you
> >> > are at the next page do a range query where  startkey would be the
> >> > rowkey
> >> > you had stored earlier and the endkey would be the startrowkey  of
> >>  current
> >> > page. You have to store just one rowkey each time you show a page
> using
> >> > which you could come back to this page when you are at the next page.
> >> >
> >> > However, this approach will fail in a case where your user would like
> >> > to
> >> go
> >> > to a particular previous page.
> >> >
> >> > Warm Regards,
> >> > Tariq
> >> > https://mtariq.jux.com/
> >> > cloudfront.blogspot.com
> >> >
> >> >
> >> > On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan <vi...@scaligent.com>
> >> > wrote:
> >> >
> >> >> I'm displaying rows of data from a HBase table in a data grid UI. The
> >> >> grid
> >> >> shows 25 rows at a time i.e. it is paginated. User can click on
> >> >> Next/Previous to paginate through the data 25 rows at a time. I can
> >> >> implement Next easily by setting a HBase
> >> >> org.apache.hadoop.hbase.filter.PageFilter and setting startRow on the
> >> >> org.apache.hadoop.hbase.client.Scan to be the row id of the next
> >> >> batch's
> >> >> row that is sent to the UI with the previous batch. However, I can't
> >> seem
> >> >> to be able to do the same with Previous. I can set the endRow on the
> >> Scan
> >> >> to be the row id of the last row of the previous batch but since
> HBase
> >> >> Scans are always in the forward direction, there is no way to set a
> >> >> PageFilter that can get 25 rows ending at a particular row. The only
> >> >> option
> >> >> seems to be to get *all* rows up to the end row and filter out all
> but
> >> >> the
> >> >> last 25 in the caller, which seems very inefficient. Any ideas on how
> >> >> this
> >> >> can be done efficiently?
> >> >>
> >> >> --
> >> >> -Vijay
> >> >>
> >> >
> >>
> >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>



-- 
Thanks & Regards,
Anil Gupta

Re: Pagination with HBase - getting previous page of data

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Anil,

The issue is that all the other subsequent page starts should be moved too...

So if you want to jump directly to page n, you might be totally
shifted because of all the data inserted in the meantime...

If you want a real, complete pagination feature, you might want to have
a coprocessor or an MR job updating another table referring to the
pages....

JM

2013/1/25, anil gupta <an...@gmail.com>:
> Hi Vijay,
>
> I've done paging in HBase by using Scan only(no pagination filter) as
> Mohammed has explained. However it was just an experimental stuff. It works
> but Jean raised a very good point.
> Find my answer inline to fix the problem that Jean reported.
>
>
> On Fri, Jan 25, 2013 at 4:38 AM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
>> Hi Vijay,
>>
>> If, while the user os scrolling forward, you store the key of each
>> page, then you will be able to go back to a specific page, and jump
>> forward back up to where he was.
>>
>> The only issue is that, if while the user is scrolling the table,
>> someone insert a row between the last of a page, and the first of the
>> next page, you will never see this row.
>>
>> Let's take this exemaple.
>>
>> You have 10 items per page.
>>
>> 010 020 030 040 050 060 070 080 090 100 is the first page.
>> 110 120 130 140 150 160 170 180 190 200 is the second one.
>>
>> Now, if someone insert 101... If will be just after 100 and before 110.
>>
> Anil: Instead of scanning from 010 to 100, scan from 010 to 110. Then we
> wont have this problem. So, i mean to say that
> startRow(firstRowKeyofPage(N)) and stopRow(firstRowKeyofPage(N+1)). This
> would fix it. Also, in that case number of results might exceed the
> pageSize. So you might need to handle this logic.
>
>>
>> When you will display 10 rows starting at 010 you will stop just
>> before 101... And for the next page you will start at 110... And 101
>> will never be displayed...
>>
>> HTH
>>
>> JM
>>
>> 2013/1/25, Mohammad Tariq <do...@gmail.com>:
>> > Hello sir,
>> >
>> >       While paging through, store the startkey of the current page of
>> > 25
>> > rows
>> > in a separate byte[]. Now, if you want to come back to this page when
>> > you
>> > are at the next page do a range query where  startkey would be the
>> > rowkey
>> > you had stored earlier and the endkey would be the startrowkey  of
>>  current
>> > page. You have to store just one rowkey each time you show a page using
>> > which you could come back to this page when you are at the next page.
>> >
>> > However, this approach will fail in a case where your user would like
>> > to
>> go
>> > to a particular previous page.
>> >
>> > Warm Regards,
>> > Tariq
>> > https://mtariq.jux.com/
>> > cloudfront.blogspot.com
>> >
>> >
>> > On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan <vi...@scaligent.com>
>> > wrote:
>> >
>> >> I'm displaying rows of data from a HBase table in a data grid UI. The
>> >> grid
>> >> shows 25 rows at a time i.e. it is paginated. User can click on
>> >> Next/Previous to paginate through the data 25 rows at a time. I can
>> >> implement Next easily by setting a HBase
>> >> org.apache.hadoop.hbase.filter.PageFilter and setting startRow on the
>> >> org.apache.hadoop.hbase.client.Scan to be the row id of the next
>> >> batch's
>> >> row that is sent to the UI with the previous batch. However, I can't
>> seem
>> >> to be able to do the same with Previous. I can set the endRow on the
>> Scan
>> >> to be the row id of the last row of the previous batch but since HBase
>> >> Scans are always in the forward direction, there is no way to set a
>> >> PageFilter that can get 25 rows ending at a particular row. The only
>> >> option
>> >> seems to be to get *all* rows up to the end row and filter out all but
>> >> the
>> >> last 25 in the caller, which seems very inefficient. Any ideas on how
>> >> this
>> >> can be done efficiently?
>> >>
>> >> --
>> >> -Vijay
>> >>
>> >
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>

Re: Pagination with HBase - getting previous page of data

Posted by anil gupta <an...@gmail.com>.
Hi Vijay,

I've done paging in HBase by using Scan only (no pagination filter) as
Mohammad has explained. However, it was just experimental stuff. It works,
but Jean raised a very good point.
Find my answer inline to fix the problem that Jean reported.


On Fri, Jan 25, 2013 at 4:38 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Vijay,
>
> If, while the user os scrolling forward, you store the key of each
> page, then you will be able to go back to a specific page, and jump
> forward back up to where he was.
>
> The only issue is that, if while the user is scrolling the table,
> someone insert a row between the last of a page, and the first of the
> next page, you will never see this row.
>
> Let's take this exemaple.
>
> You have 10 items per page.
>
> 010 020 030 040 050 060 070 080 090 100 is the first page.
> 110 120 130 140 150 160 170 180 190 200 is the second one.
>
> Now, if someone insert 101... If will be just after 100 and before 110.
>
Anil: Instead of scanning from 010 to 100, scan from 010 to 110. Then we
won't have this problem. So, I mean to say:
startRow(firstRowKeyofPage(N)) and stopRow(firstRowKeyofPage(N+1)). This
would fix it. Also, in that case the number of results might exceed the
pageSize, so you might need to handle this logic.
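
Something like the sketch below is what I mean; firstRowKeyOfPage() is a
hypothetical helper standing for however the application remembers each
page's first key, and the loop shows that the page can come back larger than
the nominal page size:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class PageWindowScan {
  // Hypothetical helper: however the app remembers the first row key of page n.
  static byte[] firstRowKeyOfPage(int n) {
    return Bytes.toBytes(String.format("%03d", n * 100 + 10));  // e.g. "010", "110", ...
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // hypothetical table
    int n = 0;                                    // page the user is looking at

    // Scan [first key of page N, first key of page N+1): the stop row is exclusive,
    // so a row like 101 inserted after the page boundaries were recorded still shows up.
    Scan scan = new Scan(firstRowKeyOfPage(n), firstRowKeyOfPage(n + 1));
    ResultScanner scanner = table.getScanner(scan);
    int count = 0;
    for (Result r : scanner) {
      System.out.println(Bytes.toStringBinary(r.getRow()));
      count++;                       // may end up larger than the nominal page size
    }
    scanner.close();
    table.close();
    System.out.println("rows on this page: " + count);
  }
}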

>
> When you will display 10 rows starting at 010 you will stop just
> before 101... And for the next page you will start at 110... And 101
> will never be displayed...
>
> HTH
>
> JM
>
> 2013/1/25, Mohammad Tariq <do...@gmail.com>:
> > Hello sir,
> >
> >       While paging through, store the startkey of the current page of 25
> > rows
> > in a separate byte[]. Now, if you want to come back to this page when you
> > are at the next page do a range query where  startkey would be the rowkey
> > you had stored earlier and the endkey would be the startrowkey  of
>  current
> > page. You have to store just one rowkey each time you show a page using
> > which you could come back to this page when you are at the next page.
> >
> > However, this approach will fail in a case where your user would like to
> go
> > to a particular previous page.
> >
> > Warm Regards,
> > Tariq
> > https://mtariq.jux.com/
> > cloudfront.blogspot.com
> >
> >
> > On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan <vi...@scaligent.com>
> > wrote:
> >
> >> I'm displaying rows of data from a HBase table in a data grid UI. The
> >> grid
> >> shows 25 rows at a time i.e. it is paginated. User can click on
> >> Next/Previous to paginate through the data 25 rows at a time. I can
> >> implement Next easily by setting a HBase
> >> org.apache.hadoop.hbase.filter.PageFilter and setting startRow on the
> >> org.apache.hadoop.hbase.client.Scan to be the row id of the next batch's
> >> row that is sent to the UI with the previous batch. However, I can't
> seem
> >> to be able to do the same with Previous. I can set the endRow on the
> Scan
> >> to be the row id of the last row of the previous batch but since HBase
> >> Scans are always in the forward direction, there is no way to set a
> >> PageFilter that can get 25 rows ending at a particular row. The only
> >> option
> >> seems to be to get *all* rows up to the end row and filter out all but
> >> the
> >> last 25 in the caller, which seems very inefficient. Any ideas on how
> >> this
> >> can be done efficiently?
> >>
> >> --
> >> -Vijay
> >>
> >
>



-- 
Thanks & Regards,
Anil Gupta

Re: Pagination with HBase - getting previous page of data

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Vijay,

If, while the user is scrolling forward, you store the key of each
page, then you will be able to go back to a specific page, and jump
forward again up to where he was.

The only issue is that, if while the user is scrolling the table,
someone inserts a row between the last row of a page and the first row of the
next page, you will never see this row.

Let's take this example.

You have 10 items per page.

010 020 030 040 050 060 070 080 090 100 is the first page.
110 120 130 140 150 160 170 180 190 200 is the second one.

Now, if someone inserts 101... It will be just after 100 and before 110.

When you display 10 rows starting at 010 you will stop just
before 101... And for the next page you will start at 110... And 101
will never be displayed...

HTH

JM

2013/1/25, Mohammad Tariq <do...@gmail.com>:
> Hello sir,
>
>       While paging through, store the startkey of the current page of 25
> rows
> in a separate byte[]. Now, if you want to come back to this page when you
> are at the next page do a range query where  startkey would be the rowkey
> you had stored earlier and the endkey would be the startrowkey  of  current
> page. You have to store just one rowkey each time you show a page using
> which you could come back to this page when you are at the next page.
>
> However, this approach will fail in a case where your user would like to go
> to a particular previous page.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan <vi...@scaligent.com>
> wrote:
>
>> I'm displaying rows of data from a HBase table in a data grid UI. The
>> grid
>> shows 25 rows at a time i.e. it is paginated. User can click on
>> Next/Previous to paginate through the data 25 rows at a time. I can
>> implement Next easily by setting a HBase
>> org.apache.hadoop.hbase.filter.PageFilter and setting startRow on the
>> org.apache.hadoop.hbase.client.Scan to be the row id of the next batch's
>> row that is sent to the UI with the previous batch. However, I can't seem
>> to be able to do the same with Previous. I can set the endRow on the Scan
>> to be the row id of the last row of the previous batch but since HBase
>> Scans are always in the forward direction, there is no way to set a
>> PageFilter that can get 25 rows ending at a particular row. The only
>> option
>> seems to be to get *all* rows up to the end row and filter out all but
>> the
>> last 25 in the caller, which seems very inefficient. Any ideas on how
>> this
>> can be done efficiently?
>>
>> --
>> -Vijay
>>
>

Re: Pagination with HBase - getting previous page of data

Posted by Mohammad Tariq <do...@gmail.com>.
Hello sir,

      While paging through, store the startkey of the current page of 25
rows in a separate byte[]. Now, if you want to come back to this page when you
are at the next page, do a range query where the startkey would be the rowkey
you had stored earlier and the endkey would be the start rowkey of the current
page. You have to store just one rowkey each time you show a page, using
which you could come back to this page when you are at the next page.

However, this approach will fail in a case where your user would like to go
to a particular previous page.
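
A rough sketch of that idea: keep the start key of each page already shown in
a client-side list, and "Previous" becomes a bounded forward scan between two
remembered keys. The table name and the hard-coded keys below are only
examples, not a tested implementation:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class PrevPageScan {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");       // hypothetical table

    // Start keys of the pages already shown, recorded while paging forward.
    List<byte[]> pageStartKeys = new ArrayList<byte[]>();
    pageStartKeys.add(Bytes.toBytes("row-000"));       // page 1
    pageStartKeys.add(Bytes.toBytes("row-025"));       // page 2
    pageStartKeys.add(Bytes.toBytes("row-050"));       // page 3 (the one on screen)

    int current = 2;                                   // index of the page on screen
    byte[] prevStart = pageStartKeys.get(current - 1); // startkey stored for the previous page
    byte[] currStart = pageStartKeys.get(current);     // startkey of the current page

    // "Previous" = forward range scan [prevStart, currStart), stop row exclusive.
    Scan scan = new Scan(prevStart, currStart);
    scan.setCaching(25);
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      System.out.println(Bytes.toStringBinary(r.getRow()));
    }
    scanner.close();
    table.close();
  }
}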

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan <vi...@scaligent.com> wrote:

> I'm displaying rows of data from a HBase table in a data grid UI. The grid
> shows 25 rows at a time i.e. it is paginated. User can click on
> Next/Previous to paginate through the data 25 rows at a time. I can
> implement Next easily by setting a HBase
> org.apache.hadoop.hbase.filter.PageFilter and setting startRow on the
> org.apache.hadoop.hbase.client.Scan to be the row id of the next batch's
> row that is sent to the UI with the previous batch. However, I can't seem
> to be able to do the same with Previous. I can set the endRow on the Scan
> to be the row id of the last row of the previous batch but since HBase
> Scans are always in the forward direction, there is no way to set a
> PageFilter that can get 25 rows ending at a particular row. The only option
> seems to be to get *all* rows up to the end row and filter out all but the
> last 25 in the caller, which seems very inefficient. Any ideas on how this
> can be done efficiently?
>
> --
> -Vijay
>