Posted to user@accumulo.apache.org by "Terry P." <te...@gmail.com> on 2013/10/11 16:53:28 UTC

Re: How to get count of table rows using accumulo shell

Hi guys,
I'm still a bit of a newbie as I'm more of an admin than a developer, and
now that formal testing has begun, I have testers asking me how to get a
total count of records in Accumulo for verification purposes after test
ingests have been run.

In our case when I say "records" I mean the number of distinct rowkeys, not
the total number of entries.

Is there any way to do this using just the Accumulo shell, maybe by writing
an aggregator or other class that can be run from within the Accumulo shell?

Many thanks in advance,
Terry


On Tue, Jan 22, 2013 at 6:03 PM, Terry P. <te...@gmail.com> wrote:

> Greetings everyone,
> I want to simply get the total count of rows in a table using the accumulo
> shell.  I'm very new to Accumulo so I apologize if it's a newbie question.
>
> I'm prototyping with the accumulo shell, and love how it can ingest
> records using exefile, so I've used python to generate a lot of test data.
> For some test cases in this sprint I need to verify the rows loaded match
> what's expected, hence the reason I need to get the total rows in a table.
>
> I'd bet there is some way to use setiter or setscaniter with the -agg
> option, but I can't figure it out.
>
> Any help would be greatly appreciated.
>
> Best regards,
> Terry
>

Re: How to get count of table rows using accumulo shell

Posted by Billie Rinaldi <bi...@gmail.com>.
It may be the case that temporary scan-specific iterators (which are set
with the setscaniter command) are not applied to the grep or egrep
commands.  It should work with the scan command, though.


On Fri, Oct 11, 2013 at 12:24 PM, Eric Newton <er...@gmail.com> wrote:

> Ya, you'll want to remove the iterator after you do the count.  You
> might be able to use it as a scan-only iterator, but I was just being
> lazy.
>
> -Eric

Re: How to get count of table rows using accumulo shell

Posted by Eric Newton <er...@gmail.com>.
Ya, you'll want to remove the iterator after you do the count.  You
might be able to use it as a scan-only iterator, but I was just being
lazy.

-Eric


On Fri, Oct 11, 2013 at 3:18 PM, Terry P. <te...@gmail.com> wrote:
> Thanks Eric, Jared, and Josh.
>
> From Jared's reply I realize that the setiter command obviously stays in
> effect beyond my shell session.  I see it now with the listiter command in
> the shell.
>
> Our app normally does lookups by rowkey.  Will the firstEntry iterator
> adversely affect those queries?  I assume not, but I want to double check.
>
> Thanks again guys, this is very helpful,
> Terry

Re: How to get count of table rows using accumulo shell

Posted by "Terry P." <te...@gmail.com>.
Thanks Eric, Jared, and Josh.

From Jared's reply I realize that the setiter command obviously stays in
effect beyond my shell session.  I see it now with the listiter command in
the shell.

Our app normally does lookups by rowkey.  Will the firstEntry iterator
adversely affect those queries?  I assume not, but I want to double check.

Thanks again guys, this is very helpful,
Terry



On Fri, Oct 11, 2013 at 2:15 PM, Eric Newton <er...@gmail.com> wrote:

> Actually, the egrep was used on purpose: it's the only way to get the
> shell to use the BatchScanner, which can talk to multiple tservers at
> once.
>
> -Eric

Re: How to get count of table rows using accumulo shell

Posted by Eric Newton <er...@gmail.com>.
Actually, the egrep was used on purpose: it's the only way to get the
shell to use the BatchScanner, which can talk to multiple tservers at
once.

-Eric


On Fri, Oct 11, 2013 at 3:10 PM, Josh Elser <jo...@gmail.com> wrote:
> You'll need to add the '-np' option on the scan command as well.

Re: How to get count of table rows using accumulo shell

Posted by Josh Elser <jo...@gmail.com>.
You'll need to add the '-np' option on the scan command as well.
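
For anyone scripting this, here is a minimal sketch of counting distinct row keys on the client side, without installing any iterator. It assumes the scan output has one entry per line in the form "row cf:cq [vis] value", that row keys contain no whitespace, and that entries arrive sorted by row (as they do from scan). The function name count_distinct_rows is made up for illustration; it is not part of Accumulo.

```shell
# Hypothetical helper (not part of Accumulo): counts distinct row keys in
# Accumulo shell scan output read from stdin.
# Assumptions: one entry per line, formatted "row cf:cq [vis]    value",
# row keys without whitespace, and entries sorted by row.
# Lines that do not look like entries (blank lines, banners) are skipped.
count_distinct_rows() {
  awk 'NF >= 3 && index($2, ":") > 0 && $1 != prev { n++; prev = $1 }
       END { print n + 0 }'
}
```

Usage would then look something like: accumulo shell -u username -p password -e "scan -t foo -np" | count_distinct_rows. Every entry still travels to the client, so this is only sensible for test-sized tables.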

On 10/11/2013 03:05 PM, Jared Winick wrote:
> After following the commands Eric lists to set the iterator for that 
> table, instead of running 'egrep' in the shell, you could do this from 
> the Linux command line
>
> accumulo shell -u username -p password -e "scan -t foo" | wc -l


Re: How to get count of table rows using accumulo shell

Posted by Jared Winick <ja...@gmail.com>.
After following the commands Eric lists to set the iterator for that table,
instead of running 'egrep' in the shell, you could do this from the Linux
command line:

accumulo shell -u username -p password -e "scan -t foo" | wc -l


On Fri, Oct 11, 2013 at 11:42 AM, Eric Newton <er...@gmail.com> wrote:

> You can stack a counting Combiner over the FirstEntryInRowIterator and
> batch scan the table. If it's just a test data set with under a
> billion rows, you can just count the result set coming out of the
> FirstEntryInRowIterator.  You'll be I/O bound at the client, but it
> will work.
>
> This does it with the shell, but the output is kinda voluminous:
>
> root@test> createtable foo
> root@test foo> insert row1 cf col1 value
> root@test foo> insert row1 cf col2 value
> root@test foo> insert row1 cf col999 value
> root@test foo> insert row2 cf col1 value
> root@test foo> scan
> row1 cf:col1 []    value
> row1 cf:col2 []    value
> row1 cf:col999 []    value
> row2 cf:col1 []    value
> root@test foo> setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
> Only allows iteration over the first entry per row
> ----------> set FirstEntryInRowIterator parameter scansBeforeSeek, Number of scans to try before seeking [10]: 10
> root@test foo> egrep .*
> row1 cf:col1 []    value
> row2 cf:col1 []    value

Re: How to get count of table rows using accumulo shell

Posted by Eric Newton <er...@gmail.com>.
You can stack a counting Combiner over the FirstEntryInRowIterator and
batch scan the table. If it's just a test data set with under a
billion rows, you can just count the result set coming out of the
FirstEntryInRowIterator.  You'll be I/O bound at the client, but it
will work.

This does it with the shell, but the output is kinda voluminous:

root@test> createtable foo
root@test foo> insert row1 cf col1 value
root@test foo> insert row1 cf col2 value
root@test foo> insert row1 cf col999 value
root@test foo> insert row2 cf col1 value
root@test foo> scan
row1 cf:col1 []    value
row1 cf:col2 []    value
row1 cf:col999 []    value
row2 cf:col1 []    value
root@test foo> setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan
Only allows iteration over the first entry per row
----------> set FirstEntryInRowIterator parameter scansBeforeSeek, Number of scans to try before seeking [10]: 10
root@test foo> egrep .*
row1 cf:col1 []    value
row2 cf:col1 []    value
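
To make the effect of the FirstEntryInRowIterator concrete, here is a rough client-side imitation in plain shell. It only illustrates the filtering visible in the egrep output above; it is not how the iterator is implemented on the tablet servers. The function name first_entry_per_row is made up, and the sketch assumes sorted, one-entry-per-line output with row keys that contain no whitespace.

```shell
# Hypothetical illustration (not part of Accumulo): keeps only the first
# entry of each row from scan-style output read on stdin.
# Assumes entries are sorted by row and the row key is the first
# whitespace-delimited field of each line.
first_entry_per_row() {
  awk 'NF > 0 && $1 != prev { print; prev = $1 }'
}
```

Fed the four-entry scan output from the session above, this prints the same two lines that egrep returns once the iterator is set. The point of doing it server-side is that only one entry per row is shipped to the client, rather than every entry.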


On Fri, Oct 11, 2013 at 10:53 AM, Terry P. <te...@gmail.com> wrote:
> Hi guys,
> I'm still a bit of a newbie as I'm more of an admin than a developer, and
> now that formal testing has begun, I have testers asking me how to get a
> total count of records in Accumulo for verification purposes after test
> ingests have been run.
>
> In our case when I say "records" I mean the number of distinct rowkeys, not
> the total number of entries.
>
> Is there any way to do this using just the Accumulo shell, maybe by writing
> an aggregator or other class that can be run from within the Accumulo shell?
>
> Many thanks in advance,
> Terry