Posted to user@accumulo.apache.org by Daniel Ruiz <da...@gmail.com> on 2015/08/12 07:51:34 UTC

Fetch Taking Longer Than Expected

Hi All,

 

I am having an issue where column fetches are taking over a minute on 1.6.3.
I don't believe this should be the case, and my experience in the past
supports the idea that fetches should be very fast.

For example, doing a scan on the table gives results instantly, but doing a
scan -c vesselmmsitext=2706758566 takes 2 minutes and 44 seconds (plus or
minus 1 second).

 



Figure 1.1. Generated Test Data on GUIDIndexTable

 

Here is the table config

-----------+---------------------------------------------------------------+---------------------------------------------------------------------------------
SCOPE      | NAME                                                          | VALUE
-----------+---------------------------------------------------------------+---------------------------------------------------------------------------------
default    | table.balancer .............................................. | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
default    | table.bloom.enabled ......................................... | false
default    | table.bloom.error.rate ...................................... | 0.5%
default    | table.bloom.hash.type ....................................... | murmur
default    | table.bloom.key.functor ..................................... | org.apache.accumulo.core.file.keyfunctor.RowFunctor
default    | table.bloom.load.threshold .................................. | 1
default    | table.bloom.size ............................................ | 1048576
default    | table.cache.block.enable .................................... | false
default    | table.cache.index.enable .................................... | true
default    | table.classpath.context ..................................... |
default    | table.compaction.major.everything.idle ...................... | 1h
default    | table.compaction.major.ratio ................................ | 3
default    | table.compaction.minor.idle ................................. | 5m
default    | table.compaction.minor.logs.threshold ....................... | 3
table      | table.constraint.1 .......................................... | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
default    | table.failures.ignore ....................................... | false
default    | table.file.blocksize ........................................ | 0B
default    | table.file.compress.blocksize ............................... | 100K
default    | table.file.compress.blocksize.index ......................... | 128K
default    | table.file.compress.type .................................... | gz
default    | table.file.max .............................................. | 15
default    | table.file.replication ...................................... | 0
default    | table.file.type ............................................. | rf
default    | table.formatter ............................................. | org.apache.accumulo.core.util.format.DefaultFormatter
default    | table.groups.enabled ........................................ |
default    | table.interepreter .......................................... | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
table      | table.iterator.majc.AgeOffIterator##GUIDIndexTable .......... | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
table      | table.iterator.majc.AgeOffIterator##GUIDIndexTable.opt.ttl .. | 2592000000
table      | table.iterator.majc.vers .................................... | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.majc.vers.opt.maxVersions .................... | 1
table      | table.iterator.minc.AgeOffIterator##GUIDIndexTable .......... | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
table      | table.iterator.minc.AgeOffIterator##GUIDIndexTable.opt.ttl .. | 2592000000
table      | table.iterator.minc.vers .................................... | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.minc.vers.opt.maxVersions .................... | 1
table      | table.iterator.scan.AgeOffIterator##GUIDIndexTable .......... | 1,org.apache.accumulo.core.iterators.user.AgeOffFilter
table      | table.iterator.scan.AgeOffIterator##GUIDIndexTable.opt.ttl .. | 2592000000
table      | table.iterator.scan.vers .................................... | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.scan.vers.opt.maxVersions .................... | 1
default    | table.majc.compaction.strategy .............................. | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
default    | table.scan.max.memory ....................................... | 512K
table      |    @override ................................................ | 1M
default    | table.security.scan.visibility.default ...................... |
default    | table.split.threshold ....................................... | 1G
default    | table.walog.enabled ......................................... | true
-----------+---------------------------------------------------------------+---------------------------------------------------------------------------------

 

More Table Info:


GUIDIndexTable <http://107.23.12.24:50095/tables?t=f> | ONLINE | 2 | 0 | 82.56M | 810.00K | 159

 

 

Please let me know if I am doing something wrong or if there is more
information you need.

 

V/r,

-Daniel

RE: Fetch Taking Longer Than Expected

Posted by Daniel Ruiz <da...@gmail.com>.

Okay, thanks for the information and your time; it has been very helpful.

V/r,
-Daniel

-----Original Message-----
From: Josh Elser [mailto:josh.elser@gmail.com] 
Sent: Friday, August 14, 2015 10:04 AM
To: user@accumulo.apache.org
Subject: Re: Fetch Taking Longer Than Expected

"Small" might also be misleading. A locality group can be a good
way to separate a large collection of data from an actually small number 
of other records. Discrete yes, but the data itself does not need to be 
small to put it into a locality group.

Christopher wrote:
> I would be surprised if anybody has tested more than a dozen or two
> locality groups or placed more than a dozen or two column families in
> any one locality group.
>
>
> On Fri, Aug 14, 2015, 01:28 Daniel Ruiz <daruiz.work@gmail.com> wrote:
>
>     Thanks... We landed up doing just that.  Correct, having a bunch of
>     random data does not fit well with locality groups.  I did have
>     another question, though: you mentioned a "small discrete set".  What
>     would you consider small?  Would you recommend, for example, against
>     having several thousand locality groups in a table?
>
>     V/r,
>     -Daniel
>     -----Original Message-----
>     From: Christopher [mailto:ctubbsii@apache.org]
>     Sent: Wednesday, August 12, 2015 3:08 PM
>     To: Accumulo User List <user@accumulo.apache.org>
>     Subject: Re: Fetch Taking Longer Than Expected
>
>     The schema shown above doesn't quite look like it's well-suited for
>     locality groups, though. The CF field looks like it's a composition of
>     an attribute name and that attribute's value. To take advantage of
>     locality groups with that schema, you'd have to have a locality group
>     for every attribute name/value combination, which would probably not
>     work well.
>
>     If you want to take advantage of locality groups, you'll probably want
>     to make your CFs a small, discrete set (like just attribute names).
>     So, if you push the attribute value into the CQ, you could at the very
>     least limit your search to the locality group containing the particular
>     attribute name you are searching for.
>
>     If you really want efficient searches based on attribute name/value
>     combinations, you're going to want to move this up into the row (at the
>     beginning of your row), so your data is ordered (indexed) by that. You
>     could do this in a secondary index (which could be in a different
>     table, a different segment of this table, or in a separate locality
>     group in this table).
>
>     --
>     Christopher L Tubbs II
>     http://gravatar.com/ctubbsii
>
>
>     On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser <josh.elser@gmail.com> wrote:
>      > Yup, that would be expected.
>      >
>      > Remember that doing `scan -c ...` is an unbounded search over your
>      > entire table. So, it takes approximately 3 minutes to read your
>      > GUIDIndexTable. Because you have a single locality group, all of the
>      > columns in your table are grouped together.
>      >
>      > One exercise that may be interesting for yourself is to create a
>      > locality group that has your specific column family in it, compact your
>      > GUIDIndexTable, and rerun your `scan -c` query. The speed should be
>      > similar to your exact scan. Removing the locality group and
>      > re-compacting the table should return the query time back to the slow
>      > 3 minutes.
>      >
>      > Does that make sense?
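
Editor's sketch: the difference between the unbounded `scan -c` and the row-ordered secondary index suggested in the quoted replies can be illustrated with a toy model. This is plain Python over sorted tuples, not the Accumulo API; the GUID rows, the `\x00` separator, and the function names are all assumptions made up for illustration.

```python
import bisect

def scan_by_column(table, family):
    """Column-filtered scan with no row range: every entry is examined."""
    hits, examined = [], 0
    for entry in table:
        examined += 1
        if entry[1] == family:
            hits.append(entry)
    return hits, examined

def scan_by_row_prefix(table, prefix):
    """Range scan: seek to the first row >= prefix, stop once rows stop matching."""
    i = bisect.bisect_left(table, (prefix,))
    hits, examined = [], 0
    while i < len(table):
        examined += 1
        if not table[i][0].startswith(prefix):
            break
        hits.append(table[i])
        i += 1
    return hits, examined

# Hypothetical data: rows are made-up GUIDs, families are "name=value" strings.
main_table = sorted(
    (f"guid-{i:04d}", f"vesselmmsitext={2706758000 + i}", "") for i in range(1000)
)
# Hypothetical index table: the attribute name/value moves to the front of the row.
index_table = sorted(
    (f"{family}\x00{row}", "guid", row) for row, family, _ in main_table
)

target = "vesselmmsitext=2706758566"
slow_hits, slow_examined = scan_by_column(main_table, target)
fast_hits, fast_examined = scan_by_row_prefix(index_table, target + "\x00")

print(slow_examined, fast_examined)
```

In this model the column filter touches all 1000 entries to find one match, while the prefix scan on the index touches only the matching entry and the one after it. Real scan cost depends on tablets, files, and caching, so the counts are only meant to show the shape of the difference.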

Re: Fetch Taking Longer Than Expected

Posted by Christopher <ct...@apache.org>.

Yes, Josh is right. Sorry if my wording led to any unnecessary confusion.

On Fri, Aug 14, 2015, 12:04 Josh Elser <jo...@gmail.com> wrote:

> "Small" might also be misleading. A locality group can be a good
> way to separate a large collection of data from an actually small number
> of other records. Discrete yes, but the data itself does not need to be
> small to put it into a locality group.

Re: Fetch Taking Longer Than Expected

Posted by Josh Elser <jo...@gmail.com>.

"Small" might also be misleading. A locality group can be a good
way to separate a large collection of data from an actually small number 
of other records. Discrete yes, but the data itself does not need to be 
small to put it into a locality group.
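
Editor's sketch: a toy model of the point above, that a locality group can separate a small number of records from a large collection. It is plain Python, not Accumulo's actual file layout; the family names `payload` and `meta`, the group name `small`, and both helper functions are hypothetical.

```python
def store_by_group(entries, groups):
    """Split (row, family, value) entries into one sorted list per group.

    `groups` maps a group name to the set of column families it holds;
    families not listed anywhere fall into the "default" group.
    """
    family_to_group = {cf: name for name, cfs in groups.items() for cf in cfs}
    stored = {}
    for entry in sorted(entries):
        name = family_to_group.get(entry[1], "default")
        stored.setdefault(name, []).append(entry)
    return stored

def scan_family(stored, groups, family):
    """Read only the group that can hold `family`; return hits and entries read."""
    name = next((n for n, cfs in groups.items() if family in cfs), "default")
    read = stored.get(name, [])
    return [e for e in read if e[1] == family], len(read)

# Hypothetical data: a large collection plus a handful of other records.
entries = [(f"row{i:05d}", "payload", "x") for i in range(10000)]
entries += [(f"row{i:05d}", "meta", "m") for i in range(5)]

no_groups = {}
with_group = {"small": {"meta"}}

_, read_without = scan_family(store_by_group(entries, no_groups), no_groups, "meta")
hits, read_with = scan_family(store_by_group(entries, with_group), with_group, "meta")

print(read_without, read_with)
```

With a single default group the scan for `meta` reads all 10005 entries; with `meta` in its own group it reads only the 5 entries stored there. The large `payload` data could equally be the side placed in a named group, which is the sense in which the data itself does not need to be small.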

Christopher wrote:
> I would be surprised if anybody has tested more than a dozen or two
> locality groups or placed more than a dozen or two column families in
> any one locality group.
>
>
> On Fri, Aug 14, 2015, 01:28 Daniel Ruiz <daruiz.work@gmail.com
> <ma...@gmail.com>> wrote:
>
>     Thanks...We landed up doing just that.  Correct having a bunch of
>     random data does not fit well with locality groups.  I did have
>     another question though you mentioned  a "small discrete set".  What
>     would you consider small?  Would you recommend for example against
>     having several thousand locality groups in a table?
>
>     V/r,
>     -Daniel

Re: Fetch Taking Longer Than Expected

Posted by Christopher <ct...@apache.org>.

I would be surprised if anybody has tested more than a dozen or two
locality groups or placed more than a dozen or two column families in any
one locality group.

On Fri, Aug 14, 2015, 01:28 Daniel Ruiz <da...@gmail.com> wrote:

> Thanks...We landed up doing just that.  Correct having a bunch of random
> data does not fit well with locality groups.  I did have another question
> though you mentioned  a "small discrete set".  What would you consider
> small?  Would you recommend for example against having several thousand
> locality groups in a table?
>
> V/r,
> -Daniel

RE: Fetch Taking Longer Than Expected

Posted by Daniel Ruiz <da...@gmail.com>.

Thanks. We ended up doing just that. Correct, having a bunch of random data does not fit well with locality groups. I did have another question, though: you mentioned a "small, discrete set". What would you consider small? Would you recommend, for example, against having several thousand locality groups in a table?

V/r,
-Daniel
-----Original Message-----
From: Christopher [mailto:ctubbsii@apache.org] 
Sent: Wednesday, August 12, 2015 3:08 PM
To: Accumulo User List <us...@accumulo.apache.org>
Subject: Re: Fetch Taking Longer Than Expected

The schema shown above doesn't quite look like it's well-suited for
locality groups, though. The CF field looks like it's a composition of
an attribute name and that attribute's value. To take advantage of
locality groups with that schema, you'd have to have a locality group
for every attribute name/value combination, which would probably not
work well.

If you want to take advantage of locality groups, you'll probably want
to make your CFs a small, discrete set (like just attribute names).
So, if you push the attribute value into the CQ, you could at the very
least limit your search to the locality containing the particular
attribute name you are searching for.

If you really want efficient searches based on attribute name/value
combinations, you're going to want to put this up the row (at the
beginning of your row), so your data is ordered (indexed) by that. You
could do this in a secondary index (which could be in a different
table, a different segment of this table, or in a separate locality
group in this table).
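
As a rough illustration of the secondary-index idea above, here is a minimal Python sketch that models an index table as a sorted list of row strings. It is not Accumulo code: the `name=value<SEP>record_id` row layout, the `\x00` separator, and the helper names `build_index` and `lookup` are all assumptions made for the example. The point is only that once the attribute name/value is at the front of the row, a lookup becomes a seek plus a short range read instead of a pass over the whole table.

```python
from bisect import bisect_left

SEP = "\x00"  # assumed separator between the indexed term and the record id


def build_index(records):
    """Build sorted index rows of the form 'name=value<SEP>record_id'.

    records: dict mapping a record id to a dict of attribute name -> value.
    """
    rows = [
        f"{name}={value}{SEP}{record_id}"
        for record_id, attrs in records.items()
        for name, value in attrs.items()
    ]
    rows.sort()  # a sorted table keeps all rows with the same prefix adjacent
    return rows


def lookup(index_rows, name, value):
    """Range scan over the rows that start with 'name=value<SEP>'."""
    prefix = f"{name}={value}{SEP}"
    i = bisect_left(index_rows, prefix)  # seek to the first candidate row
    ids = []
    while i < len(index_rows) and index_rows[i].startswith(prefix):
        ids.append(index_rows[i][len(prefix):])
        i += 1
    return ids
```

In a real table, the equivalent would be a scan restricted to the range of rows sharing that prefix; the record ids returned would then be used to fetch the full records from the main table.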

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser <jo...@gmail.com> wrote:
> Yup, that would be expected.
>
> Remember that doing `scan -c ...` is an unbounded search over your entire
> table. So, it takes approximately 3 minutes to read your GUIDIndexTable.
> Because you have a single locality group, all of the columns in your table
> are grouped together.
>
> One exercise that may be interesting for yourself is to create a locality
> group that has your specific column family in it, compact your
> GUIDIndexTable, and rerun your `scan -c` query. The speed should be similar
> to your exact scan. Removing the locality group and re-compacting the table
> should return the query time back to the slow 3 minutes.
>
> Does that make sense?

Re: Fetch Taking Longer Than Expected

Posted by Josh Elser <jo...@gmail.com>.

No, I was not recommending locality groups as a solution to the problem, 
but using them to illustrate why the query was taking a long time.

run `scan -c ...` and observe that it is slow
add a locality group for that column family and compact
run `scan -c ...` again and observe that it is fast

I was not completely clear that I was not recommending use of locality 
groups as a solution to slow scans. The solution is to not do an 
unbounded `scan -c` and expect it to be fast.
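
To make the difference concrete, here is a small Python sketch of a toy sorted table (not Accumulo code). The `guidNNNNNN` row format, the second `vesselname` column family, and the function names are assumptions for illustration only; the thread does not show the actual row layout. It counts how many entries have to be examined for a column fetch with no row range versus a scan bounded to a single row.

```python
from bisect import bisect_left


def build_table(num_rows):
    """Toy table: sorted (row, column_family, value) entries, two per row."""
    entries = []
    for i in range(num_rows):
        row = f"guid{i:06d}"  # assumed row format
        entries.append((row, f"vesselmmsitext={i}", "x"))
        entries.append((row, f"vesselname=ship{i}", "x"))
    entries.sort()  # keys are ordered by row first, then column family
    return entries


def fetch_column_unbounded(entries, family):
    """Like `scan -c family` with no row range: every entry is examined."""
    examined = 0
    hits = []
    for row, cf, value in entries:
        examined += 1
        if cf == family:
            hits.append((row, cf, value))
    return hits, examined


def scan_row(entries, row):
    """Like a scan bounded to one row: seek to it, read until the row ends."""
    i = bisect_left(entries, (row,))  # (row,) sorts before every entry of row
    examined = 0
    hits = []
    while i < len(entries):
        examined += 1
        if entries[i][0] != row:
            break  # first entry of the next row: stop reading
        hits.append(entries[i])
        i += 1
    return hits, examined
```

With 1000 rows, the unbounded column fetch examines all 2000 entries to return a single match, while the row-bounded scan examines only the entries of that row plus the one that ends it. Without a locality group or a row range, the cost of the column fetch grows with the size of the table rather than with the size of the result.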

Christopher wrote:
> The schema shown above doesn't quite look like it's well-suited for
> locality groups, though. The CF field looks like it's a composition of
> an attribute name and that attribute's value. To take advantage of
> locality groups with that schema, you'd have to have a locality group
> for every attribute name/value combination, which would probably not
> work well.
>
> If you want to take advantage of locality groups, you'll probably want
> to make your CFs a small, discrete set (like just attribute names).
> So, if you push the attribute value into the CQ, you could at the very
> least limit your search to the locality containing the particular
> attribute name you are searching for.
>
> If you really want efficient searches based on attribute name/value
> combinations, you're going to want to put this up the row (at the
> beginning of your row), so your data is ordered (indexed) by that. You
> could do this in a secondary index (which could be in a different
> table, a different segment of this table, or in a separate locality
> group in this table).
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser<jo...@gmail.com>  wrote:
>> Yup, that would be expected.
>>
>> Remember that doing `scan -c ...` is an unbounded search over your entire
>> table. So, it takes approximately 3 minutes to read your GUIDIndexTable.
>> Because you have a single locality group, all of the columns in your table
>> are grouped together.
>>
>> One exercise that may be interesting for yourself is to create a locality
>> group that has your specific column family in it, compact your
>> GUIDIndexTable, and rerun your `scan -c` query. The speed should be similar
>> to your exact scan. Removing the locality group and re-compacting the table
>> should return the query time back to the slow 3 minutes.
>>
>> Does that make sense?
>>
>> Daniel Ruiz wrote:
>>> Hi All,
>>>
>>> I am having an issue where column fetches are taking over a minute on
>>> 1.6.3. I don’t believe this should be case and my experience in the past
>>> supports the idea that fetches should be very fast.
>>>
>>> For example we doing a scan on the table gives results instantly but
>>> doing a scan -c vesselmmsitext=2706758566 takes 2 minutes and 44 seconds
>>> (plus or minus 1 second).
>>>
>>> Figure 1.1. Generated Test Data on GUIDIndexTable
>>>
>>> Here is the table config
>>>
>>>
>>> -----------+---------------------------------------------------------------+---------------------------------------------------------------------------------
>>>
>>> SCOPE | NAME | VALUE
>>>
>>>
>>> -----------+---------------------------------------------------------------+---------------------------------------------------------------------------------
>>>
>>> default | table.balancer ..............................................
>>> | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
>>>
>>> default | table.bloom.enabled .........................................
>>> | false
>>>
>>> default | table.bloom.error.rate ......................................
>>> | 0.5%
>>>
>>> default | table.bloom.hash.type .......................................
>>> | murmur
>>>
>>> default | table.bloom.key.functor .....................................
>>> | org.apache.accumulo.core.file.keyfunctor.RowFunctor
>>>
>>> default | table.bloom.load.threshold .................................. |
>>> 1
>>>
>>> default | table.bloom.size ............................................
>>> | 1048576
>>>
>>> default | table.cache.block.enable ....................................
>>> | false
>>>
>>> default | table.cache.index.enable ....................................
>>> | true
>>>

Re: Fetch Taking Longer Than Expected

Posted by Christopher <ct...@apache.org>.

The schema shown above doesn't quite look like it's well-suited for
locality groups, though. The CF field looks like it's a composition of
an attribute name and that attribute's value. To take advantage of
locality groups with that schema, you'd have to have a locality group
for every attribute name/value combination, which would probably not
work well.

If you want to take advantage of locality groups, you'll probably want
to make your CFs a small, discrete set (like just attribute names).
Then, if you push the attribute value into the CQ, you could at the very
least limit your search to the locality group containing the particular
attribute name you are searching for.

If you really want efficient searches based on attribute name/value
combinations, you're going to want to put that combination in the row
(at the beginning of your row), so your data is ordered (indexed) by it.
You could do this in a secondary index (which could be in a different
table, a different segment of this table, or in a separate locality
group in this table).
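
To make the indexing point concrete, here is a minimal sketch in plain
Java. It is a toy model and not Accumulo code: a TreeMap stands in for a
sorted table, and the class name, method names, and NUL separator are all
made up for illustration. It contrasts filtering every entry by column
with a bounded range lookup on an index whose keys begin with the
attribute name/value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: TreeMaps stand in for sorted tables. This is not the Accumulo API.
final class AttributeIndexSketch {
    // Hypothetical separator between key parts; NUL sorts before every other char.
    private static final char SEP = '\u0000';

    // Data "table": key = guid SEP attr=value (attribute stored after the row).
    private final NavigableMap<String, String> data = new TreeMap<>();
    // Index "table": key = attr=value SEP guid (attribute at the start of the row).
    private final NavigableMap<String, String> index = new TreeMap<>();

    void put(String guid, String attr, String value) {
        String term = attr + "=" + value;
        data.put(guid + SEP + term, "");
        index.put(term + SEP + guid, "");
    }

    // Unbounded search: every entry in the data table is visited and filtered.
    List<String> findByFullScan(String attr, String value) {
        String term = attr + "=" + value;
        List<String> guids = new ArrayList<>();
        for (String key : data.keySet()) {
            int sep = key.indexOf(SEP);
            if (key.substring(sep + 1).equals(term)) {
                guids.add(key.substring(0, sep));
            }
        }
        return guids;
    }

    // Bounded search: only keys that start with attr=value SEP are visited.
    List<String> findByIndex(String attr, String value) {
        String prefix = attr + "=" + value + SEP;
        // The char right after SEP gives an exclusive upper bound for the prefix.
        String end = attr + "=" + value + (char) (SEP + 1);
        List<String> guids = new ArrayList<>();
        for (String key : index.subMap(prefix, true, end, false).keySet()) {
            guids.add(key.substring(prefix.length()));
        }
        return guids;
    }

    public static void main(String[] args) {
        AttributeIndexSketch t = new AttributeIndexSketch();
        t.put("guid-1", "vesselmmsitext", "2706758566");
        t.put("guid-2", "vesselname", "example");
        System.out.println(t.findByIndex("vesselmmsitext", "2706758566"));
        System.out.println(t.findByFullScan("vesselmmsitext", "2706758566"));
    }
}
```

Both lookups return the same GUIDs; the difference is that the index
lookup only touches keys inside one contiguous range, while the full scan
has to visit every key in the table.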

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Wed, Aug 12, 2015 at 11:20 AM, Josh Elser <jo...@gmail.com> wrote:
> Yup, that would be expected.
>
> Remember that doing `scan -c ...` is an unbounded search over your entire
> table. So, it takes approximately 3 minutes to read your GUIDIndexTable.
> Because you have a single locality group, all of the columns in your table
> are grouped together.
>
> One exercise that may be interesting for yourself is to create a locality
> group that has your specific column family in it, compact your
> GUIDIndexTable, and rerun your `scan -c` query. The speed should be similar
> to your exact scan. Removing the locality group and re-compacting the table
> should return the query time back to the slow 3 minutes.
>
> Does that make sense?

Re: Fetch Taking Longer Than Expected

Posted by Josh Elser <jo...@gmail.com>.

Looks like we have a SetGroupsCommand (setgroups) which is analogous to 
the Java API in TableOperations, but we don't have those nice add/remove 
commands for locality groups (likely as they're harder to implement 
atomically).

Feel free to log a JIRA if you'd like to see add/remove instead of set. 
We can at least have the conversation on the matter -- there may be a 
reason I'm not aware of :)
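
In the meantime, add/remove can be approximated on the client side by
reading the current groups, editing the map, and setting it back. The
sketch below shows only the map editing, in plain Java. The class and
method names are hypothetical, and String is used where the Accumulo API
uses Text. In practice the input would presumably come from
TableOperations.getLocalityGroups and the result would be passed to
setLocalityGroups. Note that this get-then-set sequence is not atomic:
another client could change the groups in between.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch of add/remove built on a get-then-set pair. Plain Java collections only;
// String is used here where Accumulo's API uses org.apache.hadoop.io.Text.
final class LocalityGroupEdits {
    private LocalityGroupEdits() {}

    // Returns a new map with the named group added (or replaced); the input is not modified.
    static Map<String, Set<String>> addGroup(
            Map<String, Set<String>> current, String name, Set<String> families) {
        Map<String, Set<String>> next = deepCopy(current);
        next.put(name, new TreeSet<>(families));
        return next;
    }

    // Returns a new map without the named group; the input is not modified.
    static Map<String, Set<String>> removeGroup(Map<String, Set<String>> current, String name) {
        Map<String, Set<String>> next = deepCopy(current);
        next.remove(name);
        return next;
    }

    // Copies the map and each family set so callers never share mutable state.
    private static Map<String, Set<String>> deepCopy(Map<String, Set<String>> groups) {
        Map<String, Set<String>> copy = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : groups.entrySet()) {
            copy.put(e.getKey(), new TreeSet<>(e.getValue()));
        }
        return copy;
    }
}
```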

Daniel Ruiz wrote:
> Thanks for the quick response.  It does make sense, although it would be
> nice to have an addgroup and delgroup (not sure if that exists in 1.7).
> This was extremely helpful, thanks again!
>
> V/r,
> -Daniel

RE: Fetch Taking Longer Than Expected

Posted by Daniel Ruiz <da...@gmail.com>.

Thanks for the quick response.  It does make sense, although it would be
nice to have an addgroup and delgroup (not sure if that exists in 1.7).
This was extremely helpful, thanks again!

V/r,
-Daniel

-----Original Message-----
From: Josh Elser [mailto:josh.elser@gmail.com] 
Sent: Wednesday, August 12, 2015 9:20 AM
To: user@accumulo.apache.org
Subject: Re: Fetch Taking Longer Than Expected

Yup, that would be expected.

Remember that doing `scan -c ...` is an unbounded search over your 
entire table. So, it takes approximately 3 minutes to read your 
GUIDIndexTable. Because you have a single locality group, all of the 
columns in your table are grouped together.

One exercise that may be interesting for yourself is to create a 
locality group that has your specific column family in it, compact your 
GUIDIndexTable, and rerun your `scan -c` query. The speed should be 
similar to your exact scan. Removing the locality group and 
re-compacting the table should return the query time back to the slow 3 
minutes.

Does that make sense?


Re: Fetch Taking Longer Than Expected

Posted by Josh Elser <jo...@gmail.com>.

Yup, that would be expected.

Remember that doing `scan -c ...` is an unbounded search over your 
entire table. So, it takes approximately 3 minutes to read your 
GUIDIndexTable. Because you have a single locality group, all of the 
columns in your table are grouped together.

One exercise that may be interesting for you is to create a
locality group that has your specific column family in it, compact your
GUIDIndexTable, and rerun your `scan -c` query. The speed should be 
similar to your exact scan. Removing the locality group and 
re-compacting the table should return the query time back to the slow 3 
minutes.
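
As a rough illustration of why the locality group helps, here is a toy
model in plain Java. It is not Accumulo code and all names in it are made
up: entries are stored in one list per group, and a scan for a column
family only reads the list that family lives in. The groups are fixed when
the object is built, which loosely stands in for the compaction step that
rewrites the files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: counts how many stored entries a column-family scan must read.
final class LocalityGroupSketch {
    private static final String DEFAULT_GROUP = "<default>";

    // Column family -> group name; unlisted families fall into the default group.
    private final Map<String, String> groupOfFamily = new HashMap<>();
    // Group name -> entries stored together (each entry is {row, family}).
    private final Map<String, List<String[]>> partitions = new HashMap<>();

    LocalityGroupSketch(Map<String, Set<String>> groups) {
        for (Map.Entry<String, Set<String>> e : groups.entrySet()) {
            for (String family : e.getValue()) {
                groupOfFamily.put(family, e.getKey());
            }
        }
    }

    void put(String row, String family) {
        String group = groupOfFamily.getOrDefault(family, DEFAULT_GROUP);
        partitions.computeIfAbsent(group, k -> new ArrayList<>()).add(new String[] {row, family});
    }

    // Number of entries that share a partition with the requested family.
    int entriesReadFor(String family) {
        String group = groupOfFamily.getOrDefault(family, DEFAULT_GROUP);
        return partitions.getOrDefault(group, Collections.emptyList()).size();
    }
}
```

With no groups configured, every entry shares the default partition, so
a scan for one family reads all of them. Once the family has its own
group, the same scan reads only that family's entries.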

Does that make sense?
