Posted to dev@accumulo.apache.org by Suresh Prajapati <su...@gmail.com> on 2017/05/01 08:18:07 UTC
Re: Accumulo Table Scanning Taking Time!!!

Hello Marc

Thanks for pointing out the problem areas. I tried changing
*table.scan.max.memory* but didn't see any change in performance.
I am trying to fetch the count of records matching a given query using the
AccumuloDatastore (ds) stats. Here is my sample code:

public int getRideCount(Long rideId) throws Exception {
    if (rideId != null) {
        return ((Long) (ds.stats().getCount(sft, CQL.toFilter("r=" + rideId), true).get())).intValue();
    }
    return 0;
}

I also tried using an iterator, but this is even worse. Below is the sample
code:

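public int getRideCount(Long rideId) throws Exception {
    int count = 0;
    if (rideId != null) {
        Query q = new Query(tableName, CQL.toFilter("r=" + rideId));
        SimpleFeatureIterator it = sfs.getFeatures(q).features();
        try {
            // Count matching features client side by draining the iterator.
            while (it.hasNext()) {
                it.next();
                count++;
            }
        } finally {
            // Always close the iterator, even if iteration throws.
            it.close();
        }
    }
    return count;
}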
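
To answer the question of whether the 2-3 seconds is spent fetching the data or iterating over it, the loop above could be timed in two parts. The following is only a minimal sketch, not something from my client: the class ScanTiming and its drain method are hypothetical names, and it works on a plain java.util.Iterator, so a SimpleFeatureIterator would need a small adapter (assumed here, not shown).

```java
import java.util.Arrays;
import java.util.Iterator;

public class ScanTiming {

    /** Row count plus two timings, both in nanoseconds. */
    public static final class Result {
        public final long count;
        public final long nanosToFirst; // time until the first element arrived, -1 if none
        public final long nanosTotal;   // time until the iterator was exhausted

        Result(long count, long nanosToFirst, long nanosTotal) {
            this.count = count;
            this.nanosToFirst = nanosToFirst;
            this.nanosTotal = nanosTotal;
        }
    }

    /** Drains the iterator, recording time to first element and total time. */
    public static <T> Result drain(Iterator<T> it) {
        long start = System.nanoTime();
        long first = -1;
        long count = 0;
        while (it.hasNext()) {
            it.next();
            if (count == 0) {
                first = System.nanoTime() - start;
            }
            count++;
        }
        return new Result(count, first, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in data; in practice this would be the adapted feature iterator.
        Result r = drain(Arrays.asList(1, 2, 3).iterator());
        System.out.println("count=" + r.count
                + " firstMs=" + r.nanosToFirst / 1_000_000.0
                + " totalMs=" + r.nanosTotal / 1_000_000.0);
    }
}
```

A large time to first element relative to the total would point at query planning or the initial scan setup, while a small one would point at per-record transfer and iteration.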


To show the *key structure*, here is my feature type description:


r:Long:cardinality=high:index=join,*g:Point:srid=4326,di:Integer:index=join,al:Float,s:Float,b:Float,an:Float,he:Float,ve:Float,t:Float,m:Boolean,i:Boolean,ts:Long;geomesa.table.sharing='true',geomesa.indices='attr:4:3,records:2:3,z2:3:3',geomesa.table.sharing.prefix='\\u0001'


Please feel free to ask for any further clarifications.

Thank You

Suresh Prajapati

On Thu, Apr 27, 2017 at 7:05 PM, Marc P. <ma...@gmail.com> wrote:

> Suresh,
>    There are a lot of configuration points that can have an impact. For
> example, there is a configuration option that dictates how much data is
> returned each "iteration," called table.scan.max.memory [0]. Increasing
> this will cause more work to be done in each RPC call to get data. Lowering
> this can have the illusion of improved response time since you get data
> faster. Playing with this might impact your use case. If your keys/values
> are large you might attempt to increase this configuration number.
>
> Further, scanning can be impacted by the size of the data and the way it is
> stored. Table block caching might yield an improvement [1], but I'm curious
> about how the data is stored. Do you have example keys? Are you returning
> all 1 million records from Accumulo through the scanner to perform some
> logic client side or is the logic server side in an iterator? Could you do
> more work in an iterator? Iterating over 1 M keys likely won't take 2-3
> seconds when executed at the tablet server, depending on the size of the
> key. Providing some insight into what the key structure is might give us
> more insight into how to better configure your tablet server properties.
>
>    Finally, is the 2-3 seconds just the time to get the data or does that
> include time to inspect keys?
>
> [0]
> http://accumulo.apache.org/1.6/accumulo_user_manual#_table_scan_max_memory
> [1] http://accumulo.apache.org/1.6/accumulo_user_manual#_block_cache
>
> On Thu, Apr 27, 2017 at 7:09 AM, Suresh Prajapati <sureshpraja1234@gmail.com> wrote:
>
> > Hello Team
> >
> > I am developing an Accumulo client to store geospatial information, using
> > GeoMesa for indexing on top of it. However, I found that scanning *~1
> > million* records takes *2-3 sec*. I looked at the indexes and the GeoMesa
> > query plan but could not find the cause of the problem. I am running
> > Accumulo as a single tablet server (including master). I want to know:
> > what factors can affect an Accumulo scan operation, and how can I
> > optimise this time?
> >
> > Thank You
> > Suresh Prajapati
> >
>