Posted to dev@accumulo.apache.org by Drew Pierce <dr...@live.com> on 2013/05/03 18:39:58 UTC

performance

Does anyone have any anecdotal query results (nothing formal) that compare with the likes of Impala and near-low-latency querying?
Sent from my Android

Sorry if brief

RE: performance

Posted by Drew Pierce <dr...@live.com>.
True, David. I was just looking for any feedback. Thanks, btw, for the work on the stackscript.


> Date: Fri, 3 May 2013 16:53:59 -0400
> Subject: Re: performance
> From: david.medinets@gmail.com
> To: dev@accumulo.apache.org
> 
> The performance characteristics of Accumulo should be assumed to be
> awe-inspiring. Actual performance depends on too many factors for any given
> test to be meaningful to you. You won't have the same hardware. You won't
> have the same network. You won't have the same data. Your data format will
> be different. Your query patterns will be different. The list of
> differences goes on and on.

Re: performance

Posted by David Medinets <da...@gmail.com>.
The performance characteristics of Accumulo should be assumed to be
awe-inspiring. Actual performance depends on too many factors for any given
test to be meaningful to you. You won't have the same hardware. You won't
have the same network. You won't have the same data. Your data format will
be different. Your query patterns will be different. The list of
differences goes on and on.

RE: performance

Posted by Drew Pierce <dr...@live.com>.
Yeah, you know, us consultants trying to implement something :>


> From: dlmarion@comcast.net
> To: dev@accumulo.apache.org
> Date: Fri, 3 May 2013 16:08:34 -0400
> Subject: RE: performance
> 
> Pressure for adoption?

RE: performance

Posted by Dave Marion <dl...@comcast.net>.
Pressure for adoption?

Sent from my Motorola ATRIX™ 4G on AT&T

-----Original message-----
From: Drew Pierce <dr...@live.com>
To: "dev@accumulo.apache.org" <de...@accumulo.apache.org>
Sent: Fri, May 3, 2013 19:37:50 GMT+00:00
Subject: RE: performance

problem is that pressure is mounting for adoption and GA of sqrrl is some time away. 
thx

RE: performance

Posted by Drew Pierce <dr...@live.com>.
The problem is that pressure is mounting for adoption, and GA of Sqrrl is still some time away.
thx


> Date: Fri, 3 May 2013 14:36:50 -0400
> Subject: Re: peformance
> From: wilhelm.von.cloud@accumulo.net
> To: dev@accumulo.apache.org
> 
> Does sqrrl provide an example framework to play around with?

Re: performance

Posted by William Slacum <wi...@accumulo.net>.
Does sqrrl provide an example framework to play around with?


On Fri, May 3, 2013 at 2:20 PM, Adam Fuchs <af...@apache.org> wrote:

> Hey Drew,
>
> This could be a very broad question, so I'll give a partial answer and
> encourage you to come back for more details.
>
> Impala is a mechanism that sits on top of HBase or HDFS that is designed to
> filter and process large quantities of data. People generally like Impala
> because it supports a subset of SQL and because it is optimized to reduce
> the latency that might be incurred by starting up a job in a bulk
> synchronous processing framework. Instead, it uses a series of daemon
> processes and a custom API to reduce overhead.
>
> With Accumulo, our approach to low-latency queries is generally to use a
> table structure that incorporates some type of index. With appropriate
> indexing techniques, Accumulo can achieve sub-second query latencies even
> over multi-petabyte corpora. Some of these table designs are
> described in the manual:
> http://accumulo.apache.org/1.4/user_manual/Table_Design.html
>
> Regarding the SQL piece, Accumulo does not natively support an SQL
> interface. For that you would need to wrap it in a processing framework,
> like Hive (https://issues.apache.org/jira/browse/ACCUMULO-143). To make a
> shameless plug, Sqrrl (www.sqrrl.com) also offers that functionality.
>
> Cheers,
> Adam

Re: performance

Posted by Adam Fuchs <af...@apache.org>.
Hey Drew,

This could be a very broad question, so I'll give a partial answer and
encourage you to come back for more details.

Impala is a mechanism that sits on top of HBase or HDFS that is designed to
filter and process large quantities of data. People generally like Impala
because it supports a subset of SQL and because it is optimized to reduce
the latency that might be incurred by starting up a job in a bulk
synchronous processing framework. Instead, it uses a series of daemon
processes and a custom API to reduce overhead.

With Accumulo, our approach to low-latency queries is generally to use a
table structure that incorporates some type of index. With appropriate
indexing techniques, Accumulo can achieve sub-second query latencies even
over multi-petabyte corpora. Some of these table designs are
described in the manual:
http://accumulo.apache.org/1.4/user_manual/Table_Design.html
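To make the indexing idea a little more concrete, here is a minimal sketch, assuming a java.util.TreeMap as a stand-in for a sorted Accumulo table (Accumulo keys sort by row, then column family, then column qualifier). The class and method names, the '\0' key separator, and the index layout (row = field value, column family = field name, column qualifier = record ID) are illustrative assumptions, not Accumulo's client API or a layout prescribed by the manual. It contrasts answering "which records have city = Boston?" by scanning the record table versus doing one short range scan on an index table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Illustrative sketch only: a TreeMap stands in for a sorted Accumulo table.
 * Keys are encoded as "row\0family\0qualifier" so they sort by row, then
 * column family, then column qualifier. Assumes no value or ID contains '\0'.
 * This is NOT the Accumulo client API.
 */
public class IndexTableSketch {
    private static final char SEP = '\0';

    // Record table: row = record ID, column family = field name, value = field value.
    private final NavigableMap<String, String> records = new TreeMap<>();
    // Index table: row = field value, column family = field name, qualifier = record ID.
    private final NavigableMap<String, String> index = new TreeMap<>();

    private static String key(String row, String family, String qualifier) {
        return row + SEP + family + SEP + qualifier;
    }

    /** Writes a record and, alongside it, one index entry per field. */
    public void put(String id, Map<String, String> fields) {
        for (Map.Entry<String, String> f : fields.entrySet()) {
            records.put(key(id, f.getKey(), ""), f.getValue());
            index.put(key(f.getValue(), f.getKey(), id), "");
        }
    }

    /** Without an index: touch every entry of the record table. */
    public List<String> scanLookup(String field, String value) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : records.entrySet()) {
            String k = e.getKey();
            int first = k.indexOf(SEP);
            int second = k.indexOf(SEP, first + 1);
            String family = k.substring(first + 1, second);
            if (family.equals(field) && e.getValue().equals(value)) {
                hits.add(k.substring(0, first));
            }
        }
        return hits;
    }

    /** With an index: one contiguous range scan over [value\0field\0, value\0field\1). */
    public List<String> indexLookup(String field, String value) {
        String start = value + SEP + field + SEP;
        String end = value + SEP + field + (char) (SEP + 1);
        List<String> hits = new ArrayList<>();
        for (String k : index.subMap(start, true, end, false).keySet()) {
            hits.add(k.substring(start.length()));
        }
        return hits;
    }

    public static void main(String[] args) {
        IndexTableSketch t = new IndexTableSketch();
        t.put("doc1", Map.of("city", "Boston", "type", "store"));
        t.put("doc2", Map.of("city", "Denver", "type", "store"));
        t.put("doc3", Map.of("city", "Boston", "type", "office"));
        System.out.println(t.indexLookup("city", "Boston")); // [doc1, doc3]
        System.out.println(t.scanLookup("city", "Boston"));  // [doc1, doc3]
    }
}
```

On a real cluster the index lookup would roughly correspond to a Scanner restricted to a single Range, so its cost tracks the number of matching entries rather than the size of the table. The trade-off is an extra write (and extra storage) per indexed field at ingest time.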

Regarding the SQL piece, Accumulo does not natively support an SQL
interface. For that you would need to wrap it in a processing framework,
like Hive (https://issues.apache.org/jira/browse/ACCUMULO-143). To make a
shameless plug, Sqrrl (www.sqrrl.com) also offers that functionality.

Cheers,
Adam



On Fri, May 3, 2013 at 12:39 PM, Drew Pierce <dr...@live.com> wrote:

> does anyone have any anecdotal results (nothing formal) for queries to
> speak to the likes of impala and near low-latency.
> Sent from my Android
>
> Sorry if brief
>
>

Re: performance

Posted by Josh Elser <jo...@gmail.com>.
You could look at some of the results Eric wrote up about "realtime" 
searches over Wikipedia archives. This code is included with the Accumulo
source, so you can take a look at it.

This was a long time ago, and the code could likely be improved, but it 
should give you some general ideas of what an iterator framework can
provide for low-latency results.

http://accumulo.apache.org/example/wikisearch.html
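As far as I understand it, the wikisearch example builds on a document-partitioned (sharded) index of the kind described in the Table Design section of the manual: roughly, each row is a shard, the column family is a term, and the column qualifier is a document ID, and an iterator intersects the query terms inside each shard on the tablet servers. The standalone sketch below shows only that intersection step, assuming each term's document IDs within one shard are already available as a sorted list. It is not Accumulo's IntersectingIterator or the wikisearch code; the class name, method name, and data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, standalone sketch of intersecting sorted posting lists
 * (term -> sorted doc IDs) within a single shard. Each list is advanced
 * ("seeked") toward the largest current doc ID until all lists agree.
 * Not Accumulo's IntersectingIterator or its API.
 */
public class ShardIntersectSketch {

    /** Returns the doc IDs present in every sorted posting list. */
    public static List<String> intersect(List<List<String>> postings) {
        List<String> result = new ArrayList<>();
        if (postings.isEmpty()) return result;
        int[] pos = new int[postings.size()];
        while (true) {
            // Find the largest current doc ID across all terms.
            String max = null;
            for (int i = 0; i < postings.size(); i++) {
                if (pos[i] >= postings.get(i).size()) return result; // a list is exhausted
                String cur = postings.get(i).get(pos[i]);
                if (max == null || cur.compareTo(max) > 0) max = cur;
            }
            // Advance every list that is behind the max; check whether all now match.
            boolean allEqual = true;
            for (int i = 0; i < postings.size(); i++) {
                List<String> list = postings.get(i);
                while (pos[i] < list.size() && list.get(pos[i]).compareTo(max) < 0) pos[i]++;
                if (pos[i] >= list.size()) return result;
                if (!list.get(pos[i]).equals(max)) allEqual = false;
            }
            if (allEqual) {
                // Every term contains this document: emit it and move all lists on.
                result.add(max);
                for (int i = 0; i < pos.length; i++) pos[i]++;
            }
        }
    }

    public static void main(String[] args) {
        // Illustrative posting lists for one shard.
        List<String> apache = Arrays.asList("d01", "d04", "d07", "d09");
        List<String> hadoop = Arrays.asList("d02", "d04", "d09");
        List<String> search = Arrays.asList("d04", "d05", "d09", "d12");
        System.out.println(intersect(Arrays.asList(apache, hadoop, search))); // [d04, d09]
    }
}
```

Because each shard can be intersected independently, a query in that design can fan out across tablet servers, with only the matching document IDs returned to the client rather than whole posting lists.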

On 5/3/13 12:39 PM, Drew Pierce wrote:
> does anyone have any anecdotal results (nothing formal) for queries to speak to the likes of impala and near low-latency.
> Sent from my Android
>
> Sorry if brief
>
>