Posted to user@drill.apache.org by arorav <ar...@uw.edu> on 2014/10/24 19:35:29 UTC

New to Drill

Hi All

I am new to Drill and have a few questions:

1. Drill is an incubating project. Is anyone using it for production applications?

2. Data repository recommendation: My source data is relational and I want to run complex ad hoc queries involving joins and aggregates. Is there a recommended data repository for better performance?

3. Encrypted data: Does Drill work against encrypted data? Any documentation on this would be helpful.

4. Concurrent queries: I expect hundreds of users to run queries against the Drill query engine. Is there any limit on the number of queries running against Drill?

All help would be appreciated.

Thanks


Vijay Arora


Re: New to Drill

Posted by Jason Altekruse <al...@gmail.com>.
Hi Vijay,

Welcome to the Drill community! These are common questions; my answers are
inline below.

> 1.       Drill is an incubation project, Is anyone using for production
> applications?

There are a number of organizations that are starting to deploy Drill for
evaluation purposes. These deployments are querying large amounts of real
data, but as far as we know there are no real production deployments today.
Large portions of Drill are well tested, but we are still working on
remaining bugs as we continue to work towards a stable 1.0 release.

> 2.       Data repository recommendation:  I have the source data as
> relational and want to perform complex adhoc queries involving joins and
> aggregates. Any recommendation on the data repository for better
> performance.

The fastest format we currently support is Parquet, a columnar file format
currently under incubation at Apache. We also support JSON, delimited text,
and any format with a Hive SerDe available, but this read path has not been
optimized as much.

> 3.       Encrypted Data:  Does drill works against encrypted data? Any
> documentation around it would be helpful?

As far as I know there is no support for encrypted data.

> 4.       Concurrent Queries: As I expect 100s of users running against the
> drill query engine. Is there any limitation on number of queries running
> against drill?

There is no strict limit on the number of users or queries that can be run at
any time. Drill's architecture is designed to be highly scalable. There is
no single bottleneck to limit the number of concurrent connections, as any
node in a Drill cluster can act as the head node for a query. Different
clients can connect to different nodes to spread the query planning burden
throughout the cluster. Obviously the physical operators are also spread
around the cluster, and we are actively working on better management of
memory for individual fragments of execution plans to allow for more
concurrent queries to run with limited resources.
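
To make the point about spreading clients across nodes concrete, here is a
minimal sketch in Python. It is only an illustration of the idea, not Drill's
client API: the node names and the assign_coordinators helper are
hypothetical, and a real deployment would normally rely on the client's own
connection mechanism rather than hand-rolled code like this.

```python
from itertools import cycle

# Hypothetical node names; in the description above, any node in the
# cluster can act as the head node for a query.
DRILLBITS = ["drillbit-1", "drillbit-2", "drillbit-3"]


def assign_coordinators(client_ids, drillbits):
    """Assign each client a coordinating node in round-robin order."""
    nodes = cycle(drillbits)
    return {client: next(nodes) for client in client_ids}


assignments = assign_coordinators(["alice", "bob", "carol", "dave"], DRILLBITS)
for client, node in assignments.items():
    print(f"{client} plans queries on {node}")
```

With four clients and three nodes, the fourth client wraps around to the
first node, so no single node handles all of the query planning.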

Re: New to Drill

Posted by Ted Dunning <te...@gmail.com>.
On Fri, Oct 24, 2014 at 10:35 AM, arorav <ar...@uw.edu> wrote:

> 1.       Drill is an incubation project, Is anyone using for production
> applications?
>

Any such production work would be very early.  I don't know for sure of any
such work, but I have heard of several projects that were near production
status.

This should change significantly over the next quarter or so.

> 2.       Data repository recommendation:  I have the source data as
> relational and want to perform complex adhoc queries involving joins and
> aggregates. Any recommendation on the data repository for better
> performance.
>

That depends a lot on the sizes of what you are working on.

If you add more details, it would be easier to help, but the basics still
apply.  If broadcast joins are practical, then storing everything in flat
files is probably the best since you will just be scanning the fact table.
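
As a rough illustration of why a broadcast join only needs one scan of the
fact table, here is a small single-process sketch in Python. It is not how
Drill implements joins internally; the table contents, column names, and the
broadcast_hash_join helper are all made up for the example. The small table
is loaded into a hash map (in a cluster, a copy would be sent to every node)
and the large table is streamed past it once.

```python
from collections import defaultdict


def broadcast_hash_join(fact_rows, dim_rows, fact_key, dim_key):
    """Join by hashing the small table and scanning the large one once."""
    # Build phase: hold the small (dimension) table in memory, keyed by
    # the join column.
    lookup = defaultdict(list)
    for dim in dim_rows:
        lookup[dim[dim_key]].append(dim)

    # Probe phase: a single pass over the fact table.
    joined = []
    for fact in fact_rows:
        for dim in lookup.get(fact[fact_key], []):
            joined.append({**dim, **fact})
    return joined


# Made-up example data: a small dimension table and a larger fact table.
customers = [
    {"cust_id": 1, "region": "west"},
    {"cust_id": 2, "region": "east"},
]
orders = [
    {"order_id": 10, "cust_id": 1, "amount": 25.0},
    {"order_id": 11, "cust_id": 2, "amount": 40.0},
    {"order_id": 12, "cust_id": 1, "amount": 15.0},
]

joined = broadcast_hash_join(orders, customers, "cust_id", "cust_id")

# Aggregate the joined rows by region.
totals = defaultdict(float)
for row in joined:
    totals[row["region"]] += row["amount"]
```

The approach is only practical when the small side fits comfortably in
memory on each node, which is why the table sizes matter so much.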


>
> 3.       Encrypted Data:  Does drill works against encrypted data? Any
> documentation around it would be helpful?
>

Drill has no specific capabilities to support encryption.  If encryption
means data at rest, Drill will not care.  If encryption means that you are
using consistent masking, then Drill may work without ever knowing what the
data is.
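
To show what consistent masking buys you, here is a minimal Python sketch. It
assumes the masking is a keyed hash (HMAC-SHA256) applied before the data is
stored; that choice, the key, and the sample values are assumptions for
illustration and not something Drill provides. Because equal inputs always
produce equal tokens, grouping and joining on the masked column still work
even though the query engine never sees the original values.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would be managed
# outside the query engine.
SECRET_KEY = b"example-key-not-for-real-use"


def mask(value: str) -> str:
    """Deterministically mask a value: equal inputs give equal tokens."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Made-up data: (email, visit count) pairs, masked before storage.
visits = [
    ("alice@example.com", 3),
    ("bob@example.com", 5),
    ("alice@example.com", 4),
]
masked_visits = [(mask(email), count) for email, count in visits]

# Group by the masked token, as a query engine would on the stored data.
totals = {}
for token, count in masked_visits:
    totals[token] = totals.get(token, 0) + count
```

Operations that depend on the real values, such as range filters or
substring matches, would not work on tokens produced this way.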

At this time, Drill does not encrypt wire transmissions.  If you care
enough to encrypt, you may care about this.  That will change before long.


>
> 4.       Concurrent Queries: As I expect 100s of users running against the
> drill query engine. Is there any limitation on number of queries running
> against drill?
>

There is no specific limit, but there will be practical limits.  I don't
think that the boundaries are well tested here yet.