Posted to user@drill.apache.org by Sebastian Fischmeister <sf...@uwaterloo.ca> on 2020/11/22 17:07:09 UTC

Querying multiple s3 buckets

Hi,

I set up Drill with min.io and have multiple buckets in MinIO. I was able to set up an S3 connection to a single bucket; however, I actually want to run a query on all files in all buckets. Is this possible?

Regards,
  Sebastian

Re: Querying multiple s3 buckets

Posted by Sebastian Fischmeister <sf...@uwaterloo.ca>.
But this means that (1) the names have to be known upfront and (2) the list of buckets remains static.

I'm looking for a method to dynamically search all buckets.

Now, I could use the REST API to dynamically create storage plugin configurations after listing all buckets, but that pushes the responsibility to the user. It would be nicer to specify a wildcard in the connection (e.g., "connection" = "s3a://foo*" or "s3a://*") and have Drill internally search for matching buckets and then traverse them.

  Sebastian
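
The REST-based workaround described above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names (plugin_name, plugin_payload, payloads_for), the plugin naming scheme, and the bucket names are all hypothetical, the bucket list is assumed to come from a separate MinIO/S3 listing step, and credentials and the S3 endpoint setting are left out. The code only builds the JSON bodies; it does not contact Drill.

```python
import fnmatch
import json


def plugin_name(bucket):
    """Derive a storage plugin name from a bucket name (hypothetical scheme)."""
    return "s3_" + bucket.replace("-", "_").replace(".", "_")


def plugin_payload(bucket):
    """Build a JSON body for one file-type storage plugin pointing at a bucket."""
    return {
        "name": plugin_name(bucket),
        "config": {
            "type": "file",
            "enabled": True,
            "connection": "s3a://" + bucket,
            "workspaces": {"root": {"location": "/", "writable": False}},
            "formats": {"parquet": {"type": "parquet"}},
        },
    }


def payloads_for(buckets, pattern="*"):
    """Keep buckets matching a shell-style wildcard and build one payload each."""
    return [
        plugin_payload(b)
        for b in sorted(buckets)
        if fnmatch.fnmatchcase(b, pattern)
    ]


if __name__ == "__main__":
    # Made-up bucket names; a real list would come from listing the buckets.
    for payload in payloads_for(["foo-logs", "foo-data", "bar"], "foo*"):
        # Assumption: each body would be POSTed to /storage/<name>.json
        # on the Drill web port; that step is intentionally not shown.
        print(json.dumps(payload))
```

The list of buckets still has to be refreshed by whoever runs the script, so this does not remove the drawback raised above; it only automates the per-bucket configuration.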

Nitin Pawar <ni...@gmail.com> writes:

> You can create multiple storage plugins, each with a different name and
> pointing to one bucket, and query them in a single query.
>
> On Sun, Nov 22, 2020 at 10:38 PM Sebastian Fischmeister <
> sfischme@uwaterloo.ca> wrote:
>
>> Hi,
>>
>> I set up drill with min.io and have multiple buckets in minio. I was able
>> to setup an s3 connection to a single bucket, however, I actually want to
>> run a query on all files in all buckets. Is this possible?
>>
>> Regards,
>>   Sebastian
>>
>
>
> -- 
> Nitin Pawar

Re: Querying multiple s3 buckets

Posted by Sebastian Fischmeister <sf...@uwaterloo.ca>.
Since s3n://test*/ doesn't seem to be easily feasible, how do you combine a query on multiple s3 buckets into a single one?

For example, I essentially want to execute 'find / -name "test*_r*.parquet"' on a set of S3 buckets:

select fqn as f from ??? where regexp_matches(f,".*test.*_r.*.parquet");

Regards,
  Sebastian

Re: Querying multiple s3 buckets

Posted by Nitin Pawar <ni...@gmail.com>.
You can create multiple storage plugins, each with a different name and
pointing to one bucket, and query them in a single query.
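
One way to read this suggestion is a UNION ALL over one SELECT per storage plugin. The sketch below only assembles such a statement as a string. The function name union_query, the plugin names, the "root" workspace, and the table path are hypothetical, and it assumes Drill's implicit fqn column and the regexp_matches function used elsewhere in this thread behave as expected against each bucket. It has not been run against a Drill cluster.

```python
def union_query(plugins, file_regex, workspace="root", path="/"):
    """Return one SQL statement listing matching file names across plugins."""
    # Double any single quote so the pattern stays a valid SQL string literal.
    escaped = file_regex.replace("'", "''")
    selects = [
        "SELECT DISTINCT fqn FROM `{}`.`{}`.`{}` "
        "WHERE regexp_matches(fqn, '{}')".format(p, workspace, path, escaped)
        for p in plugins
    ]
    return "\nUNION ALL\n".join(selects) + ";"


if __name__ == "__main__":
    # Hypothetical plugin names, one per bucket.
    print(union_query(["s3_foo_data", "s3_foo_logs"], ".*test.*_r.*[.]parquet"))
```

The pattern uses [.] rather than a backslash-escaped dot so that no additional escaping is needed inside the SQL string literal.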

On Sun, Nov 22, 2020 at 10:38 PM Sebastian Fischmeister <
sfischme@uwaterloo.ca> wrote:

> Hi,
>
> I set up drill with min.io and have multiple buckets in minio. I was able
> to setup an s3 connection to a single bucket, however, I actually want to
> run a query on all files in all buckets. Is this possible?
>
> Regards,
>   Sebastian
>


-- 
Nitin Pawar