Posted to user@drill.apache.org by Ludovic Claude <lu...@gmail.com> on 2016/05/10 20:43:39 UTC

Filtering data files in directories

Hello,

I have a repository of files relatively well organised and containing a 
mix of medical images and csv files produced from those images in a 
neuroscience lab.

The csv files contain some interesting data that I would like to 
aggregate with Drill, but the naming convention is quite special - file 
names contain some id, then a prefix or suffix to identify the category 
of the file and all that is nested into a folder structure organised by 
subjects, for example ID1/processing1/ID1-mx.csv.

How can I use Drill to filter out the files that I do not need and keep 
only the files containing my data?

For example, I would like to write something like

SELECT * FROM dfs.data.`/` where dir1 = 'processing1' and file like 
'%-mx.csv';


Thanks

Re: Filtering data files in directories

Posted by François Méthot <fm...@gmail.com>.

As Ted mentioned, here is an example using wildcards:

SELECT * FROM dfs.data.`/*/processing1/*-mx.csv`
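
For a form closer to the WHERE clause in the original question, something like the following may work. This is only a sketch under a few assumptions: the `dfs.data` workspace root is the folder holding the subject directories (so `dir0` would be the subject ID and `dir1` the processing folder), and the Drill release in use exposes the implicit `filename` column, which as far as I know only arrived in later releases (1.8 onwards). String literals take single quotes in Drill.

```sql
-- Sketch only: assumes files are laid out as <subject>/<processing>/<file>
-- directly under the dfs.data workspace root, e.g. ID1/processing1/ID1-mx.csv.
-- dir0 and dir1 are Drill's implicit directory-level columns; filename is an
-- implicit column that may not exist in older Drill releases.
SELECT *
FROM dfs.data.`/`
WHERE dir1 = 'processing1'
  AND filename LIKE '%-mx.csv';
```

The wildcard path restricts which files are selected in the first place, so it is probably the safer choice when the tree also holds image files; how Drill treats those non-csv files when the whole tree is scanned is worth checking on your setup.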



On Tue, May 10, 2016 at 5:28 PM, Ted Dunning <te...@gmail.com> wrote:

> Can you just use wild cards?

Re: Filtering data files in directories

Posted by Ted Dunning <te...@gmail.com>.

Can you just use wildcards?


