Posted to user@drill.apache.org by Ludovic Claude <lu...@gmail.com> on 2016/05/10 20:43:39 UTC
Filtering data files in directories
Hello,
I have a repository of reasonably well-organised files containing a
mix of medical images and CSV files produced from those images in a
neuroscience lab.
The CSV files contain data that I would like to aggregate with Drill,
but the naming convention is unusual: each file name contains an ID
plus a prefix or suffix that identifies the file's category, and the
files are nested in a folder structure organised by subject, for
example ID1/processing1/ID1-mx.csv.
How can I use Drill to filter out the files that I do not need and keep
only the files containing my data?
For example, I would like to write something like
SELECT * FROM dfs.data.`/` WHERE dir1 = 'processing1' AND file LIKE
'%-mx.csv';
Thanks
Re: Filtering data files in directories
Posted by François Méthot <fm...@gmail.com>.
As Ted mentioned, you can use wildcards in the path. Here is an example:
SELECT * FROM dfs.data.`/*/processing1/*-mx.csv`
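For comparison, the WHERE-clause form from the original question might look like the sketch below. It is untested and rests on two assumptions: that the Drill version in use exposes the implicit `filename` column (to my knowledge this arrived in a release later than this thread), and that `dfs.data` is the workspace named in the question. The `dir0`, `dir1`, ... columns are the ones Drill provides for directory levels below the queried path.

```sql
-- Sketch only, not verified against a running Drill instance.
-- Assumes the dfs.data workspace from the question and a Drill version
-- that provides the implicit `filename` column.
SELECT *
FROM dfs.data.`/`
WHERE dir1 = 'processing1'        -- second directory level, e.g. ID1/processing1
  AND filename LIKE '%-mx.csv';   -- keep only files ending in -mx.csv
```

The wildcard path above is likely the safer choice when the tree also holds non-CSV files such as images, since only matching files are selected in the first place.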
On Tue, May 10, 2016 at 5:28 PM, Ted Dunning <te...@gmail.com> wrote:
> Can you just use wild cards?
Re: Filtering data files in directories
Posted by Ted Dunning <te...@gmail.com>.
Can you just use wild cards?