Posted to user@drill.apache.org by Jacques Nadeau <ja...@dremio.com> on 2015/08/05 04:18:22 UTC

Re: Drill Hangout (2015-08-04) minutes

Quick thought on insert isolation:

Let's just do a hidden directory and then rename.  We can make Drill avoid
reading hidden directories.  No fancy work required.
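
As a rough illustration of the hidden-directory-then-rename idea, here is a minimal Python sketch. It is not Drill code: the `.staging_` prefix, the helper names, and the rule that names starting with "." or "_" are hidden are all assumptions made for the example. It also relies on the rename being atomic, which holds for a same-filesystem rename on POSIX but may not hold on every storage system.

```python
from __future__ import annotations

import os
from pathlib import Path


def is_hidden(name: str) -> bool:
    # Assumed convention for this sketch: "." and "_" prefixes mean "skip me".
    return name.startswith((".", "_"))


def visible_files(table_dir: Path) -> list[str]:
    """List the data files a reader would see, skipping hidden entries."""
    found = []
    for root, dirs, files in os.walk(table_dir):
        # Prune hidden directories in place so os.walk never descends into them.
        dirs[:] = sorted(d for d in dirs if not is_hidden(d))
        for name in sorted(files):
            if not is_hidden(name):
                found.append((Path(root) / name).relative_to(table_dir).as_posix())
    return found


def insert_batch(table_dir: Path, batch_name: str, files: dict[str, bytes]) -> None:
    """Write a batch into a hidden staging directory, then publish it by rename."""
    staging = table_dir / f".staging_{batch_name}"  # hypothetical naming scheme
    staging.mkdir()
    for name, payload in files.items():
        (staging / name).write_bytes(payload)
    # Readers see either nothing or the whole batch, never a partial write,
    # provided the rename is atomic on the underlying filesystem.
    os.rename(staging, table_dir / batch_name)
```

A reader that calls `visible_files` while `insert_batch` is still writing sees only the previously published files, because the staging directory is skipped until the rename.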

With regard to Dot Drill, let's not turn this into a mini database.  The
complexities would be overwhelming.  My recommendation is we constrain it to
additional metadata that cannot otherwise be divined.  Beyond that, we
should use ephemeral files (similar to the parquet metadata cache, where
deleting doesn't impact the logical outcome but may impact planning or
performance).  I would avoid mixing ephemeral and persistent data around
the dot drill concept.
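
To make the "ephemeral" property concrete, here is a small Python sketch of a cache file that can be deleted at any time without changing the answer, only the cost of getting it. Everything in it is assumed for illustration: the `.schema_cache.json` name, the JSON data files, and the name-plus-size fingerprint are not how Drill's parquet metadata cache is actually implemented.

```python
from __future__ import annotations

import json
from pathlib import Path

CACHE_NAME = ".schema_cache.json"  # hypothetical file name, not a real Drill file


def data_files(table_dir: Path) -> list[Path]:
    # Hidden files (including the cache itself) are never treated as data.
    return sorted(f for f in table_dir.glob("*.json") if not f.name.startswith("."))


def fingerprint(table_dir: Path) -> list[list]:
    # Cheap staleness check; a same-size in-place rewrite would go unnoticed.
    return [[f.name, f.stat().st_size] for f in data_files(table_dir)]


def scan_schema(table_dir: Path) -> list[str]:
    """Slow path: read every data file and merge the column names."""
    columns: set[str] = set()
    for f in data_files(table_dir):
        for record in json.loads(f.read_text()):
            columns.update(record)
    return sorted(columns)


def merged_schema(table_dir: Path) -> list[str]:
    """Return the merged schema, using the cache only when it is usable."""
    cache = table_dir / CACHE_NAME
    current = fingerprint(table_dir)
    try:
        cached = json.loads(cache.read_text())
        if cached["fingerprint"] == current:
            return cached["columns"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, unreadable, or malformed cache: fall back to a scan
    columns = scan_schema(table_dir)
    cache.write_text(json.dumps({"fingerprint": current, "columns": columns}))
    return columns
```

Deleting the cache file between two calls to `merged_schema` yields the same result both times; the second call simply pays for a full scan again.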

In general, if we want to store Drill's internal ephemeral metadata, let's
have a discussion around the options.  Also remember that not all Drill
installations will use a distributed filesystem.  As such, we need to think
about these abstractions to support multiple types of storage systems.
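
One possible shape for such an abstraction, sketched in Python purely as a starting point for discussion: a small key/value interface for ephemeral metadata with interchangeable backends. The interface and both class names are hypothetical and do not correspond to any existing Drill API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional


class EphemeralStore(ABC):
    """Hypothetical interface: callers must tolerate any key disappearing."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class LocalDirStore(EphemeralStore):
    """Backend for installations with an ordinary or distributed filesystem."""

    def __init__(self, root: Path) -> None:
        self.root = root
        root.mkdir(parents=True, exist_ok=True)

    def get(self, key: str) -> Optional[bytes]:
        path = self.root / key
        return path.read_bytes() if path.exists() else None

    def put(self, key: str, value: bytes) -> None:
        (self.root / key).write_bytes(value)

    def delete(self, key: str) -> None:
        (self.root / key).unlink(missing_ok=True)  # Python 3.8+


class InMemoryStore(EphemeralStore):
    """Backend for storage systems that offer no writable filesystem."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._items.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value

    def delete(self, key: str) -> None:
        self._items.pop(key, None)
```

Because a missing key is always a valid answer, code written against `EphemeralStore` behaves the same whichever backend is plugged in.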



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Aug 4, 2015 at 2:24 PM, Khurram Faraaz <kf...@maprtech.com> wrote:

> Drill Hangout 2015-08-04
>
> Attendees: Daniel, Khurram, Neeraja, Vicky, Kris, Aman, Parth, Andries,
> Jinfeng, Anas
>
> - Insert and drop
>
> - read isolation during insert into, Aman suggested snapshot level
>
> - have to have some kind of lock manager
>
> - locking on the dot drill file
>
> - should this locking also apply to external programs working with data
> used by Drill?
>
>     - Jason - why is this lock necessary?
>
>     - we want to merge schemas in a dot drill file, to avoid gathering
> schemas from a lot of separate files
>
> - insert feature will be broken into phases
>
>     - this needs to handle schema changes to be consistent with the rest of
> Drill
>
> - partition pruning is not working for some expressions
>
> - we will only fix for cast
>
> - Jinfeng thinks this should be easy enough
>
> - handling unknown types in parquet or other external systems
>
> - should we fail actively, or should we give data back in varbinary
>
> - should people have to wait for a release to handle new data types
>
> - storage plugin writers should have a clear idea about how to handle these
> cases
>
> - Jason will send a message to the list about this
>
>
> - test framework
>
> - Rahul is working on publishing it to a public repository
>
> - this will include instructions on how to set up the tests on your own
> hardware
>

Re: Drill Hangout (2015-08-04) minutes

Posted by Aman Sinha <as...@maprtech.com>.
The hangout notes refer to a dot drill file, but I think that may be either a
misrepresentation or a misstatement during the hangout.  For the INSERT
discussion, the best source is the separate thread titled '[DISCUSS] Insert
Into Table Support'.  In fact, we are intending to keep the merged schema
in the parquet metadata cache file (which is different from dot drill).
Let's discuss the issues around concurrency (inserts concurrent with reads)
in the other email thread.

Aman

On Tue, Aug 4, 2015 at 7:18 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Quick thought on insert isolation:
>
> Let's just do a hidden directory and then rename.  We can make Drill avoid
> reading hidden directories.  No fancy work required.
>
> With regard to Dot Drill, let's not turn this into a mini database.  The
> complexities would be overwhelming.  My recommendation is we constrain it to
> additional metadata that cannot otherwise be divined.  Beyond that, we
> should use ephemeral files (similar to the parquet metadata cache, where
> deleting doesn't impact the logical outcome but may impact planning or
> performance).  I would avoid mixing ephemeral and persistent data around
> the dot drill concept.
>
> In general, if we want to store Drill's internal ephemeral metadata, let's
> have a discussion around the options.  Also remember that not all Drill
> installations will use a distributed filesystem.  As such, we need to think
> about these abstractions to support multiple types of storage systems.
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Tue, Aug 4, 2015 at 2:24 PM, Khurram Faraaz <kf...@maprtech.com>
> wrote:
>
> > Drill Hangout 2015-08-04
> >
> > Attendees: Daniel, Khurram, Neeraja, Vicky, Kris, Aman, Parth, Andries,
> > Jinfeng, Anas
> >
> > - Insert and drop
> >
> > - read isolation during insert into, Aman suggested snapshot level
> >
> > - have to have some kind of lock manager
> >
> > - locking on the dot drill file
> >
> > - should this locking also apply to external programs working with data
> > used by Drill?
> >
> >     - Jason - why is this lock necessary?
> >
> >     - we want to merge schemas in a dot drill file, to avoid gathering
> > schemas from a lot of separate files
> >
> > - insert feature will be broken into phases
> >
> >     - this needs to handle schema changes to be consistent with the rest
> > of Drill
> >
> > - partition pruning is not working for some expressions
> >
> > - we will only fix for cast
> >
> > - Jinfeng thinks this should be easy enough
> >
> > - handling unknown types in parquet or other external systems
> >
> > - should we fail actively, or should we give data back in varbinary
> >
> > - should people have to wait for a release to handle new data types
> >
> > - storage plugin writers should have a clear idea about how to handle
> > these cases
> >
> > - Jason will send a message to the list about this
> >
> >
> > - test framework
> >
> > - Rahul is working on publishing it to a public repository
> >
> > - this will include instructions on how to set up the tests on your own
> > hardware
> >
>
