Posted to dev@drill.apache.org by Philip Haynes <ph...@virtualnation.com.au> on 2012/10/01 12:35:10 UTC

Re: What language are you going to use to develop drill?

I actually said "transactional consistency to atomic clock accuracy"
rather than "global transactions" - there is a difference.  The key
assumption to scope is whether Dremel tables are mutable or not.
Whilst BigQuery tables are immutable, it isn't clear (to me anyway) that
this is true for Dremel.

If it is assumed Drill tables are mutable, then as records are added to
tables, you would want to support query idempotence over a given time
range (i.e. the same query returns the same result).  If so, records need
to be timestamped at a precision sufficient to allow record
differentiation and queries to terminate. Given Dremel supports
interactive query of trillions of records and scans 10^15 records per
month, Unix timestamp precision isn't good enough.
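As a rough back-of-the-envelope check on the precision claim above, the
sketch below assumes the 10^15 records per month are spread evenly over a
30-day month. The figure quoted is a scan rate rather than an ingest rate,
so this is only indicative, and the variable names are illustrative.

```python
# Assumption: 10^15 records per month, spread evenly over a 30-day month.
RECORDS_PER_MONTH = 10**15
SECONDS_PER_MONTH = 30 * 24 * 3600

# Records that would share a single timestamp value at each precision.
records_per_second = RECORDS_PER_MONTH / SECONDS_PER_MONTH
records_per_nanosecond = records_per_second / 1e9

print(f"{records_per_second:.3e} records per one-second tick")
print(f"{records_per_nanosecond:.3f} records per one-nanosecond tick")
```

Under these assumptions a one-second timestamp would be shared by
hundreds of millions of records, whereas a nanosecond-resolution
timestamp would be shared by fewer than one record on average.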

From what I can tell, given Google's data processing volumes, you are
pushing the limits at atomic clock accuracy, hence the TrueTime API
mentioned in the Spanner paper.
 
    

Given the above my thoughts were:
1. Drill relies upon system clocks to maintain transaction consistency.
2. Drill records are PTP timestamped on creation.
3. Drill records are immutable.
4. Drill queries are time ranged.
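
The four points above can be sketched roughly as follows. This is a
minimal illustration only, not a proposed Drill API: the names Record,
AppendOnlyTable, ts_ns and query are hypothetical, and integer
nanosecond timestamps stand in for PTP timestamps.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)  # frozen: a record cannot be changed after creation
class Record:
    ts_ns: int    # creation timestamp in nanoseconds (assumed unit)
    payload: str


class AppendOnlyTable:
    """Hypothetical table that only supports appends and time-ranged reads."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def append(self, ts_ns: int, payload: str) -> None:
        self._records.append(Record(ts_ns, payload))

    def query(self, start_ns: int, end_ns: int) -> List[str]:
        # Every query carries a half-open time range [start_ns, end_ns).
        return [r.payload for r in self._records
                if start_ns <= r.ts_ns < end_ns]


table = AppendOnlyTable()
table.append(100, "a")
table.append(200, "b")
first = table.query(0, 250)

table.append(300, "c")        # a later append, outside the queried range
second = table.query(0, 250)  # the same query returns the same result
```

The repeatability only holds if no record is later appended with a
timestamp inside an already queried range, which is where the clock
accuracy question comes in.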

If it is agreed that mutable tables are in scope, the specific scope
implication for Drill in the near to medium term is the addition of
record timestamps and additional hidden time modifiers on queries. It
would seem to me this is quite a modest burden versus the more
restrictive assumption that Drill tables are immutable.

On the other hand, over the past couple of days I have been looking deeper
into some of the implications for precision timing of DHT operations
across a global network. There appear to be at least one or two nuances
to the problem ;-). Whilst I think this is essential for Drill to process
Dremel volumes, it is not today's problem either.


Thus Drill tables - Mutable or immutable?

Kind Regards,
Philip

 



On 29/09/12 11:05 AM, "Ted Dunning" <te...@gmail.com> wrote:

>"we need" != "in scope"
>
>We need faster disks.  Not in scope for drill.
>
>We need file systems that work with SSD's.  Not in scope for drill.
>
>We need global transactions.  Same story.
>
>Remember, Drill is intended to emulate Dremel.  Dremel has no updates
>whatsoever.
>
>On Fri, Sep 28, 2012 at 9:03 PM, Azuryy Yu <az...@gmail.com> wrote:
>
>> It's absolutely in the Drill scope, Ryan, there will be a scenario such
>>as
>> some one is appending data and others are concurrently doing query. so
>>we
>> always need global transactions.
>>
>>
>> On Sat, Sep 29, 2012 at 8:58 AM, Ryan Rawson <ry...@gmail.com> wrote:
>>
>> > its a little out of scope of the drill project scope...
>> >
>> > On Fri, Sep 28, 2012 at 5:42 PM, Ted Dunning <te...@gmail.com>
>> > wrote:
>> > > Don't look at me.
>> > >
>> > > I think that the concept of global transactions is absolutely
>>awesome.
>> > >
>> > > I just don't think it is the first order of business for Drill.
>> > >
>> > > If you want to build it, however, I would be absolutely thrilled.  I
>> just
>> > > can't help with the building right now.
>> > >
>> > > On Fri, Sep 28, 2012 at 8:33 PM, Philip Haynes <
>> > > philip.haynes@virtualnation.com.au> wrote:
>> > >
>> > >> I think perhaps people may have been too quick to dismiss the
>>concept.
>> > >>
>> >
>>


Re: What language are you going to use to develop drill?

Posted by Ted Dunning <te...@gmail.com>.
See http://www.apache.org/foundation/how-it-works.html#management,
especially Decision Making.  This is a lazy consensus question.  3 x +1's
with no -1's would pass this.  Our current state is that time based
queries are not in scope (based on recent conversation).

On Mon, Oct 1, 2012 at 11:35 AM, Philip Haynes <
philip.haynes@virtualnation.com.au> wrote:

> Thus Drill tables - Mutable or immutable?
>

I vote that we not increase the scope at this time to include time based
queries.  That is the question actually being called and it fails due to
lack of consensus.

Re: What language are you going to use to develop drill?

Posted by Philip Haynes <ph...@virtualnation.com.au>.
Not sure I entirely agree.

I could have a set of log files from a number of application servers.
Whilst the records are read only, the files are continuously appended to.

Taking a snapshot of the log files and preprocessing them each time you
want to run a set of queries is one design option; incremental update is
the other.

The query systems I have built have moved away from the former to the
latter due to the cost and time associated with full pre-processing.
After not insignificant pain as the system scaled beyond 100M records, we
moved to more stream oriented designs.
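
To make the two options concrete, here is a small sketch. It is
illustrative only: the log format, full_recount and IncrementalCounter
are all made up for this example, and a list of strings stands in for a
log file that is being appended to.

```python
from typing import Dict, List


def full_recount(lines: List[str]) -> Dict[str, int]:
    """Snapshot approach: re-read every line on each run."""
    counts: Dict[str, int] = {}
    for line in lines:
        level = line.split(" ", 1)[0]
        counts[level] = counts.get(level, 0) + 1
    return counts


class IncrementalCounter:
    """Incremental approach: remember the offset, process only new lines."""

    def __init__(self) -> None:
        self.offset = 0
        self.counts: Dict[str, int] = {}

    def update(self, lines: List[str]) -> Dict[str, int]:
        for line in lines[self.offset:]:
            level = line.split(" ", 1)[0]
            self.counts[level] = self.counts.get(level, 0) + 1
        self.offset = len(lines)
        return dict(self.counts)


log = ["INFO start", "ERROR boom"]
inc = IncrementalCounter()
inc.update(log)

log.append("INFO done")            # the file keeps growing
incremental = inc.update(log)      # touches only the one new line
snapshot = full_recount(log)       # touches all three lines again
```

Both give the same answer; the difference is that the snapshot cost
grows with the whole dataset while the incremental cost grows only with
what was appended since the last run.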


Not going to die in a ditch over this one, however, as implicit in my note
is the
fix when people start having the problem of processing larger datasets.

Cheers,
Philip

On 2/10/12 9:31 AM, "Dmitriy Ryaboy" <dv...@gmail.com> wrote:

>On Mon, Oct 1, 2012 at 3:35 AM, Philip Haynes <
>philip.haynes@virtualnation.com.au> wrote:
>
>> I actually said "transactional consistency to atomic clock accuracy"
>> rather than "global transactions" - there is a difference.  The key
>> assumption to scope is whether Dremel tables are mutable or not.
>> Whilst BigQuery tables are immutable, it isn't clear (to me anyway) that
>> this is true for Dremel.
>
>
>Philip, the first sentence of the Dremel paper reads:
>"Dremel is a scalable, interactive ad-hoc query system for analysis of
>*read-only* nested data".
>
>So I think that part is fairly clear.
>
>-D



Re: What language are you going to use to develop drill?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
On Mon, Oct 1, 2012 at 3:35 AM, Philip Haynes <
philip.haynes@virtualnation.com.au> wrote:

> I actually said "transactional consistency to atomic clock accuracy"
> rather than "global transactions" - there is a difference.  The key
> assumption to scope is whether Dremel tables are mutable or not.
> Whilst BigQuery tables are immutable, it isn't clear (to me anyway) that
> this is true for Dremel.


Philip, the first sentence of the Dremel paper reads:
"Dremel is a scalable, interactive ad-hoc query system for analysis of
*read-only* nested data".

So I think that part is fairly clear.

-D