Posted to dev@drill.apache.org by Paul Rogers <pa...@yahoo.com.INVALID> on 2019/01/22 02:51:25 UTC

Good DB theory references

Hi All,

Wanted to pass along some good foundational material about databases. We find ourselves immersed day-to-day in the details of Drill's implementation. It is helpful to occasionally step back and look at the larger DB tradition in which Drill resides. This material is especially good for anyone who didn't study DB theory in college.

"Architecture of a Database System": http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf - By Hellerstein, Stonebraker, and Hamilton. While focused on "classic" DB systems, the ideas readily apply to "Big Data" distributed engines such as Drill. The paper walks through many of the basic architectural choices. You'll find yourself saying, "I see, Drill chose the shared-nothing, OS thread model but random heap allocation rather than a buffer pool." That is, you can see Drill's design choices in the context of the overall DB solution space.

"Database Management Systems", 3e by Ramakrishnan & Gehrke. A textbook-length overview of DB theory. I used the second edition years ago to design and build a complete embedded hybrid DB and object store. I keep returning to the book any time I need a refresher on some topic or other.

What other favorites do people have? Does anyone know of any good references that explain the rule-based architecture of a planner such as Calcite? (R&G, 2e, mostly discusses the classic "dynamic programming" style of planner.)

Thanks,
- Paul


Re: Good DB theory references

Posted by rahul challapalli <ch...@gmail.com>.
The redbook [1] deserves a mention. It also has a chapter (collection of
papers) dedicated to query optimization [2].

[1] http://www.redbook.io/
[2] http://www.redbook.io/ch7-queryoptimization.html


Re: Good DB theory references

Posted by Joel Pfaff <jo...@gmail.com>.
 Hello,

Thanks for this initiative.
A couple of years ago I found this page of links from Reynold Xin:
https://github.com/rxin/db-readings

And it is full of nice things.

Regards, Joel


Re: Good DB theory references

Posted by weijie tong <to...@gmail.com>.
Hi Paul:
Thanks for sharing. I would like to add another good recent paper here,
"Everything you always wanted to know about compiled and vectorized
queries but were afraid to ask":
http://www.vldb.org/pvldb/vol11/p2209-kersten.pdf

It explains the two kinds of database execution architecture: vectorized and
compiled. It also helps answer the frequently asked question of how Spark's
whole-stage codegen differs from Drill's codegen.


