Posted to issues@drill.apache.org by "Jason Altekruse (JIRA)" <ji...@apache.org> on 2015/01/27 01:42:35 UTC

[jira] [Created] (DRILL-2077) Provide a clear starting point for new developers about what to start reading to learn about Drill

Jason Altekruse created DRILL-2077:
--------------------------------------

Summary: Provide a clear starting point for new developers about what to start reading to learn about Drill
Key: DRILL-2077
URL: https://issues.apache.org/jira/browse/DRILL-2077
Project: Apache Drill
Issue Type: Improvement
Reporter: Jason Altekruse
Assignee: Jason Altekruse

As part of the package-level Javadocs I posted in DRILL-1904, I tried to document the root org.apache.drill.exec package. We should have good information here, as well as in the markdown file in the git repo, about the best place to start reading the code to understand how Drill operates.

Here is a description I started. I think we want to make sure this is informative but concise. I want to get in the rest of the package docs, so I am leaving this here as a TODO. Please feel free to comment on, revise, or add to it.

{code}
* A good place to start learning about Drill is by exploring its query plans.
* A Drill physical plan is defined as a connected graph of operators that read
* and manipulate data. Each operator is configured by an implementation of the
* {@link PhysicalOperator} interface. At query execution time, the plan is
* translated into a graph of runtime operators that actually process the data.
* The connections between these nodes are materialized as interfaces through
* which data is passed from one operator to the next. Because Drill is
* distributed, some of these connections take the form of an RPC layer between
* the nodes in a Drill cluster.
*
* While physical plans can be written by hand, the primary interface for Drill
* is SQL. Drill targets compliance with the ANSI SQL:2003 specification. Query
* parsing and optimization are handled by Calcite, an Apache incubator project
* that is also used for planning in Apache Hive. Drill defines many planning
* rules and optimizations that plug into the Calcite planning engine to
* generate optimal plans for the Drill engine.
*
* Unlike most query systems, Drill is designed to query raw files without
* a predefined catalog of metadata defining the types of data or columns
* available in the dataset. To maintain performance in a flexible schema
* environment, Drill uses runtime code generation to compile custom Java
* code whenever operators are notified of a change in schema.
{code}
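
To make the description above more concrete for a new reader, here is a small, self-contained sketch of the two ideas it covers: operators connected in a graph that pass batches to one another, and an operator rebuilding its per-row logic when it is told the schema changed. Everything in it is hypothetical and for illustration only. The names OperatorGraphSketch, Operator, Outcome, Scan and Project are not Drill classes, and a plain Java lambda stands in for the runtime code generation that Drill actually performs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Illustration only: every type here is hypothetical, not part of Drill's API.
class OperatorGraphSketch {

    /** What an operator reports each time it is asked for another batch. */
    enum Outcome { OK, NEW_SCHEMA, DONE }

    /** A node in the plan graph; downstream nodes pull batches from it. */
    interface Operator {
        Outcome next();
        List<Map<String, Object>> current();
    }

    /** Leaf operator: replays non-empty in-memory batches and flags column changes. */
    static final class Scan implements Operator {
        private final Iterator<List<Map<String, Object>>> batches;
        private List<Map<String, Object>> current = List.of();
        private Set<String> lastColumns = null;

        Scan(List<List<Map<String, Object>>> batches) {
            this.batches = batches.iterator();
        }

        public Outcome next() {
            if (!batches.hasNext()) return Outcome.DONE;
            current = batches.next();
            // The schema is discovered from the data itself, not from a catalog.
            Set<String> columns = Set.copyOf(current.get(0).keySet());
            boolean changed = !columns.equals(lastColumns);
            lastColumns = columns;
            return changed ? Outcome.NEW_SCHEMA : Outcome.OK;
        }

        public List<Map<String, Object>> current() { return current; }
    }

    /** Keeps the wanted columns that exist; rebuilds its row function on schema change. */
    static final class Project implements Operator {
        private final Operator input;
        private final List<String> wanted;
        private Function<Map<String, Object>, Map<String, Object>> rowFn;
        private List<Map<String, Object>> current = List.of();
        int rebuilds = 0;

        Project(Operator input, List<String> wanted) {
            this.input = input;
            this.wanted = wanted;
        }

        public Outcome next() {
            Outcome outcome = input.next();
            if (outcome == Outcome.DONE) return outcome;
            if (outcome == Outcome.NEW_SCHEMA) {
                // Stand-in for generating and compiling schema-specific code.
                rowFn = build(input.current().get(0).keySet());
                rebuilds++;
            }
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> row : input.current()) out.add(rowFn.apply(row));
            current = out;
            return outcome;
        }

        public List<Map<String, Object>> current() { return current; }

        private Function<Map<String, Object>, Map<String, Object>> build(Set<String> available) {
            List<String> present = new ArrayList<>();
            for (String column : wanted) if (available.contains(column)) present.add(column);
            return row -> {
                Map<String, Object> projected = new LinkedHashMap<>();
                for (String column : present) projected.put(column, row.get(column));
                return projected;
            };
        }
    }

    /** Wires Scan -> Project and drains the graph, logging each outcome. */
    static List<String> run() {
        List<Map<String, Object>> b1 = List.of(Map.of("id", 1, "name", "a"));
        List<Map<String, Object>> b2 = List.of(Map.of("id", 2, "name", "b"));
        List<Map<String, Object>> b3 = List.of(Map.of("id", 3, "name", "c", "city", "x"));
        Project project = new Project(new Scan(List.of(b1, b2, b3)), List.of("id", "city"));

        List<String> log = new ArrayList<>();
        Outcome outcome;
        while ((outcome = project.next()) != Outcome.DONE) {
            log.add(outcome + " " + project.current());
        }
        log.add("rebuilds=" + project.rebuilds);
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this toy, the third batch introduces a "city" column, so the scan reports a new schema and the projection rebuilds its row function a second time. The second batch has the same columns as the first and reuses the existing function. The sketch runs in a single process, so it leaves out the RPC connections between nodes entirely.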

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)