Posted to dev@drill.apache.org by priteshm <gi...@git.apache.org> on 2017/10/14 04:52:07 UTC
[GitHub] drill pull request #994: Merge from latest
GitHub user priteshm opened a pull request:
https://github.com/apache/drill/pull/994
Merge from latest
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/priteshm/drill master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/994.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #994
----
commit c75dc4904d3ecb734f9369db6a9b4011956fb07c
Author: Paul Rogers <pr...@maprtech.com>
Date: 2017-03-10T23:56:18Z
DRILL-5344: External sort priority queue copier fails with an empty batch
Unit tests showed that the “priority queue copier” does not handle an
empty batch. This has not been an issue because code elsewhere in the
sort specifically works around this issue. This fix resolves the issue
at the source to avoid the need for future work-arounds.
closes #778
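The empty-batch problem described above can be illustrated with a generic k-way merge. This is a minimal sketch, not Drill's priority queue copier: the class `EmptyBatchMerge`, its `Cursor` helper, and the use of `int[]` for a sorted batch are all hypothetical names chosen for illustration. The point it shows is that an empty input has no "current" value to compare, so it has to be filtered out before it reaches the queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; this is not Drill's copier implementation.
final class EmptyBatchMerge {

  // Tracks the read position within one sorted input batch.
  private static final class Cursor {
    final int[] batch;
    int pos;

    Cursor(int[] batch) { this.batch = batch; }

    int current() { return batch[pos]; }

    // Moves to the next value; returns false when the batch is exhausted.
    boolean advance() { return ++pos < batch.length; }
  }

  // Merges already-sorted batches into one sorted list.
  static List<Integer> merge(List<int[]> sortedBatches) {
    PriorityQueue<Cursor> queue =
        new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
    for (int[] batch : sortedBatches) {
      // An empty batch has no current value, so it must never enter the queue.
      if (batch.length > 0) {
        queue.add(new Cursor(batch));
      }
    }
    List<Integer> out = new ArrayList<>();
    while (!queue.isEmpty()) {
      Cursor c = queue.poll();
      out.add(c.current());
      // The cursor is only mutated while it is outside the queue.
      if (c.advance()) {
        queue.add(c);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<int[]> batches =
        Arrays.asList(new int[] {1, 4, 9}, new int[0], new int[] {2, 3});
    System.out.println(merge(batches)); // prints [1, 2, 3, 4, 9]
  }
}
```

Without the length check, the comparator would index into a zero-length array as soon as a second cursor is added to the queue.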
commit ee15632df3a869b3cc1063f882356d7eaab5b9f7
Author: Paul Rogers <pr...@maprtech.com>
Date: 2017-03-14T23:18:24Z
DRILL-5323: Test tools for row sets
Provide test tools to create, populate and compare row sets
To simplify tests, we need a TestRowSet concept that wraps a
VectorContainer and provides easy ways to:
- Define a schema for the row set.
- Create a set of vectors that implement the schema.
- Populate the row set with test data via code.
- Add an SV2 to the row set.
- Pass the row set to operator components (such as generated code
blocks).
- Examine the contents of a row set.
- Compare the results of the operation with an expected result set.
- Dispose of the underlying direct memory when work is done.
This code builds on that in DRILL-5324 to provide a complete row set
API. See DRILL-5318 for the spec.
Note: this code can be reviewed as-is, but cannot be committed until
after DRILL-5324 is committed: this code has compile-time dependencies
on that code. This PR will be rebased once DRILL-5324 is pulled into
master.
Handles maps and intervals
The row set schema is refined to provide two forms of schema. A
physical schema shows the nested structure of the data with maps
expanding into their contents.
Updates the row set schema builder to easily build a schema with maps.
An access schema shows the row “flattened” to include just scalar
(non-map) columns, with all columns at a single level, with dotted
names identifying nested fields. This form makes for very simple access.
Then, provides tools for reading and writing batches with maps by
presenting the flattened view to the row reader and writer.
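As a rough illustration of the physical versus access schema distinction, the sketch below flattens a nested schema into dotted column names. It does not use the row set classes from this commit: the class `FlattenSchemaSketch`, the `flatten` method, and the choice to model a map column as a nested `java.util.Map` are assumptions made purely for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the row set schema classes from this PR.
final class FlattenSchemaSketch {

  // The "physical" schema maps a column name either to a type name (String)
  // or, for a map column, to a nested Map of the same shape.
  static List<String> flatten(Map<String, Object> physical) {
    List<String> out = new ArrayList<>();
    collect("", physical, out);
    return out;
  }

  private static void collect(String prefix, Map<String, Object> level, List<String> out) {
    for (Map.Entry<String, Object> e : level.entrySet()) {
      String name = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
      if (e.getValue() instanceof Map) {
        // Map columns are expanded; only their scalar members are listed.
        @SuppressWarnings("unchecked")
        Map<String, Object> nested = (Map<String, Object>) e.getValue();
        collect(name, nested, out);
      } else {
        out.add(name);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Object> inner = new LinkedHashMap<>();
    inner.put("c", "INT");
    Map<String, Object> m = new LinkedHashMap<>();
    m.put("b", "VARCHAR");
    m.put("n", inner);
    Map<String, Object> schema = new LinkedHashMap<>();
    schema.put("a", "INT");
    schema.put("m", m);
    schema.put("d", "INT");
    System.out.println(flatten(schema)); // prints [a, m.b, m.n.c, d]
  }
}
```

A reader or writer built on the flattened list can address every scalar column by a single index, which is the simplification the commit message describes.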
HyperVectors have a very complex structure for maps. The hyper row set
implementation takes a first crack at mapping that structure into the
standardized row set format.
Also provides a handy way to set an INTERVAL column from an int. There
is no good mapping from an int to an interval, so an arbitrary
convention is used. This convention is not generally useful, but is
very handy for quickly generating test data.
As before, this is a partial PR. The code here still depends on
DRILL-5324 to provide the column accessors needed by the row reader and
writer.
All this code is getting rather complex, so this commit includes a unit
test of the schema and row set code.
Revisions to support arrays
Arrays require a somewhat different API. Refactored to allow arrays to
appear as a field type.
While refactoring, moved interfaces to more logical locations.
Added more comments.
Rejiggered the row set schema to provide both a physical and flattened
(access) schema, both driven from the original batch schema.
Pushed some accessor and writer classes into the accessor layer.
Added tests for arrays.
Also added more comments where needed.
Moved tests to DRILL-5318
The test classes previously here depend on the new “operator fixture”.
To provide a non-cyclic checkin order, moved the tests to the PR with
the fixtures so that this PR is clear of dependencies. The tests were
reviewed in the context of DRILL-5318.
Also pulls in batch sizer support for map fields which are required by
the tests.
closes #785
commit 51ce7843dfd5d9979dcd205df4797869c12dc3a2
Author: Paul Rogers <pr...@maprtech.com>
Date: 2017-03-14T23:18:24Z
DRILL-5318: Sub-operator test fixture
This commit depends on:
* DRILL-5323
This PR cannot be accepted (or built) until the above are pulled and
this PR is rebased on top of them. The PR is issued now so that reviews
can be done in parallel.
Provides the following:
* A new OperatorFixture to set up all the objects needed to test at the
sub-operator level. This relies on the refactoring to create the
required interfaces.
* Pulls the config builder code out of the cluster fixture builder so
that configs can be built for sub-operator tests.
* Modifies the QueryBuilder test tool to run a query and get back one
of the new row set objects to allow direct inspection of data returned
from a query.
* Modifies the cluster fixture to create a JDBC connection to the test
cluster. (Use requires putting the Drill JDBC project on the test class
path since exec does not depend on JDBC.)
Created a common subclass for the cluster and operator fixtures to
abstract out the allocator and config. Also provides temp directory
support to the operator fixture.
Merged with DRILL-5415 (Improve Fixture Builder to configure client
properties)
Moved row set tests here from DRILL-5323 so that DRILL-5323 is
self-contained. (The tests depend on the fixtures defined here.)
Added comments where needed.
Puts code back as it was prior to a code review comment. The code is
redundant, but necessarily so due to code which is specific to several
primitive types.
closes #788
commit 54e9d3bf60bc65846b6ec809226ef82698b90a7a
Author: Paul Rogers <pr...@maprtech.com>
Date: 2017-03-26T02:51:43Z
DRILL-5385: Vector serializer fails to read saved SV2
Unit testing revealed that the VectorAccessorSerializable class claims
to serialize SV2s, but, in fact, does not. Actually, it writes them,
but does not read them, resulting in corrupted data on read.
Fortunately, no code appears to serialize SV2s at present. Still, it is
a bug and needs to be fixed.
First task is to add serialization code for the SV2.
That revealed that the recently-added code to save DrillBufs using a
shared buffer had a bug: it relied on the writer index to know how much
data is in the buffer. Turns out SV2 buffers don’t set this index. So,
new versions of the write function take a write length.
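The writer-index pitfall can be sketched by analogy with `java.nio.ByteBuffer`, whose absolute `put` methods also leave the buffer position untouched. This is only an analogy under stated assumptions: the class `ExplicitLengthWrite` and both of its methods are hypothetical, and the buffer position stands in for the DrillBuf writer index rather than reproducing the actual serializer code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical illustration only; ByteBuffer stands in for DrillBuf here.
final class ExplicitLengthWrite {

  // Fills a buffer with 2-byte entries using absolute puts, which do not
  // advance the position (the stand-in for an unset writer index).
  static ByteBuffer fillWithoutAdvancing(int count) {
    ByteBuffer buf = ByteBuffer.allocate(count * 2);
    for (int i = 0; i < count; i++) {
      buf.putChar(i * 2, (char) i);
    }
    return buf;
  }

  // Copies exactly `length` bytes from the start of the buffer. The caller
  // supplies the length, so the result does not depend on the position.
  static byte[] write(ByteBuffer buf, int length) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < length; i++) {
      out.write(buf.get(i)); // absolute read, position is ignored
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    ByteBuffer buf = fillWithoutAdvancing(3);
    // A writer that trusted the position would see zero bytes of data.
    System.out.println("position-based length: " + buf.position());
    System.out.println("explicit length: " + write(buf, 6).length);
  }
}
```

Passing the length explicitly moves the responsibility to the caller, which knows how many entries it stored even when the buffer's own bookkeeping does not.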
Then, closer inspection of the read code revealed duplicated code. So,
DrillBuf allocation moved into a version of the read function that now
does reading and DrillBuf allocation.
Turns out that value vectors, but not SV2s, can be built from a
DrillBuf. Added a matching constructor to the SV2 class.
Finally, cleaned up the code a bit to make it easier to follow. Also
allowed test code to access the handy timer already present in the code.
closes #800
commit dfd0abd6c19caf7d8024c189179c9662626b2f22
Author: Paul Rogers <pr...@maprtech.com>
Date: 2017-04-09T03:52:04Z
DRILL-5423: Refactor ScanBatch to allow unit testing record readers
Refactors ScanBatch to allow unit testing of record reader
implementations, especially the “writer” classes.
See JIRA for details.
closes #811
commit c1c2a89bfd0ab5cdc20abaa532a9476c4fa2e252
Author: Paul Rogers <pr...@maprtech.com>
Date: 2017-04-11T21:42:57Z
DRILL-5428: submit_plan fails after Drill 1.8 script revisions
When the other scripts were updated, submit_plan was not corrected.
After Drill 1.8, drill-config.sh consumes all command line arguments,
finds the --config and --site options, removes them, and places the rest
in the new args array.
This PR updates submit_plan to use the new args array.
The fix was tested on a test cluster: we verified that a physical plan
was submitted and ran.
closes #816
commit df15c75f8d2e8bf4b0e1dc7396068bdb9b266c49
Author: Arina Ielchiieva <ar...@gmail.com>
Date: 2017-04-26T13:27:19Z
DRILL-5391: CTAS: make folder and file permission configurable
close #820
commit 496a964d07e900f408529e8c252f9e5fbb4ae0e9
Author: liyun Liu <ll...@hotmail.com>
Date: 2017-05-04T04:46:58Z
DRILL-4039: Query fails when non-ascii characters are used in string literals
closes #825
----
---
[GitHub] drill pull request #994: Merge from latest
Posted by priteshm <gi...@git.apache.org>.
Github user priteshm closed the pull request at:
https://github.com/apache/drill/pull/994
---