Posted to dev@drill.apache.org by priteshm <gi...@git.apache.org> on 2017/10/14 04:52:07 UTC

[GitHub] drill pull request #994: Merge from latest

GitHub user priteshm opened a pull request:

    https://github.com/apache/drill/pull/994

    Merge from latest

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/priteshm/drill master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/994.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #994
    
----
commit c75dc4904d3ecb734f9369db6a9b4011956fb07c
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-03-10T23:56:18Z

    DRILL-5344: External sort priority queue copier fails with an empty batch
    
    Unit tests showed that the “priority queue copier” does not handle an
    empty batch. This has not been an issue because code elsewhere in the
    sort specifically works around this issue. This fix resolves the issue
    at the source to avoid the need for future work-arounds.
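
The following is a minimal sketch of the idea behind this fix. It assumes a heavily simplified model in which each sorted batch is a plain int[] rather than a set of value vectors. The priority-queue merge never enqueues an empty batch, so callers need no work-arounds. The class PriorityQueueMergeSketch and its merge method are hypothetical and are not Drill's generated copier code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, simplified model of a priority-queue copier: each input
// "batch" is an already-sorted int[]. This is not Drill's actual API.
class PriorityQueueMergeSketch {

    // Cursor into one sorted batch.
    private static final class Cursor {
        final int[] batch;
        int pos;

        Cursor(int[] batch) {
            this.batch = batch;
        }

        int current() {
            return batch[pos];
        }
    }

    // Merges sorted batches into one sorted list. Empty batches are handled
    // at the source (never enqueued) instead of by callers.
    static List<Integer> merge(List<int[]> batches) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
        for (int[] batch : batches) {
            if (batch.length > 0) {   // an empty batch has no first row to enqueue
                queue.add(new Cursor(batch));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();  // smallest current value across all batches
            out.add(c.current());
            c.pos++;
            if (c.pos < c.batch.length) {
                queue.add(c);         // re-enqueue keyed on its next value
            }
        }
        return out;
    }
}
```

For example, merging the batches {1, 4}, {} and {2, 3, 5} yields 1, 2, 3, 4, 5. The empty batch is ignored rather than causing a failure.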
    
    closes #778

commit ee15632df3a869b3cc1063f882356d7eaab5b9f7
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-03-14T23:18:24Z

    DRILL-5323: Test tools for row sets
    
    Provide test tools to create, populate and compare row sets
    
    To simplify tests, we need a TestRowSet concept that wraps a
    VectorContainer and provides easy ways to:
    
    - Define a schema for the row set.
    - Create a set of vectors that implement the schema.
    - Populate the row set with test data via code.
    - Add an SV2 to the row set.
    - Pass the row set to operator components (such as generated code
    blocks).
    - Examine the contents of a row set.
    - Compare the results of the operation with an expected result set.
    - Dispose of the underlying direct memory when work is done.
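
The following toy, heap-based sketch makes the workflow above concrete. It walks through the define, populate, examine, compare and dispose cycle. ToyRowSet and all of its methods are hypothetical. Drill's actual row set classes wrap a VectorContainer backed by direct memory and have a different API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, heap-based stand-in for a test row set. Drill's real row
// sets wrap a VectorContainer in direct memory; this only shows the workflow.
class ToyRowSet implements AutoCloseable {
    private final List<String> schema;            // column names, in order
    private final List<Object[]> rows = new ArrayList<>();

    // Define the schema for the row set.
    ToyRowSet(String... columns) {
        this.schema = List.of(columns);
    }

    // Populate via code; each row must match the schema width.
    ToyRowSet addRow(Object... values) {
        if (values.length != schema.size()) {
            throw new IllegalArgumentException("Expected " + schema.size() + " values");
        }
        rows.add(values.clone());
        return this;
    }

    int rowCount() {
        return rows.size();
    }

    // Examine a single value by row index and column name.
    Object get(int row, String column) {
        return rows.get(row)[schema.indexOf(column)];
    }

    // Compare against an expected row set: same schema, same rows, same order.
    boolean sameAs(ToyRowSet expected) {
        if (!schema.equals(expected.schema) || rows.size() != expected.rows.size()) {
            return false;
        }
        for (int i = 0; i < rows.size(); i++) {
            if (!Arrays.equals(rows.get(i), expected.rows.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Real row sets release direct memory here; the toy just drops its rows.
    @Override
    public void close() {
        rows.clear();
    }
}
```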
    
    This code builds on that in DRILL-5324 to provide a complete row set
    API. See DRILL-5318 for the spec.
    
    Note: this code can be reviewed as-is, but cannot be committed until
    after DRILL-5324 is committed: this code has compile-time dependencies
    on that code. This PR will be rebased once DRILL-5324 is pulled into
    master.
    
    Handles maps and intervals
    
    The row set schema is refined to provide two forms of schema. A
    physical schema shows the nested structure of the data, with maps
    expanding into their contents.
    
    An access schema shows the row “flattened” to include just scalar
    (non-map) columns, all at a single level, with dotted names
    identifying nested fields. This form makes for very simple access.
    
    Updates the row set schema builder to easily build a schema with maps.
    
    Then, provides tools for reading and writing batches with maps by
    presenting the flattened view to the row reader and writer.
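
The following small sketch shows how a physical (nested) schema maps to the flattened access schema. It assumes a hypothetical representation in which each column maps to either a type name (a scalar) or a nested map (a map column). SchemaFlattener is illustrative only and is not part of Drill.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a physical schema maps each column name to either a
// type name (String, for a scalar) or a nested Map (for a map column).
class SchemaFlattener {

    // Produces the "access" view: scalar columns only, all at one level,
    // with dotted names identifying nested fields.
    static List<String> flatten(Map<String, Object> physical) {
        List<String> access = new ArrayList<>();
        flatten("", physical, access);
        return access;
    }

    @SuppressWarnings("unchecked")
    private static void flatten(String prefix, Map<String, Object> columns, List<String> out) {
        for (Map.Entry<String, Object> col : columns.entrySet()) {
            String name = prefix + col.getKey();
            if (col.getValue() instanceof Map) {
                // A map column adds no column of its own; recurse into its members.
                flatten(name + ".", (Map<String, Object>) col.getValue(), out);
            } else {
                out.add(name);
            }
        }
    }
}
```

With insertion-ordered maps (for example LinkedHashMap), a schema of id, address {city, zip} and name flattens to id, address.city, address.zip, name.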
    
    HyperVectors have a very complex structure for maps. The hyper row set
    implementation takes a first crack at mapping that structure into the
    standardized row set format.
    
    Also provides a handy way to set an INTERVAL column from an int. There
    is no good mapping from an int to an interval, so an arbitrary
    convention is used. This convention is not generally useful, but is
    very handy for quickly generating test data.
    
    As before, this is a partial PR. The code here still depends on
    DRILL-5324 to provide the column accessors needed by the row reader and
    writer.
    
    All this code is getting rather complex, so this commit includes a unit
    test of the schema and row set code.
    
    Revisions to support arrays
    
    Arrays require a somewhat different API. Refactored to allow arrays to
    appear as a field type.
    
    While refactoring, moved interfaces to more logical locations.
    
    Added more comments.
    
    Rejiggered the row set schema to provide both a physical and flattened
    (access) schema, both driven from the original batch schema.
    
    Pushed some accessor and writer classes into the accessor layer.
    
    Added tests for arrays.
    
    Also added more comments where needed.
    
    Moved tests to DRILL-5318
    
    The test classes previously here depend on the new “operator fixture”.
    To provide a non-cyclic checkin order, moved the tests to the PR with
    the fixtures so that this PR is clear of dependencies. The tests were
    reviewed in the context of DRILL-5318.
    
    Also pulls in batch sizer support for map fields, which the tests
    require.
    
    closes #785

commit 51ce7843dfd5d9979dcd205df4797869c12dc3a2
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-03-14T23:18:24Z

    DRILL-5318: Sub-operator test fixture
    
    This commit depends on:
    
    * DRILL-5323
    
    This PR cannot be accepted (or built) until the above are pulled and
    this PR is rebased on top of them. The PR is issued now so that reviews
    can be done in parallel.
    
    Provides the following:
    
    * A new OperatorFixture to set up all the objects needed to test at the
    sub-operator level. This relies on the refactoring to create the
    required interfaces.
    * Pulls the config builder code out of the cluster fixture builder so
    that configs can be built for sub-operator tests.
    * Modifies the QueryBuilder test tool to run a query and get back one
    of the new row set objects to allow direct inspection of data returned
    from a query.
    * Modifies the cluster fixture to create a JDBC connection to the test
    cluster. (Use requires putting the Drill JDBC project on the test class
    path since exec does not depend on JDBC.)
    
    Created a common subclass for the cluster and operator fixtures to
    abstract out the allocator and config. Also provides temp directory
    support to the operator fixture.
    
    Merged with DRILL-5415 (Improve Fixture Builder to configure client
    properties)
    
    Moved row set tests here from DRILL-5323 so that DRILL-5323 is self
    contained. (The tests depend on the fixtures defined here.)
    
    Added comments where needed.
    
    Reverts a change made in response to a code review comment. The code
    is redundant, but necessarily so, because parts of it are specific to
    several primitive types.
    
    closes #788

commit 54e9d3bf60bc65846b6ec809226ef82698b90a7a
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-03-26T02:51:43Z

    DRILL-5385: Vector serializer fails to read saved SV2
    
    Unit testing revealed that the VectorAccessorSerializable class claims
    to serialize SV2s, but, in fact, does not. Actually, it writes them,
    but does not read them, resulting in corrupted data on read.
    
    Fortunately, no code appears to serialize SV2s at present. Still, it is
    a bug and needs to be fixed.
    
    The first task was to add serialization code for the SV2.
    
    That revealed that the recently-added code to save DrillBufs using a
    shared buffer had a bug: it relied on the writer index to know how much
    data is in the buffer. Turns out SV2 buffers don’t set this index. So,
    new versions of the write function take a write length.
    
    Then, closer inspection of the read code revealed duplicated code. So,
    DrillBuf allocation moved into a version of the read function that now
    does reading and DrillBuf allocation.
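
The following sketch illustrates both fixes. It uses java.nio.ByteBuffer as a stand-in for DrillBuf, which is an assumption: the real code works with DrillBuf and Drill's own streams. Absolute putShort calls fill the buffer without moving its position, much as SV2 buffers never set their writer index. For that reason, the write side is given an explicit length, and the read side allocates a buffer of exactly that size as part of reading. BufferSerdeSketch and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical sketch using ByteBuffer in place of DrillBuf.
class BufferSerdeSketch {

    // Writes the first `length` bytes of `buf`, prefixed by the length.
    // The caller supplies the length; the buffer's position is not trusted.
    static void write(DataOutputStream out, ByteBuffer buf, int length) throws IOException {
        out.writeInt(length);
        for (int i = 0; i < length; i++) {
            out.writeByte(buf.get(i));   // absolute get: independent of position
        }
        out.flush();
    }

    // Reads the length, allocates a buffer of exactly that size, and fills it,
    // so allocation and reading happen in one place.
    static ByteBuffer read(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return ByteBuffer.wrap(bytes);
    }

    // In-memory round trip; IOException cannot really occur with byte arrays.
    static ByteBuffer roundTrip(ByteBuffer buf, int length) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), buf, length);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A three-entry SV2 of 2-byte indexes occupies 6 bytes even though the buffer's position is still 0 after absolute puts. Passing 6 explicitly is what makes the round trip lossless.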
    
    Turns out that value vectors, but not SV2s, can be built from a
    DrillBuf. Added a matching constructor to the SV2 class.
    
    Finally, cleaned up the code a bit to make it easier to follow. Also
    allowed test code to access the handy timer already present in the code.
    
    closes #800

commit dfd0abd6c19caf7d8024c189179c9662626b2f22
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-04-09T03:52:04Z

    DRILL-5423: Refactor ScanBatch to allow unit testing record readers
    
    Refactors ScanBatch to allow unit testing of record reader
    implementations, especially the “writer” classes.
    
    See JIRA for details.
    
    closes #811

commit c1c2a89bfd0ab5cdc20abaa532a9476c4fa2e252
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-04-11T21:42:57Z

    DRILL-5428: submit_plan fails after Drill 1.8 script revisions
    
    When the other scripts were updated, submit_plan was not corrected.
    After Drill 1.8, drill-config.sh consumes all command line arguments,
    finds the --config and --site options, removes them, and places the rest
    in the new args array.
    
    This PR updates submit_plan to use the new args array.
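
The following is a Java analog of the argument handling described above, for illustration only; drill-config.sh itself is a bash script. The --config and --site options are removed, assuming each takes exactly one value, and everything else is kept in order for the calling script. LauncherArgsSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Java analog of the bash logic described for drill-config.sh:
// pull out "--config <dir>" and "--site <dir>", keep everything else in order.
class LauncherArgsSketch {

    final Map<String, String> launcherOptions = new LinkedHashMap<>();
    final List<String> args = new ArrayList<>();   // what a script like submit_plan consumes

    LauncherArgsSketch(String... argv) {
        for (int i = 0; i < argv.length; i++) {
            String a = argv[i];
            boolean launcherOption = a.equals("--config") || a.equals("--site");
            if (launcherOption && i + 1 < argv.length) {
                launcherOptions.put(a, argv[++i]);   // consume the option and its value
            } else {
                args.add(a);                         // pass through unchanged
            }
        }
    }
}
```

The bug described in this commit corresponds to a script that keeps reading the original argument list instead of the filtered args.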
    
    The fix was tested on a test cluster: we verified that a physical plan
    was submitted and ran.
    
    closes #816

commit df15c75f8d2e8bf4b0e1dc7396068bdb9b266c49
Author: Arina Ielchiieva <ar...@gmail.com>
Date:   2017-04-26T13:27:19Z

    DRILL-5391: CTAS: make folder and file permission configurable
    
    close #820

commit 496a964d07e900f408529e8c252f9e5fbb4ae0e9
Author: liyun Liu <ll...@hotmail.com>
Date:   2017-05-04T04:46:58Z

    DRILL-4039: Query fails when non-ascii characters are used in string literals
    
    closes #825

----


---

[GitHub] drill pull request #994: Merge from latest

Posted by priteshm <gi...@git.apache.org>.
GitHub user priteshm closed the pull request at:

    https://github.com/apache/drill/pull/994


---