Posted to user@drill.apache.org by Jason Altekruse <al...@gmail.com> on 2016/01/15 00:47:53 UTC

Notes from hangout on Tuesday

Forgot to send these out, here are the notes from the hangout

Hangout 1/12/2016

Attendees: Parth, Jason, Zelaine, Andries, Jinfeng, Stefan, Aman, Jacques

Jacques mentioned Nong's work on vectorized parquet
    - He is working in the Spark repo right now
    - Julien sent out a message about scheduling a separate hangout with Nong

1.5 release thread
    - Hash skew
        - follow up with Jacques
        - old murmurhash only showed 5-10% query degradation on TPC-H scale
          factor 100 tests
        - fixed skew issue on other datasets
        - elasticsearch uses murmurhash
        - Jacques arrived
            - Aman - which murmur hash do we want to use, then adapt to
              Drill's format
            - reiterated perf numbers
    - Parquet date fix
        - patch coming today
    - Amit's issue with merge join
        - Jacques will follow up
        - related to making tests fail independent of environment
        - do we need to investigate the fixed memory cost of sort?
            - splitting up the memory more ways shouldn't cause a sort to
              fail, as the work will also be split up, and all of the
              individual sorts should spill
            - is there a skew issue?
            - Aman - window function tests currently have skew, cases with
              lots of nulls
        - will look at operator unit tests to more completely cover all
          cases of constrained memory that could be generated in different
          settings and configuration
    - Not going to support 32-bit Windows for Drill server
        - mentioned by Kristine
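
To make the hash skew item above concrete, here is a minimal sketch of how a weak hash can pile every key into one bucket while a mixing step spreads the same keys out. It is not Drill's hash code. The fmix32 function is the 32-bit finalizer step from MurmurHash3; the key pattern, the bucket count, and the bucket_loads helper are made up for illustration.

```python
from collections import Counter

MASK32 = 0xFFFFFFFF


def fmix32(h):
    """32-bit finalizer (avalanche step) from MurmurHash3."""
    h &= MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def bucket_loads(keys, num_buckets, hash_fn):
    """Return how many keys land in each bucket (hypothetical helper)."""
    counts = Counter(hash_fn(k) % num_buckets for k in keys)
    return [counts.get(b, 0) for b in range(num_buckets)]


# Assumed workload: keys that are all multiples of the bucket count.
keys = [i * 16 for i in range(1600)]

naive = bucket_loads(keys, 16, lambda k: k)  # identity "hash"
mixed = bucket_loads(keys, 16, fmix32)       # bits mixed before bucketing

print("naive:", naive)
print("mixed:", mixed)
```

With the identity hash every key falls into bucket 0, which is the kind of skew that overloads a single partition. The mixed version depends on all input bits, so the load is spread across buckets.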

Making tests fail independent of environment
    - might not just be session options
    - might need to tweak startup and global memory limits

Edmond
    - had offered some resources for large scale testing
    - we could use them to give a consistent environment for all devs
    - he provided logins to a few members on the Drill team
    - we haven't been using it
    - is it just a few machines sitting idle, or will it provision
      resources as we need them?

Andries
    - Drill client
    - compatibility of drill clients between versions
    - warning messages returned to client might cause issues
        - decided to make it backwards compatible

Jacques
    - still looking at allocation issue with flatten test

Spilling operations other than sort
    - hash operations may require a row oriented format for spilling
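
As a rough illustration of what a row-oriented spill for a hash operation could look like, the sketch below turns a columnar batch into rows, partitions the rows by key, and writes one record per line to a per-partition file so each partition can be read back on its own. None of this is Drill code: the dict-of-lists batch layout, the JSON-lines file format, and all function names are assumptions chosen to keep the example small.

```python
import json
import os
import tempfile


def columns_to_rows(batch):
    """Convert a columnar batch {column: [values]} into a list of row dicts."""
    names = list(batch)
    return [dict(zip(names, values)) for values in zip(*batch.values())]


def spill_partitioned(batch, key, num_partitions, spill_dir):
    """Write each row to the spill file of the partition its key hashes to."""
    paths = [os.path.join(spill_dir, f"part-{i}.jsonl")
             for i in range(num_partitions)]
    files = [open(p, "w") for p in paths]
    try:
        for row in columns_to_rows(batch):
            # Integer keys only in this sketch, so hash() is deterministic.
            partition = hash(row[key]) % num_partitions
            files[partition].write(json.dumps(row) + "\n")
    finally:
        for f in files:
            f.close()
    return paths


def read_partition(path):
    """Read one spilled partition back as a list of row dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f]


batch = {"k": [1, 2, 3, 4, 5, 6], "v": ["a", "b", "c", "d", "e", "f"]}
with tempfile.TemporaryDirectory() as spill_dir:
    for path in spill_partitioned(batch, "k", 2, spill_dir):
        print(os.path.basename(path), read_partition(path))
```

Writing whole rows keeps all columns of a record together, so a partition can be spilled and reloaded without touching the other partitions' data.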

Out of memory handling
    - with memory set low (e.g. 1 GB), smaller queries fail with out of
      memory errors
    - can get in a bad state after running out of memory
    - Jacques - is the memory leak in the RPC layer?
        - a change was included with the memory allocator to give the RPC
          layer a separate pool of memory; it is currently turned off
        - it hits the global pool of memory without using our accounting
        - RPC should not be using much memory
        - things going out can stay in the operator/application allocators
        - change would only apply to incoming data
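
The idea of giving the RPC layer its own accounted pool can be sketched with a small hierarchy of allocators, where each child has its own limit and every allocation is also charged to its parent. This is only a toy model under assumed names (Allocator, OutOfMemory, the limits used below); it does not reflect Drill's actual allocator API.

```python
class OutOfMemory(Exception):
    """Raised when an allocation would exceed an allocator's limit."""


class Allocator:
    """Toy accounting allocator: tracks bytes, charges ancestors too."""

    def __init__(self, name, limit, parent=None):
        self.name = name
        self.limit = limit
        self.parent = parent
        self.used = 0

    def child(self, name, limit):
        return Allocator(name, limit, parent=self)

    def allocate(self, nbytes):
        if self.used + nbytes > self.limit:
            raise OutOfMemory(f"{self.name}: {self.used} + {nbytes} > {self.limit}")
        if self.parent is not None:
            # Charge the parent first; if it raises, nothing is recorded here.
            self.parent.allocate(nbytes)
        self.used += nbytes

    def release(self, nbytes):
        if nbytes > self.used:
            raise ValueError(f"{self.name}: releasing more than allocated")
        self.used -= nbytes
        if self.parent is not None:
            self.parent.release(nbytes)


root = Allocator("root", limit=1000)
rpc = root.child("rpc", limit=100)         # incoming data only, small cap
operators = root.child("operators", limit=1000)

rpc.allocate(80)
try:
    rpc.allocate(50)                        # would exceed the RPC pool's cap
except OutOfMemory as e:
    print("rejected:", e)
print("root used:", root.used)
```

Because the RPC pool is a child with its own cap, incoming data is both limited and visible in the root's accounting, instead of drawing on global memory untracked.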

Calcite fork
    - we should figure out how to make the current version we depend on
      (1.4) include the extension points we need to put custom
      functionality in Drill
    - these extension points should be easier to move on top of 1.6
    - much of the code can then move into Drill