Posted to dev@drill.apache.org by Jason Altekruse <al...@gmail.com> on 2016/01/15 00:47:53 UTC
Notes from hangout on Tuesday
Forgot to send these out earlier; here are the notes from the hangout.
Hangout 1/12/2016
Attendees: Parth, Jason, Zelaine, Andries, Jinfeng, Stefan, Aman, Jacques
Jacques mentioned Nong's work on vectorized parquet
- He is working in the Spark repo right now
- Julien sent out a message about scheduling a separate hangout with Nong
1.5 release thread
- Hash skew
- follow up with Jacques
- old murmurhash only showed 5-10% query degradation on TPC-H scale factor 100 tests
- fixed skew issue on other datasets
- elasticsearch uses murmurhash
- Jacques arrived
- Aman - which murmur hash do we want to use? We would then adapt it to Drill's format
- reiterated perf numbers
- Parquet date fix
- patch coming today
- Amit issue with merge join
- Jacques will follow up
- related to making tests fail independent of environment
- do we need to investigate the fixed memory cost of sort?
- splitting up the memory more ways shouldn't cause a sort to fill, as the work will also be split up, and all of the individual sorts should spill
- is there a skew issue?
- Aman - window function tests currently have skew, in cases with lots of nulls
- will look at operator unit tests to more completely cover all cases of constrained memory that could be generated in different settings and configurations
- Not going to support 32-bit Windows for Drill server
- mentioned by Kristine
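To illustrate the hash skew item above: the sketch below shows how a weak hash can send every row to one partition when keys share a stride, and how a murmur-style mixing step spreads them out. It uses the public MurmurHash3 32-bit finalizer (fmix32) only as a stand-in; it is not Drill's hash function, and the key pattern, bucket count, and helper names are assumptions made for the example.

```python
from collections import Counter


def fmix32(h: int) -> int:
    """MurmurHash3 32-bit finalizer: mixes the bits of a 32-bit integer."""
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def max_bucket_share(keys, hash_fn, num_buckets):
    """Fraction of all keys that land in the fullest bucket."""
    counts = Counter(hash_fn(k) % num_buckets for k in keys)
    return max(counts.values()) / len(keys)


# Hypothetical workload: keys that are all multiples of the bucket count.
NUM_BUCKETS = 16
keys = [i * NUM_BUCKETS for i in range(10_000)]

# Identity hashing puts every key in bucket 0 (complete skew).
identity_share = max_bucket_share(keys, lambda k: k, NUM_BUCKETS)
# Mixing the bits first spreads the same keys across the buckets.
mixed_share = max_bucket_share(keys, fmix32, NUM_BUCKETS)

print(f"identity hash, fullest bucket: {identity_share:.3f}")
print(f"fmix32 hash, fullest bucket:   {mixed_share:.3f}")
```

A perfectly even spread over 16 buckets would put 1/16 of the keys in each one, so the closer the fullest bucket is to that share, the less skew a hash-partitioned operator would see.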
Making tests fail independent of environment
- might not just be session options
- might need to tweak startup and global memory limits
Edmond
- had offered some resources for large scale testing
- we could use them to give a consistent environment for all devs
- he provided logins to a few members on the Drill team
- we haven't been using it
- is it just a few machines sitting idle, or will it provision resources as we need them?
Andries
- Drill client
- compatibility of drill clients between versions
- warning messages returned to client might cause issues
- decided to make it backwards compatible
Jacques
- still looking at allocation issue with flatten test
Spilling operations other than sort
- hash operations may require a row oriented format for spilling
Out of memory handling
- with memory set low, e.g. 1 GB, smaller queries fail with out-of-memory errors
- can get in a bad state after running out of memory
- Jacques - is the memory leak in the RPC layer?
- a change was included with the memory allocator to give the RPC layer a separate pool of memory, but it is turned off
- it hits the global pool of memory without using our accounting
- RPC should not be using much memory
- things going out can stay in the operator/application allocators
- change would only apply to incoming data
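The separate-pool idea above can be sketched as a small hierarchical accounting allocator, where incoming RPC data draws from its own capped child pool instead of hitting the global pool unaccounted. This is only an illustration of the concept: the class, pool names, and limits are hypothetical and do not reflect Drill's actual allocator API.

```python
class OutOfMemory(Exception):
    """Raised when an allocation would exceed a pool's limit."""


class Allocator:
    """Tracks bytes against its own limit and every ancestor's limit."""

    def __init__(self, name, limit, parent=None):
        self.name = name
        self.limit = limit
        self.parent = parent
        self.used = 0

    def child(self, name, limit):
        return Allocator(name, limit, parent=self)

    def _chain(self):
        node = self
        while node is not None:
            yield node
            node = node.parent

    def allocate(self, nbytes):
        # Check the whole chain first so a failure leaves no partial accounting.
        for node in self._chain():
            if node.used + nbytes > node.limit:
                raise OutOfMemory(f"{node.name}: {node.used} + {nbytes} > {node.limit}")
        for node in self._chain():
            node.used += nbytes

    def release(self, nbytes):
        for node in self._chain():
            node.used -= nbytes


# Hypothetical layout: a global pool with a small pool for incoming RPC data.
root = Allocator("global", limit=1000)
rpc_incoming = root.child("rpc-incoming", limit=100)
operators = root.child("operators", limit=900)

rpc_incoming.allocate(80)      # accounted in both rpc-incoming and global
try:
    rpc_incoming.allocate(40)  # would exceed the RPC pool's own cap
    rpc_overflowed = False
except OutOfMemory:
    rpc_overflowed = True
operators.allocate(500)        # operators are unaffected by the RPC cap
```

With this layout, a burst of incoming data fails inside the RPC pool rather than silently consuming memory that the operators were counting on.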
Calcite fork
- we should figure out how to make the version we currently depend on (1.4) include the extension points we need to put custom functionality in Drill
- these extension points should be easier to move on top of 1.6
- much of the code can then move into Drill