Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2022/01/04 11:47:48 UTC

[GitHub] [drill] jnturton opened a new issue #2421: ValueVectors replacement

jnturton opened a new issue #2421:
URL: https://github.com/apache/drill/issues/2421


   This feature request has been transcribed from messages posted in #2412 and to the mailing list in the first week of January 2022. The topic is what might replace ValueVectors, the memory structures Drill currently uses for data.
   
   


[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1004748894


   Charles Givre wrote:
   
   Thanks Ted for the perspective!  I had always wished to be a "fly on the wall" in those conversations.  
   -- C
   


[GitHub] [drill] jnturton edited a comment on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton edited a comment on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1008235930


   @paul-rogers I can imagine vectorisation helping hash joins because of all the hashing to be crunched. I haven't got a good paper to cite, but a search from my phone did turn up at least one story along these lines.
   
   https://www.cockroachlabs.com/blog/vectorized-hash-joiner/
   
   Regarding pivoting between row and column orientation depending on the operator, I would expect all the ensuing memory copying to outweigh any gain, but again I bring no references or data.
   
   EDIT: There are quite a few open access papers returned by "sql join SIMD". I've got some bedtime reading to do.


[GitHub] [drill] Leon-WTF commented on issue #2421: [DISCUSSION] ValueVectors Replacement

Posted by GitBox <gi...@apache.org>.
Leon-WTF commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1013839564


   @paul-rogers Thanks Paul for sharing your knowledge, but I don't quite understand the example of the serialized sort and merge. Why are the sort and merge in the same major fragment (f4)? And are they also on the same machine? Is there more than one merge fragment, not just one? Could you please share more details about it? Thanks.


[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1004750707


   Paul Rogers wrote:
   
   All: so I've kicked the hornet's nest with the mention of value vectors and Arrow. I'm going to put on my flame-proof suit and debunk some myths.
   
   The columnar format is great for storage, for all the usual reasons. This is why Parquet uses it, Druid uses it for segment files, and various DBs use it for storage. The question we want to ask is, do those benefits apply to the format within the Drill execution engine? I'm here to suggest that columnar has no advantage, and many disadvantages, when used as the *internal* format of an execution engine. "Thems is fighting words", so let's bring it on.
   
   I've had the pleasure of working with several query engines: Drill (columnar) and Impala (row-based) are two well-known examples. This has given me a unique opportunity to see if all the marketing claims for columnar (which still appear in the videos on Drill's website) actually hold up in practice. Spoiler: they don't.
   
   This is a PR about optimization. A good rule in optimization is to start with the biggest issues, then work toward the details. So, rather than tinker with the details of vector execution, let's look at the fundamental issues. I hope this will help us avoid confusing Drill's (and Arrow's) marketing with reality.
   
   **Myth: Vectorized execution**: The biggest myth is around vectorized execution. Of course, a minor myth is that Drill uses such execution (it doesn't.) The bigger myth is that, if we invested enough, it could.
   
   Vectorized execution is great when we have a simple operation we apply to a large amount of data. Think the dot-product operation for neural networks, or data compression, or image transforms, or graphics. In all cases, we apply a simple operation (rescale, say) to a large amount of homogeneous data (the pixels in an image.)
   
   So, the question is, does typical, real-world SQL fit this pattern? I've now seen enough crufty, complex, messy real-world queries to suggest that, no, SQL is not a good candidate for vectorization. `SELECT` and `WHERE` clauses embed business logic, and that logic is based on messy human rules, not crisp, clean mathematics. The resulting SQL tends to have conditionals (`WHEN` or `IF()`, etc.), lots of function calls (all those cool UDFs which @cgivre has written), and so on. Plus, as noted above, SQL deals with NULL values, which must short-circuit entire execution paths.
   
   Hence, even if we could vectorize simple operations, we'd find that, in most queries, we could not actually use that code.
   
   **Myth: Vectors are CPU Cache Friendly**: The second big myth is that vectors are somehow more "friendly" to the CPU L1 cache than a row format. The idea is that one can load a vector into the L1 cache, then zip through many values in one go. This myth is related to the above one.
   
   First, SQL expressions are not based on columns, they are based on rows. Each calculation tends to involve multiple columns: `net_receipts = sales + taxes - returns`. Here each calculation touches four vectors, so we need all four to be in the CPU cache to benefit.
   
   Second, SQL is row based: the above calculation is just one of perhaps many that occur on each row. In the ideal case, the calculations form independent groups: `SELECT a + b AS x, c - d + e AS y, f / g AS z, ...`. In this case, we could load vectors `a`, `b`, `x` into the L1 cache, do the calcs, then load `c`, `d`, `e` and `y` into the cache, and so on. Of course, Drill doesn't work this way (it does all the calculations for a single row before moving to the next), but it could, and it would have to in order to benefit from vectorization.
   
   A more typical case is that the same column is used in multiple expressions: `SELECT a + b AS x, a / c AS y, (a - d) * e AS z, ...` In this case, we must load the `a` vector into the L1 cache multiple times. (Or, more properly, its values would continually be bumped out of the cache, then reloaded.)
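   
   To make the contrast concrete, here is a toy Java sketch of the two evaluation orders (plain arrays standing in for vectors; illustrative only, not Drill code):
   
   ```java
   // Toy illustration only: primitive arrays stand in for value vectors.
   public class EvalOrder {
     // Columnar (vector-at-a-time): two passes over `a`. If the vectors are much
     // larger than the L1 cache, the second pass re-fetches `a` from L2/RAM.
     static void columnar(double[] a, double[] b, double[] c, double[] x, double[] y) {
       for (int i = 0; i < a.length; i++) { x[i] = a[i] + b[i]; }
       for (int i = 0; i < a.length; i++) { y[i] = a[i] / c[i]; }
     }
   
     // Row-at-a-time: each a[i] feeds both expressions while it is still hot.
     static void rowAtATime(double[] a, double[] b, double[] c, double[] x, double[] y) {
       for (int i = 0; i < a.length; i++) {
         x[i] = a[i] + b[i];
         y[i] = a[i] / c[i];
       }
     }
   }
   ```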
   
   **Myth: Bigger Vectors are Better**: Drill went through a phase when everyone bought into the "L1 cache" myth. To get better performance everyone wanted ever larger vectors. In the code, you'll see that we started with 1K-row batches, then it grew to 4K, then other code would create 64K row batches. It got so bad we'd allocate vectors larger than 16MB, which caused memory fragmentation and OOM errors. (This is the original reason for what evolved to be "EVF": to control vector sizes to prevent memory fragmentation - very basic DB stuff.)
   
   Remember, the CPU L1 cache is only about 256K in size. A 4MB vector is already 16x the L1 cache size. Combine that with real-world expressions and we end up with a "working set" of 10s of MB in size: 20x or more the L1 cache size. The result is lots of cache misses. (This stuff is really hard to measure, would be great for someone to do the experiments to show this happening in practice.)
   
   **Myth: Vectors are Efficient**: A related, minor myth is that writing to vectors is more efficient than writing to rows. This is not true in general. In Drill, it is especially false. Originally, vectors were allocated at some small initial size (256K? Something like that.) Vectors grow as we add data. (That's one of the things that make working with them difficult.) Fill the 256K vector? We double it to 512K and copy across the data. We do it again at 1MB, 2MB, 4MB, ... In the end, to grow to 4MB, we copy about 4MB of data. That is 4MB of reads and 4MB of writes, in addition to the 4MB of writes needed to create the data.
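   
   As a back-of-the-envelope check on that claim (the sizes are the illustrative ones from the paragraph above):
   
   ```java
   // Illustrative arithmetic only: grow a vector from 256K to 4MB by doubling.
   public class GrowthCost {
     public static void main(String[] args) {
       int capacity = 256 * 1024;
       long bytesCopied = 0;
       while (capacity < 4 * 1024 * 1024) {
         bytesCopied += capacity;   // each doubling copies the entire current contents
         capacity *= 2;             // 256K -> 512K -> 1MB -> 2MB -> 4MB
       }
       System.out.println(bytesCopied);  // 3932160: roughly 4MB read and rewritten
     }
   }
   ```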
   
   Later, a bunch of ad-hoc "batch sizing" code was added to try to guess a good initial size for vectors. Not too hard for fixed-width vectors (`INT`, say), but rather tricky for variable-width vectors (`VARCHAR`).
   
   Remember that each vector is sized and grows independently. So, to create a batch, we have to track the size of every vector, grow those that need growing, but not over-allocate all the vectors because the space is not fungible: vector `a` can't "borrow" unused space from vector `b`.
   
   The point is, Drill needs a large amount of complex, add-on code just to work around the fact that every vector will grow, and copy data, if not sized correctly, and, in general, we don't know ahead of time what the size should be. The result is inefficiency.
   
   **Myth: Vectors Allow Efficient Parquet Integration**: The idea here is that Parquet is columnar, vectors are columnar, so we just read Parquet directly into vectors. The reality is that Parquet is a highly-encoded format that requires a large amount of complex code to decode. Drill does the decoding value-by-value, there is no "direct copy", nor could there be. Parquet works with Drill's value vectors, Arrow vectors, or Impala rows equally well: in each case, the Parquet data has to be decoded value-by-value, then written to the target format.
   
   **Single Data Format for In-Memory and Over-the-Wire**: One of Drill's claims to fame is that value vectors (and, later, Arrow vectors) use the same layout in memory as over the wire, leading to efficient exchanges via RPC. This myth is true, as far as it goes. But, if you look at the code, the truth is much more complex. On the sender side, vectors are independent buffers (as explained above.) On the receiver side, the whole message comes in as one big buffer. Special code slices up that buffer to recover vectors. A vast amount of complex code is needed to handle the accounting. (Thanks to Chris, a one-time Drill contributor who wrote all that code well enough that it "just works". It would be hell to fix otherwise.)
   
   **Complex Client Code**: All DB clients work a row at a time. Whether it is JDBC, SQLAlchemy, your home-grown code, or whatever, a client consumes data a row at a time. All DB APIs (except Drill's) are row-based. The client asks for the next row (or, more typically, the next *n* rows), then reads them one-by-one. Super simple, and it has worked for decades.
   
   With Drill, a client has to include all of Drill's complex vector logic so it can read a batch of a 1K (or 4K or more) rows split across vectors. The client then has to walk the vectors to assemble the row that the client really wants. The result is that Java clients (such as JDBC) pull in a vast amount of Drill code. The C++ ODBC client was a nightmare: pretty much only one person ever could make it work (thanks, Parth!).
   
   The complexity of this simple client operation has led clients to use the REST API instead. But Drill is session-based (needed to set Drill's zillions of session options), while REST is not. Drill works for "big data", but REST delivers the results in one big blob. The result is that the vector-based client is so hard to use that folks want to use the REST client, which doesn't work in the general case either. We've shot ourselves in the foot. Yet, every other DB on the planet has a simple row-based client API.


[GitHub] [drill] paul-rogers edited a comment on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
paul-rogers edited a comment on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1007611673


   @jnturton,  one could do something like what you described. However, to have all of Drill work with Arrow would be a huge amount of work. Optimizations made for one format would be sub-optimal for the other. (Example: exchanges.) Furthermore, your use case would benefit from vectors only in the project and grouping operators.
   
   So, I wonder if we might think about the problem operator-by-operator. If you have a compute-heavy phase, might that first transform data to vectors, apply the compute, then send data along in row format? Every fragment does a network exchange: data is read/written anyway. So, perhaps there is something that can be done to transform formats at fragment boundaries (he says, waving hands wildly...)
   
   You'll also get speed only for queries without joins. If you have joins, then the joins are likely to take the vast majority of the runtime, leaving your projection and grouping in the noise. I'm not sure how vectorization can help joins; certainly in Drill today, vectors make the join code atrociously complex.
   
   This is why DBs (and compiler optimizers) are hard: the answers change based on use case...


[GitHub] [drill] paul-rogers commented on issue #2421: [DISCUSSION] ValueVectors Replacement

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1014927069


   @Leon-WTF, you ask a good question. I'm afraid the answer is rather complex: hard to describe in a few words. Read if you're interested.
   
   Let's take a simple query. We have 10 scan fragments feeding 10 sort fragments that feed a single merge/"screen" fragment. Drill uses a random exchange to shuffle data from the 10 scan fragments to the 10 sorts, so that every sort gets data from every reader, balancing the load. There is an ordered exchange from the sorts to the merge. In "Impala notation":
   
   ```text
   Screen
   |
   Merge
   |
   Ordered Receiver
   |   -   -   -   -   -   -   -   -
   Ordered Sender
   |
   Sort
   |
   Unordered Receiver
   |  -   -   -   -   -   -   -   -
   Random Sender
   |
   Scan 
   ```
   
   The dotted lines are an ad-hoc addition to represent fragment boundaries. Remember there are 10 each of the lower two fragments, 1 of the top.
   
   First, let's consider the in-memory case. The scans read data and forward batches to random sorts. Each sort sorts its incoming batches as they arrive and buffers them. When the sort sees EOF from all its inputs, it merges the buffered batches and sends the data downstream to the merge, one batch at a time.
   
   The merge needs a batch from each sort to proceed, so the merge can't start until the last sort finishes. (This is why a Sort is called a "blocking operator" or "buffering operator": it won't produce its first result until it has consumed all its input batches.)
   
   Everything is in memory, so each Sort can consume batches about as fast as the scans can produce them. The sorts all work in parallel, as we'd hope. The merge kicks in last, but that's to be expected: one can't merge without having first sorted all the data.
   
   OK, now for the spilling case, where the problems seem to crop up. (Full disclosure: I rewrote the current version of the Sort spilling code.) Now, data is too large to fit into memory for the sort. Let's focus on one sort, call it Sort 1. Sort 1 reads input batches until it fills its buffer. Then, Sort 1 pauses to write its sorted batches to disk. Simple enough. But, here is where things get tricky.
   
   All the scans must fill their output batches. Because we're doing random exchanges to the sorts, all outgoing batches are about the same size. Let's now pick one scan to focus on, Scan 2. Scan 2 fills up the outgoing batch for Sort 1, but Sort 1 is busy spilling, so Sort 1 can't read that batch. As a result, Scan 2 is blocked: it needs to add one more row for Sort 1, but it can't because the outgoing batch is full, and the downstream Sort won't accept it. So, Scan 2 grinds to a halt.
   
   The same is true for all other scans: as soon as they want to send to Sort 1 (which is spilling), they block. Soon, all scans are blocked. This means that all the other Sorts stop working: they have no incoming batches, so they are starved.
   
   Eventually, Sort 1 completes its spill and starts reading again. This means Scan 2 can send and start working again. The same is true of the other Scans. Now, the next Sort, Sort 2, needs to spill. (Remember, the data is randomly distributed, so all Sorts see about the same number of rows.) So, the whole show occurs again. The scans can't send to Sort 2, so they stall. The other sorts become starved for inputs, and so they stop.
   
   Basically, the entire system becomes serialized on the sort spills: effectively, across the cluster only one spill will be active at any time, and that spill blocks senders which blocks the other sorts. We have a 10-way distributed, single-threaded query.
   
   Now, I've omitted some details. One is that, in Drill, every receiver is obligated to buffer three incoming batches before it blocks. So, the scenario above is not exactly right: there is buffering in the Unordered Receiver below each Sort. But, the net effect is the same once that three-batch buffer is filled.
   
   Even here there is an issue: remember, each scan has to buffer 1024 rows for every sort. We have 10 scans and 10 sorts, so we're buffering 100 batches or 100K rows. And, the sorts have buffers of 3 batches each, so that's another 30 batches total, or 30K rows. All that consumes memory which is not available for the sort, hence the sort has to spill. That is, the memory design for Drill takes a narrow, per-operator view, and does not optimize memory use across the whole query.
   
   All of the above is conjecture based on watching large queries grind to a crawl. Drill provides overall metrics, but not metrics broken down by time slices, so we have no good way to visualize behavior. Using ASCII graphics, here's what we'd expect to see:
   
   ```text
   Sort 1: rrrsss____________r_______r_______rsss_________
   Sort 2: rr___rsss__________r_______r_______rsss________
   ....
   Scan 1: www__w____________www_____www_____www___
   Scan 2: www__w____________www_____www_____www___
   ```
   
   Where "r" means "read and sort a batch", "s" means "spill", "w" means "write a batch downstream, and "_" means "blocked or starved." Everything is parallel until the first spill, then everything serializes. There are 10 scans, so the extra wait time is for the other 8 scans to do their thing. All very complex!
   
   And, this problem grows with the square of the number of nodes. With 100 fragments, we buffer 10K batches of 1K rows each, or 10M rows sitting in send buffers. This should be screaming at us, "you're doing it wrong!"
   
   Another detail I omitted is that Drill has something called a "Mux Exchange". 
   ```text
   ...
   |
   Unordered Receiver
   |    -   -   -   -   -   -   -   -
   | Mux Sender
   |
   | Mux Receiver
   |    -   -   -   -   -   -   -   -
   Unordered Sender
   ...
   ```
   
   All the scans on a single node share a set of outgoing buffers. Now, nobody understands this beast, and it is implemented as a separate layer of exchanges, which seems to make things worse. I remember back in the day, someone tried to characterize its behavior. But, since we didn't understand it, we just kept trying "Mux On/Mux Off" without really knowing what was happening. Having a part of the system that no one understands is never a great way to proceed.
   
   Still, the Mux thing is a good idea even if the implementation is flawed. Ideally, it would reduce the number of outgoing batches from n^2 to just n. But, it adds so much complexity, with vectors, that the benefit could never be clearly seen.
   
   How would rows work better than vectors? With vectors, we build up, say, 1024 rows in each batch before sending. We can't send until the batch is full, but if any batch is full, we can't add another row until we send the batch we've got. This is too crude and leads to the above scenario. The Mux exchange can consume rows so that the scans don't have to build up full batches to forward to the Mux.
   
   Better would be to have a buffer of up to 1024 rows, but we send whatever we have after some time. Now, each scan can continue to send to the other sorts even while holding rows for Sort 1 while it spills. In this case, some of the other sorts will also fill their buffers and start to spill concurrently with Sort 1.
   
   Another idea is to do what Apex and other systems do: spill to disk if the receiver is blocked. (Or, equivalently, spill to disk on the receiver side if the consuming operator does not consume rows.)
   
   Yet another solution is to run the Sort in multiple threads so that it spills and consumes concurrently. (The current design is single-threaded and is pretty aggressive about how much is spilled: we inherited that logic from the "first-generation" sort spill code.)
   
   Now, I've focused on the Sort because I know that code best. But, the same is true for other blocking operators: joins, aggregations, etc.
   
   When Drill is used embedded, or at small scale, you will never see this issue. When Drill runs at large scale, so that spilling kicks in, then this issue becomes the dominant problem that prevents high performance.
   
   Sorry that this is all rather complex. Since it is complex, it goes unseen and unfixed. This is probably even a well-known problem in distributed computing circles: if anyone knows of a good reference, please post it.
   
   For Drill 2.0, we have to decide if we want to stay in the large-scale distributed system business, or just diddle around as an embedded desktop app to read PDF files and Excel spreadsheets. At scale, we have to fix the above issues. As a desktop app, we can rip out all the complex distributed cruft that no one understands and just be a decent several node solution.
   
   But, there is no path that says we compete with Presto/Trino/Impala, etc. at scale while focusing on desktop concerns. Distributed systems are *hard*, and we have to know what we're doing to offer an effective solution.


[GitHub] [drill] jnturton edited a comment on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton edited a comment on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1006287333


   Okay, @paul-rogers I've had a few swigs of the kool aid by now and I think I'm ready to forget about in-memory column orientation and SIMD in return for the benefits of row orientation.  For workflows that do involve bulk arithmetic I can imagine good interop taking care of that stage:
   
   1. Do some efficient parsing, filtering, sorting, aggregating in Drill
   2. Smoothly switch over to Pandas/Numpy (perhaps an Arrow exporter?) or Julia or ...
   3. Do bulk arithmetic using SIMD or even a GPU
   4. Store results or smoothly switch back to Drill
   
   I've used this workflow myself where the data interchange format was Parquet and the transport medium was the DFS (so perhaps a bit more "clunky" than "smooth", with lots of serialisation and IO incurred).
   
   Going further, if the decoupling of Drill from its in-memory format mentioned higher up is a real possibility then can we even imagine something like this, entirely in Drill?
   
   ```sql
   alter session set exec.memory_format = 'drill'; -- the default, row-oriented format
   
   create table as select ... -- do some efficient parsing, filtering, sorting, aggregating in Drill
   create table as select ... -- do some efficient parsing, filtering, sorting, aggregating in Drill
   
   alter session set exec.memory_format = 'arrow'; -- switch to Arrow format
   
   create table as select ... -- do some bulk arithmetic using SIMD
   create table as select ... -- do some bulk arithmetic using SIMD
   ```
   
   To my mind Drill 2.0 would not try to ship support for the latter, Arrow format, merely make design decisions which leave that door open for a motivated developer...


[GitHub] [drill] jnturton closed issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton closed issue #2421:
URL: https://github.com/apache/drill/issues/2421


   


[GitHub] [drill] Leon-WTF edited a comment on issue #2421: [DISCUSSION] ValueVectors Replacement

Posted by GitBox <gi...@apache.org>.
Leon-WTF edited a comment on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1017659713


   @paul-rogers Thanks Paul for your detailed explanation. One more question: why is Scan 2's sending to the other sorts also blocked when its sending to Sort 1 is blocked? Do you think it would be better to make each communication pair execute separately? I see that the PartitionSender can do the partition and flush tasks in a thread pool.


[GitHub] [drill] paul-rogers commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1005326255


   @luocooong, your picture about page layout and cache performance is helpful: it shows how some DBs lay out pages. (I did something similar way back when I wrote a DB.) However, for an in-memory format, we'd do the layout differently. There would be no header (that's a separate object). We'd ensure that all data within each row is self-contained: no pointers from the footer back into rows.
   
   Impala uses a format which, I believe, it borrowed from earlier DBs. Here is a variation, listing the various fields, in order:
   
   * Row size.
   * Fixed-width columns (INT, LONG, etc.) These have a fixed offset from the row start.
   * Offset and length of each variable-width field. These also have a fixed offset.
   * Variable portion with the variable-width data. (VARCHAR, etc.)
   
   Note that the order in which the variable fields are written doesn't matter: the offset/length pairs do the right thing. Just as Drill has a `BatchSchema` to say what vectors make up a batch, each block of rows would have a "row schema" to map from the logical schema (i.e. names and types) to offsets. Each fixed-width fetch is thus an addition (base + offset), plus a read/write. This is about the same as for vectors. (Vector address + row * size, then read.) The row data can, of course, reside in direct memory as for vectors.
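   
   For concreteness, here is a rough sketch of those two kinds of fetch against a page buffer (the layout and offsets are illustrative, not Impala's actual format):
   
   ```java
   import java.nio.ByteBuffer;
   import java.nio.charset.StandardCharsets;
   
   // Sketch of the layout above: [rowSize][fixed-width cols][(offset,len) pairs][var data].
   // The offsets would come from the per-block "row schema"; nothing here is real Drill code.
   public class RowAccess {
     // Fixed-width fetch: base + fixed offset, then one read.
     static long readBigInt(ByteBuffer page, int rowStart, int colOffset) {
       return page.getLong(rowStart + colOffset);
     }
   
     // Variable-width fetch: read the (offset, length) pair, then the bytes.
     static String readVarChar(ByteBuffer page, int rowStart, int pairOffset) {
       int dataOffset = page.getInt(rowStart + pairOffset);
       int dataLength = page.getInt(rowStart + pairOffset + 4);
       byte[] bytes = new byte[dataLength];
       ByteBuffer view = page.duplicate();      // don't disturb the shared buffer's position
       view.position(rowStart + dataOffset);
       view.get(bytes);
       return new String(bytes, StandardCharsets.UTF_8);
     }
   }
   ```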
   
   The result is that the entire row fits into the cache as a unit with no overhead cruft. In practice, each operator has an incoming and outgoing row (filter, project, probe phase of a join, hash sender, ...). Still, the rows should be small enough that both incoming and outgoing rows fit in the cache.
   
   Of course, the JVM will pull in byte code blocks, local variables, etc. So, the code has to also be designed carefully to avoid cache thrashing. That's what Drill's aggressive inlining and byte code fixups are supposed to do. Then, the number of threads has to be managed so we don't get swapped out just after we get our cache nicely set up.
   
   This stuff is HARD. Testing is essential. A somewhat-dated, but still helpful, source is [Martin Thompson's Mechanical Sympathy Blog](https://mechanical-sympathy.blogspot.com/).


[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1004749499


   Paul Rogers wrote:
   
   Hi All,
   
   Thanks Charles for dredging up that old discussion, your memory is better
   than mine! And, thanks Ted for that summary of MapR history. As one of the
   "replacement crew" brought in after the original folks left, your
   description is consistent with my memory of events. Moreover, as we looked
   at what was needed to run Drill in production, an Arrow port was far down
   on the list: it would not have solved actual customer problems.
   
   Before we get too excited about Arrow, I think we should have a discussion
   about what we want in an internal storage format. I added a long (sorry)
   set of comments in that PR that Charles mentioned that tries to debunk the
   myths that have grown up around using a columnar format as the internal
   representation for a query engine. (Columnar is great for storage.) The
   note presents the many issues we've encountered over the years that have
   caused us to layer ever more code on top of vectors to solve various
   problems. It also highlights a distributed-systems problem which vectors
   make far worse.
   
   Arrow is meant to be portable, as Ted discussed, but it is still columnar,
   and this is the source of endless problems in an execution engine. So, we
   want to ask, what is the optimal format for what Drill actually does? I'm
   now of the opinion that Drill might actually benefit more from a
   row-based format, similar to what Impala uses. The notes even paint a path
   forward.
   
   Ted's description of the goal for Dremio suggests that Arrow might be the
   right answer for that market. Drill, however, tends to be used to query
   myriad data sources at scale and as a "query integrator" across systems.
   This use case has different needs, which may be better served with a
   row-based format.
   
   The upshot is that "value vectors vs. Arrow" is the wrong place to start
   the discussion. The right place is "what does our many years of experience
   with Drill suggest is the most efficient format for how Drill is actually
   used?"
   
   Note that Drill could have an Arrow-based API independent of the internal
   format. The quote from Charles explains how we could do that.
   
   Thanks,
   
   - Paul
   


[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1004751568


   Paul Rogers wrote:
   
   One last note. Let's assume we wanted to adopt the row-based format (or, the myths being strong, we want to adopt Arrow.) How would we go about it?
   
   The "brute force" approach is to rewrite all the operators. Must deal with low-level vector code, so we'd rewrite that with low-level row (or Arrow) code. Since we can't really test until all operators are converted, we'd have to do the entire conversion in one huge effort. Then, we get to debug. I hope this approach is setting off alarm bells: it is high cost and high risk. This is why Drill never seriously entertained the change.
   
   But, there is another solution. The scan readers all used to work directly with vectors. (Parquet still does.) Because of the memory reasons explained above, we converted most of them to use EVF. As a result, we could swap vectors for row pages (or Arrow) by changing the low-level code. Readers would be blissfully ignorant of such changes because the higher-level abstractions would be unchanged.
   
   So, a more sane way to approach a change of in-memory representations is to first convert the other operators to use an EVF-like approach. (EVF for writing new batches, a "Result Set Loader" for reading existing batches.) Such a change can be done gradually, operator-by-operator, and is fully compatible with other, non-converted operators. No big bang.
   
   Once everything is upgraded to EVF, then we can swap out the in-memory format. Maybe try Arrow. Try a row-based format. Run tests. Pick the winner.
   
   This is *not* a trivial exercise, but it is doable over time, if we see value and can muster the resources.


[GitHub] [drill] paul-rogers commented on issue #2421: [DISCUSSION] ValueVectors Replacement

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1019575466


   One more factor to consider: memory management. Row-based solutions are far, far easier to memory-manage than vectors. Row solutions reduce internal fragmentation from 1/4 of the batch size to 1/2 of the (much smaller) row size.
   
   We talked early on about the difficulty of sizing vectors and batches. If a record has 20 columns, then each is a vector. If the columns are fixed-width (`INT`, `BIGINT`, etc.) then we can predict the size of a batch with, say, 1K rows. What we can't predict is the size of variable-width columns (`VARCHAR`, say), so sometimes we have a vector of size 1K (because all the `VARCHAR` values are one byte) and sometimes we have 1GB (because all the values are 1MB.) Drill now has lots of ad-hoc code to try to guess the size, which is much better than the early days, but still not great.
   
   We often want to go the other way: we want to control memory use, so we allocate, say, 1MB per batch. When we do this, if we know the sort is given 1GB of buffer space, then we know we can store 1024 batches. (Picking simple numbers here just for discussion: actual numbers will vary.) So, what do we do to limit overall batch size to 1MB?
   
   Before EVF, we couldn't and batch sizes could become large and cause intermittent OOM errors. With EVF, there is complex and elaborate code to detect when the next doubling of any vector could push us past the target size. EVF reports "your batch is full", and the batch is sent downstream. Today, this works only in the scan; all other operators retain their ad-hoc "batch sizer" based solutions. (Full disclosure: I originally wrote the batch sizer as a quick & dirty solution for the sort, hence its silly name. Little did I realize it would spread to other operators and still be with us years later: I thought EVF would have won the battle by now. Sigh...)
   
   When sizing vectors this way, we end up with a large amount of *internal fragmentation*: unused space within vectors. On average, the last half of a vector will be half used, meaning that the vector is, on average, 3/4 full. (The reason is subtle: we doubled the vector when it was full. After the doubling, the vector is half full. Variable-width vector sizes are uncorrelated, so, on average, we'll have half-filled the second half when the batch as a whole fills.) So, with our 1MB batch, about 1/4 or 256K goes unused. That's pretty inefficient!
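   
   As a sanity check on that 3/4 figure, a quick Monte Carlo sketch (assuming pure doubling growth and a uniformly random final value count; not Drill code):
   
   ```java
   import java.util.Random;
   
   // Simulates the fill fraction of a doubling vector at the moment a batch closes.
   public class FillCheck {
     public static void main(String[] args) {
       Random rnd = new Random(42);
       double fillSum = 0;
       int trials = 1_000_000;
       for (int t = 0; t < trials; t++) {
         int values = 1 + rnd.nextInt(1 << 20);    // random number of values in the batch
         int capacity = 1;
         while (capacity < values) capacity *= 2;  // the vector doubled until the data fit
         fillSum += (double) values / capacity;    // fill fraction when the batch closes
       }
       System.out.printf("average fill = %.3f%n", fillSum / trials);  // prints ~0.75
     }
   }
   ```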
   
   Now, consider a row-based solution. Again we allocate 1MB for our buffer. We fill it with rows, end-to-end. When the buffer fills, we will, on average, have written half a row. We shift that row to a new buffer and send the current one on its way. Now, our fragmentation is 1/2 the row width. If the row width is << the buffer size, then we're making much better use of our scarce memory resource (especially in buffering operators such as sort, join, and aggregation.)
   
   Lest you think this is somehow a Drill bug, recall how Parquet is built. The entire data set is buffered in memory before it is written to the file. Why? Parquet has no idea how large a column or row group will be until the columnar data blocks are actually created. To pack data optimally in the file, it is sub-optimally all buffered in memory prior to writes. In a way, this Parquet buffering (and the resulting memory pressure) is very similar to the exchange buffering we discussed in prior notes.
   
   There is another huge win for row-based buffers: they are all of the same size (or, if we get very fancy, of a few sizes.) Rather than needing a malloc-style variable-block size allocator, we can use a much simpler DB-style buffer pool. New buffers come out of the pool, retired buffers go back in. We know exactly how many buffers we have, and can easily allocate quantities to queries and fragments. All the guesswork evaporates from sort, join and aggregate planning: they instead know exactly how many buffers they have to work with, and can plan spilling accordingly. None of this is new; you'll find it written up in any book on DB internals.
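   
   For illustration, such a pool really can be this simple (a sketch with invented names, not Drill's actual allocator):
   
   ```java
   import java.nio.ByteBuffer;
   import java.util.concurrent.ArrayBlockingQueue;
   
   // DB-style pool of fixed-size buffers: take() blocks when the pool is empty,
   // which gives natural backpressure instead of an OOM.
   public class BufferPool {
     private final ArrayBlockingQueue<ByteBuffer> free;
   
     public BufferPool(int bufferCount, int bufferSize) {
       free = new ArrayBlockingQueue<>(bufferCount);
       for (int i = 0; i < bufferCount; i++) {
         free.add(ByteBuffer.allocateDirect(bufferSize));  // every buffer is the same size
       }
     }
   
     public ByteBuffer take() throws InterruptedException {
       return free.take();            // new buffers come out of the pool
     }
   
     public void release(ByteBuffer buf) {
       buf.clear();
       free.add(buf);                 // retired buffers go straight back in
     }
   
     public int available() {
       return free.size();            // operators can plan spilling from this count
     }
   }
   ```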
   
   Thus, in just about every dimension, columnar storage is suboptimal as an internal query engine data format. The Impala guys knew this and benefited from their row-based structure. Drill is still suffering from its early misunderstanding that the benefits of columnar storage somehow translate into benefits as an internal data format.


[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1004751317


   Paul Rogers wrote:
   
   OK, so the above raises the issues we have to consider when thinking about vectors (Drill's or Arrow's.) What is the alternative?
   
   Here, I think Impala got it right. Impala uses Parquet (columnar) for *storage*, but rows for *internal* processing. Impala is like an Italian sports car of old: spends lots of time in the shop, but when it works, it is very fast.
   
   One of the reasons Impala is fast is its row format. First, let's describe what "row-based" means. It means that columns appear together, as in a C `struct`, with rows packed one after another as in an array of `structs`. This means that the data for a given row is contiguous. There is only one buffer to size. Classic DB stuff that seems unusual only because we're all used to Drill's vector format.
   
   Let's look at the same issues above, but from a row-based perspective.
   
   **Expression Execution**: With a row-based model, the CPU can easily load a single row into the L1 cache. All our crufty-real-world expression logic works on that single row. So, no matter how messy the expressions, from the CPU's perspective, all the data is in that single row, which fits nicely into the cache.
   
   Rows can be small (a few dozen bytes) or large (maybe 10s of KB for long VARCHARs). In either case, they are far smaller than the L1 cache. The row is loaded. Once we move on to the next row, we'll never visit the previous one, so we don't care if the CPU flushes it from the cache.
   
   **Memory Allocation**: Rows reside in buffers (AKA "pages"), typically of a fixed size. A reader "pours" data into a row. When the page is full, that last record is copied to the next page. Only that one row is copied, not all the existing data. So, we eliminate the 1X copy + 1X load problem in Drill. Since there is only one page to allocate, memory is simpler. Since pages are of fixed size, memory management is simpler as well.
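   
   A sketch of that "pouring" behaviour (names invented; simplified so each row arrives as ready-made bytes, making the "copy the last record" step a single put into the fresh page):
   
   ```java
   import java.nio.ByteBuffer;
   import java.util.function.Consumer;
   
   // Fills fixed-size pages with rows; completed rows are never moved again.
   public class PageWriter {
     static final int PAGE_SIZE = 1 << 20;   // 1MB pages, an illustrative size
     private ByteBuffer page = ByteBuffer.allocateDirect(PAGE_SIZE);
     private final Consumer<ByteBuffer> downstream;
   
     public PageWriter(Consumer<ByteBuffer> downstream) { this.downstream = downstream; }
   
     public void writeRow(byte[] row) {
       if (page.remaining() < row.length) {  // page full: ship it, start a new one
         page.flip();
         downstream.accept(page);
         page = ByteBuffer.allocateDirect(PAGE_SIZE);
         // only this one in-flight row lands in the new page; nothing else is copied
       }
       page.put(row);
     }
   }
   ```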
   
   **Exchanges**: Network exchanges are row-based. Rows are self-contained. A network sender can send single rows, if that is efficient, or batches of rows. In our 100-senders-100-receiver example, we could send rows as soon as they are available. The receiver starts working as soon as the first row is available. There is no near-deadlock from excessive buffering.
   
   Yes, we would want to buffer rows (into pages) for efficiency. But, in extreme cases, we can send small numbers of rows to keep the DAG flowing.
   
   **Clients**: As noted above, row-based clients are the classic solution and are simple to write. We could easily support proper clients in Python, Go, Rust and anything else if we used a row-based format.
   
   **Conclusion**: We tend to focus on the "value vector vs. Arrow" discussion. I'm here to say that that is the wrong question: it buys into myths which have hurt Drill for years. The *correct* question is: what is the most efficient format for the use cases where Drill wants to excel? The above suggests that, rather than Arrow, a better solution is to adopt a row-based internal format.


[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1004748679


   Ted Dunning wrote:
   
   As a little bit of perspective from somebody who *was* at MapR at the time,
   here are my recollections.
   
   Arrow is pretty much the value vectors from Drill with some lessons learned
   and all dependencies removed so that Arrow can be consumed separately from
   Drill.
   
   The spinout of the Dremio team didn't happen because of the lack of
   integration with Arrow ... it was more the other way around ... because a
   significant chunk of the Drill team left to form Dremio, the driving force
   that could have pushed for integration just wasn't around any more because
   they were off doing Dremio and were no longer doing much work on Drill.
   The motive for the spinout had mostly to do with the fact that Tomer and
   Jacques recognized the opportunity to build a largely in-memory analytical
   engine based on zero serialization techniques and also recognized that this
   could never be a priority for MapR because it was outside the center of
   mass there. Once the Dremio team was out, though, they had a huge need for
   interoperability with systems like Spark and Cassandra, and they needed to
   not impose all of Drill as a dependency if they wanted these other systems
   to take on Arrow.
   
   This history doesn't really impact the merits or methods of integrating
   present-day Drill with Arrow, but it is nice to get the story the right way
   around.
   


[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1004749229


   Christian Pfarr wrote:
   
   Hi Charles, Ted, and the others here,
   
   it is very interesting to hear about the evolution of Drill, Dremio and Arrow in that context, and thank you Charles for restarting that discussion.
   
   I think, and James mentioned this in the PR as well, that Drill could benefit from the continuous progress the Arrow project has made since its separation from Drill. The Arrow community seems to be large, so I assume the improvements and new features will keep coming, but I don't have enough experience with Drill internals to have an idea of how much refactoring this would lead to.
   
   In addition to that, I'm not aware of the current roadmap of Arrow and whether it would fit into Drill's roadmap. Maybe Arrow will go in a different direction than Drill, and what should we do then, if Drill is bound to Arrow?
   
   On the other hand, Arrow could help Drill to wider adoption with clients like pyarrow, Arrow Flight, various other programming languages, etc. And (I'm not sure about this) maybe there is a performance benefit if Drill uses Arrow to read data from HDFS (for example), uses Arrow to work with it during execution, and hands the vectors directly to my Python (for example) program via Arrow Flight, so that I can play around with Pandas, etc.
   
   Just some thoughts I have, since I have used Dremio with pyarrow and Drill with ODBC connections.
   
   Regards
   Christian


[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1004749364


   Ted Dunning wrote:
   
   Christian,
   
   Your thoughts are very helpful. I find Arrow very nice (I use it in Agstack
   with Julia and Python).
   
   I don't think anybody is saying that Drill wouldn't be well served by a
   switch to Arrow or even just interfaces to Arrow. But it is a lot of work
   to make it all happen.


[GitHub] [drill] paul-rogers commented on issue #2421: [DISCUSSION] ValueVectors Replacement

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1019033520


   @Leon-WTF, good question, as always. We don't actually *want* any scan to be blocked: this is the problem we need to fix. Every scan fragment has a set of outgoing buffers. The problem is, in normal Drill operation, we don't send them until they are full. But, to absorb delays, we want to keep them empty most of the time, and allow them to absorb rows when the receiver is blocked.
   
   Our example had 10 sort fragments, so there are 10 outgoing buffers per scan fragment (ignoring the Mux exchange.) Let's look at this from the scan's perspective. It can fill each outgoing buffer to, say, 1024 rows.
   
   The scan chugs along, producing rows. The random exchange assigns each to one of the outgoing buffers and copies it across. (Note the copy: we get no benefit from vectors during this operation.)
   
   Our data is evenly distributed, so we start with 0 rows per outgoing buffer. After a while, most buffers have, say, 3 rows. Some have more, some less, since things are random. We chug along some more. Now we're up to an average of 1000 rows. Pretty soon, one of the buffers fills to 1024.
   
   At this point, the scan fragment can't add another row to that buffer, so off the buffer goes across the wire to its target sort fragment. All good. This continues, the next buffer gets to 1024 and is sent off. Again, because of even distribution, pretty soon all are full and sent. Now the scan can chug along and refill them.
   
   Now, let's say that Sort 1 can only buffer 9 incoming batches, so it now wants to spill when, say, Scan 8 sends it the 10th batch.
   
   Now, back to Scan 1: it has filled its buffers again, and it happens that the one for Sort 1 (the one which is spilling) is filled first. So, it has to be sent off before the Scan can produce another row. Why? There is no other place to put that row, and we can't get to the row behind that until we dispose of the current row. But, the destination is blocked, doing a spill (ignoring the three-buffer rule), so the full buffer can't be sent, and so we can't add to it, since it is full.
   
   So, here's the heart of the problem. Scan 1 has just produced a row (or a batch with a row) that must go to Sort 1. The outgoing buffer is full, but it can't be sent. Scan 1 can't go any further until it can get rid of that current row. So...it...waits...
   
   In effect, our buffer to absorb delays can sometimes be as small as 0 or 1 rows (though it will average to 512 rows.)
   
   Even if we have separate channels (which we do), we can't use them. We could send off all the other partly-full buffers and it still wouldn't help us with that one we most care about: the one that has to take the current row but can't (because it is full and we're blocked sending that buffer). Sending the other buffers _might_ get the other sorts to spill sooner, so we get concurrent spilling. (Something for us to consider as a partial fix.)
   
   This is what I meant by a 10-way distributed query that begins to act as a single thread. Each scan can only send as fast as the slowest sort. Any spilling sort blocks all scans, which blocks all sorts. Definitely A Bad Thing.
   
   Now, if we have a row-based exchange, things might work better. Each scan might now have a rule:
   
   * Keep a buffer of up to 1024 rows (just picking a number).
   * Send the buffer if it is full, or if it is non-empty and has not been sent in the last 100 ms (pick any time interval); see the sketch after this list.
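   
   A minimal sketch of that rule (all names invented; this is not Drill's actual sender API):
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Send on "full", or on "timeout while non-empty". A timer thread is assumed
   // to call maybeFlush() periodically so stale rows don't sit forever.
   public class OutgoingBuffer<R> {
     static final int MAX_ROWS = 1024;       // just picking a number
     static final long MAX_DELAY_MS = 100;   // pick any time interval
   
     public interface Sender<T> { void send(List<T> rows); }
   
     private final List<R> rows = new ArrayList<>();
     private final Sender<R> sender;
     private long lastSendMs = System.currentTimeMillis();
   
     public OutgoingBuffer(Sender<R> sender) { this.sender = sender; }
   
     public synchronized void add(R row) {
       rows.add(row);
       maybeFlush();
     }
   
     public synchronized void maybeFlush() {
       boolean full = rows.size() >= MAX_ROWS;
       boolean stale = !rows.isEmpty()
           && System.currentTimeMillis() - lastSendMs >= MAX_DELAY_MS;
       if (full || stale) {
         sender.send(new ArrayList<>(rows));  // ship whatever we have
         rows.clear();
         lastSendMs = System.currentTimeMillis();
       }
     }
   }
   ```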
   
   With that rule (properly tuned), the scans will typically have mostly empty send buffers. When Sort 1 starts spilling, Scan 1 can buffer up to 1024 rows for that sort, while also sending partial buffers to the other sorts, to get them closer to spilling. Once Sort 1 unblocks, its rows are sent, and maybe some other sort's outgoing buffer starts to fill. We get closer to doing spills in parallel rather than sequentially.
   
   In this way, we normally have empty buffers, and have lots of margin to absorb blocking. Contrast this to the current scheme in which we have no leeway when a buffer fills and the receiver blocks.
   
   I don't have hard numbers, but I suspect (based on my experience with Impala) that the row-based scheme would not fall prey to the single-threading that the vector-based approach does. (And, of course, we could implement the above idea with vectors, but it would normally lead to small batches downstream, which might cause unexpected other problems in the current design.)
   
   To verify this, we need more detailed instrumentation (a time series of events, not just a summary), and a beefy cluster, with a large data set, to run tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1007611673


   Hi James,
   
   One could do something like what you described. However, to have all of
   Drill work with Arrow would be a huge amount of work. Optimizations made
   for one format would be sub-optimal for the other. (Example: exchanges.)
   Furthermore, your use case would benefit from vectors only in the project
   and grouping operators.
   
   So, I wonder if we might think about the problem operator-by-operator. If
   you have a compute-heavy phase, might that first transform data to vectors,
   apply the compute, then send data along in row format? Every fragment does
   a network exchange: data is read/written anyway. So, perhaps there is
   something that can be done to transform formats at fragment boundaries (he
   says, waving hands wildly...)
   
   You'll also get speed only for queries without joins. If you have joins,
   then the joins are likely to take the vast majority of the runtime, leaving
   your projection and grouping in the noise. I'm not sure how vectorization
   can help joins; certainly in Drill today, vectors make the join code
   atrociously complex.
   
   This is why DBs (and compiler optimizers) are hard: the answers change
   based on use case...
   
   Thanks,
   
   - Paul
   
   On Wed, Jan 5, 2022 at 9:03 PM James Turton ***@***.***>
   wrote:
   
   > Okay, @paul-rogers <https://github.com/paul-rogers> I've had a few swigs
   > of the kool aid by now and I think I'm ready to forget about in-memory
   > column orientation and SIMD in return for the benefits of row orientation.
   > For workflows that do involve bulk arithmetic I can imagine good interop
   > taking care of that stage:
   >
   >    1. Do some efficient parsing, filtering, sorting, aggregating in Drill
   >    2. Smoothly switch over to Pandas/Numpy (perhaps an Arrow exporter?)
   >    or Julia or ...
   >    3. Do bulk arithmetic using SIMD
   >    4. Store results or smoothly switch back to Drill
   >
   > I've used this workflow myself where the data interchange format was
   > Parquet and the transport medium was the DFS (so perhaps a bit more
   > "clunky" than "smooth", with lots of serialisation and IO incurred).
   >
   > Going further, if the decoupling of Drill from its in-memory format
   > mentioned above is a real possibility then can we even imagine something
   > like this, entirely in Drill?
   >
   > alter session set exec.memory_format = 'drill'; -- the default, row-oriented format
   >
   > create table as select ... -- do some efficient parsing, filtering, sorting, aggregating in Drill
   > create table as select ... -- do some efficient parsing, filtering, sorting, aggregating in Drill
   >
   > alter session set exec.memory_format = 'arrow'; -- switch to Arrow format
   >
   > create table as select ... do some bulk arithmetic using SIMD
   > create table as select ... do some bulk arithmetic using SIMD
   >
   > To my mind Drill 2.0 would not try to ship support for the latter, Arrow
   > format, merely make design decisions which leave that door open for a
   > motivated developer...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton edited a comment on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton edited a comment on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1006287333


   Okay, @paul-rogers I've had a few swigs of the kool aid by now and I think I'm ready to forget about in-memory column orientation and SIMD in return for the benefits of row orientation.  For workflows that do involve bulk arithmetic I can imagine good interop taking care of that stage:
   
   1. Do some efficient parsing, filtering, sorting, aggregating in Drill
   2. Smoothly switch over to Pandas/Numpy (perhaps an Arrow exporter?) or Julia or ...
   3. Do bulk arithmetic using SIMD or even a GPU
   4. Store results or smoothly switch back to Drill
   
   I've used this workflow myself where the data interchange format was Parquet and the transport medium was the DFS (so perhaps a bit more "clunky" than "smooth", with lots of serialisation and IO incurred).
   
   Going further, if the decoupling of Drill from its in-memory format mentioned above is a real possibility then can we even imagine something like this, entirely in Drill?
   
   ```
   alter session set exec.memory_format = 'drill'; -- the default, row-oriented format
   
   create table as select ... -- do some efficient parsing, filtering, sorting, aggregating in Drill
   create table as select ... -- do some efficient parsing, filtering, sorting, aggregating in Drill
   
   alter session set exec.memory_format = 'arrow'; -- switch to Arrow format
   
   create table as select ... do some bulk arithmetic using SIMD
   create table as select ... do some bulk arithmetic using SIMD
   ```
   
   To my mind Drill 2.0 would not try to ship support for the latter, Arrow format, merely make design decisions which leave that door open for a motivated developer...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1004751035


   Paul Rogers wrote:
   
   The last topic is so complex that no myth has grown up around it, and the issue is not at all well understood. Vectors (and batches) are hell for distributed system performance. This gets pretty techie, so hang on.
   
   **Vectors are Hell for Exchanges**:  This comes from a real-world case in which a large cluster worked no faster than a single thread of execution. We've discussed how Drill wants to create large batches (per the myth) to benefit from vectorization (which we don't have) and to optimize L1 cache usage (which, as we've seen, we don't actually do.) Let's assume "small" batches of 1K rows.
   
   Drill also wants a single format for in-memory and over-the-wire usage. This means we want to transfer 1K record batches so that the receiver gets batches of the optimal in-memory size.
   
   Now, what happens in a distributed system? Assume you have 100 fragments running. (Maybe 10 machines with 10 cores each.) Let's think about one fragment, call it "f0.0". Let's assume f0.0 runs a scan and a network sender. The scan builds up its 1K batches, because those are "efficient" (according to the myths we've discussed.)
   
   What does f0.0's network sender do? Let's assume the target is a hash join. So, the sender hashes the keys into 100 buckets. Now, the sender follows Drill's rule: send 1K record batches. Since there are 100 targets, the sender has to create 100 buffered batches, fill them each to 1K records, then send them. To visualize:
   
   `f0.0 (reader --> sender) - - > f1.x (receiver --> hash-join --> ...) ...`
   
   There are 100 f0 fragments: f0.0, ... f0.99, we're looking just at one of them: f0.0. The f0 "slice" sends to the "f1" slice that consists of 100 additional fragments: f1.0, ... f1.99.
   
   So, what happens in our sender? Assuming even hash distribution, we have to fill all our 100 outgoing batches before we can send them. This means we have to read 100 * 1K = 100K input records before we send the first outgoing batch. The result is huge memory usage (those 100 batches), plus all the vector resizes and copies we discussed (as we grow those batches.)
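   
   To make the arithmetic concrete, here is a sketch of such a sender (rows reduced to int[], and keyOf/sendTo invented for the example; not Drill's real code):
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   class HashSenderSketch {
     static final int RECEIVERS = 100;   // downstream fragments
     static final int BATCH_SIZE = 1024; // "optimal" batch size
   
     void run(Iterable<int[]> scan) {
       // One buffered outgoing batch per receiver: 100 of them.
       List<List<int[]>> outgoing = new ArrayList<>();
       for (int i = 0; i < RECEIVERS; i++) {
         outgoing.add(new ArrayList<>());
       }
       for (int[] row : scan) {
         int target = Math.floorMod(keyOf(row), RECEIVERS);
         List<int[]> batch = outgoing.get(target);
         batch.add(row);                    // a per-row copy: no vector win here
         if (batch.size() == BATCH_SIZE) {  // send only when completely full
           sendTo(target, batch);
           outgoing.set(target, new ArrayList<>());
         }
       }
       // With an even hash distribution, the first sendTo() fires only after
       // roughly RECEIVERS * BATCH_SIZE = 102,400 input rows have been read.
     }
   
     int keyOf(int[] row) { return row[0]; }
     void sendTo(int target, List<int[]> batch) { /* network send elided */ }
   }
   ```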
   
   If that were not bad enough, this occurs in all our other 99 f0 fragments: we've got 100 * 100 = 10K buffered batches waiting to send. Yikes!
   
   Now, what happens in f1? It is sitting around waiting for data. No f0 will send until it fills its first outgoing batch for that receiver. If we assume an even distribution of data, then the outgoing batches fill at about the same rate. None can be sent until one of them reaches the target size, at which point most of them are near-full. Once the first hits the 1K mark, off it goes to f1, which can finally start processing. This is bad because Drill claims to be highly distributed, but what we just described is a serial way of working.
   
   But, it gets worse! Now, assume we're deeper in the DAG, at a sort:
   
   `f4: (receiver --> sort --> sender) - - > f5: (receiver --> merge --> ...)`
   
   The sort sorts its slice of records, and sends it to the merge fragment which merges all the partial sorts. Classic distributed systems stuff. Again, the f4 (sort) sender waits to fill its outgoing batches, then it sends. The merge can't start until it sees batches from all 100 inputs. So, it proceeds at the rate of the slowest sort.
   
   Now what happens? The merge uses up one of the 100 input batches, and needs another before it can proceed. But, here things get really nasty.
   
   On the f4 side, f4.0, say, sent the first batch to fill. It then sent the others as they filled. Meanwhile, the first batch started refilling and eventually will need to be sent again. Since a merge can't read a new batch until it has used up the previous one, it blocks the f4 sender. As a result, f4 can't send to *any* other merge.
   
   The downstream fragment throttles the upstream, and vice versa. Not quite deadlock, but the entire system becomes serialized: the sort can't ship batches until the slowest merge can receive them. The merge can't make progress until the slowest sort provides the next batch. Every fragment depends on every other. Disaster!
   
   Again, we spent hours trying to figure this out on a customer cluster. We could see the effect, but we could not get in to work out the details. Would be great for someone to do the experiments.
   
   **Summary**: The above has debunked the major myths around columnar storage within a query engine. Note that **none** of the above changes if we use Arrow. We'd do a huge amount of work to switch, and be stuck with the same fundamental problems.
   
   Hence, we have to think deeply about this issue, not just buy the snake oil that "vectors are good for an execution engine." Good old solid engineering and experimentation will tell us what's what.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1005676173


   A few informational notes from me.  First, I wondered how it could be that benchmark-busting Impala manages to load SIMD registers from a row-oriented memory layout, then found the following [in here](http://cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf).
   
   > We are also considering switching to a columnar canonical in-memory format for data that needs to be materialized during query processing, in order to take advantage of SIMD instructions 
   
   I wonder whether they switched?  I suppose I need only go over to apache/impala to find out.  I can't speak for anyone else but some of my important Drill workloads actually have been bulk floating point arithmetic (statistics, probabilities) where you do get a major boost from SIMD.  That said, I probably did even more work doing filtering, sorting and aggregating the same data in Drill before it was ready for arithmetic.
   
   Second, if you did want to do SIMD properly in Java (i.e. not praying for auto-vectorisation), then are you looking at JEP 338, and do [you have some vector types from the JDK dictated to you](https://openjdk.java.net/jeps/338)?  Would these vectors work with our direct memory and Netty-based code?
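   
   For reference, a tiny sketch of what JEP 338 code looks like (the incubating jdk.incubator.vector module in JDK 16+, enabled with --add-modules jdk.incubator.vector; the element-wise float addition is just a stand-in workload):
   
   ```java
   import jdk.incubator.vector.FloatVector;
   import jdk.incubator.vector.VectorSpecies;
   
   class SimdAdd {
     // The JVM picks the widest lane width the hardware supports.
     static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
   
     static void add(float[] a, float[] b, float[] out) {
       int i = 0;
       int upper = SPECIES.loopBound(a.length);
       for (; i < upper; i += SPECIES.length()) {
         FloatVector va = FloatVector.fromArray(SPECIES, a, i);
         FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
         va.add(vb).intoArray(out, i); // one SIMD add per iteration
       }
       for (; i < a.length; i++) {     // scalar tail for the remainder
         out[i] = a[i] + b[i];
       }
     }
   }
   ```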
   
   Third, would our direct memory code remain unchanged if we replace ValueVectors, or would we want to start accessing direct memory through [JEP 393](https://openjdk.java.net/jeps/393)?  Or... are [the high tech garbage collectors in recent JVMs](https://blogs.oracle.com/javamagazine/post/understanding-the-jdks-new-superfast-garbage-collectors) so good that we could laugh off the burden of direct memory and its management entirely?
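   
   And, for comparison, roughly what JEP 393-style access looked like around JDK 16 (the incubating jdk.incubator.foreign API changed shape between JDK releases, so treat the exact calls below as approximate):
   
   ```java
   import jdk.incubator.foreign.MemoryAccess;
   import jdk.incubator.foreign.MemorySegment;
   
   class ForeignMemorySketch {
     static int roundTrip() {
       // 1 KiB of off-heap memory, freed deterministically by close():
       // no GC pressure and no sun.misc.Unsafe.
       try (MemorySegment segment = MemorySegment.allocateNative(1024)) {
         MemoryAccess.setIntAtOffset(segment, 0L, 42);
         return MemoryAccess.getIntAtOffset(segment, 0L);
       }
     }
   }
   ```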
   
   Re. SIMD, I know there are positions like "Forget SIMD" that have been stated, but I wanted to note this stuff down for us anyway.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1005316421


   @jnturton, thanks for converting the discussion to a ticket. Thanks especially for dealing with the formatting translation!
    
   @luocooong, thanks for the very impressive analysis, and the wonderful images!
   
   As noted earlier, columnar is a huge win for storage. It is less so for tools that work a row at a time, such as Drill or most data pipelines. It is easy to see why. A Parquet file may have 100 or more columns. Your query uses 10 of them, so that's all we read. A huge win. But, inside Drill, we'll mess with all 10 of your columns. You are probably doing filtering, calculations, aggregation, grouping, etc. There is a _reason_ you selected those columns: you want to work with them. And, Drill will do so a row at a time.
   
   Hence, the use pattern _within Drill_ seems to favor a row-based layout as we get few of the advantages of a columnar format, but we get all of the cost and complexity.
   
   Most SQL operators work a row at a time. So, even if the SQL is messy, and we have to touch the same column multiple times, the row itself can fit into the CPU cache and those redundant accesses are trivially fast. It would be very hard to do a similar optimization for columns.
   
   Your OLAP analysis is correct, but mostly applies to old-school Mondrian-style cubes on top of relational tables where the aggregation is clearly defined. Most of the queries I've seen for Drill (or Impala or other tools) are pretty ugly, with lots of business logic. If the user writes the SQL, they won't limit themselves to a small set of standard aggregations, they'll do whatever they need to do to get the job done. That, in fact, is why SQL endures decade after decade: as awful as it is, you can get your task done if you try hard enough.
   
   As a result, although we could vectorize simple cube-rollup calculations, in practice, we don't get many of those. So, we need to optimize for what people actually do, not the ideal cases.
   
   Let's also remember this: if all you need are simple roll-up calculations, and you run those queries again and again (so they are worth optimizing), you have a better choice: have your data pipeline write pre-aggregated "cubes". Want to see sales-by-product-by-day over the last year on billions of transactions? Just build the roll-up cube. It's what Druid does and it works well. With a Spark pipeline, you can create a detail table and a roll-up table, both in Parquet, and queried by Drill. So, if we look at the broader context, we see that the one place we'd benefit from vectorized operations might not even happen within Drill; it would be done in Spark, in the data pipeline.
   
   That said, we should also remember operators that might benefit from a columnar layout: computing a hash key or a sort key. In this case, we could have a hybrid approach: a "row group" with just the required fields (the keys), and a separate "row group" with the "payload" of other fields. Some DBs use a similar approach (for storage.) With a bit of pipelining, the preceding operator could produce its output rows in the format optimal for the next operator.
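   
   As a strawman, such a hybrid batch might look something like the sketch below (names invented, just to make the idea concrete): the keys sit in one contiguous array so a hash or sort loop touches nothing else, while the payload rides along row-wise.
   
   ```java
   class HybridBatch {
     final long[] keys;       // key "row group": contiguous, cache-friendly
     final byte[][] payload;  // one opaque row-wise payload per record
   
     HybridBatch(long[] keys, byte[][] payload) {
       this.keys = keys;
       this.payload = payload;
     }
   
     // Tight loop over the key column only; the payload is never touched.
     int[] hashKeys() {
       int[] hashes = new int[keys.length];
       for (int i = 0; i < keys.length; i++) {
         hashes[i] = Long.hashCode(keys[i]);
       }
       return hashes;
     }
   }
   ```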
   
   It would be great if we could mock up some simple test cases and try out the ideas. Doing so is not simple at present, but would be possible once we did the intermediate raw-vector-to-EVF conversion. Our experience tells us what we might expect to find, but we need to do the experiments to tell us what's actually true.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1008510147


   @jnturton, I think you're starting down a slippery slope, one that will end up with you convinced to simply move to Arrow (or enhance Drill's value vectors.) You are assuming that you can get a big win from vectorization in selected compute or hash operations -- at no cost elsewhere. The gist of my argument was that, if that were true, Drill should already be in good shape: we'd only need to add some SIMD code and we'd rock. You're also assuming that there are SIMD hash functions: I'm not sure those exist; a search came up with somewhat random results. (SHA exists, however.)
   
   The hard truth is that vectors are good in very limited cases: those you outlined. As said above, Drill does far more than that. For all those other things, vectors are hugely complex and expensive.
   
   So, if we ignore the parts where vectors are inefficient, and ignore the operations that don't benefit from vectors, and ignore the complexity of vectors in client code and their horrible effect on exchanges... and instead focus on some ideal cases where vectors might be faster... then, yes, we end up back where we are today. This is the classic marketing position: vectors are great, if we ignore the bad stuff. The problem is, in making all those assumptions, we ignore the actual reality of what we've learned over many years.
   
   Until I see some numbers, I'm not convinced that there is ever a case, other than in ideal lab conditions, where the gain from vectors in some operations makes up for the cost everywhere else. Still, if we wanted to find out, we could tinker a bit:
   
   * Limit vector batches to a decent number of records, say 1K. Avoid the 4K, 8K or 64K behemoths. That will be friendlier on memory usage and deliver results to downstream fragments faster. And, since we don't actually have SIMD support, our SIMD features remain unaffected.
   * In exchanges, limit outgoing batches to a small row count. Add a timeout: if the batch is not filled in, say, x seconds, ship what we have to avoid near-deadlock at scale.
   * Since the Gandiva code from Arrow has to, at its core, work on a block of direct memory, use it for a few select operations. That is, if we see that a projection consists only of your ideal cases, generate Gandiva code instead of Java code. If the operations are real-world messy, then stick with the Java code we have.
   * Even if we moved toward a row-based approach, keep the ability for a row to be comprised of multiple "column groups", each as individual columns or as groups of columns. For example, when reading from a file, one "column group" contains the file data, another has the "implicit" fields. If we do this, we can split out single columns if we want to run the hash (or other) function efficiently.
   
   There is no free lunch. If we focus only on the ideal cases, then any gain we get in those cases is more than given back by the overall complexity and slowness of other cases.
   
   At some point, someone's got to actually do some benchmarks to gather actual facts...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1004748290


   Charles Givre wrote:
   
   Hello all, 
   There was a discussion in a recently closed PR [1] between z0ltrix, James Turton and a few others about integrating Drill with Apache Arrow, and wondering why it was never done.  I'd like to share my perspective as someone who has been around Drill for some time, but also as someone who never worked for MapR or Dremio.  This just represents my understanding of events as an outsider, and I could be wrong about some or all of this.   Please forgive (or correct) any inaccuracies. 
   
   When I first learned of Arrow and the idea of integrating Arrow with Drill, the thing that interested me the most was the ability to move data between platforms without having to serialize/deserialize the data.  From my understanding, MapR did some research and didn't find a significant performance advantage and hence didn't really pursue the integration.  The other side of it was that it would require a significant amount of work to refactor major parts of Drill. 
   
   I don't know the internal politics, but this was one of the major points of diversion between Dremio and Drill.
   
   With that said, there was a renewed discussion on the list [2] where Paul Rogers proposed what he described as a "Crude but Effective" approach to an Arrow integration.  
   
   This is in the email link but here was a part of Paul's email:
   
   > Charles, just brainstorming a bit, I think the easiest way to start is to create a simple, stand-alone server that speaks Arrow to the client, and uses the native Drill client to speak to Drill. The native Drill client exposes Drill value vectors. One trick would be to convert Drill vectors to the Arrow format. I think that data vectors are the same format. Possibly offset vectors. I think Arrow went its own way with null-value (Drill's is-set) vectors. So, some conversion might be a no-op, others might need to rewrite a vector. Good thing, this is purely at the vector level, so would be easy to write. The next issue is the one that Parth has long pointed out: Drill and Arrow each have their own memory allocators. How could we share a data vector between the two? The simplest initial solution is just to copy the data from Drill to Arrow. Slow, but transparent to the client.
   >
   > A crude first-approximation of the development steps: 
   > 1. Create the client shell server. 
   > 2. Implement the Arrow client protocol. Need some way to accept a query and return batches of results. 
   > 3. Forward the query to Drill using the native Drill client. 
   > 4. As a first pass, copy vectors from Drill to Arrow and return them to the client. 
   > 5. Then, solve that memory allocator problem to pass data without copying.
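   
   As an illustration of step 4, a per-value copy from a Drill vector into an Arrow vector could look roughly like the sketch below. Both APIs are written from memory here (Drill's generated IntVector accessor and Arrow's Java IntVector), so treat the exact method names as assumptions rather than a tested implementation:
   
   ```java
   import org.apache.arrow.memory.RootAllocator;
   
   class DrillToArrowSketch {
     static org.apache.arrow.vector.IntVector copy(
         org.apache.drill.exec.vector.IntVector drillVec, RootAllocator allocator) {
       int n = drillVec.getAccessor().getValueCount();
       org.apache.arrow.vector.IntVector arrowVec =
           new org.apache.arrow.vector.IntVector("col", allocator);
       arrowVec.allocateNew(n);
       for (int i = 0; i < n; i++) {
         // Per-value copy: slow, but transparent to the client (step 4).
         arrowVec.set(i, drillVec.getAccessor().get(i));
       }
       arrowVec.setValueCount(n);
       return arrowVec;
     }
   }
   ```
   
   Step 5 would then replace this copy loop with shared buffers once the allocator question is solved.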
   
   One point that Paul made was that these pieces are fairly discrete and could be implemented without refactoring major components of Drill.  Of course, this could be something for Drill 2.0.  At a minimum, could we take the conversation off of the PR and put it in the email list? 
   
   Let's discuss... All ideas are welcome!
   
   Best,
   -- C
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton edited a comment on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton edited a comment on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1005676173


   A few informational notes from me.  First, I wondered how it could be that benchmark-busting Impala manages to load SIMD registers from a row-oriented memory layout, then found the following [in here](http://cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf).
   
   > We are also considering switching to a columnar canonical in-memory format for data that needs to be materialized during query processing, in order to take advantage of SIMD instructions 
   
   I wonder whether they switched?  I suppose I need only go over to apache/impala to find out.  EDIT: [I don't think they did](https://git-wip-us.apache.org/repos/asf?p=impala.git;a=blob;f=be/src/runtime/row-batch.cc;h=aed2e116781718e715e5ce01db996412eef0aa9f;hb=HEAD).  I can't speak for anyone else but some of my important Drill workloads actually have been bulk floating point arithmetic (statistics, probabilities) where you do get a major boost from SIMD.  That said, I probably did even more work doing filtering, sorting and aggregating the same data in Drill before it was ready for arithmetic.
   
   Second, if you did want to do SIMD properly in Java (i.e. not praying for auto-vectorisation), then are you looking at JEP 338, and do [you have some vector types from the JDK dictated to you](https://openjdk.java.net/jeps/338)?  Would these vectors work with our direct memory and Netty-based code?
   
   Third, would our direct memory code remain unchanged if we replace ValueVectors, or would we want to start accessing direct memory through [JEP 393](https://openjdk.java.net/jeps/393)?  Or... are [the high tech garbage collectors in recent JVMs](https://blogs.oracle.com/javamagazine/post/understanding-the-jdks-new-superfast-garbage-collectors) so good that we could laugh off the burden of direct memory and its management entirely?
   
   Re. SIMD, I know there are positions like "Forget SIMD" that have been stated, but I wanted to note this stuff down for us anyway.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton edited a comment on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton edited a comment on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1008596721


   @paul-rogers my personal bias, FWIW, is that it would take a major overall speedup on _real world_ Drill workloads to budge me from the cleanest and simplest memory format (which sounds like rows).  Something like a genuine 5x before it started getting hard for me to wave it away, and again I don't mean in a synthetic benchmark.  I'm just circulating material for the sake of enhancing the discussion.
   
   > You're also assuming that there are SIMD hash functions: I'm not sure those exist: a search came up with somewhat random results. (SHA exists, however.)
   
   Venturing a little off topic now but as a point of general interest, there is reason to believe at least a few such algorithms must exist.  The "moon boys" that kill the planet mining crypto ponzi tokens tend to use GPUs for that work.  Security researchers made news a few years back by producing collisions in MD5 using a Playstation 3 cluster.  From the hardware employed I conclude that the hash functions being brute forced in these stories must admit data-parallel algorithms and would also get some speed up from SIMD.
   
   EDIT: after a moment's thought, data parallelism in brute forcing must arise naturally from the need to compute a single hash function over many independent inputs.  I think it would arise in the same way in a hash join.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1008596721


   @paul-rogers my personal bias, FWIW, is that it would take a major overall speedup on _real world_ Drill workloads to budge me from the cleanest and simplest memory format (which sounds like rows).  Something like a genuine 5x before it started getting hard for me to wave it away, and again I don't mean in a synthetic benchmark.  I'm just circulating material for the sake of enhancing the discussion.
   
   > You're also assuming that there are SIMD hash functions: I'm not sure those exist: a search came up with somewhat random results. (SHA exists, however.)
   
   Venturing a little off topic now but as a point of general interest, there is reason to believe at least a few such algorithms must exist.  The "moon boys" that kill the planet mining crypto ponzi tokens tend to use GPUs for that work.  Security researchers made news a few years back by producing collisions in MD5 using a Playstation 3 cluster.  From the hardware employed I conclude that the hash functions being brute forced in these stories must admit data-parallel algorithms and would also get some speed up from SIMD.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on issue #2421: ValueVectors replacement

Posted by GitBox <gi...@apache.org>.
jnturton commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1004751937


   Cong Luo wrote:
   
   @paul-rogers Thanks for your knowledge base. For vectorization and Arrow, I may need to keep pouring cold water on everyone. I think we need to figure out what we actually need here, not just see that something is powerful and then say it's really powerful.
   
   **SQL, NoSQL and NewSQL**
   In my work, I went through these three database architectures. Initially, we had performance bottlenecks, so we chose NoSQL databases, e.g. Mongo, ES and Redis. After entering the production environment, most developers were not good at using the APIs (clients) designed for these databases. The key is that developers do not have the patience to learn the advanced syntax of each database (NoSQL). So we came back to SQL-style databases: NewSQL. But we found that NewSQL also has many bottlenecks (even where you think no optimization is needed); especially once you come to deploy a private cluster, you get complex operation and maintenance conditions and user complaints.
   
   Here, the lesson is that choosing the right one is better than choosing it blindly.
   
   **OLTP, OLAP and HTAP**
   The row format is good for OLTP, because it needs to insert / update / delete and batch fetch entire rows as quickly as possible. The columnar format is good for OLAP, because it needs to analyze large amounts of data. To keep latency low, it needs to limit the data size read from disk and reduce network IO bottlenecks. What does HTAP look like? Let's look at the internal implementation of TiDB:
   
   <img src="https://download.pingcap.com/images/docs/tiflash/tiflash-architecture.png" width="50%" height="50%">
   
   ```
   TiFlash provides the columnar storage, it conducts real-time replication of data in the TiKV nodes 
   at a low cost that does not block writes in TiKV. Meanwhile, it provides the same read consistency 
   as in TiKV and ensures that the latest data is read.
   ```
   
   As is well known, database vendors prefer the row format for OLTP and the columnar format for OLAP. However, they are well aware that there is no perfect data format. So I agree with Paul that the row format is better for Drill.
   
   **CPU Cache miss**
   - N-ary Storage Model
   <img src="https://user-images.githubusercontent.com/50079619/148023833-dcd92d2c-38ae-41fa-a80f-97496f50e647.png" width="50%" height="50%">
   
   - Cache miss
   <img src="https://user-images.githubusercontent.com/50079619/148022521-f623b105-9057-4286-8ee4-3a2231b8bc68.png" width="50%" height="50%">
   
   The figure shows how the CPU caches are filled, and we can see that a lot of irrelevant data is brought into the cache, crowding out data that would otherwise have been reusable.
   
   - Decomposition Storage Model
   <img src="https://user-images.githubusercontent.com/50079619/148023968-6f0db55b-63d1-448a-9165-2001a59aea01.png" width="50%" height="50%">
   
   Reducing cache misses is one of the advantages of the columnar format, but note that Paul has explained that there will also be cache misses (from calculations driven by the SQL syntax) in the columnar format. So we cannot expect maximum performance from Arrow.
   
   **Write amplification**
   Drill should avoid becoming a database system, but it does have several key points on the read-write path.
   - Drill supports spilling vector values to disk.
   - Drill is an engine, not a database, so it cannot unify the initial source format (which is determined by the database vendor).
   - Complex read-write path:
     - (1) Read from a variety of data sources.
     - (2) Write to value vectors or spill to disk.
     - (3) Combine into row sets.
     - (4) Client side.
   
   As noted above, it is impossible to avoid reading and writing multiple times in a query, and the write amplification will increase latency. So there's a lot to optimize, not only the columnar format.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org