Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2022/03/23 13:55:40 UTC

[GitHub] [drill] jnturton commented on pull request #2485: DRILL-8086: Convert the CSV (AKA "compliant text") reader to EVF V2

jnturton commented on pull request #2485:
URL: https://github.com/apache/drill/pull/2485#issuecomment-1076404891

> Drill 2.0 is an opportunity to reorient Drill away from the fading big data space and toward the data science use cases that most PRs now seem to support. (It's not that big data itself is gone, it's just that most folks who need that kind of scale now run in the cloud, where Drill is not common.) As one of many examples, REST APIs make no sense at scale, but do make sense for a "small data" tool.
> Or, have two Drill editions, the old-school "distributed systems" edition and the newer "data science" edition. Those who still need Drill to work distributed can keep that edition going (along with the big data CSV quirks), while the data science folks can fork the data science edition, chuck the distributed systems stuff that gets in the way, and focus on things that data scientists do (such as reading Excel and PDF files).

@paul-rogers supporting two editions of Drill would be even harder for our small band of developers than supporting one, surely? Also, I think it would be a major loss to unpick all of the MPP work done in Drill to make big data queryable, in any notional edition. Indeed, for a small-data-only query engine, I doubt that there would be any sense in starting from Drill at all. A fresh start based on Calcite, Pandas or Julia would be simpler and cleaner.

Many people do their big data processing in the cloud, but not all of them want the vendor lock-in of SaaS products, so they prefer to deploy open source software in the cloud. Others remain on-prem. In addition, I still contend that the worlds of small and big data are not disjoint, and that a single system that can query across many storage systems, formats and data sizes is valuable to a viable audience. If the contrib/ plugins can be sufficiently separated from the rest of Drill, then the variability in their scalability and behaviour is quarantined away from core Drill.
