Posted to user@spark.apache.org by Sebastian Piu <se...@gmail.com> on 2017/03/15 12:37:01 UTC

Thrift Server as JDBC endpoint

Hi all,

I'm doing some research on the best ways to expose data created by some of our
Spark jobs so that it can be consumed by a client (a web UI).

The data we need to serve might be huge, but we can control the type of
queries that are submitted, e.g.:
* limit the number of results
* only accept SELECT statements (i.e. read-only)
* only expose some pre-calculated datasets, i.e. always hitting particular
partitions - no joins etc. (a rough sketch of such a query is below)
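
For illustration, this is roughly the kind of bounded, read-only, partition-pruned
query I mean, issued over JDBC against the Thrift Server with the Hive JDBC driver.
The host, port, table and column names are just placeholders, not our actual setup:

    import java.sql.DriverManager

    object ThriftQuerySketch {
      def main(args: Array[String]): Unit = {
        // Hive JDBC driver; the Spark Thrift Server speaks the HiveServer2 protocol
        Class.forName("org.apache.hive.jdbc.HiveDriver")

        // 10000 is the default Thrift Server port; adjust to the deployment
        val conn = DriverManager.getConnection(
          "jdbc:hive2://thrift-host:10000/default", "webui_user", "")
        try {
          val stmt = conn.createStatement()
          // Read-only, hits a single partition, result size capped
          val rs = stmt.executeQuery(
            "SELECT * FROM precalc_dataset WHERE dt = '2017-03-15' LIMIT 1000")
          while (rs.next()) {
            println(rs.getString(1))
          }
        } finally {
          conn.close()
        }
      }
    }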

In terms of latency, the lower the better, but we don't have any demanding
scenarios like sub-second responses, and stability is hugely preferred.

Is the Thrift Server stable for this kind of use case? How does it perform
under concurrency? Is it better to run several instances and load balance
them, or a single one with more resources?

I'd be interested in hearing any experiences from people using this in
prod environments.

thanks
Seb