Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/05/13 00:36:12 UTC
[jira] [Created] (KUDU-1454) Spark and MR jobs running without scan locality
Todd Lipcon created KUDU-1454:
---------------------------------
Summary: Spark and MR jobs running without scan locality
Key: KUDU-1454
URL: https://issues.apache.org/jira/browse/KUDU-1454
Project: Kudu
Issue Type: Bug
Components: client, perf, spark
Affects Versions: 0.8.0
Reporter: Todd Lipcon
Priority: Critical
Spark (and, according to [~danburkert], now MR as well) adds the locations of all of a tablet's replicas as split locations. That would make sense, except that the Java client currently always scans the leader replica. So in many cases we schedule a task that is "local" to a follower, and that task then has to do a remote scan against the leader.
This makes Spark queries take about twice as long on replicated tables as on unreplicated tables, and I think it is also a regression on the MR side.
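The mismatch can be modeled in a few lines. The following is a hypothetical Python sketch, not Kudu client code: `Tablet`, `split_locations`, `scan_target`, `remote_scan_fraction`, and the `"prefer_local"` policy are illustrative names. It assumes the scheduler is equally likely to run a tablet's task on any advertised split location. Under that assumption, with three replicas per tablet and leader-only scans, two out of every three task placements read from a remote host, while an unreplicated tablet always scans locally. The model only counts placements and says nothing about query time.

```python
# Hypothetical model of the locality mismatch described in KUDU-1454.
# None of these names are Kudu client APIs; they exist only for illustration.
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Tablet:
    tablet_id: str
    leader: str                # host holding the leader replica
    replicas: Tuple[str, ...]  # hosts of all replicas, leader included


def split_locations(tablet: Tablet) -> List[str]:
    # As described in the issue: every replica host is advertised as a
    # preferred location for the task that scans this tablet.
    return list(tablet.replicas)


def scan_target(tablet: Tablet, task_host: str, policy: str) -> str:
    """Return the host the scanner actually reads from."""
    if policy == "leader_only":
        # Behavior described in the issue: always scan the leader replica.
        return tablet.leader
    if policy == "prefer_local":
        # Hypothetical alternative: read the co-located replica when one exists.
        return task_host if task_host in tablet.replicas else tablet.leader
    raise ValueError(f"unknown policy: {policy}")


def remote_scan_fraction(tablets: Sequence[Tablet], policy: str) -> float:
    """Fraction of candidate task placements whose scan is remote.

    Assumes the scheduler is equally likely to run a tablet's task on any of
    its advertised split locations, so every location is counted once.
    """
    total = 0
    remote = 0
    for tablet in tablets:
        for host in split_locations(tablet):
            total += 1
            if scan_target(tablet, host, policy) != host:
                remote += 1
    return remote / total


replicated = [
    Tablet("t1", leader="a", replicas=("a", "b", "c")),
    Tablet("t2", leader="b", replicas=("a", "b", "c")),
]
unreplicated = [Tablet("t3", leader="a", replicas=("a",))]

print(round(remote_scan_fraction(replicated, "leader_only"), 3))  # 0.667
print(remote_scan_fraction(replicated, "prefer_local"))           # 0.0
print(remote_scan_fraction(unreplicated, "leader_only"))          # 0.0
```

The `"prefer_local"` branch is included only for contrast, to show that the advertised locations and the replica actually scanned need to agree; the issue itself does not prescribe a particular fix.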
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)