Posted to commits@beam.apache.org by "Amit Sela (JIRA)" <ji...@apache.org> on 2016/09/26 08:42:21 UTC
[jira] [Created] (BEAM-673) Data locality for Read.Bounded
Amit Sela created BEAM-673:
------------------------------
Summary: Data locality for Read.Bounded
Key: BEAM-673
URL: https://issues.apache.org/jira/browse/BEAM-673
Project: Beam
Issue Type: Bug
Components: runner-spark
Reporter: Amit Sela
Assignee: Amit Sela
When reading from some distributed filesystems, such as HDFS, the Spark runner should be able to hint to Spark the preferred locations of splits, so that tasks can be scheduled close to the data they read.
Here is an example of how Spark does that for Hadoop RDDs:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L252
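As a rough illustration of the idea (a sketch, not the Spark runner's implementation): Hadoop's InputSplit.getLocations() reports the hosts that hold a split's data, and Spark's RDD.getPreferredLocations(Partition) is the hook through which an RDD passes such hosts to the scheduler. The SplitLocality helper below is hypothetical and uses only the JDK. It shows just the step of turning a split's reported hosts into locality hints, assuming that "localhost" entries carry no useful locality information and can be dropped, as the linked NewHadoopRDD code appears to do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Beam or Spark.
final class SplitLocality {
  private SplitLocality() {}

  /**
   * Converts the hosts reported for one split (for example, the result of
   * Hadoop's InputSplit.getLocations()) into a list of locality hints.
   * Assumption: null, empty, and "localhost" entries are not useful hints,
   * and duplicate hosts only need to be reported once.
   */
  static List<String> preferredLocations(String[] splitHosts) {
    if (splitHosts == null) {
      // No location information: the scheduler is free to place the task anywhere.
      return Collections.emptyList();
    }
    List<String> hints = new ArrayList<>();
    for (String host : splitHosts) {
      boolean usable = host != null && !host.isEmpty() && !"localhost".equals(host);
      if (usable && !hints.contains(host)) {
        hints.add(host);
      }
    }
    return Collections.unmodifiableList(hints);
  }
}
```

An RDD wrapping a bounded source could return such a list from its getPreferredLocations override. How the hosts would be obtained from a Beam source is left open here.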
*Note: in the case of a 1-to-1 mapping of a Read operation (e.g. TextIO), direct translation should still be preferred, but this is pending HDFS support in Beam anyway.*