Posted to commits@beam.apache.org by "Amit Sela (JIRA)" <ji...@apache.org> on 2016/09/26 08:42:21 UTC

[jira] [Created] (BEAM-673) Data locality for Read.Bounded

Amit Sela created BEAM-673:
------------------------------

             Summary: Data locality for Read.Bounded
                 Key: BEAM-673
                 URL: https://issues.apache.org/jira/browse/BEAM-673
             Project: Beam
          Issue Type: Bug
          Components: runner-spark
            Reporter: Amit Sela
            Assignee: Amit Sela


When reading from some distributed filesystems, such as HDFS, we should be able to hint to Spark the preferred locations of each split.
Here is an example of how Spark does that for Hadoop RDDs:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L252
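In Spark, the hint is whatever an RDD returns from `getPreferredLocations(Partition)`: a list of hostnames, where an empty list means "no preference". The sketch below shows only that per-partition decision, in plain Java so it runs without Spark. It assumes a hypothetical `SourcePartition` that already knows which hosts store its split (for example, HDFS block locations). Beam's `BoundedSource` has no such locations API, so where the hosts come from is an assumption. As in `NewHadoopRDD`, `localhost` is dropped because it gives the scheduler no useful placement information.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of the locality decision a Spark runner could make for a
 * Read.Bounded split. In a real runner this would sit inside an RDD subclass
 * overriding getPreferredLocations(Partition); here it is plain Java.
 */
final class PreferredLocationsSketch {

  /** Hypothetical partition: one split of a bounded source plus the hosts storing it. */
  static final class SourcePartition {
    final int index;
    // Assumption: hosts are supplied by the filesystem (e.g. HDFS block locations);
    // Beam's BoundedSource does not expose them.
    final List<String> hosts;

    SourcePartition(int index, List<String> hosts) {
      this.index = index;
      this.hosts = Collections.unmodifiableList(new ArrayList<>(hosts));
    }
  }

  private PreferredLocationsSketch() {}

  /**
   * Returns the hosts Spark should prefer for this partition, skipping "localhost".
   * An empty result means no preference, so the task may run on any executor.
   */
  static List<String> preferredLocations(SourcePartition partition) {
    List<String> result = new ArrayList<>();
    for (String host : partition.hosts) {
      if (!"localhost".equals(host)) {
        result.add(host);
      }
    }
    return result;
  }
}
```

A split stored on `dn1` and `dn2` yields `[dn1, dn2]`, so Spark tries to schedule that task on one of those nodes before falling back to any executor.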

*Note: in the case of a 1-to-1 mapping of a Read operation (e.g. TextIO), direct translation should still be preferred, but this is pending HDFS support in Beam anyway.*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)