Posted to issues@beam.apache.org by "Ismaël Mejía (Jira)" <ji...@apache.org> on 2020/05/30 14:54:00 UTC

[jira] [Created] (BEAM-10159) Support Reading data from Databricks Delta

Ismaël Mejía created BEAM-10159:
-----------------------------------

             Summary: Support Reading data from Databricks Delta
                 Key: BEAM-10159
                 URL: https://issues.apache.org/jira/browse/BEAM-10159
             Project: Beam
          Issue Type: New Feature
          Components: io-ideas
            Reporter: Ismaël Mejía


Databricks Delta is an open source storage layer that runs on top of different
filesystems. The current implementation of Delta is strongly coupled with Spark,
so we cannot rely on it: doing so would break Beam portability.

However, there is now an open specification for Delta's protocol:
https://github.com/delta-io/delta/blob/master/PROTOCOL.md
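To illustrate what reading from the protocol alone (without Spark) might involve,
here is a minimal sketch in Python. It assumes that each commit in the table's
transaction log is a newline-delimited JSON file whose lines carry actions such
as "add" and "remove", each with a "path" field. The function name active_files
is hypothetical, and the sketch ignores checkpoints, protocol versioning and all
other action types, so treat it as an outline rather than a working reader.

```python
import json
from typing import Iterable, Set


def active_files(commits: Iterable[str]) -> Set[str]:
    """Replay commit contents (oldest first) and return the data file
    paths that are still part of the table after the last commit.

    Each element of `commits` is the text of one commit file, assumed
    to hold one JSON action per line.
    """
    live: Set[str] = set()
    for commit in commits:
        for line in commit.splitlines():
            line = line.strip()
            if not line:
                continue  # skip blank lines
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            # Other actions (metadata, commit info, ...) are ignored here.
    return live


if __name__ == "__main__":
    # Two made-up commits: the second one replaces part-0 with part-2.
    commit_0 = '{"add": {"path": "part-0.parquet"}}\n{"add": {"path": "part-1.parquet"}}'
    commit_1 = '{"remove": {"path": "part-0.parquet"}}\n{"add": {"path": "part-2.parquet"}}'
    print(sorted(active_files([commit_0, commit_1])))
```

A Beam source built this way would then read the resulting Parquet files with
the existing file-based IO, which keeps the connector independent of Spark.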

Another possible approach would be to investigate whether Beam could use a
manifest-based approach like Presto does:

https://docs.databricks.com/delta/presto-integration.html
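As a rough sketch of the manifest idea, assume the manifest generated for Presto
is a plain text file that lists one data file path per line (the exact location
and layout should be checked against the linked documentation). The helper
parse_manifest and the example paths below are hypothetical. A reader would
only need to parse that list and hand the paths to an existing Parquet reader:

```python
from typing import List


def parse_manifest(text: str) -> List[str]:
    """Return the data file paths listed in a manifest, one per line.

    Blank lines and surrounding whitespace are dropped; order is kept.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Made-up manifest content for illustration only.
    manifest = (
        "s3://bucket/table/part-1.parquet\n"
        "s3://bucket/table/part-2.parquet\n"
    )
    for path in parse_manifest(manifest):
        print(path)
```

The trade-off is that the manifest only reflects the table state at the time it
was generated, so readers depend on it being refreshed after each write.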




--
This message was sent by Atlassian Jira
(v8.3.4#803005)