Posted to issues@nifi.apache.org by "Robert Liszli (Jira)" <ji...@apache.org> on 2022/09/28 13:05:00 UTC

[jira] [Created] (NIFI-10556) Create processor to support DeltaLake tables

Robert Liszli created NIFI-10556:
------------------------------------

             Summary: Create processor to support DeltaLake tables
                 Key: NIFI-10556
                 URL: https://issues.apache.org/jira/browse/NIFI-10556
             Project: Apache NiFi
          Issue Type: New Feature
          Components: Extensions
            Reporter: Robert Liszli
            Assignee: Robert Liszli


*Plan for the new processor*

The new processor will use the Delta Standalone library to generate a Delta table for a set of Parquet data files located locally or in cloud storage.

*Processor input:*
 * The path to the Parquet files (a single directory), located on the local filesystem or in cloud storage (S3, GCP or Azure).
 * The structure of the Parquet files, in JSON format.
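
The issue does not say what the JSON description of the Parquet structure looks like. As an illustration only, the sketch below assumes the Spark-style struct JSON that Delta uses for table schemas; the column names and the column_names helper are hypothetical and not part of the proposal.

```python
import json

# Hypothetical schema for a two-column Parquet file, written in the
# Spark-style struct JSON layout (an assumption, not taken from the issue).
SCHEMA_JSON = """
{
  "type": "struct",
  "fields": [
    {"name": "id",   "type": "long",   "nullable": false, "metadata": {}},
    {"name": "name", "type": "string", "nullable": true,  "metadata": {}}
  ]
}
"""


def column_names(schema_json):
    """Parse the schema JSON and return the declared column names in order."""
    schema = json.loads(schema_json)
    if schema.get("type") != "struct":
        raise ValueError("expected a top-level struct schema")
    return [field["name"] for field in schema["fields"]]


print(column_names(SCHEMA_JSON))
```

A processor could run a check like this when validating its properties, so that a malformed schema is rejected before any storage is touched.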

*Processor parameters:*
 * Dropdown selector for the storage type.
 * Credentials for the selected storage type.

*On Trigger:*
 * The processor will compare the files in the data directory to the files already added to the Delta table. If new data files exist, it will add them to the Delta table.
 * If no Delta table exists yet, the processor will create one.
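
The trigger logic above can be sketched roughly as follows. This is a minimal illustration, not the planned implementation. It does not call Delta Standalone: the set of files already in the table is passed in as a plain collection of file names, and None stands for "no Delta table exists yet". The names find_new_data_files and on_trigger are hypothetical.

```python
from pathlib import Path


def find_new_data_files(data_dir, tracked_files):
    """Return the Parquet files in data_dir that are not yet tracked."""
    present = {
        p.name
        for p in Path(data_dir).iterdir()
        if p.is_file() and p.suffix == ".parquet"
    }
    return sorted(present - set(tracked_files))


def on_trigger(data_dir, tracked_files):
    """Simulate one trigger of the processor.

    tracked_files is the collection of file names already in the table,
    or None when no table exists (the table is then "created" empty).
    Returns (table_created, newly_added_files, all_tracked_files).
    """
    table_created = tracked_files is None
    already_tracked = set() if table_created else set(tracked_files)
    new_files = find_new_data_files(data_dir, already_tracked)
    return table_created, new_files, sorted(already_tracked | set(new_files))
```

In a real processor, the tracked set would come from the table's current snapshot and the new files would be committed to the table log in a single transaction. Those steps are left out here because the issue does not describe them.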

*Output of the processor:*
 * Up-to-date Delta table in the chosen storage system.

 

Delta Standalone: [https://github.com/delta-io/connectors#delta-standalone]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)