You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by mattyb149 <gi...@git.apache.org> on 2017/04/17 20:32:54 UTC

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

GitHub user mattyb149 opened a pull request:

    https://github.com/apache/nifi/pull/1677

    NIFI-3704: Add PutDatabaseRecord processor

    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [x] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [x] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [x] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [x] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [x] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mattyb149/nifi put-db-record

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1677
    
----
commit 102e83a516b161e980448ef3000c8c895768d9dd
Author: Matt Burgess <ma...@apache.org>
Date:   2017-04-17T20:30:59Z

    NIFI-3704: Add PutDatabaseRecord processor

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r112190809
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1067 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "convert", "jdbc", "put", "database"})
    +@SeeAlso({ConvertJSONToSQL.class, PutSQL.class})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail", "Fail",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RowRecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    --- End diff --
    
    Nope, another copy-paste error, will remove


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on the issue:

    https://github.com/apache/nifi/pull/1677
  
    @mattyb149 I reviewed the updates and tested CDC scenario, that became much simpler without begin/commit and DDL events. Also, non CDC scenario seems to work as expected. +1
    
    I will squash and merge this PR into master. Thanks for your tremendous effort for realizing entire CDC flow!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r112938898
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1058 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "jdbc", "put", "database", "update", "insert", "delete"})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.  IMPORTANT: If the Statement Type is UPDATE, then the incoming "
    +        + "records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing "
    +        + "(if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys "
    +        + "exist).")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail on Unmatched Fields", "Fail on Unmatched Fields",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    --- End diff --
    
    Should this processor support Rollback on Failure? If so, perhaps we should wait on #1658 until this is merged?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/1677
  
    An end-to-end flow can be time-consuming to set up, I have a template [as a Gist here](https://gist.github.com/mattyb149/1ebdbff9ea48f36f5a5f007b191e8f8b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r113185780
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1076 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.schema.access.SchemaNotFoundException;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.SQLNonTransientException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "jdbc", "put", "database", "update", "insert", "delete"})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.  IMPORTANT: If the Statement Type is UPDATE, then the incoming "
    +        + "records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing "
    +        + "(if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys "
    +        + "exist).")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail on Unmatched Fields", "Fail on Unmatched Fields",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    protected static List<PropertyDescriptor> propDescriptors;
    +
    +    private final Map<SchemaKey, TableSchema> schemaCache = new LinkedHashMap<SchemaKey, TableSchema>(100) {
    +        private static final long serialVersionUID = 1L;
    +
    +        @Override
    +        protected boolean removeEldestEntry(Map.Entry<SchemaKey, TableSchema> eldest) {
    +            return size() >= 100;
    +        }
    +    };
    +
    +
    +    static {
    +        final Set<Relationship> r = new HashSet<>();
    +        r.add(REL_SUCCESS);
    +        r.add(REL_FAILURE);
    +        r.add(REL_RETRY);
    +        relationships = Collections.unmodifiableSet(r);
    +
    +        final List<PropertyDescriptor> pds = new ArrayList<>();
    +        pds.add(RECORD_READER_FACTORY);
    +        pds.add(STATEMENT_TYPE);
    +        pds.add(DBCP_SERVICE);
    +        pds.add(CATALOG_NAME);
    +        pds.add(SCHEMA_NAME);
    +        pds.add(TABLE_NAME);
    +        pds.add(TRANSLATE_FIELD_NAMES);
    +        pds.add(UNMATCHED_FIELD_BEHAVIOR);
    +        pds.add(UNMATCHED_COLUMN_BEHAVIOR);
    +        pds.add(UPDATE_KEYS);
    +        pds.add(FIELD_CONTAINING_SQL);
    +        pds.add(QUOTED_IDENTIFIERS);
    +        pds.add(QUOTED_TABLE_IDENTIFIER);
    +        pds.add(QUERY_TIMEOUT);
    +
    +        propDescriptors = Collections.unmodifiableList(pds);
    +    }
    +
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return relationships;
    +    }
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return propDescriptors;
    +    }
    +
    +    @Override
    +    protected PropertyDescriptor getSupportedDynamicPropertyDescriptor(final String propertyDescriptorName) {
    +        return new PropertyDescriptor.Builder()
    +                .name(propertyDescriptorName)
    +                .required(false)
    +                .addValidator(StandardValidators.createAttributeExpressionLanguageValidator(AttributeExpression.ResultType.STRING, true))
    +                .addValidator(StandardValidators.ATTRIBUTE_KEY_PROPERTY_NAME_VALIDATOR)
    +                .expressionLanguageSupported(true)
    +                .dynamic(true)
    +                .build();
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +        synchronized (this) {
    +            schemaCache.clear();
    +        }
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +
    +        FlowFile flowFile = session.get();
    +        if (flowFile == null) {
    +            return;
    +        }
    +
    +        final ComponentLog log = getLogger();
    +
    +        final RecordReaderFactory recordParserFactory = context.getProperty(RECORD_READER_FACTORY)
    +                .asControllerService(RecordReaderFactory.class);
    +        final String statementTypeProperty = context.getProperty(STATEMENT_TYPE).getValue();
    +        final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    +        final boolean translateFieldNames = context.getProperty(TRANSLATE_FIELD_NAMES).asBoolean();
    +        final boolean ignoreUnmappedFields = IGNORE_UNMATCHED_FIELD.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_FIELD_BEHAVIOR).getValue());
    +        final Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions().asTimePeriod(TimeUnit.SECONDS).intValue();
    +
    +        // Is the unmatched column behaviour fail or warning?
    +        final boolean failUnmappedColumns = FAIL_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +        final boolean warningUnmappedColumns = WARNING_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +
    +        // Escape column names?
    +        final boolean escapeColumnNames = context.getProperty(QUOTED_IDENTIFIERS).asBoolean();
    +
    +        // Quote table name?
    +        final boolean quoteTableName = context.getProperty(QUOTED_TABLE_IDENTIFIER).asBoolean();
    +
    +        try (final Connection con = dbcpService.getConnection()) {
    +
    +            String jdbcURL = "DBCPService";
    +            try {
    +                DatabaseMetaData databaseMetaData = con.getMetaData();
    +                if (databaseMetaData != null) {
    +                    jdbcURL = databaseMetaData.getURL();
    +                }
    +            } catch (SQLException se) {
    +                // Ignore and use default JDBC URL. This shouldn't happen unless the driver doesn't implement getMetaData() properly
    +            }
    +
    +            final String catalog = context.getProperty(CATALOG_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String schemaName = context.getProperty(SCHEMA_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String updateKeys = context.getProperty(UPDATE_KEYS).evaluateAttributeExpressions(flowFile).getValue();
    +            final SchemaKey schemaKey = new SchemaKey(catalog, tableName);
    +
    +            // Get the statement type from the attribute if necessary
    +            String statementType = statementTypeProperty;
    +            if (USE_ATTR_TYPE.equals(statementTypeProperty)) {
    +                statementType = flowFile.getAttribute(STATEMENT_TYPE_ATTRIBUTE);
    +            }
    +            if (StringUtils.isEmpty(statementType)) {
    +                log.error("Statement Type is not specified, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                flowFile = session.penalize(flowFile);
    +                session.transfer(flowFile, REL_FAILURE);
    +            } else {
    +                RecordSchema recordSchema;
    +                try (final InputStream in = session.read(flowFile)) {
    +
    +                    final RecordReader recordParser = recordParserFactory.createRecordReader(flowFile, in, log);
    +                    recordSchema = recordParser.getSchema();
    +
    +                    if (SQL_TYPE.equalsIgnoreCase(statementType)) {
    +
    +                        // Find which field has the SQL statement in it
    +                        final String sqlField = context.getProperty(FIELD_CONTAINING_SQL).evaluateAttributeExpressions(flowFile).getValue();
    +                        if (StringUtils.isEmpty(sqlField)) {
    +                            log.error("SQL specified as Statement Type but no Field Containing SQL was found, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } else {
    +                            boolean schemaHasSqlField = recordSchema.getFields().stream().anyMatch((field) -> sqlField.equals(field.getFieldName()));
    +                            if (schemaHasSqlField) {
    +                                try (Statement s = con.createStatement()) {
    +
    +                                    try {
    +                                        s.setQueryTimeout(queryTimeout); // timeout in seconds
    +                                    } catch (SQLException se) {
    +                                        // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                        if (queryTimeout > 0) {
    +                                            throw se;
    +                                        }
    +                                    }
    +
    +                                    Record currentRecord;
    +                                    while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                        Object sql = currentRecord.getValue(sqlField);
    +                                        if (sql != null && !StringUtils.isEmpty((String) sql)) {
    +                                            // Execute the statement as-is
    +                                            s.execute((String) sql);
    +                                        } else {
    +                                            log.error("Record had no (or null) value for Field Containing SQL: {}, flowfile {} will be penalized and routed to failure",
    +                                                    new Object[]{sqlField, flowFile});
    +                                            flowFile = session.penalize(flowFile);
    +                                            session.transfer(flowFile, REL_FAILURE);
    +                                            return;
    +                                        }
    +                                    }
    +                                    session.transfer(flowFile, REL_SUCCESS);
    +                                    session.getProvenanceReporter().send(flowFile, jdbcURL);
    +                                } catch (final SQLNonTransientException e) {
    +                                    log.error("Failed to update database for {} due to {}; routing to failure", new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_FAILURE);
    +                                } catch (final SQLException e) {
    +                                    log.error("Failed to update database for {} due to {}; it is possible that retrying the operation will succeed, so routing to retry",
    +                                            new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_RETRY);
    +                                }
    +                            } else {
    +                                log.warn("Record schema does not contain Field Containing SQL: {}, flowfile {} will be penalized and routed to failure", new Object[]{sqlField, flowFile});
    +                                flowFile = session.penalize(flowFile);
    +                                session.transfer(flowFile, REL_FAILURE);
    +                            }
    +                        }
    +
    +                    } else {
    +                        // Ensure the table name has been set, the generated SQL statements (and TableSchema cache) will need it
    +                        if (StringUtils.isEmpty(tableName)) {
    +                            log.error("Cannot process {} because Table Name is null or empty; penalizing and routing to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                            return;
    +                        }
    +
    +                        final boolean includePrimaryKeys = UPDATE_TYPE.equalsIgnoreCase(statementType) && updateKeys == null;
    +
    +                        // get the database schema from the cache, if one exists. We do this in a synchronized block, rather than
    +                        // using a ConcurrentMap because the Map that we are using is a LinkedHashMap with a capacity such that if
    +                        // the Map grows beyond this capacity, old elements are evicted. We do this in order to avoid filling the
    +                        // Java Heap if there are a lot of different SQL statements being generated that reference different tables.
    +                        TableSchema schema;
    +                        synchronized (this) {
    +                            schema = schemaCache.get(schemaKey);
    +                            if (schema == null) {
    +                                // No schema exists for this table yet. Query the database to determine the schema and put it into the cache.
    +                                try (final Connection conn = dbcpService.getConnection()) {
    +                                    schema = TableSchema.from(conn, catalog, schemaName, tableName, translateFieldNames, includePrimaryKeys);
    +                                    schemaCache.put(schemaKey, schema);
    +                                } catch (final SQLNonTransientException e) {
    +                                    log.error("Failed to update database for {} due to {}; routing to failure", new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_FAILURE);
    +                                    return;
    +                                } catch (final SQLException e) {
    +                                    log.error("Failed to update database for {} due to {}; it is possible that retrying the operation will succeed, so routing to retry",
    +                                            new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_RETRY);
    +                                    return;
    +                                }
    +                            }
    +                        }
    +
    +                        final SqlAndIncludedColumns sqlHolder;
    +                        try {
    +                            // build the fully qualified table name
    +                            final StringBuilder tableNameBuilder = new StringBuilder();
    +                            if (catalog != null) {
    +                                tableNameBuilder.append(catalog).append(".");
    +                            }
    +                            if (schemaName != null) {
    +                                tableNameBuilder.append(schemaName).append(".");
    +                            }
    +                            tableNameBuilder.append(tableName);
    +                            final String fqTableName = tableNameBuilder.toString();
    +
    +                            if (INSERT_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateInsert(recordSchema, fqTableName, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else if (UPDATE_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateUpdate(recordSchema, fqTableName, updateKeys, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else if (DELETE_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateDelete(recordSchema, fqTableName, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else {
    +                                log.error("Statement Type {} is not valid, flowfile {} will be penalized and routed to failure", new Object[]{statementType, flowFile});
    +                                flowFile = session.penalize(flowFile);
    +                                session.transfer(flowFile, REL_FAILURE);
    +                                return;
    +                            }
    +                        } catch (final ProcessException pe) {
    +                            log.error("Failed to convert {} to a SQL {} statement due to {}; routing to failure",
    +                                    new Object[]{flowFile, statementType, pe.toString()}, pe);
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                            return;
    +                        }
    +
    +                        try (PreparedStatement ps = con.prepareStatement(sqlHolder.getSql())) {
    +
    +                            try {
    +                                ps.setQueryTimeout(queryTimeout); // timeout in seconds
    +                            } catch (SQLException se) {
    +                                // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                if (queryTimeout > 0) {
    +                                    throw se;
    +                                }
    +                            }
    +
    +                            Record currentRecord;
    +                            List<Integer> fieldIndexes = sqlHolder.getFieldIndexes();
    +
    +                            while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                Object[] values = currentRecord.getValues();
    +                                if (values != null) {
    +                                    if (fieldIndexes != null) {
    +                                        for (int i = 0; i < fieldIndexes.size(); i++) {
    +                                            ps.setObject(i + 1, values[fieldIndexes.get(i)]);
    +                                        }
    +                                    } else {
    +                                        // If there's no index map, assume all values are included and set them in order
    +                                        for (int i = 0; i < values.length; i++) {
    +                                            ps.setObject(i + 1, values[i]);
    +                                        }
    +                                    }
    +                                    ps.addBatch();
    +                                }
    +                            }
    +
    +                            log.debug("Executing query {}", new Object[]{sqlHolder});
    +                            ps.executeBatch();
    --- End diff --
    
    I assumed that this processor can be used not only for CDC use-cases, but also for simply ingesting multiple records as a single batch insertion. It works as expected for normal cases. However when I tested with a FlowFile containing multiple records mixed with successful and failure records, a BatchUpdateException was thrown, and the incoming FlowFile was routed to "retry". Database table has records without the failed ones.
    
    This would cause data duplication if the same FlowFile is processed again, as it inserts the successful records again (it's possible depends on primary/unique key setting).
    
    I put nifi template and SQLs in this [Gist](https://gist.github.com/ijokarumawak/d2ae2d582472780cc5769c19dd204033) to reproduce this behavior.
    
    I would expect this processor to rollback without commit records partially. Or create FlowFiles for each destination relationship containing associated records. How do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r112848809
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1058 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "jdbc", "put", "database", "update", "insert", "delete"})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.  IMPORTANT: If the Statement Type is UPDATE, then the incoming "
    +        + "records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing "
    +        + "(if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys "
    +        + "exist).")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail on Unmatched Fields", "Fail on Unmatched Fields",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    --- End diff --
    
    `REL_RETRY` is not used. Do we still want this relationship?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r112180165
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1067 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "convert", "jdbc", "put", "database"})
    +@SeeAlso({ConvertJSONToSQL.class, PutSQL.class})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail", "Fail",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RowRecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-batch-size")
    +            .displayName("Batch Size")
    +            .description("The preferred number of FlowFiles to put to the database in a single transaction")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
    +            .defaultValue("100")
    +            .build();
    +
    +    protected static List<PropertyDescriptor> propDescriptors;
    +
    +    private final Map<SchemaKey, TableSchema> schemaCache = new LinkedHashMap<SchemaKey, TableSchema>(100) {
    +        private static final long serialVersionUID = 1L;
    +
    +        @Override
    +        protected boolean removeEldestEntry(Map.Entry<SchemaKey, TableSchema> eldest) {
    +            return size() >= 100;
    +        }
    +    };
    +
    +
    +    static {
    +        final Set<Relationship> r = new HashSet<>();
    +        r.add(REL_SUCCESS);
    +        r.add(REL_FAILURE);
    +        r.add(REL_RETRY);
    +        relationships = Collections.unmodifiableSet(r);
    +
    +        final List<PropertyDescriptor> pds = new ArrayList<>();
    +        pds.add(RECORD_READER_FACTORY);
    +        pds.add(STATEMENT_TYPE);
    +        pds.add(DBCP_SERVICE);
    +        pds.add(CATALOG_NAME);
    +        pds.add(SCHEMA_NAME);
    +        pds.add(TABLE_NAME);
    +        pds.add(TRANSLATE_FIELD_NAMES);
    +        pds.add(UNMATCHED_FIELD_BEHAVIOR);
    +        pds.add(UNMATCHED_COLUMN_BEHAVIOR);
    +        pds.add(UPDATE_KEYS);
    +        pds.add(FIELD_CONTAINING_SQL);
    +        pds.add(QUOTED_IDENTIFIERS);
    +        pds.add(QUOTED_TABLE_IDENTIFIER);
    +        pds.add(QUERY_TIMEOUT);
    +        pds.add(BATCH_SIZE);
    +
    +        propDescriptors = Collections.unmodifiableList(pds);
    +    }
    +
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return relationships;
    +    }
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return propDescriptors;
    +    }
    +
    +    @Override
    +    protected PropertyDescriptor getSupportedDynamicPropertyDescriptor(final String propertyDescriptorName) {
    +        return new PropertyDescriptor.Builder()
    +                .name(propertyDescriptorName)
    +                .required(false)
    +                .addValidator(StandardValidators.createAttributeExpressionLanguageValidator(AttributeExpression.ResultType.STRING, true))
    +                .addValidator(StandardValidators.ATTRIBUTE_KEY_PROPERTY_NAME_VALIDATOR)
    +                .expressionLanguageSupported(true)
    +                .dynamic(true)
    +                .build();
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +        synchronized (this) {
    +            schemaCache.clear();
    +        }
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +
    +        FlowFile flowFile = session.get();
    +        if (flowFile == null) {
    +            return;
    +        }
    +
    +        final ComponentLog log = getLogger();
    +
    +        final RowRecordReaderFactory recordParserFactory = context.getProperty(RECORD_READER_FACTORY)
    +                .asControllerService(RowRecordReaderFactory.class);
    +        final String statementTypeProperty = context.getProperty(STATEMENT_TYPE).getValue();
    +        final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    +        final boolean translateFieldNames = context.getProperty(TRANSLATE_FIELD_NAMES).asBoolean();
    +        final boolean ignoreUnmappedFields = IGNORE_UNMATCHED_FIELD.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_FIELD_BEHAVIOR).getValue());
    +        final Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions().asTimePeriod(TimeUnit.SECONDS).intValue();
    +
    +        // Is the unmatched column behaviour fail or warning?
    +        final boolean failUnmappedColumns = FAIL_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +        final boolean warningUnmappedColumns = WARNING_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +
    +        // Escape column names?
    +        final boolean escapeColumnNames = context.getProperty(QUOTED_IDENTIFIERS).asBoolean();
    +
    +        // Quote table name?
    +        final boolean quoteTableName = context.getProperty(QUOTED_TABLE_IDENTIFIER).asBoolean();
    +
    +        try (final Connection con = dbcpService.getConnection()) {
    +
    +            String jdbcURL = "DBCPService";
    +            try {
    +                DatabaseMetaData databaseMetaData = con.getMetaData();
    +                if (databaseMetaData != null) {
    +                    jdbcURL = databaseMetaData.getURL();
    +                }
    +            } catch (SQLException se) {
    +                // Ignore and use default JDBC URL. This shouldn't happen unless the driver doesn't implement getMetaData() properly
    +            }
    +
    +            final String catalog = context.getProperty(CATALOG_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String schemaName = context.getProperty(SCHEMA_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String updateKeys = context.getProperty(UPDATE_KEYS).evaluateAttributeExpressions(flowFile).getValue();
    +            final SchemaKey schemaKey = new SchemaKey(catalog, tableName);
    +
    +            // Get the statement type from the attribute if necessary
    +            String statementType = statementTypeProperty;
    +            if (USE_ATTR_TYPE.equals(statementTypeProperty)) {
    +                statementType = flowFile.getAttribute(STATEMENT_TYPE_ATTRIBUTE);
    +            }
    +            if (StringUtils.isEmpty(statementType)) {
    +                log.error("Statement Type is not specified, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                flowFile = session.penalize(flowFile);
    +                session.transfer(flowFile, REL_FAILURE);
    +            } else {
    +                RecordSchema recordSchema;
    +                try (final InputStream in = session.read(flowFile)) {
    +
    +                    final RecordReader recordParser = recordParserFactory.createRecordReader(flowFile, in, log);
    +                    recordSchema = recordParser.getSchema();
    +
    +                    if (SQL_TYPE.equalsIgnoreCase(statementType)) {
    +
    +                        // Find which field has the SQL statement in it
    +                        final String sqlField = context.getProperty(FIELD_CONTAINING_SQL).evaluateAttributeExpressions(flowFile).getValue();
    +                        if (StringUtils.isEmpty(sqlField)) {
    +                            log.error("SQL specified as Statement Type but no Field Containing SQL was found, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } else {
    +                            boolean schemaHasSqlField = recordSchema.getFields().stream().anyMatch((field) -> sqlField.equals(field.getFieldName()));
    +                            if (schemaHasSqlField) {
    +                                try (Statement s = con.createStatement()) {
    +
    +                                    try {
    +                                        s.setQueryTimeout(queryTimeout); // timeout in seconds
    +                                    } catch (SQLException se) {
    +                                        // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                        if (queryTimeout > 0) {
    +                                            throw se;
    +                                        }
    +                                    }
    +
    +                                    Record currentRecord;
    +                                    while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                        Object sql = currentRecord.getValue(sqlField);
    +                                        if (sql != null && !StringUtils.isEmpty((String) sql)) {
    +                                            // Execute the statement as-is
    +                                            s.execute((String) sql);
    +                                        } else {
    +                                            log.error("Record had no (or null) value for Field Containing SQL: {}, flowfile {} will be penalized and routed to failure",
    +                                                    new Object[]{sqlField, flowFile});
    +                                            flowFile = session.penalize(flowFile);
    +                                            session.transfer(flowFile, REL_FAILURE);
    +                                        }
    +                                    }
    +                                    session.transfer(flowFile, REL_SUCCESS);
    +                                    session.getProvenanceReporter().send(flowFile, jdbcURL);
    +                                } catch (final SQLException e) {
    +                                    log.error("Unable to update database due to {}, flowfile {} will be penalized and routed to failure", new Object[]{e.getMessage(), flowFile}, e);
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_FAILURE);
    +                                }
    +                            } else {
    +                                log.warn("Record schema does not contain Field Containing SQL: {}, flowfile {} will be penalized and routed to failure", new Object[]{sqlField, flowFile});
    +                                flowFile = session.penalize(flowFile);
    +                                session.transfer(flowFile, REL_FAILURE);
    +                            }
    +                        }
    +
    +                    } else {
    +                        // Ensure the table name has been set, the generated SQL statements (and TableSchema cache) will need it
    +                        if(StringUtils.isEmpty(tableName)) {
    +                            log.error("Cannot process {} because Table Name is null or empty; penalizing and routing to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                            return;
    +                        }
    +
    +                        final boolean includePrimaryKeys = UPDATE_TYPE.equals(statementType) && updateKeys == null;
    --- End diff --
    
    When I tried to update a table which has a primary key column, I got:
    ```
    routing to failure:
    org.apache.nifi.processor.exception.ProcessException: Table 'xxx' does not have a Primary Key and no Update Keys were specified
    ```
    
    This `UPDATE_TYPE.equals(statementType)` should use equalsIgnoreCase as other code do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r112124645
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1067 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "convert", "jdbc", "put", "database"})
    +@SeeAlso({ConvertJSONToSQL.class, PutSQL.class})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail", "Fail",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RowRecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-batch-size")
    +            .displayName("Batch Size")
    +            .description("The preferred number of FlowFiles to put to the database in a single transaction")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
    +            .defaultValue("100")
    +            .build();
    +
    +    protected static List<PropertyDescriptor> propDescriptors;
    +
    +    private final Map<SchemaKey, TableSchema> schemaCache = new LinkedHashMap<SchemaKey, TableSchema>(100) {
    +        private static final long serialVersionUID = 1L;
    +
    +        @Override
    +        protected boolean removeEldestEntry(Map.Entry<SchemaKey, TableSchema> eldest) {
    +            return size() >= 100;
    +        }
    +    };
    +
    +
    +    static {
    +        final Set<Relationship> r = new HashSet<>();
    +        r.add(REL_SUCCESS);
    +        r.add(REL_FAILURE);
    +        r.add(REL_RETRY);
    +        relationships = Collections.unmodifiableSet(r);
    +
    +        final List<PropertyDescriptor> pds = new ArrayList<>();
    +        pds.add(RECORD_READER_FACTORY);
    +        pds.add(STATEMENT_TYPE);
    +        pds.add(DBCP_SERVICE);
    +        pds.add(CATALOG_NAME);
    +        pds.add(SCHEMA_NAME);
    +        pds.add(TABLE_NAME);
    +        pds.add(TRANSLATE_FIELD_NAMES);
    +        pds.add(UNMATCHED_FIELD_BEHAVIOR);
    +        pds.add(UNMATCHED_COLUMN_BEHAVIOR);
    +        pds.add(UPDATE_KEYS);
    +        pds.add(FIELD_CONTAINING_SQL);
    +        pds.add(QUOTED_IDENTIFIERS);
    +        pds.add(QUOTED_TABLE_IDENTIFIER);
    +        pds.add(QUERY_TIMEOUT);
    +        pds.add(BATCH_SIZE);
    +
    +        propDescriptors = Collections.unmodifiableList(pds);
    +    }
    +
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return relationships;
    +    }
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return propDescriptors;
    +    }
    +
    +    @Override
    +    protected PropertyDescriptor getSupportedDynamicPropertyDescriptor(final String propertyDescriptorName) {
    +        return new PropertyDescriptor.Builder()
    +                .name(propertyDescriptorName)
    +                .required(false)
    +                .addValidator(StandardValidators.createAttributeExpressionLanguageValidator(AttributeExpression.ResultType.STRING, true))
    +                .addValidator(StandardValidators.ATTRIBUTE_KEY_PROPERTY_NAME_VALIDATOR)
    +                .expressionLanguageSupported(true)
    +                .dynamic(true)
    +                .build();
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +        synchronized (this) {
    +            schemaCache.clear();
    +        }
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +
    +        FlowFile flowFile = session.get();
    +        if (flowFile == null) {
    +            return;
    +        }
    +
    +        final ComponentLog log = getLogger();
    +
    +        final RowRecordReaderFactory recordParserFactory = context.getProperty(RECORD_READER_FACTORY)
    +                .asControllerService(RowRecordReaderFactory.class);
    +        final String statementTypeProperty = context.getProperty(STATEMENT_TYPE).getValue();
    +        final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    +        final boolean translateFieldNames = context.getProperty(TRANSLATE_FIELD_NAMES).asBoolean();
    +        final boolean ignoreUnmappedFields = IGNORE_UNMATCHED_FIELD.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_FIELD_BEHAVIOR).getValue());
    +        final Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions().asTimePeriod(TimeUnit.SECONDS).intValue();
    +
    +        // Is the unmatched column behaviour fail or warning?
    +        final boolean failUnmappedColumns = FAIL_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +        final boolean warningUnmappedColumns = WARNING_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +
    +        // Escape column names?
    +        final boolean escapeColumnNames = context.getProperty(QUOTED_IDENTIFIERS).asBoolean();
    +
    +        // Quote table name?
    +        final boolean quoteTableName = context.getProperty(QUOTED_TABLE_IDENTIFIER).asBoolean();
    +
    +        try (final Connection con = dbcpService.getConnection()) {
    +
    +            String jdbcURL = "DBCPService";
    +            try {
    +                DatabaseMetaData databaseMetaData = con.getMetaData();
    +                if (databaseMetaData != null) {
    +                    jdbcURL = databaseMetaData.getURL();
    +                }
    +            } catch (SQLException se) {
    +                // Ignore and use default JDBC URL. This shouldn't happen unless the driver doesn't implement getMetaData() properly
    +            }
    +
    +            final String catalog = context.getProperty(CATALOG_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String schemaName = context.getProperty(SCHEMA_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String updateKeys = context.getProperty(UPDATE_KEYS).evaluateAttributeExpressions(flowFile).getValue();
    +            final SchemaKey schemaKey = new SchemaKey(catalog, tableName);
    +
    +            // Get the statement type from the attribute if necessary
    +            String statementType = statementTypeProperty;
    +            if (USE_ATTR_TYPE.equals(statementTypeProperty)) {
    +                statementType = flowFile.getAttribute(STATEMENT_TYPE_ATTRIBUTE);
    +            }
    +            if (StringUtils.isEmpty(statementType)) {
    +                log.error("Statement Type is not specified, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                flowFile = session.penalize(flowFile);
    +                session.transfer(flowFile, REL_FAILURE);
    +            } else {
    +                RecordSchema recordSchema;
    +                try (final InputStream in = session.read(flowFile)) {
    +
    +                    final RecordReader recordParser = recordParserFactory.createRecordReader(flowFile, in, log);
    +                    recordSchema = recordParser.getSchema();
    +
    +                    if (SQL_TYPE.equalsIgnoreCase(statementType)) {
    +
    +                        // Find which field has the SQL statement in it
    +                        final String sqlField = context.getProperty(FIELD_CONTAINING_SQL).evaluateAttributeExpressions(flowFile).getValue();
    +                        if (StringUtils.isEmpty(sqlField)) {
    +                            log.error("SQL specified as Statement Type but no Field Containing SQL was found, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } else {
    +                            boolean schemaHasSqlField = recordSchema.getFields().stream().anyMatch((field) -> sqlField.equals(field.getFieldName()));
    +                            if (schemaHasSqlField) {
    +                                try (Statement s = con.createStatement()) {
    +
    +                                    try {
    +                                        s.setQueryTimeout(queryTimeout); // timeout in seconds
    +                                    } catch (SQLException se) {
    +                                        // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                        if (queryTimeout > 0) {
    +                                            throw se;
    +                                        }
    +                                    }
    +
    +                                    Record currentRecord;
    +                                    while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                        Object sql = currentRecord.getValue(sqlField);
    +                                        if (sql != null && !StringUtils.isEmpty((String) sql)) {
    +                                            // Execute the statement as-is
    +                                            s.execute((String) sql);
    +                                        } else {
    +                                            log.error("Record had no (or null) value for Field Containing SQL: {}, flowfile {} will be penalized and routed to failure",
    +                                                    new Object[]{sqlField, flowFile});
    +                                            flowFile = session.penalize(flowFile);
    +                                            session.transfer(flowFile, REL_FAILURE);
    +                                        }
    +                                    }
    +                                    session.transfer(flowFile, REL_SUCCESS);
    --- End diff --
    
    I got
    ```
    org.apache.nifi.processor.exception.FlowFileHandlingException: StandardFlowFileRecord[uuid=....] is not the most recent version of this FlowFile within this session
    ```
    Exception here when an incoming FlowFile does not have sql value the FlowFile is routed to 'REL_FAILURE' above, but routed to 'REL_SUCCESS', too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r112190994
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1067 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "convert", "jdbc", "put", "database"})
    +@SeeAlso({ConvertJSONToSQL.class, PutSQL.class})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail", "Fail",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RowRecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-batch-size")
    +            .displayName("Batch Size")
    +            .description("The preferred number of FlowFiles to put to the database in a single transaction")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
    +            .defaultValue("100")
    +            .build();
    +
    +    protected static List<PropertyDescriptor> propDescriptors;
    +
    +    private final Map<SchemaKey, TableSchema> schemaCache = new LinkedHashMap<SchemaKey, TableSchema>(100) {
    +        private static final long serialVersionUID = 1L;
    +
    +        @Override
    +        protected boolean removeEldestEntry(Map.Entry<SchemaKey, TableSchema> eldest) {
    +            return size() >= 100;
    +        }
    +    };
    +
    +
    +    static {
    +        final Set<Relationship> r = new HashSet<>();
    +        r.add(REL_SUCCESS);
    +        r.add(REL_FAILURE);
    +        r.add(REL_RETRY);
    +        relationships = Collections.unmodifiableSet(r);
    +
    +        final List<PropertyDescriptor> pds = new ArrayList<>();
    +        pds.add(RECORD_READER_FACTORY);
    +        pds.add(STATEMENT_TYPE);
    +        pds.add(DBCP_SERVICE);
    +        pds.add(CATALOG_NAME);
    +        pds.add(SCHEMA_NAME);
    +        pds.add(TABLE_NAME);
    +        pds.add(TRANSLATE_FIELD_NAMES);
    +        pds.add(UNMATCHED_FIELD_BEHAVIOR);
    +        pds.add(UNMATCHED_COLUMN_BEHAVIOR);
    +        pds.add(UPDATE_KEYS);
    +        pds.add(FIELD_CONTAINING_SQL);
    +        pds.add(QUOTED_IDENTIFIERS);
    +        pds.add(QUOTED_TABLE_IDENTIFIER);
    +        pds.add(QUERY_TIMEOUT);
    +        pds.add(BATCH_SIZE);
    +
    +        propDescriptors = Collections.unmodifiableList(pds);
    +    }
    +
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return relationships;
    +    }
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return propDescriptors;
    +    }
    +
    +    @Override
    +    protected PropertyDescriptor getSupportedDynamicPropertyDescriptor(final String propertyDescriptorName) {
    +        return new PropertyDescriptor.Builder()
    +                .name(propertyDescriptorName)
    +                .required(false)
    +                .addValidator(StandardValidators.createAttributeExpressionLanguageValidator(AttributeExpression.ResultType.STRING, true))
    +                .addValidator(StandardValidators.ATTRIBUTE_KEY_PROPERTY_NAME_VALIDATOR)
    +                .expressionLanguageSupported(true)
    +                .dynamic(true)
    +                .build();
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +        synchronized (this) {
    +            schemaCache.clear();
    +        }
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +
    +        FlowFile flowFile = session.get();
    +        if (flowFile == null) {
    +            return;
    +        }
    +
    +        final ComponentLog log = getLogger();
    +
    +        final RowRecordReaderFactory recordParserFactory = context.getProperty(RECORD_READER_FACTORY)
    +                .asControllerService(RowRecordReaderFactory.class);
    +        final String statementTypeProperty = context.getProperty(STATEMENT_TYPE).getValue();
    +        final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    +        final boolean translateFieldNames = context.getProperty(TRANSLATE_FIELD_NAMES).asBoolean();
    +        final boolean ignoreUnmappedFields = IGNORE_UNMATCHED_FIELD.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_FIELD_BEHAVIOR).getValue());
    +        final Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions().asTimePeriod(TimeUnit.SECONDS).intValue();
    +
    +        // Is the unmatched column behaviour fail or warning?
    +        final boolean failUnmappedColumns = FAIL_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +        final boolean warningUnmappedColumns = WARNING_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +
    +        // Escape column names?
    +        final boolean escapeColumnNames = context.getProperty(QUOTED_IDENTIFIERS).asBoolean();
    +
    +        // Quote table name?
    +        final boolean quoteTableName = context.getProperty(QUOTED_TABLE_IDENTIFIER).asBoolean();
    +
    +        try (final Connection con = dbcpService.getConnection()) {
    +
    +            String jdbcURL = "DBCPService";
    +            try {
    +                DatabaseMetaData databaseMetaData = con.getMetaData();
    +                if (databaseMetaData != null) {
    +                    jdbcURL = databaseMetaData.getURL();
    +                }
    +            } catch (SQLException se) {
    +                // Ignore and use default JDBC URL. This shouldn't happen unless the driver doesn't implement getMetaData() properly
    +            }
    +
    +            final String catalog = context.getProperty(CATALOG_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String schemaName = context.getProperty(SCHEMA_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String updateKeys = context.getProperty(UPDATE_KEYS).evaluateAttributeExpressions(flowFile).getValue();
    +            final SchemaKey schemaKey = new SchemaKey(catalog, tableName);
    +
    +            // Get the statement type from the attribute if necessary
    +            String statementType = statementTypeProperty;
    +            if (USE_ATTR_TYPE.equals(statementTypeProperty)) {
    +                statementType = flowFile.getAttribute(STATEMENT_TYPE_ATTRIBUTE);
    +            }
    +            if (StringUtils.isEmpty(statementType)) {
    +                log.error("Statement Type is not specified, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                flowFile = session.penalize(flowFile);
    +                session.transfer(flowFile, REL_FAILURE);
    +            } else {
    +                RecordSchema recordSchema;
    +                try (final InputStream in = session.read(flowFile)) {
    +
    +                    final RecordReader recordParser = recordParserFactory.createRecordReader(flowFile, in, log);
    +                    recordSchema = recordParser.getSchema();
    +
    +                    if (SQL_TYPE.equalsIgnoreCase(statementType)) {
    +
    +                        // Find which field has the SQL statement in it
    +                        final String sqlField = context.getProperty(FIELD_CONTAINING_SQL).evaluateAttributeExpressions(flowFile).getValue();
    +                        if (StringUtils.isEmpty(sqlField)) {
    +                            log.error("SQL specified as Statement Type but no Field Containing SQL was found, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } else {
    +                            boolean schemaHasSqlField = recordSchema.getFields().stream().anyMatch((field) -> sqlField.equals(field.getFieldName()));
    +                            if (schemaHasSqlField) {
    +                                try (Statement s = con.createStatement()) {
    +
    +                                    try {
    +                                        s.setQueryTimeout(queryTimeout); // timeout in seconds
    +                                    } catch (SQLException se) {
    +                                        // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                        if (queryTimeout > 0) {
    +                                            throw se;
    +                                        }
    +                                    }
    +
    +                                    Record currentRecord;
    +                                    while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                        Object sql = currentRecord.getValue(sqlField);
    +                                        if (sql != null && !StringUtils.isEmpty((String) sql)) {
    +                                            // Execute the statement as-is
    +                                            s.execute((String) sql);
    +                                        } else {
    +                                            log.error("Record had no (or null) value for Field Containing SQL: {}, flowfile {} will be penalized and routed to failure",
    +                                                    new Object[]{sqlField, flowFile});
    +                                            flowFile = session.penalize(flowFile);
    +                                            session.transfer(flowFile, REL_FAILURE);
    +                                        }
    +                                    }
    +                                    session.transfer(flowFile, REL_SUCCESS);
    +                                    session.getProvenanceReporter().send(flowFile, jdbcURL);
    +                                } catch (final SQLException e) {
    +                                    log.error("Unable to update database due to {}, flowfile {} will be penalized and routed to failure", new Object[]{e.getMessage(), flowFile}, e);
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_FAILURE);
    +                                }
    +                            } else {
    +                                log.warn("Record schema does not contain Field Containing SQL: {}, flowfile {} will be penalized and routed to failure", new Object[]{sqlField, flowFile});
    +                                flowFile = session.penalize(flowFile);
    +                                session.transfer(flowFile, REL_FAILURE);
    +                            }
    +                        }
    +
    +                    } else {
    +                        // Ensure the table name has been set, the generated SQL statements (and TableSchema cache) will need it
    +                        if(StringUtils.isEmpty(tableName)) {
    +                            log.error("Cannot process {} because Table Name is null or empty; penalizing and routing to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                            return;
    +                        }
    +
    +                        final boolean includePrimaryKeys = UPDATE_TYPE.equals(statementType) && updateKeys == null;
    --- End diff --
    
    Agree, another copy-paste error on my part, will change


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r113343882
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1076 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.schema.access.SchemaNotFoundException;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.SQLNonTransientException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "jdbc", "put", "database", "update", "insert", "delete"})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.  IMPORTANT: If the Statement Type is UPDATE, then the incoming "
    +        + "records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing "
    +        + "(if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys "
    +        + "exist).")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail on Unmatched Fields", "Fail on Unmatched Fields",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    protected static List<PropertyDescriptor> propDescriptors;
    +
    +    private final Map<SchemaKey, TableSchema> schemaCache = new LinkedHashMap<SchemaKey, TableSchema>(100) {
    +        private static final long serialVersionUID = 1L;
    +
    +        @Override
    +        protected boolean removeEldestEntry(Map.Entry<SchemaKey, TableSchema> eldest) {
    +            return size() >= 100;
    +        }
    +    };
    +
    +
    +    static {
    +        final Set<Relationship> r = new HashSet<>();
    +        r.add(REL_SUCCESS);
    +        r.add(REL_FAILURE);
    +        r.add(REL_RETRY);
    +        relationships = Collections.unmodifiableSet(r);
    +
    +        final List<PropertyDescriptor> pds = new ArrayList<>();
    +        pds.add(RECORD_READER_FACTORY);
    +        pds.add(STATEMENT_TYPE);
    +        pds.add(DBCP_SERVICE);
    +        pds.add(CATALOG_NAME);
    +        pds.add(SCHEMA_NAME);
    +        pds.add(TABLE_NAME);
    +        pds.add(TRANSLATE_FIELD_NAMES);
    +        pds.add(UNMATCHED_FIELD_BEHAVIOR);
    +        pds.add(UNMATCHED_COLUMN_BEHAVIOR);
    +        pds.add(UPDATE_KEYS);
    +        pds.add(FIELD_CONTAINING_SQL);
    +        pds.add(QUOTED_IDENTIFIERS);
    +        pds.add(QUOTED_TABLE_IDENTIFIER);
    +        pds.add(QUERY_TIMEOUT);
    +
    +        propDescriptors = Collections.unmodifiableList(pds);
    +    }
    +
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return relationships;
    +    }
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return propDescriptors;
    +    }
    +
    +    @Override
    +    protected PropertyDescriptor getSupportedDynamicPropertyDescriptor(final String propertyDescriptorName) {
    +        return new PropertyDescriptor.Builder()
    +                .name(propertyDescriptorName)
    +                .required(false)
    +                .addValidator(StandardValidators.createAttributeExpressionLanguageValidator(AttributeExpression.ResultType.STRING, true))
    +                .addValidator(StandardValidators.ATTRIBUTE_KEY_PROPERTY_NAME_VALIDATOR)
    +                .expressionLanguageSupported(true)
    +                .dynamic(true)
    +                .build();
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +        synchronized (this) {
    +            schemaCache.clear();
    +        }
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +
    +        FlowFile flowFile = session.get();
    +        if (flowFile == null) {
    +            return;
    +        }
    +
    +        final ComponentLog log = getLogger();
    +
    +        final RecordReaderFactory recordParserFactory = context.getProperty(RECORD_READER_FACTORY)
    +                .asControllerService(RecordReaderFactory.class);
    +        final String statementTypeProperty = context.getProperty(STATEMENT_TYPE).getValue();
    +        final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    +        final boolean translateFieldNames = context.getProperty(TRANSLATE_FIELD_NAMES).asBoolean();
    +        final boolean ignoreUnmappedFields = IGNORE_UNMATCHED_FIELD.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_FIELD_BEHAVIOR).getValue());
    +        final Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions().asTimePeriod(TimeUnit.SECONDS).intValue();
    +
    +        // Is the unmatched column behaviour fail or warning?
    +        final boolean failUnmappedColumns = FAIL_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +        final boolean warningUnmappedColumns = WARNING_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +
    +        // Escape column names?
    +        final boolean escapeColumnNames = context.getProperty(QUOTED_IDENTIFIERS).asBoolean();
    +
    +        // Quote table name?
    +        final boolean quoteTableName = context.getProperty(QUOTED_TABLE_IDENTIFIER).asBoolean();
    +
    +        try (final Connection con = dbcpService.getConnection()) {
    +
    +            String jdbcURL = "DBCPService";
    +            try {
    +                DatabaseMetaData databaseMetaData = con.getMetaData();
    +                if (databaseMetaData != null) {
    +                    jdbcURL = databaseMetaData.getURL();
    +                }
    +            } catch (SQLException se) {
    +                // Ignore and use default JDBC URL. This shouldn't happen unless the driver doesn't implement getMetaData() properly
    +            }
    +
    +            final String catalog = context.getProperty(CATALOG_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String schemaName = context.getProperty(SCHEMA_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String updateKeys = context.getProperty(UPDATE_KEYS).evaluateAttributeExpressions(flowFile).getValue();
    +            final SchemaKey schemaKey = new SchemaKey(catalog, tableName);
    +
    +            // Get the statement type from the attribute if necessary
    +            String statementType = statementTypeProperty;
    +            if (USE_ATTR_TYPE.equals(statementTypeProperty)) {
    +                statementType = flowFile.getAttribute(STATEMENT_TYPE_ATTRIBUTE);
    +            }
    +            if (StringUtils.isEmpty(statementType)) {
    +                log.error("Statement Type is not specified, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                flowFile = session.penalize(flowFile);
    +                session.transfer(flowFile, REL_FAILURE);
    +            } else {
    +                RecordSchema recordSchema;
    +                try (final InputStream in = session.read(flowFile)) {
    +
    +                    final RecordReader recordParser = recordParserFactory.createRecordReader(flowFile, in, log);
    +                    recordSchema = recordParser.getSchema();
    +
    +                    if (SQL_TYPE.equalsIgnoreCase(statementType)) {
    +
    +                        // Find which field has the SQL statement in it
    +                        final String sqlField = context.getProperty(FIELD_CONTAINING_SQL).evaluateAttributeExpressions(flowFile).getValue();
    +                        if (StringUtils.isEmpty(sqlField)) {
    +                            log.error("SQL specified as Statement Type but no Field Containing SQL was found, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } else {
    +                            boolean schemaHasSqlField = recordSchema.getFields().stream().anyMatch((field) -> sqlField.equals(field.getFieldName()));
    +                            if (schemaHasSqlField) {
    +                                try (Statement s = con.createStatement()) {
    +
    +                                    try {
    +                                        s.setQueryTimeout(queryTimeout); // timeout in seconds
    +                                    } catch (SQLException se) {
    +                                        // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                        if (queryTimeout > 0) {
    +                                            throw se;
    +                                        }
    +                                    }
    +
    +                                    Record currentRecord;
    +                                    while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                        Object sql = currentRecord.getValue(sqlField);
    +                                        if (sql != null && !StringUtils.isEmpty((String) sql)) {
    +                                            // Execute the statement as-is
    +                                            s.execute((String) sql);
    +                                        } else {
    +                                            log.error("Record had no (or null) value for Field Containing SQL: {}, flowfile {} will be penalized and routed to failure",
    +                                                    new Object[]{sqlField, flowFile});
    +                                            flowFile = session.penalize(flowFile);
    +                                            session.transfer(flowFile, REL_FAILURE);
    +                                            return;
    +                                        }
    +                                    }
    +                                    session.transfer(flowFile, REL_SUCCESS);
    +                                    session.getProvenanceReporter().send(flowFile, jdbcURL);
    +                                } catch (final SQLNonTransientException e) {
    +                                    log.error("Failed to update database for {} due to {}; routing to failure", new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_FAILURE);
    +                                } catch (final SQLException e) {
    +                                    log.error("Failed to update database for {} due to {}; it is possible that retrying the operation will succeed, so routing to retry",
    +                                            new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_RETRY);
    +                                }
    +                            } else {
    +                                log.warn("Record schema does not contain Field Containing SQL: {}, flowfile {} will be penalized and routed to failure", new Object[]{sqlField, flowFile});
    +                                flowFile = session.penalize(flowFile);
    +                                session.transfer(flowFile, REL_FAILURE);
    +                            }
    +                        }
    +
    +                    } else {
    +                        // Ensure the table name has been set, the generated SQL statements (and TableSchema cache) will need it
    +                        if (StringUtils.isEmpty(tableName)) {
    +                            log.error("Cannot process {} because Table Name is null or empty; penalizing and routing to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                            return;
    +                        }
    +
    +                        final boolean includePrimaryKeys = UPDATE_TYPE.equalsIgnoreCase(statementType) && updateKeys == null;
    +
    +                        // get the database schema from the cache, if one exists. We do this in a synchronized block, rather than
    +                        // using a ConcurrentMap because the Map that we are using is a LinkedHashMap with a capacity such that if
    +                        // the Map grows beyond this capacity, old elements are evicted. We do this in order to avoid filling the
    +                        // Java Heap if there are a lot of different SQL statements being generated that reference different tables.
    +                        TableSchema schema;
    +                        synchronized (this) {
    +                            schema = schemaCache.get(schemaKey);
    +                            if (schema == null) {
    +                                // No schema exists for this table yet. Query the database to determine the schema and put it into the cache.
    +                                try (final Connection conn = dbcpService.getConnection()) {
    +                                    schema = TableSchema.from(conn, catalog, schemaName, tableName, translateFieldNames, includePrimaryKeys);
    +                                    schemaCache.put(schemaKey, schema);
    +                                } catch (final SQLNonTransientException e) {
    +                                    log.error("Failed to update database for {} due to {}; routing to failure", new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_FAILURE);
    +                                    return;
    +                                } catch (final SQLException e) {
    +                                    log.error("Failed to update database for {} due to {}; it is possible that retrying the operation will succeed, so routing to retry",
    +                                            new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_RETRY);
    +                                    return;
    +                                }
    +                            }
    +                        }
    +
    +                        final SqlAndIncludedColumns sqlHolder;
    +                        try {
    +                            // build the fully qualified table name
    +                            final StringBuilder tableNameBuilder = new StringBuilder();
    +                            if (catalog != null) {
    +                                tableNameBuilder.append(catalog).append(".");
    +                            }
    +                            if (schemaName != null) {
    +                                tableNameBuilder.append(schemaName).append(".");
    +                            }
    +                            tableNameBuilder.append(tableName);
    +                            final String fqTableName = tableNameBuilder.toString();
    +
    +                            if (INSERT_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateInsert(recordSchema, fqTableName, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else if (UPDATE_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateUpdate(recordSchema, fqTableName, updateKeys, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else if (DELETE_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateDelete(recordSchema, fqTableName, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else {
    +                                log.error("Statement Type {} is not valid, flowfile {} will be penalized and routed to failure", new Object[]{statementType, flowFile});
    +                                flowFile = session.penalize(flowFile);
    +                                session.transfer(flowFile, REL_FAILURE);
    +                                return;
    +                            }
    +                        } catch (final ProcessException pe) {
    +                            log.error("Failed to convert {} to a SQL {} statement due to {}; routing to failure",
    +                                    new Object[]{flowFile, statementType, pe.toString()}, pe);
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                            return;
    +                        }
    +
    +                        try (PreparedStatement ps = con.prepareStatement(sqlHolder.getSql())) {
    +
    +                            try {
    +                                ps.setQueryTimeout(queryTimeout); // timeout in seconds
    +                            } catch (SQLException se) {
    +                                // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                if (queryTimeout > 0) {
    +                                    throw se;
    +                                }
    +                            }
    +
    +                            Record currentRecord;
    +                            List<Integer> fieldIndexes = sqlHolder.getFieldIndexes();
    +
    +                            while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                Object[] values = currentRecord.getValues();
    +                                if (values != null) {
    +                                    if (fieldIndexes != null) {
    +                                        for (int i = 0; i < fieldIndexes.size(); i++) {
    +                                            ps.setObject(i + 1, values[fieldIndexes.get(i)]);
    +                                        }
    +                                    } else {
    +                                        // If there's no index map, assume all values are included and set them in order
    +                                        for (int i = 0; i < values.length; i++) {
    +                                            ps.setObject(i + 1, values[i]);
    +                                        }
    +                                    }
    +                                    ps.addBatch();
    +                                }
    +                            }
    +
    +                            log.debug("Executing query {}", new Object[]{sqlHolder});
    +                            ps.executeBatch();
    --- End diff --
    
    Confirmed that a RDBMS transaction is rolled back when any row failed to be updated. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on the issue:

    https://github.com/apache/nifi/pull/1677
  
    @mattyb149 Thanks for the updated commit. I confirmed that my comments are incorporated.
    
    It works as expected in most cases, however, I found a case which needs to be addressed.
    
    Let's say I have a table with a primary key like this:
    
    ```sql
    create table tutorials_tbl(
       tutorial_id INT NOT NULL,
       tutorial_title VARCHAR(100) NOT NULL,
       tutorial_author VARCHAR(40) NOT NULL,
       submission_date DATE,
       PRIMARY KEY ( tutorial_id )
    );
    ```
    
    And insert some rows, then update a row, especially update its primary key:
    
    ```sql
    update tutorials_tbl set tutorial_id = 110 where tutorial_id = 11;
    ```
    
    Above update query generates following JSON via CaptureChangeMySQL:
    
    ```json
    { "type" : "update",
      "timestamp" : 1492648209000,
      "binlog_filename" : "mysql-server-bin.000004",
      "binlog_position" : 97152,
      "database" : "nifi_test",
      "table_name" : "tutorials_tbl",
      "table_id" : 222,
      "columns" : [ {
        "id" : 1,    "name" : "tutorial_id",    "column_type" : 4,    "last_value" : 11,    "value" : 110
      }, {
        "id" : 2,    "name" : "tutorial_title",    "column_type" : 12,    "last_value" : "11th",    "value" : "11th"
      }, {
        "id" : 3,    "name" : "tutorial_author",    "column_type" : 12,    "last_value" : "koji",    "value" : "koji"
      }, {
        "id" : 4,    "name" : "submission_date",    "column_type" : 91,    "last_value" : null,    "value" : null
      } ]}
    ```
    
    `Transform to Flat JSON` (JoltTransform) flattens the event JSON as below. At this point, the record image before update is dropped:
    
    ```json
    [ {  "tutorial_id" : 110,  "tutorial_title" : "11th",  "tutorial_author" : "koji",  "submission_date" : null } ]
    ```
    
    Finally, PutDatabaseRecord generates an update sql statement with `where tutorial_id = 110`. But it doesn't update anything, because it should have used `where tutorial_id = 11` with before update row image.
    
    We might be able to handle this by generating two delete and insert records in a NiFi flow, or do something smart at PutDatabaseRecord.
    How do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r113125292
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1058 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "jdbc", "put", "database", "update", "insert", "delete"})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.  IMPORTANT: If the Statement Type is UPDATE, then the incoming "
    +        + "records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing "
    +        + "(if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys "
    +        + "exist).")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail on Unmatched Fields", "Fail on Unmatched Fields",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RowRecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    protected static List<PropertyDescriptor> propDescriptors;
    +
    +    private final Map<SchemaKey, TableSchema> schemaCache = new LinkedHashMap<SchemaKey, TableSchema>(100) {
    +        private static final long serialVersionUID = 1L;
    +
    +        @Override
    +        protected boolean removeEldestEntry(Map.Entry<SchemaKey, TableSchema> eldest) {
    +            return size() >= 100;
    +        }
    +    };
    +
    +
    +    static {
    +        final Set<Relationship> r = new HashSet<>();
    +        r.add(REL_SUCCESS);
    +        r.add(REL_FAILURE);
    +        r.add(REL_RETRY);
    +        relationships = Collections.unmodifiableSet(r);
    +
    +        final List<PropertyDescriptor> pds = new ArrayList<>();
    +        pds.add(RECORD_READER_FACTORY);
    +        pds.add(STATEMENT_TYPE);
    +        pds.add(DBCP_SERVICE);
    +        pds.add(CATALOG_NAME);
    +        pds.add(SCHEMA_NAME);
    +        pds.add(TABLE_NAME);
    +        pds.add(TRANSLATE_FIELD_NAMES);
    +        pds.add(UNMATCHED_FIELD_BEHAVIOR);
    +        pds.add(UNMATCHED_COLUMN_BEHAVIOR);
    +        pds.add(UPDATE_KEYS);
    +        pds.add(FIELD_CONTAINING_SQL);
    +        pds.add(QUOTED_IDENTIFIERS);
    +        pds.add(QUOTED_TABLE_IDENTIFIER);
    +        pds.add(QUERY_TIMEOUT);
    +
    +        propDescriptors = Collections.unmodifiableList(pds);
    +    }
    +
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return relationships;
    +    }
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return propDescriptors;
    +    }
    +
    +    @Override
    +    protected PropertyDescriptor getSupportedDynamicPropertyDescriptor(final String propertyDescriptorName) {
    +        return new PropertyDescriptor.Builder()
    +                .name(propertyDescriptorName)
    +                .required(false)
    +                .addValidator(StandardValidators.createAttributeExpressionLanguageValidator(AttributeExpression.ResultType.STRING, true))
    +                .addValidator(StandardValidators.ATTRIBUTE_KEY_PROPERTY_NAME_VALIDATOR)
    +                .expressionLanguageSupported(true)
    +                .dynamic(true)
    +                .build();
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +        synchronized (this) {
    +            schemaCache.clear();
    +        }
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +
    +        FlowFile flowFile = session.get();
    +        if (flowFile == null) {
    +            return;
    +        }
    +
    +        final ComponentLog log = getLogger();
    +
    +        final RowRecordReaderFactory recordParserFactory = context.getProperty(RECORD_READER_FACTORY)
    +                .asControllerService(RowRecordReaderFactory.class);
    +        final String statementTypeProperty = context.getProperty(STATEMENT_TYPE).getValue();
    +        final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    +        final boolean translateFieldNames = context.getProperty(TRANSLATE_FIELD_NAMES).asBoolean();
    +        final boolean ignoreUnmappedFields = IGNORE_UNMATCHED_FIELD.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_FIELD_BEHAVIOR).getValue());
    +        final Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions().asTimePeriod(TimeUnit.SECONDS).intValue();
    +
    +        // Is the unmatched column behaviour fail or warning?
    +        final boolean failUnmappedColumns = FAIL_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +        final boolean warningUnmappedColumns = WARNING_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +
    +        // Escape column names?
    +        final boolean escapeColumnNames = context.getProperty(QUOTED_IDENTIFIERS).asBoolean();
    +
    +        // Quote table name?
    +        final boolean quoteTableName = context.getProperty(QUOTED_TABLE_IDENTIFIER).asBoolean();
    +
    +        try (final Connection con = dbcpService.getConnection()) {
    +
    +            String jdbcURL = "DBCPService";
    +            try {
    +                DatabaseMetaData databaseMetaData = con.getMetaData();
    +                if (databaseMetaData != null) {
    +                    jdbcURL = databaseMetaData.getURL();
    +                }
    +            } catch (SQLException se) {
    +                // Ignore and use default JDBC URL. This shouldn't happen unless the driver doesn't implement getMetaData() properly
    +            }
    +
    +            final String catalog = context.getProperty(CATALOG_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String schemaName = context.getProperty(SCHEMA_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String updateKeys = context.getProperty(UPDATE_KEYS).evaluateAttributeExpressions(flowFile).getValue();
    +            final SchemaKey schemaKey = new SchemaKey(catalog, tableName);
    +
    +            // Get the statement type from the attribute if necessary
    +            String statementType = statementTypeProperty;
    +            if (USE_ATTR_TYPE.equals(statementTypeProperty)) {
    +                statementType = flowFile.getAttribute(STATEMENT_TYPE_ATTRIBUTE);
    +            }
    +            if (StringUtils.isEmpty(statementType)) {
    +                log.error("Statement Type is not specified, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                flowFile = session.penalize(flowFile);
    +                session.transfer(flowFile, REL_FAILURE);
    +            } else {
    +                RecordSchema recordSchema;
    +                try (final InputStream in = session.read(flowFile)) {
    +
    +                    final RecordReader recordParser = recordParserFactory.createRecordReader(flowFile, in, log);
    +                    recordSchema = recordParser.getSchema();
    +
    +                    if (SQL_TYPE.equalsIgnoreCase(statementType)) {
    +
    +                        // Find which field has the SQL statement in it
    +                        final String sqlField = context.getProperty(FIELD_CONTAINING_SQL).evaluateAttributeExpressions(flowFile).getValue();
    +                        if (StringUtils.isEmpty(sqlField)) {
    +                            log.error("SQL specified as Statement Type but no Field Containing SQL was found, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } else {
    +                            boolean schemaHasSqlField = recordSchema.getFields().stream().anyMatch((field) -> sqlField.equals(field.getFieldName()));
    +                            if (schemaHasSqlField) {
    +                                try (Statement s = con.createStatement()) {
    +
    +                                    try {
    +                                        s.setQueryTimeout(queryTimeout); // timeout in seconds
    +                                    } catch (SQLException se) {
    +                                        // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                        if (queryTimeout > 0) {
    +                                            throw se;
    +                                        }
    +                                    }
    +
    +                                    Record currentRecord;
    +                                    while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                        Object sql = currentRecord.getValue(sqlField);
    +                                        if (sql != null && !StringUtils.isEmpty((String) sql)) {
    +                                            // Execute the statement as-is
    +                                            s.execute((String) sql);
    --- End diff --
    
    I reviewed and merged NIFI-3730, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r112144820
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1067 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "convert", "jdbc", "put", "database"})
    +@SeeAlso({ConvertJSONToSQL.class, PutSQL.class})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail", "Fail",
    --- End diff --
    
    Other AllowableValues are named as 'XXX Unmatched Fields' or 'XXX on Unmatched Columns' but this value is just 'Fail'. Probably better to be consistent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/1677
  
    Hmm, would this be a problem with Update Keys in general, or only Primary Keys when Update Keys are not specified? Either way, should we be including (and expecting) last_value when the statement type is update?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r112936031
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1058 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "jdbc", "put", "database", "update", "insert", "delete"})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.  IMPORTANT: If the Statement Type is UPDATE, then the incoming "
    +        + "records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing "
    +        + "(if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys "
    +        + "exist).")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail on Unmatched Fields", "Fail on Unmatched Fields",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    --- End diff --
    
    Yes, I forgot to distinguish between SQLNonTransientException and other SQL exceptions, my intent is to handle errors the same way PutSQL does. Will update to use REL_RETRY when appropriate.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r112849492
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1058 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "jdbc", "put", "database", "update", "insert", "delete"})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.  IMPORTANT: If the Statement Type is UPDATE, then the incoming "
    +        + "records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing "
    +        + "(if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys "
    +        + "exist).")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail on Unmatched Fields", "Fail on Unmatched Fields",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RowRecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    protected static List<PropertyDescriptor> propDescriptors;
    +
    +    private final Map<SchemaKey, TableSchema> schemaCache = new LinkedHashMap<SchemaKey, TableSchema>(100) {
    +        private static final long serialVersionUID = 1L;
    +
    +        @Override
    +        protected boolean removeEldestEntry(Map.Entry<SchemaKey, TableSchema> eldest) {
    +            return size() >= 100;
    +        }
    +    };
    +
    +
    +    static {
    +        final Set<Relationship> r = new HashSet<>();
    +        r.add(REL_SUCCESS);
    +        r.add(REL_FAILURE);
    +        r.add(REL_RETRY);
    +        relationships = Collections.unmodifiableSet(r);
    +
    +        final List<PropertyDescriptor> pds = new ArrayList<>();
    +        pds.add(RECORD_READER_FACTORY);
    +        pds.add(STATEMENT_TYPE);
    +        pds.add(DBCP_SERVICE);
    +        pds.add(CATALOG_NAME);
    +        pds.add(SCHEMA_NAME);
    +        pds.add(TABLE_NAME);
    +        pds.add(TRANSLATE_FIELD_NAMES);
    +        pds.add(UNMATCHED_FIELD_BEHAVIOR);
    +        pds.add(UNMATCHED_COLUMN_BEHAVIOR);
    +        pds.add(UPDATE_KEYS);
    +        pds.add(FIELD_CONTAINING_SQL);
    +        pds.add(QUOTED_IDENTIFIERS);
    +        pds.add(QUOTED_TABLE_IDENTIFIER);
    +        pds.add(QUERY_TIMEOUT);
    +
    +        propDescriptors = Collections.unmodifiableList(pds);
    +    }
    +
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return relationships;
    +    }
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return propDescriptors;
    +    }
    +
    +    @Override
    +    protected PropertyDescriptor getSupportedDynamicPropertyDescriptor(final String propertyDescriptorName) {
    +        return new PropertyDescriptor.Builder()
    +                .name(propertyDescriptorName)
    +                .required(false)
    +                .addValidator(StandardValidators.createAttributeExpressionLanguageValidator(AttributeExpression.ResultType.STRING, true))
    +                .addValidator(StandardValidators.ATTRIBUTE_KEY_PROPERTY_NAME_VALIDATOR)
    +                .expressionLanguageSupported(true)
    +                .dynamic(true)
    +                .build();
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +        synchronized (this) {
    +            schemaCache.clear();
    +        }
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +
    +        FlowFile flowFile = session.get();
    +        if (flowFile == null) {
    +            return;
    +        }
    +
    +        final ComponentLog log = getLogger();
    +
    +        final RowRecordReaderFactory recordParserFactory = context.getProperty(RECORD_READER_FACTORY)
    +                .asControllerService(RowRecordReaderFactory.class);
    +        final String statementTypeProperty = context.getProperty(STATEMENT_TYPE).getValue();
    +        final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    +        final boolean translateFieldNames = context.getProperty(TRANSLATE_FIELD_NAMES).asBoolean();
    +        final boolean ignoreUnmappedFields = IGNORE_UNMATCHED_FIELD.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_FIELD_BEHAVIOR).getValue());
    +        final Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions().asTimePeriod(TimeUnit.SECONDS).intValue();
    +
    +        // Is the unmatched column behaviour fail or warning?
    +        final boolean failUnmappedColumns = FAIL_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +        final boolean warningUnmappedColumns = WARNING_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +
    +        // Escape column names?
    +        final boolean escapeColumnNames = context.getProperty(QUOTED_IDENTIFIERS).asBoolean();
    +
    +        // Quote table name?
    +        final boolean quoteTableName = context.getProperty(QUOTED_TABLE_IDENTIFIER).asBoolean();
    +
    +        try (final Connection con = dbcpService.getConnection()) {
    +
    +            String jdbcURL = "DBCPService";
    +            try {
    +                DatabaseMetaData databaseMetaData = con.getMetaData();
    +                if (databaseMetaData != null) {
    +                    jdbcURL = databaseMetaData.getURL();
    +                }
    +            } catch (SQLException se) {
    +                // Ignore and use default JDBC URL. This shouldn't happen unless the driver doesn't implement getMetaData() properly
    +            }
    +
    +            final String catalog = context.getProperty(CATALOG_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String schemaName = context.getProperty(SCHEMA_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String updateKeys = context.getProperty(UPDATE_KEYS).evaluateAttributeExpressions(flowFile).getValue();
    +            final SchemaKey schemaKey = new SchemaKey(catalog, tableName);
    +
    +            // Get the statement type from the attribute if necessary
    +            String statementType = statementTypeProperty;
    +            if (USE_ATTR_TYPE.equals(statementTypeProperty)) {
    +                statementType = flowFile.getAttribute(STATEMENT_TYPE_ATTRIBUTE);
    +            }
    +            if (StringUtils.isEmpty(statementType)) {
    +                log.error("Statement Type is not specified, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                flowFile = session.penalize(flowFile);
    +                session.transfer(flowFile, REL_FAILURE);
    +            } else {
    +                RecordSchema recordSchema;
    +                try (final InputStream in = session.read(flowFile)) {
    +
    +                    final RecordReader recordParser = recordParserFactory.createRecordReader(flowFile, in, log);
    +                    recordSchema = recordParser.getSchema();
    +
    +                    if (SQL_TYPE.equalsIgnoreCase(statementType)) {
    +
    +                        // Find which field has the SQL statement in it
    +                        final String sqlField = context.getProperty(FIELD_CONTAINING_SQL).evaluateAttributeExpressions(flowFile).getValue();
    +                        if (StringUtils.isEmpty(sqlField)) {
    +                            log.error("SQL specified as Statement Type but no Field Containing SQL was found, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } else {
    +                            boolean schemaHasSqlField = recordSchema.getFields().stream().anyMatch((field) -> sqlField.equals(field.getFieldName()));
    +                            if (schemaHasSqlField) {
    +                                try (Statement s = con.createStatement()) {
    +
    +                                    try {
    +                                        s.setQueryTimeout(queryTimeout); // timeout in seconds
    +                                    } catch (SQLException se) {
    +                                        // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                        if (queryTimeout > 0) {
    +                                            throw se;
    +                                        }
    +                                    }
    +
    +                                    Record currentRecord;
    +                                    while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                        Object sql = currentRecord.getValue(sqlField);
    +                                        if (sql != null && !StringUtils.isEmpty((String) sql)) {
    +                                            // Execute the statement as-is
    +                                            s.execute((String) sql);
    --- End diff --
    
    In an end-to-end CDC flow, we would use CaptureChangeMySQL to get database events to stream into this PutDatabaseRecord as your template shared on Gist does. Those database events include SQL statements such as `begin` and `commit`, and such statements are executed literary here.
    
    I think it would be great if users can configure CaptureChangeMySQL whether to emit `begin` and `commit` event or not, as these statements do not have significant meaning for synchronizing change since CaptureChangeMySQL emits FlowFile per updated record, but just for EnforceOrder to order events correctly.
    
    If we can eliminate these events, we can minimize the number of FlowFiles which would lead us to a better performance.
    
    Also, (for MySQL at least) `begin` and `commit` is not a database or table specific event. When I replicate changes from table A to table B using CaptureChangeMySQL and PutDatabaseRecord, I saw following behavior and felt `begin` and `commit` are a little bit disturbing:
    
    1. Insert a row into table A
    1. 3 events are emitted, begin, insert A and commit. Then CaptureChangeMySQL emitted 3 FlowFiles.
    1. PutDatabaseRecord execute 3 SQLs. begin, insert B and commit.
    1. Another 3 events are emitted derived from `insert B` via MySQL bin-log, begin, insert B and commit.
    1. Since I have configured CaptureChangeMySQL only listens to `table A`, it only emitted 2 FlowFiles, begin and commit.
    1. PutDatabaseRecord executes the begin and commit SQL.
    
    Do we have to take these "begin" and "commit" statements into account for a CDC flow?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r112190651
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1067 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "convert", "jdbc", "put", "database"})
    +@SeeAlso({ConvertJSONToSQL.class, PutSQL.class})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail", "Fail",
    --- End diff --
    
    I agree, I copied these from ConvertJSONToSQL, but can improve this here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r113344089
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1058 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "jdbc", "put", "database", "update", "insert", "delete"})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.  IMPORTANT: If the Statement Type is UPDATE, then the incoming "
    +        + "records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing "
    +        + "(if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys "
    +        + "exist).")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail on Unmatched Fields", "Fail on Unmatched Fields",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    --- End diff --
    
    Great idea, enabling Rollback on Failure with PutDatabaseRecord would be helpful. Let's do that after we finalize PutDatabaseRecord review process. I will rebase #1658 to include PutDatabaseRecord once this get merged.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r112190902
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1067 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "convert", "jdbc", "put", "database"})
    +@SeeAlso({ConvertJSONToSQL.class, PutSQL.class})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail", "Fail",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RowRecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-batch-size")
    +            .displayName("Batch Size")
    +            .description("The preferred number of FlowFiles to put to the database in a single transaction")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
    +            .defaultValue("100")
    +            .build();
    +
    +    protected static List<PropertyDescriptor> propDescriptors;
    +
    +    private final Map<SchemaKey, TableSchema> schemaCache = new LinkedHashMap<SchemaKey, TableSchema>(100) {
    +        private static final long serialVersionUID = 1L;
    +
    +        @Override
    +        protected boolean removeEldestEntry(Map.Entry<SchemaKey, TableSchema> eldest) {
    +            return size() >= 100;
    +        }
    +    };
    +
    +
    +    static {
    +        final Set<Relationship> r = new HashSet<>();
    +        r.add(REL_SUCCESS);
    +        r.add(REL_FAILURE);
    +        r.add(REL_RETRY);
    +        relationships = Collections.unmodifiableSet(r);
    +
    +        final List<PropertyDescriptor> pds = new ArrayList<>();
    +        pds.add(RECORD_READER_FACTORY);
    +        pds.add(STATEMENT_TYPE);
    +        pds.add(DBCP_SERVICE);
    +        pds.add(CATALOG_NAME);
    +        pds.add(SCHEMA_NAME);
    +        pds.add(TABLE_NAME);
    +        pds.add(TRANSLATE_FIELD_NAMES);
    +        pds.add(UNMATCHED_FIELD_BEHAVIOR);
    +        pds.add(UNMATCHED_COLUMN_BEHAVIOR);
    +        pds.add(UPDATE_KEYS);
    +        pds.add(FIELD_CONTAINING_SQL);
    +        pds.add(QUOTED_IDENTIFIERS);
    +        pds.add(QUOTED_TABLE_IDENTIFIER);
    +        pds.add(QUERY_TIMEOUT);
    +        pds.add(BATCH_SIZE);
    +
    +        propDescriptors = Collections.unmodifiableList(pds);
    +    }
    +
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return relationships;
    +    }
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return propDescriptors;
    +    }
    +
    +    @Override
    +    protected PropertyDescriptor getSupportedDynamicPropertyDescriptor(final String propertyDescriptorName) {
    +        return new PropertyDescriptor.Builder()
    +                .name(propertyDescriptorName)
    +                .required(false)
    +                .addValidator(StandardValidators.createAttributeExpressionLanguageValidator(AttributeExpression.ResultType.STRING, true))
    +                .addValidator(StandardValidators.ATTRIBUTE_KEY_PROPERTY_NAME_VALIDATOR)
    +                .expressionLanguageSupported(true)
    +                .dynamic(true)
    +                .build();
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +        synchronized (this) {
    +            schemaCache.clear();
    +        }
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +
    +        FlowFile flowFile = session.get();
    +        if (flowFile == null) {
    +            return;
    +        }
    +
    +        final ComponentLog log = getLogger();
    +
    +        final RowRecordReaderFactory recordParserFactory = context.getProperty(RECORD_READER_FACTORY)
    +                .asControllerService(RowRecordReaderFactory.class);
    +        final String statementTypeProperty = context.getProperty(STATEMENT_TYPE).getValue();
    +        final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    +        final boolean translateFieldNames = context.getProperty(TRANSLATE_FIELD_NAMES).asBoolean();
    +        final boolean ignoreUnmappedFields = IGNORE_UNMATCHED_FIELD.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_FIELD_BEHAVIOR).getValue());
    +        final Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions().asTimePeriod(TimeUnit.SECONDS).intValue();
    +
    +        // Is the unmatched column behaviour fail or warning?
    +        final boolean failUnmappedColumns = FAIL_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +        final boolean warningUnmappedColumns = WARNING_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +
    +        // Escape column names?
    +        final boolean escapeColumnNames = context.getProperty(QUOTED_IDENTIFIERS).asBoolean();
    +
    +        // Quote table name?
    +        final boolean quoteTableName = context.getProperty(QUOTED_TABLE_IDENTIFIER).asBoolean();
    +
    +        try (final Connection con = dbcpService.getConnection()) {
    +
    +            String jdbcURL = "DBCPService";
    +            try {
    +                DatabaseMetaData databaseMetaData = con.getMetaData();
    +                if (databaseMetaData != null) {
    +                    jdbcURL = databaseMetaData.getURL();
    +                }
    +            } catch (SQLException se) {
    +                // Ignore and use default JDBC URL. This shouldn't happen unless the driver doesn't implement getMetaData() properly
    +            }
    +
    +            final String catalog = context.getProperty(CATALOG_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String schemaName = context.getProperty(SCHEMA_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String updateKeys = context.getProperty(UPDATE_KEYS).evaluateAttributeExpressions(flowFile).getValue();
    +            final SchemaKey schemaKey = new SchemaKey(catalog, tableName);
    +
    +            // Get the statement type from the attribute if necessary
    +            String statementType = statementTypeProperty;
    +            if (USE_ATTR_TYPE.equals(statementTypeProperty)) {
    +                statementType = flowFile.getAttribute(STATEMENT_TYPE_ATTRIBUTE);
    +            }
    +            if (StringUtils.isEmpty(statementType)) {
    +                log.error("Statement Type is not specified, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                flowFile = session.penalize(flowFile);
    +                session.transfer(flowFile, REL_FAILURE);
    +            } else {
    +                RecordSchema recordSchema;
    +                try (final InputStream in = session.read(flowFile)) {
    +
    +                    final RecordReader recordParser = recordParserFactory.createRecordReader(flowFile, in, log);
    +                    recordSchema = recordParser.getSchema();
    +
    +                    if (SQL_TYPE.equalsIgnoreCase(statementType)) {
    +
    +                        // Find which field has the SQL statement in it
    +                        final String sqlField = context.getProperty(FIELD_CONTAINING_SQL).evaluateAttributeExpressions(flowFile).getValue();
    +                        if (StringUtils.isEmpty(sqlField)) {
    +                            log.error("SQL specified as Statement Type but no Field Containing SQL was found, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } else {
    +                            boolean schemaHasSqlField = recordSchema.getFields().stream().anyMatch((field) -> sqlField.equals(field.getFieldName()));
    +                            if (schemaHasSqlField) {
    +                                try (Statement s = con.createStatement()) {
    +
    +                                    try {
    +                                        s.setQueryTimeout(queryTimeout); // timeout in seconds
    +                                    } catch (SQLException se) {
    +                                        // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                        if (queryTimeout > 0) {
    +                                            throw se;
    +                                        }
    +                                    }
    +
    +                                    Record currentRecord;
    +                                    while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                        Object sql = currentRecord.getValue(sqlField);
    +                                        if (sql != null && !StringUtils.isEmpty((String) sql)) {
    +                                            // Execute the statement as-is
    +                                            s.execute((String) sql);
    +                                        } else {
    +                                            log.error("Record had no (or null) value for Field Containing SQL: {}, flowfile {} will be penalized and routed to failure",
    +                                                    new Object[]{sqlField, flowFile});
    +                                            flowFile = session.penalize(flowFile);
    +                                            session.transfer(flowFile, REL_FAILURE);
    +                                        }
    +                                    }
    +                                    session.transfer(flowFile, REL_SUCCESS);
    --- End diff --
    
    Oops! Good catch, will fix and add a unit test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/1677


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r113178028
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1076 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.schema.access.SchemaNotFoundException;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.SQLNonTransientException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "jdbc", "put", "database", "update", "insert", "delete"})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.  IMPORTANT: If the Statement Type is UPDATE, then the incoming "
    +        + "records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing "
    +        + "(if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys "
    +        + "exist).")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail on Unmatched Fields", "Fail on Unmatched Fields",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    protected static List<PropertyDescriptor> propDescriptors;
    +
    +    private final Map<SchemaKey, TableSchema> schemaCache = new LinkedHashMap<SchemaKey, TableSchema>(100) {
    +        private static final long serialVersionUID = 1L;
    +
    +        @Override
    +        protected boolean removeEldestEntry(Map.Entry<SchemaKey, TableSchema> eldest) {
    +            return size() >= 100;
    +        }
    +    };
    +
    +
    +    static {
    +        final Set<Relationship> r = new HashSet<>();
    +        r.add(REL_SUCCESS);
    +        r.add(REL_FAILURE);
    +        r.add(REL_RETRY);
    +        relationships = Collections.unmodifiableSet(r);
    +
    +        final List<PropertyDescriptor> pds = new ArrayList<>();
    +        pds.add(RECORD_READER_FACTORY);
    +        pds.add(STATEMENT_TYPE);
    +        pds.add(DBCP_SERVICE);
    +        pds.add(CATALOG_NAME);
    +        pds.add(SCHEMA_NAME);
    +        pds.add(TABLE_NAME);
    +        pds.add(TRANSLATE_FIELD_NAMES);
    +        pds.add(UNMATCHED_FIELD_BEHAVIOR);
    +        pds.add(UNMATCHED_COLUMN_BEHAVIOR);
    +        pds.add(UPDATE_KEYS);
    +        pds.add(FIELD_CONTAINING_SQL);
    +        pds.add(QUOTED_IDENTIFIERS);
    +        pds.add(QUOTED_TABLE_IDENTIFIER);
    +        pds.add(QUERY_TIMEOUT);
    +
    +        propDescriptors = Collections.unmodifiableList(pds);
    +    }
    +
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return relationships;
    +    }
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return propDescriptors;
    +    }
    +
    +    @Override
    +    protected PropertyDescriptor getSupportedDynamicPropertyDescriptor(final String propertyDescriptorName) {
    +        return new PropertyDescriptor.Builder()
    +                .name(propertyDescriptorName)
    +                .required(false)
    +                .addValidator(StandardValidators.createAttributeExpressionLanguageValidator(AttributeExpression.ResultType.STRING, true))
    +                .addValidator(StandardValidators.ATTRIBUTE_KEY_PROPERTY_NAME_VALIDATOR)
    +                .expressionLanguageSupported(true)
    +                .dynamic(true)
    +                .build();
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +        synchronized (this) {
    +            schemaCache.clear();
    +        }
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +
    +        FlowFile flowFile = session.get();
    +        if (flowFile == null) {
    +            return;
    +        }
    +
    +        final ComponentLog log = getLogger();
    +
    +        final RecordReaderFactory recordParserFactory = context.getProperty(RECORD_READER_FACTORY)
    +                .asControllerService(RecordReaderFactory.class);
    +        final String statementTypeProperty = context.getProperty(STATEMENT_TYPE).getValue();
    +        final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    +        final boolean translateFieldNames = context.getProperty(TRANSLATE_FIELD_NAMES).asBoolean();
    +        final boolean ignoreUnmappedFields = IGNORE_UNMATCHED_FIELD.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_FIELD_BEHAVIOR).getValue());
    +        final Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions().asTimePeriod(TimeUnit.SECONDS).intValue();
    +
    +        // Is the unmatched column behaviour fail or warning?
    +        final boolean failUnmappedColumns = FAIL_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +        final boolean warningUnmappedColumns = WARNING_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +
    +        // Escape column names?
    +        final boolean escapeColumnNames = context.getProperty(QUOTED_IDENTIFIERS).asBoolean();
    +
    +        // Quote table name?
    +        final boolean quoteTableName = context.getProperty(QUOTED_TABLE_IDENTIFIER).asBoolean();
    +
    +        try (final Connection con = dbcpService.getConnection()) {
    +
    +            String jdbcURL = "DBCPService";
    +            try {
    +                DatabaseMetaData databaseMetaData = con.getMetaData();
    +                if (databaseMetaData != null) {
    +                    jdbcURL = databaseMetaData.getURL();
    +                }
    +            } catch (SQLException se) {
    +                // Ignore and use default JDBC URL. This shouldn't happen unless the driver doesn't implement getMetaData() properly
    +            }
    +
    +            final String catalog = context.getProperty(CATALOG_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String schemaName = context.getProperty(SCHEMA_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String updateKeys = context.getProperty(UPDATE_KEYS).evaluateAttributeExpressions(flowFile).getValue();
    +            final SchemaKey schemaKey = new SchemaKey(catalog, tableName);
    +
    +            // Get the statement type from the attribute if necessary
    +            String statementType = statementTypeProperty;
    +            if (USE_ATTR_TYPE.equals(statementTypeProperty)) {
    +                statementType = flowFile.getAttribute(STATEMENT_TYPE_ATTRIBUTE);
    +            }
    +            if (StringUtils.isEmpty(statementType)) {
    +                log.error("Statement Type is not specified, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                flowFile = session.penalize(flowFile);
    +                session.transfer(flowFile, REL_FAILURE);
    +            } else {
    +                RecordSchema recordSchema;
    +                try (final InputStream in = session.read(flowFile)) {
    +
    +                    final RecordReader recordParser = recordParserFactory.createRecordReader(flowFile, in, log);
    +                    recordSchema = recordParser.getSchema();
    +
    +                    if (SQL_TYPE.equalsIgnoreCase(statementType)) {
    +
    +                        // Find which field has the SQL statement in it
    +                        final String sqlField = context.getProperty(FIELD_CONTAINING_SQL).evaluateAttributeExpressions(flowFile).getValue();
    +                        if (StringUtils.isEmpty(sqlField)) {
    +                            log.error("SQL specified as Statement Type but no Field Containing SQL was found, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } else {
    +                            boolean schemaHasSqlField = recordSchema.getFields().stream().anyMatch((field) -> sqlField.equals(field.getFieldName()));
    +                            if (schemaHasSqlField) {
    +                                try (Statement s = con.createStatement()) {
    +
    +                                    try {
    +                                        s.setQueryTimeout(queryTimeout); // timeout in seconds
    +                                    } catch (SQLException se) {
    +                                        // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                        if (queryTimeout > 0) {
    +                                            throw se;
    +                                        }
    +                                    }
    +
    +                                    Record currentRecord;
    +                                    while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                        Object sql = currentRecord.getValue(sqlField);
    +                                        if (sql != null && !StringUtils.isEmpty((String) sql)) {
    +                                            // Execute the statement as-is
    +                                            s.execute((String) sql);
    +                                        } else {
    +                                            log.error("Record had no (or null) value for Field Containing SQL: {}, flowfile {} will be penalized and routed to failure",
    +                                                    new Object[]{sqlField, flowFile});
    +                                            flowFile = session.penalize(flowFile);
    +                                            session.transfer(flowFile, REL_FAILURE);
    +                                            return;
    +                                        }
    +                                    }
    +                                    session.transfer(flowFile, REL_SUCCESS);
    +                                    session.getProvenanceReporter().send(flowFile, jdbcURL);
    +                                } catch (final SQLNonTransientException e) {
    +                                    log.error("Failed to update database for {} due to {}; routing to failure", new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_FAILURE);
    +                                } catch (final SQLException e) {
    +                                    log.error("Failed to update database for {} due to {}; it is possible that retrying the operation will succeed, so routing to retry",
    +                                            new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_RETRY);
    +                                }
    +                            } else {
    +                                log.warn("Record schema does not contain Field Containing SQL: {}, flowfile {} will be penalized and routed to failure", new Object[]{sqlField, flowFile});
    +                                flowFile = session.penalize(flowFile);
    +                                session.transfer(flowFile, REL_FAILURE);
    +                            }
    +                        }
    +
    +                    } else {
    +                        // Ensure the table name has been set, the generated SQL statements (and TableSchema cache) will need it
    +                        if (StringUtils.isEmpty(tableName)) {
    +                            log.error("Cannot process {} because Table Name is null or empty; penalizing and routing to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                            return;
    +                        }
    +
    +                        final boolean includePrimaryKeys = UPDATE_TYPE.equalsIgnoreCase(statementType) && updateKeys == null;
    +
    +                        // get the database schema from the cache, if one exists. We do this in a synchronized block, rather than
    +                        // using a ConcurrentMap because the Map that we are using is a LinkedHashMap with a capacity such that if
    +                        // the Map grows beyond this capacity, old elements are evicted. We do this in order to avoid filling the
    +                        // Java Heap if there are a lot of different SQL statements being generated that reference different tables.
    +                        TableSchema schema;
    +                        synchronized (this) {
    +                            schema = schemaCache.get(schemaKey);
    +                            if (schema == null) {
    +                                // No schema exists for this table yet. Query the database to determine the schema and put it into the cache.
    +                                try (final Connection conn = dbcpService.getConnection()) {
    +                                    schema = TableSchema.from(conn, catalog, schemaName, tableName, translateFieldNames, includePrimaryKeys);
    +                                    schemaCache.put(schemaKey, schema);
    +                                } catch (final SQLNonTransientException e) {
    +                                    log.error("Failed to update database for {} due to {}; routing to failure", new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_FAILURE);
    +                                    return;
    +                                } catch (final SQLException e) {
    +                                    log.error("Failed to update database for {} due to {}; it is possible that retrying the operation will succeed, so routing to retry",
    +                                            new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_RETRY);
    +                                    return;
    +                                }
    +                            }
    +                        }
    +
    +                        final SqlAndIncludedColumns sqlHolder;
    +                        try {
    +                            // build the fully qualified table name
    +                            final StringBuilder tableNameBuilder = new StringBuilder();
    +                            if (catalog != null) {
    +                                tableNameBuilder.append(catalog).append(".");
    +                            }
    +                            if (schemaName != null) {
    +                                tableNameBuilder.append(schemaName).append(".");
    +                            }
    +                            tableNameBuilder.append(tableName);
    +                            final String fqTableName = tableNameBuilder.toString();
    +
    +                            if (INSERT_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateInsert(recordSchema, fqTableName, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else if (UPDATE_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateUpdate(recordSchema, fqTableName, updateKeys, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else if (DELETE_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateDelete(recordSchema, fqTableName, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else {
    +                                log.error("Statement Type {} is not valid, flowfile {} will be penalized and routed to failure", new Object[]{statementType, flowFile});
    +                                flowFile = session.penalize(flowFile);
    +                                session.transfer(flowFile, REL_FAILURE);
    +                                return;
    +                            }
    +                        } catch (final ProcessException pe) {
    +                            log.error("Failed to convert {} to a SQL {} statement due to {}; routing to failure",
    +                                    new Object[]{flowFile, statementType, pe.toString()}, pe);
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                            return;
    +                        }
    +
    +                        try (PreparedStatement ps = con.prepareStatement(sqlHolder.getSql())) {
    +
    +                            try {
    +                                ps.setQueryTimeout(queryTimeout); // timeout in seconds
    +                            } catch (SQLException se) {
    +                                // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                if (queryTimeout > 0) {
    +                                    throw se;
    +                                }
    +                            }
    +
    +                            Record currentRecord;
    +                            List<Integer> fieldIndexes = sqlHolder.getFieldIndexes();
    +
    +                            while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                Object[] values = currentRecord.getValues();
    +                                if (values != null) {
    +                                    if (fieldIndexes != null) {
    +                                        for (int i = 0; i < fieldIndexes.size(); i++) {
    +                                            ps.setObject(i + 1, values[fieldIndexes.get(i)]);
    +                                        }
    +                                    } else {
    +                                        // If there's no index map, assume all values are included and set them in order
    +                                        for (int i = 0; i < values.length; i++) {
    +                                            ps.setObject(i + 1, values[i]);
    +                                        }
    +                                    }
    +                                    ps.addBatch();
    +                                }
    +                            }
    +
    +                            log.debug("Executing query {}", new Object[]{sqlHolder});
    +                            ps.executeBatch();
    +                            session.transfer(flowFile, REL_SUCCESS);
    +                            session.getProvenanceReporter().send(flowFile, jdbcURL);
    +
    +                        } catch (final SQLNonTransientException e) {
    +                            log.error("Failed to update database for {} due to {}; routing to failure", new Object[]{flowFile, e});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } catch (final SQLException e) {
    +                            log.error("Failed to update database for {} due to {}; it is possible that retrying the operation will succeed, so routing to retry",
    +                                    new Object[]{flowFile, e});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_RETRY);
    +                        }
    +                    }
    +                } catch (final MalformedRecordException | SchemaNotFoundException | IOException e) {
    +                    throw new ProcessException("Failed to determine schema of data records for " + flowFile, e);
    --- End diff --
    
    This ProcessException will be caught by the outside catch block which logs the exception and yields context, but does not re-throw it nor route incoming FlowFile to anywhere. As a result FlowFileHandlingException is thrown. This is reproducible by specifying non-existing schema name.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r112135805
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1067 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.SeeAlso;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "convert", "jdbc", "put", "database"})
    +@SeeAlso({ConvertJSONToSQL.class, PutSQL.class})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail", "Fail",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RowRecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor BATCH_SIZE = new PropertyDescriptor.Builder()
    --- End diff --
    
    Is this `BATCH_SIZE` used somewhere? I couldn't find usage of it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r113190736
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1076 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.schema.access.SchemaNotFoundException;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.SQLNonTransientException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "jdbc", "put", "database", "update", "insert", "delete"})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.  IMPORTANT: If the Statement Type is UPDATE, then the incoming "
    +        + "records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing "
    +        + "(if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys "
    +        + "exist).")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail on Unmatched Fields", "Fail on Unmatched Fields",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    protected static List<PropertyDescriptor> propDescriptors;
    +
    +    private final Map<SchemaKey, TableSchema> schemaCache = new LinkedHashMap<SchemaKey, TableSchema>(100) {
    +        private static final long serialVersionUID = 1L;
    +
    +        @Override
    +        protected boolean removeEldestEntry(Map.Entry<SchemaKey, TableSchema> eldest) {
    +            return size() >= 100;
    +        }
    +    };
    +
    +
    +    static {
    +        final Set<Relationship> r = new HashSet<>();
    +        r.add(REL_SUCCESS);
    +        r.add(REL_FAILURE);
    +        r.add(REL_RETRY);
    +        relationships = Collections.unmodifiableSet(r);
    +
    +        final List<PropertyDescriptor> pds = new ArrayList<>();
    +        pds.add(RECORD_READER_FACTORY);
    +        pds.add(STATEMENT_TYPE);
    +        pds.add(DBCP_SERVICE);
    +        pds.add(CATALOG_NAME);
    +        pds.add(SCHEMA_NAME);
    +        pds.add(TABLE_NAME);
    +        pds.add(TRANSLATE_FIELD_NAMES);
    +        pds.add(UNMATCHED_FIELD_BEHAVIOR);
    +        pds.add(UNMATCHED_COLUMN_BEHAVIOR);
    +        pds.add(UPDATE_KEYS);
    +        pds.add(FIELD_CONTAINING_SQL);
    +        pds.add(QUOTED_IDENTIFIERS);
    +        pds.add(QUOTED_TABLE_IDENTIFIER);
    +        pds.add(QUERY_TIMEOUT);
    +
    +        propDescriptors = Collections.unmodifiableList(pds);
    +    }
    +
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return relationships;
    +    }
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return propDescriptors;
    +    }
    +
    +    @Override
    +    protected PropertyDescriptor getSupportedDynamicPropertyDescriptor(final String propertyDescriptorName) {
    +        return new PropertyDescriptor.Builder()
    +                .name(propertyDescriptorName)
    +                .required(false)
    +                .addValidator(StandardValidators.createAttributeExpressionLanguageValidator(AttributeExpression.ResultType.STRING, true))
    +                .addValidator(StandardValidators.ATTRIBUTE_KEY_PROPERTY_NAME_VALIDATOR)
    +                .expressionLanguageSupported(true)
    +                .dynamic(true)
    +                .build();
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +        synchronized (this) {
    +            schemaCache.clear();
    +        }
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +
    +        FlowFile flowFile = session.get();
    +        if (flowFile == null) {
    +            return;
    +        }
    +
    +        final ComponentLog log = getLogger();
    +
    +        final RecordReaderFactory recordParserFactory = context.getProperty(RECORD_READER_FACTORY)
    +                .asControllerService(RecordReaderFactory.class);
    +        final String statementTypeProperty = context.getProperty(STATEMENT_TYPE).getValue();
    +        final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    +        final boolean translateFieldNames = context.getProperty(TRANSLATE_FIELD_NAMES).asBoolean();
    +        final boolean ignoreUnmappedFields = IGNORE_UNMATCHED_FIELD.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_FIELD_BEHAVIOR).getValue());
    +        final Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions().asTimePeriod(TimeUnit.SECONDS).intValue();
    +
    +        // Is the unmatched column behaviour fail or warning?
    +        final boolean failUnmappedColumns = FAIL_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +        final boolean warningUnmappedColumns = WARNING_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +
    +        // Escape column names?
    +        final boolean escapeColumnNames = context.getProperty(QUOTED_IDENTIFIERS).asBoolean();
    +
    +        // Quote table name?
    +        final boolean quoteTableName = context.getProperty(QUOTED_TABLE_IDENTIFIER).asBoolean();
    +
    +        try (final Connection con = dbcpService.getConnection()) {
    +
    +            String jdbcURL = "DBCPService";
    +            try {
    +                DatabaseMetaData databaseMetaData = con.getMetaData();
    +                if (databaseMetaData != null) {
    +                    jdbcURL = databaseMetaData.getURL();
    +                }
    +            } catch (SQLException se) {
    +                // Ignore and use default JDBC URL. This shouldn't happen unless the driver doesn't implement getMetaData() properly
    +            }
    +
    +            final String catalog = context.getProperty(CATALOG_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String schemaName = context.getProperty(SCHEMA_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String updateKeys = context.getProperty(UPDATE_KEYS).evaluateAttributeExpressions(flowFile).getValue();
    +            final SchemaKey schemaKey = new SchemaKey(catalog, tableName);
    +
    +            // Get the statement type from the attribute if necessary
    +            String statementType = statementTypeProperty;
    +            if (USE_ATTR_TYPE.equals(statementTypeProperty)) {
    +                statementType = flowFile.getAttribute(STATEMENT_TYPE_ATTRIBUTE);
    +            }
    +            if (StringUtils.isEmpty(statementType)) {
    +                log.error("Statement Type is not specified, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                flowFile = session.penalize(flowFile);
    +                session.transfer(flowFile, REL_FAILURE);
    +            } else {
    +                RecordSchema recordSchema;
    +                try (final InputStream in = session.read(flowFile)) {
    +
    +                    final RecordReader recordParser = recordParserFactory.createRecordReader(flowFile, in, log);
    +                    recordSchema = recordParser.getSchema();
    +
    +                    if (SQL_TYPE.equalsIgnoreCase(statementType)) {
    +
    +                        // Find which field has the SQL statement in it
    +                        final String sqlField = context.getProperty(FIELD_CONTAINING_SQL).evaluateAttributeExpressions(flowFile).getValue();
    +                        if (StringUtils.isEmpty(sqlField)) {
    +                            log.error("SQL specified as Statement Type but no Field Containing SQL was found, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } else {
    +                            boolean schemaHasSqlField = recordSchema.getFields().stream().anyMatch((field) -> sqlField.equals(field.getFieldName()));
    +                            if (schemaHasSqlField) {
    +                                try (Statement s = con.createStatement()) {
    +
    +                                    try {
    +                                        s.setQueryTimeout(queryTimeout); // timeout in seconds
    +                                    } catch (SQLException se) {
    +                                        // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                        if (queryTimeout > 0) {
    +                                            throw se;
    +                                        }
    +                                    }
    +
    +                                    Record currentRecord;
    +                                    while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                        Object sql = currentRecord.getValue(sqlField);
    +                                        if (sql != null && !StringUtils.isEmpty((String) sql)) {
    +                                            // Execute the statement as-is
    +                                            s.execute((String) sql);
    +                                        } else {
    +                                            log.error("Record had no (or null) value for Field Containing SQL: {}, flowfile {} will be penalized and routed to failure",
    +                                                    new Object[]{sqlField, flowFile});
    +                                            flowFile = session.penalize(flowFile);
    +                                            session.transfer(flowFile, REL_FAILURE);
    +                                            return;
    +                                        }
    +                                    }
    +                                    session.transfer(flowFile, REL_SUCCESS);
    +                                    session.getProvenanceReporter().send(flowFile, jdbcURL);
    +                                } catch (final SQLNonTransientException e) {
    +                                    log.error("Failed to update database for {} due to {}; routing to failure", new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_FAILURE);
    +                                } catch (final SQLException e) {
    +                                    log.error("Failed to update database for {} due to {}; it is possible that retrying the operation will succeed, so routing to retry",
    +                                            new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_RETRY);
    +                                }
    +                            } else {
    +                                log.warn("Record schema does not contain Field Containing SQL: {}, flowfile {} will be penalized and routed to failure", new Object[]{sqlField, flowFile});
    +                                flowFile = session.penalize(flowFile);
    +                                session.transfer(flowFile, REL_FAILURE);
    +                            }
    +                        }
    +
    +                    } else {
    +                        // Ensure the table name has been set, the generated SQL statements (and TableSchema cache) will need it
    +                        if (StringUtils.isEmpty(tableName)) {
    +                            log.error("Cannot process {} because Table Name is null or empty; penalizing and routing to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                            return;
    +                        }
    +
    +                        final boolean includePrimaryKeys = UPDATE_TYPE.equalsIgnoreCase(statementType) && updateKeys == null;
    +
    +                        // get the database schema from the cache, if one exists. We do this in a synchronized block, rather than
    +                        // using a ConcurrentMap because the Map that we are using is a LinkedHashMap with a capacity such that if
    +                        // the Map grows beyond this capacity, old elements are evicted. We do this in order to avoid filling the
    +                        // Java Heap if there are a lot of different SQL statements being generated that reference different tables.
    +                        TableSchema schema;
    +                        synchronized (this) {
    +                            schema = schemaCache.get(schemaKey);
    +                            if (schema == null) {
    +                                // No schema exists for this table yet. Query the database to determine the schema and put it into the cache.
    +                                try (final Connection conn = dbcpService.getConnection()) {
    +                                    schema = TableSchema.from(conn, catalog, schemaName, tableName, translateFieldNames, includePrimaryKeys);
    +                                    schemaCache.put(schemaKey, schema);
    +                                } catch (final SQLNonTransientException e) {
    +                                    log.error("Failed to update database for {} due to {}; routing to failure", new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_FAILURE);
    +                                    return;
    +                                } catch (final SQLException e) {
    +                                    log.error("Failed to update database for {} due to {}; it is possible that retrying the operation will succeed, so routing to retry",
    +                                            new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_RETRY);
    +                                    return;
    +                                }
    +                            }
    +                        }
    +
    +                        final SqlAndIncludedColumns sqlHolder;
    +                        try {
    +                            // build the fully qualified table name
    +                            final StringBuilder tableNameBuilder = new StringBuilder();
    +                            if (catalog != null) {
    +                                tableNameBuilder.append(catalog).append(".");
    +                            }
    +                            if (schemaName != null) {
    +                                tableNameBuilder.append(schemaName).append(".");
    +                            }
    +                            tableNameBuilder.append(tableName);
    +                            final String fqTableName = tableNameBuilder.toString();
    +
    +                            if (INSERT_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateInsert(recordSchema, fqTableName, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else if (UPDATE_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateUpdate(recordSchema, fqTableName, updateKeys, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else if (DELETE_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateDelete(recordSchema, fqTableName, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else {
    +                                log.error("Statement Type {} is not valid, flowfile {} will be penalized and routed to failure", new Object[]{statementType, flowFile});
    +                                flowFile = session.penalize(flowFile);
    +                                session.transfer(flowFile, REL_FAILURE);
    +                                return;
    +                            }
    +                        } catch (final ProcessException pe) {
    +                            log.error("Failed to convert {} to a SQL {} statement due to {}; routing to failure",
    +                                    new Object[]{flowFile, statementType, pe.toString()}, pe);
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                            return;
    +                        }
    +
    +                        try (PreparedStatement ps = con.prepareStatement(sqlHolder.getSql())) {
    +
    +                            try {
    +                                ps.setQueryTimeout(queryTimeout); // timeout in seconds
    +                            } catch (SQLException se) {
    +                                // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                if (queryTimeout > 0) {
    +                                    throw se;
    +                                }
    +                            }
    +
    +                            Record currentRecord;
    +                            List<Integer> fieldIndexes = sqlHolder.getFieldIndexes();
    +
    +                            while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                Object[] values = currentRecord.getValues();
    +                                if (values != null) {
    +                                    if (fieldIndexes != null) {
    +                                        for (int i = 0; i < fieldIndexes.size(); i++) {
    +                                            ps.setObject(i + 1, values[fieldIndexes.get(i)]);
    +                                        }
    +                                    } else {
    +                                        // If there's no index map, assume all values are included and set them in order
    +                                        for (int i = 0; i < values.length; i++) {
    +                                            ps.setObject(i + 1, values[i]);
    +                                        }
    +                                    }
    +                                    ps.addBatch();
    +                                }
    +                            }
    +
    +                            log.debug("Executing query {}", new Object[]{sqlHolder});
    +                            ps.executeBatch();
    --- End diff --
    
    I think we should rollback RDBMS session and add exception details to FlowFile attribute then route it to 'retry' so that it can be analyzed and retry later. As an example, I got following exception:
    ```
    java.sql.BatchUpdateException: Duplicate entry '2' for key 'PRIMARY'
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r112938239
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1058 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RowRecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "jdbc", "put", "database", "update", "insert", "delete"})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.  IMPORTANT: If the Statement Type is UPDATE, then the incoming "
    +        + "records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing "
    +        + "(if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys "
    +        + "exist).")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail on Unmatched Fields", "Fail on Unmatched Fields",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RowRecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    protected static List<PropertyDescriptor> propDescriptors;
    +
    +    private final Map<SchemaKey, TableSchema> schemaCache = new LinkedHashMap<SchemaKey, TableSchema>(100) {
    +        private static final long serialVersionUID = 1L;
    +
    +        @Override
    +        protected boolean removeEldestEntry(Map.Entry<SchemaKey, TableSchema> eldest) {
    +            return size() >= 100;
    +        }
    +    };
    +
    +
    +    static {
    +        final Set<Relationship> r = new HashSet<>();
    +        r.add(REL_SUCCESS);
    +        r.add(REL_FAILURE);
    +        r.add(REL_RETRY);
    +        relationships = Collections.unmodifiableSet(r);
    +
    +        final List<PropertyDescriptor> pds = new ArrayList<>();
    +        pds.add(RECORD_READER_FACTORY);
    +        pds.add(STATEMENT_TYPE);
    +        pds.add(DBCP_SERVICE);
    +        pds.add(CATALOG_NAME);
    +        pds.add(SCHEMA_NAME);
    +        pds.add(TABLE_NAME);
    +        pds.add(TRANSLATE_FIELD_NAMES);
    +        pds.add(UNMATCHED_FIELD_BEHAVIOR);
    +        pds.add(UNMATCHED_COLUMN_BEHAVIOR);
    +        pds.add(UPDATE_KEYS);
    +        pds.add(FIELD_CONTAINING_SQL);
    +        pds.add(QUOTED_IDENTIFIERS);
    +        pds.add(QUOTED_TABLE_IDENTIFIER);
    +        pds.add(QUERY_TIMEOUT);
    +
    +        propDescriptors = Collections.unmodifiableList(pds);
    +    }
    +
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return relationships;
    +    }
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return propDescriptors;
    +    }
    +
    +    @Override
    +    protected PropertyDescriptor getSupportedDynamicPropertyDescriptor(final String propertyDescriptorName) {
    +        return new PropertyDescriptor.Builder()
    +                .name(propertyDescriptorName)
    +                .required(false)
    +                .addValidator(StandardValidators.createAttributeExpressionLanguageValidator(AttributeExpression.ResultType.STRING, true))
    +                .addValidator(StandardValidators.ATTRIBUTE_KEY_PROPERTY_NAME_VALIDATOR)
    +                .expressionLanguageSupported(true)
    +                .dynamic(true)
    +                .build();
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +        synchronized (this) {
    +            schemaCache.clear();
    +        }
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +
    +        FlowFile flowFile = session.get();
    +        if (flowFile == null) {
    +            return;
    +        }
    +
    +        final ComponentLog log = getLogger();
    +
    +        final RowRecordReaderFactory recordParserFactory = context.getProperty(RECORD_READER_FACTORY)
    +                .asControllerService(RowRecordReaderFactory.class);
    +        final String statementTypeProperty = context.getProperty(STATEMENT_TYPE).getValue();
    +        final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    +        final boolean translateFieldNames = context.getProperty(TRANSLATE_FIELD_NAMES).asBoolean();
    +        final boolean ignoreUnmappedFields = IGNORE_UNMATCHED_FIELD.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_FIELD_BEHAVIOR).getValue());
    +        final Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions().asTimePeriod(TimeUnit.SECONDS).intValue();
    +
    +        // Is the unmatched column behaviour fail or warning?
    +        final boolean failUnmappedColumns = FAIL_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +        final boolean warningUnmappedColumns = WARNING_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +
    +        // Escape column names?
    +        final boolean escapeColumnNames = context.getProperty(QUOTED_IDENTIFIERS).asBoolean();
    +
    +        // Quote table name?
    +        final boolean quoteTableName = context.getProperty(QUOTED_TABLE_IDENTIFIER).asBoolean();
    +
    +        try (final Connection con = dbcpService.getConnection()) {
    +
    +            String jdbcURL = "DBCPService";
    +            try {
    +                DatabaseMetaData databaseMetaData = con.getMetaData();
    +                if (databaseMetaData != null) {
    +                    jdbcURL = databaseMetaData.getURL();
    +                }
    +            } catch (SQLException se) {
    +                // Ignore and use default JDBC URL. This shouldn't happen unless the driver doesn't implement getMetaData() properly
    +            }
    +
    +            final String catalog = context.getProperty(CATALOG_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String schemaName = context.getProperty(SCHEMA_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String updateKeys = context.getProperty(UPDATE_KEYS).evaluateAttributeExpressions(flowFile).getValue();
    +            final SchemaKey schemaKey = new SchemaKey(catalog, tableName);
    +
    +            // Get the statement type from the attribute if necessary
    +            String statementType = statementTypeProperty;
    +            if (USE_ATTR_TYPE.equals(statementTypeProperty)) {
    +                statementType = flowFile.getAttribute(STATEMENT_TYPE_ATTRIBUTE);
    +            }
    +            if (StringUtils.isEmpty(statementType)) {
    +                log.error("Statement Type is not specified, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                flowFile = session.penalize(flowFile);
    +                session.transfer(flowFile, REL_FAILURE);
    +            } else {
    +                RecordSchema recordSchema;
    +                try (final InputStream in = session.read(flowFile)) {
    +
    +                    final RecordReader recordParser = recordParserFactory.createRecordReader(flowFile, in, log);
    +                    recordSchema = recordParser.getSchema();
    +
    +                    if (SQL_TYPE.equalsIgnoreCase(statementType)) {
    +
    +                        // Find which field has the SQL statement in it
    +                        final String sqlField = context.getProperty(FIELD_CONTAINING_SQL).evaluateAttributeExpressions(flowFile).getValue();
    +                        if (StringUtils.isEmpty(sqlField)) {
    +                            log.error("SQL specified as Statement Type but no Field Containing SQL was found, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } else {
    +                            boolean schemaHasSqlField = recordSchema.getFields().stream().anyMatch((field) -> sqlField.equals(field.getFieldName()));
    +                            if (schemaHasSqlField) {
    +                                try (Statement s = con.createStatement()) {
    +
    +                                    try {
    +                                        s.setQueryTimeout(queryTimeout); // timeout in seconds
    +                                    } catch (SQLException se) {
    +                                        // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                        if (queryTimeout > 0) {
    +                                            throw se;
    +                                        }
    +                                    }
    +
    +                                    Record currentRecord;
    +                                    while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                        Object sql = currentRecord.getValue(sqlField);
    +                                        if (sql != null && !StringUtils.isEmpty((String) sql)) {
    +                                            // Execute the statement as-is
    +                                            s.execute((String) sql);
    --- End diff --
    
    I agree that it should be configurable by the user whether to include begin and commit directives. For step 5 above, the "begin" event is received before the table map event, and as you point out, it is not associated to a table-specific event, so it is not filtered out.  I have written up [NIFI-3730](https://issues.apache.org/jira/browse/NIFI-3730) to address this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1677: NIFI-3704: Add PutDatabaseRecord processor

Posted by ijokarumawak <gi...@git.apache.org>.
Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1677#discussion_r113343779
  
    --- Diff: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/PutDatabaseRecord.java ---
    @@ -0,0 +1,1076 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.commons.lang3.StringUtils;
    +import org.apache.nifi.annotation.behavior.EventDriven;
    +import org.apache.nifi.annotation.behavior.InputRequirement;
    +import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
    +import org.apache.nifi.annotation.behavior.ReadsAttribute;
    +import org.apache.nifi.annotation.documentation.CapabilityDescription;
    +import org.apache.nifi.annotation.documentation.Tags;
    +import org.apache.nifi.annotation.lifecycle.OnScheduled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.expression.AttributeExpression;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.processor.AbstractProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.ProcessSession;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.schema.access.SchemaNotFoundException;
    +import org.apache.nifi.serialization.MalformedRecordException;
    +import org.apache.nifi.serialization.RecordReader;
    +import org.apache.nifi.serialization.RecordReaderFactory;
    +import org.apache.nifi.serialization.record.Record;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.sql.Connection;
    +import java.sql.DatabaseMetaData;
    +import java.sql.PreparedStatement;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.SQLNonTransientException;
    +import java.sql.Statement;
    +import java.util.ArrayList;
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.LinkedHashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.IntStream;
    +
    +
    +@EventDriven
    +@InputRequirement(Requirement.INPUT_REQUIRED)
    +@Tags({"sql", "record", "jdbc", "put", "database", "update", "insert", "delete"})
    +@CapabilityDescription("The PutDatabaseRecord processor uses a specified RecordReader to input (possibly multiple) records from an incoming flow file. These records are translated to SQL "
    +        + "statements and executed as a single batch. If any errors occur, the flow file is routed to failure or retry, and if the records are transmitted successfully, the incoming flow file is "
    +        + "routed to success.  The type of statement executed by the processor is specified via the Statement Type property, which accepts some hard-coded values such as INSERT, UPDATE, and DELETE, "
    +        + "as well as 'Use statement.type Attribute', which causes the processor to get the statement type from a flow file attribute.  IMPORTANT: If the Statement Type is UPDATE, then the incoming "
    +        + "records must not alter the value(s) of the primary keys (or user-specified Update Keys). If such records are encountered, the UPDATE statement issued to the database may do nothing "
    +        + "(if no existing records with the new primary key values are found), or could inadvertently corrupt the existing data (by changing records for which the new values of the primary keys "
    +        + "exist).")
    +@ReadsAttribute(attribute = PutDatabaseRecord.STATEMENT_TYPE_ATTRIBUTE, description = "If 'Use statement.type Attribute' is selected for the Statement Type property, the value of this attribute "
    +        + "will be used to determine the type of statement (INSERT, UPDATE, DELETE, SQL, etc.) to generate and execute.")
    +public class PutDatabaseRecord extends AbstractProcessor {
    +
    +    static final String UPDATE_TYPE = "UPDATE";
    +    static final String INSERT_TYPE = "INSERT";
    +    static final String DELETE_TYPE = "DELETE";
    +    static final String SQL_TYPE = "SQL";   // Not an allowable value in the Statement Type property, must be set by attribute
    +    static final String USE_ATTR_TYPE = "Use statement.type Attribute";
    +
    +    static final String STATEMENT_TYPE_ATTRIBUTE = "statement.type";
    +
    +    static final AllowableValue IGNORE_UNMATCHED_FIELD = new AllowableValue("Ignore Unmatched Fields", "Ignore Unmatched Fields",
    +            "Any field in the document that cannot be mapped to a column in the database is ignored");
    +    static final AllowableValue FAIL_UNMATCHED_FIELD = new AllowableValue("Fail on Unmatched Fields", "Fail on Unmatched Fields",
    +            "If the document has any field that cannot be mapped to a column in the database, the FlowFile will be routed to the failure relationship");
    +    static final AllowableValue IGNORE_UNMATCHED_COLUMN = new AllowableValue("Ignore Unmatched Columns",
    +            "Ignore Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  No notification will be logged");
    +    static final AllowableValue WARNING_UNMATCHED_COLUMN = new AllowableValue("Warn on Unmatched Columns",
    +            "Warn on Unmatched Columns",
    +            "Any column in the database that does not have a field in the document will be assumed to not be required.  A warning will be logged");
    +    static final AllowableValue FAIL_UNMATCHED_COLUMN = new AllowableValue("Fail on Unmatched Columns",
    +            "Fail on Unmatched Columns",
    +            "A flow will fail if any column in the database that does not have a field in the document.  An error will be logged");
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query result set.")
    +            .build();
    +
    +    static final Relationship REL_RETRY = new Relationship.Builder()
    +            .name("retry")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated but attempting the operation again may succeed")
    +            .build();
    +    static final Relationship REL_FAILURE = new Relationship.Builder()
    +            .name("failure")
    +            .description("A FlowFile is routed to this relationship if the database cannot be updated and retrying the operation will also fail, "
    +                    + "such as an invalid query or an integrity constraint violation")
    +            .build();
    +
    +    protected static Set<Relationship> relationships;
    +
    +    // Properties
    +    static final PropertyDescriptor RECORD_READER_FACTORY = new PropertyDescriptor.Builder()
    +            .name("put-db-record-record-reader")
    +            .displayName("Record Reader")
    +            .description("Specifies the Controller Service to use for parsing incoming data and determining the data's schema.")
    +            .identifiesControllerService(RecordReaderFactory.class)
    +            .required(true)
    +            .build();
    +
    +    static final PropertyDescriptor STATEMENT_TYPE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-statement-type")
    +            .displayName("Statement Type")
    +            .description("Specifies the type of SQL Statement to generate. If 'Use statement.type Attribute' is chosen, then the value is taken from the statement.type attribute in the "
    +                    + "FlowFile. The 'Use statement.type Attribute' option is the only one that allows the 'SQL' statement type. If 'SQL' is specified, the value of the field specified by the "
    +                    + "'Field Containing SQL' property is expected to be a valid SQL statement on the target database, and will be executed as-is.")
    +            .required(true)
    +            .allowableValues(UPDATE_TYPE, INSERT_TYPE, DELETE_TYPE, USE_ATTR_TYPE)
    +            .build();
    +
    +    static final PropertyDescriptor DBCP_SERVICE = new PropertyDescriptor.Builder()
    +            .name("put-db-record-dcbp-service")
    +            .displayName("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a connection to the database for sending records.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    static final PropertyDescriptor CATALOG_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-catalog-name")
    +            .displayName("Catalog Name")
    +            .description("The name of the catalog that the statement should update. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor SCHEMA_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-schema-name")
    +            .displayName("Schema Name")
    +            .description("The name of the schema that the table belongs to. This may not apply for the database that you are updating. In this case, leave the field empty")
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TABLE_NAME = new PropertyDescriptor.Builder()
    +            .name("put-db-record-table-name")
    +            .displayName("Table Name")
    +            .description("The name of the table that the statement should affect.")
    +            .required(true)
    +            .expressionLanguageSupported(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    static final PropertyDescriptor TRANSLATE_FIELD_NAMES = new PropertyDescriptor.Builder()
    +            .name("put-db-record-translate-field-names")
    +            .displayName("Translate Field Names")
    +            .description("If true, the Processor will attempt to translate field names into the appropriate column names for the table specified. "
    +                    + "If false, the field names must match the column names exactly, or the column will not be updated")
    +            .allowableValues("true", "false")
    +            .defaultValue("true")
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_FIELD_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-field-behavior")
    +            .displayName("Unmatched Field Behavior")
    +            .description("If an incoming record has a field that does not map to any of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_FIELD, FAIL_UNMATCHED_FIELD)
    +            .defaultValue(IGNORE_UNMATCHED_FIELD.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UNMATCHED_COLUMN_BEHAVIOR = new PropertyDescriptor.Builder()
    +            .name("put-db-record-unmatched-column-behavior")
    +            .displayName("Unmatched Column Behavior")
    +            .description("If an incoming record does not have a field mapping for all of the database table's columns, this property specifies how to handle the situation")
    +            .allowableValues(IGNORE_UNMATCHED_COLUMN, WARNING_UNMATCHED_COLUMN, FAIL_UNMATCHED_COLUMN)
    +            .defaultValue(FAIL_UNMATCHED_COLUMN.getValue())
    +            .build();
    +
    +    static final PropertyDescriptor UPDATE_KEYS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-update-keys")
    +            .displayName("Update Keys")
    +            .description("A comma-separated list of column names that uniquely identifies a row in the database for UPDATE statements. "
    +                    + "If the Statement Type is UPDATE and this property is not set, the table's Primary Keys are used. "
    +                    + "In this case, if no Primary Key exists, the conversion to SQL will fail if Unmatched Column Behaviour is set to FAIL. "
    +                    + "This property is ignored if the Statement Type is INSERT")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor FIELD_CONTAINING_SQL = new PropertyDescriptor.Builder()
    +            .name("put-db-record-field-containing-sql")
    +            .displayName("Field Containing SQL")
    +            .description("If the Statement Type is 'SQL' (as set in the statement.type attribute), this field indicates which field in the record(s) contains the SQL statement to execute. The value "
    +                    + "of the field must be a single SQL statement. If the Statement Type is not 'SQL', this field is ignored.")
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .required(false)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_IDENTIFIERS = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-identifiers")
    +            .displayName("Quote Column Identifiers")
    +            .description("Enabling this option will cause all column names to be quoted, allowing you to use reserved words as column names in your tables.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUOTED_TABLE_IDENTIFIER = new PropertyDescriptor.Builder()
    +            .name("put-db-record-quoted-table-identifiers")
    +            .displayName("Quote Table Identifiers")
    +            .description("Enabling this option will cause the table name to be quoted to support the use of special characters in the table name.")
    +            .allowableValues("true", "false")
    +            .defaultValue("false")
    +            .build();
    +
    +    static final PropertyDescriptor QUERY_TIMEOUT = new PropertyDescriptor.Builder()
    +            .name("put-db-record-query-timeout")
    +            .displayName("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running SQL statement "
    +                    + ", zero means there is no limit. Max time less than 1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .expressionLanguageSupported(true)
    +            .build();
    +
    +    protected static List<PropertyDescriptor> propDescriptors;
    +
    +    private final Map<SchemaKey, TableSchema> schemaCache = new LinkedHashMap<SchemaKey, TableSchema>(100) {
    +        private static final long serialVersionUID = 1L;
    +
    +        @Override
    +        protected boolean removeEldestEntry(Map.Entry<SchemaKey, TableSchema> eldest) {
    +            return size() >= 100;
    +        }
    +    };
    +
    +
    +    static {
    +        final Set<Relationship> r = new HashSet<>();
    +        r.add(REL_SUCCESS);
    +        r.add(REL_FAILURE);
    +        r.add(REL_RETRY);
    +        relationships = Collections.unmodifiableSet(r);
    +
    +        final List<PropertyDescriptor> pds = new ArrayList<>();
    +        pds.add(RECORD_READER_FACTORY);
    +        pds.add(STATEMENT_TYPE);
    +        pds.add(DBCP_SERVICE);
    +        pds.add(CATALOG_NAME);
    +        pds.add(SCHEMA_NAME);
    +        pds.add(TABLE_NAME);
    +        pds.add(TRANSLATE_FIELD_NAMES);
    +        pds.add(UNMATCHED_FIELD_BEHAVIOR);
    +        pds.add(UNMATCHED_COLUMN_BEHAVIOR);
    +        pds.add(UPDATE_KEYS);
    +        pds.add(FIELD_CONTAINING_SQL);
    +        pds.add(QUOTED_IDENTIFIERS);
    +        pds.add(QUOTED_TABLE_IDENTIFIER);
    +        pds.add(QUERY_TIMEOUT);
    +
    +        propDescriptors = Collections.unmodifiableList(pds);
    +    }
    +
    +
    +    @Override
    +    public Set<Relationship> getRelationships() {
    +        return relationships;
    +    }
    +
    +    @Override
    +    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
    +        return propDescriptors;
    +    }
    +
    +    @Override
    +    protected PropertyDescriptor getSupportedDynamicPropertyDescriptor(final String propertyDescriptorName) {
    +        return new PropertyDescriptor.Builder()
    +                .name(propertyDescriptorName)
    +                .required(false)
    +                .addValidator(StandardValidators.createAttributeExpressionLanguageValidator(AttributeExpression.ResultType.STRING, true))
    +                .addValidator(StandardValidators.ATTRIBUTE_KEY_PROPERTY_NAME_VALIDATOR)
    +                .expressionLanguageSupported(true)
    +                .dynamic(true)
    +                .build();
    +    }
    +
    +    @OnScheduled
    +    public void onScheduled(final ProcessContext context) {
    +        synchronized (this) {
    +            schemaCache.clear();
    +        }
    +    }
    +
    +    @Override
    +    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    +
    +        FlowFile flowFile = session.get();
    +        if (flowFile == null) {
    +            return;
    +        }
    +
    +        final ComponentLog log = getLogger();
    +
    +        final RecordReaderFactory recordParserFactory = context.getProperty(RECORD_READER_FACTORY)
    +                .asControllerService(RecordReaderFactory.class);
    +        final String statementTypeProperty = context.getProperty(STATEMENT_TYPE).getValue();
    +        final DBCPService dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    +        final boolean translateFieldNames = context.getProperty(TRANSLATE_FIELD_NAMES).asBoolean();
    +        final boolean ignoreUnmappedFields = IGNORE_UNMATCHED_FIELD.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_FIELD_BEHAVIOR).getValue());
    +        final Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).evaluateAttributeExpressions().asTimePeriod(TimeUnit.SECONDS).intValue();
    +
    +        // Is the unmatched column behaviour fail or warning?
    +        final boolean failUnmappedColumns = FAIL_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +        final boolean warningUnmappedColumns = WARNING_UNMATCHED_COLUMN.getValue().equalsIgnoreCase(context.getProperty(UNMATCHED_COLUMN_BEHAVIOR).getValue());
    +
    +        // Escape column names?
    +        final boolean escapeColumnNames = context.getProperty(QUOTED_IDENTIFIERS).asBoolean();
    +
    +        // Quote table name?
    +        final boolean quoteTableName = context.getProperty(QUOTED_TABLE_IDENTIFIER).asBoolean();
    +
    +        try (final Connection con = dbcpService.getConnection()) {
    +
    +            String jdbcURL = "DBCPService";
    +            try {
    +                DatabaseMetaData databaseMetaData = con.getMetaData();
    +                if (databaseMetaData != null) {
    +                    jdbcURL = databaseMetaData.getURL();
    +                }
    +            } catch (SQLException se) {
    +                // Ignore and use default JDBC URL. This shouldn't happen unless the driver doesn't implement getMetaData() properly
    +            }
    +
    +            final String catalog = context.getProperty(CATALOG_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String schemaName = context.getProperty(SCHEMA_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String tableName = context.getProperty(TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue();
    +            final String updateKeys = context.getProperty(UPDATE_KEYS).evaluateAttributeExpressions(flowFile).getValue();
    +            final SchemaKey schemaKey = new SchemaKey(catalog, tableName);
    +
    +            // Get the statement type from the attribute if necessary
    +            String statementType = statementTypeProperty;
    +            if (USE_ATTR_TYPE.equals(statementTypeProperty)) {
    +                statementType = flowFile.getAttribute(STATEMENT_TYPE_ATTRIBUTE);
    +            }
    +            if (StringUtils.isEmpty(statementType)) {
    +                log.error("Statement Type is not specified, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                flowFile = session.penalize(flowFile);
    +                session.transfer(flowFile, REL_FAILURE);
    +            } else {
    +                RecordSchema recordSchema;
    +                try (final InputStream in = session.read(flowFile)) {
    +
    +                    final RecordReader recordParser = recordParserFactory.createRecordReader(flowFile, in, log);
    +                    recordSchema = recordParser.getSchema();
    +
    +                    if (SQL_TYPE.equalsIgnoreCase(statementType)) {
    +
    +                        // Find which field has the SQL statement in it
    +                        final String sqlField = context.getProperty(FIELD_CONTAINING_SQL).evaluateAttributeExpressions(flowFile).getValue();
    +                        if (StringUtils.isEmpty(sqlField)) {
    +                            log.error("SQL specified as Statement Type but no Field Containing SQL was found, flowfile {} will be penalized and routed to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } else {
    +                            boolean schemaHasSqlField = recordSchema.getFields().stream().anyMatch((field) -> sqlField.equals(field.getFieldName()));
    +                            if (schemaHasSqlField) {
    +                                try (Statement s = con.createStatement()) {
    +
    +                                    try {
    +                                        s.setQueryTimeout(queryTimeout); // timeout in seconds
    +                                    } catch (SQLException se) {
    +                                        // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                        if (queryTimeout > 0) {
    +                                            throw se;
    +                                        }
    +                                    }
    +
    +                                    Record currentRecord;
    +                                    while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                        Object sql = currentRecord.getValue(sqlField);
    +                                        if (sql != null && !StringUtils.isEmpty((String) sql)) {
    +                                            // Execute the statement as-is
    +                                            s.execute((String) sql);
    +                                        } else {
    +                                            log.error("Record had no (or null) value for Field Containing SQL: {}, flowfile {} will be penalized and routed to failure",
    +                                                    new Object[]{sqlField, flowFile});
    +                                            flowFile = session.penalize(flowFile);
    +                                            session.transfer(flowFile, REL_FAILURE);
    +                                            return;
    +                                        }
    +                                    }
    +                                    session.transfer(flowFile, REL_SUCCESS);
    +                                    session.getProvenanceReporter().send(flowFile, jdbcURL);
    +                                } catch (final SQLNonTransientException e) {
    +                                    log.error("Failed to update database for {} due to {}; routing to failure", new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_FAILURE);
    +                                } catch (final SQLException e) {
    +                                    log.error("Failed to update database for {} due to {}; it is possible that retrying the operation will succeed, so routing to retry",
    +                                            new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_RETRY);
    +                                }
    +                            } else {
    +                                log.warn("Record schema does not contain Field Containing SQL: {}, flowfile {} will be penalized and routed to failure", new Object[]{sqlField, flowFile});
    +                                flowFile = session.penalize(flowFile);
    +                                session.transfer(flowFile, REL_FAILURE);
    +                            }
    +                        }
    +
    +                    } else {
    +                        // Ensure the table name has been set, the generated SQL statements (and TableSchema cache) will need it
    +                        if (StringUtils.isEmpty(tableName)) {
    +                            log.error("Cannot process {} because Table Name is null or empty; penalizing and routing to failure", new Object[]{flowFile});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                            return;
    +                        }
    +
    +                        final boolean includePrimaryKeys = UPDATE_TYPE.equalsIgnoreCase(statementType) && updateKeys == null;
    +
    +                        // get the database schema from the cache, if one exists. We do this in a synchronized block, rather than
    +                        // using a ConcurrentMap because the Map that we are using is a LinkedHashMap with a capacity such that if
    +                        // the Map grows beyond this capacity, old elements are evicted. We do this in order to avoid filling the
    +                        // Java Heap if there are a lot of different SQL statements being generated that reference different tables.
    +                        TableSchema schema;
    +                        synchronized (this) {
    +                            schema = schemaCache.get(schemaKey);
    +                            if (schema == null) {
    +                                // No schema exists for this table yet. Query the database to determine the schema and put it into the cache.
    +                                try (final Connection conn = dbcpService.getConnection()) {
    +                                    schema = TableSchema.from(conn, catalog, schemaName, tableName, translateFieldNames, includePrimaryKeys);
    +                                    schemaCache.put(schemaKey, schema);
    +                                } catch (final SQLNonTransientException e) {
    +                                    log.error("Failed to update database for {} due to {}; routing to failure", new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_FAILURE);
    +                                    return;
    +                                } catch (final SQLException e) {
    +                                    log.error("Failed to update database for {} due to {}; it is possible that retrying the operation will succeed, so routing to retry",
    +                                            new Object[]{flowFile, e});
    +                                    flowFile = session.penalize(flowFile);
    +                                    session.transfer(flowFile, REL_RETRY);
    +                                    return;
    +                                }
    +                            }
    +                        }
    +
    +                        final SqlAndIncludedColumns sqlHolder;
    +                        try {
    +                            // build the fully qualified table name
    +                            final StringBuilder tableNameBuilder = new StringBuilder();
    +                            if (catalog != null) {
    +                                tableNameBuilder.append(catalog).append(".");
    +                            }
    +                            if (schemaName != null) {
    +                                tableNameBuilder.append(schemaName).append(".");
    +                            }
    +                            tableNameBuilder.append(tableName);
    +                            final String fqTableName = tableNameBuilder.toString();
    +
    +                            if (INSERT_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateInsert(recordSchema, fqTableName, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else if (UPDATE_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateUpdate(recordSchema, fqTableName, updateKeys, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else if (DELETE_TYPE.equalsIgnoreCase(statementType)) {
    +                                sqlHolder = generateDelete(recordSchema, fqTableName, schema, translateFieldNames, ignoreUnmappedFields,
    +                                        failUnmappedColumns, warningUnmappedColumns, escapeColumnNames, quoteTableName);
    +                            } else {
    +                                log.error("Statement Type {} is not valid, flowfile {} will be penalized and routed to failure", new Object[]{statementType, flowFile});
    +                                flowFile = session.penalize(flowFile);
    +                                session.transfer(flowFile, REL_FAILURE);
    +                                return;
    +                            }
    +                        } catch (final ProcessException pe) {
    +                            log.error("Failed to convert {} to a SQL {} statement due to {}; routing to failure",
    +                                    new Object[]{flowFile, statementType, pe.toString()}, pe);
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                            return;
    +                        }
    +
    +                        try (PreparedStatement ps = con.prepareStatement(sqlHolder.getSql())) {
    +
    +                            try {
    +                                ps.setQueryTimeout(queryTimeout); // timeout in seconds
    +                            } catch (SQLException se) {
    +                                // If the driver doesn't support query timeout, then assume it is "infinite". Allow a timeout of zero only
    +                                if (queryTimeout > 0) {
    +                                    throw se;
    +                                }
    +                            }
    +
    +                            Record currentRecord;
    +                            List<Integer> fieldIndexes = sqlHolder.getFieldIndexes();
    +
    +                            while ((currentRecord = recordParser.nextRecord()) != null) {
    +                                Object[] values = currentRecord.getValues();
    +                                if (values != null) {
    +                                    if (fieldIndexes != null) {
    +                                        for (int i = 0; i < fieldIndexes.size(); i++) {
    +                                            ps.setObject(i + 1, values[fieldIndexes.get(i)]);
    +                                        }
    +                                    } else {
    +                                        // If there's no index map, assume all values are included and set them in order
    +                                        for (int i = 0; i < values.length; i++) {
    +                                            ps.setObject(i + 1, values[i]);
    +                                        }
    +                                    }
    +                                    ps.addBatch();
    +                                }
    +                            }
    +
    +                            log.debug("Executing query {}", new Object[]{sqlHolder});
    +                            ps.executeBatch();
    +                            session.transfer(flowFile, REL_SUCCESS);
    +                            session.getProvenanceReporter().send(flowFile, jdbcURL);
    +
    +                        } catch (final SQLNonTransientException e) {
    +                            log.error("Failed to update database for {} due to {}; routing to failure", new Object[]{flowFile, e});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_FAILURE);
    +                        } catch (final SQLException e) {
    +                            log.error("Failed to update database for {} due to {}; it is possible that retrying the operation will succeed, so routing to retry",
    +                                    new Object[]{flowFile, e});
    +                            flowFile = session.penalize(flowFile);
    +                            session.transfer(flowFile, REL_RETRY);
    +                        }
    +                    }
    +                } catch (final MalformedRecordException | SchemaNotFoundException | IOException e) {
    +                    throw new ProcessException("Failed to determine schema of data records for " + flowFile, e);
    --- End diff --
    
    Confirmed that a FlowFile is routed to 'failure' when a schema is not found with specified name. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---