
Posted to issues@nifi.apache.org by markap14 <gi...@git.apache.org> on 2017/04/19 16:41:16 UTC

[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

GitHub user markap14 opened a pull request:

    https://github.com/apache/nifi/pull/1682

    NIFI-3682: Add Schema Access Strategy and Schema Write Strategy Record Readers and Writers; bug fixes.
    
    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
    
    - [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [ ] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [ ] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/markap14/nifi NIFI-3682-Rebased

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1682.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1682
    
----
commit 6ba8aa8b8cd3c94e00e0a06c404754f309a8af53
Author: Mark Payne <ma...@hotmail.com>
Date:   2017-04-19T16:39:35Z

    NIFI-3682: Add Schema Access Strategy and Schema Write Strategy Record Readers and Writers; bug fixes.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1682#discussion_r112476535
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/schema/access/HortonworksEncodedSchemaReferenceStrategy.java ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.schema.access;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.nio.ByteBuffer;
    +
    +import org.apache.nifi.controller.ConfigurationContext;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.schemaregistry.services.SchemaRegistry;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +import org.apache.nifi.stream.io.StreamUtils;
    +
    +public class HortonworksEncodedSchemaReferenceStrategy implements SchemaAccessStrategy {
    --- End diff --
    
    OK... I buy that argument. That said, please be sure to provide links to the Hortonworks Schema Registry web page, at least on GitHub. This will serve as a reference for the spec. We'll most likely want to add other techniques as well.



[GitHub] nifi issue #1682: NIFI-3682: Add Schema Access Strategy and Schema Write Str...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/1682
  
    Thanks. Could you please rebase against the latest master and squash these? I am having trouble merging the latest of the two commits.



[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/1682



[GitHub] nifi issue #1682: NIFI-3682: Add Schema Access Strategy and Schema Write Str...

Posted by markap14 <gi...@git.apache.org>.

Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/1682
  
    OK @joewitt I think the latest push here addresses all of the issues raised. Please let me know if you find any other concerns. Thanks!



[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1682#discussion_r112307821
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/avro/AvroReader.java ---
    @@ -19,31 +19,47 @@
     
     import java.io.IOException;
     import java.io.InputStream;
    +import java.util.ArrayList;
    +import java.util.List;
     
     import org.apache.nifi.annotation.documentation.CapabilityDescription;
     import org.apache.nifi.annotation.documentation.Tags;
    -import org.apache.nifi.controller.AbstractControllerService;
    +import org.apache.nifi.components.AllowableValue;
     import org.apache.nifi.flowfile.FlowFile;
     import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.schema.access.SchemaNotFoundException;
     import org.apache.nifi.serialization.MalformedRecordException;
     import org.apache.nifi.serialization.RecordReader;
    -import org.apache.nifi.serialization.RowRecordReaderFactory;
    -import org.apache.nifi.serialization.record.RecordSchema;
    +import org.apache.nifi.serialization.RecordReaderFactory;
    +import org.apache.nifi.serialization.SchemaRegistryService;
     
     @Tags({"avro", "parse", "record", "row", "reader", "delimited", "comma", "separated", "values"})
     @CapabilityDescription("Parses Avro data and returns each Avro record as an separate Record object. The Avro data must contain "
    --- End diff --
    
    It appears this description needs to be updated, since the schema may now either reside in the data or be provided by reference.
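To make the two cases concrete, here is a minimal sketch of a reader choosing between a schema embedded in the content and one supplied by reference. The enum, the class, the map standing in for a schema registry, and the use of a `schema.name` attribute for lookup are all hypothetical illustrations, not NiFi's actual `SchemaAccessStrategy` or `SchemaRegistry` API.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: these types are illustrative stand-ins, not NiFi's
// SchemaAccessStrategy or SchemaRegistry API.
enum SchemaSource { EMBEDDED_IN_CONTENT, NAME_ATTRIBUTE, TEXT_PROPERTY }

final class SchemaResolver {
    // Stand-in for a schema registry: schema name -> schema text.
    private final Map<String, String> registry;

    SchemaResolver(Map<String, String> registry) {
        this.registry = registry;
    }

    // Returns the schema text to use, or Optional.empty() when the reader must
    // parse the schema out of the data itself (e.g. an Avro data file header).
    Optional<String> resolve(SchemaSource source, Map<String, String> attributes, String schemaTextProperty) {
        switch (source) {
            case EMBEDDED_IN_CONTENT:
                return Optional.empty();
            case NAME_ATTRIBUTE: {
                String name = attributes.get("schema.name"); // attribute name assumed
                String text = name == null ? null : registry.get(name);
                if (text == null) {
                    throw new IllegalStateException("No schema found for name: " + name);
                }
                return Optional.of(text);
            }
            case TEXT_PROPERTY:
                return Optional.ofNullable(schemaTextProperty);
            default:
                throw new IllegalArgumentException("Unknown schema source: " + source);
        }
    }
}
```

An empty result tells the caller to fall back to the schema carried in the data; any other result is a schema obtained by reference.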



[GitHub] nifi issue #1682: NIFI-3682: Add Schema Access Strategy and Schema Write Str...

Posted by markap14 <gi...@git.apache.org>.

Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/1682
  
    Good catch, @joewitt. Pushed another commit.



[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1682#discussion_r112324751
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/schema/access/SchemaNameAsAttribute.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.schema.access;
    +
    +import java.io.IOException;
    +import java.io.OutputStream;
    +import java.util.Collections;
    +import java.util.Map;
    +import java.util.Optional;
    +
    +import org.apache.nifi.serialization.record.RecordSchema;
    +import org.apache.nifi.serialization.record.SchemaIdentifier;
    +
    +public class SchemaNameAsAttribute implements SchemaAccessWriter {
    --- End diff --
    
    What about 'SchemaReferenceAsAttribute'? Then we could support lookup via schema name, schema identifier, and version. Would that also mean we'd supply each of these as FlowFile attributes whenever we have them?
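A minimal sketch of the 'SchemaReferenceAsAttribute' idea: emit an attribute only for each piece of the reference that is known, and treat the reference as resolvable by name, or by identifier plus version. The classes are hypothetical. The `schema.name` attribute is mentioned elsewhere in this thread; `schema.identifier` and `schema.version` are assumed names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.OptionalInt;
import java.util.OptionalLong;

// Hypothetical sketch of the proposed 'SchemaReferenceAsAttribute' idea. The
// attribute names schema.identifier and schema.version are assumptions.
final class SchemaReference {
    final Optional<String> name;
    final OptionalLong identifier;
    final OptionalInt version;

    SchemaReference(Optional<String> name, OptionalLong identifier, OptionalInt version) {
        this.name = name;
        this.identifier = identifier;
        this.version = version;
    }
}

final class SchemaReferenceAttributes {
    // Emit an attribute only for each piece of the reference that is known.
    static Map<String, String> toAttributes(SchemaReference ref) {
        Map<String, String> attributes = new HashMap<>();
        ref.name.ifPresent(n -> attributes.put("schema.name", n));
        ref.identifier.ifPresent(id -> attributes.put("schema.identifier", Long.toString(id)));
        ref.version.ifPresent(v -> attributes.put("schema.version", Integer.toString(v)));
        return attributes;
    }

    // A downstream reader can look the schema up by name, or by identifier plus version.
    static boolean isResolvable(Map<String, String> attributes) {
        return attributes.containsKey("schema.name")
            || (attributes.containsKey("schema.identifier") && attributes.containsKey("schema.version"));
    }
}
```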



[GitHub] nifi issue #1682: NIFI-3682: Add Schema Access Strategy and Schema Write Str...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/1682
  
    OK, looking much better after the latest commit!



[GitHub] nifi issue #1682: NIFI-3682: Add Schema Access Strategy and Schema Write Str...

Posted by markap14 <gi...@git.apache.org>.

Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/1682
  
    @joewitt I've pushed another commit that I think addresses your concerns above. Re: the GrokReader, if you are using a "SELECT * FROM FLOWFILE..." type of query, then I agree it is a bit odd. However, you could be using QueryRecord to do something like "SELECT timestamp, message WHERE ...", and in that case the extra Allowable Value means you don't have to define a separate schema that includes every field.
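The "String fields from the Grok expression" option amounts to deriving field names from the `%{PATTERN:field}` references in the expression and typing each field as a String. Below is a minimal sketch in plain Java. It does not use the io.thekraken Grok library or NiFi's RecordSchema classes, and the class name is hypothetical. It handles only simple field names and skips references with no field name, such as `%{IP}`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: derive field names from %{PATTERN:field} references in a
// Grok expression, treating every field as a String. This does not use the
// io.thekraken Grok library or NiFi's RecordSchema classes.
final class GrokFieldNames {
    // %{SYNTAX:SEMANTIC} optionally followed by :TYPE, e.g. %{NUMBER:bytes:int}
    private static final Pattern REF = Pattern.compile("%\\{(\\w+):(\\w+)(?::\\w+)?\\}");

    static List<String> fieldNames(String grokExpression) {
        List<String> names = new ArrayList<>();
        Matcher m = REF.matcher(grokExpression);
        while (m.find()) {
            String field = m.group(2);
            if (!names.contains(field)) {
                names.add(field); // keep first occurrence order, no duplicates
            }
        }
        return names;
    }
}
```

With a reader schema inferred this way, a query such as "SELECT timestamp, message ..." only needs a writer schema covering the selected columns.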



[GitHub] nifi issue #1682: NIFI-3682: Add Schema Access Strategy and Schema Write Str...

Posted by markap14 <gi...@git.apache.org>.

Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/1682
  
    @joewitt you're right: the schema text should be marked as optional. I think we can also tighten up the validation for the writers a bit, to ensure that the Schema Access Strategy and Schema Write Strategy are consistent with each other.
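One way to express that validation is to have each access strategy declare which schema information it can supply, and each write strategy declare what it needs. The sketch below is hypothetical: the enum names and their capability sets are illustrative assumptions, not NiFi's actual Allowable Values.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: each access strategy declares which pieces of schema
// information it can supply, and each write strategy declares what it needs.
enum SchemaInfo { NAME, IDENTIFIER_AND_VERSION, TEXT }

enum AccessStrategy {
    SCHEMA_NAME_PROPERTY(EnumSet.of(SchemaInfo.NAME, SchemaInfo.TEXT)),
    SCHEMA_TEXT_PROPERTY(EnumSet.of(SchemaInfo.TEXT)),
    ENCODED_ID_AND_VERSION(EnumSet.of(SchemaInfo.IDENTIFIER_AND_VERSION, SchemaInfo.TEXT));

    final Set<SchemaInfo> supplies;

    AccessStrategy(Set<SchemaInfo> supplies) {
        this.supplies = supplies;
    }
}

enum WriteStrategy {
    SCHEMA_NAME_ATTRIBUTE(SchemaInfo.NAME),
    EMBED_SCHEMA_TEXT(SchemaInfo.TEXT),
    ENCODED_ID_AND_VERSION(SchemaInfo.IDENTIFIER_AND_VERSION);

    final SchemaInfo requires;

    WriteStrategy(SchemaInfo requires) {
        this.requires = requires;
    }
}

final class StrategyValidator {
    // Returns human-readable problems; an empty list means the pair is consistent.
    static List<String> validate(AccessStrategy access, WriteStrategy write) {
        List<String> problems = new ArrayList<>();
        if (!access.supplies.contains(write.requires)) {
            problems.add("Schema Write Strategy " + write + " requires " + write.requires
                + ", which Schema Access Strategy " + access + " does not provide");
        }
        return problems;
    }
}
```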



[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1682#discussion_r112322877
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/grok/GrokReader.java ---
    @@ -23,38 +23,58 @@
     import java.io.Reader;
     import java.util.ArrayList;
     import java.util.List;
    +import java.util.Map;
    +import java.util.regex.Matcher;
     
     import org.apache.nifi.annotation.documentation.CapabilityDescription;
     import org.apache.nifi.annotation.documentation.Tags;
     import org.apache.nifi.annotation.lifecycle.OnEnabled;
    +import org.apache.nifi.components.AllowableValue;
     import org.apache.nifi.components.PropertyDescriptor;
     import org.apache.nifi.controller.ConfigurationContext;
     import org.apache.nifi.flowfile.FlowFile;
     import org.apache.nifi.logging.ComponentLog;
     import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.schema.access.SchemaAccessStrategy;
    +import org.apache.nifi.schema.access.SchemaNotFoundException;
    +import org.apache.nifi.schemaregistry.services.SchemaRegistry;
     import org.apache.nifi.serialization.RecordReader;
    -import org.apache.nifi.serialization.RowRecordReaderFactory;
    -import org.apache.nifi.serialization.SchemaRegistryRecordReader;
    +import org.apache.nifi.serialization.RecordReaderFactory;
    +import org.apache.nifi.serialization.SchemaRegistryService;
    +import org.apache.nifi.serialization.SimpleRecordSchema;
    +import org.apache.nifi.serialization.record.DataType;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordFieldType;
     import org.apache.nifi.serialization.record.RecordSchema;
     
     import io.thekraken.grok.api.Grok;
    +import io.thekraken.grok.api.GrokUtils;
     import io.thekraken.grok.api.exception.GrokException;
     
     @Tags({"grok", "logs", "logfiles", "parse", "unstructured", "text", "record", "reader", "regex", "pattern", "logstash"})
     @CapabilityDescription("Provides a mechanism for reading unstructured text data, such as log files, and structuring the data "
         + "so that it can be processed. The service is configured using Grok patterns. "
         + "The service reads from a stream of data and splits each message that it finds into a separate Record, each containing the fields that are configured. "
    -    + "If a line in the input does not match the expected message pattern, the line of text is considered to be part of the previous "
    -    + "message, with the exception of stack traces. A stack trace that is found at the end of a log message is considered to be part "
    -    + "of the previous message but is added to the 'STACK_TRACE' field of the Record. If a record has no stack trace, it will have a NULL value "
    -    + "for the STACK_TRACE field. All fields that are parsed are considered to be of type String by default. If there is need to change the type of a field, "
    -    + "this can be accomplished by configuring the Schema Registry to use and adding the appropriate schema.")
    -public class GrokReader extends SchemaRegistryRecordReader implements RowRecordReaderFactory {
    +    + "If a line in the input does not match the expected message pattern, the line of text is either considered to be part of the previous "
    +    + "message or is skipped, depending on the configuration,, with the exception of stack traces. A stack trace that is found at the end of "
    +    + "a log message is considered to be part of the previous message but is added to the 'stackTrace' field of the Record. If a record has "
    +    + "no stack trace, it will have a NULL value for the stackTrace field. All fields that are parsed are considered to be of type String by default. "
    --- End diff --
    
    Does the stack trace reference here make sense for general Grok consumption? Isn't a stack trace just a feature of a particular log file, with whether to capture it depending on the Grok expression in use? Or is this a first-class concept that we should be describing here?
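For reference, the grouping rule described in the CapabilityDescription diff can be sketched as follows. A line matching the message-start pattern begins a new record. A trailing stack trace goes into a separate stackTrace value. Any other non-matching line is either appended to the previous message or skipped, depending on configuration. The stack-trace detection heuristic and all class names here are assumptions, not the GrokReader implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the grouping rule described in the GrokReader
// CapabilityDescription diff; the stack-trace heuristic is an assumption.
final class LogMessage {
    final StringBuilder message = new StringBuilder();
    final StringBuilder stackTrace = new StringBuilder();

    String stackTraceOrNull() {
        return stackTrace.length() == 0 ? null : stackTrace.toString();
    }
}

final class LogGrouper {
    // Assumed heuristic: "\tat ..." frames, "Caused by: ...", "... N more",
    // or a line that starts with an exception/error class name.
    private static final Pattern STACK_LINE = Pattern.compile(
        "^(\\s+at .*|Caused by: .*|\\s+\\.\\.\\. \\d+ more|[\\w.$]+(Exception|Error)(: .*)?)$");

    static List<LogMessage> group(List<String> lines, Pattern messageStart, boolean appendUnmatched) {
        List<LogMessage> out = new ArrayList<>();
        LogMessage current = null;
        for (String line : lines) {
            if (messageStart.matcher(line).find()) {
                current = new LogMessage();
                current.message.append(line);
                out.add(current);
                continue;
            }
            if (current == null) {
                continue; // leading unmatched lines have no message to attach to
            }
            boolean inStackTrace = current.stackTrace.length() > 0;
            if (inStackTrace || STACK_LINE.matcher(line).matches()) {
                if (inStackTrace) {
                    current.stackTrace.append('\n');
                }
                current.stackTrace.append(line);
            } else if (appendUnmatched) {
                current.message.append('\n').append(line);
            }
            // otherwise the unmatched line is skipped
        }
        return out;
    }
}
```

Writing it out this way shows that the stack-trace rule is independent of the Grok expression itself, which is the point raised above.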



[GitHub] nifi issue #1682: NIFI-3682: Add Schema Access Strategy and Schema Write Str...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/1682
  
    @markap14 OK, I have reviewed it quite a bit and am much happier with the results.
    
    1. The 'Schema Write Strategy' default should be schema.name rather than schema.text; it is set at SchemaRegistryService:73.
    2. For QueryRecord with a GrokReader whose schema access strategy is 'Use String fields from Grok expression', what do you do for the Writer? It cannot look up the schema by name unless you've registered it, and if you've done that, then the schema inferred from the Grok expression is a bit odd.
    3. The Hortonworks Schema Registry access strategy should provide a link to the Hortonworks Schema Registry on GitHub, so others can see the serde approach of prepending header bytes for the protocol version, schema identifier, and schema version. It is here: https://github.com/hortonworks/registry
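Item 3's "prepending header bytes" can be illustrated with a minimal writer-side sketch. The layout is an assumption: a 1-byte protocol version (value 1), an 8-byte schema identifier, and a 4-byte schema version, all big-endian, followed by the payload. The linked project is the authority on the real format, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: prepend a schema reference header to a serialized record payload.
// Assumed layout: 1-byte protocol version, 8-byte schema identifier (long),
// 4-byte schema version (int), big-endian, then the payload bytes.
final class SchemaReferenceHeaderWriter {
    static final byte PROTOCOL_VERSION = 1;     // assumed value
    static final int HEADER_LENGTH = 1 + 8 + 4; // 13 bytes

    static byte[] prepend(long schemaId, int schemaVersion, byte[] payload) {
        // ByteBuffer is big-endian by default.
        ByteBuffer buffer = ByteBuffer.allocate(HEADER_LENGTH + payload.length);
        buffer.put(PROTOCOL_VERSION);
        buffer.putLong(schemaId);
        buffer.putInt(schemaVersion);
        buffer.put(payload);
        return buffer.array();
    }
}
```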
    




[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

Posted by markap14 <gi...@git.apache.org>.

Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1682#discussion_r112450884
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/schema/access/SchemaNameAsAttribute.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.schema.access;
    +
    +import java.io.IOException;
    +import java.io.OutputStream;
    +import java.util.Collections;
    +import java.util.Map;
    +import java.util.Optional;
    +
    +import org.apache.nifi.serialization.record.RecordSchema;
    +import org.apache.nifi.serialization.record.SchemaIdentifier;
    +
    +public class SchemaNameAsAttribute implements SchemaAccessWriter {
    --- End diff --
    
    It probably makes sense to add that as a strategy as well.



[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

Posted by markap14 <gi...@git.apache.org>.

Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1682#discussion_r112444804
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/avro/AvroReader.java ---
    @@ -19,31 +19,47 @@
     
     import java.io.IOException;
     import java.io.InputStream;
    +import java.util.ArrayList;
    +import java.util.List;
     
     import org.apache.nifi.annotation.documentation.CapabilityDescription;
     import org.apache.nifi.annotation.documentation.Tags;
    -import org.apache.nifi.controller.AbstractControllerService;
    +import org.apache.nifi.components.AllowableValue;
     import org.apache.nifi.flowfile.FlowFile;
     import org.apache.nifi.logging.ComponentLog;
    +import org.apache.nifi.schema.access.SchemaNotFoundException;
     import org.apache.nifi.serialization.MalformedRecordException;
     import org.apache.nifi.serialization.RecordReader;
    -import org.apache.nifi.serialization.RowRecordReaderFactory;
    -import org.apache.nifi.serialization.record.RecordSchema;
    +import org.apache.nifi.serialization.RecordReaderFactory;
    +import org.apache.nifi.serialization.SchemaRegistryService;
     
     @Tags({"avro", "parse", "record", "row", "reader", "delimited", "comma", "separated", "values"})
     @CapabilityDescription("Parses Avro data and returns each Avro record as an separate Record object. The Avro data must contain "
    --- End diff --
    
    Good catch. Will update.



[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1682#discussion_r112324338
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/schema/access/HortonworksEncodedSchemaReferenceStrategy.java ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.schema.access;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.nio.ByteBuffer;
    +
    +import org.apache.nifi.controller.ConfigurationContext;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.schemaregistry.services.SchemaRegistry;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +import org.apache.nifi.stream.io.StreamUtils;
    +
    +public class HortonworksEncodedSchemaReferenceStrategy implements SchemaAccessStrategy {
    --- End diff --
    
    We either need to provide a direct link to publicly accessible reference material on what the 'hortonworks schema reference' strategy is, OR we should avoid referencing it as such. I'd actually prefer we call this the 'IdentifierVersionSchemaReferenceStrategy' and simply document that a long (the schema identifier) and an integer (the schema version) are expected to precede the actual payload in the message. The identifier and version can then be used to look up the actual schema in some schema registry. We can then add other lookup strategies that others might use, and avoid vendor naming and potentially even trademark complications.
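A minimal reader-side sketch of the identifier/version strategy described above. It consumes only the header, so the stream is left positioned at the payload, and returns the identifier and version for a subsequent registry lookup. The leading 1-byte protocol version (expected value 1) is an assumption based on the header bytes mentioned elsewhere in this thread. The class names are hypothetical and do not correspond to the class in this PR.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of reading an identifier/version schema reference that
// precedes the payload. The 1-byte protocol version prefix is an assumption.
final class EncodedSchemaRef {
    final long identifier;
    final int version;

    EncodedSchemaRef(long identifier, int version) {
        this.identifier = identifier;
        this.version = version;
    }
}

final class SchemaReferenceHeaderReader {
    static final int EXPECTED_PROTOCOL_VERSION = 1; // assumed value

    // Consumes only the header bytes, leaving the stream positioned at the payload.
    // DataInputStream reads big-endian and does no buffering of its own; it is
    // deliberately not closed so the caller's stream stays open.
    static EncodedSchemaRef readReference(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int protocol = data.readUnsignedByte(); // EOFException if the stream is empty
        if (protocol != EXPECTED_PROTOCOL_VERSION) {
            throw new IOException("Unsupported schema reference protocol version: " + protocol);
        }
        long identifier = data.readLong();
        int version = data.readInt();
        return new EncodedSchemaRef(identifier, version);
    }
}
```

Because the reader depends only on "long identifier, int version", the same code works against any registry keyed that way, which fits the vendor-neutral naming suggested above.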



[GitHub] nifi issue #1682: NIFI-3682: Add Schema Access Strategy and Schema Write Str...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/1682
  
    Ehh, never mind. I got it to apply cleanly.



[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

Posted by markap14 <gi...@git.apache.org>.

Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1682#discussion_r112517204
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/schema/access/HortonworksEncodedSchemaReferenceStrategy.java ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.schema.access;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.nio.ByteBuffer;
    +
    +import org.apache.nifi.controller.ConfigurationContext;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.schemaregistry.services.SchemaRegistry;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +import org.apache.nifi.stream.io.StreamUtils;
    +
    +public class HortonworksEncodedSchemaReferenceStrategy implements SchemaAccessStrategy {
    --- End diff --
    
    Sure thing. Will update that.



[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

Posted by markap14 <gi...@git.apache.org>.

Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1682#discussion_r112449062
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/grok/GrokReader.java ---
    @@ -23,38 +23,58 @@
     import java.io.Reader;
     import java.util.ArrayList;
     import java.util.List;
    +import java.util.Map;
    +import java.util.regex.Matcher;
     
     import org.apache.nifi.annotation.documentation.CapabilityDescription;
     import org.apache.nifi.annotation.documentation.Tags;
     import org.apache.nifi.annotation.lifecycle.OnEnabled;
    +import org.apache.nifi.components.AllowableValue;
     import org.apache.nifi.components.PropertyDescriptor;
     import org.apache.nifi.controller.ConfigurationContext;
     import org.apache.nifi.flowfile.FlowFile;
     import org.apache.nifi.logging.ComponentLog;
     import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.schema.access.SchemaAccessStrategy;
    +import org.apache.nifi.schema.access.SchemaNotFoundException;
    +import org.apache.nifi.schemaregistry.services.SchemaRegistry;
     import org.apache.nifi.serialization.RecordReader;
    -import org.apache.nifi.serialization.RowRecordReaderFactory;
    -import org.apache.nifi.serialization.SchemaRegistryRecordReader;
    +import org.apache.nifi.serialization.RecordReaderFactory;
    +import org.apache.nifi.serialization.SchemaRegistryService;
    +import org.apache.nifi.serialization.SimpleRecordSchema;
    +import org.apache.nifi.serialization.record.DataType;
    +import org.apache.nifi.serialization.record.RecordField;
    +import org.apache.nifi.serialization.record.RecordFieldType;
     import org.apache.nifi.serialization.record.RecordSchema;
     
     import io.thekraken.grok.api.Grok;
    +import io.thekraken.grok.api.GrokUtils;
     import io.thekraken.grok.api.exception.GrokException;
     
     @Tags({"grok", "logs", "logfiles", "parse", "unstructured", "text", "record", "reader", "regex", "pattern", "logstash"})
     @CapabilityDescription("Provides a mechanism for reading unstructured text data, such as log files, and structuring the data "
         + "so that it can be processed. The service is configured using Grok patterns. "
         + "The service reads from a stream of data and splits each message that it finds into a separate Record, each containing the fields that are configured. "
    -    + "If a line in the input does not match the expected message pattern, the line of text is considered to be part of the previous "
    -    + "message, with the exception of stack traces. A stack trace that is found at the end of a log message is considered to be part "
    -    + "of the previous message but is added to the 'STACK_TRACE' field of the Record. If a record has no stack trace, it will have a NULL value "
    -    + "for the STACK_TRACE field. All fields that are parsed are considered to be of type String by default. If there is need to change the type of a field, "
    -    + "this can be accomplished by configuring the Schema Registry to use and adding the appropriate schema.")
    -public class GrokReader extends SchemaRegistryRecordReader implements RowRecordReaderFactory {
    +    + "If a line in the input does not match the expected message pattern, the line of text is either considered to be part of the previous "
    +    + "message or is skipped, depending on the configuration,, with the exception of stack traces. A stack trace that is found at the end of "
    +    + "a log message is considered to be part of the previous message but is added to the 'stackTrace' field of the Record. If a record has "
    +    + "no stack trace, it will have a NULL value for the stackTrace field. All fields that are parsed are considered to be of type String by default. "
    --- End diff --
    
    I'll preface this comment with the disclaimer that I have little experience with LogStash and Grok. However, from my "extensive" Google-based research, it looks like Grok itself doesn't really provide the means to capture stack traces easily. LogStash, for instance, builds on top of it to allow for multiline filters, multiline codecs, etc. Even so, LogStash users take several different approaches to capturing stack traces, and it appears to be a very common problem that people run into. Perhaps it would make sense to introduce a LogStash reader at some point that could add more of those capabilities for reading log messages. For now, though, it made sense to me to simply check for a stack trace ourselves, since it is such a common problem and I wanted to make it as easy as possible for users.
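As a rough illustration of "checking for a stack trace ourselves", the hypothetical helper below (not GrokReader's actual code) classifies lines that did not match the Grok pattern. It assumes the standard JVM stack-trace formatting (exception header, `at` frames, `Caused by:` lines, and `... N more` / `... N common frames omitted` markers), and walks backwards from the end of the trailing block so that only a stack trace at the end of the message is split off.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it is not GrokReader's implementation.
// Assumes the standard formatting produced by Throwable.printStackTrace().
final class StackTraceLines {

    // "\tat org.example.Foo.bar(Foo.java:42)"
    private static final Pattern FRAME = Pattern.compile("^\\s+at\\s+\\S+\\(.*\\)\\s*$");
    // "Caused by: java.lang.NullPointerException: null"
    private static final Pattern CAUSED_BY = Pattern.compile("^Caused by:\\s+\\S.*$");
    // "\t... 20 common frames omitted" or "\t... 3 more"
    private static final Pattern ELIDED =
        Pattern.compile("^\\s+\\.\\.\\.\\s+\\d+\\s+(more|common frames omitted)\\s*$");
    // "java.lang.IllegalStateException: message" (first line of a trace)
    private static final Pattern HEADER =
        Pattern.compile("^[\\w$]+(\\.[\\w$]+)+(Exception|Error|Throwable)(:.*)?$");

    private StackTraceLines() {
    }

    static boolean isStackTraceLine(final String line) {
        return FRAME.matcher(line).matches()
            || CAUSED_BY.matcher(line).matches()
            || ELIDED.matcher(line).matches()
            || HEADER.matcher(line).matches();
    }

    // Given the non-matching lines that follow a log message, returns the index at which
    // a trailing stack trace begins, or lines.size() if the block does not end in one.
    // Lines before that index would be treated as a continuation of the message text.
    static int stackTraceStart(final List<String> lines) {
        int start = lines.size();
        for (int i = lines.size() - 1; i >= 0; i--) {
            if (!isStackTraceLine(lines.get(i))) {
                break;
            }
            start = i;
        }
        return start;
    }
}
```

Lines from the returned index onward would go into the stackTrace field; anything earlier stays part of the previous message.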


---

[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

Posted by markap14 <gi...@git.apache.org>.

Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1682#discussion_r112450731
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/schema/access/HortonworksEncodedSchemaReferenceStrategy.java ---
    @@ -0,0 +1,60 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.schema.access;
    +
    +import java.io.IOException;
    +import java.io.InputStream;
    +import java.nio.ByteBuffer;
    +
    +import org.apache.nifi.controller.ConfigurationContext;
    +import org.apache.nifi.flowfile.FlowFile;
    +import org.apache.nifi.schemaregistry.services.SchemaRegistry;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +import org.apache.nifi.stream.io.StreamUtils;
    +
    +public class HortonworksEncodedSchemaReferenceStrategy implements SchemaAccessStrategy {
    --- End diff --
    
    I found the pattern by looking at the source code, since the Schema Registry is open source. I can look for documentation that spells this out. I would steer clear of calling it something like IdentifierVersionSchemaReferenceStrategy, for a few reasons. Firstly, it's quite wordy (though I guess mine is too) :) Secondly, the first byte of the pattern is a 'protocol' value that tells us how to interpret the rest of the bytes, how many bytes there are, etc. If the Hwx Schema Registry changes its serialization format (for instance, by adding new fields beyond identifier and version, or by dropping the version because the current format is deemed too verbose, assuming that each 'version' would then get its own unique identifier), this strategy can be updated to take advantage of that. If we went the route of IdentifierVersionSchemaReferenceStrategy, we would need yet another strategy, and it may be difficult for users to know which one they need. Finally, such a name does not make it obvious why the strategy would be used, whereas a name like HortonworksEncodedSchemaReferenceStrategy makes it clear, I think, to users of the Hwx Schema Registry that it is compatible with the serializers & deserializers available there.
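To make the "protocol byte first" argument concrete, here is a minimal reader-side sketch. The layout (one protocol byte, then a big-endian 8-byte schema identifier and 4-byte schema version) and the protocol value 1 are assumptions for illustration, not a published specification; the class name is hypothetical, and it uses `DataInputStream` instead of the `ByteBuffer`/`StreamUtils` imports seen in the diff so that it stays self-contained.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch only. The header layout and protocol value are assumptions for illustration.
final class EncodedSchemaReference {

    static final int SUPPORTED_PROTOCOL = 1;

    final long schemaId;
    final int schemaVersion;

    private EncodedSchemaReference(final long schemaId, final int schemaVersion) {
        this.schemaId = schemaId;
        this.schemaVersion = schemaVersion;
    }

    // Consumes exactly the header bytes and does not close the stream: the caller owns it,
    // and the remaining bytes are the record content.
    static EncodedSchemaReference read(final InputStream in) throws IOException {
        final DataInputStream data = new DataInputStream(in);
        final int protocol = data.readUnsignedByte(); // EOFException if the stream is empty
        switch (protocol) {
            case SUPPORTED_PROTOCOL:
                // Arguments are evaluated left to right: identifier first, then version.
                // readLong/readInt throw EOFException if the header is truncated.
                return new EncodedSchemaReference(data.readLong(), data.readInt());
            default:
                // A future format (e.g. one without a version) would get its own case here.
                throw new IOException("Unsupported schema reference protocol: " + protocol);
        }
    }
}
```

Because the protocol byte is dispatched on first, a new serialization format becomes a new `case` rather than a new strategy, which is the design point made above.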


---

[GitHub] nifi issue #1682: NIFI-3682: Add Schema Access Strategy and Schema Write Str...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/1682
  
    Soooo close. "Schema registry is invalid because schema.name is set and schema registry is not..." (paraphrasing a bit there). Looks like validation on that is a little buggy. I'm using the GrokReader without a schema registry set, and this is what pops up now. All else looks solid. Pretty sure this will wrap it up.
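One way to avoid this kind of false positive is to drive the registry check from the selected access strategy rather than from whether a schema-name property happens to have a value. The sketch below is hypothetical: the enum constants only mirror the options discussed in this thread and are not NiFi's actual AllowableValues, and plain Java stands in for NiFi's validation API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical strategy names that mirror the options discussed in this thread;
// they are not NiFi's actual AllowableValue definitions.
enum AccessStrategy {
    SCHEMA_NAME_PROPERTY(true),
    ENCODED_SCHEMA_REFERENCE(true),
    SCHEMA_TEXT_PROPERTY(false),
    DERIVED_FROM_GROK_EXPRESSION(false);

    final boolean requiresRegistry;

    AccessStrategy(final boolean requiresRegistry) {
        this.requiresRegistry = requiresRegistry;
    }
}

final class SchemaAccessValidator {

    private SchemaAccessValidator() {
    }

    // Returns validation problems; an empty list means the configuration is valid.
    // The registry check depends only on the selected strategy, so a default value left
    // in an unrelated property (such as a schema name) cannot trigger it.
    static List<String> validate(final AccessStrategy strategy, final boolean registryConfigured) {
        final List<String> problems = new ArrayList<>();
        if (strategy.requiresRegistry && !registryConfigured) {
            problems.add("A Schema Registry is required when the Schema Access Strategy is " + strategy);
        }
        return problems;
    }
}
```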


---

[GitHub] nifi issue #1682: NIFI-3682: Add Schema Access Strategy and Schema Write Str...

Posted by markap14 <gi...@git.apache.org>.

Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/1682
  
    I think I've wrapped up the things we've discussed here. However, I did realize that Avro schemas can define a default value and a set of aliases for a field, which we don't take into account. So I'm updating now to address that. Then I'll create a PR, likely tomorrow morning.
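In Avro schema resolution, a reader field's aliases let it match a differently named field in the written data, and its default supplies a value when the field is absent. The sketch below shows that lookup order on a plain map of values; `FieldSpec` is a hypothetical class for illustration, not NiFi's `RecordField` or Avro's `Schema.Field`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical field model for illustration; not NiFi's RecordField or Avro's Schema.Field.
final class FieldSpec {

    final String name;
    final Object defaultValue;
    final List<String> aliases;

    FieldSpec(final String name, final Object defaultValue, final String... aliases) {
        this.name = name;
        this.defaultValue = defaultValue;
        this.aliases = Collections.unmodifiableList(Arrays.asList(aliases));
    }

    // Looks the field up by its own name first, then by each alias in declaration order,
    // and falls back to the declared default when none of them is present.
    Object resolve(final Map<String, Object> values) {
        if (values.containsKey(name)) {
            return values.get(name);
        }
        for (final String alias : aliases) {
            if (values.containsKey(alias)) {
                return values.get(alias);
            }
        }
        return defaultValue;
    }
}
```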


---

[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1682#discussion_r112255200
  
    --- Diff: nifi-nar-bundles/nifi-registry-bundle/nifi-registry-service/src/main/java/org/apache/nifi/schemaregistry/services/AvroSchemaRegistry.java ---
    @@ -54,39 +57,51 @@
         private static final String LOGICAL_TYPE_TIMESTAMP_MILLIS = "timestamp-millis";
         private static final String LOGICAL_TYPE_TIMESTAMP_MICROS = "timestamp-micros";
     
    -
         public AvroSchemaRegistry() {
             this.schemaNameToSchemaMap = new HashMap<>();
         }
     
    -    @OnEnabled
    -    public void enable(ConfigurationContext configuratiponContext) throws InitializationException {
    -        this.schemaNameToSchemaMap.putAll(configuratiponContext.getProperties().entrySet().stream()
    -            .filter(propEntry -> propEntry.getKey().isDynamic())
    -            .collect(Collectors.toMap(propEntry -> propEntry.getKey().getName(), propEntry -> propEntry.getValue())));
    +    @Override
    +    public String retrieveSchemaText(final String schemaName) throws SchemaNotFoundException {
    +        final String schemaText = schemaNameToSchemaMap.get(schemaName);
    +        if (schemaText == null) {
    +            throw new SchemaNotFoundException("Unable to find schema with name '" + schemaName + "'");
    +        }
    +
    +        return schemaText;
         }
     
         @Override
    -    public String retrieveSchemaText(String schemaName) {
    -        if (!this.schemaNameToSchemaMap.containsKey(schemaName)) {
    -            throw new IllegalArgumentException("Failed to find schema; Name: '" + schemaName + ".");
    -        } else {
    -            return this.schemaNameToSchemaMap.get(schemaName);
    -        }
    +    public RecordSchema retrieveSchema(final String schemaName) throws SchemaNotFoundException {
    +        final String schemaText = retrieveSchemaText(schemaName);
    +        final Schema schema = new Schema.Parser().parse(schemaText);
    +        return createRecordSchema(schema, schemaText, schemaName);
         }
     
         @Override
    -    public String retrieveSchemaText(String schemaName, Map<String, String> attributes) {
    -        throw new UnsupportedOperationException("This version of schema registry does not "
    -            + "support this operation, since schemas are only identofied by name.");
    +    public RecordSchema retrieveSchema(long schemaId, int version) throws IOException, SchemaNotFoundException {
    +        throw new SchemaNotFoundException("This Schema Registry does not support schema lookup by identifier and version - only by name.");
         }
     
         @Override
    +    public String retrieveSchemaText(long schemaId, int version) throws IOException, SchemaNotFoundException {
    +        throw new SchemaNotFoundException("This Schema Registry does not support schema lookup by identifier and version - only by name.");
    +    }
    +
         @OnDisabled
         public void close() throws Exception {
    -        this.schemaNameToSchemaMap.clear();
    +        schemaNameToSchemaMap.clear();
         }
     
    +
    +    @OnEnabled
    +    public void enable(final ConfigurationContext configuratiponContext) throws InitializationException {
    --- End diff --
    
    Spelling mistake in the variable name configuratiponContext.


---

[GitHub] nifi issue #1682: NIFI-3682: Add Schema Access Strategy and Schema Write Str...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/1682
  
    and these
    
    16:17:32 PDT ERROR 552176d0-84dc-3e8f-f567-345d11834c65
    QueryRecord[id=552176d0-84dc-3e8f-f567-345d11834c65] Unable to query StandardFlowFileRecord[uuid=27127114-2059-4269-996b-6ea097d6db70,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1492643577664-1, container=default, section=1], offset=19874, length=203],offset=0,name=nifi-app.19874-20077.log,size=203] due to org.apache.nifi.schema.access.SchemaNotFoundException: java.lang.NullPointerException: org.apache.nifi.schema.access.SchemaNotFoundException: java.lang.NullPointerException
    
    
    2017-04-19 16:18:02,878 ERROR [Timer-Driven Process Thread-2] o.a.nifi.processors.standard.QueryRecord QueryRecord[id=552176d0-84dc-3e8f-f567-345d11834c65] Unable to query StandardFlowFileRecord[uuid=27127114-2059-4269-996b-6ea097d6db70,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1492643577664-1, container=default, section=1], offset=19874, length=203],offset=0,name=nifi-app.19874-20077.log,size=203] due to org.apache.nifi.schema.access.SchemaNotFoundException: java.lang.NullPointerException: {}
    org.apache.nifi.schema.access.SchemaNotFoundException: java.lang.NullPointerException
            at org.apache.nifi.schema.access.AvroSchemaTextStrategy.getSchema(AvroSchemaTextStrategy.java:52)
            at org.apache.nifi.serialization.SchemaRegistryService.getSchema(SchemaRegistryService.java:140)
            at org.apache.nifi.json.JsonRecordSetWriter.createWriter(JsonRecordSetWriter.java:68)
            at sun.reflect.GeneratedMethodAccessor348.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:89)
            at com.sun.proxy.$Proxy112.createWriter(Unknown Source)
            at org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:257)
            at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
            at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1115)
            at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:144)
            at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
            at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.NullPointerException: null
            at org.apache.avro.Schema.parse(Schema.java:1225)
            at org.apache.avro.Schema$Parser.parse(Schema.java:1032)
            at org.apache.avro.Schema$Parser.parse(Schema.java:1020)
            at org.apache.nifi.schema.access.AvroSchemaTextStrategy.getSchema(AvroSchemaTextStrategy.java:49)
            ... 20 common frames omitted
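The trace shows `Schema$Parser.parse` being handed a null string, which surfaces as a bare `NullPointerException` wrapped in `SchemaNotFoundException`. A minimal sketch of a guard that fails with an explanatory message instead is below. The local `SchemaNotFoundException` is only a stand-in for NiFi's class so the sketch compiles on its own, and the `parser` function stands in for something like `text -> new Schema.Parser().parse(text)`.

```java
import java.util.function.Function;

// Local stand-in for org.apache.nifi.schema.access.SchemaNotFoundException.
class SchemaNotFoundException extends Exception {
    SchemaNotFoundException(final String message) {
        super(message);
    }

    SchemaNotFoundException(final String message, final Throwable cause) {
        super(message, cause);
    }
}

final class SchemaTextParsing {

    private SchemaTextParsing() {
    }

    // 'parser' stands in for the real schema parser (an assumption of this sketch).
    static <T> T parse(final String schemaText, final Function<String, T> parser) throws SchemaNotFoundException {
        if (schemaText == null || schemaText.trim().isEmpty()) {
            // Explain what is missing instead of letting the parser throw a NullPointerException.
            throw new SchemaNotFoundException("Schema Text is empty; the configured Schema Access Strategy requires a schema");
        }
        try {
            return parser.apply(schemaText);
        } catch (final RuntimeException e) {
            // Malformed schema text: keep the parser's exception as the cause.
            throw new SchemaNotFoundException("Could not parse schema text: " + e.getMessage(), e);
        }
    }
}
```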


---

[GitHub] nifi issue #1682: NIFI-3682: Add Schema Access Strategy and Schema Write Str...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/1682
  
    I believe that in components such as the CSV Reader, the 'schema text' property should be optional, and its documentation should highlight which schema access strategies require a value for it.
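A minimal sketch of that suggestion: the property is declared optional, its description is generated from the set of strategies that need it, and a conditional check replaces the blanket "required" flag. `SchemaSource` and `SchemaTextProperty` are hypothetical names for illustration, not NiFi's `PropertyDescriptor` API.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical strategy names for illustration only.
enum SchemaSource { SCHEMA_NAME_PROPERTY, SCHEMA_TEXT_PROPERTY, ENCODED_SCHEMA_REFERENCE, DERIVED_FROM_HEADER }

final class SchemaTextProperty {

    // Strategies for which the schema text must be non-empty.
    static final Set<SchemaSource> NEEDS_SCHEMA_TEXT = EnumSet.of(SchemaSource.SCHEMA_TEXT_PROPERTY);

    private SchemaTextProperty() {
    }

    // Documentation text for the (optional) property, derived from the set above
    // so the docs cannot drift from the validation rule.
    static String description() {
        return "The text of an Avro-formatted schema. Required only when the Schema Access Strategy is one of: "
            + NEEDS_SCHEMA_TEXT.stream().map(SchemaSource::name).collect(Collectors.joining(", "));
    }

    // Conditional check used instead of marking the property as always required.
    static boolean isValid(final SchemaSource strategy, final String schemaText) {
        return !NEEDS_SCHEMA_TEXT.contains(strategy)
            || (schemaText != null && !schemaText.trim().isEmpty());
    }
}
```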


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #1682: NIFI-3682: Add Schema Access Strategy and Schema Wr...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1682#discussion_r112325468
  
    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/serialization/SchemaRegistryRecordSetWriter.java ---
    @@ -0,0 +1,113 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.serialization;
    +
    +import java.util.ArrayList;
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +
    +import org.apache.nifi.annotation.lifecycle.OnEnabled;
    +import org.apache.nifi.components.AllowableValue;
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.controller.ConfigurationContext;
    +import org.apache.nifi.schema.access.HortonworksEncodedSchemaReferenceWriter;
    +import org.apache.nifi.schema.access.SchemaAccessWriter;
    +import org.apache.nifi.schema.access.SchemaNameAsAttribute;
    +import org.apache.nifi.schema.access.SchemaTextAsAttribute;
    +
    +public abstract class SchemaRegistryRecordSetWriter extends SchemaRegistryService {
    +
    +    static final AllowableValue SCHEMA_NAME_ATTRIBUTE = new AllowableValue("schema-name", "Use 'schema.name' Attribute",
    +        "The FlowFile will be given an attribute named 'schema.name' and this attribute will indicate the name of the schema in the Schema Registry. Note that if"
    +            + "the schema for a record is not obtained from a Schema Registry, then no attribute will be added.");
    +    static final AllowableValue AVRO_SCHEMA_ATTRIBUTE = new AllowableValue("full-schema-attribute", "Use 'avro.schema' Attribute",
    +        "The FlowFile will be given an attribute named 'avro.schema' and this attribute will contain the Avro Schema that describes the records in the FlowFile. "
    +            + "The contents of the FlowFile need not be Avro, but the text of the schema will be used.");
    +    static final AllowableValue HWX_CONTENT_ENCODED_SCHEMA = new AllowableValue("hwx-content-encoded-schema", "HWX Content-Encoded Schema Reference",
    --- End diff --
    
    See my previous comments on this. I think we should avoid vendor-specific strategies unless we can point to published source material that specifies the format we're talking about. Actually, given the simplicity of what we're describing, I think we should avoid vendor specifics entirely.
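To illustrate how small the format under discussion is, here is a writer-side sketch with a vendor-neutral, hypothetical class name. It assumes a layout of one protocol byte followed by a big-endian 8-byte schema identifier and 4-byte schema version; that layout and the protocol value 1 are assumptions, not a cited specification.

```java
import java.nio.ByteBuffer;

// Sketch only; the header layout and protocol value are assumptions for illustration.
final class SchemaReferenceHeader {

    static final byte PROTOCOL_VERSION = 1;
    static final int LENGTH = 1 + Long.BYTES + Integer.BYTES; // 13 bytes

    private SchemaReferenceHeader() {
    }

    // ByteBuffer is big-endian by default, so the identifier and version
    // are written most-significant byte first.
    static byte[] encode(final long schemaId, final int schemaVersion) {
        return ByteBuffer.allocate(LENGTH)
            .put(PROTOCOL_VERSION)
            .putLong(schemaId)
            .putInt(schemaVersion)
            .array();
    }
}
```

The header would be written to the FlowFile content ahead of the serialized records.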


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #1682: NIFI-3682: Add Schema Access Strategy and Schema Write Str...

Posted by joewitt <gi...@git.apache.org>.

Github user joewitt commented on the issue:

    https://github.com/apache/nifi/pull/1682
  
    Seeing this quite a bit when data fails to parse against a given schema:
    
    2017-04-19 16:13:34,895 WARN [Timer-Driven Process Thread-7] o.a.n.c.t.ContinuallyRunProcessorTask
    org.apache.nifi.processor.exception.FlowFileHandlingException: StandardFlowFileRecord[uuid=eb2bb793-5271-4b43-85f8-77240a46ab16,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1492643593409-15, container=default, section=15], offset=14125, length=0],offset=0,name=nifi-app.0-19874.log,size=0] has already been marked for removal
            at org.apache.nifi.controller.repository.StandardProcessSession.validateRecordState(StandardProcessSession.java:2873)
            at org.apache.nifi.controller.repository.StandardProcessSession.validateRecordState(StandardProcessSession.java:2880)
            at org.apache.nifi.controller.repository.StandardProcessSession.remove(StandardProcessSession.java:1905)
            at org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:341)
            at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
            at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1115)
            at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:144)
            at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
            at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
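The trace only shows that `remove()` was called from QueryRecord on a FlowFile that had already been marked for removal; where the first removal happens is not visible here. One common way to hit this is removing the same FlowFile from two code paths, for example once in an error handler and again in shared cleanup. The sketch below uses a hypothetical `FakeSession` that, like the real session in this trace, rejects a second removal, and shows a cleanup loop that removes each created FlowFile at most once.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified session; NiFi's ProcessSession is not used here.
final class FakeSession {
    private final Set<String> removed = new HashSet<>();

    // Like the real session in the trace, a second remove() of the same FlowFile is an error.
    void remove(final String flowFileId) {
        if (!removed.add(flowFileId)) {
            throw new IllegalStateException(flowFileId + " has already been marked for removal");
        }
    }

    boolean isRemoved(final String flowFileId) {
        return removed.contains(flowFileId);
    }
}

final class CleanupSketch {

    private CleanupSketch() {
    }

    // Removes each created FlowFile exactly once; 'alreadyRemoved' records removals
    // done earlier (e.g. in an error handler) so the shared cleanup path skips them.
    static void removeCreated(final FakeSession session, final List<String> created, final Set<String> alreadyRemoved) {
        for (final String id : created) {
            if (alreadyRemoved.add(id)) {
                session.remove(id);
            }
        }
    }
}
```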


---