You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@nifi.apache.org by joaohf <gi...@git.apache.org> on 2016/03/04 22:08:31 UTC

[GitHub] nifi pull request: NIFI-1594: Add option to bulk using Index or Up...

GitHub user joaohf opened a pull request:

    https://github.com/apache/nifi/pull/255

    NIFI-1594: Add option to bulk using Index or Update.

    Signed-off-by: João Henrique Ferreira de Freitas <jo...@gmail.com>

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/joaohf/nifi NIFI-1594

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/255.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #255
    
----
commit 84597d253c61ed4f23c3078607da5f05145187c0
Author: João Henrique Ferreira de Freitas <jo...@gmail.com>
Date:   2016-02-24T17:44:34Z

    NIFI-1594: Add option to bulk using Index or Update.
    
    Signed-off-by: João Henrique Ferreira de Freitas <jo...@gmail.com>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request: NIFI-1594: Add option to bulk using Index or Up...

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/255#discussion_r55112614
  
    --- Diff: nifi-nar-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-processors/src/main/java/org/apache/nifi/processors/elasticsearch/PutElasticsearch.java ---
    @@ -99,6 +99,14 @@
                         AttributeExpression.ResultType.STRING, true))
                 .build();
     
    +    public static final PropertyDescriptor INDEX_OP = new PropertyDescriptor.Builder()
    +            .name("Index Operation")
    +            .description("The type of the operation used to index (index, update, upsert)")
    --- End diff --
    
    The description mentions three modes but only two are allowed. Should they be "insert" and "update/upsert", or is "index" the preferred terminology for adding a new document?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request: NIFI-1594: Add option to bulk using Index or Up...

Posted by joaohf <gi...@git.apache.org>.
Github user joaohf commented on the pull request:

    https://github.com/apache/nifi/pull/255#issuecomment-218329839
  
    Hi @apiri,
    
    Fixed the test that was failed.
    
    Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #255: NIFI-1594: Add option to bulk using Index or Update.

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/255
  
    @joaohf will you have a chance to make the suggested updates? The PMC is looking to cut a 0.7.0 release soon and are wondering if this ticket can/should be included. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #255: NIFI-1594: Add option to bulk using Index or Update.

Posted by joaohf <gi...@git.apache.org>.
Github user joaohf commented on the issue:

    https://github.com/apache/nifi/pull/255
  
    @mattyb149,  done. Could you take a look?
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #255: NIFI-1594: Add option to bulk using Index or Update.

Posted by joaohf <gi...@git.apache.org>.
Github user joaohf commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/255#discussion_r67496113
  
    --- Diff: nifi-nar-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-processors/src/main/java/org/apache/nifi/processors/elasticsearch/PutElasticsearch.java ---
    @@ -178,8 +190,20 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
                             public void process(final InputStream in) throws IOException {
                                 String json = IOUtils.toString(in, charset)
                                         .replace("\r\n", " ").replace('\n', ' ').replace('\r', ' ');
    -                            bulk.add(esClient.get().prepareIndex(index, docType, id)
    -                                    .setSource(json.getBytes(charset)));
    +
    +                            if (indexOp.equalsIgnoreCase("index")) {
    +                                bulk.add(esClient.get().prepareIndex(index, docType, id)
    +                                        .setSource(json.getBytes(charset)));
    +                            } else if (indexOp.equalsIgnoreCase("upsert")) {
    --- End diff --
    
    The difference is about how ES will handle it.
    
    See https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#upserts. "The update API also support passing a partial document, which will be merged into the existing document .... instead of sending a partial doc plus an upsert doc, setting doc_as_upsert to true will use the contents of doc as the upsert value".
    
    You will see the difference when you upsert two or more documents like this:
    
    first time upserting: doc1 `{ "prop1": [...] }`
    
    second time upserting: doc1 `{ "prop2": [...]}`
    
    And the final doc1 should be: { "prop1": [...],  "prop2": [...]}



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi issue #255: NIFI-1594: Add option to bulk using Index or Update.

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/255
  
    +1 LGTM, built and ran tests, also ran a few sample documents through the flow to exercise the index, update, and upsert logic. Thanks for the contribution! will merge to 0.x and master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request: NIFI-1594: Add option to bulk using Index or Up...

Posted by apiri <gi...@git.apache.org>.
Github user apiri commented on the pull request:

    https://github.com/apache/nifi/pull/255#issuecomment-213661619
  
    Hey @joaohf!  Looks like @mattyb149 has already started giving some input but saw that there are a few test failures concerning expression language such as the one below (full sample log is available at: https://s3.amazonaws.com/archive.travis-ci.org/jobs/114414795/log.txt).  Haven't dug deep into the tests but wanted to call it out to your attention to get a fix for later incorporation.
    
    Thanks!
    
    ```
    testPutElasticSearchOnTrigger(org.apache.nifi.processors.elasticsearch.TestPutElasticsearch)  Time elapsed: 0.01 sec  <<< FAILURE!
    java.lang.AssertionError: java.lang.IllegalStateException: Attempting to retrieve value of PropertyDescriptor[Index Operation] without first evaluating Expressions, even though the PropertyDescriptor indicates that the Expression Language is Supported. If you realize that this is the case and do not want this error to occur, it can be disabled by calling TestRunner.setValidateExpressionUsage(false)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #255: NIFI-1594: Add option to bulk using Index or Update.

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/255#discussion_r66067616
  
    --- Diff: nifi-nar-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-processors/src/main/java/org/apache/nifi/processors/elasticsearch/PutElasticsearch.java ---
    @@ -178,8 +190,18 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
                             public void process(final InputStream in) throws IOException {
                                 String json = IOUtils.toString(in, charset)
                                         .replace("\r\n", " ").replace('\n', ' ').replace('\r', ' ');
    -                            bulk.add(esClient.get().prepareIndex(index, docType, id)
    -                                    .setSource(json.getBytes(charset)));
    +
    +                            if (indexOp.equalsIgnoreCase("index")) {
    +                                bulk.add(esClient.get().prepareIndex(index, docType, id)
    +                                        .setSource(json.getBytes(charset)));
    +                            } else if (indexOp.equalsIgnoreCase("upsert")) {
    +                                bulk.add(esClient.get().prepareUpdate(index, docType, id)
    +                                        .setDoc(json.getBytes(charset))
    +                                        .setDocAsUpsert(true));
    +                            } else if (indexOp.equalsIgnoreCase("update")) {
    +                                bulk.add(esClient.get().prepareUpdate(index, docType, id)
    +                                        .setDoc(json.getBytes(charset)));
    +                            }
    --- End diff --
    
    If the index operation is not one of these three values, it looks like the file is quietly ignored and still transferred to success. Can you add an else clause that throws an IOException, and unit test(s) to test these code paths?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request: NIFI-1594: Add option to bulk using Index or Up...

Posted by joaohf <gi...@git.apache.org>.
Github user joaohf commented on the pull request:

    https://github.com/apache/nifi/pull/255#issuecomment-193547644
  
    @bbende Nice idea.
    
    @mattyb149 The right terminology is 'index'. The code should accept the update and upsert. Fixing it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #255: NIFI-1594: Add option to bulk using Index or Update.

Posted by mattyb149 <gi...@git.apache.org>.
Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/255#discussion_r67277159
  
    --- Diff: nifi-nar-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-processors/src/main/java/org/apache/nifi/processors/elasticsearch/PutElasticsearch.java ---
    @@ -178,8 +190,20 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
                             public void process(final InputStream in) throws IOException {
                                 String json = IOUtils.toString(in, charset)
                                         .replace("\r\n", " ").replace('\n', ' ').replace('\r', ' ');
    -                            bulk.add(esClient.get().prepareIndex(index, docType, id)
    -                                    .setSource(json.getBytes(charset)));
    +
    +                            if (indexOp.equalsIgnoreCase("index")) {
    +                                bulk.add(esClient.get().prepareIndex(index, docType, id)
    +                                        .setSource(json.getBytes(charset)));
    +                            } else if (indexOp.equalsIgnoreCase("upsert")) {
    --- End diff --
    
    I haven;t seen a difference in behavior between index and upsert, I tried setting the document identifier attribute to a constant number. What should I do to try upsert or update?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] nifi pull request #255: NIFI-1594: Add option to bulk using Index or Update.

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/255


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---