Posted to users@nifi.apache.org by Eric Sammons <el...@gmail.com> on 2020/09/29 20:44:01 UTC

newbie attempting first flow with InvokeHTTP and PutDatabaseRecord

Hello,

As the subject line states, I am new to NiFi.  For my first flow I am
attempting to replace my Python code with a NiFi flow that extracts
data from a JSON endpoint and then writes the JSON to a Redshift
database.schema.table.

I have set up my DBCP connection pool for Redshift and assigned it to my
PutDatabaseRecord processor; however, I am having (at least) two issues so
far.


   1. How do I tell InvokeHTTP (or PutDatabaseRecord) to start reading the
   JSON from a specific point? Most JSON responses are in the form of
   result.json()['data'] or result.json()['content'].
   2. What is the best approach to having PutDatabaseRecord go to a
   "Create Table" step when the table (and thus its columns) does not
   exist?  Is there a way to have PutDatabaseRecord create the table if it
   doesn't exist?

As a side note on #2, PutDatabaseRecord is reporting "PutDatabaseRecord
failed to process standardflowfilerecord.... due to None of the fields in
the record map to the columns defined by the schema.table table."  On
failures the JSON contents are written to a file, so I can see evidence
that InvokeHTTP is working and that PutDatabaseRecord has at least looked
for the schema.table.

Thank you!

Re: newbie attempting first flow with InvokeHTTP and PutDatabaseRecord

Posted by Eric Sammons <el...@gmail.com>.
Thank you Matt for the suggestion; I'm currently using JsonTreeReader with
PutDatabaseRecord.  I wasn't sure if there was a way to define the desired
content with this Record Reader.

And the records I am currently working with are:
{"content": [
    {
      "owner": "blah",
      "share": false,
      "description": "Job URL: https:.....",
      "id": "5f6afdb03972e5000164886c",
      "name": "NAME",
      "number": 38,
      "start_time": 1600847279311,
      "end_time": 1600847284052,
      "status": "PASSED",
      "statistics": {
        "executions": {
          "total": "142",
          "passed": "142",
          "failed": "0",
          "skipped": "0"
        },
        "defects": {
          "product_bug": {"total": 0, "PB001": 0},
          "automation_bug": {"AB001": 0, "total": 0},
          "system_issue": {"total": 0, "SI001": 0},
          "to_investigate": {"total": 0, "TI001": 0},
          "no_defect": {"ND001": 0, "total": 0}
        }
      },
      "tags": ["tag1", "tag2", "tag3"]
    },
    ...
  ]
}

Typically in Python I would do
pd.json_normalize(results.json()['content']), and then use
json_normalize() again on the statistics and defects columns with the
sep='_' option.
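
For reference, the flattening that json_normalize(..., sep='_') performs
can be sketched in plain Python (a minimal illustration, not the pandas
implementation; the record below is abbreviated from the sample above):

```python
def flatten(record, sep="_", prefix=""):
    """Recursively flatten nested dicts into a single level,
    joining key paths with `sep` (mimics json_normalize)."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat

record = {
    "id": "5f6afdb03972e5000164886c",
    "status": "PASSED",
    "statistics": {
        "executions": {"total": "142", "passed": "142"},
        "defects": {"product_bug": {"total": 0}},
    },
}

flat = flatten(record)
# e.g. flat["statistics_executions_total"] == "142"
```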

Thanks!
Eric


Re: newbie attempting first flow with InvokeHTTP and PutDatabaseRecord

Posted by Matt Burgess <ma...@apache.org>.
Eric,

Depending on how large the JSON content is, you could use
JoltTransformJSON to "hoist" the desired data to the top level. Given
this example JSON:

{
  "json": {
    "data": {
      "value": 3
    }
  }
}

The spec would be:

[
  {
    "operation": "shift",
    "spec": {
      "json": {
        "data": {
          "*": "&"
        }
      }
    }
  }
]

It "walks down the tree" until it gets to what you want (in this case
all fields under "data"), then outputs it at the top level.
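
In plain Python terms, the effect of that shift spec looks like this (an
illustration of the result only, not of how Jolt evaluates the spec):

```python
doc = {"json": {"data": {"value": 3}}}

# The spec walks json -> data, then "*": "&" promotes every field
# found under "data" to the top level of the output document.
hoisted = dict(doc["json"]["data"].items())
# hoisted == {"value": 3}
```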


For PutDatabaseRecord, there currently is no way to have it create the
target table if it doesn't exist; I wrote up NIFI-7862 [1] to cover
this improvement. For the "none of the fields map to columns" issue,
make sure the table and field/column names match the correct
upper/lower case. You may also need to set "Translate Field Names" to
true so it is less strict about the matching.
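
Loose matching of this sort can be sketched as follows (a hypothetical
illustration assuming the translation is roughly case-insensitive and
ignores underscores; check the processor docs for the exact rules):

```python
def normalize(name):
    # Hypothetical normalization: compare names ignoring
    # case and underscores, so start_time matches START_TIME.
    return name.upper().replace("_", "")

columns = ["START_TIME", "endtime", "Status"]
fields = ["start_time", "end_time", "status"]

# Map each record field to the first column whose normalized
# name matches; None means the field maps to no column.
matches = {
    f: next((c for c in columns if normalize(c) == normalize(f)), None)
    for f in fields
}
# matches == {"start_time": "START_TIME",
#             "end_time": "endtime", "status": "Status"}
```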

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-7862
