Posted to users@nifi.apache.org by l vic <lv...@gmail.com> on 2019/02/06 19:07:22 UTC

PutSQL benchmarking ?

I have performance issues with PutSQL in my flow... Is there some way to
benchmark the time required to write a certain number of records to a table
from GenerateFlowFile?
Thank you,
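One way to get a baseline for how long the database itself takes to absorb a given number of rows, independent of NiFi, is a small script outside the flow. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with a temporary on-disk database as a stand-in for the real target, and the `child` table and its columns are hypothetical. It times committing every row separately against writing the same rows in one batched transaction. Absolute numbers will differ on a real server, so the comparison between the two strategies is the useful part.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical target table; substitute the real table and columns.
INSERT = "INSERT INTO child (id, name) VALUES (?, ?)"


def make_rows(n):
    # Hypothetical test records: (id, name) pairs.
    return [(i, f"name-{i}") for i in range(n)]


def write_row_by_row(conn, rows):
    # One INSERT and one COMMIT per record.
    for row in rows:
        conn.execute(INSERT, row)
        conn.commit()


def write_batched(conn, rows):
    # All records in one executemany() call, committed once.
    conn.executemany(INSERT, rows)
    conn.commit()


def benchmark(writer, n):
    """Write n rows to a fresh on-disk SQLite table; return (seconds, rows written)."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"))
        try:
            conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, name TEXT)")
            conn.commit()
            rows = make_rows(n)
            start = time.perf_counter()
            writer(conn, rows)
            elapsed = time.perf_counter() - start
            count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
        finally:
            conn.close()
    return elapsed, count


if __name__ == "__main__":
    n = 500
    for writer in (write_row_by_row, write_batched):
        seconds, count = benchmark(writer, n)
        rate = count / seconds if seconds > 0 else float("inf")
        print(f"{writer.__name__}: {count} rows in {seconds:.3f} s ({rate:.0f} rows/s)")
```

Running it prints one line per strategy with the elapsed time and rows per second. To approximate a real test, the same two functions could be pointed at a connection to the actual database (using that driver's placeholder style) with rows shaped like the flow's data.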

Re: PutSQL benchmarking ?

Posted by l vic <lv...@gmail.com>.
See attached. The left branch extracts attributes from the "root" element and
creates an upsert query for the "parent" table; the right branch extracts the
contained JSON array and makes upsert queries for the "child" table from the
elements of that array.
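The split described above (one "parent" row per document, one "child" row per array element, each written as an upsert) can be sketched outside NiFi as follows. This is a hypothetical illustration, not the flow itself: the array key `children` and the columns `id`, `attr1`, `parent_id` and `v` are placeholders because the real names are elided in the thread. It also assumes SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` upsert (SQLite 3.24 or later) via Python's sqlite3 module; the target database in the thread is not named, and other databases spell upsert differently (for example `ON DUPLICATE KEY UPDATE` in MySQL).

```python
import json
import sqlite3

# Hypothetical document shaped like { "attr1":"val1",...[{....}]}; the key
# names "id", "attr1", "children" and "v" are placeholders.
DOC = '{"id": 1, "attr1": "val1", "children": [{"id": 10, "v": "a"}, {"id": 11, "v": "b"}]}'


def create_schema(conn):
    # Hypothetical parent/child tables keyed on id.
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, attr1 TEXT)")
    conn.execute(
        "CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER, v TEXT)"
    )


def split_document(doc):
    """Return (parent_row, child_rows) for one JSON document."""
    data = json.loads(doc)
    children = data.pop("children", [])
    parent_row = (data["id"], data["attr1"])
    # Each array element becomes one child row tagged with the parent's id.
    child_rows = [(c["id"], data["id"], c["v"]) for c in children]
    return parent_row, child_rows


def upsert(conn, parent_row, child_rows):
    # SQLite upsert syntax (3.24+): insert, or update the existing row on a key clash.
    conn.execute(
        "INSERT INTO parent (id, attr1) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET attr1 = excluded.attr1",
        parent_row,
    )
    conn.executemany(
        "INSERT INTO child (id, parent_id, v) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET parent_id = excluded.parent_id, v = excluded.v",
        child_rows,
    )
    conn.commit()  # parent and children committed together


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    for _ in range(2):  # the second pass updates rows instead of duplicating them
        upsert(conn, *split_document(DOC))
    print(conn.execute("SELECT COUNT(*) FROM child").fetchone()[0])  # prints 2
```

Running the upsert twice leaves two child rows rather than four, which is the property the "upsert" queries in the flow are meant to provide.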

On Wed, Feb 6, 2019 at 4:10 PM Matt Burgess <ma...@apache.org> wrote:

> If you don't do record splitting, how are you getting SQL to send to
> PutSQL? Can you describe your flow (processors, e.g.)?
>
> Thanks,
> Matt

Re: PutSQL benchmarking ?

Posted by l vic <lv...@gmail.com>.
Would it be possible to work around this by passing "upsert" as an attribute
to the flowfile? If so, where can I find some examples of using
PutDatabaseRecord with a RecordReader to extract/save a JSON array?
Thank you

On Thu, Feb 7, 2019 at 1:03 PM Matt Burgess <ma...@apache.org> wrote:

> Yeah that's a gap that needs filling. I'm hopefully wrapping up some
> stuff shortly, and would like to take a crack at upsert for PDR.
>
> Regards,
> Matt

Re: PutSQL benchmarking ?

Posted by Matt Burgess <ma...@apache.org>.
Yeah that's a gap that needs filling. I'm hopefully wrapping up some
stuff shortly, and would like to take a crack at upsert for PDR.

Regards,
Matt

On Thu, Feb 7, 2019 at 12:54 PM l vic <lv...@gmail.com> wrote:
>
> Sorry, I realize I do indeed perform record splitting; the problem with PutDatabaseRecord is that it doesn't seem to recognize "upsert".

Re: PutSQL benchmarking ?

Posted by l vic <lv...@gmail.com>.
Sorry, I realize I do indeed perform record splitting; the problem with
PutDatabaseRecord is that it doesn't seem to recognize "upsert".

On Wed, Feb 6, 2019 at 4:10 PM Matt Burgess <ma...@apache.org> wrote:

> If you don't do record splitting, how are you getting SQL to send to
> PutSQL? Can you describe your flow (processors, e.g.)?
>
> Thanks,
> Matt

Re: PutSQL benchmarking ?

Posted by Matt Burgess <ma...@apache.org>.
If you don't do record splitting, how are you getting SQL to send to
PutSQL? Can you describe your flow (processors, e.g.)?

Thanks,
Matt

On Wed, Feb 6, 2019 at 3:41 PM l vic <lv...@gmail.com> wrote:
>
> Hi Matt,
> No, I don't do record splitting; the data looks like { "attr1":"val1",...[{....}]},
> where the "parent" data is saved as 1 record in the "parent" table and the array data is saved as multiple records in the "child" table...
> What's "lineage duration"?
> Event Duration: < 1ms
> Lineage Duration: 00:00:00.070

Re: PutSQL benchmarking ?

Posted by l vic <lv...@gmail.com>.
Hi Matt,
No, I don't do record splitting; the data looks like { "attr1":"val1",...[{....}]},
where the "parent" data is saved as 1 record in the "parent" table and the
array data is saved as multiple records in the "child" table...
What's "lineage duration"?
Event Duration: < 1ms
Lineage Duration: 00:00:00.070

On Wed, Feb 6, 2019 at 2:59 PM Matt Burgess <ma...@apache.org> wrote:

> In your flow, what does the data look like? Are you splitting it into
> individual records, then converting to SQL (probably via JSON) and
> calling PutSQL? If so, that's not going to be very performant; the
> PutDatabaseRecord processor combines all that together so you can
> leave your data in its original state (i.e. many records in one flow
> file). For benchmarking PutDatabaseRecord (PDR), you could provide
> sample data via GenerateFlowFile, run a few through PDR, and check the
> provenance events for fields such as durationMillis or calculations
> like (timestampMillis - lineageStart).
>
> Regards,
> Matt

Re: PutSQL benchmarking ?

Posted by Matt Burgess <ma...@apache.org>.
In your flow, what does the data look like? Are you splitting it into
individual records, then converting to SQL (probably via JSON) and
calling PutSQL? If so, that's not going to be very performant; the
PutDatabaseRecord processor combines all that together so you can
leave your data in its original state (i.e. many records in one flow
file). For benchmarking PutDatabaseRecord (PDR), you could provide
sample data via GenerateFlowFile, run a few through PDR, and check the
provenance events for fields such as durationMillis or calculations
like (timestampMillis - lineageStart).
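A minimal sketch of that calculation, assuming provenance events have already been exported as JSON records (for instance through a provenance reporting task) that carry `durationMillis`, `timestampMillis` and `lineageStart`. The `componentName` field used to filter events, and all sample values, are assumptions for illustration. Event duration is taken directly from `durationMillis`; lineage duration is `timestampMillis - lineageStart`, formatted the way the provenance UI appears to display it (e.g. 00:00:00.070).

```python
import json
from statistics import mean

# Made-up sample events; real ones would come from NiFi's provenance data.
# "componentName" is an assumed field used here only to pick out PutSQL events.
EVENTS_JSON = """[
  {"componentName": "PutSQL", "durationMillis": 12, "timestampMillis": 1549480000070, "lineageStart": 1549480000000},
  {"componentName": "PutSQL", "durationMillis": 8, "timestampMillis": 1549480000150, "lineageStart": 1549480000020},
  {"componentName": "GenerateFlowFile", "durationMillis": 0, "timestampMillis": 1549480000000, "lineageStart": 1549480000000}
]"""


def format_millis(ms):
    """Format a millisecond count as HH:MM:SS.mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def summarize(events, component):
    """Event durations and lineage durations (timestampMillis - lineageStart) for one component."""
    selected = [e for e in events if e["componentName"] == component]
    if not selected:
        raise ValueError(f"no events for {component}")
    durations = [e["durationMillis"] for e in selected]
    lineage = [e["timestampMillis"] - e["lineageStart"] for e in selected]
    return {
        "events": len(selected),
        "total_event_ms": sum(durations),
        "mean_event_ms": mean(durations),
        "max_lineage": format_millis(max(lineage)),
    }


if __name__ == "__main__":
    print(summarize(json.loads(EVENTS_JSON), "PutSQL"))
```

For the sample data this reports 2 PutSQL events totalling 20 ms with a longest lineage of 00:00:00.130. Dividing the number of records written by the span between the earliest `lineageStart` and the latest `timestampMillis` would give a rough records-per-second figure for a whole GenerateFlowFile run.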

Regards,
Matt

On Wed, Feb 6, 2019 at 2:07 PM l vic <lv...@gmail.com> wrote:
>
> I have performance issues with PutSQL in my flow... Is there some way to benchmark the time required to write a certain number of records to a table from GenerateFlowFile?
> Thank you,