Posted to users@nifi.apache.org by Jason Iannone <br...@gmail.com> on 2020/09/14 18:29:01 UTC

Re: Extract Avro blob from RDBMS to HDFS

Anyone have thoughts on this? Essentially we have binary Avro stored as a
BLOB in Oracle, and I want to extract it via NiFi and read and write out
the contents.

Thanks,
Jason

On Mon, Aug 17, 2020 at 10:04 AM Jason Iannone <br...@gmail.com> wrote:

> Hi all,
>
> I have a scenario where an Avro binary is being stored as a BLOB in an
> RDBMS. What's the recommended approach for querying this in bulk,
> extracting this specific field, and batching it to HDFS?
>
>    1. GenerateTableFetch OR QueryDatabaseTableRecord
>    2. Extract Avro column and assemble output <-- How?
>    3. MergeRecord
>    4. PutHDFS
>
> Additional clarification: ultimately I want to keep the Avro exactly as
> it is (content-wise), store it in HDFS, with an external Hive table
> on top.
>
> Thanks,
> Jason
>

Re: Extract Avro blob from RDBMS to HDFS

Posted by Jason Iannone <br...@gmail.com>.
Thanks Bryan.

That's exactly where my head is, and I was hoping there was an easier way. A
custom processor would let us read in a ResultSet, essentially
modeling it after AbstractRecordProcessor (which, by the way, would be great
to be able to extend like AbstractProcessor). The disadvantage of these
approaches is that the custom code doesn't take advantage of the Avro pools to
reduce deserialization overhead. I also debated whether it would be possible to
create a new service that's essentially the same as AvroReader, and then
tie that into ConvertRecord, but I'm not familiar enough with this
approach to know whether it can be done and registered with NiFi.
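For what it's worth, the per-flow-file step such a processor or script would perform can be sketched standalone in plain Python (no NiFi API here; the JSON row shape, the base64 encoding, and the AVRO_PAYLOAD column name are all stand-ins for illustration, not anything NiFi produces):

```python
import base64
import json

def overwrite_with_blob(content: bytes, column: str = "AVRO_PAYLOAD") -> bytes:
    """Pull one column's raw bytes out of a single outer row and return them
    as the new flow file content. The row is modeled here as JSON with the
    blob base64-encoded; a real flow would decode the outer Avro instead."""
    row = json.loads(content)
    return base64.b64decode(row[column])

# One outer row, roughly as a per-flow-file script callback might see it:
row_content = json.dumps(
    {"ID": 7, "AVRO_PAYLOAD": base64.b64encode(b"\x02\x0ahello").decode()}
).encode()
print(overwrite_with_blob(row_content))  # the inner Avro bytes
```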

Thanks,
Jason

On Mon, Sep 14, 2020 at 2:37 PM Bryan Bende <bb...@gmail.com> wrote:

> Hello,
>
> I think it likely requires a custom processor, or custom script with
> ExecuteScript.
>
> Coming out of the database processor, you are going to have two levels of
> Avro...
>
> The outer Avro is representing the rows from your database, so you'll have
> Avro records where one field in each record is itself another Avro object.
>
> You would likely need to split all the outer records to one per flow file
> (not great for performance), then for each flow file use the custom
> processors/script to read the value of the field where the Avro blob is,
> and overwrite the flow file content with that value, then send all of these
> to a MergeRecord.
>
> -Bryan
>
>
> On Mon, Sep 14, 2020 at 2:29 PM Jason Iannone <br...@gmail.com> wrote:
>
>> Anyone have thoughts on this? Essentially we have binary Avro stored as a
>> BLOB in Oracle, and I want to extract it via NiFi and read and write out
>> the contents.
>>
>> Thanks,
>> Jason
>>
>> On Mon, Aug 17, 2020 at 10:04 AM Jason Iannone <br...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I have a scenario where an Avro binary is being stored as a BLOB in an
>>> RDBMS. What's the recommended approach for querying this in bulk,
>>> extracting this specific field, and batching it to HDFS?
>>>
>>>    1. GenerateTableFetch OR QueryDatabaseTableRecord
>>>    2. Extract Avro column and assemble output <-- How?
>>>    3. MergeRecord
>>>    4. PutHDFS
>>>
>>> Additional clarification: ultimately I want to keep the Avro exactly as
>>> it is (content-wise), store it in HDFS, with an external Hive table
>>> on top.
>>>
>>> Thanks,
>>> Jason
>>>
>>

Re: Extract Avro blob from RDBMS to HDFS

Posted by Bryan Bende <bb...@gmail.com>.
Hello,

I think it likely requires a custom processor, or custom script with
ExecuteScript.

Coming out of the database processor, you are going to have two levels of
Avro...

The outer Avro is representing the rows from your database, so you'll have
Avro records where one field in each record is itself another Avro object.

You would likely need to split all the outer records to one per flow file
(not great for performance), then for each flow file use the custom
processors/script to read the value of the field where the Avro blob is,
and overwrite the flow file content with that value, then send all of these
to a MergeRecord.
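As a concrete illustration of the two levels, here is a minimal standalone sketch that hand-decodes the Avro binary wire format (zigzag-encoded varint longs, length-prefixed bytes) for an assumed outer schema of {long id, bytes payload} — the schema and field names are assumptions for the example, and a real flow would use an Avro library rather than hand-rolled decoding:

```python
def read_varint(buf: bytes, pos: int):
    """Read one Avro variable-length integer (7 bits per byte, MSB = continue)."""
    shift, result = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def read_long(buf: bytes, pos: int):
    """Avro longs are zigzag-encoded varints."""
    n, pos = read_varint(buf, pos)
    return (n >> 1) ^ -(n & 1), pos

def read_bytes_field(buf: bytes, pos: int):
    """An Avro bytes field is a long length prefix followed by the raw data."""
    length, pos = read_long(buf, pos)
    return buf[pos:pos + length], pos + length

def extract_payloads(buf: bytes):
    """Decode back-to-back {long id, bytes payload} records (assumed schema)
    and return just the inner payload bytes from each outer row."""
    pos, payloads = 0, []
    while pos < len(buf):
        _id, pos = read_long(buf, pos)             # outer row id, discarded
        payload, pos = read_bytes_field(buf, pos)  # the embedded Avro blob
        payloads.append(payload)
    return payloads

# Two encoded outer rows: (id=1, b"hello") and (id=2, b"abc")
rows = bytes([0x02, 0x0A]) + b"hello" + bytes([0x04, 0x06]) + b"abc"
print(extract_payloads(rows))  # [b'hello', b'abc']
```

Each extracted payload would then become the content of one flow file, ready to be re-read and merged downstream.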

-Bryan


On Mon, Sep 14, 2020 at 2:29 PM Jason Iannone <br...@gmail.com> wrote:

> Anyone have thoughts on this? Essentially we have binary Avro stored as a
> BLOB in Oracle, and I want to extract it via NiFi and read and write out
> the contents.
>
> Thanks,
> Jason
>
> On Mon, Aug 17, 2020 at 10:04 AM Jason Iannone <br...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have a scenario where an Avro binary is being stored as a BLOB in an
>> RDBMS. What's the recommended approach for querying this in bulk,
>> extracting this specific field, and batching it to HDFS?
>>
>>    1. GenerateTableFetch OR QueryDatabaseTableRecord
>>    2. Extract Avro column and assemble output <-- How?
>>    3. MergeRecord
>>    4. PutHDFS
>>
>> Additional clarification: ultimately I want to keep the Avro exactly as
>> it is (content-wise), store it in HDFS, with an external Hive table
>> on top.
>>
>> Thanks,
>> Jason
>>
>