Posted to issues@nifi.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/03/05 15:51:00 UTC

[jira] [Commented] (NIFI-4833) NIFI-4833 Add ScanHBase processor

    [ https://issues.apache.org/jira/browse/NIFI-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386243#comment-16386243 ] 

ASF GitHub Bot commented on NIFI-4833:
--------------------------------------

Github user bbende commented on the issue:

    https://github.com/apache/nifi/pull/2478
  
    @bdesert Thanks for the updates. I was reviewing the code again and I think we need to change the way the `ScanHBaseResultHandler` works...
    
    Currently it adds rows to a list in memory until the bulk size is reached. Since bulk size defaults to 0, in the default case the bulk size is never reached and all the rows are left as "hanging" rows. This means that if someone scans a table with 1 million rows, all 1 million rows will be held in memory before being written to the flow file, which would not be good for memory usage.
    
    We should be able to write row by row to the flow file and never add them to a list. Inside the handler we can use `session.append(flowFile, (out) ->` to append a row at a time to the flow file. I think we can then do away with the "hanging rows" concept because there won't be anything buffered in memory.
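
As a rough illustration of the row-at-a-time idea above, here is a minimal sketch with no NiFi or HBase dependencies. The names `StreamingRowWriter`, `RowHandler`, and `AppendingHandler` are hypothetical and are not taken from the PR; the `ByteArrayOutputStream` only stands in for the output stream that the `session.append` callback would supply inside the processor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamingRowWriter {

    // Hypothetical stand-in for the per-row callback invoked during a scan.
    interface RowHandler {
        void handle(String serializedRow) throws IOException;
    }

    // Writes each row to the output as soon as it arrives; no list of rows is kept.
    static final class AppendingHandler implements RowHandler {
        private final OutputStream out;
        private long rowCount = 0;

        AppendingHandler(OutputStream out) {
            this.out = out;
        }

        @Override
        public void handle(String serializedRow) throws IOException {
            if (rowCount > 0) {
                out.write('\n'); // separator between consecutive rows
            }
            out.write(serializedRow.getBytes(StandardCharsets.UTF_8));
            rowCount++; // only a counter is held in memory, never the rows
        }

        long getRowCount() {
            return rowCount;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this buffer plays the role of the flow file content.
        ByteArrayOutputStream flowFileContent = new ByteArrayOutputStream();
        AppendingHandler handler = new AppendingHandler(flowFileContent);

        // Simulate a scan delivering three rows, one at a time.
        for (int i = 0; i < 3; i++) {
            handler.handle("{\"row\":\"r" + i + "\"}");
        }

        System.out.println(flowFileContent.toString(StandardCharsets.UTF_8.name()));
        System.out.println("rows written: " + handler.getRowCount());
    }
}
```

In the actual processor, each `handle` call would presumably wrap the write in `session.append(...)`. Note that `session.append` returns a `FlowFile` reference, so the handler would need to keep the returned reference for the next append.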


> NIFI-4833 Add ScanHBase processor
> ---------------------------------
>
>                 Key: NIFI-4833
>                 URL: https://issues.apache.org/jira/browse/NIFI-4833
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Ed Berezitsky
>            Assignee: Ed Berezitsky
>            Priority: Major
>
> Add ScanHBase (new) processor to retrieve records from HBase tables.
> Today there are GetHBase and FetchHBaseRow. GetHBase can pull the entire table or only new rows added after the processor started; it also must be scheduled and doesn't support incoming . FetchHBaseRow can pull only rows with known rowkeys.
> This processor could provide functionality similar to what can be achieved with the hbase shell, by defining the following properties:
> -scan based on range of row key IDs 
> -scan based on range of time stamps
> -limit number of records pulled
> -use filters
> -reverse rows


