Posted to users@nifi.apache.org by Michal Tomaszewski <Mi...@cca.pl> on 2021/03/19 13:31:56 UTC

SplitText issue

Hi,

In versions 1.13.1 and 1.14 there is a problem with SplitText. In 1.13.0 and 1.10 the component works as expected, so the problem must have been introduced by one of the fixes between 1.13.0 and 1.13.1.

Test scenario:

  *   We take a 1.2 GB file from HDFS
  *   We divide it into rows/lines
  *   Due to performance limitations we use four SplitText processors one after another: the flowfile is split into flowfiles of 5M rows, then 500k rows, then 50k rows and finally 5k rows.

We verified that the same problem exists when using only one SplitText processor and smaller inputs (e.g. an input of ~5000 rows split into flowfiles of 200 rows also does not work).
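
The expected behavior of this scenario can be sketched outside NiFi. The following is a minimal Python sketch of line-count splitting, not SplitText's actual implementation; the names `split_lines` and `cascade` and the stage sizes are hypothetical, chosen only for illustration. It shows that, when each stage size is a multiple of the next, chaining several splits should produce the same fragments as one direct split, and that no fragment should end in the middle of a row.

```python
from typing import List


def split_lines(data: bytes, line_count: int) -> List[bytes]:
    """Split data into fragments of at most line_count complete lines."""
    if line_count <= 0:
        raise ValueError("line_count must be positive")
    # Keep the line terminators so every row stays intact in its fragment.
    lines = data.splitlines(keepends=True)
    return [
        b"".join(lines[i:i + line_count])
        for i in range(0, len(lines), line_count)
    ]


def cascade(data: bytes, stages: List[int]) -> List[bytes]:
    """Apply several line-count splits one after another."""
    fragments = [data]
    for size in stages:
        fragments = [piece for frag in fragments for piece in split_lines(frag, size)]
    return fragments


# Small-scale version of the scenario: 5000 rows split into 200-row fragments.
sample = b"".join(b"row-%d\n" % i for i in range(5000))
direct = split_lines(sample, 200)
chained = cascade(sample, [2000, 1000, 200])  # hypothetical stage sizes
```

With a correct splitter, `direct` and `chained` hold the same fragments, and joining the fragments reproduces the original input byte for byte.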

At the end we expect flowfiles with 5k rows each:
<image001.png>


Result:
Currently the flowfiles are split into irregular fragments (sometimes as small as 1.5 rows instead of 5k rows), and sometimes the split even falls in the middle of a row:

Example #1 of output flow:

<image002.png>

Example #2 of output flow:
<image003.png>
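
Both symptoms (wrong row counts and splits in the middle of a row) can be checked mechanically once the output fragments have been exported. This is a minimal sketch, assuming the fragments are available as byte strings and that rows end with `\n`; the function name `find_bad_fragments` is hypothetical and not part of NiFi.

```python
from typing import List, Tuple


def find_bad_fragments(fragments: List[bytes], expected_rows: int) -> List[Tuple[int, str]]:
    """Return (index, reason) for every fragment that violates the expected split."""
    problems = []
    for index, frag in enumerate(fragments):
        is_last = index == len(fragments) - 1
        rows = len(frag.splitlines())
        if not is_last and not frag.endswith(b"\n"):
            # A non-final fragment must stop exactly at a row boundary.
            problems.append((index, "ends in the middle of a row"))
        elif not is_last and rows != expected_rows:
            problems.append((index, "has %d rows, expected %d" % (rows, expected_rows)))
        elif is_last and not 1 <= rows <= expected_rows:
            # Only the final fragment may be shorter than expected_rows.
            problems.append((index, "last fragment has %d rows" % rows))
    return problems
```

An empty result means every fragment except possibly the last holds exactly `expected_rows` complete rows.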




Regards,
           Michał
________________________________________
Note: The information contained in this message may be privileged and confidential and protected from disclosure. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by replying to the message and deleting it from your computer. Thank you.

Re: SplitText issue

Posted by Joe Witt <jo...@gmail.com>.
Michal - It is on track to be available within about 3 hours.

On Fri, Mar 19, 2021 at 7:35 AM Michal Tomaszewski
<Mi...@cca.pl> wrote:
>
> Thanks!
>
> Regards,
> Michal
>
> From: Mark Payne <ma...@hotmail.com>
> Sent: Friday, March 19, 2021 3:01 PM
> To: users@nifi.apache.org
> Subject: Re: SplitText issue
>
> Michal,
>
> We are working on a 1.13.2 release currently that should address this.
>
> Thanks
> -Mark
>

RE: SplitText issue

Posted by Michal Tomaszewski <Mi...@cca.pl>.
Thanks!
Regards,
Michal

From: Mark Payne <ma...@hotmail.com>
Sent: Friday, March 19, 2021 3:01 PM
To: users@nifi.apache.org
Subject: Re: SplitText issue

Michal,

We are working on a 1.13.2 release currently that should address this.

Thanks
-Mark




Re: SplitText issue

Posted by Mark Payne <ma...@hotmail.com>.
Michal,

We are working on a 1.13.2 release currently that should address this.

Thanks
-Mark

