Posted to dev@nifi.apache.org by shweta <sh...@gmail.com> on 2016/02/03 06:10:02 UTC

ConvertCSVtoAvro | support for "||" delimiter

Hi All,

It seems "ConvertCSVtoAvro" only supports a single character as the delimiter in
NiFi. Is there a way to specify "||" as the
delimiter?

Thanks,
Shweta



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/ConvertCSVtoAvro-support-for-delimiter-tp7116.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: ConvertCSVtoAvro | support for "||" delimiter

Posted by Joe Percivall <jo...@yahoo.com.INVALID>.
A more direct work-around would be to use the ReplaceText processor first in order to change instances of "||" to "|" so that it can be used by ConvertCSVtoAvro.
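
As a rough illustration of what that replacement step does to each line, here is a minimal Python sketch run outside NiFi. The function name and sample data are made up for the example; it is not the ReplaceText processor itself. If the search value is treated as a regular expression, the pipes would presumably need escaping.

```python
def collapse_double_pipe(line: str) -> str:
    """Replace every "||" with a single "|" (hypothetical helper)."""
    return line.replace("||", "|")


sample = "id||name||city"
print(collapse_double_pipe(sample))  # id|name|city
```

Note that this only helps when a lone "|" never appears inside the data itself.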
 
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com




On Thursday, February 4, 2016 9:50 AM, Joe Witt <jo...@gmail.com> wrote:
Not a direct answer but:
  With NIFI-210 arriving in the upcoming NiFi 0.5.0 release you will
have a great option in scripting (Lua, Python, Ruby, Groovy,
Javascript) that will let you rapidly get past these hurdles without
having to build your own custom processor until you are sure what you
need.

Thanks
Joe



Re: ConvertCSVtoAvro | support for "||" delimiter

Posted by Thad Guidry <th...@gmail.com>.
Non-printable characters are also great for this use case (the ASCII
control characters were designed for exactly this in the early days
of computing). Just copy and paste one from an editor or Notepad; on
Windows you can get char 2 by holding down ALT and typing 002 on the
numeric keypad.

CHAR 2 (traditionally the Start Of Text or STX) is a great delimiter.
 \u0002

http://www.fileformat.info/info/unicode/char/0002/index.htm
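
To show that a control character behaves like any other single-character delimiter, here is a small sketch using Python's standard csv module rather than NiFi. The sample data and the function name are assumptions for illustration only.

```python
import csv
import io

STX = "\u0002"  # Start Of Text, ASCII character 2


def parse_stx_delimited(text):
    """Parse text whose fields are separated by the STX control character."""
    return list(csv.reader(io.StringIO(text), delimiter=STX))


sample = f"id{STX}name{STX}city\n1{STX}Ann{STX}Oslo\n"
for row in parse_stx_delimited(sample):
    print(row)
```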

Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>

Re: ConvertCSVtoAvro | support for "||" delimiter

Posted by Tony Kurc <tr...@gmail.com>.
I believe the processor supports \uXXXX notation for a delimiter as part of
a PR for 0.4.X.

On Thu, Feb 4, 2016 at 4:11 PM, Ryan Blue <bl...@cloudera.com> wrote:

> I didn't know there was a unit separator character, thanks for the
> suggestion. I think I have a lot of ☃ to replace.
>
> If you can paste the unit separator character in, then it should work. The
> underlying code supports escape sequences, like \t, but the validation
> doesn't take those into account yet. That would be a good starter
> contribution for someone out there...
>
> rb
>

Re: ConvertCSVtoAvro | support for "||" delimiter

Posted by Ryan Blue <bl...@cloudera.com>.
I didn't know there was a unit separator character, thanks for the 
suggestion. I think I have a lot of ☃ to replace.

If you can paste the unit separator character in, then it should work. 
The underlying code supports escape sequences, like \t, but the 
validation doesn't take those into account yet. That would be a good 
starter contribution for someone out there...
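
To make the idea concrete, here is a hedged sketch of a validator that resolves escape sequences before checking the length. It is written in Python for brevity and is not the NiFi validator; the function names and the set of supported escapes (\t, \n, \r, \\ and \uXXXX) are assumptions.

```python
import re

# Escapes this sketch understands; the real processor may accept a different set.
_SIMPLE_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\"}
_ESCAPE_PATTERN = re.compile(r"\\(u[0-9a-fA-F]{4}|[tnr\\])")


def unescape(value: str) -> str:
    """Turn escape sequences such as \\t or \\u0002 into the characters they name."""
    def replace(match):
        token = match.group(1)
        if token.startswith("u"):
            return chr(int(token[1:], 16))
        return _SIMPLE_ESCAPES[token]
    return _ESCAPE_PATTERN.sub(replace, value)


def is_valid_delimiter(value: str) -> bool:
    """Accept the value only if it is exactly one character after unescaping."""
    return len(unescape(value)) == 1


print(is_valid_delimiter("\\t"))      # True
print(is_valid_delimiter("\\u0002"))  # True
print(is_valid_delimiter("||"))       # False
```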

rb

On 02/04/2016 12:39 PM, Alan Jackoway wrote:
> Though I love the concept of ☃ as your separator, my belief is that the
> correct way to do this is to replace your custom delimiter with one of those
> defined in ASCII (and therefore extremely unlikely to appear in your
> data): https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text
>
> That said, I have not actually tried this with NiFi, so I don't know how
> easy it is to specify ASCII character 31 as your separator in the UI.


-- 
Ryan Blue
Software Engineer
Cloudera, Inc.

Re: ConvertCSVtoAvro | support for "||" delimiter

Posted by Alan Jackoway <al...@cloudera.com>.
Though I love the concept of ☃ as your separator, my belief is that the
correct way to do this is to replace your custom delimiter with one of those
defined in ASCII (and therefore extremely unlikely to appear in your
data): https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text

That said, I have not actually tried this with NiFi, so I don't know how
easy it is to specify ASCII character 31 as your separator in the UI.
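
For what it is worth, the substitution itself can be sketched in plain Python, outside NiFi. The function name and sample line below are hypothetical; the sketch refuses input that already contains the unit separator, since that would make the result ambiguous.

```python
UNIT_SEPARATOR = chr(31)  # ASCII unit separator (US)


def to_unit_separated(line: str) -> str:
    """Swap the "||" delimiter for the ASCII unit separator."""
    if UNIT_SEPARATOR in line:
        raise ValueError("input already contains the unit separator")
    return line.replace("||", UNIT_SEPARATOR)


fields = to_unit_separated("1||a|b||Oslo").split(UNIT_SEPARATOR)
print(fields)  # ['1', 'a|b', 'Oslo']
```

A lone "|" inside a field survives, which is the main advantage over collapsing "||" to "|".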

On Thu, Feb 4, 2016 at 2:34 PM, Ryan Blue <bl...@cloudera.com> wrote:

> The underlying CSV library only supports a single-character delimiter, so
> it would be a bit of work to allow multi-char delimiters. Another solution
> is to use | as your delimiter and simply account for that in your file
> header. Everything is mapped by name, so you'd just have a bunch of columns
> named "" and it should work fine otherwise.
>
> That may not work if your delimiter is || because you might have | in your
> data, though. If that's the case, then I'd go with the suggestion from Joe
> to replace "||" with a single-character delimiter that you won't see in the
> data, like ☃.
>
> rb

Re: ConvertCSVtoAvro | support for "||" delimiter

Posted by Ryan Blue <bl...@cloudera.com>.
The underlying CSV library only supports a single-character delimiter, 
so it would be a bit of work to allow multi-char delimiters. Another 
solution is to use | as your delimiter and simply account for that in 
your file header. Everything is mapped by name, so you'd just have a 
bunch of columns named "" and it should work fine otherwise.
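
A small sketch of that idea, using Python's csv module as a stand-in for the CSV library (the function name and sample data are assumptions): splitting on a single "|" produces an empty column between every pair of real columns, and mapping by name lets the unnamed ones be dropped.

```python
import csv
import io


def read_double_pipe(text):
    """Split on a single "|" and ignore the empty-named columns that "||" creates."""
    rows = csv.reader(io.StringIO(text), delimiter="|")
    header = next(rows)  # e.g. ['id', '', 'name', '', 'city']
    records = []
    for row in rows:
        # Keep only the values whose header name is non-empty.
        records.append({name: value for name, value in zip(header, row) if name != ""})
    return records


sample = "id||name||city\n1||Ann||Oslo\n"
print(read_double_pipe(sample))  # [{'id': '1', 'name': 'Ann', 'city': 'Oslo'}]
```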

That may not work if your delimiter is || because you might have | in 
your data, though. If that's the case, then I'd go with the suggestion 
from Joe to replace "||" with a single-character delimiter that you 
won't see in the data, like ☃.

rb

On 02/04/2016 06:50 AM, Joe Witt wrote:
> Not a direct answer but:
>    With NIFI-210 arriving in the upcoming NiFi 0.5.0 release you will
> have a great option in scripting (Lua, Python, Ruby, Groovy,
> Javascript) that will let you rapidly get past these hurdles without
> having to build your own custom processor until you are sure what you
> need.
>
> Thanks
> Joe


-- 
Ryan Blue
Software Engineer
Cloudera, Inc.

Re: ConvertCSVtoAvro | support for "||" delimiter

Posted by Joe Witt <jo...@gmail.com>.
Not a direct answer but:
  With NIFI-210 arriving in the upcoming NiFi 0.5.0 release you will
have a great option in scripting (Lua, Python, Ruby, Groovy,
Javascript) that will let you rapidly get past these hurdles without
having to build your own custom processor until you are sure what you
need.

Thanks
Joe

On Thu, Feb 4, 2016 at 6:48 AM, Tony Kurc <tr...@gmail.com> wrote:
> With that processor alone it doesn't appear so. The validator for that
> property requires it to be one character.

Re: ConvertCSVtoAvro | support for "||" delimiter

Posted by Tony Kurc <tr...@gmail.com>.
With that processor alone it doesn't appear so. The validator for that
property requires it to be one character.
On Feb 3, 2016 1:01 AM, "shweta" <sh...@gmail.com> wrote:

> Hi All,
>
> It seems "ConvertCSVtoAvro" only supports a single character as the delimiter in
> NiFi. Is there a way to specify "||" as the
> delimiter?
>
> Thanks,
> Shweta