Posted to user@drill.apache.org by John Omernik <jo...@omernik.com> on 2016/02/08 18:56:35 UTC

Dealing with files created in Windows

Are there any decent tricks for dealing with Windows-based text files (that
use \r\n as the line ending rather than just \n)?

Right now my last field has \r showing up, and I'd like to not have that
there. I guess I could regex_replace it, maybe? I was hoping for a
performant way to handle it (without reprocessing the files, either).

John
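A minimal Python sketch of the symptom described above, assuming a reader that splits records on \n only and fields on a comma (the sample data and delimiter are made up for illustration). The \r that Windows writes before each \n survives as the last character of the last field:

```python
# Hypothetical sample: two comma-separated records written with Windows (CRLF) line endings.
raw = "id,name\r\n1,alice\r\n"

# Splitting records on "\n" alone leaves the "\r" attached to the end of each record...
records = [line for line in raw.split("\n") if line]
rows = [record.split(",") for record in records]

# ...so the last field of every row ends with a stray carriage return,
# while the earlier fields are clean.
for row in rows:
    print(repr(row[-1]))  # prints 'name\r', then 'alice\r'
```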

Re: Dealing with files created in Windows

Posted by John Omernik <jo...@omernik.com>.
No, I do not want to reprocess files. I am sorry for the bluntness, but this
seems like something Drill should handle without requiring an outside ETL
process to prune the files. (The data I have is large, and it just seems
like a process prone to failure.)



On Mon, Feb 8, 2016 at 12:07 PM, Abdel Hakim Deneche <ad...@maprtech.com>
wrote:

> Is dos2unix an option?

Re: Dealing with files created in Windows

Posted by John Omernik <jo...@omernik.com>.
Ya, I think I will use regex_replace once and have Drill preprocess things
(I think). I suspect there are a number of JIRAs on this, and that we
should handle this better at the system level.
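The regex route only needs to remove a carriage return when it is the final character of the last field. Drill's string functions include REGEXP_REPLACE (the thread calls it regex_replace); the sketch below uses Python's re.sub purely to illustrate the pattern, and strip_trailing_cr is a hypothetical helper name, not a Drill function. If the same idea is expressed with Java regex syntax, which Drill presumably uses under the hood, the strict end-of-input anchor is spelled \z rather than Python's \Z.

```python
import re

# Matches one carriage return only at the very end of the string (\Z = end of input).
TRAILING_CR = re.compile(r"\r\Z")


def strip_trailing_cr(field: str) -> str:
    """Hypothetical helper: drop one trailing carriage return left by a CRLF line ending."""
    return TRAILING_CR.sub("", field)


# Example row as it might come out of a reader that split lines on "\n" only.
row = ["1", "alice\r"]
row[-1] = strip_trailing_cr(row[-1])  # only the last field carries the stray "\r"
print(row)  # ['1', 'alice']
```

Fields that do not end in \r pass through unchanged, and a \r elsewhere in a value is left alone.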



On Mon, Feb 8, 2016 at 12:23 PM, Nathan Griffith <ng...@dremio.com>
wrote:

> Was going to say my go-to for this kind of issue is the 'tr' command in
> Unix, but if I understand right, you'd rather not have to preprocess,
> instead preferring an in-Drill solution.
>
> As I think you're hinting at, a Drill UDF tailored to the data might
> be one way to handle it.

Re: Dealing with files created in Windows

Posted by Nathan Griffith <ng...@dremio.com>.
Was going to say my go-to for this kind of issue is the 'tr' command in
Unix, but if I understand right, you'd rather not have to preprocess,
instead preferring an in-Drill solution.

As I think you're hinting at, a Drill UDF tailored to the data might
be one way to handle it.
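For comparison, tr -d '\r' deletes every carriage-return byte in the stream, not just the ones at line ends. A rough Python equivalent that streams the input in fixed-size chunks (so a large file never has to fit in memory) might look like this; the function name and default chunk size are illustrative assumptions. Because it deletes a single byte value, a chunk boundary can never split a match.

```python
import io


def delete_carriage_returns(src, dst, chunk_size=1 << 20):
    """Copy binary stream src to dst, dropping every CR byte (like tr -d '\\r')."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Deleting a single byte value cannot be broken by a chunk boundary.
        dst.write(chunk.replace(b"\r", b""))


src = io.BytesIO(b"id,name\r\n1,alice\r\n")
dst = io.BytesIO()
delete_carriage_returns(src, dst, chunk_size=4)
print(dst.getvalue())  # b'id,name\n1,alice\n'
```

On real files, open both sides in binary mode ("rb" and "wb"). This also removes any \r that legitimately appears inside a field, which is one reason a fix tailored to the data can be preferable.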

On Mon, Feb 8, 2016 at 10:07 AM, Abdel Hakim Deneche
<ad...@maprtech.com> wrote:
> Is dos2unix an option?

Re: Dealing with files created in Windows

Posted by Abdel Hakim Deneche <ad...@maprtech.com>.
Is dos2unix an option?
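dos2unix is typically used to rewrite CRLF line endings as LF. A minimal Python sketch of just that conversion, assuming the whole job is "replace each \r\n pair with \n" (the real tool has options this ignores, and crlf_to_lf is a hypothetical name): unlike deleting every CR, a pair can straddle a chunk boundary, so a trailing \r is held back until the next chunk shows whether a \n follows it.

```python
import io


def crlf_to_lf(src, dst, chunk_size=1 << 20):
    """Hypothetical helper: copy src to dst, turning each CRLF pair into a bare LF."""
    pending = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        data = pending + chunk
        # A final CR might be the first half of a CRLF split across chunks; hold it back.
        if data.endswith(b"\r"):
            data, pending = data[:-1], b"\r"
        else:
            pending = b""
        dst.write(data.replace(b"\r\n", b"\n"))
    dst.write(pending)  # a lone CR at the very end of the input is kept as-is


src = io.BytesIO(b"id,name\r\n1,alice\r\n")
dst = io.BytesIO()
crlf_to_lf(src, dst, chunk_size=8)
print(dst.getvalue())  # b'id,name\n1,alice\n'
```

A \r that is not followed by \n is left untouched, so stray carriage returns inside fields survive.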

On Mon, Feb 8, 2016 at 9:56 AM, John Omernik <jo...@omernik.com> wrote:

> Are there any decent tricks for dealing with Windows-based text files (that
> use \r\n as the line ending rather than just \n)?
>
> Right now my last field has \r showing up, and I'd like to not have that
> there. I guess I could regex_replace it, maybe? I was hoping for a
> performant way to handle it (without reprocessing the files, either).
>
> John
>



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>