Posted to user@drill.apache.org by Flavio Pompermaier <po...@okkam.it> on 2018/02/19 14:50:04 UTC

Fixed-width files

Hi to all,
I'm currently looking for the best solution to load a fixed-width text file
into Drill.
Is there any way to do that right now? Does anyone already have a working
connector?
Is it better to implement a brand new FormatPluginConfig or
StoragePluginConfig?

Best,
Flavio

Re: RE: Fixed-width files

Posted by Flavio Pompermaier <po...@okkam.it>.
For the moment I've created an improvement issue about this:
https://issues.apache.org/jira/browse/DRILL-6170


Re: Fixed-width files

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Flavio,
Great question! I've not yet experimented with the solution myself, but I believe that the plugin can be placed into a jar, along with the needed Drill config file, and then placed into the jars/3rd-party directory if you keep your config information in the Drill product directory. Perhaps Charles can offer more details based on his experience.
You may find it more convenient to use the "site" directory added in Drill 1.8. With a site directory, you separate your config files and custom jars from the Drill product files. Launch Drill with the "--site" flag:
> drillbit.sh --site /my/site/dir start
For convenience, you can set the DRILL_SITE_DIR env var instead of using the --site flag.
If using a site directory, put your jar in the "jars" folder.
All that said, while you develop your plugin, you'll want to put the sources inside the Drill java-exec project. Why? Doing so allows you to very rapidly build and debug your library using your favorite IDE. The test file mentioned in the PR shows how to use the test framework to run a query, start an in-process Drillbit, and immediately step through (or set breakpoints in) your plugin code.
If you build the plugin as a jar file, then for each edit/compile/debug cycle, you'll need to build your jar, copy it to the proper location, restart the Drill server, attach the remote debugger, start a client tool, and finally submit a query. This works, but is quite slow; the above technique is faster for us impatient types...
Once the plugin works, you can move the code to a new project from which you can build and deploy your jar.
Or, you can do as Charles did: offer your plugin to the Drill project via a PR so others can use it.
Thanks,
- Paul

 

  

Re: RE: Fixed-width files

Posted by Flavio Pompermaier <po...@okkam.it>.
Thanks Paul for this suggestion, I think I'm going to give it a try.
Once I've created my EasyFormatPlugin, where should I put the produced jar?
In which folder within the jars directory?


Re: RE: Fixed-width files

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
It may be that by "fixed width text", Flavio means a file in which the text columns are of fixed width: kind of like old-school punch cards.
Drill has no reader for this use case, but if you are a Java programmer, you can create one. See Drill Pull Request #1114 [1] for one example of a regex reader, along with pointers to a second example I'm building for a book. It should be easy to adapt this code to take a list of column widths in place of the regex. Actually, you could use the regex reader as-is with a pattern that just picks out a fixed number of characters per column.
Thanks,
- Paul

[1]  https://github.com/apache/drill/pull/1114
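
The two options mentioned above (a regex that picks out a fixed number of characters, or a list of column widths) can be sketched in plain Java, independent of Drill. This is only a minimal sketch under stated assumptions, not code from PR #1114: the class name FixedWidthSketch, the method names, and the sample widths {10, 3, 8} are all hypothetical, and wiring such logic into a Drill record reader is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: none of these names come from Drill or from PR #1114.
public class FixedWidthSketch {

    // Builds a regex with one capturing group per column,
    // e.g. widths {10, 3, 8} -> "(.{10})(.{3})(.{8})".
    public static Pattern patternFor(int[] widths) {
        StringBuilder sb = new StringBuilder();
        for (int w : widths) {
            sb.append("(.{").append(w).append("})");
        }
        return Pattern.compile(sb.toString());
    }

    // Splits a line with the regex. Returns the trimmed fields, or null
    // when the line is shorter than the total width of all columns.
    public static List<String> splitWithRegex(String line, Pattern pattern) {
        Matcher m = pattern.matcher(line);
        if (!m.lookingAt()) {
            return null;
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i).trim());
        }
        return fields;
    }

    // Splits a line by column widths using substring. Short lines are
    // tolerated: missing columns come back as empty strings.
    public static List<String> splitWithWidths(String line, int[] widths) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int w : widths) {
            int from = Math.min(start, line.length());
            int to = Math.min(start + w, line.length());
            fields.add(line.substring(from, to).trim());
            start += w;
        }
        return fields;
    }

    public static void main(String[] args) {
        int[] widths = {10, 3, 8};
        // Build the sample record with padding so the widths are exact.
        String line = String.format("%-10s%3s%8s", "Flavio", "042", "20180219");

        System.out.println(splitWithRegex(line, patternFor(widths)));
        System.out.println(splitWithWidths(line, widths));
        // Both lines print: [Flavio, 042, 20180219]
    }
}
```

The regex form fits the existing regex reader without code changes, at the cost of rejecting short lines; the width-list form is simpler to configure and degrades more gracefully on truncated records.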


 

  

Re: Fixed-width files

Posted by Flavio Pompermaier <po...@okkam.it>.
Do you have any real example of this (apart from the one reported at [1])?

[1] https://drill.apache.org/docs/text-files-csv-tsv-psv/




-- 
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809

RE: Fixed-width files

Posted by Kunal Khatua <kk...@mapr.com>.
As long as you have delimiters, you should be able to import it as a regular CSV file. Using views that define the fixed-width nature should help operators downstream work more efficiently. 
