Posted to dev@drill.apache.org by Charles Givre <cg...@gmail.com> on 2017/03/26 20:18:29 UTC

Format Plugin Question

Hello all, 
I’m working on a format plugin for a file type that will have a mix of Strings and nested fields. Basically something like this:

field1:  String
field2:  Array
etc…
My preference is to keep the nested data in the nested format rather than de-nest it, but I suppose that is always an option. 

I’ve gotten the format plugin to write Strings to the Drill buffer, but I’m not quite sure how to get it to write an Array or Map.  I’ve found the Map and List writer objects, but I’m not quite sure how to use them in this context.  Are there any examples that someone could point me to, or could someone explain how this can be done?  
Thanks,
— C

Re: Format Plugin Question

Posted by Paul Rogers <pr...@mapr.com>.
Hi Charles,

Did a bit of snooping in the code. Looks like you want the implementations of the ListWriter interface. This interface provides startList() and endList() methods to manage lists.

Similarly, you’ll want the MapWriter interface for maps. It provides name-based access to the vectors within the map.

Each of these has a zillion implementations, so it seems you need to use the right one for each data type.
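
To make the startList()/endList() idea concrete, here is a minimal sketch of how such a bracketing protocol might behave. ToyListWriter, write() and list() are hypothetical names made up for illustration; the class only borrows the two method names mentioned above and is not Drill's ListWriter, whose real implementations are typed and write into value vectors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that only borrows the startList()/endList() method names.
// It is NOT Drill's ListWriter.
class ToyListWriter {
    private final List<String> values = new ArrayList<>();  // all elements, back to back
    private final List<Integer> starts = new ArrayList<>(); // where each list begins in values
    private boolean open = false;

    // Mark the beginning of one record's list.
    void startList() {
        if (open) {
            throw new IllegalStateException("list already open");
        }
        starts.add(values.size());
        open = true;
    }

    // Append one element to the currently open list.
    void write(String value) {
        if (!open) {
            throw new IllegalStateException("call startList() first");
        }
        values.add(value);
    }

    // Close the current list; the next startList() begins a new one.
    void endList() {
        if (!open) {
            throw new IllegalStateException("no open list");
        }
        open = false;
    }

    int listCount() {
        return starts.size();
    }

    // Return a copy of the i-th list, using the recorded start positions.
    List<String> list(int i) {
        int from = starts.get(i);
        int to = (i + 1 < starts.size()) ? starts.get(i + 1) : values.size();
        return new ArrayList<>(values.subList(from, to));
    }
}
```

The point of the sketch is only the call order: startList(), zero or more writes, then endList(), once per record.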

- Paul

On Mar 26, 2017, at 9:33 PM, Paul Rogers <pr...@mapr.com> wrote:

Hi Charles,

You asked three questions.

* How do we write arrays?
* How do we write maps?
* What tools are available in the code to help?

Let’s start with maps because I happen to be mucking with those at the moment. A map in Drill is really just a nested record; it is not a map like you’d find in Java or Python. [1] is a conceptual write-up of how maps work in Drill.

To write to a map, you first create a map vector per record batch. The map is a container of vectors for each member. The trick here is to realize that a Drill map is not an independent collection of name/value pairs per record. It is instead a single collection of vectors shared by ALL records in a batch. That is, in Drill, a map is a nested record (tuple), not really a map in the classic sense. Once you create your vector for your map member, you can use it just like a top-level vector.
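
As a rough illustration of the "one set of vectors shared by all records" idea, here is a toy model in plain Java. ToyMapColumn and its methods are hypothetical names, not Drill's map vector classes, and back-filling earlier records with null when a new member appears is an assumption of this sketch rather than a statement about Drill's behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical class, NOT Drill's map vector.
class ToyMapColumn {
    // One value list ("vector") per member name, shared by every record in the batch.
    private final Map<String, List<String>> members = new LinkedHashMap<>();
    private int recordCount = 0;

    // Append one record; members absent from this record are stored as null.
    void addRecord(Map<String, String> record) {
        // Create a vector for any member seen for the first time and
        // back-fill it with nulls for earlier records (assumption of this sketch).
        for (String name : record.keySet()) {
            if (!members.containsKey(name)) {
                List<String> vector = new ArrayList<>();
                for (int i = 0; i < recordCount; i++) {
                    vector.add(null);
                }
                members.put(name, vector);
            }
        }
        // Every member vector receives exactly one entry per record.
        for (Map.Entry<String, List<String>> e : members.entrySet()) {
            e.getValue().add(record.get(e.getKey()));
        }
        recordCount++;
    }

    List<String> memberVector(String name) {
        return members.get(name);
    }

    int memberCount() {
        return members.size();
    }

    int recordCount() {
        return recordCount;
    }
}
```

Note that the set of member names belongs to the batch, not to any single record: a record that lacks a member still occupies a slot in that member's vector.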

Array vectors are just like other vectors: there is one vector for the entire record batch. Arrays have an extra twist: an indirection vector that points to the first entry for each record. All values from your field2 go into that single array, with the indirection vector having an entry per record that points to the start of that record’s values. (The number of values is found by taking the difference between the entry for record i+1 and that for record i.)
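
The layout described above can be sketched in a few lines of plain Java. ToyRepeatedColumn is a hypothetical name, not a Drill class; the sketch also assumes the indirection array carries one extra trailing entry so that the length of the last record can be computed with the same i+1 minus i rule.

```java
import java.util.Arrays;

// Toy model only: hypothetical class, NOT Drill's repeated (array) vector.
class ToyRepeatedColumn {
    private final String[] values; // all array elements for the whole batch, back to back
    private final int[] offsets;   // offsets[i] = index in values of record i's first element

    ToyRepeatedColumn(String[] values, int[] offsets) {
        this.values = values;
        this.offsets = offsets;
    }

    // Assumes one trailing offset, so there is one fewer record than offsets.
    int recordCount() {
        return offsets.length - 1;
    }

    // Number of elements in a record: entry for record i+1 minus entry for record i.
    int length(int record) {
        return offsets[record + 1] - offsets[record];
    }

    // Copy out the elements belonging to one record.
    String[] get(int record) {
        return Arrays.copyOfRange(values, offsets[record], offsets[record + 1]);
    }
}
```

For example, values {"a", "b", "c"} with offsets {0, 2, 2, 3} describe three records: the first holds "a" and "b", the second is an empty array, and the third holds "c".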

The code does provide vector readers and writers, but I’m not very familiar with them.

The best place to see this in action is the JSON record reader, specifically the JsonReader class.

Perhaps others can provide better, more concrete suggestions.

Thanks,

- Paul

[1] https://github.com/paul-rogers/drill/wiki/Drill-Maps




