Posted to dev@drill.apache.org by Charles Givre <cg...@gmail.com> on 2017/10/18 21:06:25 UTC

Re: describe query support? (catalog metadata, etc)

I’d like to second Alfredo’s request. I’ve been trying to get Drill to work with open source visualization tools such as SqlPad and Metabase, and the issue I keep running into is that Drill doesn’t have a convenient way to describe how it interprets flat files. This is really frustrating for me, since querying flat files is my main use of Drill!
I wish SELECT * FROM <data> LIMIT 0 worked in the RESTful interface. In any event, it would be very useful to have some way to get Drill to describe how it will interpret a flat file.
— C


> On Oct 18, 2017, at 15:20, Chun Chang <cc...@mapr.com> wrote:
> 
> There have been discussions about the need to build a catalog for Drill, but I don't think that's the focus right now, and I am not sure the community will ever decide to go in that direction. For now, your best bet is to create views on top of your JSON/CSV data.
> 
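
To make the view workaround concrete, here is a minimal sketch that builds a Drill CREATE VIEW statement casting the positional `columns[n]` fields of a headerless CSV file to named, typed columns. The workspace, view name, file path, and column list are all hypothetical, and the helper `build_view_sql` is only for illustration; it assumes a writable workspace such as dfs.tmp and a file read without header extraction.

```python
def build_view_sql(view_name, source, columns):
    """Return a Drill CREATE VIEW statement that casts positional CSV fields.

    columns is a list of (column_name, sql_type) pairs, in file order.
    """
    select_items = [
        f"CAST(columns[{i}] AS {sql_type}) AS `{name}`"
        for i, (name, sql_type) in enumerate(columns)
    ]
    select_list = ",\n  ".join(select_items)
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT\n  {select_list}\n"
        f"FROM {source}"
    )


# All names below are made up for the example.
sql = build_view_sql(
    "dfs.tmp.`employees_view`",
    "dfs.`/data/employees.csv`",
    [("employee_id", "INT"), ("full_name", "VARCHAR(100)"), ("salary", "DOUBLE")],
)
print(sql)
```

Once a view like this exists, its columns should be visible to INFORMATION_SCHEMA.COLUMNS, which is what most JDBC tools rely on.
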
> ________________________________
> From: Alfredo Serafini <se...@gmail.com>
> Sent: Wednesday, October 18, 2017 8:31:15 AM
> To: user@drill.apache.org
> Subject: describe query support? (catalog metadata, etc)
> 
> Hi, I'm experimenting with Drill as a data virtualization component via
> JDBC, and it generally works great for my needs.
> 
> However, some of the components connected via JDBC need basic
> metadata/catalog information, which seems to be missing for JSON / CSV
> sources.
> 
> For example the simple query
> 
> DESCRIBE cp.`employee.json`;
> 
> returns no results.
> 
> Another example is reading from an SQLite source
> containing the same data in an `employees` table:
> DESCRIBE `employees`
> 
> This still returns no information. DESCRIBE is not directly supported
> in SQLite, but an equivalent could be, for instance:
> PRAGMA table_info(`employees`);
> 
> Executing it in Drill is not possible, though, as it is beyond the
> supported standard SQL dialect.
> 
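
For reference, this is roughly what the SQLite side returns when queried directly, outside of Drill. It is a minimal sketch using Python's built-in sqlite3 module against an in-memory database; the table definition and the helper `describe_sqlite_table` are made up for the example.

```python
import sqlite3


def describe_sqlite_table(conn, table):
    """Return (name, declared_type, notnull, is_pk) for each column of a table."""
    # PRAGMA arguments cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [
        (name, col_type, bool(notnull), bool(pk))
        for _cid, name, col_type, notnull, _default, pk in rows
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, full_name TEXT NOT NULL, salary REAL)"
)
columns = describe_sqlite_table(conn, "employees")
for column in columns:
    print(column)
```

A metadata layer could map these rows onto the same shape that DESCRIBE or INFORMATION_SCHEMA.COLUMNS would expose.
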
> Moreover, a query like:
> SELECT *
> FROM INFORMATION_SCHEMA.COLUMNS
> WHERE (TABLE_NAME='employees_view');
> 
> on a view over the same data does seem to return the information, so I
> suppose there should be a way to pass that information to an
> internal *DatabaseMetaData
> <https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html>*
> implementation.
> I wonder if there is a component designed to manage all the catalog
> information for different sources?
> 
> In that case it could adopt different strategies for retrieving metadata,
> depending on the source: for SQLite a different command / dialect could be
> used, for CSV the types could be guessed using simple heuristics, and so on.
> Cases like JSON would probably be much more complex, anyway.
> Once the metadata has been retrieved for a source, I suppose the standard
> SQL dialect should work as expected.
> 
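
As an illustration of the kind of simple CSV heuristic described above, here is a minimal sketch that samples rows and guesses one SQL type name per column. It is not how Drill works; the type names, the sample data, and the helpers `guess_type` and `guess_csv_schema` are assumptions for the example, and it assumes the file has a header row.

```python
import csv
import io


def guess_type(values):
    """Guess a SQL type name for a column from its sampled string values."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "VARCHAR"  # nothing to go on, fall back to text
    # Try the narrowest type first; every sampled value must parse.
    for sql_type, parser in (("BIGINT", int), ("DOUBLE", float)):
        try:
            for value in non_empty:
                parser(value)
            return sql_type
        except ValueError:
            continue
    return "VARCHAR"


def guess_csv_schema(text, sample_rows=100):
    """Return [(column_name, guessed_type)] from the header and a row sample."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    sample = [row for _, row in zip(range(sample_rows), reader)]
    return [
        (name, guess_type([row[i] for row in sample if i < len(row)]))
        for i, name in enumerate(header)
    ]


# Made-up sample data.
sample_csv = (
    "employee_id,full_name,salary,hire_date\n"
    "1,Ada,80000.0,1994-12-01\n"
    "2,Grace,40000,1995-03-15\n"
)
schema = guess_csv_schema(sample_csv)
print(schema)
```

Dates, booleans, and values such as "nan" would need extra rules, and a guess based on a sample can be wrong for rows outside it, which is part of why nested JSON would be harder still.
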
> 
> Are there any plans to add catalog metadata support for various sources?
> Does anybody have a workaround, for example using views or similar
> approaches?
> 
> 
> Thanks in advance, and sorry if the message is too long :-)
> Alfredo