Posted to dev@drill.apache.org by Charles Givre <cg...@gmail.com> on 2019/11/15 17:04:07 UTC

Re: Storage Plugin Assistance

Hi Igor, 
Thanks for the advice.  I've been doing some digging and am still pretty stuck.  Can you recommend any techniques for debugging the Jackson serialization/deserialization?  I added a unit test that serializes a query plan and then deserializes it, and that test fails.  I've traced the failure back to a constructor that isn't receiving the plugin config and then throws an NPE.  What I can't figure out is where that constructor is being called from, and why.

Any advice would be greatly appreciated.  Code can be found here: https://github.com/apache/drill/pull/1892
Thanks,
-- C


> On Oct 12, 2019, at 3:27 AM, Igor Guzenko <ih...@gmail.com> wrote:
> 
> Hello Charles,
> 
> Looks like you found another new issue. Maybe I explained it unclearly, but
> my previous suggestion wasn't about the EXPLAIN PLAN construct, but rather:
> 1)  Use an HTTP client like Postman (or simply a browser) to save the
> response of the requested REST service into a JSON file.
> 2)  Try to debug Drill reading that file, in order to compare how Calcite's
> conversion from the AST SqlNode to the RelNode tree differs between the
> existing dfs storage plugin and the same flow in your storage plugin.
> 
> From your last email I can see that there is another issue with the
> HttpGroupScan class: at some point Drill tried to deserialize JSON into an
> instance of HttpGroupScan, and the Jackson library didn't find how to do it.
> You probably missed a constructor with Jackson metadata; for example, see
> the HiveScan operator:
> 
> @JsonCreator
> public HiveScan(@JsonProperty("userName") final String userName,
>                 @JsonProperty("hiveReadEntry") final HiveReadEntry hiveReadEntry,
>                 @JsonProperty("hiveStoragePluginConfig") final HiveStoragePluginConfig hiveStoragePluginConfig,
>                 @JsonProperty("columns") final List<SchemaPath> columns,
>                 @JsonProperty("confProperties") final Map<String, String> confProperties,
>                 @JacksonInject final StoragePluginRegistry pluginRegistry) throws ExecutionSetupException {
>   this(userName,
>        hiveReadEntry,
>        (HiveStoragePlugin) pluginRegistry.getPlugin(hiveStoragePluginConfig),
>        columns,
>        null, confProperties);
> }
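> 
> For your plugin, the analogous creator might look roughly like the sketch
> below. The property names ("scanSpec", "columns", "storageConfig") come from
> the plan JSON later in this thread, but the HttpScanSpec, HttpStoragePlugin,
> and HttpStoragePluginConfig class names are assumptions about your code:
> 
> @JsonCreator
> public HttpGroupScan(@JsonProperty("scanSpec") HttpScanSpec scanSpec,
>                      @JsonProperty("columns") List<SchemaPath> columns,
>                      @JsonProperty("storageConfig") HttpStoragePluginConfig config,
>                      @JacksonInject StoragePluginRegistry pluginRegistry)
>     throws ExecutionSetupException {
>   // Only the config travels in the serialized plan; the live plugin
>   // instance is looked up from the injected registry.
>   this(scanSpec, columns, (HttpStoragePlugin) pluginRegistry.getPlugin(config));
> }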
> 
> Kind regards,
> Igor
> 
> 
> 
> On Fri, Oct 11, 2019 at 10:53 PM Charles Givre <cgivre@gmail.com> wrote:
> 
>> Hi Igor,
>> Thanks for responding.  I'm not sure if this is what you intended, but I
>> looked at the JSON for the query plans and found something interesting.
>> For the SELECT * query, I get the following when I try to run the
>> physical plan that it generates (without modification).  Do you think this
>> could be a related problem?
>> 
>> 
>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>> InvalidDefinitionException: Cannot construct instance of
>> `org.apache.drill.exec.store.http.HttpGroupScan` (no Creators, like default
>> construct, exist): cannot deserialize from Object value (no delegate- or
>> property-based Creator)
>> at [Source: (String)"{
>>  "head" : {
>>    "version" : 1,
>>    "generator" : {
>>      "type" : "ExplainHandler",
>>      "info" : ""
>>    },
>>    "type" : "APACHE_DRILL_PHYSICAL",
>>    "options" : [ ],
>>    "queue" : 0,
>>    "hasResourcePlan" : false,
>>    "resultMode" : "EXEC"
>>  },
>>  "graph" : [ {
>>    "pop" : "http-scan",
>>    "@id" : 2,
>>    "scanSpec" : {
>>      "uri" : "/json?lat=36.7201600&lng=-4.4203400&date=2019-10-02"
>>    },
>>    "columns" : [ "`**`" ],
>>    "storageConfig" : {
>>      "type" : "http",
>>   "[truncated 766 chars]; line: 16, column: 5] (through reference chain:
>> org.apache.drill.exec.physical.PhysicalPlan["graph"]->java.util.ArrayList[0])
>> 
>> 
>> Please, refer to logs for more information.
>> 
>> [Error Id: 751b6d05-a631-4eca-9d83-162ab4fa839f on localhost:31010]
>> 
>> 
>>> On Oct 11, 2019, at 12:25 PM, Igor Guzenko <ih...@gmail.com> wrote:
>>> 
>>> Hello Charles,
>>> 
>>> You got the error from Apache Calcite at the planning stage while
>>> converting a SqlIdentifier to a RexNode. From your stack trace, the
>>> conversion starts at DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:685)
>>> and goes to SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:3694).
>>> I would suggest saving the JSON returned by the REST service locally as a
>>> file and debugging the same trace for a query on that file. Then you can
>>> find the difference between the conversion of a SQL identifier to a rel
>>> node for standard JSON reading and for your storage plugin.
>>> 
>>> Thanks, Igor
>>> 
>>> 
>>> On Fri, Oct 11, 2019 at 6:34 PM Charles Givre <cgivre@gmail.com> wrote:
>>> 
>>>> Hello all,
>>>> I decided to take the leap and attempt to implement a storage plugin.  I
>>>> found that a few people had started this, so I thought I'd complete a
>>>> simple generic HTTP/REST storage plugin. The use case would be to enrich
>>>> data sets with data that's available via public or internal APIs.
>>>> 
>>>> Anyway, I'm a little stuck and need some assistance.  I got the plugin
>>>> to successfully execute a star query and return the results correctly:
>>>> 
>>>> apache drill> SELECT * FROM http.`/json?lat=36.7201600&lng=-4.4203400&date=2019-10-02`;
>>>> 
>>>> 
>>>> +------------+------------+-------------+------------+----------------------+--------------------+-------------------------+-----------------------+-----------------------------+---------------------------+
>>>> |  sunrise   |   sunset   | solar_noon  | day_length | civil_twilight_begin | civil_twilight_end | nautical_twilight_begin | nautical_twilight_end | astronomical_twilight_begin | astronomical_twilight_end |
>>>> +------------+------------+-------------+------------+----------------------+--------------------+-------------------------+-----------------------+-----------------------------+---------------------------+
>>>> | 6:13:58 AM | 5:59:55 PM | 12:06:56 PM | 11:45:57   | 5:48:14 AM           | 6:25:38 PM         | 5:18:16 AM              | 6:55:36 PM            | 4:48:07 AM                  | 7:25:45 PM                |
>>>> +------------+------------+-------------+------------+----------------------+--------------------+-------------------------+-----------------------+-----------------------------+---------------------------+
>>>> 1 row selected (0.392 seconds)
>>>> 
>>>> However, when I attempt to select individual fields, I get errors (see
>>>> below for the full stack trace).  I've walked through this with the
>>>> debugger, but it seems like the code is breaking before it hits my
>>>> storage plugin, and I'm not sure what to do about it.  Here's a link to
>>>> the code:
>>>> https://github.com/cgivre/drill/tree/storage-http/contrib/storage-http
>>>> 
>>>> Any assistance would be greatly appreciated.  Thanks!!
>>>> 
>>>> 
>>>> 
>>>> apache drill> !verbose
>>>> verbose: on
>>>> apache drill> SELECT sunset FROM http.`/json?lat=36.7201600&lng=-4.4203400&date=2019-10-02`;
>>>> Error: SYSTEM ERROR: AssertionError: Field ordinal 1 is invalid for type '(DrillRecordRow[**])'
>>>>
>>>> Please, refer to logs for more information.
>>>>
>>>> [Error Id: d7bccd2f-73e6-40d7-9b8a-73a772f65c02 on 192.168.1.21:31010]
>>>> (state=,code=0)
>>>> java.sql.SQLException: SYSTEM ERROR: AssertionError: Field ordinal 1 is invalid for type '(DrillRecordRow[**])'
>>>>
>>>> Please, refer to logs for more information.
>>>>
>>>> [Error Id: d7bccd2f-73e6-40d7-9b8a-73a772f65c02 on 192.168.1.21:31010]
>>>>       at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:538)
>>>>       at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:610)
>>>>       at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1278)
>>>>       at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:58)
>>>>       at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:667)
>>>>       at org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1102)
>>>>       at org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1113)
>>>>       at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:675)
>>>>       at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:200)
>>>>       at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
>>>>       at org.apache.calcite.avatica.AvaticaStatement.execute(AvaticaStatement.java:217)
>>>>       at sqlline.Commands.executeSingleQuery(Commands.java:1008)
>>>>       at sqlline.Commands.execute(Commands.java:957)
>>>>       at sqlline.Commands.sql(Commands.java:921)
>>>>       at sqlline.SqlLine.dispatch(SqlLine.java:717)
>>>>       at sqlline.SqlLine.begin(SqlLine.java:536)
>>>>       at sqlline.SqlLine.start(SqlLine.java:266)
>>>>       at sqlline.SqlLine.main(SqlLine.java:205)
>>>> Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: AssertionError: Field ordinal 1 is invalid for type '(DrillRecordRow[**])'
>>>>
>>>> Please, refer to logs for more information.
>>>>
>>>> [Error Id: d7bccd2f-73e6-40d7-9b8a-73a772f65c02 on 192.168.1.21:31010]
>>>>       at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>>>>       at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
>>>>       at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
>>>>       at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273)
>>>>       at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243)
>>>>       at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>>>>       at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>>>>       at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>>>>       at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
>>>>       at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>>>>       at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
>>>>       at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
>>>>       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
>>>>       at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
>>>>       at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>>>>       at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>>>>       at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>>>>       at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>>>>       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>>>>       at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
>>>>       at java.lang.Thread.run(Thread.java:748)
>>>> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Field ordinal 1 is invalid for type '(DrillRecordRow[**])'
>>>>       at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:303)
>>>>       at .......(:0)
>>>> Caused by: java.lang.AssertionError: Field ordinal 1 is invalid for type '(DrillRecordRow[**])'
>>>>       at org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:197)
>>>>       at org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:3694)
>>>>       at org.apache.calcite.sql2rel.SqlToRelConverter.access$2200(SqlToRelConverter.java:217)
>>>>       at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4765)
>>>>       at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4061)
>>>>       at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:317)
>>>>       at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4625)
>>>>       at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList(SqlToRelConverter.java:3908)
>>>>       at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:670)
>>>>       at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:627)
>>>>       at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3150)
>>>>       at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:563)
>>>>       at org.apache.drill.exec.planner.sql.SqlConverter.toRel(SqlConverter.java:414)
>>>>       at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:685)
>>>>       at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:202)
>>>>       at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:172)
>>>>       at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:226)
>>>>       at org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:124)
>>>>       at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:90)
>>>>       at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:591)
>>>>       at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:276)
>>>>       ... 1 more
>>>> apache drill>


Re: Storage Plugin Assistance

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Charles,

Looking at the code in your PR, it seems that you are, in fact, using Drill's JSON reader to decode the message JSON. (See [1]). Is that where you are having problems?

Looks like this reader handles JSON passed as a string or from a file? In either case, get a local copy of the JSON, then use the JsonReader directly. The JSON reader wants a container and other knick-knacks which you can create within a test that extends the SubOperatorTest. That framework gives you things like an allocator so you can create the vectors, allocate memory, and so on.

This code uses the old JSON reader, so it is going to be pretty fiddly. This reader would greatly benefit from the newer EVF-based JSON reader, but we can work on that later.

Thanks,

- Paul


[1] https://github.com/apache/drill/pull/1892/files#diff-59df95a0bedb082b25742242eef0bb9c

Re: Storage Plugin Assistance

Posted by Charles Givre <cg...@gmail.com>.

> On Nov 15, 2019, at 1:39 PM, Paul Rogers <pa...@yahoo.com.INVALID> wrote:
> 
> Hi Charles,
> 
> A thought on debugging deserialization is to not do it in a query. Capture the JSON returned from a rest call. Write a simple unit test that deserializes that by itself from a string or file. Deserialization is a bit of a black art, and is really a problem separate from Drill itself.

So dumb non-dev question... How exactly do I do that?  I have SerDe unit test(s), but the query in question is failing in the first part of the unit test.

@Test
public void testSerDe() throws Exception {
  String sql = "SELECT COUNT(*) FROM http.`/json?lat=36.7201600&lng=-4.4203400&date=2019-10-02`";
  String plan = queryBuilder().sql(sql).explainJson();
  long cnt = queryBuilder().physical(plan).singletonLong();
  assertEquals("Counts should match", 1L, cnt);
}


> 
> As it turns out, for my "day job" I'm doing a POC using Drill to query SumoLogic. I took this as an opportunity to fill that gap you mentioned in our book: how to create a storage plugin. See [1]. This is a work in progress, but it has helped me build the planner-side stuff up to the batch reader, after which the work is identical to that for a format plugin.

YES!!  Awesome!  I know it is super involved, but simply documenting it will help a lot. Add that to the number of beers I owe you!


Re: Storage Plugin Assistance

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Charles,

A thought on debugging deserialization is to not do it in a query. Capture the JSON returned from a rest call. Write a simple unit test that deserializes that by itself from a string or file. Deserialization is a bit of a black art, and is really a problem separate from Drill itself.
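For example, a bare-bones round-trip test might look like the sketch below.
ObjectMapper and InjectableValues come from com.fasterxml.jackson.databind;
"registry" and "scan" stand in for whatever your test fixture provides (inside
Drill, PhysicalPlanReader normally supplies the injected values for you):

@Test
public void testGroupScanRoundTrip() throws Exception {
  ObjectMapper mapper = new ObjectMapper();
  // Register a value for the @JacksonInject parameter. Without this, the
  // injected argument arrives as null and the constructor throws the NPE
  // you described.
  mapper.setInjectableValues(new InjectableValues.Std()
      .addValue(StoragePluginRegistry.class, registry));

  String json = mapper.writeValueAsString(scan);
  HttpGroupScan roundTripped = mapper.readValue(json, HttpGroupScan.class);

  // Compare whichever fields matter to you; getColumns() is illustrative.
  assertEquals(scan.getColumns(), roundTripped.getColumns());
}

If the round trip fails here too, the problem is confined to the Jackson
annotations on HttpGroupScan rather than to the query machinery.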

As it turns out, for my "day job" I'm doing a POC using Drill to query SumoLogic. I took this as an opportunity to fill that gap you mentioned in our book: how to create a storage plugin. See [1]. This is a work in progress, but it has helped me build the planner-side stuff up to the batch reader, after which the work is identical to that for a format plugin.

The Sumo API is REST-based, but for now I'm using the clunky REST client available in the Sumo public repo because of some unfortunate details of the Sumo REST service when used for this purpose. (Sumo returns data as a set of key/value pairs, not as a fixed JSON schema. [4])

Poking around elsewhere, it turns out someone wrote a very simple Presto connector for REST [2] using the Retrofit library from Square [3] which seems very simple to use. If we create a generic REST plugin, we might want to look at how it was done in Presto. Presto requires an up-front schema which Retrofit can provide. Drill, of course, does not require such a schema and so works with ad-hoc schemas, such as the one that Sumo's API provides. 
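To give a flavor of that, a Retrofit service declaration binds an endpoint to
a fixed response type at compile time. The interface and response class below
are hypothetical, modeled on the sunrise-sunset endpoint used earlier in this
thread (GET, Query, and Call come from the retrofit2 packages):

public interface SunriseSunsetService {
  @GET("json")
  Call<SunResponse> lookup(@Query("lat") double lat,
                           @Query("lng") double lng,
                           @Query("date") String date);
}

// The fixed response type is the "up-front schema" that Presto needs and
// Drill does not.
public class SunResponse {
  public String sunrise;
  public String sunset;
}

Because the response type is fixed, a Retrofit-based connector knows its
columns before the query runs; Drill instead discovers them as it reads.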

Actually, better than using a deserializer would be to use Drill's existing JSON parser to read data directly into value vectors. But that existing code has lots of tech debt. I've been working on a PR for a new version based on EVF, but that is a while off and won't help us today.

It is interesting to note that neither the JSON reader nor a generic REST API would work with the Sumo API because of its structure. I think the JSON reader would read an entire batch of Sumo results as a single record composed of a repeated Map, with elements being the key/value pairs. Not at all ideal.

So, both the JSON reader and the REST API should eventually handle data formats which are generic (name/value pairs) rather than expressed in the structure of JSON objects (as required by Jackson and Retrofit). That is a topic for later, but it is why the Sumo plugin has to be custom to Sumo's API for now.


Thanks,
- Paul


[1] https://github.com/paul-rogers/drill/wiki/Create-a-Storage-Plugin

[2] https://github.com/prestosql-rocks/presto-rest

[3] https://square.github.io/retrofit/

[4] https://help.sumologic.com/APIs/Search-Job-API/About-the-Search-Job-API