Posted to user@drill.apache.org by Andrew Evans <An...@hygenicsdata.com> on 2016/05/25 16:26:39 UTC

Apache Drill, Query PostgreSQL text or Jsonb as if it were from a json storage type?

Drill Members,

I have an intriguing problem: hundreds of thousands, or even millions, of records stored as jsonb in uneven JSON objects in a PostgreSQL database. I would like to be able to implicitly discover column names and data types using your tool, since neither PostgreSQL nor Pentaho has this capability. My current tool for this task is Scala, but I would rather not write and maintain code that tracks the JSON format specs, etc., in real time.

Is it possible, by conversion or otherwise, to read the jsonb (or a text string) from PostgreSQL as if it were being queried from a "json"-type storage instead of a "jdbc"-type storage? If so, could I pull in different columns from PostgreSQL without some sort of key (with the original query)? Is there the ability to do something like SELECT pkey, split_to_columns(convert_to(field,'JSON')) FROM postgres.mytable?

For context, I have posted an example record below:

pkey   , data
1423234, {"loadCaseResult": {"case": {"type": "Adoption (Family)", "judge": null, "style": "Confidential", "caseId": "12", "events": {"event": [{"date": "08/06/2014", "type": "Request: Action", "comment": "900125-Request for Action \n Filed/Issued By"}]}}}}

Thank you for your time,

Andrew Evans
Java/Scala/Python Programmer at Hygenics Data, LLC

Re: Apache Drill, Query PostgreSQL text or Jsonb as if it were from a json storage type?

Posted by Neeraja Rentachintala <nr...@maprtech.com>.
There is the ability to retrieve JSON fields using the CONVERT_FROM function
in Drill. Check the following doc.

https://drill.apache.org/docs/data-type-conversion/#convert_to-and-convert_from
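For illustration, a sketch of what that could look like, assuming a JDBC storage plugin configured under the name `postgres` and that the jsonb column arrives as text (in practice a jsonb column may surface through JDBC as an unsupported type, so a PostgreSQL view that casts `data` to text may be needed first; `mytable`, `pkey`, and `data` stand in for the real names):

```sql
-- Parse the text column into a Drill map, then navigate it with dot notation.
-- `case` and `type` are reserved words in Drill, hence the backticks.
SELECT t.pkey,
       t.j.loadCaseResult.`case`.`type` AS case_type,
       t.j.loadCaseResult.`case`.caseId AS case_id
FROM (
  SELECT pkey, CONVERT_FROM(data, 'JSON') AS j
  FROM postgres.public.mytable
) t;
```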



RE: Apache Drill, Query PostgreSQL text or Jsonb as if it were from a json storage type?

Posted by Andrew Evans <An...@hygenicsdata.com>.
I am trying to avoid reading through millions of records, but I would also like to avoid building a PostgreSQL table by making a smaller, though still significant, number of calls to check column existence in the information_schema. Drill seems perfect for implicitly discovering this. Right now, I am dumping to file. It would be great to have a way to avoid declaring types when I do not know the name of every column in the data. The data is acquired from a source that only provides key-value pairs when the values are not null.
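If the goal is just an inventory of which keys and types actually occur, plain PostgreSQL can produce one without any application-side scanning. A sketch over the top-level keys (`mytable` and `data` stand in for the real table and column; nested objects would need a recursive variant):

```sql
-- List each top-level key, its JSON value type, and how often it appears.
SELECT e.key,
       jsonb_typeof(e.value) AS value_type,
       count(*)              AS occurrences
FROM mytable,
     LATERAL jsonb_each(data) AS e(key, value)
GROUP BY e.key, jsonb_typeof(e.value)
ORDER BY e.key;
```

Because jsonb_each only emits keys that are present, this also handles the source's habit of omitting null-valued pairs.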

Also, thanks.  
________________________________________
From: MattK [matt@hybriddba.com]
Sent: Wednesday, May 25, 2016 10:51 AM
To: user@drill.apache.org
Subject: Re: Apache Drill, Query PostgreSQL text or Jsonb as if it were from a json storage type?

Would the PostgreSQL function jsonb_to_recordset(jsonb) help in this
case?

It would return to Drill a table instead of a set of JSON objects, but
you would have to declare the types in the call.


Re: Apache Drill, Query PostgreSQL text or Jsonb as if it were from a json storage type?

Posted by MattK <ma...@hybriddba.com>.
Would the PostgreSQL function jsonb_to_recordset(jsonb) help in this 
case?

It would return to Drill a table instead of a set of JSON objects, but 
you would have to declare the types in the call.
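To make the column-definition requirement concrete, a minimal sketch against the sample record (field names taken from it; `mytable` and `data` are placeholders, and jsonb_to_recordset expects a jsonb array, which the `event` field is):

```sql
-- Explode the nested event array into rows; types must be declared in AS.
SELECT m.pkey, r.*
FROM mytable AS m,
     jsonb_to_recordset(m.data -> 'loadCaseResult' -> 'case' -> 'events' -> 'event')
       AS r("date" text, "type" text, comment text);
```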
