Posted to dev@impala.apache.org by yu feng <ol...@gmail.com> on 2017/04/18 03:02:24 UTC

impala support json format table

Hi Impala community,
  I am new to Impala. In our scenario, we have many tables shared by Hive
and Impala in various storage formats, such as Parquet, text files, text
files containing JSON, ORC, and so on. Our plan requires support for at
least text files containing JSON. I would like to know the Impala
community's attitude toward supporting the JSON format. If this matches the
roadmap, maybe I can make some contribution.

By the way, where can I find the newest roadmap for Impala?

I look forward to your reply.

Re: impala support json format table

Posted by Jim Apple <jb...@cloudera.com>.
CC:  yu feng <ol...@gmail.com>

Yu, I don't think I have seen you post here before. This mailing list sets
a Reply-To that does not include the poster's email address. Perhaps you
already knew this or were already subscribed to the list, but I have CCed
you just in case.

On Tue, Apr 18, 2017 at 10:55 AM, Alexander Behm <al...@cloudera.com>
wrote:

> The existing attempt used the Rapidjson library to do the parsing.
> Unfortunately, the Rapidjson API is not very convenient for Impala because
> it returns typed data, i.e., it internally converts to
> float/double/int/whatever which is problematic for decimal (among others).
> Ideally, we would use the same Impala code to convert data types from
> strings.

Re: impala support json format table

Posted by Alexander Behm <al...@cloudera.com>.
Great!

On Tue, May 9, 2017 at 11:40 PM, yu feng <ol...@gmail.com> wrote:

> Thanks for reminding me, I very glad to do some contribution for impala, I
> will try to solve IMPALA-5016
> <https://issues.apache.org/jira/browse/IMPALA-5016> once I have time, and
> keep communication with community.

Re: impala support json format table

Posted by yu feng <ol...@gmail.com>.
Thanks for the reminder. I am very glad to contribute to Impala. I will
try to solve IMPALA-5016
<https://issues.apache.org/jira/browse/IMPALA-5016> once I have time, and
will keep in communication with the community.

2017-05-10 12:59 GMT+08:00 Alexander Behm <al...@cloudera.com>:

> Hi Yu,
>
> glad to hear that you are considering contributing to Impala! As others
> have mentioned before, I'd strongly recommend starting with a smaller task
> to get accustomed our development workflows.
>
> I'm happy to help with finding a suitable task. For example, you might find
> this JIRA interesting:
> https://issues.apache.org/jira/browse/IMPALA-5016
>
> Please do reach out if I can help.
>
> Alex

Re: impala support json format table

Posted by Alexander Behm <al...@cloudera.com>.
Hi Yu,

Glad to hear that you are considering contributing to Impala! As others
have mentioned before, I'd strongly recommend starting with a smaller task
to get accustomed to our development workflows.

I'm happy to help with finding a suitable task. For example, you might find
this JIRA interesting:
https://issues.apache.org/jira/browse/IMPALA-5016

Please do reach out if I can help.

Alex

On Tue, Apr 18, 2017 at 10:55 AM, Alexander Behm <al...@cloudera.com>
wrote:

> The existing attempt used the Rapidjson library to do the parsing.
> Unfortunately, the Rapidjson API is not very convenient for Impala because
> it returns typed data, i.e., it internally converts to
> float/double/int/whatever which is problematic for decimal (among others).
> Ideally, we would use the same Impala code to convert data types from
> strings.

Re: impala support json format table

Posted by Alexander Behm <al...@cloudera.com>.
The existing attempt used the Rapidjson library to do the parsing.
Unfortunately, the Rapidjson API is not very convenient for Impala because
it returns typed data, i.e., it internally converts to
float/double/int/whatever which is problematic for decimal (among others).
Ideally, we would use the same Impala code to convert data types from
strings.
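
To illustrate the typed-data concern, here is a minimal Python sketch.
Python's standard json module stands in for a parser that returns typed
data; this is not RapidJSON or Impala code, and the field name "price" is
made up. It shows a number token being converted to a binary double by
default, and the alternative of passing the original text through so that a
separate conversion routine decides the final type.

```python
import json
from decimal import Decimal

# Hypothetical input row; "price" has more digits than a double can hold.
raw = '{"price": 1234567890.12345678901234567890}'

# Default behaviour: the parser converts the number token to a float (double).
typed = json.loads(raw)

# Alternative: hand the original text of each number token to the caller.
# parse_float/parse_int receive the token as a string, so str keeps it as-is.
as_text = json.loads(raw, parse_float=str, parse_int=str)

# The caller's own string-to-decimal conversion sees every digit.
price_decimal = Decimal(as_text["price"])

# Decimal(float) exposes the exact binary value the double actually stores,
# which differs from the text that was in the file.
price_from_double = Decimal(typed["price"])
```

Once the value has gone through a double, the digits beyond what a double
can represent are gone, which is why converting from the original string is
preferable for decimal columns.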

On Tue, Apr 18, 2017 at 9:27 AM, Tim Armstrong <ta...@cloudera.com>
wrote:

> Seems like useful functionality that would be great to have in Impala.
> There was an earlier attempt to do this that didn't make it in - I'm not
> sure that the approach was quite right:
> https://gerrit.cloudera.org/#/c/1201/1 . I'm not sure what the exact
> problems were but I remember we didn't think it was quite the right
> approach.
>
> I think we'd need to talk through a design first because there are a lot of
> considerations and we want to make sure to get it right. I had some initial
> questions that I'd want to think through before adding a JSON scanner.
>
>    - What JSON does it accept?
>    - How do we declare a table schema and map it to the JSON
>    - How does it handle missing or extra fields - does it just return null
>    or drop the fields? What if the field type is wrong?
>    - How do the numeric types work? JSON only supports floating point, but
>    I think many people would like to store higher-precision decimal or
> 64-bit
>    integer types (which is technically outside of the JSON standard).
>    - Will it support codegen? If not, is it written in a way that allows it
>    in future?
>
> Cheers,
> Tim
>
> - Tim

Re: impala support json format table

Posted by Tim Armstrong <ta...@cloudera.com>.
+1 to starting with some smaller tasks

On 18 Apr. 2017 9:32 am, "Jim Apple" <jb...@cloudera.com> wrote:

> That's a good point, Tim. Generally, new contributors might want to start
> with a newbie bug:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20IMPALA%20AND%20status%20%3D%20Open%20AND%20labels%20%3D%20newbie
>
> JSON support is a larger project, and it might not be the one most amenable
> to learning about the Impala community's style and processes.

Re: impala support json format table

Posted by Jim Apple <jb...@cloudera.com>.
That's a good point, Tim. Generally, new contributors might want to start
with a newbie bug:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20IMPALA%20AND%20status%20%3D%20Open%20AND%20labels%20%3D%20newbie

JSON support is a larger project, and it might not be the one most amenable
to learning about the Impala community's style and processes.

On Tue, Apr 18, 2017 at 9:27 AM, Tim Armstrong <ta...@cloudera.com>
wrote:

> Seems like useful functionality that would be great to have in Impala.
> There was an earlier attempt to do this that didn't make it in - I'm not
> sure that the approach was quite right:
> https://gerrit.cloudera.org/#/c/1201/1 . I'm not sure what the exact
> problems were but I remember we didn't think it was quite the right
> approach.
>
> I think we'd need to talk through a design first because there are a lot of
> considerations and we want to make sure to get it right. I had some initial
> questions that I'd want to think through before adding a JSON scanner.
>
>    - What JSON does it accept?
>    - How do we declare a table schema and map it to the JSON
>    - How does it handle missing or extra fields - does it just return null
>    or drop the fields? What if the field type is wrong?
>    - How do the numeric types work? JSON only supports floating point, but
>    I think many people would like to store higher-precision decimal or
> 64-bit
>    integer types (which is technically outside of the JSON standard).
>    - Will it support codegen? If not, is it written in a way that allows it
>    in future?
>
> Cheers,
> Tim
>
> - Tim

Re: impala support json format table

Posted by Tim Armstrong <ta...@cloudera.com>.
Seems like useful functionality that would be great to have in Impala.
There was an earlier attempt to do this that didn't make it in - I'm not
sure that the approach was quite right:
https://gerrit.cloudera.org/#/c/1201/1 . I'm not sure what the exact
problems were but I remember we didn't think it was quite the right
approach.

I think we'd need to talk through a design first because there are a lot of
considerations and we want to make sure to get it right. I had some initial
questions that I'd want to think through before adding a JSON scanner.

   - What JSON does it accept?
   - How do we declare a table schema and map it to the JSON?
   - How does it handle missing or extra fields - does it just return null
   or drop the fields? What if the field type is wrong?
   - How do the numeric types work? JSON only supports floating point, but
   I think many people would like to store higher-precision decimal or 64-bit
   integer types (which is technically outside of the JSON standard).
   - Will it support codegen? If not, is it written in a way that allows it
   in future?
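
As one possible set of answers to the questions above, sketched in Python
rather than in Impala's own code: missing fields and fields of the wrong
type become NULL (None), extra fields are dropped, and numbers are kept as
text until the declared column type converts them. The names SCHEMA,
NumberToken, and scan_row are hypothetical, and none of these choices were
decided in this thread.

```python
import json
from decimal import Decimal, InvalidOperation


class NumberToken(str):
    """Original text of a JSON number, kept unconverted by the parser."""


def to_bigint(value):
    # Only accept JSON number tokens, then enforce the signed 64-bit range.
    if not isinstance(value, NumberToken):
        raise ValueError("not a JSON number")
    n = int(value)  # raises ValueError for tokens such as "1.5"
    if not -2**63 <= n < 2**63:
        raise ValueError("outside the signed 64-bit range")
    return n


def to_decimal(value):
    if not isinstance(value, NumberToken):
        raise ValueError("not a JSON number")
    return Decimal(value)  # converts from the original digits


def to_string(value):
    if isinstance(value, NumberToken) or not isinstance(value, str):
        raise ValueError("not a JSON string")
    return value


# Hypothetical table schema: column name -> converter for that column type.
SCHEMA = {"id": to_bigint, "price": to_decimal, "name": to_string}


def scan_row(line, schema=SCHEMA):
    """Map one JSON object (one line of text) onto the declared columns."""
    obj = json.loads(line, parse_int=NumberToken, parse_float=NumberToken)
    row = {}
    for column, convert in schema.items():
        try:
            row[column] = convert(obj[column])
        except (KeyError, TypeError, ValueError, InvalidOperation):
            # Missing field or wrong type: return NULL for this column.
            row[column] = None
    # Fields not named in the schema are never looked at, so they are dropped.
    return row
```

For example, scan_row('{"id": 7, "price": 19.990, "extra": true}') yields a
row with id 7, price Decimal("19.990"), and name None; the "extra" field is
ignored.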

Cheers,
Tim

On Tue, Apr 18, 2017 at 8:52 AM, Jim Apple <jb...@cloudera.com> wrote:

> On Mon, Apr 17, 2017 at 8:02 PM, yu feng <ol...@gmail.com> wrote:
>
> > Hi impala community:
> >   I am Newly join to Impala,
>
>
> Welcome!
>
> > I want to know what is the attitude of impala
> > community for supporting json format.
>
>
> I am in favor of it. I am only one person, though - anybody else object to
> JSON support?
>
> > If this match the roadmap, maybe I
> > can make some contribution.
>
> I do not recall much talk about Apache Impala's roadmap since we joined the
> ASF. Perhaps I missed a thread about it?
>

Re: impala support json format table

Posted by Jim Apple <jb...@cloudera.com>.
On Mon, Apr 17, 2017 at 8:02 PM, yu feng <ol...@gmail.com> wrote:

> Hi impala community:
>   I am Newly join to Impala,


Welcome!

> I want to know what is the attitude of impala
> community for supporting json format.


I am in favor of it. I am only one person, though - anybody else object to
JSON support?

> If this match the roadmap, maybe I
> can make some contribution.

I do not recall much talk about Apache Impala's roadmap since we joined the
ASF. Perhaps I missed a thread about it?