Posted to dev@arrow.apache.org by David Coe <Da...@microsoft.com.INVALID> on 2023/03/09 20:19:05 UTC

Field class in Java vs C#

I am interested in the difference between how a Field is structured in Java (with children) and in C# (no children) and why that's the case.

I am looking to port apache/arrow-adbc (https://github.com/apache/arrow-adbc) to C#, but the concept of children is making it a little hairy.


- David


Re: [EXTERNAL] Re: Field class in Java vs C#

Posted by Will Jones <wi...@gmail.com>.
Hi David Coe,

As David Li pointed out, ADBC implementations can either be based purely
within a language (C#-specific drivers that can only be used by C#
programs) or use C API drivers written in other languages (C, C++, Go). For
the latter, we won't be able to implement this until we finish implementing
the C Data Interface [1] and C Stream Interface [2]. And for both
approaches, I think we need to implement Union and Map types for GetInfo,
which currently aren't implemented in Arrow C#.

Best,
Will Jones

[1] https://github.com/apache/arrow/issues/33856
[2] https://github.com/apache/arrow/issues/33857



RE: [EXTERNAL] Re: Field class in Java vs C#

Posted by David Coe <Da...@microsoft.com.INVALID>.
Yes, ok, I see the pattern now. Thank you.


Re: [EXTERNAL] Re: Field class in Java vs C#

Posted by David Li <li...@apache.org>.
I believe it would be something like (pseudocode since the last time I touched C♯ was, 2009?)

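// Still a sketch: "..." stands for the other fields, which are elided here.
List<Field> TABLE_SCHEMA = new List<Field> {
  ...,
  // COLUMN_SCHEMA becomes the fields of the StructType, which the ListType wraps.
  // The C# Field constructor also takes a nullable flag; "true" is an assumption.
  new Field("table_columns", new ListType(new StructType(COLUMN_SCHEMA)), nullable: true),
  ...,
};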

i.e. COLUMN_SCHEMA gets passed as the fields of a StructType itself instead of the field containing the StructType. (Which saves you some typing too since you don't have to explicitly name the list child field.)
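
For anyone porting in this direction, the move from "children on the field" to "children inside the type" can be sketched as a small recursive conversion. This is only an illustration: FieldWithChildren, TypedField, typeOf and convert are hypothetical stand-ins written for this example, not classes from Arrow Java or Arrow C#, and the two column names are only sample data.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class ChildrenOnTypeSketch {
    // Hypothetical Java-style field: children hang off the field itself.
    static final class FieldWithChildren {
        final String name;
        final String kind; // "list", "struct", or a primitive name such as "utf8"
        final List<FieldWithChildren> children;

        FieldWithChildren(String name, String kind, List<FieldWithChildren> children) {
            this.name = name;
            this.kind = kind;
            this.children = children;
        }
    }

    // Hypothetical C#-style field: the children are folded into the type description.
    static final class TypedField {
        final String name;
        final String type;

        TypedField(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Recursively folds a field's children into a nested type description.
    static String typeOf(FieldWithChildren f) {
        switch (f.kind) {
            case "list":
                // Only the element type survives; the list child's own name is dropped.
                return "list<" + typeOf(f.children.get(0)) + ">";
            case "struct":
                return f.children.stream()
                        .map(c -> c.name + ": " + typeOf(c))
                        .collect(Collectors.joining(", ", "struct<", ">"));
            default:
                return f.kind;
        }
    }

    static TypedField convert(FieldWithChildren f) {
        return new TypedField(f.name, typeOf(f));
    }

    public static void main(String[] args) {
        // Sample columns only; the real COLUMN_SCHEMA has more fields.
        List<FieldWithChildren> columnSchema = Arrays.asList(
                new FieldWithChildren("column_name", "utf8", Collections.emptyList()),
                new FieldWithChildren("ordinal_position", "int32", Collections.emptyList()));
        FieldWithChildren tableColumns = new FieldWithChildren("table_columns", "list",
                Collections.singletonList(
                        new FieldWithChildren("item", "struct", columnSchema)));

        TypedField converted = convert(tableColumns);
        System.out.println(converted.name + " -> " + converted.type);
    }
}
```

Note that the explicitly named list child ("item" here) disappears in the converted form, which is the typing saved in the C# version.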


RE: [EXTERNAL] Re: Field class in Java vs C#

Posted by David Coe <Da...@microsoft.com.INVALID>.
I am investigating whether ADBC can be a replacement for ODBC in certain scenarios and help with more efficient copying.

For example, https://github.com/apache/arrow-adbc/blob/923e0408fe5a32cc6501b997fafa8316ace25fe0/java/core/src/main/java/org/apache/arrow/adbc/core/StandardSchemas.java#L116 passes COLUMN_SCHEMA and CONSTRAINT_SCHEMA as children, but in C# there's no obvious way to add those children to the respective fields.


Re: Field class in Java vs C#

Posted by David Li <li...@apache.org>.
I'd be very interested if I can help in any way with porting ADBC to more languages, and learning more about use cases/what functionality is useful (e.g. are you looking to have a full driver/client ecosystem in C♯, or are you interested in being able to leverage drivers written in C/C++/Go?)

From a quick look, C♯ follows C++, Python, etc. in putting child fields as part of the nested type, rather than as part of the field itself. I can't say why precisely one implementation chose one design or another, but Java basically follows the IPC format exactly in this regard (and others, e.g. it has a parameterized Int type rather than Int32, Int64, UInt32, etc.), while the other languages model it at a higher level (because only some types can have children).
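
To make the parenthetical about integer types concrete, here is a minimal sketch of how a parameterized integer type (bit width plus signedness, as in the IPC format) lines up with the one-type-per-width naming. IntParams and concreteName are hypothetical names made up for this example, not part of any Arrow library.

```java
public class IntTypeSketch {
    // Hypothetical stand-in for a parameterized integer type:
    // one class, configured by bit width and signedness.
    static final class IntParams {
        final int bitWidth;
        final boolean signed;

        IntParams(int bitWidth, boolean signed) {
            this.bitWidth = bitWidth;
            this.signed = signed;
        }
    }

    // Maps the parameterized form to the per-width names (Int32, UInt64, ...)
    // that the higher-level models expose as separate types.
    static String concreteName(IntParams p) {
        switch (p.bitWidth) {
            case 8:
            case 16:
            case 32:
            case 64:
                return (p.signed ? "Int" : "UInt") + p.bitWidth;
            default:
                throw new IllegalArgumentException("unsupported bit width: " + p.bitWidth);
        }
    }

    public static void main(String[] args) {
        System.out.println(concreteName(new IntParams(32, true)));
        System.out.println(concreteName(new IntParams(64, false)));
    }
}
```

The same trade-off applies to children: a single general shape mirrors the wire format, while separate concrete types let only the nested ones carry child fields.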

What specifically is difficult about how the APIs are structured?
