Posted to user@flink.apache.org by françois lacombe <fr...@dcbrain.com> on 2018/08/23 16:54:03 UTC

AvroSchemaConverter and Tuple classes

Hi all,

I'm looking for best practices for creating Tuple<T> instances.

I have a TypeInformation object produced by
AvroSchemaConverter.convertToTypeInfo("{...}");
Is it possible to define a corresponding Tuple<T> instance from it (that is,
to obtain T from the TypeInformation)?

Example :
{
  "type": "record",
  "fields": [
    { "name": "field1", "type": "int" },
    { "name": "field2", "type": "string"}
]}
 = Tuple2<Integer, String>

The same question arises with DataSet or any other record-handling class
with parameterized types.

My goal is to parse several CSV files with different structures, each
described in an Avro schema.
It would be great not to hard-code the structures in my Java code and to
obtain the type information only at runtime from the Avro schemas.

Is this possible?

Thanks in advance

François Lacombe
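
The goal described in the question above can be illustrated without Flink. What follows is only a minimal sketch in plain Java, assuming the field types have already been read from the Avro schema as a list of primitive type names ("int", "string", and so on). The class SchemaDrivenCsv and the methods parseCell and parseLine are hypothetical names chosen for this illustration, and the naive comma split does not handle quoted fields. Each record is an untyped Object[] whose arity is only known at runtime, which is the same idea as using a Row rather than a Tuple<T>.

```java
import java.util.Arrays;
import java.util.List;

class SchemaDrivenCsv {

    /** Converts one raw CSV cell according to an Avro primitive type name. */
    static Object parseCell(String raw, String avroType) {
        switch (avroType) {
            case "int":     return Integer.valueOf(raw.trim());
            case "long":    return Long.valueOf(raw.trim());
            case "float":   return Float.valueOf(raw.trim());
            case "double":  return Double.valueOf(raw.trim());
            case "boolean": return Boolean.valueOf(raw.trim());
            case "string":  return raw;
            default:
                throw new IllegalArgumentException("Unsupported type: " + avroType);
        }
    }

    /** Parses one CSV line into an untyped record; the arity comes from the type list. */
    static Object[] parseLine(String line, List<String> avroTypes) {
        // Naive split: assumption for this sketch, no quoting or escaping support.
        String[] cells = line.split(",", -1);
        if (cells.length != avroTypes.size()) {
            throw new IllegalArgumentException(
                "Expected " + avroTypes.size() + " fields but got " + cells.length);
        }
        Object[] record = new Object[cells.length];
        for (int i = 0; i < cells.length; i++) {
            record[i] = parseCell(cells[i], avroTypes.get(i));
        }
        return record;
    }

    public static void main(String[] args) {
        // Field types matching the example schema: field1 is int, field2 is string.
        List<String> types = Arrays.asList("int", "string");
        Object[] record = parseLine("42,hello", types);
        System.out.println(Arrays.toString(record)); // prints [42, hello]
    }
}
```

Because the record type is Object[], nothing about the structure is fixed at compile time; the cost is that field types are only checked when a line is parsed.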

Re: AvroSchemaConverter and Tuple classes

Posted by françois lacombe <fr...@dcbrain.com>.
Thank you all for your answers.

It works with BatchTableSource<Row>.


All the best

François

2018-08-26 17:40 GMT+02:00 Rong Rong <wa...@gmail.com>:

> Yes, you should be able to use Row instead of Tuple in your
> BatchTableSink<T>.
> There are sections in the Flink documentation on mapping data types to
> table schemas [1], and a table can also be converted into DataStreams of
> various types [2]. I hope these are helpful.
>
> Thanks,
> Rong
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/table/common.html#mapping-of-data-types-to-table-schema
> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/table/common.html#convert-a-table-into-a-datastream-or-dataset

Re: AvroSchemaConverter and Tuple classes

Posted by Rong Rong <wa...@gmail.com>.
Yes, you should be able to use Row instead of Tuple in your
BatchTableSink<T>.
There are sections in the Flink documentation on mapping data types to
table schemas [1], and a table can also be converted into DataStreams of
various types [2]. I hope these are helpful.

Thanks,
Rong

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/table/common.html#mapping-of-data-types-to-table-schema
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/table/common.html#convert-a-table-into-a-datastream-or-dataset



On Fri, Aug 24, 2018 at 8:04 AM françois lacombe <
francois.lacombe@dcbrain.com> wrote:

> Hi Timo,
>
> Thanks for your answer.
> I was looking for a Tuple to feed a BatchTableSink<T> subclass, but could
> this be achieved with a Row instead?
>
> All the best
>
> François

Re: AvroSchemaConverter and Tuple classes

Posted by françois lacombe <fr...@dcbrain.com>.
Hi Timo,

Thanks for your answer.
I was looking for a Tuple to feed a BatchTableSink<T> subclass, but could
this be achieved with a Row instead?

All the best

François

2018-08-24 10:21 GMT+02:00 Timo Walther <tw...@apache.org>:

> Hi,
>
> Tuples are just a subcategory of rows, because the tuple arity is limited
> to 25 fields. I think the easiest solution would be to write your own
> converter that maps rows to tuples, provided you know that you will not
> need more than 25 fields. Otherwise, it might be easier to just use a
> TextInputFormat and do the parsing yourself with a library.
>
> Regards,
> Timo

Re: AvroSchemaConverter and Tuple classes

Posted by Timo Walther <tw...@apache.org>.
Hi,

Tuples are just a subcategory of rows, because the tuple arity is
limited to 25 fields. I think the easiest solution would be to write
your own converter that maps rows to tuples, provided you know that you
will not need more than 25 fields. Otherwise, it might be easier to just
use a TextInputFormat and do the parsing yourself with a library.

Regards,
Timo
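
The converter idea in the reply above might look roughly like the following. This is a hedged sketch in plain Java and not Flink code: Pair is a hypothetical stand-in for a fixed-arity tuple such as Tuple2, an Object[] stands in for a row, and the names RowToTupleSketch and toPair are assumptions made for illustration. It also shows why the type parameters cannot come from the schema alone: the caller has to name the field classes in the source code.

```java
class RowToTupleSketch {

    /** Stand-in for a fixed-arity tuple such as Flink's Tuple2; not the Flink class. */
    static final class Pair<A, B> {
        final A f0;
        final B f1;

        Pair(A f0, B f1) {
            this.f0 = f0;
            this.f1 = f1;
        }

        @Override
        public String toString() {
            return "(" + f0 + "," + f1 + ")";
        }
    }

    /** Maps an untyped two-field row to a typed pair, checking arity and field types. */
    static <A, B> Pair<A, B> toPair(Object[] row, Class<A> type0, Class<B> type1) {
        if (row.length != 2) {
            throw new IllegalArgumentException("Expected 2 fields but got " + row.length);
        }
        // Class.cast throws ClassCastException if a field has an unexpected type.
        return new Pair<>(type0.cast(row[0]), type1.cast(row[1]));
    }

    public static void main(String[] args) {
        Object[] row = {42, "hello"};
        Pair<Integer, String> pair = toPair(row, Integer.class, String.class);
        System.out.println(pair); // prints (42,hello)
    }
}
```

A separate converter would be needed for each arity, which is one reason a generic row structure is simpler when the schemas are only known at runtime.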

