Posted to user@flink.apache.org by Luqman Ghani <lg...@gmail.com> on 2017/02/12 08:30:08 UTC

Specifying Schema dynamically

Hi,

I hope everyone is doing well.

I have a use case where we infer the schema from file headers and other
information. In Flink, we can specify the schema of a stream with case
classes and tuples. With tuples, we cannot give names to fields, and with
case classes we would have to generate them on the fly. Is there any way
of specifying the schema to Flink as a Map[String, Any], so it can infer
the schema from that map?

Like if a file has a header: id, first_name, last_name, last_login
and we infer schema as: Int, String, String, Long

Can we specify it as Map[String, Any]("id" -> Int, "first_name" -> String,
"last_name" -> String, "last_login" -> Long)?

We want to use keyBy with field names instead of their indices. I hope
there is a way :)
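Setting Flink's actual API aside, the intent can be sketched in plain
Scala: carry rows as Map[String, Any] next to a schema map, and group by
field name. Everything here (the Row alias, groupByField as a stand-in for
keyBy) is a hypothetical illustration, not Flink API.

```scala
// Plain-Scala sketch (no Flink imports): rows as Map[String, Any] with a
// separate schema map, and grouping keyed by field name. groupByField is
// a hypothetical stand-in for keyBy("field"), not Flink API.
object DynamicSchemaSketch {
  type Row = Map[String, Any]

  // Schema inferred elsewhere, e.g. from the file header.
  val schema: Map[String, Class[_]] = Map(
    "id"         -> classOf[Int],
    "first_name" -> classOf[String],
    "last_name"  -> classOf[String],
    "last_login" -> classOf[Long]
  )

  // Stand-in for keyBy("last_name"): group rows by a named field.
  def groupByField(rows: Seq[Row], field: String): Map[Any, Seq[Row]] =
    rows.groupBy(_(field))

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(
      Map("id" -> 1, "first_name" -> "Ada",  "last_name" -> "Lovelace", "last_login" -> 100L),
      Map("id" -> 2, "first_name" -> "Alan", "last_name" -> "Turing",   "last_login" -> 200L),
      Map("id" -> 3, "first_name" -> "Anne", "last_name" -> "Lovelace", "last_login" -> 300L)
    )
    // Two rows share last_name "Lovelace", so they land in one group.
    println(groupByField(rows, "last_name").view.mapValues(_.size).toMap)
  }
}
```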

I was looking into dynamically creating case classes in Scala using
scala-reflect, but I'm facing problems in getting that class and
forwarding it to the Flink program.
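For reference, a case class can be compiled at runtime with scala-reflect's
ToolBox (this requires the scala-compiler jar on the classpath). This is
only a sketch of that part; getting the resulting Class[_] back into a
statically typed Flink DataStream is exactly the problem described above.

```scala
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

// Sketch: compile a case class definition at runtime from source built
// out of the inferred schema. Requires scala-compiler on the classpath.
object RuntimeCaseClass {
  def compileCaseClass(source: String): Class[_] = {
    val tb = currentMirror.mkToolBox()
    // The source ends in a classOf[...] expression, so evaluating the
    // compiled snippet hands back the generated Class[_].
    tb.compile(tb.parse(source))().asInstanceOf[Class[_]]
  }

  def main(args: Array[String]): Unit = {
    val clazz = compileCaseClass(
      "case class User(id: Int, first_name: String, last_name: String, last_login: Long); classOf[User]")
    println(clazz.getName)
  }
}
```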

Thanks,
Luqman

Re: Specifying Schema dynamically

Posted by Luqman Ghani <lg...@gmail.com>.
Hi,

My case is very similar to what is described in this section of the Spark
documentation:
http://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema

I hope this clarifies it.

Thanks,
Luqman

On Mon, Feb 13, 2017 at 12:04 PM, Tzu-Li (Gordon) Tai <tz...@apache.org>
wrote:

> Hi Luqman,
>
> From your description, it seems that you want to infer the type (case
> class, tuple, etc.) of a stream dynamically at runtime.
> AFAIK, this is not supported in Flink. You’re required to have
> defined types for your DataStreams.
>
> Could you also provide example code of what the functionality you have
> in mind looks like?
> That would help clarify if I have misunderstood and there’s actually a way
> to do it.
>
> - Gordon
>
> On February 12, 2017 at 4:30:56 PM, Luqman Ghani (lgsahaf@gmail.com)
> wrote:
>
> Like if a file has a header: id, first_name, last_name, last_login
> and we infer schema as: Int, String, String, Long
>
>

Re: Specifying Schema dynamically

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Hi Luqman,

From your description, it seems that you want to infer the type (case class, tuple, etc.) of a stream dynamically at runtime.
AFAIK, this is not supported in Flink. You’re required to have defined types for your DataStreams.

Could you also provide example code of what the functionality you have in mind looks like?
That would help clarify if I have misunderstood and there’s actually a way to do it.

- Gordon

On February 12, 2017 at 4:30:56 PM, Luqman Ghani (lgsahaf@gmail.com) wrote:

Like if a file has a header: id, first_name, last_name, last_login
and we infer schema as: Int, String, String, Long