Posted to dev@beam.apache.org by Reuven Lax <re...@google.com> on 2018/02/02 21:23:58 UTC

rename: BeamRecord -> Row

We're looking at renaming the BeamRecord class
<https://github.com/apache/beam/pull/4550>, which is used for columnar
data. There was enough discussion on the naming that I want to make
sure the dev list is aware of the naming plans here.

BeamRecord is a columnar, field-based record. Currently it's used by
BeamSQL, and the plan is to use it for schemas as well. "Record" is a
confusing name for this class, as all elements in the Beam model are
referred to as "records," whether or not they have schemas. "Row" is a much
clearer name.
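
As a rough illustration of what "field-based record" means here, the following is a minimal sketch and not Beam's actual API. The class names FieldBasedRowSketch, Schema, and Row, and the methods indexOf and get, are hypothetical and chosen only for this example: values are stored positionally and looked up by field name through a schema.

```java
import java.util.Arrays;
import java.util.List;

public class FieldBasedRowSketch {
    // Hypothetical schema: just an ordered list of field names.
    static final class Schema {
        final List<String> fieldNames;

        Schema(String... fieldNames) {
            this.fieldNames = Arrays.asList(fieldNames);
        }

        // Returns the position of a field, or fails loudly if it is unknown.
        int indexOf(String name) {
            int i = fieldNames.indexOf(name);
            if (i < 0) {
                throw new IllegalArgumentException("Unknown field: " + name);
            }
            return i;
        }
    }

    // Hypothetical row: one value per schema field, in schema order.
    static final class Row {
        final Schema schema;
        final List<Object> values;

        Row(Schema schema, Object... values) {
            if (values.length != schema.fieldNames.size()) {
                throw new IllegalArgumentException("Value count does not match schema");
            }
            this.schema = schema;
            this.values = Arrays.asList(values);
        }

        // Field access by name goes through the schema.
        Object get(String name) {
            return values.get(schema.indexOf(name));
        }
    }

    public static void main(String[] args) {
        Schema schema = new Schema("name", "age");
        Row row = new Row(schema, "alice", 42);
        System.out.println(row.get("name") + " " + row.get("age"));
    }
}
```

The point of the sketch is only that every such element carries named fields, which is what distinguishes it from an arbitrary "record" in a pipeline.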

There was a lot of discussion about whether to name this BeamRow or just
plain Row (in the org.apache.beam.sdk.values namespace). The argument in
favor of BeamRow was that people wouldn't be forced to qualify their type
names in the case of a conflict with a Row from another package. The
argument in favor of Row was that it's a better name, it's in the Beam
namespace anyway, and it's what the rest of the world (Cassandra, Hive,
Spark, etc.) calls similar classes.
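
To make the qualification concern concrete, here is a small sketch under stated assumptions. BeamValues and OtherLib are hypothetical stand-ins for two packages that each define a class called Row; they are modeled as nested classes only so the example fits in one file. Code that touches both types has to spell out which Row it means at every use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowNameClash {
    // Hypothetical stand-in for the Beam values package.
    static final class BeamValues {
        static final class Row {
            final List<Object> values;

            Row(List<Object> values) {
                this.values = values;
            }
        }
    }

    // Hypothetical stand-in for another library that also ships a Row.
    static final class OtherLib {
        static final class Row {
            final String[] cells;

            Row(String... cells) {
                this.cells = cells;
            }
        }
    }

    // Both types share the simple name "Row", so each use is qualified.
    static BeamValues.Row convert(OtherLib.Row in) {
        return new BeamValues.Row(new ArrayList<Object>(Arrays.asList(in.cells)));
    }

    public static void main(String[] args) {
        BeamValues.Row out = convert(new OtherLib.Row("a", "b"));
        System.out.println(out.values);
    }
}
```

With a distinct name such as BeamRow, the simple names would no longer collide and neither use would need a qualifier; that is the trade-off against the shorter, more conventional name.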

Right now, consensus on the PR is leaning toward Row. If you feel strongly,
please speak up :)

Reuven

Re: rename: BeamRecord -> Row

Posted by Reuven Lax <re...@google.com>.
This thread exists so others can weigh in on the name if they want :) If I
don't hear any conflicting opinions, I'll merge the PR with Row (though of
course this is all still an experimental API, so we can easily change our
minds about the name later).

On Sat, Feb 3, 2018 at 9:44 AM, Romain Manni-Bucau <rm...@gmail.com>
wrote:

> This is as true as the renaming is not needed so I guess the PR owner will
> decide ;). Thanks for the clarification.

Re: rename: BeamRecord -> Row

Posted by Romain Manni-Bucau <rm...@gmail.com>.
This is as true as the renaming is not needed so I guess the PR owner will
decide ;). Thanks for the clarification.


Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/application-development/java-ee-8-high-performance>

2018-02-03 18:36 GMT+01:00 Reuven Lax <re...@google.com>:

> Oh, I agree 100%. However, I'm just saying that we shouldn't ask the SQL
> effort to halt just because the schema effort overlaps. There's at least
> one other pending PR on this class (to do with automatic POJO generation).
>
> Also, the name of the Record/Row class is somewhat independent of
> everything else in the schema discussion, and doesn't really need to block
> on that. That's why I started this thread. There was enough discussion on
> the PR itself that I felt that the community should be aware, as I assume
> not everyone follows all PR discussions :)
>
> Reuven

Re: rename: BeamRecord -> Row

Posted by Reuven Lax <re...@google.com>.
Oh, I agree 100%. However, I'm just saying that we shouldn't ask the SQL
effort to halt just because the schema effort overlaps. There's at least
one other pending PR on this class (to do with automatic POJO generation).

Also, the name of the Record/Row class is somewhat independent of everything
else in the schema discussion, and doesn't really need to block on that.
That's why I started this thread. There was enough discussion on the PR
itself that I felt that the community should be aware, as I assume not
everyone follows all PR discussions :)

Reuven

On Sat, Feb 3, 2018 at 9:00 AM, Romain Manni-Bucau <rm...@gmail.com>
wrote:

> I know, Reuven, but when you check what it does, it is exactly the same,
> and the current work will be replaced by the schema work, so it is better
> to avoid a round trip of work which will be thrown away in any case. Also
> note that the current structure is flat and very limiting for modern SQL,
> so aligning both will benefit Beam in any case. Isn't it better to ensure
> all parts of the project move in the same direction instead of requiring
> yet another layer of conversion?

Re: rename: BeamRecord -> Row

Posted by Romain Manni-Bucau <rm...@gmail.com>.
I know, Reuven, but when you check what it does, it is exactly the same,
and the current work will be replaced by the schema work, so it is better
to avoid a round trip of work which will be thrown away in any case. Also
note that the current structure is flat and very limiting for modern SQL,
so aligning both will benefit Beam in any case. Isn't it better to ensure
all parts of the project move in the same direction instead of requiring
yet another layer of conversion?


Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/application-development/java-ee-8-high-performance>

2018-02-03 16:32 GMT+01:00 Reuven Lax <re...@google.com>:

> This is a core part of SQL which is ongoing.

Re: rename: BeamRecord -> Row

Posted by Reuven Lax <re...@google.com>.
This is a core part of SQL which is ongoing.

On Feb 2, 2018 11:45 PM, "Romain Manni-Bucau" <rm...@gmail.com> wrote:

> Hi
>
> Shouldn't the discussion on schema, which has a direct impact on this
> generic container, be closed before any action on this?

Re: rename: BeamRecord -> Row

Posted by Romain Manni-Bucau <rm...@gmail.com>.
Hi

Shouldn't the discussion on schema, which has a direct impact on this generic
container, be closed before any action on this?


On Feb 3, 2018 01:09, "Ankur Chauhan" <an...@malloc64.com> wrote:

> ++

Re: rename: BeamRecord -> Row

Posted by Ankur Chauhan <an...@malloc64.com>.
++

On Fri, Feb 2, 2018 at 1:33 PM Rafael Fernandez <rf...@google.com> wrote:

> Very strong +1

Re: rename: BeamRecord -> Row

Posted by Rafael Fernandez <rf...@google.com>.
Very strong +1


On Fri, Feb 2, 2018 at 1:24 PM Reuven Lax <re...@google.com> wrote:

> We're looking at renaming the BeamRecord class
> <https://github.com/apache/beam/pull/4550>, which is used for columnar
> data. There was enough discussion on the naming that I want to make
> sure the dev list is aware of the naming plans here.
>
> BeamRecord is a columnar, field-based record. Currently it's used by
> BeamSQL, and the plan is to use it for schemas as well. "Record" is a
> confusing name for this class, as all elements in the Beam model are
> referred to as "records," whether or not they have schemas. "Row" is a much
> clearer name.
>
> There was a lot of discussion about whether to name this BeamRow or just
> plain Row (in the org.apache.beam.sdk.values namespace). The argument in
> favor of BeamRow was that people wouldn't be forced to qualify their type
> names in the case of a conflict with a Row from another package. The
> argument in favor of Row was that it's a better name, it's in the Beam
> namespace anyway, and it's what the rest of the world (Cassandra, Hive,
> Spark, etc.) calls similar classes.
>
> Right now, consensus on the PR is leaning toward Row. If you feel strongly,
> please speak up :)
>
> Reuven
>