Posted to dev@beam.apache.org by Ismaël Mejía <ie...@gmail.com> on 2017/05/23 16:52:29 UTC

[DISCUSS] HadoopInputFormat based IOs

Hello, I bring this subject to the mailing list to see everybody’s
opinion on the subject.

The recent inclusion of HadoopInputFormatIO (HiFiIO) gave Beam users
the option to ‘easily’ include data stores that support the
Hadoop-based partitioning scheme. There are currently examples of how
to use it to read from, for example, Elasticsearch and Cassandra. In
both cases we already have specific IOs on master or as WIP, so using
a HiFiIO-based IO is not needed.

During the review of the recent IO for Hive (HCatalog) that uses
HiFiIO instead of a native API, there was a discussion suggesting
that it shouldn’t be included as a specific IO, and that it would be
better to add tests/documentation on how to read Hive records using
the existing HiFiIO. This makes sense from an abstraction point of
view; however, there are visibility issues, since end users would need
to repackage and discover the supported (and tested) HiFi-based IOs,
which won’t be explicit in the code base.

I would like to know what other members of the community think about
this: is it worth having individual IOs based on HiFiIO for things
that we currently don’t support (e.g. Hive or Amazon Redshift) (option
1), or is it better to just add the tests/docs on how to use them, as
proposed in the PR (option 2)?

Feel free to comment/vote, or to propose a third option if you think
there is a better one.

Regards,
Ismaël Mejía

[1] https://issues.apache.org/jira/browse/BEAM-1158

RE: [DISCUSS] HadoopInputFormat based IOs

Posted by Seshadri Raghunathan <sr...@etouch.net>.
+1

I think this is a good way to streamline HIFIO and native IOs.

Regards,
Seshadri
408 601 7548

Re: [DISCUSS] HadoopInputFormat based IOs

Posted by Stephen Sisk <si...@google.com.INVALID>.
That summary phrase looks good to me, thanks for writing it up.

S

Re: [DISCUSS] HadoopInputFormat based IOs

Posted by Ismaël Mejía <ie...@gmail.com>.
Stephen, I agree with you that the most important thing is not to lose
functionality for the integration tests, so it is important to keep at
least one of the two (Cassandra or Elasticsearch) to do a real
integration test for HIFIO.

Your proposal of parallelizing the IT tests for the native IOs with
WriteThenRead seems excellent; we have to sync to see how we can do
this. My only doubt is that the setup still takes a long time (at
least for the bounded sources; it will all depend on the size of the
generated data).

I think we are good in our discussion (at least on the missing part
about the tests), so please confirm whether you agree with this phrase
to summarize the thread:

New IOs based on HIFIO won’t be merged as tests of HIFIO or as
independent IOs. However, we still encourage these contributions as
documentation on how to use HIFIO with different data stores; these
contributions will be part of the documentation on the Beam website.


Re: [DISCUSS] HadoopInputFormat based IOs

Posted by Stephen Sisk <si...@google.com.INVALID>.
Ah, thanks for clarifying, Ismaël.

I think you would agree that we need to have integration testing of HIFIO.
Cassandra and ES are currently the only ITs for HIFIO. If we want to write
ITs for HIFIO that don't rely on ES/Cassandra with the idea that we'd
remove ES/Cassandra, I could be okay with that. The data store in question
would need to have both small & large k8s cluster scripts so that we can do
small & large integration tests (since that's what's currently supported
with HIFIO today and I don't think we should go backwards.)

The reason I hesitate to use a data store that doesn't have a native
implementation is that we can use ES/Cassandra's native write transform to
eventually switch HIFIO ITs to the new writeThenRead style IO IT [1] that
will *drastically* simplify maintenance requirements for the HIFIO tests.
WriteThenRead writes the test data inside of the test, thus removing the
requirement for a separate data loading step outside of the test. We
*could* write inside the test setup code (thus running only on one
machine), but for larger data amounts, that takes too long - it's easier to
do the write using the IO, which runs in parallel, and thus is a lot
quicker. That means we need a data store that has a native, parallelizable
write.
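
To illustrate the writeThenRead idea described above, here is a minimal
sketch in plain Python. It is not Beam code: InMemoryStore,
generate_rows, checksum and write_then_read are hypothetical names
invented for this illustration, and a thread pool stands in for the
parallel write that a native IO would perform on a runner.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Hypothetical stand-in for a data store with a parallelizable write."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def write(self, key, value):
        # Guard the dict so concurrent writers cannot interleave updates.
        with self._lock:
            self._rows[key] = value

    def read_all(self):
        with self._lock:
            return dict(self._rows)


def generate_rows(count):
    """Deterministically generate the test data inside the test itself."""
    return {f"key-{i}": f"value-{i}" for i in range(count)}


def checksum(rows):
    """Digest of all key/value pairs, independent of insertion order."""
    digest = hashlib.sha256()
    for key in sorted(rows):
        digest.update(f"{key}={rows[key]};".encode("utf-8"))
    return digest.hexdigest()


def write_then_read(store, row_count, workers=4):
    """Write generated rows in parallel, read them back, and compare."""
    expected = generate_rows(row_count)

    # Write phase: fan the writes out over several workers instead of
    # loading the data in a separate, single-machine setup step.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda item: store.write(*item), expected.items()))

    # Read phase: read everything back and verify it against the
    # data that was generated at the start of the test.
    actual = store.read_all()
    return checksum(actual) == checksum(expected), len(actual)
```

Calling write_then_read(InMemoryStore(), 1000) returns a pair: whether
the data read back matches what was written, and how many rows were
read. The point of the pattern is that no data has to be loaded into
the store before the test starts.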

What do you think? Basically, I agree with you in principle, but given that
using a data store without a native implementation requires either a separate
data loading step or slower tests, I'd strongly prefer to keep using
ES/Cassandra. (You could make the case that we should remove one of them.
I'm not attached to keeping both.)


> having [ES/Cassandra HIFIO read-code] in the source code base would [not]
> be consistent with the ideas of the previous paragraph.
I do agree with this. If we keep the ES/Cassandra HIFIO test code, I'd
propose that we add comments in there directing people to the correct
native source.

S
[1] writeThenRead style IO IT -
https://lists.apache.org/thread.html/26ee3ba827c2917c393ab26ce97e7491846594d8f574b5ae29a44551@%3Cdev.beam.apache.org%3E

Re: [DISCUSS] HadoopInputFormat based IOs

Posted by Ismaël Mejía <ie...@gmail.com>.
The whole goal of this discussion is to define what we should do
when someone wants to add a new IO that uses HIFIO. The consensus so
far, following the PR comments + this thread, is that it should be
discouraged and that those contributions should be included as
documentation on the website, and that we should give priority to the
native implementations, which seems reasonable (e.g., to encourage
better implementations and avoid the maintenance burden).

So, I was wondering what would be a good rule to justify having
tests for some data stores as part of the tests of HIFIO, and I don't
see a strong reason to do this, in particular once those data stores
have native implementations. To be more clear: in the current case we
have HIFIO tests (jdk1.8-tests) for Elasticsearch5 and Cassandra,
neither of which is covered by the native IOs yet. However, once the
native IOs for both systems are merged, I don't see any reason to keep
the extra tests in HIFIO, because we would be doubling the effort to
test an IO that is not native and that does not support Write, so I
think we should remove those. Also, not having this in the source code
base would be consistent with the ideas of the previous paragraph.

But maybe I am missing something here. Do you see any strong reason
to keep them?

Re: [DISCUSS] HadoopInputFormat based IOs

Posted by Stephen Sisk <si...@google.com.INVALID>.
Great, I'm glad to hear that. I filed BEAM-2388 [1] to track the work
(currently unassigned).

> today we have Cassandra and Elasticsearch5 examples based
> on HIF that will be clearly redundant once we have the native
> versions, so they should maybe be moved into the proposed website
> section
Can you clarify what you're proposing to remove? Are you saying that we
should remove the ES/Cassandra examples from the HIFIO web page linked to
from the built-in page [2]? I definitely agree with that, thanks for
pointing that out. (I don't think you're proposing removing the tests, e.g.
HIFIOWithEmbeddedCassandraTest.)

S

[1] https://issues.apache.org/jira/browse/BEAM-2388
[2] https://beam.apache.org/documentation/io/built-in/hadoop/

Re: [DISCUSS] HadoopInputFormat based IOs

Posted by Ismaël Mejía <ie...@gmail.com>.
I agree 100% with Stephen's points. I think that including a
'discoverability' section for these IOs that are shared by multiple
data stores is a great step, in particular for the HIF ones.

I would like us to define what, concretely, we would do with the
HIFIO-based implementations of IOs once their native implementation is
merged. E.g., today we have Cassandra and Elasticsearch5 examples based
on HIF that will be clearly redundant once we have the native
versions, so they should maybe be moved into the proposed website
section. What do you guys think?

Any other ideas/comments on the general subject?



Re: [DISCUSS] HadoopInputFormat based IOs

Posted by Stephen Sisk <si...@google.com.INVALID>.
hey,

Thanks for bringing this up! It's definitely an interesting question and I
can see both sides of the argument.

I can see the appeal of HIFIO wrapper IOs as stop-gaps, and if they have
good test coverage, it does ensure that the HIFIO route is working. If we
have good IT coverage, it also means there are fewer steps involved in
building a native IO as well, since the ITs will already be written.

However, I think I'm still assuming that the community will implement
native IOs for most data stores that users want to interact with, and thus
I'd still discourage building IOs that are just HIFIO/jdbc wrappers. I'd
personally rather devote time and resources to native IOs. If we don't see
traction on building more IOs then I'd be more open to it.

If we do choose to go down this "Don't build HIFIO wrappers, just improve
discoverability" route, one idea I had floating around in my head was that
we might add a section to the Built-in IO Transforms page that covers
"non-native but readable" IOs (better name suggestions appreciated :) -
that could include a list of data stores that jdbc/jms/hifio support and
link to HIFIO's info on how to use them. (That might also be a good place
to document the performance tradeoffs of using HIFIO)

S

