You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hudi.apache.org by Prashant Wason <pw...@uber.com.INVALID> on 2020/03/10 19:32:29 UTC

Issue related to [HUDI-377] Adding Delete() support to DeltaStreamer

Hi Team,

While exploring HUDI source code I came across this PR:
https://github.com/apache/incubator-hudi/pull/1073

As part of the above PR, generation of delete records was added
to HoodieTestDataGenerator. Within the class HoodieTestDataGenerator, the
existingKeys Map maintains the current keys. In the above PR, the following
code was added to delete from the Map:

existingKeys.remove(kp);

This is delete by value rather than the key (private final Map<Integer,
KeyPartition> existingKeys;)

I tried fixing this issue but this leads to unit test failures
in TestHoodieDeltaStreamer within the testUpsertsCOWContinuousMode. The
code which is failing is this check (bold):

    TestHelpers.waitTillCondition((r) -> {
      if (tableType.equals(HoodieTableType.MERGE_ON_READ)) {
        TestHelpers.assertAtleastNDeltaCommits(5, tableBasePath, dfs);
        TestHelpers.assertAtleastNCompactionCommits(2, tableBasePath, dfs);
      } else {
        TestHelpers.assertAtleastNCompactionCommits(5, tableBasePath, dfs);
      }
      *TestHelpers.assertRecordCount(totalRecords + 200, tableBasePath +
"/*/*.parquet", sqlContext);*
      *TestHelpers.assertDistanceCount(totalRecords + 200, tableBasePath +
"/*/*.parquet", sqlContext);*
      return true;

I did not understand why a +200 was added in the checks above? Is this
related to the existingKeys.remove() which does not remove the records from
the Map?

I have left these comments on the PR itself so they are easier to read.

Thanks
Prashant

Re: Issue related to [HUDI-377] Adding Delete() support to DeltaStreamer

Posted by Sivabalan <n....@gmail.com>.
https://github.com/apache/incubator-hudi/pull/1395



On Tue, Mar 10, 2020 at 12:40 PM Prashant Wason <pw...@uber.com.invalid>
wrote:

> Thanks for the update Sivabalan. I will wait for your fix.
>
> On Tue, Mar 10, 2020 at 12:36 PM Sivabalan <n....@gmail.com> wrote:
>
> > thanks for bringing this to my attention Prasant. Yes, I bumped into the
> > bug couple of days back. I am working on the fix, and the expected no of
> > records might have to be fixed as well. I am running into issues
> debugging
> > continuous tests as of now. But I am working on it.
> >
> >
> > On Tue, Mar 10, 2020 at 12:32 PM Prashant Wason <pwason@uber.com.invalid
> >
> > wrote:
> >
> > > Hi Team,
> > >
> > > While exploring HUDI source code I came across this PR:
> > >
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_incubator-2Dhudi_pull_1073&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=c89AU9T1AVhM4r2Xi3ctZA&m=9WZ2tqIxWwOrZRAqmP_InSRBlFhGKElcWnFP-DPgCkY&s=s8ZOjL4LXWaB6kfrL-BUZdOwb22h4RA4ff9KdUrfTNk&e=
> > >
> > > As part of the above PR, generation of delete records was added
> > > to HoodieTestDataGenerator. Within the class HoodieTestDataGenerator,
> the
> > > existingKeys Map maintains the current keys. In the above PR, the
> > following
> > > code was added to delete from the Map:
> > >
> > > existingKeys.remove(kp);
> > >
> > > This is delete by value rather than the key (private final Map<Integer,
> > > KeyPartition> existingKeys;)
> > >
> > > I tried fixing this issue but this leads to unit test failures
> > > in TestHoodieDeltaStreamer within the testUpsertsCOWContinuousMode. The
> > > code which is failing is this check (bold):
> > >
> > >     TestHelpers.waitTillCondition((r) -> {
> > >       if (tableType.equals(HoodieTableType.MERGE_ON_READ)) {
> > >         TestHelpers.assertAtleastNDeltaCommits(5, tableBasePath, dfs);
> > >         TestHelpers.assertAtleastNCompactionCommits(2, tableBasePath,
> > dfs);
> > >       } else {
> > >         TestHelpers.assertAtleastNCompactionCommits(5, tableBasePath,
> > dfs);
> > >       }
> > >       *TestHelpers.assertRecordCount(totalRecords + 200, tableBasePath
> +
> > > "/*/*.parquet", sqlContext);*
> > >       *TestHelpers.assertDistanceCount(totalRecords + 200,
> tableBasePath
> > +
> > > "/*/*.parquet", sqlContext);*
> > >       return true;
> > >
> > > I did not understand why a +200 was added in the checks above? Is this
> > > related to the existingKeys.remove() which does not remove the records
> > from
> > > the Map?
> > >
> > > I have left these comments on the PR itself so they are easier to read.
> > >
> > > Thanks
> > > Prashant
> > >
> >
> >
> > --
> > Regards,
> > -Sivabalan
> >
>


-- 
Regards,
-Sivabalan

Re: Issue related to [HUDI-377] Adding Delete() support to DeltaStreamer

Posted by Prashant Wason <pw...@uber.com.INVALID>.
Thanks for the update Sivabalan. I will wait for your fix.

On Tue, Mar 10, 2020 at 12:36 PM Sivabalan <n....@gmail.com> wrote:

> thanks for bringing this to my attention Prasant. Yes, I bumped into the
> bug couple of days back. I am working on the fix, and the expected no of
> records might have to be fixed as well. I am running into issues debugging
> continuous tests as of now. But I am working on it.
>
>
> On Tue, Mar 10, 2020 at 12:32 PM Prashant Wason <pw...@uber.com.invalid>
> wrote:
>
> > Hi Team,
> >
> > While exploring HUDI source code I came across this PR:
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_incubator-2Dhudi_pull_1073&d=DwIBaQ&c=r2dcLCtU9q6n0vrtnDw9vg&r=c89AU9T1AVhM4r2Xi3ctZA&m=9WZ2tqIxWwOrZRAqmP_InSRBlFhGKElcWnFP-DPgCkY&s=s8ZOjL4LXWaB6kfrL-BUZdOwb22h4RA4ff9KdUrfTNk&e=
> >
> > As part of the above PR, generation of delete records was added
> > to HoodieTestDataGenerator. Within the class HoodieTestDataGenerator, the
> > existingKeys Map maintains the current keys. In the above PR, the
> following
> > code was added to delete from the Map:
> >
> > existingKeys.remove(kp);
> >
> > This is delete by value rather than the key (private final Map<Integer,
> > KeyPartition> existingKeys;)
> >
> > I tried fixing this issue but this leads to unit test failures
> > in TestHoodieDeltaStreamer within the testUpsertsCOWContinuousMode. The
> > code which is failing is this check (bold):
> >
> >     TestHelpers.waitTillCondition((r) -> {
> >       if (tableType.equals(HoodieTableType.MERGE_ON_READ)) {
> >         TestHelpers.assertAtleastNDeltaCommits(5, tableBasePath, dfs);
> >         TestHelpers.assertAtleastNCompactionCommits(2, tableBasePath,
> dfs);
> >       } else {
> >         TestHelpers.assertAtleastNCompactionCommits(5, tableBasePath,
> dfs);
> >       }
> >       *TestHelpers.assertRecordCount(totalRecords + 200, tableBasePath +
> > "/*/*.parquet", sqlContext);*
> >       *TestHelpers.assertDistanceCount(totalRecords + 200, tableBasePath
> +
> > "/*/*.parquet", sqlContext);*
> >       return true;
> >
> > I did not understand why a +200 was added in the checks above? Is this
> > related to the existingKeys.remove() which does not remove the records
> from
> > the Map?
> >
> > I have left these comments on the PR itself so they are easier to read.
> >
> > Thanks
> > Prashant
> >
>
>
> --
> Regards,
> -Sivabalan
>

Re: Issue related to [HUDI-377] Adding Delete() support to DeltaStreamer

Posted by Sivabalan <n....@gmail.com>.
thanks for bringing this to my attention Prasant. Yes, I bumped into the
bug couple of days back. I am working on the fix, and the expected no of
records might have to be fixed as well. I am running into issues debugging
continuous tests as of now. But I am working on it.


On Tue, Mar 10, 2020 at 12:32 PM Prashant Wason <pw...@uber.com.invalid>
wrote:

> Hi Team,
>
> While exploring HUDI source code I came across this PR:
> https://github.com/apache/incubator-hudi/pull/1073
>
> As part of the above PR, generation of delete records was added
> to HoodieTestDataGenerator. Within the class HoodieTestDataGenerator, the
> existingKeys Map maintains the current keys. In the above PR, the following
> code was added to delete from the Map:
>
> existingKeys.remove(kp);
>
> This is delete by value rather than the key (private final Map<Integer,
> KeyPartition> existingKeys;)
>
> I tried fixing this issue but this leads to unit test failures
> in TestHoodieDeltaStreamer within the testUpsertsCOWContinuousMode. The
> code which is failing is this check (bold):
>
>     TestHelpers.waitTillCondition((r) -> {
>       if (tableType.equals(HoodieTableType.MERGE_ON_READ)) {
>         TestHelpers.assertAtleastNDeltaCommits(5, tableBasePath, dfs);
>         TestHelpers.assertAtleastNCompactionCommits(2, tableBasePath, dfs);
>       } else {
>         TestHelpers.assertAtleastNCompactionCommits(5, tableBasePath, dfs);
>       }
>       *TestHelpers.assertRecordCount(totalRecords + 200, tableBasePath +
> "/*/*.parquet", sqlContext);*
>       *TestHelpers.assertDistanceCount(totalRecords + 200, tableBasePath +
> "/*/*.parquet", sqlContext);*
>       return true;
>
> I did not understand why a +200 was added in the checks above? Is this
> related to the existingKeys.remove() which does not remove the records from
> the Map?
>
> I have left these comments on the PR itself so they are easier to read.
>
> Thanks
> Prashant
>


-- 
Regards,
-Sivabalan