Posted to user@hbase.apache.org by Jan Lukavský <ja...@firma.seznam.cz> on 2014/10/21 17:23:01 UTC
Delete.deleteColumns not working with HFileOutputFormat?
Hi all,
we are using HBase version 0.94.6-cdh4.3.1 and I have a suspicion that a
Delete written to HBase through HFileOutputFormat might be ignored (and
not delete any data) in the following scenario:
* a Delete object is used to delete the data at the client side
* a call to "deleteColumn" instead of "deleteColumns" is used, which
  means that the underlying KeyValue will not have an explicit
  timestamp (it will have HConstants.LATEST_TIMESTAMP)
* the Delete object is then converted to KeyValues and these are
  written into the output format's record writer
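The scenario in the list above can be illustrated with a small, self-contained model. It is only a sketch and does not use any HBase classes: MarkerType, masks and visibleAfter are hypothetical names made up for this example. It relies on two facts about HBase delete markers: a version delete (what deleteColumn produces) masks only the cell whose timestamp equals the marker's timestamp, while a column delete (what deleteColumns produces) masks every version at or below the marker's timestamp. The working assumption, which this sketch does not verify against the HBase source, is that a bulk-loaded marker never gets its timestamp resolved to that of an existing version.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteMarkerModel {
    // Same value as HConstants.LATEST_TIMESTAMP in HBase.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    // Hypothetical stand-ins for the two KeyValue types discussed here.
    enum MarkerType { DELETE, DELETE_COLUMN }

    // True if a marker of the given type and timestamp masks a cell version
    // written at cellTs (same row, family and qualifier are assumed).
    static boolean masks(MarkerType type, long markerTs, long cellTs) {
        switch (type) {
            case DELETE:
                return cellTs == markerTs;  // exactly one version
            case DELETE_COLUMN:
                return cellTs <= markerTs;  // this version and all older ones
            default:
                return false;
        }
    }

    // Returns the cell timestamps that stay visible once the marker is applied.
    static List<Long> visibleAfter(MarkerType type, long markerTs, long[] cellTimestamps) {
        List<Long> visible = new ArrayList<>();
        for (long ts : cellTimestamps) {
            if (!masks(type, markerTs, ts)) {
                visible.add(ts);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        long[] versions = {100L, 200L, 300L};
        // Version delete with an unresolved timestamp: nothing is masked.
        System.out.println(visibleAfter(MarkerType.DELETE, LATEST_TIMESTAMP, versions));
        // Column delete with the same timestamp: every version is masked.
        System.out.println(visibleAfter(MarkerType.DELETE_COLUMN, LATEST_TIMESTAMP, versions));
        // Version delete with an explicit, matching timestamp: one version is masked.
        System.out.println(visibleAfter(MarkerType.DELETE, 300L, versions));
    }
}
```

In this model the first call leaves all three versions visible, which matches the observed behaviour, whereas the column delete removes all of them.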
I think (our system seems to behave this way) the problem is in the way
the KeyValue is processed in the RegionServer, even though I was not
able to track the problem down in the source code. Can anyone else
confirm this? When using Delete#deleteColumns everything seems to be
working fine (the KeyValues have a different type). Is this expected or
should it be considered a bug? And if so, where should it be fixed? I
think it could be on the side of the record writer (maybe by throwing an
exception), or in the region server (if possible; this might be
non-trivial because of the Delete#deleteColumn semantics).
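The record-writer option mentioned above could look roughly like the following guard. This is a hypothetical sketch, not code from HFileOutputFormat: BulkDeleteGuard, MarkerType and checkWritable are invented names, and the real record writer works on KeyValue objects rather than on a type and timestamp pair.

```java
public class BulkDeleteGuard {
    // Same value as HConstants.LATEST_TIMESTAMP in HBase.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    // Hypothetical, simplified set of cell types.
    enum MarkerType { PUT, DELETE, DELETE_COLUMN, DELETE_FAMILY }

    // Hypothetical check a record writer could run before appending a cell:
    // a version delete needs the exact timestamp of the version it targets,
    // so a marker that still carries LATEST_TIMESTAMP is rejected.
    static void checkWritable(MarkerType type, long timestamp) {
        if (type == MarkerType.DELETE && timestamp == LATEST_TIMESTAMP) {
            throw new IllegalArgumentException(
                "version delete marker has no explicit timestamp and cannot be bulk loaded");
        }
    }

    public static void main(String[] args) {
        // Column deletes and explicitly timestamped version deletes pass.
        checkWritable(MarkerType.DELETE_COLUMN, LATEST_TIMESTAMP);
        checkWritable(MarkerType.DELETE, 1234L); // 1234L is an arbitrary example timestamp
        try {
            checkWritable(MarkerType.DELETE, LATEST_TIMESTAMP);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast like this would at least turn a silently ignored delete into a visible job failure; whether that is preferable to resolving the timestamp on the server side is exactly the open question.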
Any opinions?
Thanks,
Jan
Re: Delete.deleteColumn not working with HFileOutputFormat?
Posted by Ted Yu <yu...@gmail.com>.
Yes.
Once you come up with a unit test, file a JIRA.
Thanks
On Oct 22, 2014, at 1:59 AM, Jan Lukavský <ja...@firma.seznam.cz> wrote:
> Hi Ted,
>
> sure, there was a typo in the subject. The problem is with Delete#deleteColumn; I have fixed that in the subject. Since we are not planning to upgrade our CDH4 distribution (we are planning to upgrade to CDH5 as a next step), I'm afraid I cannot simply test this on the version you mentioned. I can try to create a unit test for this. Should I file a JIRA?
>
> Thanks,
> Jan
Re: Delete.deleteColumn not working with HFileOutputFormat?
Posted by Jan Lukavský <ja...@firma.seznam.cz>.
Hi Ted,
sure, there was a typo in the subject. The problem is with
Delete#deleteColumn; I have fixed that in the subject. Since we are not
planning to upgrade our CDH4 distribution (we are planning to upgrade to
CDH5 as a next step), I'm afraid I cannot simply test this on the
version you mentioned. I can try to create a unit test for this. Should I
file a JIRA?
Thanks,
Jan
On 10/21/2014 06:05 PM, Ted Yu wrote:
> bq. When using Delete#deleteColumns everything seems to be working fine
>
> Please confirm that the issue you observe was with Delete#deleteColumn
> (different from the method mentioned in subject).
>
> Can you try with 0.94.24 (the latest 0.94 release)?
>
> If you can capture this using a unit test, that would be great.
>
> Thanks
--
Jan Lukavský
Development Team Lead
Seznam.cz, a.s.
Radlická 3494/10
15000, Praha 5
jan.lukavsky@firma.seznam.cz
http://www.seznam.cz
Re: Delete.deleteColumns not working with HFileOutputFormat?
Posted by Ted Yu <yu...@gmail.com>.
bq. When using Delete#deleteColumns everything seems to be working fine
Please confirm that the issue you observe was with Delete#deleteColumn
(different from the method mentioned in subject).
Can you try with 0.94.24 (the latest 0.94 release)?
If you can capture this using a unit test, that would be great.
Thanks