Posted to user@hadoop.apache.org by Bear Giles <bg...@snaplogic.com> on 2020/11/10 18:53:46 UTC

Inconsistent behavior in datanode between hdfs CLI and application

I'm seeing a really weird issue with a 2.6.0 datanode (CDH 5.16).

In short, we have a cluster running HDFS 2.6.0 (CDH 5.16), an 'hdfs' CLI app
running HDFS 3.0.0 (CDH 6.3), and an application using HDP 2.8.5 (CDH 6.x;
not sure which version). If it matters, the cluster uses Kerberos
authentication and TLS encryption.

The hdfs CLI can do everything we expect, including '-put' and '-get'.

The app can do all datanode operations and can *read* existing files, but it
gives us 'no blocks available' when we try to *write* to a new file. This
exception shows up in the datanode log.
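
For reference, the write path is roughly the following (a minimal sketch
rather than our actual code; it assumes core-site.xml / hdfs-site.xml on the
classpath, a valid Kerberos ticket, and a throwaway test path):

    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class HdfsWriteSmokeTest {
        public static void main(String[] args) throws Exception {
            // Picks up core-site.xml / hdfs-site.xml from the classpath.
            Configuration conf = new Configuration();
            // Needed so the client honors hadoop.security.authentication=kerberos.
            UserGroupInformation.setConfiguration(conf);

            try (FileSystem fs = FileSystem.get(conf)) {
                // Hypothetical test path, not our real data.
                Path path = new Path("/tmp/hdfs-write-smoke-test.txt");

                // Reading existing files works for us; it is this create/write
                // that produces the 'no blocks available' error on the datanode.
                try (FSDataOutputStream out = fs.create(path, true)) {
                    out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
                }
            }
        }
    }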

Finally, the app works against a CDH 6.1.1 cluster with an identical
configuration.

I'm at a loss, since it's clearly not an issue with the server
configuration, client configuration, app implementation, etc. We're trying
to write a file of < 20 bytes to a cluster with over 90 GB of free space.
The only difference is the client library, and I would expect the opposite
result if that were the cause. (Why would the newer version work when the
older version failed?)

Does anyone have any ideas?

P.S. The last version of the app worked with a prior 5.x cluster. However,
we've rebuilt the cluster (including bumping to 5.16) and made changes to
the app, so we can't really use that as a data point.

-- 

Bear Giles

Sr. Software Engineer
bgiles@snaplogic.com
Mobile: 720-749-7876



Re: Inconsistent behavior in datanode between hdfs CLI and application

Posted by Bear Giles <bg...@snaplogic.com>.
I may have an insight. We're pulling in protobuf 2.5.0, but we're also
pulling in the Google Cloud libraries, which pull in protobuf-util 3.11.4.
Some of our other projects pull in both protobuf 3.11.4 and protobuf-util
3.11.4.

At first glance I don't know if this matters: the serialization of the data
should be the same no matter which direction it's flowing, and we have
verified that our app can read a file from HDFS. However, it *is* a
difference between the hdfs CLI app and our application, so I can't rule it
out.
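
One way to see which protobuf actually wins at runtime would be a probe like
the one below (a rough sketch, not taken from our code): print where the
classloader resolves the protobuf classes from. com.google.protobuf.Message
comes from the core protobuf runtime, while com.google.protobuf.util.JsonFormat
comes from protobuf-util 3.x. If the core classes resolve to a 2.5.0 jar and
the util classes to a 3.11.4 jar, that mix could behave differently from
either version alone.

    import java.security.CodeSource;

    public class ProtobufProbe {
        public static void main(String[] args) {
            String[] classes = {
                "com.google.protobuf.Message",          // core runtime (2.5.0 or 3.x)
                "com.google.protobuf.util.JsonFormat"   // shipped in protobuf-util 3.x
            };
            for (String name : classes) {
                try {
                    Class<?> c = Class.forName(name);
                    CodeSource src = c.getProtectionDomain().getCodeSource();
                    System.out.println(name + " -> "
                            + (src != null ? src.getLocation() : "(no code source)"));
                } catch (ClassNotFoundException e) {
                    System.out.println(name + " -> not on the classpath");
                }
            }
        }
    }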

I'm still trying to figure out how to tweak our dependencies so we can see
what happens if we remove protobuf-util. It's not easy, since one of the key
features of our app is that it hides the details of your cloud storage
without relying on those filesystems being mounted as remote drives on your
system.
