Posted to dev@beam.apache.org by Ruoyun Huang <ru...@google.com> on 2018/11/16 23:02:51 UTC

Questions on [MD5] hash code of staged files

Hi, Folks,

I am running the Python SDK's PortableRunner against the Java Reference
Runner job server. We could not make it work because the Docker container
fails to start with this error message: "2018/11/16 21:38:55 Failed
to retrieve staged files: failed to retrieve pickled_main_session in 3
attempts: bad MD5 for /tmp/staged/pickled_main_session:
9g/EU11J0QTfwDVbpHQhAQ==, want ; bad MD5 for
/tmp/staged/pickled_main_session: 9g/EU11J0QTfwDVbpHQhAQ==, want ; bad MD5
for /tmp/staged/pickled_main_session: 9g/EU11J0QTfwDVbpHQhAQ==, want ; bad
MD5 for /tmp/staged/pickled_main_session: 9g/EU11J0QTfwDVbpHQhAQ==,
want ".  The code that produces this error message is here
<https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/artifact/materialize.go#L173>
.

The file pickled_main_session IS staged, but for some unknown reason the
expected hash is an empty string. My hypothesis is that the job request
should have included a hash, but the Python side fails to set it, which
leaves the expected value empty.
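
To make the symptom concrete, here is a minimal Python sketch of the kind of
check that seems to be failing. It is not the Beam code (the real check is in
the Go file linked above); the function names and the payload are
hypothetical. It assumes the digest in the log is a base64-encoded MD5, which
fits the 24-character value ending in "==" (16 decoded bytes).

```python
import base64
import hashlib


def md5_base64(data: bytes) -> str:
    """Return the base64-encoded MD5 digest of data."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def check_digest(data: bytes, want: str) -> None:
    """Raise ValueError when the computed digest differs from the expected one."""
    got = md5_base64(data)
    if got != want:
        raise ValueError(f"bad MD5: {got}, want {want}")


# Hypothetical stand-in for the staged file contents.
payload = b"hypothetical pickled_main_session bytes"

# A matching expected digest passes silently.
check_digest(payload, md5_base64(payload))

# An empty expected digest can never match, whatever the file contains.
try:
    check_digest(payload, "")
except ValueError as err:
    print(err)
```

With an empty expected value, retrying cannot help, because the file itself
is fine and only the expected digest is missing.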

If the hypothesis above is correct, then my question is: where in the
Python SDK's job request code should the hash be set? A pointer to the
right place is appreciated.

That being said, I also saw that Ankur's recent PR#7049
<https://github.com/apache/beam/commit/1b241f9517342c73ed2f0a73251858ee67c7e191>
changes MD5 to SHA256. That PR does not update anything in Java or
Python, which makes me less sure about the hypothesis above. What did
I miss? (Or maybe that is what PR#7049 should have done?)

Suggestions appreciated.

Cheers,
-- 
================
Ruoyun  Huang

Re: Questions on [MD5] hash code of staged files

Posted by Ankur Goenka <go...@google.com>.
Hi Ruoyun,

We moved from MD5 to SHA256 hashing, which caused this problem.
The Java and Python code was updated in PR
https://github.com/apache/beam/pull/6583, but the Go code was not updated.
Go caches the generated code, which caused the tests to pass, though I am
not sure why the integration tests did not break sooner.
We resolved this issue with https://github.com/apache/beam/pull/7071 .
Let me know if you are still having the same issue.
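
As a rough illustration of how such a mismatch produces an empty expected
hash, here is a small Python sketch. It is not the Beam code: the dictionary
layout, the field names "md5" and "sha256", and the function names are all
assumptions made for the example. The idea is that a stager that records
only a SHA256 digest leaves nothing for a reader that still looks up MD5.

```python
import hashlib


def stage(name: str, data: bytes) -> dict:
    """Hypothetical newer stager: records only a SHA256 hex digest."""
    return {"name": name, "sha256": hashlib.sha256(data).hexdigest()}


def expected_hash_old_reader(metadata: dict) -> str:
    """Hypothetical older reader: still looks for an MD5 field."""
    return metadata.get("md5", "")


def expected_hash_new_reader(metadata: dict) -> str:
    """Hypothetical updated reader: looks for the SHA256 field."""
    return metadata.get("sha256", "")


# Hypothetical stand-in for the staged file contents.
data = b"hypothetical pickled_main_session bytes"
metadata = stage("pickled_main_session", data)

# The old reader finds no MD5, so its expected hash is the empty string.
print(repr(expected_hash_old_reader(metadata)))

# The updated reader sees the digest that the stager recorded.
print(expected_hash_new_reader(metadata) == hashlib.sha256(data).hexdigest())
```

Under these assumptions the staged file is intact on both sides, and only
the reader's choice of field decides whether verification can succeed.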

Thanks,
Ankur
