Posted to user@beam.apache.org by OrielResearch Eila Arich-Landkof <ei...@orielresearch.org> on 2020/04/17 01:29:19 UTC

Copying tar.gz libraries to apache-beam workers

Hi all,

I posted this question to user-help@apache.org. In case that is not a valid
address, I am posting it here again. If you have already received it,
apologies for the spam.

I hope that you are all well. I would like to copy a tools library onto the
worker machines using the setup.py file. I have updated the
CUSTOM_COMMANDS:

 CUSTOM_COMMANDS = [
  ["wget", "-O", "/usr/local/sratoolkit.tar.gz",
   "http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-centos_linux64.tar.gz"],
  ["tar", "-xzf", "/usr/local/sratoolkit.tar.gz", "-C", "/usr/local/"]]

When I look for the executables in the /usr/local folder, I cannot find
the tools that I copied to the worker. What is the right and easiest way
to copy tool libraries onto the worker machines? I got the expected
behavior with the local runner, and now it's a matter of finding the right
way to reproduce it with the Dataflow runner.

I am using Python 3.5 with the latest apache-beam (2.20) and the latest
Dataflow.
Thanks a lot, eilalan
-- 
Eila
<http://www.orielresearch.com>
Meetup <https://www.meetup.com/Deep-Learning-In-Production/>

Re: Copying tar.gz libraries to apache-beam workers

Posted by OrielResearch Eila Arich-Landkof <ei...@orielresearch.org>.
Thank you. I should have noticed this flag.
Stay safe
Best,
Eila


On Mon, Apr 20, 2020 at 12:42 PM Luke Cwik <lc...@google.com> wrote:

> It looks like anaconda assumes that the directory doesn't exist before
> installation. I would recommend using a new directory such as
> ["/opt/userowned/anaconda.sh", "-b","-p","/opt/userowned/anaconda/"]
> but if you really want it to be in "/opt/userowned", you should add "-f":
> ["/opt/userowned/anaconda.sh","-f","-b","-p","/opt/userowned/"]
>
> See the Anaconda silent install[1] instructions for more details.
>
> 1: https://docs.anaconda.com/anaconda/install/silent-mode/#linux-macos

-- 
Eila
<http://www.orielresearch.com>
Meetup <https://www.meetup.com/Deep-Learning-In-Production/>

Re: Copying tar.gz libraries to apache-beam workers

Posted by Luke Cwik <lc...@google.com>.
It looks like anaconda assumes that the directory doesn't exist before
installation. I would recommend using a new directory such as
["/opt/userowned/anaconda.sh", "-b","-p","/opt/userowned/anaconda/"]
but if you really want it to be in "/opt/userowned", you should add "-f":
["/opt/userowned/anaconda.sh","-f","-b","-p","/opt/userowned/"]

See the Anaconda silent install[1] instructions for more details.

1: https://docs.anaconda.com/anaconda/install/silent-mode/#linux-macos
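
A minimal sketch of the two variants above, written as entries for the
CUSTOM_COMMANDS list. The helper name anaconda_install_command is
hypothetical and only for illustration; the paths, URL, and flags are the
ones quoted in this thread (-b runs the installer in batch mode, -p sets the
install prefix, -f suppresses the error when the prefix already exists).

```python
# Paths and URL as quoted in this thread.
INSTALLER = "/opt/userowned/anaconda.sh"
ANACONDA_URL = "https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh"


def anaconda_install_command(prefix, force=False):
    """Build the installer invocation (hypothetical helper, illustration only).

    prefix: directory to install into (-p).
    force:  add -f so an already existing prefix is not treated as an error.
    """
    command = [INSTALLER]
    if force:
        command.append("-f")
    command += ["-b", "-p", prefix]
    return command


CUSTOM_COMMANDS = [
    ["wget", "-O", INSTALLER, ANACONDA_URL],
    ["chmod", "777", INSTALLER],
    # Recommended variant: install into a directory that does not exist yet.
    anaconda_install_command("/opt/userowned/anaconda/"),
]
```

Passing force=True with the prefix "/opt/userowned/" gives the second
variant, for the case where the install really has to go into the existing
directory.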


On Fri, Apr 17, 2020 at 9:28 PM OrielResearch Eila Arich-Landkof <
eila@orielresearch.org> wrote:

> Thank you. I was able to untar my libraries into the /opt/userowned folder.
> I am using setup.py from this URL (recommended by your apache-beam
> documentation):
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py
>
> Last, I want to install Anaconda. I have a library with dependencies that
> I would rather have Anaconda take care of. Using the custom commands, I am
> executing the following commands. The first two run fine; the last
> one fires an error:
> ["wget", "-O", "/opt/userowned/anaconda.sh",
>  "https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh"],
> ['echo', 'Change conda permissions'],
> ["chmod", "777", "/opt/userowned/anaconda.sh"],
> ["/opt/userowned/anaconda.sh", "-b","-p","/opt/userowned/"]
>
> The last command failed with the following error:
> Command output: b"ERROR: File or directory already exists:
> '/opt/userowned/'\nIf you want to update an existing installation, use the
> -u option.\n"
>
> Please let me know if you have any recommendations on how to install
> Anaconda.
>
> Many thanks,
> Eila

Re: Copying tar.gz libraries to apache-beam workers

Posted by OrielResearch Eila Arich-Landkof <ei...@orielresearch.org>.
Thank you. I was able to untar my libraries into the /opt/userowned folder.
I am using setup.py from this URL (recommended by your apache-beam
documentation):
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py

Last, I want to install Anaconda. I have a library with dependencies that I
would rather have Anaconda take care of. Using the custom commands, I am
executing the following commands. The first two run fine; the last
one fires an error:
["wget", "-O", "/opt/userowned/anaconda.sh",
 "https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh"],
['echo', 'Change conda permissions'],
["chmod", "777", "/opt/userowned/anaconda.sh"],
["/opt/userowned/anaconda.sh", "-b","-p","/opt/userowned/"]

The last command failed with the following error:
Command output: b"ERROR: File or directory already exists:
'/opt/userowned/'\nIf you want to update an existing installation, use the
-u option.\n"

Please let me know if you have any recommendations on how to install
Anaconda.

Many thanks,
Eila

On Fri, Apr 17, 2020 at 12:12 PM Luke Cwik <lc...@google.com> wrote:

> On Dataflow you should be able to use /opt/userowned

-- 
Eila
<http://www.orielresearch.com>
Meetup <https://www.meetup.com/Deep-Learning-In-Production/>

Re: Copying tar.gz libraries to apache-beam workers

Posted by Luke Cwik <lc...@google.com>.
On Dataflow you should be able to use /opt/userowned
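
A minimal sketch of what that change could look like in setup.py, assuming
the same download URL as in the original question and only swapping the
target directory. The constant names INSTALL_DIR, ARCHIVE, and
SRATOOLKIT_URL are hypothetical and introduced here for readability.

```python
# Directory suggested above for user-installed files on Dataflow workers.
INSTALL_DIR = "/opt/userowned"

# Hypothetical names; the URL is the one from the original question.
ARCHIVE = INSTALL_DIR + "/sratoolkit.tar.gz"
SRATOOLKIT_URL = (
    "http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/"
    "sratoolkit.current-centos_linux64.tar.gz"
)

CUSTOM_COMMANDS = [
    # Download the archive into the user-owned directory.
    ["wget", "-O", ARCHIVE, SRATOOLKIT_URL],
    # Extract it in place; -C selects the directory to extract into.
    ["tar", "-xzf", ARCHIVE, "-C", INSTALL_DIR],
]
```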

On Fri, Apr 17, 2020 at 9:01 AM OrielResearch Eila Arich-Landkof <
eila@orielresearch.org> wrote:

> See inline
>
>
> —
> Eila
> www.orielesearch.com
> https://www.meetup.com/Deep-Learning-In-Production/
>
> Sent from my iPhone
>
> On Apr 17, 2020, at 11:32 AM, Luke Cwik <lc...@google.com> wrote:
>
> 
> When you said you checked '/usr/local/', did you check inside the docker
> container or on the VM itself?
>
> Yes. I checked as part of the DoFn function that runs in the Docker container.
>
> Have you tried adding an echo command before and after your script runs
> and looked for them in the stackdriver logs?
>
> No. I do see the name of my setup.py in the logs.
> I can add the echo commands as well.
>
> It should help you locate any errors that might have happened when
> executing the custom commands.
>
> Will try that. Is it correct to target this folder? Is there any other
> folder that is ‘dedicated’ to custom downloads?
>
> Thanks
> Eila

Re: Copying tar.gz libraries to apache-beam workers

Posted by OrielResearch Eila Arich-Landkof <ei...@orielresearch.org>.
See inline 


—
Eila
www.orielesearch.com
https://www.meetup.com/Deep-Learning-In-Production 


Sent from my iPhone

> On Apr 17, 2020, at 11:32 AM, Luke Cwik <lc...@google.com> wrote:
> 
> 
> When you said you checked '/usr/local/', did you check inside the docker container or on the VM itself?
Yes. I checked as part of the DoFn function that runs in the Docker container.
> Have you tried adding an echo command before and after your script runs and looked for them in the stackdriver logs?
No. I do see the name of my setup.py in the logs.
I can add the echo commands as well.

> It should help you locate any errors that might have happened when executing the custom commands.
Will try that. Is it correct to target this folder? Is there any other folder that is ‘dedicated’ to custom downloads?

Thanks
Eila

Re: Copying tar.gz libraries to apache-beam workers

Posted by Luke Cwik <lc...@google.com>.
When you said you checked '/usr/local/', did you check inside the docker
container or on the VM itself?
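
One way to make that check explicit is a small helper that lists what is
actually present under a directory and can be called from the code that runs
inside the container (for example from a DoFn's process method) so the
result can be logged. This is only a sketch; the function name
list_executables is hypothetical.

```python
import os


def list_executables(root):
    """Return the sorted paths of executable regular files under root.

    Returns an empty list when root does not exist, which is itself a
    useful signal that the custom commands did not write there.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.access(path, os.X_OK):
                found.append(path)
    return sorted(found)
```

Logging list_executables("/usr/local") from the worker code shows what that
process can see, independent of what exists on the VM itself.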

Have you tried adding an echo command before and after your script runs and
looked for them in the stackdriver logs?
It should help you locate any errors that might have happened when
executing the custom commands.
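
A rough sketch of that idea, loosely modelled on the command runner in the
juliaset example setup.py. The marker strings are placeholders, and the
function name run_custom_command is an assumption for illustration; in a
real setup.py the commands would be run from a custom setuptools command.

```python
import subprocess

# Echo markers bracket the real commands so they are easy to find in the logs.
CUSTOM_COMMANDS = [
    ["echo", "Custom commands: start"],
    # The wget / tar commands from the original message would go here.
    ["echo", "Custom commands: done"],
]


def run_custom_command(command_list):
    """Run one command, print its combined output, and raise on failure."""
    print("Running command: %s" % command_list)
    p = subprocess.Popen(
        command_list,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout so errors are logged too
    )
    stdout_data, _ = p.communicate()
    print("Command output: %s" % stdout_data)
    if p.returncode != 0:
        raise RuntimeError(
            "Command %s failed: exit code: %s" % (command_list, p.returncode)
        )
    return stdout_data
```

If the start marker shows up in the logs but the done marker does not, one
of the commands in between failed, and its printed output should say why.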

On Thu, Apr 16, 2020 at 6:30 PM OrielResearch Eila Arich-Landkof <
eila@orielresearch.org> wrote:

> Hi all,
>
> I posted this question to user-help@apache.org. In case that is not a valid
> address, I am posting it here again. If you have already received it,
> apologies for the spam.
>
> I hope that you are all well. I would like to copy a tools library onto
> the worker machines using the setup.py file. I have updated the
> CUSTOM_COMMANDS:
>
>  CUSTOM_COMMANDS = [
>   ["wget", "-O", "/usr/local/sratoolkit.tar.gz","http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-centos_linux64.tar.gz"],
>   ["tar", "-xzf", "/usr/local/sratoolkit.tar.gz","-C","/usr/local/"]]
>
> When I look for the executables in the /usr/local folder, I cannot find
> the tools that I copied to the worker. What is the right and easiest way
> to copy tool libraries onto the worker machines? I got the expected
> behavior with the local runner, and now it's a matter of finding the right
> way to reproduce it with the Dataflow runner.
>
> I am using Python 3.5 with the latest apache-beam (2.20) and the latest
> Dataflow.
> Thanks a lot, eilalan
> --
> Eila
> <http://www.orielresearch.com>
> Meetup <https://www.meetup.com/Deep-Learning-In-Production/>
>