Posted to dev@toree.apache.org by Ia...@tdameritrade.com on 2016/11/17 18:41:04 UTC

Issue when running toree, with jupyterhub, and ambari

Hi,

I’m seeing a strange issue when running a Toree kernel under JupyterHub. The Python version configured for Spark in kernel.json is 2.7, which I verified from within the notebook itself. In the JupyterHub logs, however, I see errors from two Python files created by Ambari:

/usr/bin/hdp-select
/etc/hadoop/conf/topology_script.py

The errors come from code that works in Python 2 but not in Python 3, since Ambari needs Python 2 to run. Unfortunately, JupyterHub needs Python 3. I’m not sure why Toree runs these files with Python 3 instead of the Python 2 specified in kernel.json. I tested with a PySpark notebook and did not see the same issue, so it seems to be related to Toree’s integration with JupyterHub.

I’ve updated the files to handle both Python 2 and 3, but they seem to be recreated when Ambari and the cluster restart. I’m looking for a more stable long-term solution.

Any ideas?

Ian
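
For readers wondering what "handle both Python 2 and 3" can look like in practice, here is a minimal sketch. It is not Ambari's actual topology_script.py; the function name lookup_racks, the DEFAULT_RACK value, and the example mapping are all hypothetical and only illustrate syntax that both interpreters accept.

```python
# Sketch only: a rack-lookup helper that runs unchanged on Python 2 and 3.
# All names and values here are hypothetical, not taken from Ambari.
from __future__ import print_function  # print() becomes a function on Python 2 too

import sys

DEFAULT_RACK = "/default-rack"  # hypothetical fallback value


def lookup_racks(rack_map, hosts):
    """Return one rack per host, space separated.

    Uses str.join and dict.get, which behave the same on Python 2 and 3.
    """
    if not hosts:
        return DEFAULT_RACK
    return " ".join(rack_map.get(host, DEFAULT_RACK) for host in hosts)


if __name__ == "__main__":
    # Hypothetical in-memory mapping; a real script would load it from a file.
    example_map = {"node1.example.com": "/rack1"}
    print(lookup_racks(example_map, sys.argv[1:]))
```

As noted in the post, edits like this are overwritten when Ambari regenerates the files, so this only shows the kind of change involved, not a durable fix.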

Re: Issue when running toree, with jupyterhub, and ambari

Posted by Ia...@tdameritrade.com.
I find it strange that toree calls /etc/hadoop/conf/topology_script.py
every second, even when nothing is being executed. Is there a way to turn
that off in toree?

Thanks,

Ian


Re: Issue when running toree, with jupyterhub, and ambari

Posted by Ia...@tdameritrade.com.
As I mentioned below, JupyterHub runs in a Python 3 environment. I have a
PySpark kernel and a Toree kernel, both pointing to the same Python. The
PySpark kernel works fine. The Toree kernel prints errors about
topology_script.py over and over, so it is specific to Toree.


Ian Maloney
Platform Architect
Advanced Analytics
Internal: 828716
Office: (734) 623-8716
Mobile: (313) 910-9272


Re: Issue when running toree, with jupyterhub, and ambari

Posted by Marius van Niekerk <ma...@gmail.com>.
Those scripts are typically run to discover nodes when running under
YARN.

There shouldn't be anything Toree-specific about it. You're probably just
using a default version of Python that doesn't cause errors.

--
regards
Marius van Niekerk

Re: Issue when running toree, with jupyterhub, and ambari

Posted by Ia...@tdameritrade.com.
I’d prefer not to change those scripts; that’s the issue. I’m wondering
why Toree is running them, but not my PySpark notebook.


Ian Maloney
Platform Architect
Advanced Analytics
Internal: 828716
Office: (734) 623-8716
Mobile: (313) 910-9272


Re: Issue when running toree, with jupyterhub, and ambari

Posted by Marius van Niekerk <ma...@gmail.com>.
The topology files are run via /usr/bin/env python. You can change that to
point at the system Python, or make those scripts compatible with both
Python 2 and 3.

-- 
regards
Marius van Niekerk
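
The point about /usr/bin/env python can be checked directly: env runs the first executable named python found on the PATH of the calling process, so a process started from a Python 3 environment may resolve it to Python 3. The sketch below assumes a Unix-like system and Python 3.3+ for shutil.which; the helper name resolve_env_python is hypothetical.

```python
# Sketch only: show which interpreter '/usr/bin/env python' would select
# for a given PATH value. The helper name is hypothetical.
import os
import shutil


def resolve_env_python(path_value, name="python"):
    """Return the first executable called `name` on `path_value`, or None.

    This mirrors the PATH search that /usr/bin/env performs.
    """
    return shutil.which(name, path=path_value)


if __name__ == "__main__":
    # Run this from the same environment that launches the kernel to see
    # which interpreter a '#!/usr/bin/env python' script would get there.
    print(resolve_env_python(os.environ.get("PATH", "")))
```

Running it once from the JupyterHub environment and once from a plain shell should show whether the two resolve python differently.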