Posted to users@nifi.apache.org by Elemir Stevko <El...@versent.com.au> on 2019/03/25 03:40:06 UTC

Exception in ExecuteScript when creating boto3 client for S3

Hello,

I am trying to implement a Python-based ExecuteScript processor in NiFi 1.9.0 that gets a list of files from S3. I get an exception when I try to create an S3 client with boto3:

boto_client = boto3.client('s3', region_name='us-east-1')

Exception: No module named multiprocessing in <script> at line number 16

2019-03-25 03:08:33,591 ERROR [Timer-Driven Process Thread-4] o.a.nifi.processors.script.ExecuteScript ExecuteScript[id=b258d892-0169-1000-f2ec-1e98e077f15b] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: javax.script.ScriptException: ImportError: No module named multiprocessing in <script> at line number 16

However, I can create a boto3 client for Athena, for example, without error:

boto_client = boto3.client('athena', region_name='us-east-1')

I have observed the same behaviour with InvokeScriptedProcessor.

I am passing '/usr/lib/python2.7/site-packages/' in the Module Directory property.

Here is a code snippet for the ExecuteScript processor that should reproduce this issue:

import boto3
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
  def __init__(self, text):
    self.text = text
  def process(self, inputStream, outputStream):
    outputStream.write(bytearray(self.text.encode('utf-8')))

def getFileList():
  return ['file1', 'file2']

boto_client = boto3.client('s3', region_name='ap-southeast-2')

for file in getFileList():
  flowfile = session.create()
  if flowfile:
    flowfile = session.write(flowfile, PyStreamCallback(file))
    session.transfer(flowfile, REL_SUCCESS)

Is there any workaround for this issue?

Best regards,
Elemir

Re: Exception in ExecuteScript when creating boto3 client for S3

Posted by Elemir Stevko <El...@versent.com.au>.
Thanks a lot, Matt! I'll try implementing it with the ExecuteStreamCommand processor.

Best regards,
Elemir

Re: Exception in ExecuteScript when creating boto3 client for S3

Posted by Matt Burgess <ma...@apache.org>.
As NiFi is a pure Java/JVM application, we use Jython rather than
CPython for ExecuteScript. This means that you can't import native
(e.g. CPython-only) modules into your Jython scripts in ExecuteScript,
which is what I believe is happening here. If you need native CPython
modules (and if you're operating only on flowfile content and not
attributes), consider using ExecuteStreamCommand with a real Python
interpreter and script. I'm looking at Py4J to try and bridge the gap,
but in the meantime you have to choose between "pure" Python (Jython)
for ExecuteScript and full Python with
ExecuteStreamCommand/ExecuteProcess.
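
As a rough sketch of the ExecuteStreamCommand route, the S3 listing could live in a standalone script run by a real CPython interpreter, along these lines. The script name (list_s3_keys.py), the list_keys helper, and passing the bucket and prefix as command-line arguments are assumptions for illustration, not something from the original post. The boto3 calls are the standard list_objects_v2 paginator.

```python
#!/usr/bin/env python
"""list_s3_keys.py (hypothetical name): print S3 object keys, one per line."""
import sys


def list_keys(client, bucket, prefix=""):
    """Yield object keys from a bucket using the list_objects_v2 paginator."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def main(argv):
    if len(argv) < 2:
        sys.stderr.write("usage: list_s3_keys.py BUCKET [PREFIX]\n")
        return
    # Imported here so list_keys can be exercised without boto3 installed.
    import boto3

    bucket = argv[1]  # assumption: bucket name arrives as the first argument
    prefix = argv[2] if len(argv) > 2 else ""
    client = boto3.client("s3", region_name="ap-southeast-2")
    for key in list_keys(client, bucket, prefix):
        # The processor captures stdout as the outgoing flowfile content.
        sys.stdout.write(key + "\n")


if __name__ == "__main__":
    main(sys.argv)
```

With something like this, Command Path would point at the Python interpreter and Command Arguments would carry the script path and bucket name. Whatever the script writes to stdout becomes the content of the outgoing flowfile.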

Regards,
Matt
