You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Jeremy Karn <jk...@mortardata.com> on 2013/08/23 22:19:52 UTC
Review Request 13781: Changes to add support for streaming_python udfs.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13781/
-----------------------------------------------------------
Review request for pig.
Repository: pig-git
Description
-------
Changes for PIG-2417 (https://issues.apache.org/jira/browse/PIG-2417)
Diffs
-----
build.xml b20eb3d
src/org/apache/pig/PigToStream.java 7cc2950
src/org/apache/pig/StreamToPig.java ff24b27
src/org/apache/pig/builtin/PigStreaming.java 5467693
src/org/apache/pig/builtin/PigToStreamUDF.java PRE-CREATION
src/org/apache/pig/builtin/StreamUDFToPig.java PRE-CREATION
src/org/apache/pig/builtin/StreamingDelimiters.java PRE-CREATION
src/org/apache/pig/builtin/StreamingUDF.java PRE-CREATION
src/org/apache/pig/builtin/StreamingUDFException.java PRE-CREATION
src/org/apache/pig/builtin/StreamingUDFOutputSchemaException.java PRE-CREATION
src/org/apache/pig/impl/streaming/DefaultInputHandler.java 301bea3
src/org/apache/pig/impl/streaming/DefaultOutputHandler.java 1b46e7d
src/org/apache/pig/impl/streaming/ExecutableManager.java cf79c83
src/org/apache/pig/impl/streaming/InputHandler.java 690d94e
src/org/apache/pig/impl/streaming/OutputHandler.java 6e9262a
src/org/apache/pig/impl/streaming/StreamingUDFOutputHandler.java PRE-CREATION
src/org/apache/pig/impl/streaming/StreamingUtil.java PRE-CREATION
src/org/apache/pig/impl/util/JarManager.java 5c4acb0
src/org/apache/pig/impl/util/StorageUtil.java dcb62ec
src/org/apache/pig/scripting/ScriptEngine.java 29a9e1f
src/org/apache/pig/scripting/ScriptingIllustrateOutputCapturer.java PRE-CREATION
src/org/apache/pig/scripting/streaming/python/PythonScriptEngine.java PRE-CREATION
src/python/streaming/controller.py PRE-CREATION
src/python/streaming/pig_util.py PRE-CREATION
test/org/apache/pig/builtin/TestPigToStreamUDF.java PRE-CREATION
test/org/apache/pig/builtin/TestStreamUDFToPig.java PRE-CREATION
test/org/apache/pig/builtin/TestStreamingUDF.java PRE-CREATION
test/org/apache/pig/impl/streaming/TestExecutableManager.java 6246019
test/org/apache/pig/impl/streaming/TestStreamingUDFOutputHandler.java PRE-CREATION
test/org/apache/pig/impl/streaming/TestStreamingUtil.java PRE-CREATION
test/org/apache/pig/test/TestPigStreaming.java PRE-CREATION
test/org/apache/pig/test/TestStreaming.java 1eac5d2
test/python/streaming/test_controller.py PRE-CREATION
test/unit-tests d52ad9d
Diff: https://reviews.apache.org/r/13781/diff/
Testing
-------
Thanks,
Jeremy Karn
Re: Review Request 13781: Changes to add support for streaming_python udfs.
Posted by Jeremy Karn <jk...@mortardata.com>.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamUDFToPig.java, line 74
> > <https://reviews.apache.org/r/13781/diff/1/?file=344447#file344447line74>
> >
> > decoding string is actually quite costly as java strings are utf-16 and creating a string always copies the content of the byte array.
> > Can't you just use the binary format of the type?
I'm not sure how to use the binary format without significantly rewriting the serialization logic on the python side to serialize the data in a byte format (using something like struct.pack). Is that what you were thinking of, or were you thinking of something else?
The last profiling we did showed that the bottleneck is typically on the python side deserializing the input from Pig. I think any performance enhancements at this point should be done in a new jira with solid profiling over a number of test cases.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamUDFToPig.java, line 98
> > <https://reviews.apache.org/r/13781/diff/1/?file=344447#file344447line98>
> >
> > would StreamTokenizer help ?
I took a look, and I don't think its worth re-writing this logic using StreamTokenizer.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamUDFToPig.java, line 109
> > <https://reviews.apache.org/r/13781/diff/1/?file=344447#file344447line109>
> >
> > if you know the size in advance you can pass it here and avoid resizing
Done.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamUDFToPig.java, line 123
> > <https://reviews.apache.org/r/13781/diff/1/?file=344447#file344447line123>
> >
> > you probably want to do getInstance() once
Done.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamingDelimiters.java, line 74
> > <https://reviews.apache.org/r/13781/diff/1/?file=344448#file344448line74>
> >
> > why do we need a hashmap to access those statically defined values?
Done.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamingUDF.java, line 51
> > <https://reviews.apache.org/r/13781/diff/1/?file=344449#file344449line51>
> >
> > enum?
An enum seems a little more verbose without much benefit in this case. I don't feel strongly about it though.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamingUDF.java, line 382
> > <https://reviews.apache.org/r/13781/diff/1/?file=344449#file344449line382>
> >
> > if (info) log.info
Fixed. This log statement was removed entirely.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/builtin/StreamingUDFException.java, line 23
> > <https://reviews.apache.org/r/13781/diff/1/?file=344450#file344450line23>
> >
> > if you don't make the parent see the cause printStackTrace won't display the cause's stackTrace
Done.
> On Sept. 4, 2013, 11:36 p.m., Julien Le Dem wrote:
> > src/org/apache/pig/impl/util/StorageUtil.java, line 176
> > <https://reviews.apache.org/r/13781/diff/1/?file=344460#file344460line176>
> >
> > why is that?
No good reason, so I added BigDecimal and BigInteger support.
- Jeremy
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13781/#review25913
-----------------------------------------------------------
On Sept. 5, 2013, 8:23 p.m., Jeremy Karn wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13781/
> -----------------------------------------------------------
>
> (Updated Sept. 5, 2013, 8:23 p.m.)
>
>
> Review request for pig.
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Changes for PIG-2417 (https://issues.apache.org/jira/browse/PIG-2417)
>
>
> Diffs
> -----
>
> build.xml 7e22192
> src/org/apache/pig/PigToStream.java 7cc2950
> src/org/apache/pig/StreamToPig.java ff24b27
> src/org/apache/pig/builtin/PigStreaming.java 5467693
> src/org/apache/pig/builtin/PigToStreamUDF.java PRE-CREATION
> src/org/apache/pig/builtin/StreamUDFToPig.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingDelimiters.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingUDF.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingUDFException.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingUDFOutputSchemaException.java PRE-CREATION
> src/org/apache/pig/impl/streaming/DefaultInputHandler.java 301bea3
> src/org/apache/pig/impl/streaming/DefaultOutputHandler.java 1b46e7d
> src/org/apache/pig/impl/streaming/ExecutableManager.java cf79c83
> src/org/apache/pig/impl/streaming/InputHandler.java 690d94e
> src/org/apache/pig/impl/streaming/OutputHandler.java 6e9262a
> src/org/apache/pig/impl/streaming/StreamingUDFOutputHandler.java PRE-CREATION
> src/org/apache/pig/impl/streaming/StreamingUtil.java PRE-CREATION
> src/org/apache/pig/impl/util/JarManager.java 5c4acb0
> src/org/apache/pig/impl/util/StorageUtil.java dcb62ec
> src/org/apache/pig/scripting/ScriptEngine.java 29a9e1f
> src/org/apache/pig/scripting/ScriptingIllustrateOutputCapturer.java PRE-CREATION
> src/org/apache/pig/scripting/streaming/python/PythonScriptEngine.java PRE-CREATION
> src/python/streaming/controller.py PRE-CREATION
> src/python/streaming/pig_util.py PRE-CREATION
> test/org/apache/pig/builtin/TestPigToStreamUDF.java PRE-CREATION
> test/org/apache/pig/builtin/TestStreamUDFToPig.java PRE-CREATION
> test/org/apache/pig/builtin/TestStreamingUDF.java PRE-CREATION
> test/org/apache/pig/impl/streaming/TestExecutableManager.java 6246019
> test/org/apache/pig/impl/streaming/TestStreamingUDFOutputHandler.java PRE-CREATION
> test/org/apache/pig/impl/streaming/TestStreamingUtil.java PRE-CREATION
> test/org/apache/pig/test/TestPigStreaming.java PRE-CREATION
> test/org/apache/pig/test/TestStreaming.java 1eac5d2
> test/python/streaming/test_controller.py PRE-CREATION
> test/unit-tests d52ad9d
>
> Diff: https://reviews.apache.org/r/13781/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jeremy Karn
>
>
Re: Review Request 13781: Changes to add support for streaming_python udfs.
Posted by Julien Le Dem <ju...@ledem.net>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13781/#review25913
-----------------------------------------------------------
src/org/apache/pig/builtin/StreamUDFToPig.java
<https://reviews.apache.org/r/13781/#comment50560>
decoding string is actually quite costly as java strings are utf-16 and creating a string always copies the content of the byte array.
Can't you just use the binary format of the type?
src/org/apache/pig/builtin/StreamUDFToPig.java
<https://reviews.apache.org/r/13781/#comment50566>
would StreamTokenizer help ?
src/org/apache/pig/builtin/StreamUDFToPig.java
<https://reviews.apache.org/r/13781/#comment50562>
if you know the size in advance you can pass it here and avoid resizing
src/org/apache/pig/builtin/StreamUDFToPig.java
<https://reviews.apache.org/r/13781/#comment50561>
you probably want to do getInstance() once
src/org/apache/pig/builtin/StreamingDelimiters.java
<https://reviews.apache.org/r/13781/#comment50565>
why do we need a hashmap to access those statically defined values?
src/org/apache/pig/builtin/StreamingUDF.java
<https://reviews.apache.org/r/13781/#comment50568>
enum?
src/org/apache/pig/builtin/StreamingUDF.java
<https://reviews.apache.org/r/13781/#comment50569>
chain exceptions
src/org/apache/pig/builtin/StreamingUDF.java
<https://reviews.apache.org/r/13781/#comment50571>
if (info) log.info
src/org/apache/pig/builtin/StreamingUDFException.java
<https://reviews.apache.org/r/13781/#comment50574>
if you don't make the parent see the cause printStackTrace won't display the cause's stackTrace
src/org/apache/pig/impl/util/StorageUtil.java
<https://reviews.apache.org/r/13781/#comment50575>
why is that?
- Julien Le Dem
On Aug. 23, 2013, 8:19 p.m., Jeremy Karn wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13781/
> -----------------------------------------------------------
>
> (Updated Aug. 23, 2013, 8:19 p.m.)
>
>
> Review request for pig.
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Changes for PIG-2417 (https://issues.apache.org/jira/browse/PIG-2417)
>
>
> Diffs
> -----
>
> build.xml b20eb3d
> src/org/apache/pig/PigToStream.java 7cc2950
> src/org/apache/pig/StreamToPig.java ff24b27
> src/org/apache/pig/builtin/PigStreaming.java 5467693
> src/org/apache/pig/builtin/PigToStreamUDF.java PRE-CREATION
> src/org/apache/pig/builtin/StreamUDFToPig.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingDelimiters.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingUDF.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingUDFException.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingUDFOutputSchemaException.java PRE-CREATION
> src/org/apache/pig/impl/streaming/DefaultInputHandler.java 301bea3
> src/org/apache/pig/impl/streaming/DefaultOutputHandler.java 1b46e7d
> src/org/apache/pig/impl/streaming/ExecutableManager.java cf79c83
> src/org/apache/pig/impl/streaming/InputHandler.java 690d94e
> src/org/apache/pig/impl/streaming/OutputHandler.java 6e9262a
> src/org/apache/pig/impl/streaming/StreamingUDFOutputHandler.java PRE-CREATION
> src/org/apache/pig/impl/streaming/StreamingUtil.java PRE-CREATION
> src/org/apache/pig/impl/util/JarManager.java 5c4acb0
> src/org/apache/pig/impl/util/StorageUtil.java dcb62ec
> src/org/apache/pig/scripting/ScriptEngine.java 29a9e1f
> src/org/apache/pig/scripting/ScriptingIllustrateOutputCapturer.java PRE-CREATION
> src/org/apache/pig/scripting/streaming/python/PythonScriptEngine.java PRE-CREATION
> src/python/streaming/controller.py PRE-CREATION
> src/python/streaming/pig_util.py PRE-CREATION
> test/org/apache/pig/builtin/TestPigToStreamUDF.java PRE-CREATION
> test/org/apache/pig/builtin/TestStreamUDFToPig.java PRE-CREATION
> test/org/apache/pig/builtin/TestStreamingUDF.java PRE-CREATION
> test/org/apache/pig/impl/streaming/TestExecutableManager.java 6246019
> test/org/apache/pig/impl/streaming/TestStreamingUDFOutputHandler.java PRE-CREATION
> test/org/apache/pig/impl/streaming/TestStreamingUtil.java PRE-CREATION
> test/org/apache/pig/test/TestPigStreaming.java PRE-CREATION
> test/org/apache/pig/test/TestStreaming.java 1eac5d2
> test/python/streaming/test_controller.py PRE-CREATION
> test/unit-tests d52ad9d
>
> Diff: https://reviews.apache.org/r/13781/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jeremy Karn
>
>
Re: Review Request 13781: Changes to add support for streaming_python udfs.
Posted by Jeremy Karn <jk...@mortardata.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13781/
-----------------------------------------------------------
(Updated Sept. 7, 2013, 3:39 p.m.)
Review request for pig.
Repository: pig-git
Description
-------
Changes for PIG-2417 (https://issues.apache.org/jira/browse/PIG-2417)
Diffs (updated)
-----
build.xml 7e22192
src/org/apache/pig/impl/builtin/StreamingUDF.java PRE-CREATION
src/org/apache/pig/impl/streaming/ExecutableManager.java cf79c83
src/org/apache/pig/impl/streaming/OutputHandler.java 6e9262a
src/org/apache/pig/impl/streaming/PigToStreamUDF.java PRE-CREATION
src/org/apache/pig/impl/streaming/StreamUDFToPig.java PRE-CREATION
src/org/apache/pig/impl/streaming/StreamingDelimiters.java PRE-CREATION
src/org/apache/pig/impl/streaming/StreamingUDFException.java PRE-CREATION
src/org/apache/pig/impl/streaming/StreamingUDFInputHandler.java PRE-CREATION
src/org/apache/pig/impl/streaming/StreamingUDFOutputHandler.java PRE-CREATION
src/org/apache/pig/impl/streaming/StreamingUDFOutputSchemaException.java PRE-CREATION
src/org/apache/pig/impl/streaming/StreamingUtil.java PRE-CREATION
src/org/apache/pig/impl/util/JarManager.java 5c4acb0
src/org/apache/pig/impl/util/StorageUtil.java dcb62ec
src/org/apache/pig/scripting/ScriptEngine.java 29a9e1f
src/org/apache/pig/scripting/ScriptingOutputCapturer.java PRE-CREATION
src/org/apache/pig/scripting/streaming/python/PythonScriptEngine.java PRE-CREATION
src/python/streaming/controller.py PRE-CREATION
src/python/streaming/pig_util.py PRE-CREATION
test/org/apache/pig/impl/builtin/TestStreamingUDF.java PRE-CREATION
test/org/apache/pig/impl/streaming/TestExecutableManager.java 6246019
test/org/apache/pig/impl/streaming/TestPigToStreamUDF.java PRE-CREATION
test/org/apache/pig/impl/streaming/TestStreamUDFToPig.java PRE-CREATION
test/org/apache/pig/impl/streaming/TestStreamingUDFOutputHandler.java PRE-CREATION
test/org/apache/pig/impl/streaming/TestStreamingUtil.java PRE-CREATION
test/org/apache/pig/test/TestPigStreaming.java PRE-CREATION
test/python/streaming/test_controller.py PRE-CREATION
test/unit-tests d52ad9d
Diff: https://reviews.apache.org/r/13781/diff/
Testing
-------
Thanks,
Jeremy Karn
Re: Review Request 13781: Changes to add support for streaming_python udfs.
Posted by Daniel Dai <da...@gmail.com>.
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > src/org/apache/pig/builtin/StreamingUDF.java, line 169
> > <https://reviews.apache.org/r/13781/diff/2/?file=348276#file348276line169>
> >
> > Are those files only for illustrate? Why not just use a random tmp file? Why do we distinguish exectype in illustrate? It only run locally.
>
> Jeremy Karn wrote:
> I renamed the class to remove Illustrate from the name. This is actually used both for illustrate (where we capture the output only from the last run) and in real runs (where we write the output to the user logs). By putting it in the user logs you can get the logs from the task tracker web interface. We could use temp files but I like the more human readable names.
Does the log file location hack also work in Hadoop2?
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > test/unit-tests, line 73
> > <https://reviews.apache.org/r/13781/diff/2/?file=348302#file348302line73>
> >
> > This is not true unit test, should remove from the list
>
> Jeremy Karn wrote:
> Can it go in commit-tests?
The goal for commit-tests to complete within 10 min. So we don't tend to include end-to-end unit test cases.
- Daniel
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13781/#review25964
-----------------------------------------------------------
On Sept. 7, 2013, 3:39 p.m., Jeremy Karn wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13781/
> -----------------------------------------------------------
>
> (Updated Sept. 7, 2013, 3:39 p.m.)
>
>
> Review request for pig.
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Changes for PIG-2417 (https://issues.apache.org/jira/browse/PIG-2417)
>
>
> Diffs
> -----
>
> build.xml 7e22192
> src/org/apache/pig/impl/builtin/StreamingUDF.java PRE-CREATION
> src/org/apache/pig/impl/streaming/ExecutableManager.java cf79c83
> src/org/apache/pig/impl/streaming/OutputHandler.java 6e9262a
> src/org/apache/pig/impl/streaming/PigToStreamUDF.java PRE-CREATION
> src/org/apache/pig/impl/streaming/StreamUDFToPig.java PRE-CREATION
> src/org/apache/pig/impl/streaming/StreamingDelimiters.java PRE-CREATION
> src/org/apache/pig/impl/streaming/StreamingUDFException.java PRE-CREATION
> src/org/apache/pig/impl/streaming/StreamingUDFInputHandler.java PRE-CREATION
> src/org/apache/pig/impl/streaming/StreamingUDFOutputHandler.java PRE-CREATION
> src/org/apache/pig/impl/streaming/StreamingUDFOutputSchemaException.java PRE-CREATION
> src/org/apache/pig/impl/streaming/StreamingUtil.java PRE-CREATION
> src/org/apache/pig/impl/util/JarManager.java 5c4acb0
> src/org/apache/pig/impl/util/StorageUtil.java dcb62ec
> src/org/apache/pig/scripting/ScriptEngine.java 29a9e1f
> src/org/apache/pig/scripting/ScriptingOutputCapturer.java PRE-CREATION
> src/org/apache/pig/scripting/streaming/python/PythonScriptEngine.java PRE-CREATION
> src/python/streaming/controller.py PRE-CREATION
> src/python/streaming/pig_util.py PRE-CREATION
> test/org/apache/pig/impl/builtin/TestStreamingUDF.java PRE-CREATION
> test/org/apache/pig/impl/streaming/TestExecutableManager.java 6246019
> test/org/apache/pig/impl/streaming/TestPigToStreamUDF.java PRE-CREATION
> test/org/apache/pig/impl/streaming/TestStreamUDFToPig.java PRE-CREATION
> test/org/apache/pig/impl/streaming/TestStreamingUDFOutputHandler.java PRE-CREATION
> test/org/apache/pig/impl/streaming/TestStreamingUtil.java PRE-CREATION
> test/org/apache/pig/test/TestPigStreaming.java PRE-CREATION
> test/python/streaming/test_controller.py PRE-CREATION
> test/unit-tests d52ad9d
>
> Diff: https://reviews.apache.org/r/13781/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jeremy Karn
>
>
Re: Review Request 13781: Changes to add support for streaming_python udfs.
Posted by Jeremy Karn <jk...@mortardata.com>.
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > src/org/apache/pig/builtin/StreamUDFToPig.java, line 47
> > <https://reviews.apache.org/r/13781/diff/2/?file=348274#file348274line47>
> >
> > This might be affected by PIG-3255
Looks like it would be easy to make this patch work with PIG-3255. If we could get that merged shortly I'd update this patch with the necessary changes.
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > src/org/apache/pig/builtin/StreamUDFToPig.java, line 109
> > <https://reviews.apache.org/r/13781/diff/2/?file=348274#file348274line109>
> >
> > How do we handle delimit escaping? Seems we don't handle it right now?
We don't handle it now. We use a 3 character delimiter which avoids collisions in practice. I would actually like to switch the serialization to a 'real' serialization library but we did a little bit of prototyping a few months ago and there wasn't anything out there that worked really well and had significant performance improvements. Avro was the most promising but on the Python side it was missing a number of features we needed.
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > src/org/apache/pig/builtin/StreamingUDF.java, line 169
> > <https://reviews.apache.org/r/13781/diff/2/?file=348276#file348276line169>
> >
> > Are those files only for illustrate? Why not just use a random tmp file? Why do we distinguish exectype in illustrate? It only run locally.
I renamed the class to remove Illustrate from the name. This is actually used both for illustrate (where we capture the output only from the last run) and in real runs (where we write the output to the user logs). By putting it in the user logs you can get the logs from the task tracker web interface. We could use temp files but I like the more human readable names.
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > src/org/apache/pig/builtin/StreamingUDF.java, line 198
> > <https://reviews.apache.org/r/13781/diff/2/?file=348276#file348276line198>
> >
> > Any reason not to create a StreamingUDFInputHandler?
I didn't really think it was necessary but I guess its more consistent.
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > src/org/apache/pig/builtin/StreamingUDF.java, line 366
> > <https://reviews.apache.org/r/13781/diff/2/?file=348276#file348276line366>
> >
> > Seems flush() is more proper to put inside inputHandler
I'm not sure if it would be unnecessary for some clients of InputHandler. For the StreamingUDF we need to make sure we always flush or else we can get blocked on the reading process not getting the full input (and then never writing the output) but I'm not sure if its always necessary.
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > src/org/apache/pig/impl/streaming/InputHandler.java, line 66
> > <https://reviews.apache.org/r/13781/diff/2/?file=348282#file348282line66>
> >
> > Why we need this change?
Probably just left over from debugging where I wanted to know the value of the byte[].
> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote:
> > test/unit-tests, line 73
> > <https://reviews.apache.org/r/13781/diff/2/?file=348302#file348302line73>
> >
> > This is not true unit test, should remove from the list
Can it go in commit-tests?
- Jeremy
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13781/#review25964
-----------------------------------------------------------
On Sept. 5, 2013, 8:23 p.m., Jeremy Karn wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13781/
> -----------------------------------------------------------
>
> (Updated Sept. 5, 2013, 8:23 p.m.)
>
>
> Review request for pig.
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Changes for PIG-2417 (https://issues.apache.org/jira/browse/PIG-2417)
>
>
> Diffs
> -----
>
> build.xml 7e22192
> src/org/apache/pig/PigToStream.java 7cc2950
> src/org/apache/pig/StreamToPig.java ff24b27
> src/org/apache/pig/builtin/PigStreaming.java 5467693
> src/org/apache/pig/builtin/PigToStreamUDF.java PRE-CREATION
> src/org/apache/pig/builtin/StreamUDFToPig.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingDelimiters.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingUDF.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingUDFException.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingUDFOutputSchemaException.java PRE-CREATION
> src/org/apache/pig/impl/streaming/DefaultInputHandler.java 301bea3
> src/org/apache/pig/impl/streaming/DefaultOutputHandler.java 1b46e7d
> src/org/apache/pig/impl/streaming/ExecutableManager.java cf79c83
> src/org/apache/pig/impl/streaming/InputHandler.java 690d94e
> src/org/apache/pig/impl/streaming/OutputHandler.java 6e9262a
> src/org/apache/pig/impl/streaming/StreamingUDFOutputHandler.java PRE-CREATION
> src/org/apache/pig/impl/streaming/StreamingUtil.java PRE-CREATION
> src/org/apache/pig/impl/util/JarManager.java 5c4acb0
> src/org/apache/pig/impl/util/StorageUtil.java dcb62ec
> src/org/apache/pig/scripting/ScriptEngine.java 29a9e1f
> src/org/apache/pig/scripting/ScriptingIllustrateOutputCapturer.java PRE-CREATION
> src/org/apache/pig/scripting/streaming/python/PythonScriptEngine.java PRE-CREATION
> src/python/streaming/controller.py PRE-CREATION
> src/python/streaming/pig_util.py PRE-CREATION
> test/org/apache/pig/builtin/TestPigToStreamUDF.java PRE-CREATION
> test/org/apache/pig/builtin/TestStreamUDFToPig.java PRE-CREATION
> test/org/apache/pig/builtin/TestStreamingUDF.java PRE-CREATION
> test/org/apache/pig/impl/streaming/TestExecutableManager.java 6246019
> test/org/apache/pig/impl/streaming/TestStreamingUDFOutputHandler.java PRE-CREATION
> test/org/apache/pig/impl/streaming/TestStreamingUtil.java PRE-CREATION
> test/org/apache/pig/test/TestPigStreaming.java PRE-CREATION
> test/org/apache/pig/test/TestStreaming.java 1eac5d2
> test/python/streaming/test_controller.py PRE-CREATION
> test/unit-tests d52ad9d
>
> Diff: https://reviews.apache.org/r/13781/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jeremy Karn
>
>
Re: Review Request 13781: Changes to add support for streaming_python udfs.
Posted by Daniel Dai <da...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13781/#review25964
-----------------------------------------------------------
src/org/apache/pig/PigToStream.java
<https://reviews.apache.org/r/13781/#comment50670>
No code change, remove this section from patch
src/org/apache/pig/StreamToPig.java
<https://reviews.apache.org/r/13781/#comment50671>
No code change, remove this section from patch
src/org/apache/pig/builtin/PigStreaming.java
<https://reviews.apache.org/r/13781/#comment50672>
No code change, remove this section from patch
src/org/apache/pig/builtin/PigToStreamUDF.java
<https://reviews.apache.org/r/13781/#comment50675>
I feel those classes StreamingUDF should go to package org.apache.pig.impl.builtin, StreamingDelimiters, StreamUDFToPig, PigToStreamUDF, etc should go to org.apache.pig.impl.streaming
src/org/apache/pig/builtin/PigToStreamUDF.java
<https://reviews.apache.org/r/13781/#comment50673>
I personally like to move the last element processing to the for loop:
for (int i=0; i < sz; i++) {
field = t.get(i);
StorageUtil.putField(out, field, DELIMS, true);
if (i!=sz-1) {
out.write(DELIMS.getParamDelim());
}
}
src/org/apache/pig/builtin/StreamUDFToPig.java
<https://reviews.apache.org/r/13781/#comment50684>
This might be affected by PIG-3255
src/org/apache/pig/builtin/StreamUDFToPig.java
<https://reviews.apache.org/r/13781/#comment50686>
How do we handle delimit escaping? Seems we don't handle it right now?
src/org/apache/pig/builtin/StreamingUDF.java
<https://reviews.apache.org/r/13781/#comment50685>
Are those files only for illustrate? Why not just use a random tmp file? Why do we distinguish exectype in illustrate? It only run locally.
src/org/apache/pig/builtin/StreamingUDF.java
<https://reviews.apache.org/r/13781/#comment50680>
Any reason not to create a StreamingUDFInputHandler?
src/org/apache/pig/builtin/StreamingUDF.java
<https://reviews.apache.org/r/13781/#comment50679>
Can we use FileUtils.copyFile?
src/org/apache/pig/builtin/StreamingUDF.java
<https://reviews.apache.org/r/13781/#comment50683>
Seems flush() is more proper to put inside inputHandler
src/org/apache/pig/impl/streaming/InputHandler.java
<https://reviews.apache.org/r/13781/#comment50682>
Why we need this change?
test/unit-tests
<https://reviews.apache.org/r/13781/#comment50681>
This is not true unit test, should remove from the list
Overall, the patch is non-intrusive. Except for minor refactory in ExecutableManager, existing features should not be affected. I am fine to check in code quickly, then improve the code over time. Before that, we need:
* address existing comments, especially conflict with PIG-3255
* add some e2e tests
* run regression tests
- Daniel Dai
On Sept. 5, 2013, 8:23 p.m., Jeremy Karn wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13781/
> -----------------------------------------------------------
>
> (Updated Sept. 5, 2013, 8:23 p.m.)
>
>
> Review request for pig.
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> Changes for PIG-2417 (https://issues.apache.org/jira/browse/PIG-2417)
>
>
> Diffs
> -----
>
> build.xml 7e22192
> src/org/apache/pig/PigToStream.java 7cc2950
> src/org/apache/pig/StreamToPig.java ff24b27
> src/org/apache/pig/builtin/PigStreaming.java 5467693
> src/org/apache/pig/builtin/PigToStreamUDF.java PRE-CREATION
> src/org/apache/pig/builtin/StreamUDFToPig.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingDelimiters.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingUDF.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingUDFException.java PRE-CREATION
> src/org/apache/pig/builtin/StreamingUDFOutputSchemaException.java PRE-CREATION
> src/org/apache/pig/impl/streaming/DefaultInputHandler.java 301bea3
> src/org/apache/pig/impl/streaming/DefaultOutputHandler.java 1b46e7d
> src/org/apache/pig/impl/streaming/ExecutableManager.java cf79c83
> src/org/apache/pig/impl/streaming/InputHandler.java 690d94e
> src/org/apache/pig/impl/streaming/OutputHandler.java 6e9262a
> src/org/apache/pig/impl/streaming/StreamingUDFOutputHandler.java PRE-CREATION
> src/org/apache/pig/impl/streaming/StreamingUtil.java PRE-CREATION
> src/org/apache/pig/impl/util/JarManager.java 5c4acb0
> src/org/apache/pig/impl/util/StorageUtil.java dcb62ec
> src/org/apache/pig/scripting/ScriptEngine.java 29a9e1f
> src/org/apache/pig/scripting/ScriptingIllustrateOutputCapturer.java PRE-CREATION
> src/org/apache/pig/scripting/streaming/python/PythonScriptEngine.java PRE-CREATION
> src/python/streaming/controller.py PRE-CREATION
> src/python/streaming/pig_util.py PRE-CREATION
> test/org/apache/pig/builtin/TestPigToStreamUDF.java PRE-CREATION
> test/org/apache/pig/builtin/TestStreamUDFToPig.java PRE-CREATION
> test/org/apache/pig/builtin/TestStreamingUDF.java PRE-CREATION
> test/org/apache/pig/impl/streaming/TestExecutableManager.java 6246019
> test/org/apache/pig/impl/streaming/TestStreamingUDFOutputHandler.java PRE-CREATION
> test/org/apache/pig/impl/streaming/TestStreamingUtil.java PRE-CREATION
> test/org/apache/pig/test/TestPigStreaming.java PRE-CREATION
> test/org/apache/pig/test/TestStreaming.java 1eac5d2
> test/python/streaming/test_controller.py PRE-CREATION
> test/unit-tests d52ad9d
>
> Diff: https://reviews.apache.org/r/13781/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Jeremy Karn
>
>
Re: Review Request 13781: Changes to add support for streaming_python udfs.
Posted by Jeremy Karn <jk...@mortardata.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13781/
-----------------------------------------------------------
(Updated Sept. 5, 2013, 8:23 p.m.)
Review request for pig.
Repository: pig-git
Description
-------
Changes for PIG-2417 (https://issues.apache.org/jira/browse/PIG-2417)
Diffs (updated)
-----
build.xml 7e22192
src/org/apache/pig/PigToStream.java 7cc2950
src/org/apache/pig/StreamToPig.java ff24b27
src/org/apache/pig/builtin/PigStreaming.java 5467693
src/org/apache/pig/builtin/PigToStreamUDF.java PRE-CREATION
src/org/apache/pig/builtin/StreamUDFToPig.java PRE-CREATION
src/org/apache/pig/builtin/StreamingDelimiters.java PRE-CREATION
src/org/apache/pig/builtin/StreamingUDF.java PRE-CREATION
src/org/apache/pig/builtin/StreamingUDFException.java PRE-CREATION
src/org/apache/pig/builtin/StreamingUDFOutputSchemaException.java PRE-CREATION
src/org/apache/pig/impl/streaming/DefaultInputHandler.java 301bea3
src/org/apache/pig/impl/streaming/DefaultOutputHandler.java 1b46e7d
src/org/apache/pig/impl/streaming/ExecutableManager.java cf79c83
src/org/apache/pig/impl/streaming/InputHandler.java 690d94e
src/org/apache/pig/impl/streaming/OutputHandler.java 6e9262a
src/org/apache/pig/impl/streaming/StreamingUDFOutputHandler.java PRE-CREATION
src/org/apache/pig/impl/streaming/StreamingUtil.java PRE-CREATION
src/org/apache/pig/impl/util/JarManager.java 5c4acb0
src/org/apache/pig/impl/util/StorageUtil.java dcb62ec
src/org/apache/pig/scripting/ScriptEngine.java 29a9e1f
src/org/apache/pig/scripting/ScriptingIllustrateOutputCapturer.java PRE-CREATION
src/org/apache/pig/scripting/streaming/python/PythonScriptEngine.java PRE-CREATION
src/python/streaming/controller.py PRE-CREATION
src/python/streaming/pig_util.py PRE-CREATION
test/org/apache/pig/builtin/TestPigToStreamUDF.java PRE-CREATION
test/org/apache/pig/builtin/TestStreamUDFToPig.java PRE-CREATION
test/org/apache/pig/builtin/TestStreamingUDF.java PRE-CREATION
test/org/apache/pig/impl/streaming/TestExecutableManager.java 6246019
test/org/apache/pig/impl/streaming/TestStreamingUDFOutputHandler.java PRE-CREATION
test/org/apache/pig/impl/streaming/TestStreamingUtil.java PRE-CREATION
test/org/apache/pig/test/TestPigStreaming.java PRE-CREATION
test/org/apache/pig/test/TestStreaming.java 1eac5d2
test/python/streaming/test_controller.py PRE-CREATION
test/unit-tests d52ad9d
Diff: https://reviews.apache.org/r/13781/diff/
Testing
-------
Thanks,
Jeremy Karn