Posted to dev@oozie.apache.org by "Attila Sasvari (JIRA)" <ji...@apache.org> on 2017/03/14 14:43:41 UTC

[jira] [Commented] (OOZIE-2819) Make Oozie REST API accept multibyte characters via client side xml

    [ https://issues.apache.org/jira/browse/OOZIE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924342#comment-15924342 ] 

Attila Sasvari commented on OOZIE-2819:
---------------------------------------

I found the root cause of the problem: in {{core/src/main/java/org/apache/oozie/action/hadoop/ScriptLanguageActionExecutor.java}}, {{dos.writeBytes(scriptContent);}} was used instead of {{dos.write(scriptContent.getBytes());}}.
As a result, the following was written to HDFS as the Pig script:
{code}
lines = LOAD 'hdfs:///tmp/encoding/input.txt' USING PigStorage('\n') AS line;			test = FILTER lines BY line == '~';			STORE test INTO 'hdfs:///tmp/encoding/output' USING PigStorage('\n');% 
{code}

Using {{writeBytes}} to write out multibyte characters is faulty. Its [javadoc|https://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html#writeBytes(java.lang.String)] says: {{Each character in the string is written out, in sequence, by discarding its high eight bits.}} This explains the {{~}} above: 松 is U+677E, and keeping only its low byte leaves 0x7E, which is {{~}}.
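
The difference can be reproduced outside Oozie with a minimal sketch like the one below. The class and helper names ({{WriteBytesDemo}}, {{viaWriteBytes}}, {{viaWrite}}) are made up for illustration and are not part of the Oozie code base. The sketch also passes an explicit UTF-8 charset to {{getBytes}}, which is an assumption on my side: the no-argument {{getBytes()}} uses the platform default charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Oozie.
public class WriteBytesDemo {

    // Old behaviour: writeBytes emits one byte per char, keeping only the low eight bits.
    static byte[] viaWriteBytes(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buf)) {
            dos.writeBytes(s);
        }
        return buf.toByteArray();
    }

    // Proposed behaviour: encode the string first, then write the resulting bytes.
    // Assumption: UTF-8 is the desired encoding for the script file.
    static byte[] viaWrite(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buf)) {
            dos.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String filter = "line == '\u677E'"; // \u677E is the character from the Pig script

        byte[] lossy = viaWriteBytes(filter);
        byte[] encoded = viaWrite(filter);

        // The lossy variant is pure ASCII: the multibyte character became 0x7E.
        System.out.println(new String(lossy, StandardCharsets.US_ASCII)); // prints line == '~'
        // One byte per char versus three UTF-8 bytes for the multibyte character.
        System.out.println(lossy.length + " bytes vs " + encoded.length + " bytes");
    }
}
```

With {{writeBytes}}, the multibyte character collapses to a single byte, so the script is both shorter and wrong. With {{write}} on the encoded bytes, the character survives as its three-byte UTF-8 sequence.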

After replacing {{writeBytes}} with {{write}}, the proper Pig script is stored in HDFS:
{code}
bin/hadoop dfs -cat hdfs://localhost:9000/user/asasvari/oozie-asas/0000001-170314152844064-oozie-asas-W/pig1--pig/dummy.pig
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

17/03/14 15:31:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
lines = LOAD 'hdfs:///tmp/encoding/input.txt' USING PigStorage('\n') AS line;			test = FILTER lines BY line == '松';			STORE test INTO 'hdfs:///tmp/encoding/output' USING PigStorage('\n');% 
{code}

> Make Oozie REST API accept multibyte characters via client side xml
> -------------------------------------------------------------------
>
>                 Key: OOZIE-2819
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2819
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Attila Sasvari
>
> A Pig action submitted with client-side XML via proxy submission failed when the script contained multibyte characters.
> {code}
> curl -i  -X POST -d @/tmp/pig.xml -H 'Content-Type: application/XML; charset=UTF-8' 'http://'localhost':11000/oozie/v1/jobs?jobtype=pig&action=start'
> {code}
> Where
> {code}
> $ hdfs dfs -cat /tmp/encoding/input.txt
> 松
> 林檎
> 松
> {code}
> {code}
> $ cat /tmp/pig.xml 
> <configuration>
> <property>
> <name>fs.default.name</name>
> <value>hdfs://localhost:8020/</value>
> </property>
> <property>
> <name>mapred.job.tracker</name>
> <value>localhost:8032</value>
> </property>
> <property>
> <name>user.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>oozie.pig.script</name>
> <value><![CDATA[
> lines = LOAD 'hdfs:///tmp/encoding/input.txt' USING PigStorage('\n') AS line;
> test = FILTER lines BY line == '松';
> STORE test INTO 'hdfs:///tmp/encoding/output' USING PigStorage('\n');
> ]]></value>
> </property>
> <property>
> <name>oozie.pig.script.params.size</name>
> <value>0</value>
> </property>
> <property>
> <name>oozie.pig.script.options.size</name>
> <value>0</value>
> </property>
> <property>
> <name>oozie.libpath</name>
> <value>hdfs:///user/oozie/share/lib</value>
> </property>
> <property>
> <name>oozie.use.system.libpath</name>
> <value>true</value>
> </property>
> <property>
> <name>oozie.proxysubmission</name>
> <value>true</value>
> </property>
> </configuration>
> {code}
> In the Oozie launcher log, I could see that the script was run with {{~}} in place of the intended 松:
> {code}
> lines = LOAD 'hdfs:///tmp/encoding/input.txt' USING PigStorage('\n') AS line;test = FILTER lines BY line == '~';STORE test INTO 'hdfs:///tmp/encoding/output' USING PigStorage('\n');
> {code}


