Posted to dev@pig.apache.org by "Russell Jurney (JIRA)" <ji...@apache.org> on 2012/07/07 02:23:33 UTC
[jira] [Commented] (PIG-2792) Wonderdog stopped working in Pig 0.10.0 (worked in 0.9.2)

    [ https://issues.apache.org/jira/browse/PIG-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408478#comment-13408478 ] 

Russell Jurney commented on PIG-2792:
-------------------------------------

The properties that need to be set in the Hadoop configuration object are:

           Instantiates a new RecordWriter for Elasticsearch
           <p>
           The properties that <b>MUST</b> be set in the hadoop Configuration object
           are as follows:
           <ul>
           <li><b>elasticsearch.index.name</b> - The name of the elasticsearch index data will be written to. It does not have to exist ahead of time.</li>
           <li><b>elasticsearch.bulk.size</b> - The number of records to be accumulated into a bulk request before writing to elasticsearch.</li>
           <li><b>elasticsearch.is_json</b> - A boolean indicating whether the records to be indexed are JSON records. If false, the records are assumed to be TSV, in which case <b>elasticsearch.field.names</b> must be set and contain a comma-separated list of field names.</li>
           <li><b>elasticsearch.object.type</b> - The type of objects being indexed.</li>
           <li><b>elasticsearch.config</b> - The full path to the elasticsearch.yml file. It is a local path and must exist on all machines in the hadoop cluster.</li>
           <li><b>elasticsearch.plugins.dir</b> - The full path to the elasticsearch plugins directory. It is a local path and must exist on all machines in the hadoop cluster.</li>
           </ul>
           <p>
           The following fields depend on whether <b>elasticsearch.is_json</b> is true or false.
           <ul>
           <li><b>elasticsearch.id.field.name</b> - When <b>elasticsearch.is_json</b> is true, this is the name of a field in the JSON document that contains the document's id. If -1 is used, the document is assumed to have no id and one is assigned to it by elasticsearch.</li>
           <li><b>elasticsearch.field.names</b> - When <b>elasticsearch.is_json</b> is false, this is a comma-separated list of field names.</li>
           <li><b>elasticsearch.id.field</b> - When <b>elasticsearch.is_json</b> is false, this is the numeric index of the field to use as the document id. If -1 is used the document is assumed to have no id and one is assigned to it by elasticsearch.</li>
           </ul>       
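The requirements above can be checked before the record writer is constructed, so that a missing property is reported by name instead of failing later. The sketch below is hypothetical and is not Wonderdog's code. EsConfigCheck and missingProperties are illustrative names. A plain Map<String, String> stands in for Hadoop's Configuration so the sketch runs without Hadoop on the classpath. It also assumes the id property is always expected, with -1 meaning "no id".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check for the properties listed in the Javadoc above.
// A plain Map stands in for org.apache.hadoop.conf.Configuration so the sketch
// runs without Hadoop; the class and method names are not part of Wonderdog.
final class EsConfigCheck {

    // Properties the Javadoc says MUST always be set.
    static final String[] ALWAYS_REQUIRED = {
        "elasticsearch.index.name",
        "elasticsearch.bulk.size",
        "elasticsearch.is_json",
        "elasticsearch.object.type",
        "elasticsearch.config",
        "elasticsearch.plugins.dir"
    };

    /** Returns the required properties that are absent or blank, in a stable order. */
    static List<String> missingProperties(Map<String, String> conf) {
        List<String> missing = new ArrayList<>();
        for (String key : ALWAYS_REQUIRED) {
            requireNonBlank(conf, key, missing);
        }
        // The remaining properties depend on elasticsearch.is_json.
        // Assumption: the id property is always expected, with "-1" meaning "no id".
        if (Boolean.parseBoolean(conf.get("elasticsearch.is_json"))) {
            requireNonBlank(conf, "elasticsearch.id.field.name", missing);
        } else {
            requireNonBlank(conf, "elasticsearch.field.names", missing);
            requireNonBlank(conf, "elasticsearch.id.field", missing);
        }
        return missing;
    }

    // Records the key as missing when its value is null or whitespace only.
    private static void requireNonBlank(Map<String, String> conf, String key, List<String> missing) {
        String value = conf.get(key);
        if (value == null || value.trim().isEmpty()) {
            missing.add(key);
        }
    }

    public static void main(String[] args) {
        // Values mirroring the failing job below: no config path or plugins dir set.
        Map<String, String> conf = new HashMap<>();
        conf.put("elasticsearch.index.name", "email");
        conf.put("elasticsearch.object.type", "email");
        conf.put("elasticsearch.is_json", "true");
        conf.put("elasticsearch.bulk.size", "1000");
        conf.put("elasticsearch.id.field.name", "message_id");
        // Prints: Missing: [elasticsearch.config, elasticsearch.plugins.dir]
        System.out.println("Missing: " + missingProperties(conf));
    }
}
```

The usage in main uses values matching the quoted report: JSON input, index and type "email", id field message_id, bulk size 1000. With these values the check flags exactly the two paths that the log reports as "Using [null]".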

                
> Wonderdog stopped working in Pig 0.10.0 (worked in 0.9.2)
> ---------------------------------------------------------
>
>                 Key: PIG-2792
>                 URL: https://issues.apache.org/jira/browse/PIG-2792
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.10.0, 0.11, 0.10.1
>         Environment: Pig with Wonderdog https://github.com/infochimps-labs/wonderdog for elasticsearch integration. Elasticsearch 0.18.6. Pig local mode.
>            Reporter: Russell Jurney
>            Priority: Blocker
>              Labels: a, about, area, book, did, i, moving, of, omg, technology, why, write
>             Fix For: 0.10.1
>
>
> The Pig UDFs in Wonderdog for ElasticSearch integration, which worked in 0.9.2, stopped working in 0.10.0.
> In 0.10.0 the job fails because Wonderdog is unable to read its configuration from the Hadoop cache.
> If someone can help identify the issue, or advise how Wonderdog or Pig can be modified so that Wonderdog works with Pig 0.10, it would be greatly appreciated.
> This issue is duplicated in the Wonderdog project here: https://github.com/infochimps-labs/wonderdog/issues/6 https://github.com/infochimps-labs/wonderdog/issues/5 and https://github.com/infochimps-labs/wonderdog/issues/7
> The error is below:
> 2012-07-06 16:50:51,501 [main] INFO  org.apache.pig.Main - Apache Pig version 0.10.0-SNAPSHOT (rexported) compiled Jun 22 2012, 15:56:16
> 2012-07-06 16:50:51,502 [main] INFO  org.apache.pig.Main - Logging error messages to: /private/tmp/pig_1341618651472.log
> 2012-07-06 16:50:51,829 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
> {"ok":true}
> 2012-07-06 16:50:53,206 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
> 2012-07-06 16:50:53,379 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
> 2012-07-06 16:50:53,403 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
> 2012-07-06 16:50:53,403 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
> 2012-07-06 16:50:53,441 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
> 2012-07-06 16:50:53,449 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-07-06 16:50:53,494 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> 2012-07-06 16:50:53,560 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> 2012-07-06 16:50:53,587 [Thread-7] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2012-07-06 16:50:53,597 [Thread-7] WARN  org.apache.hadoop.mapred.JobClient - No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> ****file:/tmp/emails.json
> 2012-07-06 16:50:53,711 [Thread-7] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
> 2012-07-06 16:50:53,711 [Thread-7] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 2
> 2012-07-06 16:50:53,734 [Thread-7] WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
> 2012-07-06 16:50:53,737 [Thread-7] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 3
> 2012-07-06 16:50:54,008 [Thread-8] INFO  org.apache.hadoop.mapred.Task -  Using ResourceCalculatorPlugin : null
> 2012-07-06 16:50:54,023 [Thread-8] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/tmp/emails.json/part-m-00000:0+33554432
> 2012-07-06 16:50:54,029 [Thread-8] INFO  com.infochimps.elasticsearch.ElasticSearchOutputFormat - Using field:[message_id] for document ids
> 2012-07-06 16:50:54,029 [Thread-8] INFO  com.infochimps.elasticsearch.ElasticSearchOutputFormat - Using [null] as es.config
> 2012-07-06 16:50:54,029 [Thread-8] INFO  com.infochimps.elasticsearch.ElasticSearchOutputFormat - Using [null] as es.plugins.dir
> 2012-07-06 16:50:54,033 [Thread-8] WARN  org.apache.hadoop.mapred.FileOutputCommitter - Output path is null in cleanup
> 2012-07-06 16:50:54,034 [Thread-8] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
> java.lang.RuntimeException: java.lang.NullPointerException
> 	at com.infochimps.elasticsearch.ElasticSearchOutputFormat$ElasticSearchRecordWriter.<init>(ElasticSearchOutputFormat.java:133)
> 	at com.infochimps.elasticsearch.ElasticSearchOutputFormat.getRecordWriter(ElasticSearchOutputFormat.java:262)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84)
> 	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:628)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Caused by: java.lang.NullPointerException
> 	at java.util.Hashtable.put(Hashtable.java:394)
> 	at java.util.Properties.setProperty(Properties.java:143)
> 	at java.lang.System.setProperty(System.java:746)
> 	at com.infochimps.elasticsearch.ElasticSearchOutputFormat$ElasticSearchRecordWriter.<init>(ElasticSearchOutputFormat.java:130)
> 	... 6 more
> 2012-07-06 16:50:54,506 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
> 2012-07-06 16:50:54,506 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 2012-07-06 16:50:59,022 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs
> 2012-07-06 16:50:59,023 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2012-07-06 16:50:59,024 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2012-07-06 16:50:59,024 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
> 2012-07-06 16:50:59,025 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 
> HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
> 1.0.2	0.10.0-SNAPSHOT	rjurney	2012-07-06 16:50:53	2012-07-06 16:50:59	UNKNOWN
> Failed!
> Failed Jobs:
> JobId	Alias	Feature	Message	Outputs
> job_local_0001	json_emails	MAP_ONLY	Message: Job failed! Error - NA	es://email/email?id=message_id&json=true&size=1000,
> Input(s):
> Failed to read data from "/tmp/emails.json"
> Output(s):
> Failed to produce result in "es://email/email?id=message_id&json=true&size=1000"
> Job DAG:
> job_local_0001
> 2012-07-06 16:50:59,025 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
> 2012-07-06 16:50:59,029 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
> 2012-07-06 16:50:59,029 [main] ERROR org.apache.pig.tools.grunt.GruntParser - org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
> 	at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
> 	at org.apache.pig.tools.grunt.GruntParser.processShCommand(GruntParser.java:1025)
> 	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:167)
> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> 	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
> 	at org.apache.pig.Main.run(Main.java:555)
> 	at org.apache.pig.Main.main(Main.java:111)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Details also at logfile: /private/tmp/pig_1341618651472.log
> {
>   "took" : 75,
>   "timed_out" : false,
>   "_shards" : {
>     "total" : 5,
>     "successful" : 5,
>     "failed" : 0
>   },
>   "hits" : {
>     "total" : 0,
>     "max_score" : null,
>     "hits" : [ ]
>   }
> }
> 2012-07-06 16:50:59,140 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
> 2012-07-06 16:50:59,140 [main] ERROR org.apache.pig.tools.grunt.GruntParser - org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
> 	at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> 	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
> 	at org.apache.pig.Main.run(Main.java:555)
> 	at org.apache.pig.Main.main(Main.java:111)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
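In the quoted trace, the "Caused by" frames show the NullPointerException coming out of System.setProperty. The log lines just before the failure report "Using [null] as es.config". System.setProperty is documented to throw NullPointerException when the value is null, so a configuration value that never reached the task is enough to produce this trace.

The sketch below reproduces that pattern and shows a guarded alternative that names the missing property. It is not Wonderdog's code. The following are all assumptions for illustration:
- the class and method names
- the "es.config" system-property key
- the example path

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the failure in the trace and a guarded variant.
// Not Wonderdog's code: names and the "es.config" key are assumptions.
final class NullPropertyDemo {

    // Mirrors the failing pattern: copy a config value straight into a system property.
    // System.setProperty throws NullPointerException when the value is null.
    static void copyUnchecked(Map<String, String> conf, String confKey, String sysKey) {
        System.setProperty(sysKey, conf.get(confKey));
    }

    // Guarded variant: fail early with a message that names the missing property.
    static void copyChecked(Map<String, String> conf, String confKey, String sysKey) {
        String value = conf.get(confKey);
        if (value == null) {
            throw new IllegalStateException(
                "Required property " + confKey + " is not set in the job configuration");
        }
        System.setProperty(sysKey, value);
    }

    public static void main(String[] args) {
        // Simulates a task whose configuration lacks elasticsearch.config.
        Map<String, String> conf = new HashMap<>();
        try {
            copyUnchecked(conf, "elasticsearch.config", "es.config");
        } catch (NullPointerException e) {
            System.out.println("Unchecked copy: NullPointerException, as in the trace");
        }
        try {
            copyChecked(conf, "elasticsearch.config", "es.config");
        } catch (IllegalStateException e) {
            System.out.println("Checked copy: " + e.getMessage());
        }
    }
}
```

The guard does not address why the property is missing under Pig 0.10.0. It does turn an anonymous NullPointerException into an error that says which property to look for.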
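The failed store location, es://email/email?id=message_id&json=true&size=1000, carries the index, object type, id field, JSON flag and bulk size. It carries nothing for elasticsearch.config or elasticsearch.plugins.dir, which the log reports as null.

The hypothetical sketch below splits such a location into the property names from the Javadoc. It makes these assumptions, none of which come from Wonderdog's source:
- The first segment is the index and the second is the object type.
- The query keys map to properties as inferred from the log line "Using field:[message_id] for document ids".
- No URL decoding is needed.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mapping from a Wonderdog-style store location to the documented
// property names. Inferred from the log; Wonderdog's own parsing may differ.
final class EsLocation {

    static Map<String, String> toProperties(String location) {
        URI uri = URI.create(location);

        // Parse the raw query "k1=v1&k2=v2" without URL decoding.
        Map<String, String> query = new LinkedHashMap<>();
        if (uri.getRawQuery() != null) {
            for (String pair : uri.getRawQuery().split("&")) {
                String[] kv = pair.split("=", 2);
                query.put(kv[0], kv.length > 1 ? kv[1] : "");
            }
        }

        boolean isJson = Boolean.parseBoolean(query.get("json")); // false when absent
        Map<String, String> props = new LinkedHashMap<>();
        props.put("elasticsearch.index.name", uri.getAuthority());          // es://INDEX/...
        props.put("elasticsearch.object.type", uri.getPath().replaceFirst("^/", "")); // .../TYPE
        props.put("elasticsearch.is_json", String.valueOf(isJson));
        if (query.containsKey("size")) {
            props.put("elasticsearch.bulk.size", query.get("size"));
        }
        if (query.containsKey("id")) {
            // JSON input names the id field; TSV input gives its numeric index.
            props.put(isJson ? "elasticsearch.id.field.name" : "elasticsearch.id.field", query.get("id"));
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(toProperties("es://email/email?id=message_id&json=true&size=1000"));
    }
}
```

Nothing in the location supplies the two local paths. Under this reading, they would have to come from the job configuration itself, which is where the report says Pig 0.10.0 loses them.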

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira