Posted to common-user@hadoop.apache.org by Ralf Heyde <ra...@gmx.de> on 2011/09/09 17:41:21 UTC
Native HDFS Write Text & JAQL Execution
Hello again,
I think I have misunderstood something about writing files to HDFS and
processing them in JAQL.
I have some sample data, represented by a set of objects.
I transform these objects to a JSON string.
I'm writing the JSON data directly to an HDFS file through my HDFS client code:
-------------------------------------------
Configuration config = new Configuration();
// add the hadoop configuration files residing in the installation path of hadoop
config.addResource(new Path("core-site.xml"));
// pass the username and password required to access the HDFS (set up on the namenode)
config.set("hadoop.job.ugi", "hadoop, password");
FileSystem fs = FileSystem.get(config);
Path path = new Path("/sampledata");
fs.mkdirs( path );
Path file = new Path( path, "samplefile.json" );
FSDataOutputStream fos = fs.create( file );
// Collect sample data
Collection<Entry> entries = MockFactory.createEntries();
// Build JSON from the sample data
JSONArray jsonArray = JSONBuilder.buildSomeTwitterJSON(entries);
// Write JSON to HDFS as UTF-8 (writeBytes would drop the high byte of each char)
fos.write( jsonArray.toString().getBytes("UTF-8") );
fos.flush();
fos.close();
fs.close();
-------------------------------------------
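A side note on the write step: FSDataOutputStream extends java.io.DataOutputStream, whose writeBytes(String) keeps only the low eight bits of each character, so any non-ASCII text in the JSON is safer written as explicitly encoded bytes. The following is a minimal sketch that uses only the JDK (no Hadoop classes) to show the difference; the class name WriteBytesDemo and the sample string are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WriteBytesDemo {
    // Writes the text with DataOutputStream.writeBytes: one byte per char,
    // the high eight bits of every char are discarded.
    static byte[] viaWriteBytes(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Writes the text as explicitly encoded UTF-8 bytes.
    static byte[] viaUtf8(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical sample record containing one non-ASCII character (u-umlaut).
        String json = "{\"author\":\"J\u00fcrgen\"}";
        String lossy = new String(viaWriteBytes(json), StandardCharsets.UTF_8);
        String exact = new String(viaUtf8(json), StandardCharsets.UTF_8);
        System.out.println("writeBytes round-trips: " + json.equals(lossy));
        System.out.println("UTF-8 bytes round-trip: " + json.equals(exact));
    }
}
```

For pure ASCII JSON both variants produce the same bytes, so this only matters once the data contains characters outside that range.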
Now I would like to run a JAQL script, but I get an error: the input file
is not a SequenceFile.
-------------------------------------------
// Read
$sampledata = read(hdfs("/sampledata/samplefile.json"));
// Query 1: filter and transform
$sampledata
-> filter $.status_id == 1
-> transform { $.authorurl, $.datum };
-------------------------------------------
Can someone give me a hint to correct my misunderstanding?
Thanks,
Ralf
RE: Native HDFS Write Text & JAQL Execution
Posted by Ralf Heyde <ra...@gmx.de>.
In addition: I found my mistake.
The JAQL documentation gave me a hint:
http://code.google.com/p/jaql/wiki/IO
// Options needed to read data as JSON from Hadoop Text File
jaql> txtInOpt = {format: "org.apache.hadoop.mapred.TextInputFormat",
                  converter: "com.ibm.jaql.io.hadoop.converter.FromJsonTextConverter"};
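One consequence of the text-file options quoted above from the JAQL wiki: Hadoop's TextInputFormat hands the file to the reader one line at a time. Assuming the converter parses each line as one JSON value (an assumption, not confirmed in this thread), it may be more convenient to write one JSON object per line instead of a single array. A minimal sketch of the writing side follows, using only the JDK; the class JsonLinesWriter, the method writeLines, and the sample values are hypothetical, and the field names are taken from the query in the original post. Since FSDataOutputStream is a java.io.OutputStream, the stream returned by fs.create(file) could be passed in the same way.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class JsonLinesWriter {
    // Writes each already-serialized JSON record on its own line, encoded as UTF-8.
    static void writeLines(List<String> jsonRecords, OutputStream out) {
        try {
            for (String record : jsonRecords) {
                // A raw line break inside a record would split it into two input lines.
                if (record.indexOf('\n') >= 0 || record.indexOf('\r') >= 0) {
                    throw new IllegalArgumentException("record contains a line break");
                }
                out.write(record.getBytes(StandardCharsets.UTF_8));
                out.write('\n');
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample records; only the field names come from the JAQL query.
        List<String> records = Arrays.asList(
            "{\"status_id\":1,\"authorurl\":\"http://example.org/a\",\"datum\":\"2011-09-09\"}",
            "{\"status_id\":2,\"authorurl\":\"http://example.org/b\",\"datum\":\"2011-09-10\"}");
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeLines(records, buf);
        System.out.print(new String(buf.toByteArray(), StandardCharsets.UTF_8));
    }
}
```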