Posted to hdfs-user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/06/21 23:38:02 UTC
Inputformat
Hi,
I am using a library that relies on InputFormat.
Right now, it reads XML files whose records span multiple lines.
Currently the input format looks like this:
public class XMLInputReader extends FileInputFormat<LongWritable, Text> {

    public static final String START_TAG = "<page>";
    public static final String END_TAG = "</page>";

    @Override
    public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
            JobConf conf, Reporter reporter) throws IOException {
        conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
        conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
        return new XMLRecordReader((FileSplit) split, conf);
    }
}
So, with the above, if the data looks like:
<page>
something \n
something \n
</page>
it processes this sort of data.
Now I want to use the same framework for JSON files, where each record
is on a single line.
So I guess my START_TAG can be "{".
Will my END_TAG be "}\n"?
It can't be "}", since the data can contain nested JSON.
Any clues?
Thanks
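To make the nested-JSON concern concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The class JsonLineDemo and its two helper methods are hypothetical names invented for this illustration. splitOnTags only imitates the general idea of a start-tag/end-tag scanner; it is not the actual XMLRecordReader, whose behaviour may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class JsonLineDemo {

    // Hypothetical stand-in for a start/end-tag scanner: emits the text
    // from each START_TAG up to and including the next END_TAG.
    public static List<String> splitOnTags(String data, String start, String end) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int s = data.indexOf(start, pos);
            if (s < 0) break;
            int e = data.indexOf(end, s + start.length());
            if (e < 0) break;
            records.add(data.substring(s, e + end.length()));
            pos = e + end.length();
        }
        return records;
    }

    // Treats every non-blank line as one complete record.
    public static List<String> splitOnLines(String data) {
        List<String> records = new ArrayList<>();
        for (String line : data.split("\n")) {
            if (!line.trim().isEmpty()) records.add(line);
        }
        return records;
    }

    public static void main(String[] args) {
        // Two single-line JSON records, each with a nested object.
        String data = "{\"id\": 1, \"user\": {\"name\": \"a\"}, \"n\": 2}\n"
                    + "{\"id\": 2, \"user\": {\"name\": \"b\"}, \"n\": 3}\n";

        // END_TAG "}" stops at the first closing brace, so records are cut short.
        System.out.println(splitOnTags(data, "{", "}"));

        // END_TAG "}\n" works on this sample, but only while every record
        // ends in "}" followed directly by "\n" (no "\r", no trailing spaces).
        System.out.println(splitOnTags(data, "{", "}\n"));

        // Splitting on line boundaries needs no tags at all.
        System.out.println(splitOnLines(data));
    }
}
```

Since each record sits on exactly one line, the line boundary is already the record boundary. Hadoop's stock TextInputFormat delivers input that way, one line per record with the byte offset as the key, so a tag-based reader may not be needed here if the library allows a different InputFormat to be plugged in.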
Re: Inputformat
Posted by Azuryy Yu <az...@gmail.com>.
You have to write a JSONInputFormat, or google first to see whether one already exists.
--Send from my Sony mobile.
On Jun 23, 2013 7:06 AM, "jamal sasha" <ja...@gmail.com> wrote:
> Then how should I approach this issue?
>
>
> On Fri, Jun 21, 2013 at 4:25 PM, Niels Basjes <Ni...@basjes.nl> wrote:
>
>> If you try to hammer in a nail (json file) with a screwdriver (
>> XMLInputReader) then perhaps the reason it won't work may be that you are
>> using the wrong tool?
>> On Jun 21, 2013 11:38 PM, "jamal sasha" <ja...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am using one of the libraries which rely on InputFormat.
>>> Right now, it is reading xml files spanning across mutiple lines.
>>> So currently the input format is like:
>>>
>>> public class XMLInputReader extends FileInputFormat<LongWritable, Text> {
>>>
>>> public static final String START_TAG = "<page>";
>>> public static final String END_TAG = "</page>";
>>>
>>> @Override
>>> public RecordReader<LongWritable, Text> getRecordReader(InputSplit
>>> split,
>>> JobConf conf, Reporter reporter) throws IOException {
>>> conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
>>> conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
>>> return new XMLRecordReader((FileSplit) split, conf);
>>> }
>>> }
>>> So, in above if the data is like:
>>>
>>> <page>
>>>
>>> soemthing \n
>>> somthing \n
>>>
>>> </page>
>>>
>>> It process this sort of data..
>>>
>>>
>>> Now, i want to use the same framework but for json files but lasting
>>> just single line..
>>>
>>> So I guess my
>>> my START_TAG can be "{"
>>>
>>> Will my END_TAG be "}\n"
>>>
>>> it can't be "}" as there can be nested json in this data?
>>>
>>> Any clues
>>> Thanks
>>>
>>
>
Re: Inputformat
Posted by jamal sasha <ja...@gmail.com>.
Then how should I approach this issue?
On Fri, Jun 21, 2013 at 4:25 PM, Niels Basjes <Ni...@basjes.nl> wrote:
> If you try to hammer in a nail (json file) with a screwdriver (
> XMLInputReader) then perhaps the reason it won't work may be that you are
> using the wrong tool?
> On Jun 21, 2013 11:38 PM, "jamal sasha" <ja...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using one of the libraries which rely on InputFormat.
>> Right now, it is reading xml files spanning across mutiple lines.
>> So currently the input format is like:
>>
>> public class XMLInputReader extends FileInputFormat<LongWritable, Text> {
>>
>> public static final String START_TAG = "<page>";
>> public static final String END_TAG = "</page>";
>>
>> @Override
>> public RecordReader<LongWritable, Text> getRecordReader(InputSplit
>> split,
>> JobConf conf, Reporter reporter) throws IOException {
>> conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
>> conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
>> return new XMLRecordReader((FileSplit) split, conf);
>> }
>> }
>> So, in above if the data is like:
>>
>> <page>
>>
>> soemthing \n
>> somthing \n
>>
>> </page>
>>
>> It process this sort of data..
>>
>>
>> Now, i want to use the same framework but for json files but lasting just
>> single line..
>>
>> So I guess my
>> my START_TAG can be "{"
>>
>> Will my END_TAG be "}\n"
>>
>> it can't be "}" as there can be nested json in this data?
>>
>> Any clues
>> Thanks
>>
>
Re: Inputformat
Posted by Niels Basjes <Ni...@basjes.nl>.
If you try to hammer in a nail (json file) with a screwdriver
(XMLInputReader) then perhaps the reason it won't work may be that you are
using the wrong tool?
On Jun 21, 2013 11:38 PM, "jamal sasha" <ja...@gmail.com> wrote:
> Hi,
>
> I am using one of the libraries which rely on InputFormat.
> Right now, it is reading xml files spanning across mutiple lines.
> So currently the input format is like:
>
> public class XMLInputReader extends FileInputFormat<LongWritable, Text> {
>
> public static final String START_TAG = "<page>";
> public static final String END_TAG = "</page>";
>
> @Override
> public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
> JobConf conf, Reporter reporter) throws IOException {
> conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
> conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
> return new XMLRecordReader((FileSplit) split, conf);
> }
> }
> So, in above if the data is like:
>
> <page>
>
> soemthing \n
> somthing \n
>
> </page>
>
> It process this sort of data..
>
>
> Now, i want to use the same framework but for json files but lasting just
> single line..
>
> So I guess my
> my START_TAG can be "{"
>
> Will my END_TAG be "}\n"
>
> it can't be "}" as there can be nested json in this data?
>
> Any clues
> Thanks
>