Posted to user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/06/21 23:38:02 UTC

Inputformat

Hi,

I am using a library that relies on InputFormat.
Right now, it reads XML files whose records span multiple lines.
Currently the input format looks like this:

public class XMLInputReader extends FileInputFormat<LongWritable, Text> {

  public static final String START_TAG = "<page>";
  public static final String END_TAG = "</page>";

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
      JobConf conf, Reporter reporter) throws IOException {
    conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
    conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
    return new XMLRecordReader((FileSplit) split, conf);
  }
}
So, with the above, if the data looks like:

<page>

 something \n
something \n

</page>

it processes this sort of data.


Now, I want to use the same framework for JSON files, where each record
fits on a single line.

So I guess my START_TAG can be "{".

Will my END_TAG be "}\n"?

It can't be "}", as there can be nested JSON in this data?

Any clues?
Thanks
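
A minimal sketch of the delimiter question above, using only the Java standard library. The class and method names (JsonDelimiterDemo, splitByTags, splitByLines) are hypothetical and are not part of Hadoop or of the library discussed in this thread; splitByTags only imitates a scanner that reads from a start tag to the first end tag that follows it. Under those assumptions, "}" as the end tag cuts a nested record short, "}\n" works only while every record is followed by a bare newline, and splitting on lines avoids both problems.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Hadoop or of the library in the thread.
class JsonDelimiterDemo {

    // Imitates a start/end tag scanner: each record runs from the next
    // startTag to the first endTag found after it.
    static List<String> splitByTags(String data, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = data.indexOf(startTag, pos);
            if (start < 0) break;
            int end = data.indexOf(endTag, start + startTag.length());
            if (end < 0) break; // no closing tag: the record is dropped
            records.add(data.substring(start, end + endTag.length()));
            pos = end + endTag.length();
        }
        return records;
    }

    // One record per non-empty line; nesting inside a line does not matter.
    static List<String> splitByLines(String data) {
        List<String> records = new ArrayList<>();
        for (String line : data.split("\n")) {
            if (!line.trim().isEmpty()) records.add(line.trim());
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "{\"id\":1,\"user\":{\"name\":\"a\"}}\n{\"id\":2,\"tags\":[]}\n";
        // With "}" the first record is cut off right after the inner object.
        System.out.println(splitByTags(data, "{", "}"));
        // With "}\n" both records are found, but only because each line ends in "\n".
        System.out.println(splitByTags(data, "{", "}\n"));
        // Line splitting returns both records without any tag matching.
        System.out.println(splitByLines(data));
    }
}
```

Since each JSON record sits on its own line, a line-oriented format such as Hadoop's stock TextInputFormat, which hands the mapper one (byte offset, line) pair per input line, may be a simpler fit than a tag-based reader. Whether the library in question accepts it is not something this thread confirms.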

Re: Inputformat

Posted by Azuryy Yu <az...@gmail.com>.
You have to write a JSONInputFormat, or google first to find an existing one.

--Send from my Sony mobile.
On Jun 23, 2013 7:06 AM, "jamal sasha" <ja...@gmail.com> wrote:

> Then how should I approach this issue?
>
>
> On Fri, Jun 21, 2013 at 4:25 PM, Niels Basjes <Ni...@basjes.nl> wrote:
>
>> If you try to hammer in a nail (json file) with a screwdriver (
>> XMLInputReader) then perhaps the reason it won't work may be that you are
>> using the wrong tool?
>>  On Jun 21, 2013 11:38 PM, "jamal sasha" <ja...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am using a library that relies on InputFormat.
>>> Right now, it reads XML files whose records span multiple lines.
>>> Currently the input format looks like this:
>>>
>>> public class XMLInputReader extends FileInputFormat<LongWritable, Text> {
>>>
>>>   public static final String START_TAG = "<page>";
>>>   public static final String END_TAG = "</page>";
>>>
>>>   @Override
>>>   public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
>>>       JobConf conf, Reporter reporter) throws IOException {
>>>     conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
>>>     conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
>>>     return new XMLRecordReader((FileSplit) split, conf);
>>>   }
>>> }
>>> So, with the above, if the data looks like:
>>>
>>> <page>
>>>
>>>  something \n
>>> something \n
>>>
>>> </page>
>>>
>>> it processes this sort of data.
>>>
>>>
>>> Now, I want to use the same framework for JSON files, where each record
>>> fits on a single line.
>>>
>>> So I guess my START_TAG can be "{".
>>>
>>> Will my END_TAG be "}\n"?
>>>
>>> It can't be "}", as there can be nested JSON in this data?
>>>
>>> Any clues?
>>> Thanks
>>>
>>
>
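
A rough sketch of the record-splitting logic a line-oriented JSONInputFormat would need, written against plain Java strings rather than the Hadoop API so it runs on its own. The names JsonLineRecords and readRecords are hypothetical. The sketch assumes one JSON object per line, "\n" line endings, and UTF-8 text; it keys each record by its byte offset, mirroring the LongWritable/Text pairing used in the code quoted above, and it only checks the outer braces rather than parsing the JSON.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; it imitates what a line-based record reader would
// emit, without depending on Hadoop classes.
class JsonLineRecords {

    // Returns (byte offset of line start -> JSON text) for every line that
    // looks like a JSON object. Blank or malformed lines are skipped.
    static Map<Long, String> readRecords(String fileContents) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContents.split("\n", -1)) {
            String trimmed = line.trim();
            // Cheap sanity check only: this does not validate the JSON itself.
            if (trimmed.startsWith("{") && trimmed.endsWith("}")) {
                records.put(offset, trimmed);
            }
            // Assumption: each line is terminated by a single "\n" byte.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String contents = "{\"a\":1}\n\n{\"b\":{\"c\":2}}\n";
        readRecords(contents).forEach(
            (key, value) -> System.out.println(key + "\t" + value));
    }
}
```

In a real job the value would then be handed to a JSON parser inside the mapper, so nested objects never have to be matched by the input format.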

Re: Inputformat

Posted by jamal sasha <ja...@gmail.com>.
Then how should I approach this issue?


On Fri, Jun 21, 2013 at 4:25 PM, Niels Basjes <Ni...@basjes.nl> wrote:

> If you try to hammer in a nail (json file) with a screwdriver (
> XMLInputReader) then perhaps the reason it won't work may be that you are
> using the wrong tool?
>  On Jun 21, 2013 11:38 PM, "jamal sasha" <ja...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using a library that relies on InputFormat.
>> Right now, it reads XML files whose records span multiple lines.
>> Currently the input format looks like this:
>>
>> public class XMLInputReader extends FileInputFormat<LongWritable, Text> {
>>
>>   public static final String START_TAG = "<page>";
>>   public static final String END_TAG = "</page>";
>>
>>   @Override
>>   public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
>>       JobConf conf, Reporter reporter) throws IOException {
>>     conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
>>     conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
>>     return new XMLRecordReader((FileSplit) split, conf);
>>   }
>> }
>> So, with the above, if the data looks like:
>>
>> <page>
>>
>>  something \n
>> something \n
>>
>> </page>
>>
>> it processes this sort of data.
>>
>>
>> Now, I want to use the same framework for JSON files, where each record
>> fits on a single line.
>>
>> So I guess my START_TAG can be "{".
>>
>> Will my END_TAG be "}\n"?
>>
>> It can't be "}", as there can be nested JSON in this data?
>>
>> Any clues?
>> Thanks
>>
>

Re: Inputformat

Posted by Niels Basjes <Ni...@basjes.nl>.
If you try to hammer in a nail (json file) with a screwdriver (
XMLInputReader) then perhaps the reason it won't work may be that you are
using the wrong tool?
 On Jun 21, 2013 11:38 PM, "jamal sasha" <ja...@gmail.com> wrote:

> Hi,
>
> I am using a library that relies on InputFormat.
> Right now, it reads XML files whose records span multiple lines.
> Currently the input format looks like this:
>
> public class XMLInputReader extends FileInputFormat<LongWritable, Text> {
>
>   public static final String START_TAG = "<page>";
>   public static final String END_TAG = "</page>";
>
>   @Override
>   public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
>       JobConf conf, Reporter reporter) throws IOException {
>     conf.set(XMLInputFormat.START_TAG_KEY, START_TAG);
>     conf.set(XMLInputFormat.END_TAG_KEY, END_TAG);
>     return new XMLRecordReader((FileSplit) split, conf);
>   }
> }
> So, with the above, if the data looks like:
>
> <page>
>
>  something \n
> something \n
>
> </page>
>
> it processes this sort of data.
>
>
> Now, I want to use the same framework for JSON files, where each record
> fits on a single line.
>
> So I guess my START_TAG can be "{".
>
> Will my END_TAG be "}\n"?
>
> It can't be "}", as there can be nested JSON in this data?
>
> Any clues?
> Thanks
>
