Posted to user@hadoop.apache.org by Grandl Robert <rg...@yahoo.com> on 2012/08/22 05:21:49 UTC

Splitting input file

Hi,

I know there have been many discussions about splitting an input file based on custom delimiters.

However, I am not sure whether there is a simple way to split a text input file at the end of each sentence (.) without writing a custom InputFormat or similar. Can I simply specify such a delimiter when I add the input to HDFS?

Thanks,
Robert

Re: Splitting input file

Posted by Harsh J <ha...@cloudera.com>.
Hi Grandl,

You can set "textinputformat.record.delimiter" to "," to have records
from a text file split at commas. Isn't that sufficient? You do not
need to write any special InputFormat for text files this way.
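
For the sentence case in the question, the same property would presumably be set to "." instead of ",". The sketch below is plain Java with no Hadoop dependency; it only illustrates which records a literal delimiter would produce. The class and method names are hypothetical, and it assumes the delimiter is matched literally and left out of each record. Whether the property is honoured depends on the Hadoop version and the InputFormat in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of Hadoop.
class RecordDelimiterSketch {
    // Property name quoted in the reply above. In a job driver it would be set on
    // the job's Configuration, e.g. conf.set(DELIMITER_KEY, "."), before submission.
    static final String DELIMITER_KEY = "textinputformat.record.delimiter";

    // Splits text into records at every literal occurrence of the delimiter.
    // The delimiter itself is dropped; trailing text without a delimiter
    // becomes the last record.
    static List<String> splitRecords(String text, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = text.indexOf(delimiter, start)) >= 0) {
            records.add(text.substring(start, hit));
            start = hit + delimiter.length();
        }
        if (start < text.length()) {
            records.add(text.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        // Each printed line corresponds to one record a mapper would receive.
        for (String record : splitRecords("First sentence. Second one. Tail", ".")) {
            System.out.println("[" + record + "]");
        }
    }
}
```

Note that a literal "." also matches periods inside abbreviations and decimal numbers, and that records keep any whitespace that follows the delimiter, so a mapper would likely need to trim its input.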

On Wed, Aug 22, 2012 at 8:51 AM, Grandl Robert <rg...@yahoo.com> wrote:
> Hi,
>
> I think there are many discussions about splitting the input file based on
> custom delimiters.
>
> However, I am not sure if there is a simple way to split text input file
> based on end of sentences(.) without writing any custom split delimiter or
> so. Can I simply specify such delimiter when I add the input into HDFS ?
>
> Thanks,
> Robert
>



-- 
Harsh J
