Posted to user@hadoop.apache.org by Grandl Robert <rg...@yahoo.com> on 2012/08/22 05:21:49 UTC
Splitting input file
Hi,
I think there are many discussions about splitting the input file based on custom delimiters.
However, I am not sure whether there is a simple way to split a text input file at the end of each sentence (.) without writing a custom InputFormat or similar. Can I simply specify such a delimiter when I add the input to HDFS?
Thanks,
Robert
Re: Splitting input file
Posted by Harsh J <ha...@cloudera.com>.
Hi Grandl,
You can set "textinputformat.record.delimiter" to "," to have records
from a text file split at commas. Isn't that sufficient? You do not
need to write any special InputFormat for text files this way.
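
As a rough illustration for the sentence case in the question, here is a minimal sketch of what setting the delimiter to "." would produce. It uses plain Java and no Hadoop classes: the class RecordSplitDemo and the method splitRecords are hypothetical names that only approximate how records are cut at a custom delimiter. In a real job the property is set on the job configuration (shown in the comment), not when the file is copied into HDFS, and whether the property is honored depends on the Hadoop version and on which TextInputFormat API is used.

```java
import java.util.ArrayList;
import java.util.List;

// In an actual MapReduce driver the delimiter would be set on the job's
// Configuration before the job is submitted, for example:
//   conf.set("textinputformat.record.delimiter", ".");
// The class below is a hypothetical stand-in that only emulates the
// resulting record boundaries; it is not Hadoop's own reader.
class RecordSplitDemo {

    // Cuts the text at every occurrence of the delimiter. The delimiter is
    // not part of the returned records, and any text after the last
    // delimiter becomes a final record.
    static List<String> splitRecords(String text, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = text.indexOf(delimiter, start)) >= 0) {
            records.add(text.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < text.length()) {
            records.add(text.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "Hadoop reads the file. Each sentence is a record. Done";
        for (String record : splitRecords(text, ".")) {
            System.out.println("[" + record + "]");
        }
    }
}
```

Note that a plain "." delimiter also cuts at abbreviations and decimal points (for example "e.g." or "3.14"), and that records keep any leading whitespace, so a mapper would probably want to trim each value.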
On Wed, Aug 22, 2012 at 8:51 AM, Grandl Robert <rg...@yahoo.com> wrote:
> Hi,
>
> I think there are many discussions about splitting the input file based on
> custom delimiters.
>
> However, I am not sure if there is a simple way to split text input file
> based on end of sentences(.) without writing any custom split delimiter or
> so. Can I simply specify such delimiter when I add the input into HDFS ?
>
> Thanks,
> Robert
>
--
Harsh J