Posted to common-user@hadoop.apache.org by Siddharth Tiwari <si...@live.com> on 2013/11/01 18:08:54 UTC

best solution for data ingestion

Hi team,
I'm seeking your advice on the best way to ingest a lot of data into Hadoop. Also, what are your views on FUSE?

*------------------------*

Cheers !!!

Siddharth Tiwari

Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God."

"Maybe other people will try to limit me but I don't limit myself"

Re: best solution for data ingestion

Posted by Chris Mattmann <ma...@apache.org>.
Hi Guys,

Depending on the *type* of ingestion you are trying to do into HDFS,
the combination of Apache OODT (http://oodt.apache.org/) and Apache
Tika (http://tika.apache.org/) may do the trick.

Cheers,
Chris



-----Original Message-----
From: Bing Jiang <ji...@gmail.com>
Reply-To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Date: Monday, November 4, 2013 2:34 AM
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Subject: Re: best solution for data ingestion

>Apache Pig is also a solution for data ingest, which gives more flexible
>in functionality and more efficient in development.
>
>
>Regards.
>Bing
>
>
>2013/11/2 Marcel Mitsuto F. S. <mi...@gmail.com>
>
>I've done some testing with flume, but ended up using syslog-ng, more
>flexible, reliable, and with a lower fingerprint.
>
>
>On Fri, Nov 1, 2013 at 3:57 PM, Mirko Kämpf
><mi...@gmail.com> wrote:
>
>Have a look on Sqoop for data from RDBMS or Flume, if data flows and
>multiple sources have to be used.
>Best wishes
>Mirko
>
>
>
>2013/11/1 Siddharth Tiwari <si...@live.com>
>
>hi team
>
>seeking your advice on what could be best way to ingest a lot of data to
>hadoop. Also what are views about fuse ?
>
>
>*------------------------*
>Cheers !!!
>SiddharthTiwari
>Have a refreshing day !!!
>"Every duty is holy, and devotion to duty is the highest form of worship
>of God.”
>
>"Maybe other people will try to limit me but I don't limit myself"
>
>-- 
>Bing Jiang
>Tel:(86)134-2619-1361
>weibo: http://weibo.com/jiangbinglover
>BLOG: www.binospace.com <http://www.binospace.com>
>BLOG: http://blog.sina.com.cn/jiangbinglover
>
>Focus on distributed computing, HDFS/HBase
>
>
>



Re: best solution for data ingestion

Posted by Bing Jiang <ji...@gmail.com>.
Apache Pig is also an option for data ingestion; it offers more flexible
functionality and more efficient development.

Regards.
Bing

2013/11/2 Marcel Mitsuto F. S. <mi...@gmail.com>

> I've done some testing with flume, but ended up using syslog-ng, more
> flexible, reliable, and with a lower fingerprint.
>
>
> On Fri, Nov 1, 2013 at 3:57 PM, Mirko Kämpf <mi...@gmail.com> wrote:
>
>> Have a look on Sqoop for data from RDBMS or Flume, if data flows and
>> multiple sources have to be used.
>> Best wishes
>> Mirko
>>
>>
>> 2013/11/1 Siddharth Tiwari <si...@live.com>
>>
>>> hi team
>>>
>>> seeking your advice on what could be best way to ingest a lot of data to
>>> hadoop. Also what are views about fuse ?
>>>
>>>
>>> **------------------------**
>>> *Cheers !!!*
>>> *Siddharth Tiwari*
>>> Have a refreshing day !!!
>>> *"Every duty is holy, and devotion to duty is the highest form of
>>> worship of God.” *
>>> *"Maybe other people will try to limit me but I don't limit myself"*
>>>
>>
>>
>


-- 
Bing Jiang
Tel:(86)134-2619-1361
weibo: http://weibo.com/jiangbinglover
BLOG: www.binospace.com
BLOG: http://blog.sina.com.cn/jiangbinglover
Focus on distributed computing, HDFS/HBase

Re: best solution for data ingestion

Posted by "Marcel Mitsuto F. S." <mi...@gmail.com>.
I've done some testing with Flume, but ended up using syslog-ng, which is
more flexible and reliable, and has a lower footprint.


On Fri, Nov 1, 2013 at 3:57 PM, Mirko Kämpf <mi...@gmail.com> wrote:

> Have a look on Sqoop for data from RDBMS or Flume, if data flows and
> multiple sources have to be used.
> Best wishes
> Mirko
>
>
> 2013/11/1 Siddharth Tiwari <si...@live.com>
>
>> hi team
>>
>> seeking your advice on what could be best way to ingest a lot of data to
>> hadoop. Also what are views about fuse ?
>>
>>
>> **------------------------**
>> *Cheers !!!*
>> *Siddharth Tiwari*
>> Have a refreshing day !!!
>> *"Every duty is holy, and devotion to duty is the highest form of
>> worship of God.” *
>> *"Maybe other people will try to limit me but I don't limit myself"*
>>
>
>

Re: best solution for data ingestion

Posted by Mirko Kämpf <mi...@gmail.com>.
Have a look at Sqoop for data from an RDBMS, or at Flume if data flows and
multiple sources have to be handled.
Best wishes
Mirko


2013/11/1 Siddharth Tiwari <si...@live.com>

> hi team
>
> seeking your advice on what could be best way to ingest a lot of data to
> hadoop. Also what are views about fuse ?
>
>
> **------------------------**
> *Cheers !!!*
> *Siddharth Tiwari*
> Have a refreshing day !!!
> *"Every duty is holy, and devotion to duty is the highest form of worship
> of God.” *
> *"Maybe other people will try to limit me but I don't limit myself"*
>
