Posted to mapreduce-user@hadoop.apache.org by "ados1984@gmail.com" <ad...@gmail.com> on 2014/03/24 18:00:09 UTC

Architecture question on Ingesting Data into Hadoop

Hello Team,

I am doing a POC in Hadoop and want to understand the recommended
architecture for ingesting data from different data streams (web logs,
portal, mobile, POS systems) into a Hadoop system. Also, what are the use
cases where we need HBase on top of HDFS? Can't we have only HDFS, with no
HBase, and if so, can we create tables directly on HDFS that Impala can
query?

Kindly advise!
Regards, Apurva

Re: Architecture question on Ingesting Data into Hadoop

Posted by "ados1984@gmail.com" <ad...@gmail.com>.
Thank you Tariq, but when using Flume, how is structured data captured into
HDFS? Let's say I do not have HBase or any other data store on top of
Hadoop; in that case, how will structured and unstructured data from
different input streams be captured into HDFS using Flume, and how can I
control what range of data goes on which node?

I am also exploring Kafka as the data ingest mechanism. Does anyone have
experience using Kafka as the core component of a Hadoop ingest project?

The Kafka-based architecture I am thinking of is to have different Kafka
topics for different data sources (web logs, mobile user activity, portal,
etc.), with a consumer per topic that consumes the data and writes it into
HDFS. The thing I am not sure about is what format the messages should be
stored in on HDFS.

Also, how is data partitioned among the different nodes in HDFS? Any
thoughts or suggestions?
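
Concretely, here is a minimal sketch of the per-topic consumer I have in
mind, using the kafka-clients consumer API and the Hadoop FileSystem API.
The broker, topic, and path names are placeholders I made up, and a real
pipeline would batch and roll files rather than write record by record:

import java.net.URI;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class WebLogToHdfs {
  public static void main(String[] args) throws Exception {
    // Consumer for one source-specific topic (all names are placeholders).
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092");
    props.put("group.id", "weblog-to-hdfs");
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("weblogs"));

    // One output file per topic; HDFS itself decides block placement.
    FileSystem fs = FileSystem.get(
        URI.create("hdfs://namenode:8020"), new Configuration());
    Path out = new Path("/data/weblogs/events.log");
    try (FSDataOutputStream stream =
        fs.exists(out) ? fs.append(out) : fs.create(out)) {
      while (true) {
        ConsumerRecords<String, String> records =
            consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> r : records) {
          stream.writeBytes(r.value() + "\n"); // one record per line
        }
        stream.hflush(); // make appended data visible to readers
      }
    }
  }
}

One plain-text record per line keeps the files splittable for downstream
jobs; Avro or SequenceFiles would be the usual alternatives for more
structured records.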


On Mon, Mar 24, 2014 at 4:20 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Hi Apurva,
>
> I would use a data ingestion tool like Apache Flume to make the task
> easier without much human intervention. Create sources for your different
> systems and the rest will be taken care of by Flume. It is not a must to
> use something like Flume, but it will definitely make your life easier
> and help you develop a more sophisticated system, IMHO.
>
> You need HBase when you need real-time random read/write access to your
> data: basically, when you intend to have low-latency access to small
> amounts of data from within a large data set, and you have a flexible
> schema.
>
> And for the last part of your question, use Apache Hive. It provides
> warehousing capabilities on top of an existing Hadoop cluster, with an
> SQL-like interface to query the stored data. It will also help when
> using Impala.
>
> HTH
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Tue, Mar 25, 2014 at 1:41 AM, Geoffry Roberts <th...@gmail.com> wrote:
>
>> Based on what you have said, it sounds as if you want to append records
>> to one or more files in HDFS.  I was able to do this with WebHDFS and
>> with the Hadoop client.  But you asked about architecture.  Would a POST
>> to a URL satisfy you as to architecture?  If so, set up WebHDFS and POST
>> to it.
>>
>>
>> On Mon, Mar 24, 2014 at 1:00 PM, ados1984@gmail.com <ad...@gmail.com> wrote:
>>
>>> Hello Team,
>>>
>>> I am doing a POC in Hadoop and want to understand the recommended
>>> architecture for ingesting data from different data streams (web logs,
>>> portal, mobile, POS systems) into a Hadoop system. Also, what are the
>>> use cases where we need HBase on top of HDFS? Can't we have only HDFS,
>>> with no HBase, and if so, can we create tables directly on HDFS that
>>> Impala can query?
>>>
>>> Kindly advise!
>>> Regards, Apurva
>>>
>>
>>
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>
>

Re: Architecture question on Ingesting Data into Hadoop

Posted by Mohammad Tariq <do...@gmail.com>.
Hi Apurva,

I would use a data ingestion tool like Apache Flume to make the task
easier without much human intervention. Create sources for your different
systems and the rest will be taken care of by Flume. It is not a must to
use something like Flume, but it will definitely make your life easier
and help you develop a more sophisticated system, IMHO.
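
For instance, a minimal single-agent Flume configuration along these
lines would tail a web server log into HDFS. The agent, host, and path
names here are placeholders of mine, a sketch rather than a tested setup:

agent1.sources = weblog
agent1.channels = mem
agent1.sinks = hdfs-out

# Tail the web server log; an exec source is the simplest starting point.
agent1.sources.weblog.type = exec
agent1.sources.weblog.command = tail -F /var/log/httpd/access.log
agent1.sources.weblog.channels = mem

agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000

# Write events to HDFS as plain text, partitioned by day.
agent1.sinks.hdfs-out.type = hdfs
agent1.sinks.hdfs-out.channel = mem
agent1.sinks.hdfs-out.hdfs.path = hdfs://namenode:8020/data/weblogs/%Y-%m-%d
agent1.sinks.hdfs-out.hdfs.fileType = DataStream
agent1.sinks.hdfs-out.hdfs.useLocalTimeStamp = true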

You need HBase when you need real-time random read/write access to your
data: basically, when you intend to have low-latency access to small
amounts of data from within a large data set, and you have a flexible
schema.

And for the last part of your question, use Apache Hive. It provides
warehousing capabilities on top of an existing Hadoop cluster, with an
SQL-like interface to query the stored data. It will also help when
using Impala.
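
As a sketch, an external Hive table over an HDFS directory of
tab-delimited log files might look like this (the columns and path are
invented for illustration); Impala can query the same table through the
shared metastore:

CREATE EXTERNAL TABLE weblogs (
  ts     STRING,
  ip     STRING,
  url    STRING,
  status INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/weblogs';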

HTH

Warm Regards,
Tariq
cloudfront.blogspot.com


On Tue, Mar 25, 2014 at 1:41 AM, Geoffry Roberts <th...@gmail.com> wrote:

> Based on what you have said, it sounds as if you want to append records to
> one or more files in HDFS.  I was able to do this with WebHDFS and with
> the Hadoop client.  But you asked about architecture.  Would a POST to a
> URL satisfy you as to architecture?  If so, set up WebHDFS and POST to it.
>
>
> On Mon, Mar 24, 2014 at 1:00 PM, ados1984@gmail.com <ad...@gmail.com> wrote:
>
>> Hello Team,
>>
>> I am doing a POC in Hadoop and want to understand the recommended
>> architecture for ingesting data from different data streams (web logs,
>> portal, mobile, POS systems) into a Hadoop system. Also, what are the
>> use cases where we need HBase on top of HDFS? Can't we have only HDFS,
>> with no HBase, and if so, can we create tables directly on HDFS that
>> Impala can query?
>>
>> Kindly advise!
>> Regards, Apurva
>>
>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>

Re: Architecture question on Ingesting Data into Hadoop

Posted by "ados1984@gmail.com" <ad...@gmail.com>.
Thank you Shahab, but I am using Impala, so I can create tables directly
on HDFS using Impala, without HBase or any other NoSQL technology.

The other question I have is this: let's say I have 3 nodes. If I install
Impala on one node, I am able to create tables on that node, but how can I
create tables on the other nodes? How is data divided between the
different nodes, and where do we specify what range of data we want on
node 1, node 2, and node 3?




On Mon, Mar 24, 2014 at 4:22 PM, Shahab Yunus <sh...@gmail.com> wrote:

> @ados1984, HDFS is a file system and HBase is a data store on top of it.
> You cannot create tables (in the conventional database sense of the word)
> directly on HDFS without HBase.
>
> Regards,
> Shahab
>
>
> On Mon, Mar 24, 2014 at 4:11 PM, Geoffry Roberts <th...@gmail.com> wrote:
>
>> Based on what you have said, it sounds as if you want to append records
>> to one or more files in HDFS.  I was able to do this with WebHDFS and
>> with the Hadoop client.  But you asked about architecture.  Would a POST
>> to a URL satisfy you as to architecture?  If so, set up WebHDFS and POST
>> to it.
>>
>>
>> On Mon, Mar 24, 2014 at 1:00 PM, ados1984@gmail.com <ad...@gmail.com> wrote:
>>
>>> Hello Team,
>>>
>>> I am doing a POC in Hadoop and want to understand the recommended
>>> architecture for ingesting data from different data streams (web logs,
>>> portal, mobile, POS systems) into a Hadoop system. Also, what are the
>>> use cases where we need HBase on top of HDFS? Can't we have only HDFS,
>>> with no HBase, and if so, can we create tables directly on HDFS that
>>> Impala can query?
>>>
>>> Kindly advise!
>>> Regards, Apurva
>>>
>>
>>
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>
>

Re: Architecture question on Ingesting Data into Hadoop

Posted by Shahab Yunus <sh...@gmail.com>.
@ados1984, HDFS is a file system and HBase is a data store on top of it.
You cannot create tables (in the conventional database sense of the word)
directly on HDFS without HBase.

Regards,
Shahab


On Mon, Mar 24, 2014 at 4:11 PM, Geoffry Roberts <th...@gmail.com> wrote:

> Based on what you have said, it sounds as if you want to append records to
> one or more files in HDFS.  I was able to do this with WebHDFS and with
> the Hadoop client.  But you asked about architecture.  Would a POST to a
> URL satisfy you as to architecture?  If so, set up WebHDFS and POST to it.
>
>
> On Mon, Mar 24, 2014 at 1:00 PM, ados1984@gmail.com <ad...@gmail.com> wrote:
>
>> Hello Team,
>>
>> I am doing a POC in Hadoop and want to understand the recommended
>> architecture for ingesting data from different data streams (web logs,
>> portal, mobile, POS systems) into a Hadoop system. Also, what are the
>> use cases where we need HBase on top of HDFS? Can't we have only HDFS,
>> with no HBase, and if so, can we create tables directly on HDFS that
>> Impala can query?
>>
>> Kindly advise!
>> Regards, Apurva
>>
>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>

Re: Architecture question on Ingesting Data into Hadoop

Posted by "ados1984@gmail.com" <ad...@gmail.com>.
Geoffry,

Can you elaborate more on your architecture? Also, when you refer to the
Hadoop client, what exactly are you referring to? Hue, Cloudera Manager,
or something else?

Kindly advise.


On Mon, Mar 24, 2014 at 4:11 PM, Geoffry Roberts <th...@gmail.com> wrote:

> Based on what you have said, it sounds as if you want to append records to
> one or more files in HDFS.  I was able to do this with WebHDFS and with
> the Hadoop client.  But you asked about architecture.  Would a POST to a
> URL satisfy you as to architecture?  If so, set up WebHDFS and POST to it.
>
>
> On Mon, Mar 24, 2014 at 1:00 PM, ados1984@gmail.com <ad...@gmail.com> wrote:
>
>> Hello Team,
>>
>> I am doing a POC in Hadoop and want to understand the recommended
>> architecture for ingesting data from different data streams (web logs,
>> portal, mobile, POS systems) into a Hadoop system. Also, what are the
>> use cases where we need HBase on top of HDFS? Can't we have only HDFS,
>> with no HBase, and if so, can we create tables directly on HDFS that
>> Impala can query?
>>
>> Kindly advise!
>> Regards, Apurva
>>
>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>

Re: Architecture question on Ingesting Data into Hadoop

Posted by Geoffry Roberts <th...@gmail.com>.
Based on what you have said, it sounds as if you want to append records to
one or more files in HDFS.  I was able to do this with WebHDFS and with
the Hadoop client.  But you asked about architecture.  Would a POST to a
URL satisfy you as to architecture?  If so, set up WebHDFS and POST to it.
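
As a minimal sketch (the NameNode host, HTTP port, and file path are
placeholders, and it assumes appends are enabled on the cluster), the
same thing through the Hadoop client pointed at the WebHDFS endpoint
looks like this; under the covers WebHDFS drives the REST API's
op=APPEND, which is an HTTP POST:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WebHdfsAppend {
  public static void main(String[] args) throws Exception {
    // Connect through the WebHDFS REST endpoint (NameNode HTTP port).
    FileSystem fs = FileSystem.get(
        URI.create("webhdfs://namenode:50070"), new Configuration());
    Path file = new Path("/data/records.log");
    // Create the file on first use, append afterwards.
    try (FSDataOutputStream out =
        fs.exists(file) ? fs.append(file) : fs.create(file)) {
      out.writeBytes("one new record\n");
    }
  }
}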


On Mon, Mar 24, 2014 at 1:00 PM, ados1984@gmail.com <ad...@gmail.com> wrote:

> Hello Team,
>
> I am doing a POC in Hadoop and want to understand the recommended
> architecture for ingesting data from different data streams (web logs,
> portal, mobile, POS systems) into a Hadoop system. Also, what are the use
> cases where we need HBase on top of HDFS? Can't we have only HDFS, with
> no HBase, and if so, can we create tables directly on HDFS that Impala
> can query?
>
> Kindly advise!
> Regards, Apurva
>



-- 
There are ways and there are ways,

Geoffry Roberts
