Posted to mapreduce-user@hadoop.apache.org by 남경완 <kw...@gmail.com> on 2015/07/13 04:38:18 UTC

namenode doesn't update block locations when the data directories of a datanode are changed

Hello,

I'm running a hadoop-2.4.0 cluster.
Each datanode has 10 disks, and the directories for all 10 disks are specified in
dfs.datanode.data.dir.
A few days ago, I modified dfs.datanode.data.dir on one datanode (<DN1>) to
reduce the number of disks, so two disks were excluded from dfs.datanode.data.dir.
After the datanode was restarted, I expected the namenode to update the
block locations.
In other words, I thought the namenode would remove <DN1> from the block
locations of the blocks that were stored on the excluded disks.
However, the namenode didn't update the block locations.
As I understand it, a datanode sends a block report to the namenode when
it starts, so the namenode should update the block locations immediately.
Is this a bug? Could anyone please explain?

Thank you
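
The expectation described above can be written down as a toy model. This is only a sketch of the behaviour the poster expected, not HDFS code: the class ToyNameNode, its methods, and the block ids are all hypothetical, and the real namenode in hadoop-2.4.0 may process reports differently (which is exactly the open question).

```python
from collections import defaultdict


class ToyNameNode:
    """Toy block map: block id -> set of datanodes holding a replica.

    Hypothetical model of the expected behaviour, not the HDFS implementation.
    """

    def __init__(self):
        self.locations = defaultdict(set)

    def process_full_report(self, datanode, reported_blocks):
        """Treat the report as the complete list of blocks on `datanode`."""
        reported = set(reported_blocks)
        # Drop the datanode from every block it no longer reports.
        for block, nodes in list(self.locations.items()):
            if datanode in nodes and block not in reported:
                nodes.discard(datanode)
                if not nodes:
                    del self.locations[block]
        # Record the datanode for every block it does report.
        for block in reported:
            self.locations[block].add(datanode)

    def locations_of(self, block):
        return sorted(self.locations.get(block, set()))


nn = ToyNameNode()
nn.process_full_report("DN1", ["blk_1", "blk_2", "blk_3"])
nn.process_full_report("DN2", ["blk_1", "blk_3"])

# DN1 restarts with two disks removed; blk_3 lived on one of them.
nn.process_full_report("DN1", ["blk_1", "blk_2"])

print(nn.locations_of("blk_3"))  # DN1 is expected to be gone from this list
```

In this model, DN1 disappears from the locations of blk_3 as soon as the post-restart report is processed, which is the behaviour that was expected but not observed.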

Re: Hadoop or RDBMS

Posted by daemeon reiydelle <da...@gmail.com>.
Based on the brief description, which includes the relatively "small"
number of records and the type of queries I can "imagine" the end customer would
make, my question would be: how ad hoc are the queries, vs. how well would they be
managed by traditional RDBMS schemas?

Then I would be interested to understand the nature of your growth.

If commodity hardware/scalability is a driver, the size of the data
suggests a traditional schema-based RDBMS, perhaps with sharding;
sharded MySQL or Postgres seems like it could scale well at the data
sizes you suggest. If you see both significant growth and a need for the
ultra-ad hoc capability of a no-schema solution, I would ask if you have
considered Cassandra+Spark (acknowledging that the no-schema nature of the
repository drives quite a bit more data denormalization in C+S than in an
RDBMS).

Net net, perhaps sharded MySQL could be a middle ground?
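
As a rough illustration of what sharding means for the application, here is a minimal sketch of hash-based shard routing. The shard names and the shard_for helper are hypothetical, and a real deployment would also have to handle resharding and connection management.

```python
import hashlib

# Hypothetical shard names; in practice each would map to a database server.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(company_id: int, shards=SHARDS) -> str:
    """Map a company id to a shard using a stable hash of the id."""
    digest = hashlib.sha256(str(company_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# The same id always lands on the same shard.
print(shard_for(42) == shard_for(42))
```

Note that a selection on criteria other than the shard key (activity code, region and so on) would have to be sent to every shard and the per-shard counts added up, so sharding by id helps with data volume more than with this particular query pattern.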




On Mon, Jul 13, 2015 at 3:46 AM, James Peterzon | 123dm <ja...@123dm.nl>
wrote:

> Hi there,
>
> We have built an (online) selection tool where marketeers can select their
> target groups for marketing purposes, e.g. direct mail or telemarketing.
> Now we were asked to build a similar selection tool based on a Hadoop
> database. This database contains about 35 million records (companies) with
> different fields to select on (number of employees, activity code,
> geographical codes, legal form code, turnover figures, year of
> establishment and so on).
>
> Performance is very important for this online app. If one makes a
> selection with different criteria, the number of selected records should be
> on your screen in (milli)seconds.
>
> We are not sure if Hadoop will be a good choice; for fast results we need
> a well-indexed relational database, in our opinion…
>
> Can anybody advise me?
>
> Thanks!
>
> Best regards,
>
> James Peterzon
>

Re: Hadoop or RDBMS

Posted by James Peterzon | 123dm <ja...@123dm.nl>.
Thanks, this is clear to me.

On 13-07-15 20:53, Roman Shaposhnik <ro...@shaposhnik.org> wrote:

>On Mon, Jul 13, 2015 at 7:27 AM, Sean Busbey <bu...@cloudera.com> wrote:
>> Given the relatively modest dataset size, this sounds like a straight
>> forward use case for a traditional RDBMS.
>
>To pile on top of Sean's reply, I'd say that given the current estimates a
>traditional RDBMS such as Postgres could fit the bill. If potential
>scalability needs have to be taken into account, you could look at MPP-type
>solutions to scale from Postgres. Ping me off-list if you're interested
>in that route.
>
>To get us back on topic for user@hadoop, I'd say that a Hadoop ecosystem
>strategy for your use case has to be warranted by more than the size
>of the data. In fact, I'd say that Hadoop and/or Spark would only make
>sense for you if you feel like taking advantage of the various analytical
>frameworks available for those.
>
>Just my 2c.
>
>Thanks,
>Roman.



Re: Hadoop or RDBMS

Posted by Roman Shaposhnik <ro...@shaposhnik.org>.
On Mon, Jul 13, 2015 at 7:27 AM, Sean Busbey <bu...@cloudera.com> wrote:
> Given the relatively modest dataset size, this sounds like a straight
> forward use case for a traditional RDBMS.

To pile on top of Sean's reply, I'd say that given the current estimates a
traditional RDBMS such as Postgres could fit the bill. If potential
scalability needs have to be taken into account, you could look at MPP-type
solutions to scale from Postgres. Ping me off-list if you're interested
in that route.

To get us back on topic for user@hadoop, I'd say that a Hadoop ecosystem
strategy for your use case has to be warranted by more than the size
of the data. In fact, I'd say that Hadoop and/or Spark would only make
sense for you if you feel like taking advantage of the various analytical
frameworks available for those.

Just my 2c.

Thanks,
Roman.
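
To make the RDBMS suggestion in this thread concrete, here is a minimal sketch of the kind of indexed count query a selection tool would run. It uses Python's built-in sqlite3 module purely as a stand-in for Postgres or MySQL; the table, column names, index, and sample rows are all hypothetical, and whether such a query returns in (milli)seconds on 35 million rows would have to be measured on the real data.

```python
import sqlite3

# In-memory database as a stand-in for a real RDBMS server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE companies (
        id INTEGER PRIMARY KEY,
        employees INTEGER,
        activity_code TEXT,
        region_code TEXT,
        legal_form TEXT,
        founded_year INTEGER
    )
    """
)
# Composite index covering the columns used in the selection below.
conn.execute(
    "CREATE INDEX idx_selection ON companies (activity_code, region_code, employees)"
)

# Hypothetical sample rows.
rows = [
    (1, 12, "6201", "NL-NH", "BV", 2003),
    (2, 250, "6201", "NL-NH", "NV", 1987),
    (3, 40, "4711", "NL-ZH", "BV", 2010),
    (4, 75, "6201", "NL-NH", "BV", 1999),
]
conn.executemany("INSERT INTO companies VALUES (?, ?, ?, ?, ?, ?)", rows)


def count_selection(conn, activity_code, region_code, min_employees):
    """Return the number of companies matching the selection criteria."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM companies "
        "WHERE activity_code = ? AND region_code = ? AND employees >= ?",
        (activity_code, region_code, min_employees),
    )
    return cur.fetchone()[0]


print(count_selection(conn, "6201", "NL-NH", 50))  # prints 2
```

The index column order (equality columns first, range column last) is one common choice for this query shape; a tool with many optional criteria would likely need several indexes or a different indexing strategy.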

Re: Hadoop or RDBMS

Posted by James Peterzon | 123dm <ja...@123dm.nl>.
The customer already works with Hadoop and doesn't want an RDBMS on the side.
So maybe ElasticSearch for Hadoop or a similar solution might be good for this,
to guarantee performance for real-time queries.

From:  Sean Busbey <bu...@cloudera.com>
Reply-To:  <us...@hadoop.apache.org>
Date:  Monday, 13 July 2015 16:27
To:  user <us...@hadoop.apache.org>
Subject:  Re: Hadoop or RDBMS

Given the relatively modest dataset size, this sounds like a
straightforward use case for a traditional RDBMS.

Are there some other criteria that are leading you to evaluate things built on
Hadoop? Are you expecting several orders of magnitude of growth in the
record count?

On Mon, Jul 13, 2015 at 5:46 AM, James Peterzon | 123dm <ja...@123dm.nl>
wrote:
> Hi there,
> 
> We have built an (online) selection tool where marketeers can select their
> target groups for marketing purposes, e.g. direct mail or telemarketing.
> Now we were asked to build a similar selection tool based on a Hadoop
> database. This database contains about 35 million records (companies) with
> different fields to select on (number of employees, activity code, geographical
> codes, legal form code, turnover figures, year of establishment and so on).
> 
> Performance is very important for this online app. If one makes a selection
> with different criteria, the number of selected records should be on your
> screen in (milli)seconds.
> 
> We are not sure if Hadoop will be a good choice; for fast results we need a
> well-indexed relational database, in our opinion…
> 
> Can anybody advise me?
> 
> Thanks!
> 
> Best regards,
> 
> James Peterzon



-- 
Sean



Re: Hadoop or RDBMS

Posted by Sean Busbey <bu...@cloudera.com>.
Given the relatively modest dataset size, this sounds like a
straightforward use case for a traditional RDBMS.

Are there some other criteria that are leading you to evaluate things built on
Hadoop? Are you expecting several orders of magnitude of growth in the
record count?

On Mon, Jul 13, 2015 at 5:46 AM, James Peterzon | 123dm <ja...@123dm.nl>
wrote:

> Hi there,
>
> We have built an (online) selection tool where marketeers can select their
> target groups for marketing purposes, e.g. direct mail or telemarketing.
> Now we were asked to build a similar selection tool based on a Hadoop
> database. This database contains about 35 million records (companies) with
> different fields to select on (number of employees, activity code,
> geographical codes, legal form code, turnover figures, year of
> establishment and so on).
>
> Performance is very important for this online app. If one makes a
> selection with different criteria, the number of selected records should be
> on your screen in (milli)seconds.
>
> We are not sure if Hadoop will be a good choice; for fast results we need
> a well-indexed relational database, in our opinion…
>
> Can anybody advise me?
>
> Thanks!
>
> Best regards,
>
> James Peterzon
>



-- 
Sean

RE: Hadoop or RDBMS

Posted by yves callaert <yv...@hotmail.com>.
I would also take a look at HBase or Spark.

regards,
Yves

Date: Mon, 13 Jul 2015 13:08:33 +0200
Subject: Re: Hadoop or RDBMS
From: james@123dm.nl
To: user@hadoop.apache.org

Thanks Harshit, ElasticSearch for Hadoop seems a very good idea!

Regards,
James

From:  Harshit Mathur <ma...@gmail.com>
Reply-To:  <us...@hadoop.apache.org>
Date:  Monday, 13 July 2015 12:54
To:  <us...@hadoop.apache.org>
Subject:  Re: Hadoop or RDBMS

I am not sure, but ElasticSearch can be a good candidate for this.

Regards,
Harshit

On Mon, Jul 13, 2015 at 4:16 PM, James Peterzon | 123dm <ja...@123dm.nl> wrote:
Hi there,

We have built an (online) selection tool where marketeers can select their target groups for marketing purposes, e.g. direct mail or telemarketing.
Now we were asked to build a similar selection tool based on a Hadoop database. This database contains about 35 million records (companies) with different fields to select on (number of employees, activity code, geographical codes, legal form code, turnover figures, year of establishment and so on).

Performance is very important for this online app. If one makes a selection with different criteria, the number of selected records should be on your screen in (milli)seconds.

We are not sure if Hadoop will be a good choice; for fast results we need a well-indexed relational database, in our opinion…

Can anybody advise me?

Thanks!

Best regards,
James Peterzon

-- 
Harshit Mathur

Re: Hadoop or RDBMS

Posted by James Peterzon | 123dm <ja...@123dm.nl>.
Thanks Harshit, ElasticSearch for Hadoop seems a very good idea!

Regards,
James

From:  Harshit Mathur <ma...@gmail.com>
Reply-To:  <us...@hadoop.apache.org>
Date:  Monday, 13 July 2015 12:54
To:  <us...@hadoop.apache.org>
Subject:  Re: Hadoop or RDBMS

I am not sure, but ElasticSearch can be a good candidate for this.

Regards,
Harshit

On Mon, Jul 13, 2015 at 4:16 PM, James Peterzon | 123dm <ja...@123dm.nl>
wrote:
> Hi there,
> 
> We have built an (online) selection tool where marketeers can select their
> target groups for marketing purposes, e.g. direct mail or telemarketing.
> Now we were asked to build a similar selection tool based on a Hadoop
> database. This database contains about 35 million records (companies) with
> different fields to select on (number of employees, activity code, geographical
> codes, legal form code, turnover figures, year of establishment and so on).
> 
> Performance is very important for this online app. If one makes a selection
> with different criteria, the number of selected records should be on your
> screen in (milli)seconds.
> 
> We are not sure if Hadoop will be a good choice; for fast results we need a
> well-indexed relational database, in our opinion…
> 
> Can anybody advise me?
> 
> Thanks!
> 
> Best regards,
> 
> James Peterzon



-- 
Harshit Mathur



Re: Hadoop or RDBMS

Posted by Harshit Mathur <ma...@gmail.com>.
I am not sure, but ElasticSearch can be a good candidate for this.

Regards,
Harshit

On Mon, Jul 13, 2015 at 4:16 PM, James Peterzon | 123dm <ja...@123dm.nl>
wrote:

> Hi there,
>
> We have built an (online) selection tool where marketeers can select their
> target groups for marketing purposes, e.g. direct mail or telemarketing.
> Now we were asked to build a similar selection tool based on a Hadoop
> database. This database contains about 35 million records (companies) with
> different fields to select on (Number of employees, Activity code,
> Geographical codes, Legal form code, Turnover figures, Year of
> establishment and so on).
>
> Performance is very important for this online app. If one makes a
> selection with different criteria, the number of selected records should be
> on your screen in (milli) seconds.
>
> We are not sure if Hadoop will be a good choice, for fast results we need
> a good indexed relational database in our opinion…
>
> Can anybody advise me?
>
> Thanks!
>
> Best regards,
>
> James Peterzon
>



-- 
Harshit Mathur

Re: Hadoop or RDBMS

Posted by Sean Busbey <bu...@cloudera.com>.
Given the relatively modest dataset size, this sounds like a
straightforward use case for a traditional RDBMS.

Are there other criteria leading you to evaluate things built on
Hadoop? Are you expecting several orders of magnitude of growth in the
record count?
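
To illustrate the indexed-RDBMS approach suggested above, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, index and sample rows are all assumptions made up for this example; they are not taken from the original poster's system, and a production setup would likely use a server database such as MySQL or PostgreSQL instead.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table holding a few made-up company rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE companies (
            id INTEGER PRIMARY KEY,
            employees INTEGER,
            activity_code TEXT,
            region_code TEXT,
            legal_form TEXT,
            year_established INTEGER
        )
        """
    )
    # Hypothetical sample data, for illustration only.
    rows = [
        (1, 12, "6201", "NL-NH", "BV", 2004),
        (2, 250, "6201", "NL-ZH", "NV", 1998),
        (3, 5, "4711", "NL-NH", "BV", 2011),
        (4, 40, "6201", "NL-NH", "BV", 2009),
    ]
    conn.executemany("INSERT INTO companies VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Composite index over the columns used as selection criteria below.
    conn.execute(
        "CREATE INDEX idx_selection "
        "ON companies (activity_code, region_code, employees)"
    )
    return conn


def count_selection(conn, activity_code, region_code, min_employees):
    """Return the number of companies matching the given criteria."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM companies "
        "WHERE activity_code = ? AND region_code = ? AND employees >= ?",
        (activity_code, region_code, min_employees),
    )
    return cur.fetchone()[0]


conn = build_demo_db()
print(count_selection(conn, "6201", "NL-NH", 10))
```

Whether such counts come back in milliseconds on 35 million rows depends on the database, the hardware and how well the indexes match the selection criteria, so this would need to be measured on real data.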

On Mon, Jul 13, 2015 at 5:46 AM, James Peterzon | 123dm <ja...@123dm.nl>
wrote:

> Hi there,
>
> We have built an (online) selection tool where marketeers can select their
> target groups for marketing purposes, e.g. direct mail or telemarketing.
> Now we were asked to build a similar selection tool based on a Hadoop
> database. This database contains about 35 million records (companies) with
> different fields to select on (Number of employees, Activity code,
> Geographical codes, Legal form code, Turnover figures, Year of
> establishment and so on).
>
> Performance is very important for this online app. If one makes a
> selection with different criteria, the number of selected records should be
> on your screen in (milli) seconds.
>
> We are not sure if Hadoop will be a good choice, for fast results we need
> a good indexed relational database in our opinion…
>
> Can anybody advise me?
>
> Thanks!
>
> Best regards,
>
> James Peterzon
>



-- 
Sean

Hadoop or RDBMS

Posted by James Peterzon | 123dm <ja...@123dm.nl>.
Hi there,

We have built an (online) selection tool where marketeers can select their
target groups for marketing purposes, e.g. direct mail or telemarketing.
Now we were asked to build a similar selection tool based on a Hadoop
database. This database contains about 35 million records (companies) with
different fields to select on (Number of employees, Activity code,
Geographical codes, Legal form code, Turnover figures, Year of establishment
and so on).

Performance is very important for this online app. If one makes a selection
with different criteria, the number of selected records should be on your
screen in (milli) seconds.

We are not sure if Hadoop will be a good choice, for fast results we need a
good indexed relational database in our opinion…

Can anybody advise me?

Thanks!

Best regards,

James Peterzon


