Posted to user@hbase.apache.org by UsefullyWastedIce <pa...@gmail.com> on 2010/01/14 19:03:34 UTC

Will hadoop work for what i am trying to achieve?

I've been doing a lot of research over the past few days, but haven't been
able to find out whether or not Hadoop will work for what I am trying to
achieve.

The data I have is initially in XML, and the user needs to be able to query
that data very quickly (response time should be in the 10 second range).
Since the amount of data can easily grow into gigabytes, just processing it
on the fly is not fast enough.

What I was thinking of doing is loading that data into a Hadoop cluster,
processing it, and then serving the result back to the user. There are many
tools I've looked at, and since most of them run on top of Hadoop, I
figured that this would be my biggest hurdle.

Is 10 seconds a possible return time, including loading, processing and
returning the data? 
-- 
View this message in context: http://old.nabble.com/Will-hadoop-work-for-what-i-am-trying-to-achieve--tp27165540p27165540.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: Will hadoop work for what i am trying to achieve?

Posted by stack <st...@duboce.net>.

5GB files mean you are on the wrong project: 5GB is too big for HBase.
You probably want to put your big files directly on HDFS rather than into
HBase.  I'd suggest you try it.

St.Ack

On Thu, Jan 14, 2010 at 11:17 AM, UsefullyWastedIce <pa...@gmail.com>wrote:

>
> For the sake of discussion, let's say my XML file is 5GB.
>
> When I say loading, I mean actually propagating that data across HDFS.
> When I process the data, I will be doing two things with it: filtering out
> records that are irrelevant, and then modifying individual records by
> adding additional information (joining them with other data). The final
> results might be saved by the end user, and the end user may want to
> come back to them and perform additional processing on them.
>
> XSLT processing would be ideal, but I've given up on it because I didn't
> think it would work. I've done some tests with it on my local machine, and
> in order to apply XSLT to an entire file, the entire file would get loaded
> into memory, which was obviously not an option.
>

Re: Will hadoop work for what i am trying to achieve?

Posted by UsefullyWastedIce <pa...@gmail.com>.

For the sake of discussion, let's say my XML file is 5GB.

When I say loading, I mean actually propagating that data across HDFS.
When I process the data, I will be doing two things with it: filtering out
records that are irrelevant, and then modifying individual records by adding
additional information (joining them with other data). The final results
might be saved by the end user, and the end user may want to come back to
them and perform additional processing on them.

XSLT processing would be ideal, but I've given up on it because I didn't
think it would work. I've done some tests with it on my local machine, and
in order to apply XSLT to an entire file, the entire file would get loaded
into memory, which was obviously not an option.
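
One way around the memory problem is to stream the XML one record at a time
instead of transforming the whole document. The following is only a minimal
sketch under assumptions: it supposes a flat file of <record> elements with
id and status attributes, and the names filter_and_enrich, lookup, and keep
are hypothetical. It uses Python's standard xml.etree.ElementTree.iterparse
to do the two steps described above (filter, then join with other data).

```python
import io
import xml.etree.ElementTree as ET


def filter_and_enrich(source, lookup, keep):
    """Stream <record> elements, drop those keep() rejects, join the rest with lookup.

    The element layout and attribute names are assumptions for illustration.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "record":
            continue
        record = dict(elem.attrib)  # copy first: clear() also wipes the attributes
        # Release the element's contents so memory use does not track file size.
        # The emptied element itself stays attached to the root.
        elem.clear()
        if not keep(record):
            continue
        # "Join": merge in extra fields keyed by the record's id, if any exist.
        record.update(lookup.get(record.get("id"), {}))
        yield record


# Tiny in-memory stand-in for a large file on disk.
sample = io.BytesIO(b"""<records>
  <record id="1" status="active"/>
  <record id="2" status="stale"/>
  <record id="3" status="active"/>
</records>""")
extra = {"1": {"owner": "alice"}, "3": {"owner": "bob"}}

results = list(filter_and_enrich(sample, extra, lambda r: r["status"] == "active"))
for row in results:
    print(row)
```

Because the function is a generator, records can be written out as they are
produced. Whether this fits a 10 second budget on a 5GB file would still
have to be measured.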

stack-3 wrote:
> 
> Please describe what your queries will be like and what you mean by
> "loading, processing, and returning the data"?  So your files are xml? 
> What
> size?  Then you'd process them in user-time?  What kinda processing?
>  xslt'ing?
> 
> St.Ack

Re: Will hadoop work for what i am trying to achieve?

Posted by stack <st...@duboce.net>.

Please describe what your queries will be like, and what you mean by
"loading, processing, and returning the data".  So your files are XML?  What
size?  And you'd process them in user time?  What kind of processing?
XSLT?

St.Ack
