Posted to common-user@hadoop.apache.org by Robert Spurrier <sp...@gmail.com> on 2013/04/16 22:31:35 UTC

Querying a Prolog Server from a JVM during a MapReduce Job

Hello!

I'm working on a research project, and I also happen to be relatively new
to Hadoop/MapReduce. So apologies ahead of time for any glaring errors.

On my local machine, my project runs within a JVM and uses a Java API to
communicate with a Prolog server to do information lookups. I was planning
on deploying my project as the mapper during the MR job, but I am unclear
on how I would access the Prolog server at runtime. Would it be OK to
just let the server live and run on each data node while my job is running,
and have each mapper hit the server on its respective node? (let's assume
the server can handle the high volume of queries from the mappers)
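In rough Java, the task lifecycle I'm imagining looks something like the
sketch below. To be clear, PrologClient and query() are placeholder names I
made up for this email, not the actual API I'm using; the point is just that
each mapper JVM would open one connection up front, reuse it (with a local
cache) for every record, and close it at the end:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the real Prolog Java API; the names
// PrologClient and query are illustrative only.
interface PrologClient {
    String query(String goal);   // read-only lookup
    void close();
}

// Sketch of the lifecycle a Hadoop Mapper would follow: open one
// connection per task JVM (setup), reuse it for every record (map),
// and release it when the task finishes (cleanup).
class PrologLookupTask {
    private PrologClient client;
    private final Map<String, String> cache = new HashMap<>();

    void setup(PrologClient c) {           // runs once per task
        client = c;
    }

    String map(String key) {               // runs once per record
        // Memoize: repeated keys hit the local cache, not the server.
        return cache.computeIfAbsent(key, k -> client.query(k));
    }

    void cleanup() {                       // runs once per task
        client.close();
    }
}
```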

I am not at all sure what types of issues will arise when the mappers
(each from its own JVM/process) query the Prolog server (running in its
own single, separate process on each node). They will only be reading
data from the server, not deleting or updating anything.


Anything that would make this impossible or what I should be looking out
for?

Thanks
-Robert

Re: Querying a Prolog Server from a JVM during a MapReduce Job

Posted by Steve Lewis <lo...@gmail.com>.
Assuming that the server can handle high volume and concurrent queries,
there is no reason not to run it on a single large, powerful machine
outside the cluster. Nothing prevents your mappers from accessing a remote
server, or even, depending on the design, a custom InputFormat from
pulling data from it.
I would not try to run copies of the server on the datanodes without a
very compelling reason.
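
One practical note: with many mappers hitting one external server at once,
transient connection failures are likely, so each lookup should retry with
backoff rather than fail the whole task. A rough sketch of what I mean (the
names here are illustrative, not part of Hadoop or any Prolog API):

```java
import java.util.concurrent.Callable;

// Generic retry-with-exponential-backoff wrapper a mapper could put
// around each query to an external server. Sketch only; tune the
// attempt count and delays to your workload.
class RetryingQuery {
    static <T> T withRetries(Callable<T> call, int maxAttempts, long baseDelayMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                // Exponential backoff between attempts: base, 2x, 4x, ...
                if (attempt < maxAttempts) {
                    Thread.sleep(baseDelayMs << (attempt - 1));
                }
            }
        }
        throw last;   // all attempts failed; surface the last error
    }
}
```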


On Tue, Apr 16, 2013 at 1:31 PM, Robert Spurrier
<sp...@gmail.com> wrote:

> Hello!
>
> I'm working on a research project, and I also happen to be relatively new
> to Hadoop/MapReduce. So apologies ahead of time for any glaring errors.
>
> On my local machine, my project runs within a JVM and uses a Java API to
> communicate with a Prolog server to do information lookups. I was planning
> on deploying my project as the mapper during the MR job, but I am unclear
> on how I would access the Prolog server at runtime. Would it be OK to
> just let the server live and run on each data node while my job is running,
> and have each mapper hit the server on its respective node? (let's assume
> the server can handle the high volume of queries from the mappers)
>
> I am not even remotely aware of what types of issues will arise when the
> mappers (from each of their JVMs/process) query the Prolog server (running
> in its own single & separate process on each node). They will only be
> querying data from the server, not deleting/updating.
>
>
> Anything that would make this impossible or what I should be looking out
> for?
>
> Thanks
> -Robert
>


-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
