Posted to hdfs-user@hadoop.apache.org by Mark Kerzner <ma...@shmsoft.com> on 2012/10/10 04:47:30 UTC

Hadoop/Lucene + Solr architecture suggestions?

Hi,

if I create a Lucene index in each mapper, locally, then copy the indexes
to HDFS under /jobid/mapid1, /jobid/mapid2, and then in the reducers copy
them to some Solr machine (perhaps even merging), does such an architecture
make sense as a way to create a searchable index with Hadoop?

Are there links for similar architectures and questions?

Thank you. Sincerely,
Mark
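
For concreteness, a rough sketch of the map side of that proposal, assuming
Lucene 4.x and the Hadoop mapreduce API; the local path, the "body" field,
and the exact /<jobid>/<mapid> layout are illustrative assumptions rather
than a tested recipe:

import java.io.File;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class LocalIndexMapper extends Mapper<LongWritable, Text, Text, Text> {
  private IndexWriter writer;
  private File localDir;

  @Override
  protected void setup(Context ctx) throws IOException {
    // Build the index on the task's local disk, not directly in HDFS.
    localDir = new File("/tmp/index-" + ctx.getTaskAttemptID());
    writer = new IndexWriter(FSDirectory.open(localDir),
        new IndexWriterConfig(Version.LUCENE_40,
            new StandardAnalyzer(Version.LUCENE_40)));
  }

  @Override
  protected void map(LongWritable key, Text value, Context ctx)
      throws IOException {
    // One input line becomes one indexed document (hypothetical schema).
    Document doc = new Document();
    doc.add(new TextField("body", value.toString(), Field.Store.NO));
    writer.addDocument(doc);
  }

  @Override
  protected void cleanup(Context ctx) throws IOException {
    writer.close();
    // Publish the finished per-mapper index under /<jobid>/<mapid>.
    FileSystem fs = FileSystem.get(ctx.getConfiguration());
    fs.copyFromLocalFile(new Path(localDir.getPath()),
        new Path("/" + ctx.getJobID() + "/"
            + ctx.getTaskAttemptID().getTaskID()));
  }
}

The reducers could then pull several of these directories down to local disk
and merge them with Lucene's IndexWriter.addIndexes() before shipping the
merged shard to Solr.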

Re: Hadoop/Lucene + Solr architecture suggestions?

Posted by Ivan Frain <iv...@gmail.com>.
Hi Mark,

I don't know Lucene/Solr very well, but your question reminded me of the
Lily project: http://www.lilyproject.org/lily/index.html. They use
Hadoop/HBase and Solr to provide a searchable data management platform.

Maybe you will find ideas in their documentation.

BR,
Ivan

2012/10/10 Mark Kerzner <ma...@shmsoft.com>

> Hi,
>
> if I create a Lucene index in each mapper, locally, then copy the indexes
> to HDFS under /jobid/mapid1, /jobid/mapid2, and then in the reducers copy
> them to some Solr machine (perhaps even merging), does such an architecture
> make sense as a way to create a searchable index with Hadoop?
>
> Are there links for similar architectures and questions?
>
> Thank you. Sincerely,
> Mark
>



-- 
Ivan Frain
11, route de Grenade
31530 Saint-Paul-sur-Save
mobile: +33 (0)6 52 52 47 07

Re: Hadoop/Lucene + Solr architecture suggestions?

Posted by Mark Kerzner <ma...@shmsoft.com>.
That is very interesting, Lance, thank you.

Mark

On Wed, Oct 10, 2012 at 9:15 PM, Lance Norskog <go...@gmail.com> wrote:

> In the LucidWorks Big Data product, we handle this with a reducer that
> sends documents to a SolrCloud cluster. This way the index files are not
> managed by Hadoop.
>
> ----- Original Message -----
> | From: "Ted Dunning" <td...@maprtech.com>
> | To: user@hadoop.apache.org
> | Cc: "Hadoop User" <us...@hadoop.apache.org>
> | Sent: Wednesday, October 10, 2012 7:58:57 AM
> | Subject: Re: Hadoop/Lucene + Solr architecture suggestions?
> |
> | I prefer to create indexes in the reducer personally.
> |
> | Also you can avoid the copies if you use an advanced hadoop-derived
> | distro. Email me off list for details.
> |
> | Sent from my iPhone
> |
> | On Oct 9, 2012, at 7:47 PM, Mark Kerzner <ma...@shmsoft.com>
> | wrote:
> |
> | > Hi,
> | >
> | > if I create a Lucene index in each mapper, locally, then copy the
> | > indexes to HDFS under /jobid/mapid1, /jobid/mapid2, and then in the
> | > reducers copy them to some Solr machine (perhaps even merging), does
> | > such an architecture make sense as a way to create a searchable index
> | > with Hadoop?
> | >
> | > Are there links for similar architectures and questions?
> | >
> | > Thank you. Sincerely,
> | > Mark
> |
>
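
For reference, a minimal sketch of the reducer Lance describes, assuming
SolrJ 4.x and its CloudSolrServer client; the ZooKeeper addresses, the
collection name, and the fields are made up:

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrCloudReducer
    extends Reducer<Text, Text, NullWritable, NullWritable> {
  private CloudSolrServer solr;

  @Override
  protected void setup(Context ctx) throws IOException {
    // SolrCloud routes each document to the right shard and replicas;
    // the reducer just streams documents at the cluster.
    solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    solr.setDefaultCollection("collection1");
  }

  @Override
  protected void reduce(Text docId, Iterable<Text> bodies, Context ctx)
      throws IOException {
    try {
      for (Text body : bodies) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", docId.toString());
        doc.addField("body", body.toString());
        solr.add(doc);
      }
    } catch (SolrServerException e) {
      throw new IOException(e);
    }
  }

  @Override
  protected void cleanup(Context ctx) throws IOException {
    try {
      solr.commit();
    } catch (SolrServerException e) {
      throw new IOException(e);
    } finally {
      solr.shutdown();
    }
  }
}

Because SolrCloud handles the shard routing itself, the reducer needs no
knowledge of the cluster layout; the cost of that convenience is the write
amplification discussed further down the thread.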

Re: Hadoop/Lucene + Solr architecture suggestions?

Posted by Ted Dunning <td...@maprtech.com>.
Your mileage will vary.  I find that this exact motion is what makes
reduce-side indexing very attractive, since it allows precise sharding control.
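
As a concrete sketch of that control: run one reducer per shard and pin
every document to its shard with a custom partitioner (keying the map
output by document id is an assumption here):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// One reducer per target shard: the partition number is the shard number,
// so the sharding function is entirely under the job's control.
public class ShardPartitioner extends Partitioner<Text, Text> {
  @Override
  public int getPartition(Text docId, Text doc, int numShards) {
    return (docId.hashCode() & Integer.MAX_VALUE) % numShards;
  }
}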

On Thu, Oct 11, 2012 at 10:20 PM, Lance Norskog <go...@gmail.com> wrote:

> The problem is finding the documents in the search cluster. Making the
> indexes in the reducer means the final m/r pass has to segregate the
> documents according to the distribution function. This uses network
> bandwidth for Hadoop coordination, while using SolrCloud only pushes
> each document once. It's a disk bandwidth vs. network bandwidth
> problem. That said, a combiner in the final m/r pass would really speed
> up the Hadoop shuffle.
>
> Lance
>
>     From: "Ted Dunning" <td...@maprtech.com>
>     To: user@hadoop.apache.org
>     Sent: Wednesday, October 10, 2012 11:13:36 PM
>     Subject: Re: Hadoop/Lucene + Solr architecture suggestions?
>
>     Having solr cloud do the indexing instead of having map-reduce do
> the indexing causes substantial write amplification.
>
>     The issue is that cores are replicated in solr cloud.  To keep the
> cores in sync, all of the replicas index the same documents leading to
> amplification.  Further amplification occurs when all of the logs get
> a copy of the document as well.
>
>     Indexing in map-reduce provides considerably higher latency, but
> it drops the write amplification dramatically because only one copy of
> the index needs to be created.  For MapR, this index is created
> directly in the dfs, in vanilla Hadoop you would typically create the
> index in local disk and copy to HDFS.  The major expense, however, is
> the indexing which is nice to do only once.  Reduce-side indexing has
> the advantage that your indexing bandwidth naturally increases with
> your increasing cluster size.
>
>     Deployment of indexes can be done by copying from HDFS, or
> directly deploying using NFS from MapR.  Either way, all of the shard
> replicas appear under the live Solr.  Obviously, if you copy from
> HDFS, you have some issues with making sure that things appear
> correctly.  One way to deal with this is to double the copy steps by
> copying from HDFS and then using a differential copy to leave files as
> unchanged as possible. You want to do that to allow as much of the
> memory image to stay live as possible.  With NFS and transactional
> updates, that isn't an issue, of course.
>
>     On the extreme side, you can host all of the searchers on NFS
> hosted index shards.  You can have one Solr instance devoted to
> indexing each shard.  This will cause the shards to update and each
> Solr index will detect these changes after a few seconds.  This gives
> near real-time search with high isolation between indexing and search
> loads.
>
>
>     On Wed, Oct 10, 2012 at 10:38 PM, JAY <ja...@gmail.com> wrote:
>
>         How can you store Solr shards in hadoop? Is each data node
> running a Solr server?  If so - is the reducer doing a trick to write
> to a local fs?
>
>         Sent from my iPad
>
>         On Oct 11, 2012, at 12:04 AM, "M. C. Srivas" <mc...@gmail.com>
> wrote:
>
>             Interestingly, a few MapR customers have gone the other
> way, deliberately having the indexer put  the Solr shards directly
> into MapR and letting it distribute it. Has made index-management a
> cinch.
>
>             Otherwise they do run into what Tim alludes to.
>
>             On Wed, Oct 10, 2012 at 7:27 PM, Tim Williams
> <wi...@gmail.com> wrote:
>
>                 On Wed, Oct 10, 2012 at 10:15 PM, Lance Norskog
> <go...@gmail.com> wrote:
>                 > In the LucidWorks Big Data product, we handle this
> with a reducer that sends documents to a SolrCloud cluster. This way
> the index files are not managed by Hadoop.
>
>                 Hi Lance,
>                 I'm curious if you've gotten that to work with a
> decent-sized (e.g. >
>                 250 node) cluster?  Even a trivial cluster seems to
> crush SolrCloud
>                 from a few months ago at least...
>
>                 Thanks,
>                 --tim
>
>                 > ----- Original Message -----
>                 > | From: "Ted Dunning" <td...@maprtech.com>
>                 > | To: user@hadoop.apache.org
>                 > | Cc: "Hadoop User" <us...@hadoop.apache.org>
>                 > | Sent: Wednesday, October 10, 2012 7:58:57 AM
>                 > | Subject: Re: Hadoop/Lucene + Solr architecture
> suggestions?
>                 > |
>                 > | I prefer to create indexes in the reducer personally.
>                 > |
>                 > | Also you can avoid the copies if you use an
> advanced hadoop-derived
>                 > | distro. Email me off list for details.
>                 > |
>                 > | Sent from my iPhone
>                 > |
>                 > | On Oct 9, 2012, at 7:47 PM, Mark Kerzner
> <ma...@shmsoft.com>
>                 > | wrote:
>                 > |
>                 > | > Hi,
>                 > | >
>                 > | > if I create a Lucene index in each mapper, locally,
>                 > | > then copy the indexes to HDFS under /jobid/mapid1,
>                 > | > /jobid/mapid2, and then in the reducers copy them to
>                 > | > some Solr machine (perhaps even merging), does such an
>                 > | > architecture make sense as a way to create a
>                 > | > searchable index with Hadoop?
>                 > | >
>                 > | > Are there links for similar architectures and
> questions?
>                 > | >
>                 > | > Thank you. Sincerely,
>                 > | > Mark
>                 > |
>

Re: Hadoop/Lucene + Solr architecture suggestions?

Posted by Lance Norskog <go...@gmail.com>.
The problem is finding the documents in the search cluster. Making the
indexes in the reducer means the final m/r pass has to segregate the
documents according to the distribution function. This uses network
bandwidth for Hadoop coordination, while using SolrCloud only pushes
each document once. It's a disk bandwidth vs. network bandwidth
problem. That said, a combiner in the final m/r pass would really speed
up the Hadoop shuffle.
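
For a concrete picture of that final m/r pass, a sketch of the job wiring,
reusing the ShardPartitioner and SolrCloudReducer sketched earlier in the
thread; the mapper, input format, and shard count are made up:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class IndexJobDriver {

  // Hypothetical mapper: splits a tab-separated line into (docId, docBody).
  public static class DocParseMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String s = line.toString();
      int tab = s.indexOf('\t');
      ctx.write(new Text(s.substring(0, tab)), new Text(s.substring(tab + 1)));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "reduce-side-index-build");
    job.setJarByClass(IndexJobDriver.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    job.setMapperClass(DocParseMapper.class);
    // The distribution function: the shuffle segregates documents by shard,
    // which is where the extra network traffic described above comes from.
    job.setPartitionerClass(ShardPartitioner.class);
    // A combiner that collapsed repeated updates to the same doc id would
    // cut shuffle volume, per the point above; none is supplied here.
    job.setNumReduceTasks(8); // one reducer per shard (made up)
    job.setReducerClass(SolrCloudReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}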

Lance

    From: "Ted Dunning" <td...@maprtech.com>
    To: user@hadoop.apache.org
    Sent: Wednesday, October 10, 2012 11:13:36 PM
    Subject: Re: Hadoop/Lucene + Solr architecture suggestions?

    Having solr cloud do the indexing instead of having map-reduce do
the indexing causes substantial write amplification.

    The issue is that cores are replicated in solr cloud.  To keep the
cores in sync, all of the replicas index the same documents leading to
amplification.  Further amplification occurs when all of the logs get
a copy of the document as well.

    Indexing in map-reduce provides considerably higher latency, but
it drops the write amplification dramatically because only one copy of
the index needs to be created.  For MapR, this index is created
directly in the dfs, in vanilla Hadoop you would typically create the
index in local disk and copy to HDFS.  The major expense, however, is
the indexing which is nice to do only once.  Reduce-side indexing has
the advantage that your indexing bandwidth naturally increases with
your increasing cluster size.

    Deployment of indexes can be done by copying from HDFS, or
directly deploying using NFS from MapR.  Either way, all of the shard
replicas appear under the live Solr.  Obviously, if you copy from
HDFS, you have some issues with making sure that things appear
correctly.  One way to deal with this is to double the copy steps by
copying from HDFS and then using a differential copy to leave files as
unchanged as possible. You want to do that to allow as much of the
memory image to stay live as possible.  With NFS and transactional
updates, that isn't an issue, of course.

    On the extreme side, you can host all of the searchers on NFS
hosted index shards.  You can have one Solr instance devoted to
indexing each shard.  This will cause the shards to update and each
Solr index will detect these changes after a few seconds.  This gives
near real-time search with high isolation between indexing and search
loads.


    On Wed, Oct 10, 2012 at 10:38 PM, JAY <ja...@gmail.com> wrote:

        How can you store Solr shards in hadoop? Is each data node
running a Solr server?  If so - is the reducer doing a trick to write
to a local fs?

        Sent from my iPad

        On Oct 11, 2012, at 12:04 AM, "M. C. Srivas" <mc...@gmail.com> wrote:

            Interestingly, a few MapR customers have gone the other
way, deliberately having the indexer put  the Solr shards directly
into MapR and letting it distribute it. Has made index-management a
cinch.

            Otherwise they do run into what Tim alludes to.

            On Wed, Oct 10, 2012 at 7:27 PM, Tim Williams
<wi...@gmail.com> wrote:

                On Wed, Oct 10, 2012 at 10:15 PM, Lance Norskog
<go...@gmail.com> wrote:
                > In the LucidWorks Big Data product, we handle this
with a reducer that sends documents to a SolrCloud cluster. This way
the index files are not managed by Hadoop.

                Hi Lance,
                I'm curious if you've gotten that to work with a
decent-sized (e.g. >
                250 node) cluster?  Even a trivial cluster seems to
crush SolrCloud
                from a few months ago at least...

                Thanks,
                --tim

                > ----- Original Message -----
                > | From: "Ted Dunning" <td...@maprtech.com>
                > | To: user@hadoop.apache.org
                > | Cc: "Hadoop User" <us...@hadoop.apache.org>
                > | Sent: Wednesday, October 10, 2012 7:58:57 AM
                > | Subject: Re: Hadoop/Lucene + Solr architecture suggestions?
                > |
                > | I prefer to create indexes in the reducer personally.
                > |
                > | Also you can avoid the copies if you use an
advanced hadoop-derived
                > | distro. Email me off list for details.
                > |
                > | Sent from my iPhone
                > |
                > | On Oct 9, 2012, at 7:47 PM, Mark Kerzner
<ma...@shmsoft.com>
                > | wrote:
                > |
                > | > Hi,
                > | >
                > | > if I create a Lucene index in each mapper, locally,
                > | > then copy the indexes to HDFS under /jobid/mapid1,
                > | > /jobid/mapid2, and then in the reducers copy them to
                > | > some Solr machine (perhaps even merging), does such an
                > | > architecture make sense as a way to create a
                > | > searchable index with Hadoop?
                > | >
                > | > Are there links for similar architectures and questions?
                > | >
                > | > Thank you. Sincerely,
                > | > Mark
                > |

Re: Hadoop/Lucene + Solr architecture suggestions?

Posted by Ted Dunning <td...@maprtech.com>.
Having solr cloud do the indexing instead of having map-reduce do the
indexing causes substantial write amplification.

The issue is that cores are replicated in solr cloud.  To keep the cores in
sync, all of the replicas index the same documents leading to
amplification.  Further amplification occurs when all of the logs get a
copy of the document as well.

Indexing in map-reduce provides considerably higher latency, but it drops
the write amplification dramatically because only one copy of the index
needs to be created.  For MapR, this index is created directly in the dfs;
in vanilla Hadoop you would typically create the index on local disk and
copy it to HDFS.  The major expense, however, is the indexing, which is nice to
do only once.  Reduce-side indexing has the advantage that your indexing
bandwidth naturally increases with your increasing cluster size.

Deployment of indexes can be done by copying from HDFS, or directly
deploying using NFS from MapR.  Either way, all of the shard replicas
appear under the live Solr.  Obviously, if you copy from HDFS, you have
some issues with making sure that things appear correctly.  One way to deal
with this is to double the copy steps by copying from HDFS and then using a
differential copy to leave files as unchanged as possible. You want to do
that to allow as much of the memory image to stay live as possible.  With
NFS and transactional updates, that isn't an issue, of course.
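
On vanilla Hadoop, the copy step described above is just a FileSystem pull;
a minimal sketch with made-up paths (the differential copy he mentions
would replace this blind copy with something rsync-like):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShardFetcher {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
    // Pull the reducer-built shard out of HDFS next to the live core;
    // staging it as "index.new" keeps the live index untouched until
    // the swap, which matches the two-step deployment described above.
    fs.copyToLocalFile(new Path("/indexes/job_201210100001_0007/shard3"),
        new Path("/var/solr/cores/shard3/data/index.new"));
  }
}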

On the extreme side, you can host all of the searchers on NFS hosted index
shards.  You can have one Solr instance devoted to indexing each shard.
 This will cause the shards to update and each Solr index will detect these
changes after a few seconds.  This gives near real-time search with high
isolation between indexing and search loads.


On Wed, Oct 10, 2012 at 10:38 PM, JAY <ja...@gmail.com> wrote:

> How can you store Solr shards in hadoop? Is each data node running a Solr
> server?  If so - is the reducer doing a trick to write to a local fs?
>
> Sent from my iPad
>
> On Oct 11, 2012, at 12:04 AM, "M. C. Srivas" <mc...@gmail.com> wrote:
>
> Interestingly, a few MapR customers have gone the other way, deliberately
> having the indexer put  the Solr shards directly into MapR and letting it
> distribute it. Has made index-management a cinch.
>
> Otherwise they do run into what Tim alludes to.
>
> On Wed, Oct 10, 2012 at 7:27 PM, Tim Williams <wi...@gmail.com> wrote:
>
>> On Wed, Oct 10, 2012 at 10:15 PM, Lance Norskog <go...@gmail.com> wrote:
>> > In the LucidWorks Big Data product, we handle this with a reducer that
>> sends documents to a SolrCloud cluster. This way the index files are not
>> managed by Hadoop.
>>
>> Hi Lance,
>> I'm curious if you've gotten that to work with a decent-sized (e.g. >
>> 250 node) cluster?  Even a trivial cluster seems to crush SolrCloud
>> from a few months ago at least...
>>
>> Thanks,
>> --tim
>>
>> > ----- Original Message -----
>> > | From: "Ted Dunning" <td...@maprtech.com>
>> > | To: user@hadoop.apache.org
>> > | Cc: "Hadoop User" <us...@hadoop.apache.org>
>> > | Sent: Wednesday, October 10, 2012 7:58:57 AM
>> > | Subject: Re: Hadoop/Lucene + Solr architecture suggestions?
>> > |
>> > | I prefer to create indexes in the reducer personally.
>> > |
>> > | Also you can avoid the copies if you use an advanced hadoop-derived
>> > | distro. Email me off list for details.
>> > |
>> > | Sent from my iPhone
>> > |
>> > | On Oct 9, 2012, at 7:47 PM, Mark Kerzner <ma...@shmsoft.com>
>> > | wrote:
>> > |
>> > | > Hi,
>> > | >
>> > | > if I create a Lucene index in each mapper, locally, then copy the
>> > | > indexes to HDFS under /jobid/mapid1, /jobid/mapid2, and then in the
>> > | > reducers copy them to some Solr machine (perhaps even merging), does
>> > | > such an architecture make sense as a way to create a searchable
>> > | > index with Hadoop?
>> > | >
>> > | > Are there links for similar architectures and questions?
>> > | >
>> > | > Thank you. Sincerely,
>> > | > Mark
>> > |
>>
>
>

Re: Hadoop/Lucene + Solr architecture suggestions?

Posted by Ted Dunning <td...@maprtech.com>.
Having solr cloud do the indexing instead of having map-reduce do the
indexing causes substantial write amplification.

The issue is that cores are replicated in solr cloud.  To keep the cores in
sync, all of the replicas index the same documents leading to
amplification.  Further amplification occurs when all of the logs get a
copy of the document as well.

Indexing in map-reduce provides considerably higher latency, but it drops
the write amplification dramatically because only one copy of the index
needs to be created.  For MapR, this index is created directly in the dfs,
in vanilla Hadoop you would typically create the index in local disk and
copy to HDFS.  The major expense, however, is the indexing which is nice to
do only once.  Reduce-side indexing has the advantage that your indexing
bandwidth naturally increases with your increasing cluster size.

Deployment of indexes can be done by copying from HDFS, or directly
deploying using NFS from MapR.  Either way, all of the shard replicas
appear under the live SolR.  Obviously, if you copy from HDFS, you have
some issues with making sure that things appear correctly.  One way to deal
with this is to double the copy steps by copying from HDFS and then using a
differential copy to leave files as unchanged as possible. You want to do
that to allow as much of the memory image to stay live as possible.  With
NFS and transactional updates, that isn't an issue, of course.

On the extreme side, you can host all of the searchers on NFS hosted index
shards.  You can have one Solr instance devoted to indexing each shard.
 This will cause the shards to update and each Solr index will detect these
changes after a few seconds.  This gives near real-time search with high
isolation between indexing and search loads.


On Wed, Oct 10, 2012 at 10:38 PM, JAY <ja...@gmail.com> wrote:

> How can you store Solr shards in hadoop? Is each data node running a Solr
> server?  If so - is the reducer doing a trick to write to a local fs?
>
> Sent from my iPad
>
> On Oct 11, 2012, at 12:04 AM, "M. C. Srivas" <mc...@gmail.com> wrote:
>
> Interestingly, a few MapR customers have gone the other way, deliberately
> having the indexer put  the Solr shards directly into MapR and letting it
> distribute it. Has made index-management a cinch.
>
> Otherwise they do run into what Tim alludes to.
>
> On Wed, Oct 10, 2012 at 7:27 PM, Tim Williams < <wi...@gmail.com>
> williamstw@gmail.com> wrote:
>
>> On Wed, Oct 10, 2012 at 10:15 PM, Lance Norskog < <go...@gmail.com>
>> goksron@gmail.com> wrote:
>> > In the LucidWorks Big Data product, we handle this with a reducer that
>> sends documents to a SolrCloud cluster. This way the index files are not
>> managed by Hadoop.
>>
>> Hi Lance,
>> I'm curious if you've gotten that to work with a decent-sized (e.g. >
>> 250 node) cluster?  Even a trivial cluster seems to crush SolrCloud
>> from a few months ago at least...
>>
>> Thanks,
>> --tim
>>
>> > ----- Original Message -----
>> > | From: "Ted Dunning" < <td...@maprtech.com>
>> > | To: <us...@hadoop.apache.org>user@hadoop.apache.org
>> > | Cc: "Hadoop User" < <us...@hadoop.apache.org>
>> > | Sent: Wednesday, October 10, 2012 7:58:57 AM
>> > | Subject: Re: Hadoop/Lucene + Solr architecture suggestions?
>> > |
>> > | I prefer to create indexes in the reducer personally.
>> > |
>> > | Also you can avoid the copies if you use an advanced hadoop-derived
>> > | distro. Email me off list for details.
>> > |
>> > | Sent from my iPhone
>> > |
>> > | On Oct 9, 2012, at 7:47 PM, Mark Kerzner < <ma...@shmsoft.com>
>> mark.kerzner@shmsoft.com>
>> > | wrote:
>> > |
>> > | > Hi,
>> > | >
>> > | > if I create a Lucene index in each mapper, locally, then copy them
>> > | > to under /jobid/mapid1, /jodid/mapid2, and then in the reducers
>> > | > copy them to some Solr machine (perhaps even merging), does such
>> > | > architecture makes sense, to create a searchable index with
>> > | > Hadoop?
>> > | >
>> > | > Are there links for similar architectures and questions?
>> > | >
>> > | > Thank you. Sincerely,
>> > | > Mark
>> > |
>>
>
>

Re: Hadoop/Lucene + Solr architecture suggestions?

Posted by Ted Dunning <td...@maprtech.com>.
Having solr cloud do the indexing instead of having map-reduce do the
indexing causes substantial write amplification.

The issue is that cores are replicated in solr cloud.  To keep the cores in
sync, all of the replicas index the same documents leading to
amplification.  Further amplification occurs when all of the logs get a
copy of the document as well.

Indexing in map-reduce provides considerably higher latency, but it drops
the write amplification dramatically because only one copy of the index
needs to be created.  For MapR, this index is created directly in the dfs,
in vanilla Hadoop you would typically create the index in local disk and
copy to HDFS.  The major expense, however, is the indexing which is nice to
do only once.  Reduce-side indexing has the advantage that your indexing
bandwidth naturally increases with your increasing cluster size.

Deployment of indexes can be done by copying from HDFS, or directly
deploying using NFS from MapR.  Either way, all of the shard replicas
appear under the live SolR.  Obviously, if you copy from HDFS, you have
some issues with making sure that things appear correctly.  One way to deal
with this is to double the copy steps by copying from HDFS and then using a
differential copy to leave files as unchanged as possible. You want to do
that to allow as much of the memory image to stay live as possible.  With
NFS and transactional updates, that isn't an issue, of course.

On the extreme side, you can host all of the searchers on NFS-hosted index
shards, with one Solr instance devoted to indexing each shard.  The shards
will update, and each Solr searcher will detect the changes after a few
seconds.  This gives near-real-time search with high isolation between
indexing and search loads.


On Wed, Oct 10, 2012 at 10:38 PM, JAY <ja...@gmail.com> wrote:

> How can you store Solr shards in hadoop? Is each data node running a Solr
> server?  If so - is the reducer doing a trick to write to a local fs?
>
> Sent from my iPad
>
> On Oct 11, 2012, at 12:04 AM, "M. C. Srivas" <mc...@gmail.com> wrote:
>
> Interestingly, a few MapR customers have gone the other way, deliberately
> having the indexer put  the Solr shards directly into MapR and letting it
> distribute it. Has made index-management a cinch.
>
> Otherwise they do run into what Tim alludes to.
>
> On Wed, Oct 10, 2012 at 7:27 PM, Tim Williams <williamstw@gmail.com> wrote:
>
>> On Wed, Oct 10, 2012 at 10:15 PM, Lance Norskog <goksron@gmail.com> wrote:
>> > In the LucidWorks Big Data product, we handle this with a reducer that
>> sends documents to a SolrCloud cluster. This way the index files are not
>> managed by Hadoop.
>>
>> Hi Lance,
>> I'm curious if you've gotten that to work with a decent-sized (e.g. >
>> 250 node) cluster?  Even a trivial cluster seems to crush SolrCloud
>> from a few months ago at least...
>>
>> Thanks,
>> --tim
>>
>> > ----- Original Message -----
>> > | From: "Ted Dunning" <td...@maprtech.com>
>> > | To: user@hadoop.apache.org
>> > | Cc: "Hadoop User" <us...@hadoop.apache.org>
>> > | Sent: Wednesday, October 10, 2012 7:58:57 AM
>> > | Subject: Re: Hadoop/Lucene + Solr architecture suggestions?
>> > |
>> > | I prefer to create indexes in the reducer personally.
>> > |
>> > | Also you can avoid the copies if you use an advanced hadoop-derived
>> > | distro. Email me off list for details.
>> > |
>> > | Sent from my iPhone
>> > |
>> > | On Oct 9, 2012, at 7:47 PM, Mark Kerzner <mark.kerzner@shmsoft.com>
>> > | wrote:
>> > |
>> > | > Hi,
>> > | >
>> > | > if I create a Lucene index in each mapper, locally, then copy them
>> > | > to under /jobid/mapid1, /jodid/mapid2, and then in the reducers
>> > | > copy them to some Solr machine (perhaps even merging), does such
>> > | > architecture makes sense, to create a searchable index with
>> > | > Hadoop?
>> > | >
>> > | > Are there links for similar architectures and questions?
>> > | >
>> > | > Thank you. Sincerely,
>> > | > Mark
>> > |
>>
>
>

Re: Hadoop/Lucene + Solr architecture suggestions?

Posted by JAY <ja...@gmail.com>.
How can you store Solr shards in Hadoop? Is each data node running a Solr server? If so, is the reducer doing a trick to write to a local fs?

Sent from my iPad

On Oct 11, 2012, at 12:04 AM, "M. C. Srivas" <mc...@gmail.com> wrote:

> Interestingly, a few MapR customers have gone the other way, deliberately having the indexer put  the Solr shards directly into MapR and letting it distribute it. Has made index-management a cinch.
> 
> Otherwise they do run into what Tim alludes to.
> 
> On Wed, Oct 10, 2012 at 7:27 PM, Tim Williams <wi...@gmail.com> wrote:
> On Wed, Oct 10, 2012 at 10:15 PM, Lance Norskog <go...@gmail.com> wrote:
> > In the LucidWorks Big Data product, we handle this with a reducer that sends documents to a SolrCloud cluster. This way the index files are not managed by Hadoop.
> 
> Hi Lance,
> I'm curious if you've gotten that to work with a decent-sized (e.g. >
> 250 node) cluster?  Even a trivial cluster seems to crush SolrCloud
> from a few months ago at least...
> 
> Thanks,
> --tim
> 
> > ----- Original Message -----
> > | From: "Ted Dunning" <td...@maprtech.com>
> > | To: user@hadoop.apache.org
> > | Cc: "Hadoop User" <us...@hadoop.apache.org>
> > | Sent: Wednesday, October 10, 2012 7:58:57 AM
> > | Subject: Re: Hadoop/Lucene + Solr architecture suggestions?
> > |
> > | I prefer to create indexes in the reducer personally.
> > |
> > | Also you can avoid the copies if you use an advanced hadoop-derived
> > | distro. Email me off list for details.
> > |
> > | Sent from my iPhone
> > |
> > | On Oct 9, 2012, at 7:47 PM, Mark Kerzner <ma...@shmsoft.com>
> > | wrote:
> > |
> > | > Hi,
> > | >
> > | > if I create a Lucene index in each mapper, locally, then copy them
> > | > to under /jobid/mapid1, /jodid/mapid2, and then in the reducers
> > | > copy them to some Solr machine (perhaps even merging), does such
> > | > architecture makes sense, to create a searchable index with
> > | > Hadoop?
> > | >
> > | > Are there links for similar architectures and questions?
> > | >
> > | > Thank you. Sincerely,
> > | > Mark
> > |
> 

Re: Hadoop/Lucene + Solr architecture suggestions?

Posted by "M. C. Srivas" <mc...@gmail.com>.
Interestingly, a few MapR customers have gone the other way, deliberately
having the indexer put the Solr shards directly into MapR and letting it
distribute them. It has made index management a cinch.

Otherwise they do run into what Tim alludes to.

On Wed, Oct 10, 2012 at 7:27 PM, Tim Williams <wi...@gmail.com> wrote:

> On Wed, Oct 10, 2012 at 10:15 PM, Lance Norskog <go...@gmail.com> wrote:
> > In the LucidWorks Big Data product, we handle this with a reducer that
> sends documents to a SolrCloud cluster. This way the index files are not
> managed by Hadoop.
>
> Hi Lance,
> I'm curious if you've gotten that to work with a decent-sized (e.g. >
> 250 node) cluster?  Even a trivial cluster seems to crush SolrCloud
> from a few months ago at least...
>
> Thanks,
> --tim
>
> > ----- Original Message -----
> > | From: "Ted Dunning" <td...@maprtech.com>
> > | To: user@hadoop.apache.org
> > | Cc: "Hadoop User" <us...@hadoop.apache.org>
> > | Sent: Wednesday, October 10, 2012 7:58:57 AM
> > | Subject: Re: Hadoop/Lucene + Solr architecture suggestions?
> > |
> > | I prefer to create indexes in the reducer personally.
> > |
> > | Also you can avoid the copies if you use an advanced hadoop-derived
> > | distro. Email me off list for details.
> > |
> > | Sent from my iPhone
> > |
> > | On Oct 9, 2012, at 7:47 PM, Mark Kerzner <ma...@shmsoft.com>
> > | wrote:
> > |
> > | > Hi,
> > | >
> > | > if I create a Lucene index in each mapper, locally, then copy them
> > | > to under /jobid/mapid1, /jodid/mapid2, and then in the reducers
> > | > copy them to some Solr machine (perhaps even merging), does such
> > | > architecture makes sense, to create a searchable index with
> > | > Hadoop?
> > | >
> > | > Are there links for similar architectures and questions?
> > | >
> > | > Thank you. Sincerely,
> > | > Mark
> > |
>

Re: Hadoop/Lucene + Solr architecture suggestions?

Posted by Tim Williams <wi...@gmail.com>.
On Wed, Oct 10, 2012 at 10:15 PM, Lance Norskog <go...@gmail.com> wrote:
> In the LucidWorks Big Data product, we handle this with a reducer that sends documents to a SolrCloud cluster. This way the index files are not managed by Hadoop.

Hi Lance,
I'm curious if you've gotten that to work with a decent-sized (e.g. >
250 node) cluster?  Even a trivial cluster seems to crush SolrCloud,
from a few months ago at least...

Thanks,
--tim

> ----- Original Message -----
> | From: "Ted Dunning" <td...@maprtech.com>
> | To: user@hadoop.apache.org
> | Cc: "Hadoop User" <us...@hadoop.apache.org>
> | Sent: Wednesday, October 10, 2012 7:58:57 AM
> | Subject: Re: Hadoop/Lucene + Solr architecture suggestions?
> |
> | I prefer to create indexes in the reducer personally.
> |
> | Also you can avoid the copies if you use an advanced hadoop-derived
> | distro. Email me off list for details.
> |
> | Sent from my iPhone
> |
> | On Oct 9, 2012, at 7:47 PM, Mark Kerzner <ma...@shmsoft.com>
> | wrote:
> |
> | > Hi,
> | >
> | > if I create a Lucene index in each mapper, locally, then copy them
> | > to under /jobid/mapid1, /jodid/mapid2, and then in the reducers
> | > copy them to some Solr machine (perhaps even merging), does such
> | > architecture makes sense, to create a searchable index with
> | > Hadoop?
> | >
> | > Are there links for similar architectures and questions?
> | >
> | > Thank you. Sincerely,
> | > Mark
> |

Re: Hadoop/Lucene + Solr architecture suggestions?

Posted by Lance Norskog <go...@gmail.com>.
In the LucidWorks Big Data product, we handle this with a reducer that sends documents to a SolrCloud cluster. This way the index files are not managed by Hadoop.

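In outline, such a reducer might look like the following SolrJ sketch (the
solr.update.url configuration key and the field names are invented; against
a real SolrCloud cluster you would point this at a live node or use the
cloud-aware client):

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Documents are pushed to Solr over HTTP; no index files ever touch
// the Hadoop side.
public class SolrPushReducer extends Reducer<Text, Text, Text, Text> {

  private HttpSolrServer solr;

  @Override
  protected void setup(Context context) {
    // "solr.update.url" is an invented job-configuration key.
    solr = new HttpSolrServer(context.getConfiguration().get("solr.update.url"));
  }

  @Override
  protected void reduce(Text docId, Iterable<Text> bodies, Context context)
      throws IOException {
    try {
      for (Text body : bodies) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", docId.toString());
        doc.addField("body", body.toString());
        solr.add(doc);       // sent over the wire, replicated by SolrCloud
      }
    } catch (SolrServerException e) {
      throw new IOException(e);
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    try {
      solr.commit();         // make the batch visible to searchers
    } catch (SolrServerException e) {
      throw new IOException(e);
    }
  }
}
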
----- Original Message -----
| From: "Ted Dunning" <td...@maprtech.com>
| To: user@hadoop.apache.org
| Cc: "Hadoop User" <us...@hadoop.apache.org>
| Sent: Wednesday, October 10, 2012 7:58:57 AM
| Subject: Re: Hadoop/Lucene + Solr architecture suggestions?
| 
| I prefer to create indexes in the reducer personally.
| 
| Also you can avoid the copies if you use an advanced hadoop-derived
| distro. Email me off list for details.
| 
| Sent from my iPhone
| 
| On Oct 9, 2012, at 7:47 PM, Mark Kerzner <ma...@shmsoft.com>
| wrote:
| 
| > Hi,
| > 
| > if I create a Lucene index in each mapper, locally, then copy them
| > to under /jobid/mapid1, /jodid/mapid2, and then in the reducers
| > copy them to some Solr machine (perhaps even merging), does such
| > architecture makes sense, to create a searchable index with
| > Hadoop?
| > 
| > Are there links for similar architectures and questions?
| > 
| > Thank you. Sincerely,
| > Mark
| 

Re: Hadoop/Lucene + Solr architecture suggestions?

Posted by Ted Dunning <td...@maprtech.com>.
I prefer to create indexes in the reducer personally. 

Also you can avoid the copies if you use an advanced hadoop-derived distro. Email me off list for details. 

Sent from my iPhone

On Oct 9, 2012, at 7:47 PM, Mark Kerzner <ma...@shmsoft.com> wrote:

> Hi,
> 
> if I create a Lucene index in each mapper, locally, then copy them to under /jobid/mapid1, /jodid/mapid2, and then in the reducers copy them to some Solr machine (perhaps even merging), does such architecture makes sense, to create a searchable index with Hadoop?
> 
> Are there links for similar architectures and questions?
> 
> Thank you. Sincerely,
> Mark
