Posted to solr-user@lucene.apache.org by Estrada Groups <es...@gmail.com> on 2011/04/12 06:55:23 UTC

Indexing Flickr and Panaramio

Has anyone tried doing this? Got any tips for someone getting started?

Thanks,
Adam

Sent from my iPhone

Re: Vetting Our Architecture: 2 Repeaters and Slaves.

Posted by Lance Norskog <go...@gmail.com>.
SAN vendors make high-priced, super-fast shared-filesystem hardware.
They don't use NFS; usually they have a kernel drop-in file system.


On 4/14/11, Parker Johnson <Pa...@gap.com> wrote:
>
> Otis and Erick,
>
> Thanks for the responses and for thinking over my potential scenarios.
>
> The big draw for me on the 2-repeaters idea is that I can:
>
> 1. Maximize my hardware.  I don't need a standby master.  Instead, I can
> use the "second" repeater to field customer requests.
> 2. After a primary repeater failure, I need neither to fumble with
> multiple solrconfig.xml edits (we're also using cores) nor to worry about
> manually replicating or copying indexes around.
>
> In a sense, although perhaps not by design, a repeater solves those
> problems.
>
> We considered centralized storage and a standby master with access to
> shared filesystem, but what are you using for a shared filesystem? (NFS?
> Egh...)
>
> -Parker
>
> On 4/12/11 6:19 PM, "Erick Erickson" <er...@gmail.com> wrote:
>
>>I think the repeaters are misleading you a bit here. The purpose of a
>>repeater is usually to replicate across a slow network, say to a remote
>>data center, so that slaves at that center can get more timely updates.
>>I don't think they add anything to your disaster recovery scenario.
>>
>>So I'll ignore repeaters for a bit here. The only difference between a
>>master and a slave is a bit of configuration, and usually you'll
>>allocate, say, memory differently on the two machines when you start the
>>JVM. You might disable caches on the master (since they're only used for
>>searching). You may......
>>
>>Let's say I have master M and slaves S1, S2, S3. The slaves have an
>>up-to-date index as of the last replication (just like your repeater
>>would have). If any slave goes down, you can simply bring up another
>>machine as a slave, point it at your master, wait for replication on
>>that slave, and then let your load balancer know it's there. This is the
>>HOST2-4 failure you outlined....
>>
>>Should the master fail, you have two choices, depending upon how long
>>you can wait for *new* content to be searchable. Let's say you can wait
>>half a day in this situation. Spin up a new machine and copy the index
>>over from one of the slaves (via a simple copy or by replicating). Point
>>your indexing process at the new master, point your slaves at it for
>>replication, and you're done.
>>
>>Let's say you can't wait very long at all (and remember, this had better
>>be quite a rare event). Then you could take a slave (let's say S1) out
>>of the loop that serves searches. Copy in the configuration files you
>>use for your masters to it, point the indexer and searchers at it, and
>>you're done. Now spin up a new slave as above and your old configuration
>>is back.
>>
>>Note that in two of these cases, you temporarily have 2 slaves doing the
>>work that 3 used to, so a bit of over-capacity may be in order.
>>
>>But a really good question here is how to be sure all your data is in
>>your index. After all, the slaves (and repeater, for that matter) are
>>only current up to the last replication. The simplest thing to do is
>>simply re-index everything from the last known commit point. Assuming
>>you have a <uniqueKey> defined, if you index documents that are already
>>in the index, they'll just be replaced, no harm done.
>>So let's say your replication interval is 10 minutes (picking a number
>>from thin air). When your system is back and you restart your indexer,
>>restart indexing from, say, an hour before the time you noticed your
>>master went down. You can be more deterministic than this by examining
>>the log on the machine you're using to replace the master, noting the
>>last replication time, and subtracting your hour (or whatever) from
>>that.
>>
>>Anyway, hope I haven't confused you unduly! The take-away is that a
>>slave can be made into a master as fast as a repeater can, the
>>replication process is the same, and I just don't see what a repeater
>>buys you in the scenario you described.
>>
>>Best
>>Erick
>>
>>
>>On Tue, Apr 12, 2011 at 6:33 PM, Parker Johnson
>><Pa...@gap.com>wrote:
>>
>>>
>>>
>>> I am hoping to get some feedback on the architecture I've been planning
>>> for a medium to high volume site.  This is my first time working
>>> with Solr, so I want to be sure what I'm planning isn't totally weird,
>>> unsupported, etc.
>>>
>>> We've got a pair of F5 loadbalancers and 4 hosts.  2 of those hosts
>>>will
>>> be repeaters (master+slave), and 2 of those hosts will be pure slaves.
>>>One
>>> of the F5 vips, "Index-vip" will have members HOST1 and HOST2, but HOST2
>>> will be "downed" and not taking traffic from that vip.  The second vip,
>>> "Search-vip" will have 3 members: HOST2, HOST3, and HOST4.  The
>>> "Index-vip" is intended to be used to post and commit index changes.
>>>The
>>> "Search-vip" is intended to be customer facing.
>>>
>>> Here is some ASCII art.  The line with the "X"'s thru it denotes a
>>> "downed" member of a vip, one that isn't taking any traffic.  The "M:"
>>> denotes the value in the solrconfig.xml that the host uses as the
>>>master.
>>>
>>>
>>>              Index-vip         Search-vip
>>>                 / \             /   |   \
>>>                /   X           /    |    \
>>>               /     \         /     |     \
>>>              /       X       /      |      \
>>>             /         \     /       |       \
>>>            /           X   /        |        \
>>>           /             \ /         |         \
>>>         HOST1          HOST2      HOST3      HOST4
>>>       REPEATER        REPEATER    SLAVE      SLAVE
>>>      M:Index-vip    M:Index-vip M:Index-vip  M:Index-vip
>>>
>>>
>>> I've been working through a couple failure scenarios.  Recovering from a
>>> failure of HOST2, HOST3, or HOST4 is pretty straightforward.  Losing
>>> HOST1 is my major concern.  My plan for recovering from a failure of
>>>HOST1
>>> is as follows: Enable HOST2 as a member of the Index-vip, while
>>>disabling
>>> member HOST1.  HOST2 effectively becomes the Master.  HOST2, 3, and 4
>>> continue fielding customer requests and pulling indexes from
>>>"Index-vip."
>>> Since HOST2 is now in charge of crunching indexes and fielding customer
>>> requests, I assume load will increase on that box.
>>>
>>> When we recover HOST1, we will simply make sure it has replicated
>>>against
>>> "Index-vip" and then re-enable HOST1 as a member of the Index-vip and
>>> disable HOST2.
>>>
>>> Hopefully this makes sense.  If all goes correctly, I've managed to keep
>>> all services up and running without losing any index data.
>>>
>>> So, I have a few questions:
>>>
>>> 1. Has anyone else tried this dual repeater approach?
>>> 2. Am I going to have any semaphore/blocking issues if a repeater is
>>> pulling index data from itself?
>>> 3. Is there a better way to do this?
>>>
>>>
>>> Thanks,
>>> Parker
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
>
>


-- 
Lance Norskog
goksron@gmail.com

Master/Dormant Master and NFS. SimpleFSLockFactory?

Posted by Parker Johnson <Pa...@gap.com>.
I am trying to run through a few failure scenarios for a dual-master
approach that uses NFS as a shared storage solution to hold the master's
indexes.  My goal is to be able to bring up a secondary master in case
the primary master fails.  I have several slaves using replication to
pull indexes from the master.

I am NOT trying to do an active/active master.  I will be failing traffic
over from the master to the dormant master using an F5 vip.  But it does
beg the question...has anyone here done an active/active master with
shared storage?

So assuming for a moment I am doing active/dormant:

From a few quick Google searches it looks like I need to configure both
masters to use the SimpleFSLockFactory and to set "unlockOnStartup" to
true in solrconfig.xml.  For those that have done this before, are there
any other settings I should be aware of?  What are the downsides to the
SimpleFSLockFactory?  Are most folks here keeping Solr up and running on
both hosts at the same time, or rather just starting Solr manually on the
dormant host once the primary dies?
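
For reference, in a Solr 1.4/3.x-era solrconfig.xml those two settings
would look roughly like this (a sketch; exact element placement may
differ by version):

  <mainIndex>
    <!-- SimpleFSLockFactory: a plain lock file next to the index -->
    <lockType>simple</lockType>
    <!-- clear a stale lock left behind by a dead primary at startup -->
    <unlockOnStartup>true</unlockOnStartup>
  </mainIndex>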

Thanks,
Parker




Re: Vetting Our Architecture: 2 Repeaters and Slaves.

Posted by Parker Johnson <Pa...@gap.com>.
Otis and Erick,

Thanks for the responses and for thinking over my potential scenarios.

The big draw for me on the 2-repeaters idea is that I can:

1. Maximize my hardware.  I don't need a standby master.  Instead, I can
use the "second" repeater to field customer requests.
2. After a primary repeater failure, I need neither to fumble with
multiple solrconfig.xml edits (we're also using cores) nor to worry about
manually replicating or copying indexes around.

In a sense, although perhaps not by design, a repeater solves those
problems.

We considered centralized storage and a standby master with access to
shared filesystem, but what are you using for a shared filesystem? (NFS?
Egh...)

-Parker

On 4/12/11 6:19 PM, "Erick Erickson" <er...@gmail.com> wrote:

>I think the repeaters are misleading you a bit here. The purpose of a
>repeater is usually to replicate across a slow network, say to a remote
>data center, so that slaves at that center can get more timely updates.
>I don't think they add anything to your disaster recovery scenario.
>
>So I'll ignore repeaters for a bit here. The only difference between a
>master and a slave is a bit of configuration, and usually you'll
>allocate, say, memory differently on the two machines when you start the
>JVM. You might disable caches on the master (since they're only used for
>searching). You may......
>
>Let's say I have master M and slaves S1, S2, S3. The slaves have an
>up-to-date index as of the last replication (just like your repeater
>would have). If any slave goes down, you can simply bring up another
>machine as a slave, point it at your master, wait for replication on
>that slave, and then let your load balancer know it's there. This is the
>HOST2-4 failure you outlined....
>
>Should the master fail, you have two choices, depending upon how long
>you can wait for *new* content to be searchable. Let's say you can wait
>half a day in this situation. Spin up a new machine and copy the index
>over from one of the slaves (via a simple copy or by replicating). Point
>your indexing process at the new master, point your slaves at it for
>replication, and you're done.
>
>Let's say you can't wait very long at all (and remember, this had better
>be quite a rare event). Then you could take a slave (let's say S1) out
>of the loop that serves searches. Copy in the configuration files you
>use for your masters to it, point the indexer and searchers at it, and
>you're done. Now spin up a new slave as above and your old configuration
>is back.
>
>Note that in two of these cases, you temporarily have 2 slaves doing the
>work that 3 used to, so a bit of over-capacity may be in order.
>
>But a really good question here is how to be sure all your data is in
>your index. After all, the slaves (and repeater, for that matter) are
>only current up to the last replication. The simplest thing to do is
>simply re-index everything from the last known commit point. Assuming
>you have a <uniqueKey> defined, if you index documents that are already
>in the index, they'll just be replaced, no harm done.
>So let's say your replication interval is 10 minutes (picking a number
>from thin air). When your system is back and you restart your indexer,
>restart indexing from, say, an hour before the time you noticed your
>master went down. You can be more deterministic than this by examining
>the log on the machine you're using to replace the master, noting the
>last replication time, and subtracting your hour (or whatever) from
>that.
>
>Anyway, hope I haven't confused you unduly! The take-away is that a
>slave can be made into a master as fast as a repeater can, the
>replication process is the same, and I just don't see what a repeater
>buys you in the scenario you described.
>
>Best
>Erick
>
>
>On Tue, Apr 12, 2011 at 6:33 PM, Parker Johnson
><Pa...@gap.com>wrote:
>
>>
>>
>> I am hoping to get some feedback on the architecture I've been planning
>> for a medium to high volume site.  This is my first time working
>> with Solr, so I want to be sure what I'm planning isn't totally weird,
>> unsupported, etc.
>>
>> We've got a pair of F5 loadbalancers and 4 hosts.  2 of those hosts
>>will
>> be repeaters (master+slave), and 2 of those hosts will be pure slaves.
>>One
>> of the F5 vips, "Index-vip" will have members HOST1 and HOST2, but HOST2
>> will be "downed" and not taking traffic from that vip.  The second vip,
>> "Search-vip" will have 3 members: HOST2, HOST3, and HOST4.  The
>> "Index-vip" is intended to be used to post and commit index changes.
>>The
>> "Search-vip" is intended to be customer facing.
>>
>> Here is some ASCII art.  The line with the "X"'s thru it denotes a
>> "downed" member of a vip, one that isn't taking any traffic.  The "M:"
>> denotes the value in the solrconfig.xml that the host uses as the
>>master.
>>
>>
>>              Index-vip         Search-vip
>>                 / \             /   |   \
>>                /   X           /    |    \
>>               /     \         /     |     \
>>              /       X       /      |      \
>>             /         \     /       |       \
>>            /           X   /        |        \
>>           /             \ /         |         \
>>         HOST1          HOST2      HOST3      HOST4
>>       REPEATER        REPEATER    SLAVE      SLAVE
>>      M:Index-vip    M:Index-vip M:Index-vip  M:Index-vip
>>
>>
>> I've been working through a couple failure scenarios.  Recovering from a
>> failure of HOST2, HOST3, or HOST4 is pretty straightforward.  Losing
>> HOST1 is my major concern.  My plan for recovering from a failure of
>>HOST1
>> is as follows: Enable HOST2 as a member of the Index-vip, while
>>disabling
>> member HOST1.  HOST2 effectively becomes the Master.  HOST2, 3, and 4
>> continue fielding customer requests and pulling indexes from
>>"Index-vip."
>> Since HOST2 is now in charge of crunching indexes and fielding customer
>> requests, I assume load will increase on that box.
>>
>> When we recover HOST1, we will simply make sure it has replicated
>>against
>> "Index-vip" and then re-enable HOST1 as a member of the Index-vip and
>> disable HOST2.
>>
>> Hopefully this makes sense.  If all goes correctly, I've managed to keep
>> all services up and running without losing any index data.
>>
>> So, I have a few questions:
>>
>> 1. Has anyone else tried this dual repeater approach?
>> 2. Am I going to have any semaphore/blocking issues if a repeater is
>> pulling index data from itself?
>> 3. Is there a better way to do this?
>>
>>
>> Thanks,
>> Parker
>>
>>
>>
>>
>>
>>
>>



Re: Vetting Our Architecture: 2 Repeaters and Slaves.

Posted by Erick Erickson <er...@gmail.com>.
I think the repeaters are misleading you a bit here. The purpose of a
repeater is usually to replicate across a slow network, say to a remote
data center, so that slaves at that center can get more timely updates.
I don't think they add anything to your disaster recovery scenario.

So I'll ignore repeaters for a bit here. The only difference between a
master and a slave is a bit of configuration, and usually you'll
allocate, say, memory differently on the two machines when you start the
JVM. You might disable caches on the master (since they're only used for
searching). You may......
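
To make that concrete, the master/slave difference in solrconfig.xml is
roughly the following (a sketch of the ReplicationHandler setup; the host
name, poll interval, and confFiles list are placeholders):

  <!-- on the master -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
  </requestHandler>

  <!-- on a slave -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:10:00</str>
    </lst>
  </requestHandler>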

Let's say I have master M and slaves S1, S2, S3. The slaves have an
up-to-date index as of the last replication (just like your repeater
would have). If any slave goes down, you can simply bring up another
machine as a slave, point it at your master, wait for replication on
that slave, and then let your load balancer know it's there. This is the
HOST2-4 failure you outlined....

Should the master fail, you have two choices, depending upon how long
you can wait for *new* content to be searchable. Let's say you can wait
half a day in this situation. Spin up a new machine and copy the index
over from one of the slaves (via a simple copy or by replicating). Point
your indexing process at the new master, point your slaves at it for
replication, and you're done.

Let's say you can't wait very long at all (and remember, this had better
be quite a rare event). Then you could take a slave (let's say S1) out
of the loop that serves searches. Copy in the configuration files you
use for your masters to it, point the indexer and searchers at it, and
you're done. Now spin up a new slave as above and your old configuration
is back.
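
The hand-off itself can be driven over the ReplicationHandler's HTTP API,
along these lines (a sketch; host names are placeholders, and with cores
the path includes the core name):

  # stop S1 polling the dead master
  http://S1:8983/solr/replication?command=disablepoll
  # make a remaining slave pull from the promoted master right away
  http://S2:8983/solr/replication?command=fetchindex&masterUrl=http://S1:8983/solr/replication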

Note that in two of these cases, you temporarily have 2 slaves doing the
work that 3 used to, so a bit of over-capacity may be in order.

But a really good question here is how to be sure all your data is in
your index. After all, the slaves (and repeater, for that matter) are
only current up to the last replication. The simplest thing to do is
simply re-index everything from the last known commit point. Assuming
you have a <uniqueKey> defined, if you index documents that are already
in the index, they'll just be replaced, no harm done.
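
For completeness, that key is a one-liner in schema.xml, e.g. (assuming a
field named "id"):

  <uniqueKey>id</uniqueKey>

Re-adding a document whose key value already exists replaces the old copy
rather than creating a duplicate.
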
So let's say your replication interval is 10 minutes (picking a number
from thin air). When your system is back and you restart your indexer,
restart indexing from, say, an hour before the time you noticed your
master went down. You can be more deterministic than this by examining
the log on the machine you're using to replace the master, noting the
last replication time, and subtracting your hour (or whatever) from
that.

Anyway, hope I haven't confused you unduly! The take-away is that a
slave can be made into a master as fast as a repeater can, the
replication process is the same, and I just don't see what a repeater
buys you in the scenario you described.

Best
Erick


On Tue, Apr 12, 2011 at 6:33 PM, Parker Johnson <Pa...@gap.com>wrote:

>
>
> I am hoping to get some feedback on the architecture I've been planning
> for a medium to high volume site.  This is my first time working
> with Solr, so I want to be sure what I'm planning isn't totally weird,
> unsupported, etc.
>
> We've got a pair of F5 loadbalancers and 4 hosts.  2 of those hosts will
> be repeaters (master+slave), and 2 of those hosts will be pure slaves. One
> of the F5 vips, "Index-vip" will have members HOST1 and HOST2, but HOST2
> will be "downed" and not taking traffic from that vip.  The second vip,
> "Search-vip" will have 3 members: HOST2, HOST3, and HOST4.  The
> "Index-vip" is intended to be used to post and commit index changes.  The
> "Search-vip" is intended to be customer facing.
>
> Here is some ASCII art.  The line with the "X"'s thru it denotes a
> "downed" member of a vip, one that isn't taking any traffic.  The "M:"
> denotes the value in the solrconfig.xml that the host uses as the master.
>
>
>              Index-vip         Search-vip
>                 / \             /   |   \
>                /   X           /    |    \
>               /     \         /     |     \
>              /       X       /      |      \
>             /         \     /       |       \
>            /           X   /        |        \
>           /             \ /         |         \
>         HOST1          HOST2      HOST3      HOST4
>       REPEATER        REPEATER    SLAVE      SLAVE
>      M:Index-vip    M:Index-vip M:Index-vip  M:Index-vip
>
>
> I've been working through a couple failure scenarios.  Recovering from a
> failure of HOST2, HOST3, or HOST4 is pretty straightforward.  Losing
> HOST1 is my major concern.  My plan for recovering from a failure of HOST1
> is as follows: Enable HOST2 as a member of the Index-vip, while disabling
> member HOST1.  HOST2 effectively becomes the Master.  HOST2, 3, and 4
> continue fielding customer requests and pulling indexes from "Index-vip."
> Since HOST2 is now in charge of crunching indexes and fielding customer
> requests, I assume load will increase on that box.
>
> When we recover HOST1, we will simply make sure it has replicated against
> "Index-vip" and then re-enable HOST1 as a member of the Index-vip and
> disable HOST2.
>
> Hopefully this makes sense.  If all goes correctly, I've managed to keep
> all services up and running without losing any index data.
>
> So, I have a few questions:
>
> 1. Has anyone else tried this dual repeater approach?
> 2. Am I going to have any semaphore/blocking issues if a repeater is
> pulling index data from itself?
> 3. Is there a better way to do this?
>
>
> Thanks,
> Parker
>
>
>
>
>
>
>

Re: Vetting Our Architecture: 2 Repeaters and Slaves.

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Parker,

Lovely ASCII art. :)

Yes, I think you can simplify this by introducing shared storage (e.g., a
SAN) that hosts the index to which your active/primary master writes.  When
your primary master dies, you start your stand-by master, which is
configured to point to the same index.  If there are any left-over index
locks from the primary master, they can be removed (there is a property for
that in solrconfig.xml) when Solr starts.  Your Index VIP can then be
pointed to the new master.  Slaves talk to the master via the Index VIP, so
they hardly notice this.  And since the index is on the SAN, your slaves
could actually point to that same index and avoid the whole replication
process, thus removing one more moving piece, plus eliminating OS
cache-unfriendly disk IO caused by index replication as a bonus feature.

Repeaters are handy for DR (replication to a second DC) or when you have so
many slaves that their (very frequent) replication requests and the actual
index replication are too much for a single master, but it doesn't sound
like you need them here unless you really want to have your index, or even
mirror the whole cluster setup, in a second DC.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Parker Johnson <Pa...@gap.com>
> To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
> Sent: Tue, April 12, 2011 6:33:08 PM
> Subject: Vetting Our Architecture: 2 Repeaters and Slaves.
> 
> 
> 
> I am hoping to get some feedback on the architecture I've been planning
> for a medium to high volume site.  This is my first time working
> with Solr, so I want to be sure what I'm planning isn't totally weird,
> unsupported, etc.
> 
> We've got a pair of F5 loadbalancers and 4 hosts.  2 of those hosts will
> be repeaters (master+slave), and 2 of those hosts will be pure slaves. One
> of the F5 vips, "Index-vip" will have members HOST1 and HOST2, but HOST2
> will be "downed" and not taking traffic from that vip.  The second vip,
> "Search-vip" will have 3 members: HOST2, HOST3, and HOST4.  The
> "Index-vip" is intended to be used to post and commit index changes.  The
> "Search-vip" is intended to be customer facing.
> 
> Here is some ASCII art.  The line with the "X"'s thru it denotes a
> "downed" member of a vip, one that isn't taking any traffic.  The "M:"
> denotes the value in the solrconfig.xml that the host uses as the master.
> 
> 
>               Index-vip         Search-vip
>                  / \             /   |   \
>                 /   X           /    |    \
>                /     \         /     |     \
>               /       X       /      |      \
>              /         \     /       |       \
>             /           X   /        |        \
>            /             \ /         |         \
>          HOST1          HOST2      HOST3      HOST4
>        REPEATER        REPEATER    SLAVE      SLAVE
>       M:Index-vip    M:Index-vip M:Index-vip  M:Index-vip
> 
> 
> I've been working through a couple failure scenarios.  Recovering from a
> failure of HOST2, HOST3, or HOST4 is pretty straightforward.  Losing
> HOST1 is my major concern.  My plan for recovering from a failure of HOST1
> is as follows: Enable HOST2 as a member of the Index-vip, while disabling
> member HOST1.  HOST2 effectively becomes the Master.  HOST2, 3, and 4
> continue fielding customer requests and pulling indexes from "Index-vip."
> Since HOST2 is now in charge of crunching indexes and fielding customer
> requests, I assume load will increase on that box.
> 
> When we recover HOST1, we will simply make sure it has replicated against
> "Index-vip" and then re-enable HOST1 as a member of the Index-vip and
> disable HOST2.
> 
> Hopefully this makes sense.  If all goes correctly, I've managed to keep
> all services up and running without losing any index data.
> 
> So, I have a few questions:
> 
> 1. Has anyone else tried this dual repeater approach?
> 2. Am I going to have any semaphore/blocking issues if a repeater is
> pulling index data from itself?
> 3. Is there a better way to do this?
> 
> 
> Thanks,
> Parker
> 
> 
> 
> 
> 
> 
> 

Vetting Our Architecture: 2 Repeaters and Slaves.

Posted by Parker Johnson <Pa...@gap.com>.

I am hoping to get some feedback on the architecture I've been planning
for a medium to high volume site.  This is my first time working
with Solr, so I want to be sure what I'm planning isn't totally weird,
unsupported, etc.

We've got a pair of F5 loadbalancers and 4 hosts.  2 of those hosts will
be repeaters (master+slave), and 2 of those hosts will be pure slaves. One
of the F5 vips, "Index-vip" will have members HOST1 and HOST2, but HOST2
will be "downed" and not taking traffic from that vip.  The second vip,
"Search-vip" will have 3 members: HOST2, HOST3, and HOST4.  The
"Index-vip" is intended to be used to post and commit index changes.  The
"Search-vip" is intended to be customer facing.

Here is some ASCII art.  The line with the "X"'s thru it denotes a
"downed" member of a vip, one that isn't taking any traffic.  The "M:"
denotes the value in the solrconfig.xml that the host uses as the master.


              Index-vip         Search-vip
                 / \             /   |   \
                /   X           /    |    \
               /     \         /     |     \
              /       X       /      |      \
             /         \     /       |       \
            /           X   /        |        \
           /             \ /         |         \
         HOST1          HOST2      HOST3      HOST4
       REPEATER        REPEATER    SLAVE      SLAVE
      M:Index-vip    M:Index-vip M:Index-vip  M:Index-vip
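
Concretely, the "M:Index-vip" value lives in the slave half of each host's
ReplicationHandler config; the repeaters carry both halves, and the pure
slaves only the slave half.  Roughly (a sketch; the port and poll interval
are placeholders):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://Index-vip:8983/solr/replication</str>
      <str name="pollInterval">00:10:00</str>
    </lst>
  </requestHandler>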


I've been working through a couple failure scenarios.  Recovering from a
failure of HOST2, HOST3, or HOST4 is pretty straightforward.  Losing
HOST1 is my major concern.  My plan for recovering from a failure of HOST1
is as follows: Enable HOST2 as a member of the Index-vip, while disabling
member HOST1.  HOST2 effectively becomes the Master.  HOST2, 3, and 4
continue fielding customer requests and pulling indexes from "Index-vip."
Since HOST2 is now in charge of crunching indexes and fielding customer
requests, I assume load will increase on that box.

When we recover HOST1, we will simply make sure it has replicated against
"Index-vip" and then re-enable HOST1 as a member of the Index-vip and
disable HOST2.

Hopefully this makes sense.  If all goes correctly, I've managed to keep
all services up and running without losing any index data.

So, I have a few questions:

1. Has anyone else tried this dual repeater approach?
2. Am I going to have any semaphore/blocking issues if a repeater is
pulling index data from itself?
3. Is there a better way to do this?


Thanks,
Parker







Re: Indexing Flickr and Panaramio

Posted by Estrada Groups <es...@gmail.com>.
Thanks Péter! I am thinking that I may just use Nutch to crawl and index these sites. I need to check out the APIs for each to make sure I'm not missing anything related to the geospatial data for each image. Obviously both extract that data when the images are uploaded, so I'm guessing it's also stored somewhere too ;-)
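
Roughly, with the Nutch 1.x command line, that might look like the following
(a sketch; the seed directory, depth, and Solr URL are made-up values, not a
tested recipe):

  bin/nutch crawl urls -dir crawl -depth 3 -topN 1000
  bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*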

Adam 

Sent from my iPhone

On Apr 12, 2011, at 4:00 PM, Péter Király <ki...@gmail.com> wrote:

> Hi,
> 
> I indexed Flickr into Lucene about 3 years ago. There is a Flickr API,
> which covers almost everything you need (as I remember, not every
> Flickr feature was implemented in the API at that time; e.g., the
> "collection" was not searchable). You can harvest by user ID or by
> searching for a topic. You can use a language library (PHP, Java, etc.)
> to wrap the details of communication. It is possible that you will want
> to merge information into one entity before sending it to Solr (like
> merging the user, collection, and set info into each picture). The
> last step is to transform this information into a Solr document (again,
> either directly or with a language library). I am not sure if this helps
> you, but if you ask a more specific question, I'll try to answer.
> 
> regards,
> Péter
> 
> 2011/4/12 Estrada Groups <es...@gmail.com>:
>> Has anyone tried doing this? Got any tips for someone getting started?
>> 
>> Thanks,
>> Adam
>> 
>> Sent from my iPhone
>> 

Re: Indexing Flickr and Panaramio

Posted by Péter Király <ki...@gmail.com>.
Hi,

I indexed Flickr into Lucene about 3 years ago. There is a Flickr API,
which covers almost everything you need (as I remember, not every
Flickr feature was implemented in the API at that time; e.g., the
"collection" was not searchable). You can harvest by user ID or by
searching for a topic. You can use a language library (PHP, Java, etc.)
to wrap the details of communication. It is possible that you will want
to merge information into one entity before sending it to Solr (like
merging the user, collection, and set info into each picture). The
last step is to transform this information into a Solr document (again,
either directly or with a language library). I am not sure if this helps
you, but if you ask a more specific question, I'll try to answer.
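
For that last step, a minimal sketch with the SolrJ client of that era (the
field names and values are hypothetical, standing in for whatever you merge
out of the Flickr API):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class FlickrToSolr {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      // one merged entity per photo: user + collection + set + photo info
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "flickr-12345");        // the <uniqueKey> field
      doc.addField("title", "Sample photo");
      doc.addField("owner", "someuser");
      doc.addField("tags", "sunset");
      doc.addField("latlon", "37.77,-122.42");   // merged geo info, if any
      server.add(doc);                           // same id replaces the old doc
      server.commit();
    }
  }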

regards,
Péter

2011/4/12 Estrada Groups <es...@gmail.com>:
> Has anyone tried doing this? Got any tips for someone getting started?
>
> Thanks,
> Adam
>
> Sent from my iPhone
>

Re: Indexing Flickr and Panaramio

Posted by Otis Gospodnetic <ot...@yahoo.com>.
It did: http://search-lucene.com/?q=panaramio

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Estrada Groups <es...@gmail.com>
> To: Estrada Groups <es...@gmail.com>
> Cc: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
> Sent: Tue, April 12, 2011 3:14:56 PM
> Subject: Re: Indexing Flickr and Panaramio
> 
> Did this go to the list? I think I may need to resubscribe...
> 
> Sent from my iPhone
> 
> On Apr 12, 2011, at 12:55 AM, Estrada Groups <es...@gmail.com> wrote:
> 
> > Has anyone tried doing this? Got any tips for someone getting started?
> > 
> > Thanks,
> > Adam
> > 
> > Sent from my iPhone
> 

Re: Indexing Flickr and Panaramio

Posted by Estrada Groups <es...@gmail.com>.
Did this go to the list? I think I may need to resubscribe...

Sent from my iPhone

On Apr 12, 2011, at 12:55 AM, Estrada Groups <es...@gmail.com> wrote:

> Has anyone tried doing this? Got any tips for someone getting started?
> 
> Thanks,
> Adam
> 
> Sent from my iPhone