Posted to user@lucenenet.apache.org by Anthony Rodriguez <ar...@spark.net> on 2013/03/19 02:10:39 UTC

lucene.net 3.0.3 indexing spatial too slow

I have recently upgraded my search code from lucene.net 2.9.4 to 3.0.3. I noticed a change in the spatial packages and updated my code accordingly. One drawback of the upgrade is much slower indexing times. Through a process of elimination, I have narrowed the slowness down to the new spatial code that indexes the lat/long coordinates:

public void AddLocation(double lat, double lng)
{
    try
    {
        string latLongKey = lat.ToString() + "," + lng.ToString();
        AbstractField[] shapeFields = null;
        Shape shape = null;
        if (HasSpatialShapes(latLongKey))
        {
            shape = SpatialShapes[latLongKey];
        }
        else
        {
            if (this.Strategy is BBoxStrategy)
            {
                shape = Context.MakeRectangle(DistanceUtils.NormLonDEG(lng), DistanceUtils.NormLonDEG(lng), DistanceUtils.NormLatDEG(lat), DistanceUtils.NormLatDEG(lat));
            }
            else
            {
                shape = Context.MakePoint(DistanceUtils.NormLonDEG(lng), DistanceUtils.NormLatDEG(lat));
            }

            AddSpatialShapes(latLongKey, shape);
        }

        shapeFields = Strategy.CreateIndexableFields(shape);
        // Potentially more than one shape in this field is supported by some
        // strategies; see the javadocs of the SpatialStrategy impl to see.
        foreach (AbstractField f in shapeFields)
        {
            _document.Add(f);
        }
        // Add the lat/long values to the index too.
        _document.Add(GetField("latitude", NumericUtils.DoubleToPrefixCoded(lat), Field.Index.NOT_ANALYZED, Field.Store.YES, 0f, false));
        _document.Add(GetField("longitude", NumericUtils.DoubleToPrefixCoded(lng), Field.Index.NOT_ANALYZED, Field.Store.YES, 0f, false));
    }
    catch (Exception e)
    {
        RollingFileLogger.Instance.LogException(ServiceConstants.SERVICE_INDEXER_CONST, "Document", string.Format("AddLocation({0},{1})", lat.ToString(), lng.ToString()), e, null);
        throw; // rethrow without resetting the stack trace (was: throw e;)
    }
}

With 2.9.4, I was able to index about 300,000 rows of data with lat/lng points in about 11 minutes. With this new spatial package it takes upwards of 5 hours (I've killed the test before it finishes so I don't have an exact timing for it). Here is the spatial context/strategy I am using:


public static SpatialContext SpatialContext
{
    get
    {
        if (null == _spatialContext)
        {
            lock (_lockObject)
            {
                if (null == _spatialContext) _spatialContext = SpatialContext.GEO;
            }
        }
        return _spatialContext;
    }
}

public static SpatialStrategy SpatialStrategy
{
    get
    {
        if (null == _spatialStrategy)
        {
            lock (_lockObject)
            {
                if (null == _spatialStrategy)
                {
                    int maxLength = 9;
                    GeohashPrefixTree geohashPrefixTree = new GeohashPrefixTree(SpatialContext, maxLength);
                    _spatialStrategy = new RecursivePrefixTreeStrategy(geohashPrefixTree, "geoField");
                }
            }
        }
        return _spatialStrategy;
    }
}

Is there something I am doing wrong with my indexing approach? I have cached the shapes that get created from the lat/lng points, since I don't need a new shape for the same coordinates. CreateIndexableFields() appears to be the method taking the most time during indexing. I've tried to cache the fields it generates for reuse, but I can't create a new instance of the TokenStream from the cached field to use in a new Document (in lucene.net 3.0.3 the constructor for TokenStream is protected). I've also lowered the maxLevels value (maxLength in the code above) to 4 in the spatial strategy, but I haven't seen an improvement in indexing times. Any feedback would be greatly appreciated.
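
For reference, the shape caching described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the post and not a fix for the slow CreateIndexableFields() call: CoordinateCache and GetOrAdd are hypothetical names that exist in neither Lucene.Net nor the spatial library, and the shape type is left generic so the sketch compiles on its own. It keys the cache on the coordinate pair itself rather than on a culture-dependent string, and it does a single dictionary lookup per call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (an assumption for illustration; not a Lucene.Net type).
// Caches one value per distinct (lat, lng) pair. Not thread-safe.
public sealed class CoordinateCache<TShape>
{
    // Tuple<double, double> compares by value, so equal coordinates
    // map to the same entry without building a string key.
    private readonly Dictionary<Tuple<double, double>, TShape> _cache =
        new Dictionary<Tuple<double, double>, TShape>();

    // Number of distinct coordinate pairs cached so far.
    public int Count
    {
        get { return _cache.Count; }
    }

    // Returns the cached value for (lat, lng); calls factory only on a miss.
    public TShape GetOrAdd(double lat, double lng, Func<double, double, TShape> factory)
    {
        Tuple<double, double> key = Tuple.Create(lat, lng);
        TShape shape;
        if (!_cache.TryGetValue(key, out shape))
        {
            shape = factory(lat, lng);
            _cache[key] = shape;
        }
        return shape;
    }
}
```

In the AddLocation method above, the factory argument would be whatever builds the shape (the MakePoint or MakeRectangle call), and the cache would need a lock around it if several threads index at once.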

________________________________
Anthony Rodriguez
Senior Software Developer

Spark Networks<http://www.spark.net> | Igniting Relationships(r)
8383 Wilshire Blvd. Suite 800 | Beverly Hills, CA 90211
p. 323 658 3000 ext. 8021 | f. 866 945 5209
________________________________

RE: Unsubscribe

Posted by Prescott Nasser <ge...@hotmail.com>.
I wrote that text and thought it was pretty clear. Does it not actually work?

>> "To subscribe to the mailing lists, send an email to
>> *list*-subscribe@lucenenet.apache.org. To unsubscribe, send an email to
>> *list*-unsubscribe@lucenenet.apache.org."

I assume people know to change *list* to the list they are trying to unsubscribe from, since we have three. Does that not work?

> From: rob.cecil@gmail.com
> Date: Thu, 21 Mar 2013 18:54:39 -0400
> Subject: Re: Unsubscribe
> To: user@lucenenet.apache.org
> 
> Sucks. Who are these Apache guys again? New to the Internets? :)

Re: Unsubscribe

Posted by Rob Cecil <ro...@gmail.com>.
Sucks. Who are these Apache guys again? New to the Internets? :)


On Thu, Mar 21, 2013 at 6:47 PM, Rob Vesse <rv...@dotnetrdf.org> wrote:

> The Lucene.Net website is really kinda poor in many regards
>
> Following the general Apache mailing list instructions from
> http://www.apache.org/foundation/mailinglists.html should work
>
> So for this list send an email to user-unsubscribe@lucenenet.apache.org,
> if you are also on the dev list send one to
> dev-unsubscribe@lucenenet.apache.org
>
> Rob

Re: Unsubscribe

Posted by Rob Vesse <rv...@dotnetrdf.org>.
The Lucene.Net website is really kinda poor in many regards

Following the general Apache mailing list instructions from
http://www.apache.org/foundation/mailinglists.html should work

So for this list send an email to user-unsubscribe@lucenenet.apache.org,
if you are also on the dev list send one to
dev-unsubscribe@lucenenet.apache.org

Rob

On 3/21/13 2:54 PM, "Kyle Jones" <ky...@bucebuce.com> wrote:

>Rob,
>
>Actually, that email address is broken. I'm not sure there is currently a
>documented way to unsubscribe.
>
>- Kyle

Re: Unsubscribe

Posted by Kyle Jones <ky...@bucebuce.com>.
Rob,

Actually, that email address is broken. I'm not sure there is currently a
documented way to unsubscribe.

- Kyle


On Thu, Mar 21, 2013 at 2:21 PM, Rob Cecil <ro...@gmail.com> wrote:

> Bart,
>
> How about reading http://lucenenet.apache.org/community.html ??
>
> "To subscribe to the mailing lists, send an email to *list*-
> subscribe@lucenenet.apache.org. To unsubscribe, send an email to *list*-
> unsubscribe@lucenenet.apache.org."
>
> Did you try that?

Re: Unsubscribe

Posted by Rob Cecil <ro...@gmail.com>.
Bart,

How about reading http://lucenenet.apache.org/community.html ??

"To subscribe to the mailing lists, send an email to
*list*-subscribe@lucenenet.apache.org. To unsubscribe, send an email to
*list*-unsubscribe@lucenenet.apache.org."

Did you try that?


On Thu, Mar 21, 2013 at 3:02 PM, <ba...@gmail.com> wrote:

> Unsubscribe me from everything for the eighth time
>
>
>
> Sent from Windows Mail

Unsubscribe

Posted by ba...@gmail.com.
Unsubscribe me from everything for the eighth time



Sent from Windows Mail


From: Anthony Rodriguez
Sent: March 18, 2013 8:10 PM
To: user@lucenenet.apache.org
Subject: lucene.net 3.0.3 indexing spatial too slow


I have recently upgraded my search code from lucene.net 2.9.4 to 3.0.3. I have noticed a change in the spatial packages and have updated my code accordingly. One drawback from the upgrade that I have noticed is much slower index times. Through process of elimination, I have been able to narrow the slowness down to the new spatial code that indexes the lat/long coordinates:
public void AddLocation (double lat, double lng)
    {
        try
        {
            string latLongKey = lat.ToString() + "," + lng.ToString();
            AbstractField[] shapeFields = null;
            Shape shape = null;
            if (HasSpatialShapes(latLongKey))
            {
                shape = SpatialShapes[latLongKey];
            }
            else
            {
                if (this.Strategy is BBoxStrategy)
                {
                    shape = Context.MakeRectangle(DistanceUtils.NormLonDEG(lng), DistanceUtils.NormLonDEG(lng), DistanceUtils.NormLatDEG(lat), DistanceUtils.NormLatDEG(lat));
                }
                else
                {
                    shape = Context.MakePoint(DistanceUtils.NormLonDEG(lng), DistanceUtils.NormLatDEG(lat));
                }

                AddSpatialShapes(latLongKey, shape);
            }

            shapeFields = Strategy.CreateIndexableFields(shape);
            //Potentially more than one shape in this field is supported by some
            // strategies; see the javadocs of the SpatialStrategy impl to see.
            foreach (AbstractField f in shapeFields)
            {
                _document.Add(f);
            }
            //add lat long values to index too
            _document.Add(GetField("latitude", NumericUtils.DoubleToPrefixCoded(lat), Field.Index.NOT_ANALYZED, Field.Store.YES, 0f, false));
            _document.Add(GetField("longitude", NumericUtils.DoubleToPrefixCoded(lng), Field.Index.NOT_ANALYZED, Field.Store.YES, 0f, false));
        }
        catch (Exception e)
        {
            RollingFileLogger.Instance.LogException(ServiceConstants.SERVICE_INDEXER_CONST, "Document",string.Format("AddLocation({0},{1})", lat.ToString(), lng.ToString()), e, null);
            throw; // rethrow without resetting the original stack trace
        }
    }
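
A side note on the cache key built in AddLocation above: lat.ToString() and lng.ToString() format with the current thread culture, so on a machine whose culture uses a comma as the decimal separator, (1.5, 2) and (1, 5.2) would both produce the key "1,5,2" and share one cached shape. The sketch below shows one way to build an unambiguous key. LatLongKey and ShapeCache are hypothetical helper names invented for this illustration; they are not part of Lucene.Net or Spatial4n, and the cache is generic so the example compiles without either library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper (not a Lucene.Net type): culture-independent cache keys.
public static class LatLongKey
{
    public static string Create(double lat, double lng)
    {
        // "R" round-trips the double, and InvariantCulture always uses '.'
        // as the decimal separator, so ',' is safe as the delimiter.
        return lat.ToString("R", CultureInfo.InvariantCulture)
             + ","
             + lng.ToString("R", CultureInfo.InvariantCulture);
    }
}

// Hypothetical cache, generic so it does not depend on the spatial library.
// Not thread-safe; add locking if several threads index at once.
public sealed class ShapeCache<TShape>
{
    private readonly Dictionary<string, TShape> _shapes =
        new Dictionary<string, TShape>();

    public int Count { get { return _shapes.Count; } }

    // Returns the cached shape for the coordinates, creating it on first use.
    public TShape GetOrAdd(double lat, double lng, Func<double, double, TShape> create)
    {
        string key = LatLongKey.Create(lat, lng);
        TShape shape;
        if (!_shapes.TryGetValue(key, out shape))
        {
            shape = create(lat, lng);
            _shapes.Add(key, shape);
        }
        return shape;
    }
}
```

This only addresses key correctness. It does not change the cost of CreateIndexableFields(), which still runs once per document.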

With 2.9.4, I was able to index about 300,000 rows of data with lat/lng points in about 11 minutes. With this new spatial package it takes upwards of 5 hours (I've killed the test before it finishes so I don't have an exact timing for it). Here is the spatial context/strategy I am using:


public static SpatialContext SpatialContext
{
    get
    {
        if (null == _spatialContext)
        {
            lock (_lockObject)
            {
                if (null == _spatialContext) _spatialContext = SpatialContext.GEO;
            }
        }
        return _spatialContext;
    }
}

public static SpatialStrategy SpatialStrategy
{
    get
    {
        if (null == _spatialStrategy)
        {
            lock (_lockObject)
            {
                if (null == _spatialStrategy)
                {
                    int maxLength = 9;
                    GeohashPrefixTree geohashPrefixTree = new GeohashPrefixTree(SpatialContext, maxLength);
                    _spatialStrategy = new RecursivePrefixTreeStrategy(geohashPrefixTree, "geoField");
                }
            }
        }
        return _spatialStrategy;
    }
}

Is there something I am doing wrong with my indexing approach? I have cached the shapes that get created by the lat/lng points since I don't need a new shape for the same coordinates. It appears to be the CreateIndexableFields() method that is taking the most time during indexing. I've tried to cache the fields generated by this method to reuse but I can't create a new instance of the TokenStream from the cached field to use in a new Document (in lucene.net 3.0.3 the constructor for TokenStream is protected). I've lowered the maxLevels int to 4 in the spatial strategy but I haven't seen an improvement in indexing times. Any feedback would be greatly appreciated.

________________________________
Anthony Rodriguez
Senior Software Developer

Spark Networks<http://www.spark.net> | Igniting Relationships(r)
8383 Wilshire Blvd. Suite 800 | Beverly Hills, CA 90211
p. 323 658 3000 ext. 8021 | f. 866 945 5209
________________________________
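
The message above says CreateIndexableFields() "appears" to take the most time. A minimal sketch of how that could be confirmed with numbers is below, using System.Diagnostics.Stopwatch to accumulate elapsed time per labelled step. StepTimer is a hypothetical helper written for this illustration, not something from Lucene.Net, and it is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: sums wall-clock time per labelled step.
public sealed class StepTimer
{
    private readonly Dictionary<string, TimeSpan> _totals =
        new Dictionary<string, TimeSpan>();

    // Runs the step, adds its elapsed time to the label's total,
    // and returns the step's result unchanged.
    public T Measure<T>(string label, Func<T> step)
    {
        Stopwatch watch = Stopwatch.StartNew();
        try
        {
            return step();
        }
        finally
        {
            watch.Stop();
            TimeSpan soFar;
            _totals.TryGetValue(label, out soFar); // TimeSpan.Zero for a new label
            _totals[label] = soFar + watch.Elapsed;
        }
    }

    // Total time recorded for a label, or TimeSpan.Zero if it was never measured.
    public TimeSpan Total(string label)
    {
        TimeSpan total;
        _totals.TryGetValue(label, out total);
        return total;
    }
}
```

In AddLocation this might wrap the suspect call, for example timer.Measure("CreateIndexableFields", () => Strategy.CreateIndexableFields(shape)), with a second label around the _document.Add loop, so the totals after a few thousand rows show where the time goes.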