Posted to users@datasketches.apache.org by Matthew Farkas <mf...@gmail.com> on 2021/07/09 17:02:23 UTC

Postgres Performance Question

Hi,

My name is Matt and I'm a data engineer at Spotify. I'm trying out
DataSketches with Postgres, and running into some performance issues. I'm
seeing merge times much slower than what I'm seeing in the docs here
<https://datasketches.apache.org/docs/Theta/ThetaMergeSpeed.html> (millions
of sketches/sec).

In my case, I've pre-computed many sketches, inserted them into PG, then
I'm running queries in PG and doing the merging there. My hunch is that
there's something wrong with my Postgres configs, which I've tried tweaking
extensively but haven't been able to improve query time.

My question is if anyone knows what type of performance can be expected in
Postgres and if anyone has any examples/tips in general from their
implementations.

Also, this is my first message to this list, so please let me know if I
should be directing it anywhere else!

Thanks!!
Matt



*Matthew Z. Farkas*

Data Science @ Spotify
MS Northwestern University, BS Georgia Tech

m: (770) 337-2709
e: mfarkas27@gmail.com

<https://www.linkedin.com/in/matthewzfarkas>

Re: [E] Postgres Performance Question

Posted by Will Lauer <wl...@verizonmedia.com.INVALID>.
What we've found in our code (I'm primarily using the Java version of Theta
sketches, although I've also been experimenting with the C++ version) is
that the merge time depends heavily on the overall size of the sketch. The
theta sketch works by keeping a sample of the values inserted into it, so
its size varies with the number of values that you have inserted into it.
While there are fewer values than K, the sketch is in exact mode, retaining
ALL the values, but once you exceed K, you have reached the maximum sketch
size and sampling starts, putting the sketch into estimation mode. The data
set I primarily deal with has a power law distribution: the majority of my
sketches are smaller, in exact mode, some with only a couple of unique
values, while a small portion are fully populated and are in estimation
mode. Unioning sketches that are in estimation mode is obviously slower
than unioning those in exact mode, as the exact-mode sketches hold less
data to process.
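
The exact/estimation distinction described above can be sketched with a toy
K-minimum-values structure. This is only an illustration of the idea, not the
DataSketches implementation: the class ToyThetaSketch, its methods, and the
hash mixer are all hypothetical names chosen for this example.

```java
import java.util.TreeSet;

/** Toy K-minimum-values sketch; a simplified stand-in, NOT the DataSketches code. */
class ToyThetaSketch {
    private final int k;                                    // nominal entries (K)
    private final TreeSet<Long> retained = new TreeSet<>(); // smallest hashes seen so far
    private boolean sampling = false;                       // set once more than K distinct hashes arrive

    ToyThetaSketch(int k) {
        this.k = k;
    }

    // Hypothetical 64-bit mixer (SplitMix64 finalizer), masked to a non-negative long.
    static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return (x ^ (x >>> 31)) & Long.MAX_VALUE;
    }

    void update(long value) {
        retained.add(hash(value));   // duplicates hash identically and are ignored by the set
        if (retained.size() > k) {   // over capacity: drop the largest hash and start sampling
            retained.pollLast();
            sampling = true;
        }
    }

    boolean isEstimationMode() {
        return sampling;
    }

    int retainedEntries() {
        return retained.size();
    }

    public static void main(String[] args) {
        ToyThetaSketch small = new ToyThetaSketch(4096);
        for (long id = 0; id < 3; id++) {
            small.update(id);
        }
        ToyThetaSketch large = new ToyThetaSketch(4096);
        for (long id = 0; id < 10_000; id++) {
            large.update(id);
        }
        System.out.println(small.retainedEntries() + " entries, estimation mode: " + small.isEstimationMode());
        System.out.println(large.retainedEntries() + " entries, estimation mode: " + large.isEstimationMode());
    }
}
```

A union has to read every retained entry of every input, so the sketch with a
handful of entries is far cheaper to merge than the one that has filled up to
K. In the Java library itself, isEstimationMode() and getRetainedEntries() on
a sketch should report the corresponding information for real sketches.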

Unioning sketches with a smaller K is faster, because the overall size of
the sketches is smaller. Sketches with logK=10 have a maximum size 4 times
smaller than those with logK=12. If most of your sketches are in estimation
mode, this could mean a 4x difference in performance, but the difference will
be smaller the more sketches you have in exact mode. The difference between
logK=10 and logK=12 should be minimal if your sketches are in exact mode.
If they are in estimation mode, they should give similar (but not exactly
the same) estimates, but with different error bounds.
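
The 4x figure follows from simple arithmetic on the maximum number of retained
entries. The sketch below works it through under one stated assumption: each
retained entry is a single 64-bit hash and header overhead is ignored. The
class and method names are hypothetical and not part of any library.

```java
/** Back-of-the-envelope sizing for theta sketches at different lgK values. */
class SketchSizeByLgK {
    // Maximum number of retained hashes for a nominal size of K = 2^lgK.
    static long maxEntries(int lgK) {
        return 1L << lgK;
    }

    // Assumption: one 64-bit hash per retained entry, header bytes ignored.
    static long approxMaxBytes(int lgK) {
        return maxEntries(lgK) * Long.BYTES;
    }

    // Entries a union has to read from one sketch: a sketch that never filled
    // up holds only its distinct values, a full one holds K.
    static long entriesToRead(long distinctValues, int lgK) {
        return Math.min(distinctValues, maxEntries(lgK));
    }

    public static void main(String[] args) {
        for (int lgK : new int[] {10, 12}) {
            System.out.printf("lgK=%d: max entries=%d, approx max bytes=%d%n",
                    lgK, maxEntries(lgK), approxMaxBytes(lgK));
        }
        // A sketch with few distinct values costs the same at either lgK.
        System.out.println(entriesToRead(50, 10) + " vs " + entriesToRead(50, 12));
        // A sketch with many distinct values costs 4x more at lgK=12.
        System.out.println(entriesToRead(1_000_000, 10) + " vs " + entriesToRead(1_000_000, 12));
    }
}
```

This only models the number of entries read per sketch. Actual union time also
includes deserialization and hash-table work, so the real speedup from a
smaller lgK may differ from the simple ratio.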

I've been doing some experimentation with the C++ version of theta sketches
and another database integration. I'm using logK 14 and 16, and seeing
100k+/sec when doing merge operations, but we definitely haven't finished
optimizing that integration and haven't run it on production quality
hardware yet.

Will

<http://www.verizonmedia.com>

Will Lauer

Senior Principal Architect, Audience & Advertising Reporting
Data Platforms & Systems Engineering

M 508 561 6427
1908 S. First St
Champaign, IL 61822

<http://www.facebook.com/verizonmedia>   <http://twitter.com/verizonmedia>
<https://www.linkedin.com/company/verizon-media/>
<http://www.instagram.com/verizonmedia>



On Fri, Jul 9, 2021 at 12:32 PM Matthew Farkas <mf...@gmail.com> wrote:

> Hi Will,
>
> Thanks for the quick response! For your questions:
>
> 1. Yup, looking at Theta sketches for set operations.
> 2. So I'm creating the initial sketches in Dataflow like so, with K=4096
> (so lgK=12) right now:
>     UpdateSketch userSketch = UpdateSketch.builder().setNominalEntries(K).build();
>     userSketch.update(requestValue.userId());
>     // pass to PG using
>     ByteString.copyFrom(userSketch.compact().toByteArray());
> 3. By "sketch size", do you mean the number of uniques in each sketch? If
> so, there's a good bit of variance in sketch size, as I'm segmenting (by
> dimensions like demo, geo, etc.) users and saving a sketch for each segment.
> 4. I do not know the proportion that are in exact vs. estimation mode.
> (Admittedly, I'm not familiar with the differences there, will check it
> out.) Is this explicitly set? Or is it determined by K and sketch size?
>
> One thing I found interesting was that doing a
> `THETA_SKETCH_UNION(user_id_sketch, 10)` on all sketches vastly improved
> query time (70s to 6s), and produced the exact same results. I expected the
> results to be the same, since lgK=12 when originally creating the sketches,
> but I'm not sure why that would improve query time.
>
> Thanks again!
>
> On Fri, Jul 9, 2021 at 1:13 PM Will Lauer <wl...@verizonmedia.com.invalid>
> wrote:
>
>> Welcome Matt!
>>
>> One of the others is probably best qualified to answer your question, but
>> I'll chime in early with a couple of questions. The performance of merging
>> depends on many factors, including type of sketch and sketch size. I'm
>> assuming from the link you posted that you are dealing with Theta sketches,
>> for count unique operations. Can you confirm that? If so, what's the logK
>> you are using? What is the sketch size? Do you happen to know what
>> proportion of your sketches are in estimation mode vs exact mode?
>>
>> Will

Re: [E] Postgres Performance Question

Posted by Alexander Saydakov <sa...@verizonmedia.com.INVALID>.
Everyone is using all sorts of engines: Druid, Spark, Hive, Pig, Splice
Machine and so on. I would love to know if there is any Greenplum
installation using DataSketches.

On Mon, Jul 12, 2021 at 10:00 AM Matthew Farkas <mf...@gmail.com> wrote:

> Ah, thanks, Alexander.
>
> That makes sense. I started digging into CPU usage, and noticed that
> queries can only use one CPU in my single-host case.
>
> So it sounds like, to use DataSketches at this scale, everyone is currently
> using Druid (if no one is using Greenplum)?
>
> On Mon, Jul 12, 2021 at 12:36 PM Alexander Saydakov
> <sa...@verizonmedia.com.invalid> wrote:
>
>> Matt,
>> I assume you are running a single-host PostgreSQL. If so, your numbers
>> don't look too bad I would say. You may want to consider the distributed
>> variant, which is Greenplum. However I am not aware of any deployment of
>> our extension in such environments.
>>
>> On Fri, Jul 9, 2021 at 2:41 PM Will Lauer <wl...@verizonmedia.com.invalid>
>> wrote:
>>
>>> Matt,
>>>
>>> In my production case, I'm building sketches using Java in an ETL
>>> pipeline and then loading them into a Druid datamart, which aggregates them
>>> together when it receives queries. Queries might aggregate anywhere from
>>> several hundred sketches to many millions (the average number is probably in
>>> the hundreds of thousands), depending on the time frame involved in the query
>>> and the particular dimensions selected. The majority of our queries (95%+)
>>> return in less than 10 seconds. This is running on a cluster with between
>>> 150 and 200 nodes.
>>>
>>> We are investigating implementing this in an alternative database, but
>>> haven't gotten that database working in a performant way yet (due to some
>>> problems with the database's API, not due to sketches). We are working
>>> with the vendor to find some workarounds.
>>>
>>> Will
>>>
>>> On Fri, Jul 9, 2021 at 2:57 PM Matthew Farkas <mf...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm running PG 13.3 and pg-datasketches 1.3.0 (I built from master
>>>> after running into this issue
>>>> <https://github.com/apache/datasketches-postgresql/issues/34>).
>>>>
>>>> So some rough numbers: I have a week-hour table with 168 user_id
>>>> sketches, all of which would be estimates rather than exact, and unioning
>>>> those 168 sketches takes 21ms.
>>>> - 13k sketches take 1-2s
>>>> - 13m sketches were taking ~2min yesterday (I must have updated a config
>>>> that hurt this, though; I'm cancelling the query after 9mins now)
>>>>
>>>> Will-
>>>> Thanks for the background. So you're combining the sketches in Java-
>>>> are you retrieving them from a db? Also, how many sketches are you
>>>> typically merging?
>>>>
>>>> On Fri, Jul 9, 2021 at 1:53 PM Alexander Saydakov
>>>> <sa...@verizonmedia.com.invalid> wrote:
>>>>
>>>>> Hi Matt,
>>>>> What version of PostgreSQL and DataSketches are you using?
>>>>> Could you give some numbers? How many sketches? How long does the
>>>>> union take?
>>>>>
>>>>> The graph you are referring to was based on performance in Druid I
>>>>> believe. So it may or may not be transferable to PostgreSQL. We did not do
>>>>> a large-scale test in PostgreSQL.
>>>>>
>>>>> Also we have a performance improvement in the works, which is supposed
>>>>> to avoid some cost of deserialization of Theta sketches. It might speed
>>>>> things up 10-15% according to some preliminary testing.
>>>>>

Re: [E] Postgres Performance Question

Posted by Matthew Farkas <mf...@gmail.com>.
Ah, thanks, Alexander.

That makes sense. I started digging into CPU usage, and noticed that
queries can only use one CPU in my single-host case.

So it sounds like, to use DataSketches at this scale, everyone is currently
using Druid (if no one is using Greenplum)?


Re: [E] Postgres Performance Question

Posted by Alexander Saydakov <sa...@verizonmedia.com.INVALID>.
Matt,
I assume you are running a single-host PostgreSQL. If so, your numbers
don't look too bad, I would say. You may want to consider the distributed
variant, Greenplum. However, I am not aware of any deployment of our
extension in such an environment.


Re: [E] Postgres Performance Question

Posted by Will Lauer <wl...@verizonmedia.com.INVALID>.
Matt,

In my production case, I'm building sketches using Java in an ETL pipeline
and then loading them into a Druid datamart, which aggregates them together
when it receives queries. Queries might aggregate anywhere from several
hundred sketches to many millions (the average is probably in the hundreds
of thousands), depending on the time frame involved in the query and the
particular dimensions selected. The majority of our queries (95%+) return
in less than 10 seconds. This is running on a cluster with between 150 and
200 nodes.

We are investigating implementing this in an alternative database but
haven't gotten that database working in a performant way yet (due to some
problems with the database's API, not due to sketches). We are working
with the vendor to find some workarounds.

Will

<http://www.verizonmedia.com>

Will Lauer

Senior Principal Architect, Audience & Advertising Reporting
Data Platforms & Systems Engineering

M 508 561 6427
1908 S. First St
Champaign, IL 61822

<http://www.facebook.com/verizonmedia>   <http://twitter.com/verizonmedia>
<https://www.linkedin.com/company/verizon-media/>
<http://www.instagram.com/verizonmedia>




Re: [E] Postgres Performance Question

Posted by Matthew Farkas <mf...@gmail.com>.
Hi,

I'm running PG 13.3 and pg-datasketches 1.3.0 (I built from master after
running into this issue
<https://github.com/apache/datasketches-postgresql/issues/34>).

Some rough numbers: I have a week-hour table with 168 user_id sketches,
all of which would be estimates rather than exact, and unioning those 168
sketches takes 21ms.
- 13k sketches takes 1-2s
- 13m sketches was taking ~2min yesterday (I must have changed a config
that hurt this, though; I'm now cancelling the query after 9min)
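
To put the rough numbers above on the same scale as the "sketches/sec"
figure in the docs, they can be converted to merge rates. This is only a
back-of-the-envelope sketch: the class and method names are hypothetical,
"13k" and "13m" are read as 13 thousand and 13 million, and 1.5s is assumed
as the midpoint of the 1-2s range.

```java
// Hypothetical helper for illustration; not part of DataSketches or PostgreSQL.
class MergeThroughput {

    /** Returns sketches merged per second for a query over sketchCount sketches. */
    static double sketchesPerSecond(long sketchCount, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        return sketchCount / elapsedSeconds;
    }

    public static void main(String[] args) {
        // 168 sketches unioned in 21ms
        System.out.printf("168 sketches: %.0f/s%n", sketchesPerSecond(168, 0.021));
        // 13k sketches in 1-2s (the 1.5s midpoint is an assumption)
        System.out.printf("13k sketches: %.0f/s%n", sketchesPerSecond(13_000, 1.5));
        // 13m sketches in ~2min (read as 13 million; an assumption)
        System.out.printf("13m sketches: %.0f/s%n", sketchesPerSecond(13_000_000, 120.0));
    }
}
```

Under those assumptions this works out to roughly 8,000, 8,700 and 108,000
sketches per second respectively.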

Will-
Thanks for the background. So you're combining the sketches in Java; are
you retrieving them from a db? Also, how many sketches are you typically
merging?



*Matthew Z. Farkas*

Data Science @ Spotify
MS Northwestern University, BS Georgia Tech

m: (770) 337-2709
e: mfarkas27@gmail.com

<https://www.linkedin.com/in/matthewzfarkas>



Re: [E] Postgres Performance Question

Posted by Alexander Saydakov <sa...@verizonmedia.com.INVALID>.
Hi Matt,
What version of PostgreSQL and DataSketches are you using?
Could you give some numbers? How many sketches? How long does the union
take?

The graph you are referring to was based on performance in Druid, I believe,
so it may or may not be transferable to PostgreSQL. We did not do a
large-scale test in PostgreSQL.

Also, we have a performance improvement in the works that is supposed to
avoid some of the cost of deserializing Theta sketches. It might speed things
up by 10-15%, according to some preliminary testing.



On Fri, Jul 9, 2021 at 10:32 AM Matthew Farkas <mf...@gmail.com> wrote:

> Hi Will,
>
> Thanks for the quick response! For your questions:
>
> 1. Yup, looking at Theta sketches for set operations.
> 2. So I'm creating the initial sketches in dataflow like so, with K=4096
> (so lgK=12) right now:
>     // K is set on the builder; build() itself takes no size argument
>     UpdateSketch userSketch = UpdateSketch.builder().setNominalEntries(K).build();
>     userSketch.update(requestValue.userId());
>     // pass to PG using
>     ByteString.copyFrom(userSketch.compact().toByteArray());
> 3. By "sketch size", do you mean the number of uniques in each sketch? If
> so, there's a good bit of variance in sketch size, as I'm segmenting (by
> dimensions like demo, geo, etc.) users and saving a sketch for each segment.
> 4. I do not know the proportion that are in direct vs. estimation.
> (Admittedly, I'm not familiar with the differences there, will check it
> out.) Is this explicitly set? Or maybe determined based on K & sketch size.
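
On item 4, the mode is not set explicitly: as described elsewhere in this
thread, a Theta sketch stays in exact mode while it holds fewer than K
values and moves to estimation mode once K is exceeded. The toy class below
is a simplified illustration of that transition, not the DataSketches
implementation; `ToyThetaSketch`, its methods, and the hash function are all
hypothetical stand-ins chosen to keep the example self-contained.

```java
import java.util.TreeSet;

// Toy illustration only; NOT the DataSketches implementation.
class ToyThetaSketch {
    private final int k;                 // nominal entries (K)
    // Retained hash values, ordered as unsigned 64-bit integers.
    private final TreeSet<Long> hashes = new TreeSet<Long>(Long::compareUnsigned);
    private boolean estimating = false;  // false = exact mode
    private long theta = -1L;            // threshold; only meaningful once estimating

    ToyThetaSketch(int k) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        this.k = k;
    }

    void update(long item) {
        long h = mix(item);
        // In estimation mode, ignore hashes at or above the threshold.
        if (estimating && Long.compareUnsigned(h, theta) >= 0) return;
        hashes.add(h);
        if (hashes.size() > k) {
            // More than K distinct hashes: drop the largest, use it as the new threshold.
            theta = hashes.pollLast();
            estimating = true;
        }
    }

    boolean isEstimationMode() { return estimating; }

    int retained() { return hashes.size(); }

    double estimate() {
        if (!estimating) return hashes.size();  // exact mode: every value was kept
        // Map the top 53 bits of theta to a fraction in [0, 1).
        double thetaFraction = (theta >>> 11) / (double) (1L << 53);
        return hashes.size() / thetaFraction;
    }

    // SplitMix64 finalizer, used here as a stand-in hash (a bijection on 64-bit values).
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    public static void main(String[] args) {
        ToyThetaSketch small = new ToyThetaSketch(4096);
        for (long id = 0; id < 100; id++) small.update(id);
        System.out.println("estimation mode: " + small.isEstimationMode()); // false

        ToyThetaSketch large = new ToyThetaSketch(4096);
        for (long id = 0; id < 100_000; id++) large.update(id);
        System.out.println("estimation mode: " + large.isEstimationMode()); // true
        System.out.println("estimate: " + large.estimate());
    }
}
```

In this toy version a segment with 100 users stays exact at K=4096, while a
segment with 100,000 users ends up retaining exactly K hashes and reporting
an estimate.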
>
> One thing I found interesting was that doing a
> `THETA_SKETCH_UNION(user_id_sketch, 10)` on all sketches vastly improved
> query time (70s to 6s), and produced the exact same results. I expected the
> results to be the same, since lgK=12 when originally creating the sketches,
> but I'm not sure why that would improve query time.
>
> Thanks again!
>
>
> On Fri, Jul 9, 2021 at 1:13 PM Will Lauer <wl...@verizonmedia.com.invalid>
> wrote:
>
>> Welcome Matt!
>>
>> One of the others is probably best qualified to answer your question, but
>> I'll chime in early with a couple of questions. The performance of merging
>> depends on many factors, including type of sketch and sketch size. I'm
>> assuming from the link you posted that you are dealing with Theta sketches,
>> for count unique operations. Can you confirm that? If so, what's the logK
>> you are using? What is the sketch size? Do you happen to know what
>> proportion of your sketches are in estimation mode vs exact mode?
>>
>> Will
>>>
>>

Re: [E] Postgres Performance Question

Posted by Matthew Farkas <mf...@gmail.com>.
Hi Will,

Thanks for the quick response! For your questions:

1. Yup, I'm looking at Theta sketches for set operations.
2. I'm creating the initial sketches in Dataflow like so, with K=4096
(so lgK=12) right now:
    UpdateSketch userSketch = UpdateSketch.builder().build(K);
    userSketch.update(requestValue.userId());
    // pass to PG using
    ByteString.copyFrom(userSketch.compact().toByteArray());
3. By "sketch size", do you mean the number of uniques in each sketch? If
so, there's a good bit of variance in sketch size, as I'm segmenting users
(by dimensions like demo, geo, etc.) and saving a sketch for each segment.
4. I do not know the proportion that are in exact vs. estimation mode.
(Admittedly, I'm not familiar with the differences there; I'll check it
out.) Is this explicitly set, or is it determined by K and sketch size?
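
On the exact vs. estimation question, a toy illustration may help. It shows
how the mode can follow from K and the number of uniques rather than being a
setting: while fewer than K values have been seen, everything is retained
(exact mode); once K is exceeded, sampling starts (estimation mode). This is
not the DataSketches implementation. `ToyThetaSketch`, its stand-in hash
function, and all of its method names are hypothetical and only mimic the
keep-the-K-smallest-hashes idea.

```java
import java.util.TreeSet;

/** Toy KMV-style sketch for illustration only; NOT the DataSketches code. */
final class ToyThetaSketch {
    private final int k;                                  // nominal entries (K)
    private final TreeSet<Long> hashes = new TreeSet<>(); // retained hash values
    private long theta = Long.MAX_VALUE;                  // threshold; MAX_VALUE means "keep everything"

    ToyThetaSketch(int k) { this.k = k; }

    // SplitMix64-style mixer, used here only as a stand-in hash function.
    private static long hash(long x) {
        long z = x + 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return (z ^ (z >>> 31)) >>> 1;                    // non-negative 63-bit value
    }

    void update(long userId) {
        long h = hash(userId);
        if (h >= theta) return;                           // above the threshold: not sampled
        hashes.add(h);                                    // the set ignores duplicates
        if (hashes.size() > k) {
            theta = hashes.pollLast();                    // evict the largest; it becomes the new threshold
        }
    }

    boolean isEstimationMode() { return theta < Long.MAX_VALUE; }

    int retainedEntries() { return hashes.size(); }

    double estimate() {
        if (!isEstimationMode()) return hashes.size();    // exact count
        double fraction = theta / (double) Long.MAX_VALUE; // sampled share of the hash space
        return hashes.size() / fraction;
    }

    public static void main(String[] args) {
        ToyThetaSketch small = new ToyThetaSketch(4096);
        for (long id = 0; id < 1_000; id++) small.update(id);
        ToyThetaSketch large = new ToyThetaSketch(4096);
        for (long id = 0; id < 100_000; id++) large.update(id);
        System.out.println("small: estimation=" + small.isEstimationMode()
                + " retained=" + small.retainedEntries());
        System.out.println("large: estimation=" + large.isEstimationMode()
                + " retained=" + large.retainedEntries() + " estimate=" + large.estimate());
    }
}
```

In this toy, a segment with 1,000 users and K=4096 stays in exact mode, while
a segment with 100,000 users is capped at K retained entries and reports an
estimate.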

One thing I found interesting was that doing a
`THETA_SKETCH_UNION(user_id_sketch, 10)` on all sketches vastly improved
query time (from 70s to 6s) and produced exactly the same results. I expected
the results to be the same, since lgK=12 when originally creating the sketches,
but I'm not sure why that would improve query time.
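
For reference, the arithmetic behind the two lgK values can be sketched as
follows. lgK is the base-2 logarithm of the nominal entries K, so lgK=12
corresponds to K=4096 and lgK=10 to K=1024. The error figure uses the common
rule of thumb that a sketch in estimation mode has a relative standard error
of roughly 1/sqrt(K); treat it as an approximation, not a guarantee. The class
`LgKMath` and its methods are hypothetical names, not part of any library.

```java
/** Hypothetical helper for the lgK arithmetic; not part of DataSketches. */
final class LgKMath {
    // K = 2^lgK
    static int nominalEntries(int lgK) {
        return 1 << lgK;
    }

    // Rule-of-thumb relative standard error in estimation mode: about 1/sqrt(K).
    static double approxRelativeStandardError(int lgK) {
        return 1.0 / Math.sqrt(nominalEntries(lgK));
    }

    public static void main(String[] args) {
        for (int lgK : new int[] {10, 12}) {
            System.out.println("lgK=" + lgK
                    + " K=" + nominalEntries(lgK)
                    + " approxRSE=" + approxRelativeStandardError(lgK));
        }
    }
}
```

A union at lgK=10 can therefore retain at most a quarter as many entries as
one at lgK=12, at roughly twice the relative error once it is in estimation
mode.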

Thanks again!






Re: [E] Postgres Performance Question

Posted by Will Lauer <wl...@verizonmedia.com.INVALID>.
Welcome Matt!

One of the others is probably better qualified to answer your question, but
I'll chime in early with a couple of questions. The performance of merging
depends on many factors, including the type of sketch and the sketch size. I'm
assuming from the link you posted that you are dealing with Theta sketches,
for count-unique operations. Can you confirm that? If so, what's the lgK
you are using? What is the sketch size? Do you happen to know what
proportion of your sketches are in estimation mode vs. exact mode?

Will

<http://www.verizonmedia.com>

Will Lauer

Senior Principal Architect, Audience & Advertising Reporting
Data Platforms & Systems Engineering

M 508 561 6427
1908 S. First St
Champaign, IL 61822



