Posted to dev@guacamole.apache.org by Nick Couchman <vn...@apache.org> on 2022/07/15 15:23:48 UTC

Spice Protocol Audio Input

Hello, everyone,
I'm plugging along trying to make progress on getting the Spice
protocol implemented in guacd in my not-so-plentiful free time. I'm
currently trying to get audio input working, and I'm running into a
couple of roadblocks - mostly of a conceptual nature.

I'm using the RDP audio input plugin and buffer as a template, which,
of course, only goes so far. However, one thing I've found in common
among several different audio input buffer implementations, including
the Guacamole RDP one, is some sort of parameter that defines the
number of frames expected by the destination (RDP server, Spice
server, etc.). In Guacamole's RDP implementation this value is
provided by the RDP server in the Open PDU, and Guacamole reads it and
uses it both to size the audio input buffer and to know when to flush
data from the buffer.
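To illustrate the pattern I mean, here's a rough sketch of that
frames-per-packet style of buffering - not the actual RDP plugin code,
and the names are made up:

/* Rough sketch only - hypothetical names, not the actual RDP plugin API. */
#include <string.h>

typedef void audio_flush_handler(const char* packet, int length);

typedef struct audio_in_buffer {
    char* data;          /* holds exactly one packet of PCM data        */
    int packet_size;     /* frames_per_packet * channels * bytes/sample */
    int bytes_written;   /* how much of the current packet is filled    */
    audio_flush_handler* flush;
} audio_in_buffer;

/* Append PCM data, flushing to the server each time a packet is full. */
static void audio_in_buffer_write(audio_in_buffer* buf,
        const char* data, int length) {

    while (length > 0) {

        int space = buf->packet_size - buf->bytes_written;
        int chunk = length < space ? length : space;

        memcpy(buf->data + buf->bytes_written, data, chunk);
        buf->bytes_written += chunk;
        data += chunk;
        length -= chunk;

        /* A full packet is ready - hand it off and start over */
        if (buf->bytes_written == buf->packet_size) {
            buf->flush(buf->data, buf->packet_size);
            buf->bytes_written = 0;
        }

    }

}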

The issue I'm running into - probably mostly because I've never dealt
with processing audio before - is that I cannot find a "frames" analog
in the Spice API. The "record-start" signal that the API uses to
trigger the Spice client to start recording sends the format (16-bit),
number of channels, and audio rate, and that's it - nothing about
frames per packet or anything like that. There also do not seem to be
any properties available anywhere else in the API that would dictate
this, nor can I find anything in the Spice protocol documentation that
indicates a standard or fixed value.
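For reference, the handler I'm wiring up looks roughly like this (I'm
going from memory of the spice-gtk docs, so treat the exact signal
signature and constants as unverified):

/* Sketch of handling the "record-start" signal in spice-gtk. The callback
 * signature and SPICE_AUDIO_FMT_S16 are written from memory - verify them
 * against the spice-gtk documentation before relying on this. */
#include <spice-client.h>

static void on_record_start(SpiceRecordChannel* channel, gint format,
        gint channels, gint rate, gpointer user_data) {

    /* All we receive is the sample format, channel count, and rate -
     * nothing about frames per packet. */
    if (format == SPICE_AUDIO_FMT_S16) {
        /* 16-bit samples: bytes per frame = channels * 2 */
    }

    /* ... allocate whatever buffering we decide on here ... */

}

/* Elsewhere, once the record channel is available:
 * g_signal_connect(channel, "record-start",
 *                  G_CALLBACK(on_record_start), NULL); */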

So, my questions are: Is there some sort of safe or fixed default
value I can assume for frames per packet, both for allocating the
audio buffer and for determining when to flush the data? Or is there
some other way I should think about this that doesn't deal in
"packets" at all, and instead just uses the items I'm given - format,
channels, and rate - to allocate buffers and process the data?

Thanks, in advance, for any insight!

-Nick

Re: Spice Protocol Audio Input

Posted by Michael Jumper <mj...@apache.org>.
With audio input, latency increases with increasing buffer size, so you'll
want to keep that as small as you can get away with. For example, I think
the client side of the webapp currently uses a buffer of 2048
sample-frames, which at the common 44.1 kHz rate is 2048 / 44100 seconds,
or roughly 46 ms of audio data - and thus roughly 46 ms of latency.

Whether it's necessary to additionally buffer that to smooth out any
hiccups in the network will depend on how Spice behaves in practice. Does a
typical Spice server take it upon itself to perform that buffering (i.e.,
within the emulated audio device exposed to applications)? What about the
applications that read from the audio device within the Spice session? If
buffering is already handled on the Spice server side, then there's no need
to additionally buffer on our end except as minimally necessary to perform
resampling, assuming resampling is necessary at all.

- Mike

On Sun, Jul 17, 2022 at 6:16 AM Nick Couchman <ni...@gmail.com>
wrote:

> Okay, makes sense. I'm guessing that a buffer of some sort will still
> be required for temporarily storing the data between reading it from
> the stream and sending it to the Spice server. Is it best to pick a
> fixed/constant buffer size, maybe based on a time constant (10
> seconds?), and then multiply that by the audio rate and number of
> channels to get the required buffer size?
>
> -Nick

Re: Spice Protocol Audio Input

Posted by Nick Couchman <ni...@gmail.com>.
Okay, makes sense. I'm guessing that a buffer of some sort will still
be required for temporarily storing the data between reading it from
the stream and sending it to the Spice server. Is it best to pick a
fixed/constant buffer size, maybe based on a time constant (10
seconds?), and then multiply that by the audio rate and number of
channels to get the required buffer size?
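Concretely, I'm picturing something like this (just a sketch; the
duration and names are placeholders I made up):

/* Sketch: size a capture buffer from the parameters record-start gives us.
 * BUFFER_SECONDS is an arbitrary placeholder, not a recommended value. */
#include <stdlib.h>

#define BUFFER_SECONDS 10

static char* alloc_audio_buffer(int rate, int channels,
        int bytes_per_sample, size_t* size_out) {

    /* e.g. 44100 Hz * 2 channels * 2 bytes * 10 s = 1,764,000 bytes */
    size_t size = (size_t) rate * channels * bytes_per_sample * BUFFER_SECONDS;

    *size_out = size;
    return malloc(size);

}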

-Nick

On Sun, Jul 17, 2022 at 1:09 AM Michael Jumper <mj...@apache.org> wrote:
>
> If the Spice protocol doesn't appear to require anything specific in terms
> of packet size, I'd try just sending packets of data identical to the blobs
> received from the webapp, resampling as needed.
>
> I believe the only reason that's necessary in the RDP case is that RDP is
> *very* particular about the size of received packets. It may be that Spice
> doesn't impose such limits, in which case we can avoid the complexity
> entirely.
>
> - Mike

Re: Spice Protocol Audio Input

Posted by Michael Jumper <mj...@apache.org>.
If the Spice protocol doesn't appear to require anything specific in terms
of packet size, I'd try just sending packets of data identical to the blobs
received from the webapp, resampling as needed.

I believe the only reason that's necessary in the RDP case is that RDP is
*very* particular about the size of received packets. It may be that Spice
doesn't impose such limits, in which case we can avoid the complexity
entirely.
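
To make that concrete, I'm imagining guacd's blob handler for the audio
input stream being little more than a passthrough - something like this
sketch (untested; the handler name is made up, and I'm going from memory
on the libguac and spice-gtk signatures, so verify both):

/* Sketch: forward each blob received from the webapp to the Spice server
 * as-is. The guac_user blob handler signature and spice_record_send_data()
 * are written from memory - check libguac and spice-gtk before relying on
 * this. Resampling, if the rates differ, would happen before the send. */
#include <guacamole/stream.h>
#include <guacamole/user.h>
#include <spice-client.h>

int guac_spice_audio_blob_handler(guac_user* user, guac_stream* stream,
        void* data, int length) {

    /* Assumed to have been stashed on the stream when it was accepted */
    SpiceRecordChannel* record_channel = (SpiceRecordChannel*) stream->data;

    /* Forward the blob unchanged; the last argument is a timestamp - I
     * don't recall offhand what the channel expects there. */
    spice_record_send_data(record_channel, data, length, 0);

    return 0;

}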

- Mike
