Commit 39761e79 authored by Fabrice Bellet's avatar Fabrice Bellet

conncheck: adjust recheck on timeout strategy

The pair recheck on timeout can easily cause repetitive rechecks in
a ping-pong effect, if both peers with the same behaviour try to
check the same pair almost simultaneously, and if the network rtt
is greater than the initial timer rto. The reply to the initial
stun request may arrive after the in-progress conncheck
cancellation (described in RFC 5245, sect 7.2.1.4). Cancellation
creates a new stun request, and forgets the initial one.
The conncheck timer is restarted with the same initial value,
so the same situation happens again later.

We prefer a make a unique recheck per pair to avoid this bouncing
effect. Another workaround could be a avoid resetting the timer in such
situation. After enough retransmissions, the timeout delay, that doubles
after each timeout, becomes longer than the rtt, and the stun reply can
be handled.  The problem with this workaround is that a given stun
request may not be given enough individual retransmissions. If we
reset the number of retransmission while keeping the rto value, this
may lead to insanely long timeout values.

So the proposed strategy to make a unique recheck seems a good
compromise.

Differential Revision: https://phabricator.freedesktop.org/D1115
parent d05e7c8f
......@@ -579,7 +579,27 @@ candidate_check_pair_fail (NiceStream *stream, NiceAgent *agent, CandidateCheckP
static gboolean
priv_conn_recheck_on_timeout (NiceAgent *agent, CandidateCheckPair *p)
{
if (p->recheck_on_timeout) {
/* The pair recheck on timeout can easily cause repetitive rechecks in
* a ping-pong effect, if both peers with the same behaviour try to
* check the same pair almost simultaneously, and if the network rtt
* is greater than the initial timer rto. The reply to the initial
* stun request may arrive after the in-progress conncheck
* cancellation (described in RFC 5245, sect 7.2.1.4). Cancellation
* creates a new stun request, and forgets the initial one.
* The conncheck timer is restarted with the same initial value,
* so the same situation happens again later.
*
* We choose a make a unique recheck per pair to avoid this bouncing
* effect. Another workaround could be a avoid resetting the timer in
* such situation. After enough retransmissions, the timeout delay
* becomes longer than the rtt, and the stun reply can be handled.
* The problem is that a given stun request may not be given enough
* individual retransmissions.
*/
if (p->recheck_on_timeout && p->recheck_on_timeout_count > 0)
nice_debug ("Agent %p : pair %p has already been cancelled once, "
"ignoring recheck_on_timeout", agent, p);
if (p->recheck_on_timeout && p->recheck_on_timeout_count == 0) {
g_assert (p->state == NICE_CHECK_IN_PROGRESS);
/* this cancelled pair may have the flag 'mark nominated
* on response arrival' set, we want to keep it, because
......@@ -592,6 +612,7 @@ priv_conn_recheck_on_timeout (NiceAgent *agent, CandidateCheckPair *p)
nice_debug ("Agent %p : pair %p was cancelled, "
"triggering a new connection check", agent, p);
p->recheck_on_timeout = FALSE;
p->recheck_on_timeout_count++;
priv_add_pair_to_triggered_check_queue (agent, p);
return TRUE;
}
......
......@@ -89,6 +89,7 @@ struct _CandidateCheckPair
gboolean use_candidate_on_next_check;
gboolean mark_nominated_on_response_arrival;
gboolean recheck_on_timeout;
guint recheck_on_timeout_count;
struct _CandidateCheckPair *discovered_pair;
struct _CandidateCheckPair *succeeded_pair;
guint64 priority;
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment