SCTP failure detection time

Posted by rimmon | 10:19 AM

SCTP's multihoming failure detection time depends on three tunable parameters:

RTO.min (minimum retransmission timeout)
RTO.max (maximum retransmission timeout), and
Path.Max.Retrans (threshold number of consecutive timeouts that must be exceeded to detect failure).

RFC 2960 recommends these values:

RTO.min - 1 second
RTO.max - 60 seconds
Path.Max.Retrans - 5 attempts per destination address

When the retransmission timer expires for a destination address, the RFC says to set RTO = RTO * 2 ("back off the timer").
RTO.max may be used as an upper bound on this doubling.
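
The doubling rule can be sketched in a few lines of Python. This is only an illustration, not code from any SCTP stack: the function name `backoff_sequence` and its arguments are made up for this post, and it assumes the RTO is driven purely by timeouts (no RTT measurements adjusting it in between).

```python
def backoff_sequence(rto_start, rto_max, timeouts):
    """Return the RTO in effect for each of `timeouts` consecutive timer expiries.

    Hypothetical helper: assumes the RTO only changes through the
    "RTO = RTO * 2" back-off, capped at rto_max.
    """
    rto = rto_start
    values = []
    for _ in range(timeouts):
        values.append(rto)
        # Back off the timer, but never beyond RTO.max.
        rto = min(rto * 2, rto_max)
    return values


# Recommended values (seconds): start at RTO.min = 1, cap at RTO.max = 60.
# Six expiries are listed because Path.Max.Retrans = 5 must be exceeded.
print(backoff_sequence(1, 60, 6))  # [1, 2, 4, 8, 16, 32]
```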

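Since Path.Max.Retrans = 5, the threshold is exceeded on the sixth consecutive timeout. Starting from RTO.min and doubling each time, this translates to a failure detection time of at least 63 seconds (1 + 2 + 4 + 8 + 16 + 32).
In the worst-case scenario, where every timeout already runs at the RTO.max of 60 seconds, the failure detection time is 360 seconds (6 * 60).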

As another example, take the following parameters:

RTO.min - 100ms
RTO.max - 400ms
Path.Max.Retrans - 4 attempts

Then:
Max. failure detection time = (1 + Path.Max.Retrans) * RTO.max = 5 * 400 = 2000ms
Min. failure detection time = 100 + 200 + 400 + 400 + 400 = 1500ms
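
Both calculations follow the same pattern, so they can be checked with a small Python sketch. The function `failure_detection_time` is a hypothetical helper written for this post. It assumes the minimum case starts with RTO at RTO.min and doubles on each timeout, that the maximum case has RTO already sitting at RTO.max, and that nothing else (heartbeat interval, RTT updates) contributes to the total.

```python
def failure_detection_time(rto_min, rto_max, path_max_retrans):
    """Return (minimum, maximum) failure detection time.

    Hypothetical helper; units are whatever rto_min and rto_max use.
    The threshold must be exceeded, so failure is declared on
    timeout number path_max_retrans + 1.
    """
    timeouts = path_max_retrans + 1

    # Minimum: start at RTO.min and double after each timeout, capped at RTO.max.
    rto = rto_min
    minimum = 0
    for _ in range(timeouts):
        minimum += rto
        rto = min(rto * 2, rto_max)

    # Maximum: every timeout already lasts RTO.max.
    maximum = timeouts * rto_max
    return minimum, maximum


print(failure_detection_time(1, 60, 5))     # (63, 360) seconds
print(failure_detection_time(100, 400, 4))  # (1500, 2000) milliseconds
```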

1 comment
  1. Matthew February 26, 2015 at 4:51 AM  

    Hi Rimmon. I've been working on a few calculations with regard to SCTP failure/trigger and have a couple of questions. You state that the calculation only takes minRTO, maxRTO and Pathmax. I was under the impression that initRTO and heartbeats were used as well? Here's what I'm currently working with:

    associationMaxRtx = 20
    pathMaxRtx = 10
    minimumRto = 100ms
    initialRto = 200ms
    maximumRto = 600ms

    What would be the time for the path failure?

    Thanks,
    Matt