man/man7/tc-hfsc.7

   1 .TH HFSC 7 "31 October 2011" iproute2 Linux
   2 .ce 1
   3 \fBHIERARCHICAL FAIR SERVICE CURVE\fR
   4 .
   5 .SH "HISTORY & INTRODUCTION"
   6 .
   7 HFSC \- \fBHierarchical Fair Service Curve\fR was first presented at
   8 SIGCOMM'97. Developed as a part of ALTQ (ALTernative Queuing) on NetBSD, found
   9 its way quickly to other BSD systems, and then a few years ago became part of
  10 the linux kernel. Still, it's not the most popular scheduling algorithm \-
  11 especially if compared to HTB \- and it's not well documented from enduser's
  12 perspective. This introduction aims to explain how HFSC works without
  13 going to deep into math side of things (although some if it will be
  14 inevitable).
  15
  16 In short HFSC aims to:
  17 .
  18 .RS 4
  19 .IP \fB1)\fR 4
  20 guarantee precise bandwidth and delay allocation for all leaf classes (realtime
  21 criterion)
  22 .IP \fB2)\fR
  23 allocate excess bandwidth fairly as specified by class hierarchy (linkshare &
  24 upperlimit criterion)
  25 .IP \fB3)\fR
  26 minimize any discrepancy between the service curve and the actual amount of
  27 service provided during linksharing
  28 .RE
  29 .PP
  30 .
  31 The main "selling" point of HFSC is feature \fB(1)\fR, which is achieved by
  32 using nonlinear service curves (more about what it actually is later). This is
  33 particularly useful in VoIP or games, where not only guarantee of consistent
  34 bandwidth is important, but initial delay of a data stream as well. Note that
  35 it matters only for leaf classes (where the actual queues are) \- thus class
  36 hierarchy is ignored in realtime case.
  37
  38 Feature \fB(2)\fR is well, obvious \- any algorithm featuring class hierarchy
  39 (such as HTB or CBQ) strives to achieve that. HFSC does that well, although
  40 you might end with unusual situations, if you define service curves carelessly
  41 \- see section CORNER CASES for examples.
  42
  43 Feature \fB(3)\fR is mentioned due to the nature of the problem. There may be
  44 situations where it's either not possible to guarantee service of all curves at
  45 the same time, and/or it's impossible to do so fairly. Both will be explained
  46 later. Note that this is mainly related to interior (aka aggregate) classes, as
  47 the leafs are already handled by \fB(1)\fR. Still \- it's perfectly possible to
  48 create a leaf class w/o realtime service, and in such case \- the caveats will
  49 naturally extend to leaf classes as well.
  50
  51 .SH ABBREVIATIONS
  52 For the remaining part of the document, we'll use following shortcuts:
  53 .nf
  54 .RS 4
  55
  56 RT \- realtime
  57 LS \- linkshare
  58 UL \- upperlimit
  59 SC \- service curve
  60 .fi
  61 .
  62 .SH "BASICS OF HFSC"
  63 .
  64 To understand how HFSC works, we must first introduce a service curve.
  65 Overall, it's a nondecreasing function of some time unit, returning amount of
  66 service (allowed or allocated amount of bandwidth) by some specific point in
  67 time. The purpose of it should be subconsciously obvious \- if a class was
  68 allowed to transfer not less than the amount specified by its service curve \-
  69 then service curve is not violated.
  70
  71 Still \- we need more elaborate criterion than just the above (although in
  72 most generic case it can be reduced to it). The criterion has to take two
  73 things into account:
  74 .
  75 .RS 4
  76 .IP \(bu 4
  77 idling periods
  78 .IP \(bu
  79 ability to "look back", so if during current active period service curve is violated, maybe it
  80 isn't if we count excess bandwidth received during earlier active period(s)
  81 .RE
  82 .PP
  83 Let's define the criterion as follows:
  84 .RS 4
  85 .nf
  86 .IP "\fB(1)\fR" 4
  87 For each t1, there must exist t0 in set B, so S(t1\-t0)\~<=\~w(t0,t1)
  88 .fi
  89 .RE
  90 .
  91 .PP
  92 Here 'w' denotes the amount of service received during some time period between t0
  93 and t1. B is a set of all times, where a session becomes active after idling
  94 period (further denoted as 'becoming backlogged'). For a clearer picture,
  95 imagine two situations:
  96 .
  97 .RS 4
  98 .IP \fBa)\fR 4
  99 our session was active during two periods, with a small time gap between them
 100 .IP \fBb)\fR
 101 as in (a), but with a larger gap
 102 .RE
 103 .
 104 .PP
 105 Consider \fB(a)\fR \- if the service received during both periods meets
 106 \fB(1)\fR, then all is good. But what if it doesn't do so during the 2nd
 107 period ? If the amount of service received during the 1st period is bigger
 108 than the service curve, then it might compensate for smaller service during
 109 the 2nd period \fIand\fR the gap \- if the gap is small enough.
 110
 111 If the gap is larger \fB(b)\fR \- then it's less likely to happen (unless the
 112 excess bandwidth allocated during the 1st part was really large). Still, the
 113 larger the gap \- the less interesting is what happened in the past (e.g. 10
 114 minutes ago) \- what matters is the current traffic that just started.
 115
 116 From HFSC's perspective, more interesting is answering the following question:
 117 when should we start transferring packets, so a service curve of a class is not
 118 violated. Or rephrasing it: How much X() amount of service should a session
 119 receive by time t, so the service curve is not violated. Function X() defined
 120 as below is the basic building block of HFSC, used in: eligible, deadline,
 121 virtual\-time and fit\-time curves. Of course, X() is based on equation
 122 \fB(1)\fR and is defined recursively:
 123
 124 .RS 4
 125 .IP \(bu 4
 126 At the 1st backlogged period beginning function X is initialized to generic
 127 service curve assigned to a class
 128 .IP \(bu
 129 At any subsequent backlogged period, X() is:
 130 .nf
 131 \fBmin(X() from previous period ; w(t0)+S(t\-t0) for t>=t0),\fR
 132 .fi
 133 \&... where t0 denotes the beginning of the current backlogged period.
 134 .RE
 135 .
 136 .PP
 137 HFSC uses either linear, or two\-piece linear service curves. In case of
 138 linear or two\-piece linear convex functions (first slope < second slope),
 139 min() in X's definition reduces to the 2nd argument. But in case of two\-piece
 140 concave functions, the 1st argument might quickly become lesser for some
 141 t>=t0. Note, that for some backlogged period, X() is defined only from that
 142 period's beginning. We also define X^(\-1)(w) as smallest t>=t0, for which
 143 X(t)\~=\~w. We have to define it this way, as X() is usually not an injection.
 144
 145 The above generic X() can be one of the following:
 146 .
 147 .RS 4
 148 .IP "E()" 4
 149 In realtime criterion, selects packets eligible for sending. If none are
 150 eligible, HFSC will use linkshare criterion. Eligible time \&'et' is calculated
 151 with reference to packets' heads ( et\~=\~E^(\-1)(w) ). It's based on RT
 152 service curve, \fIbut in case of a convex curve, uses its 2nd slope only.\fR
 153 .IP "D()"
 154 In realtime criterion, selects the most suitable packet from the ones chosen
 155 by E(). Deadline time \&'dt' corresponds to packets' tails
 156 (dt\~=\~D^(\-1)(w+l), where \&'l' is packet's length). Based on RT service
 157 curve.
 158 .IP "V()"
 159 In linkshare criterion, arbitrates which packet to send next. Note that V() is
 160 function of a virtual time \- see \fBLINKSHARE CRITERION\fR section for
 161 details. Virtual time \&'vt' corresponds to packets' heads
 162 (vt\~=\~V^(\-1)(w)). Based on LS service curve.
 163 .IP "F()"
 164 An extension to linkshare criterion, used to limit at which speed linkshare
 165 criterion is allowed to dequeue. Fit\-time 'ft' corresponds to packets' heads
 166 as well (ft\~=\~F^(\-1)(w)). Based on UL service curve.
 167 .RE
 168
 169 Be sure to make clean distinction between session's RT, LS and UL service
 170 curves and the above "utility" functions.
 171 .
 172 .SH "REALTIME CRITERION"
 173 .
 174 RT criterion \fIignores class hierarchy\fR and guarantees precise bandwidth and
 175 delay allocation. We say that packet is eligible for sending, when current real
 176 time is bigger than eligible time. From all packets eligible, the one most
 177 suited for sending, is the one with the smallest deadline time. Sounds simply,
 178 but consider following example:
 179
 180 Interface 10mbit, two classes, both with two\-piece linear service curves:
 181 .RS 4
 182 .IP \(bu 4
 183 1st class \- 2mbit for 100ms, then 7mbit (convex \- 1st slope < 2nd slope)
 184 .IP \(bu
 185 2nd class \- 7mbit for 100ms, then 2mbit (concave \- 1st slope > 2nd slope)
 186 .RE
 187 .PP
 188 Assume for a moment, that we only use D() for both finding eligible packets,
 189 and choosing the most fitting one, thus eligible time would be computed as
 190 D^(\-1)(w) and deadline time would be computed as D^(\-1)(w+l). If the 2nd
 191 class starts sending packets 1 second after the 1st class, it's of course
 192 impossible to guarantee 14mbit, as the interface capability is only 10mbit.
 193 The only workaround in this scenario is to allow the 1st class to send the
 194 packets earlier that would normally be allowed. That's where separate E() comes
 195 to help. Putting all the math aside (see HFSC paper for details), E() for RT
 196 concave service curve is just like D(), but for the RT convex service curve \-
 197 it's constructed using \fIonly\fR RT service curve's 2nd slope (in our example
 198 \- 7mbit).
 199
 200 The effect of such E() \- packets will be sent earlier, and at the same time
 201 D() \fIwill\fR be updated \- so current deadline time calculated from it will
 202 be bigger. Thus, when the 2nd class starts sending packets later, both the 1st
 203 and the 2nd class will be eligible, but the 2nd session's deadline time will be
 204 smaller and its packets will be sent first. When the 1st class becomes idle at
 205 some later point, the 2nd class will be able to "buffer" up again for later
 206 active period of the 1st class.
 207
 208 A short remark \- in a situation, where the total amount of bandwidth
 209 available on the interface is bigger than the allocated total realtime parts
 210 (imagine interface 10 mbit, but 1mbit/2mbit and 2mbit/1mbit classes), the sole
 211 speed of the interface could suffice to guarantee the times.
 212
 213 Important part of RT criterion is that apart from updating its D() and E(),
 214 also V() used by LS criterion is updated. Generally the RT criterion is
 215 secondary to LS one, and used \fIonly\fR if there's a risk of violating precise
 216 realtime requirements. Still, the "participation" in bandwidth distributed by
 217 LS criterion is there, so V() has to be updated along the way. LS criterion can
 218 than properly compensate for non\-ideal fair sharing situation, caused by RT
 219 scheduling. If you use UL service curve its F() will be updated as well (UL
 220 service curve is an extension to LS one \- see \fBUPPERLIMIT CRITERION\fR
 221 section).
 222
 223 Anyway \- careless specification of LS and RT service curves can lead to
 224 potentially undesired situations (see CORNER CASES for examples). This wasn't
 225 the case in HFSC paper where LS and RT service curves couldn't be specified
 226 separately.
 227
 228 .SH "LINKSHARING CRITERION"
 229 .
 230 LS criterion's task is to distribute bandwidth according to specified class
 231 hierarchy. Contrary to RT criterion, there're no comparisons between current
 232 real time and virtual time \- the decision is based solely on direct comparison
 233 of virtual times of all active subclasses \- the one with the smallest vt wins
 234 and gets scheduled. One immediate conclusion from this fact is that absolute
 235 values don't matter \- only ratios between them (so for example, two children
 236 classes with simple linear 1mbit service curves will get the same treatment
 237 from LS criterion's perspective, as if they were 5mbit). The other conclusion
 238 is, that in perfectly fluid system with linear curves, all virtual times across
 239 whole class hierarchy would be equal.
 240
 241 Why is VC defined in term of virtual time (and what is it) ?
 242
 243 Imagine an example: class A with two children \- A1 and A2, both with let's say
 244 10mbit SCs. If A2 is idle, A1 receives all the bandwidth of A (and update its
 245 V() in the process). When A2 becomes active, A1's virtual time is already
 246 \fIfar\fR bigger than A2's one. Considering the type of decision made by LS
 247 criterion, A1 would become idle for a lot of time. We can workaround this
 248 situation by adjusting virtual time of the class becoming active \- we do that
 249 by getting such time "up to date". HFSC uses a mean of the smallest and the
 250 biggest virtual time of currently active children fit for sending. As it's not
 251 real time anymore (excluding trivial case of situation where all classes become
 252 active at the same time, and never become idle), it's called virtual time.
 253
 254 Such approach has its price though. The problem is analogous to what was
 255 presented in previous section and is caused by non\-linearity of service
 256 curves:
 257 .IP 1) 4
 258 either it's impossible to guarantee service curves and satisfy fairness
 259 during certain time periods:
 260
 261 .RS 4
 262 Recall the example from RT section, slightly modified (with 3mbit slopes
 263 instead of 2mbit ones):
 264
 265 .IP \(bu 4
 266 1st class \- 3mbit for 100ms, then 7mbit (convex \- 1st slope < 2nd slope)
 267 .IP \(bu
 268 2nd class \- 7mbit for 100ms, then 3mbit (concave \- 1st slope > 2nd slope)
 269
 270 .PP
 271 They sum up nicely to 10mbit \- interface's capacity. But if we wanted to only
 272 use LS for guarantees and fairness \- it simply won't work. In LS context,
 273 only V() is used for making decision which class to schedule. If the 2nd class
 274 becomes active when the 1st one is in its second slope, the fairness will be
 275 preserved \- ratio will be 1:1 (7mbit:7mbit), but LS itself is of course
 276 unable to guarantee the absolute values themselves \- as it would have to go
 277 beyond of what the interface is capable of.
 278 .RE
 279
 280 .IP 2) 4
 281 and/or it's impossible to guarantee service curves of all classes at the same
 282 time [fairly or not]:
 283
 284 .RS 4
 285
 286 This is similar to the above case, but a bit more subtle. We will consider two
 287 subtrees, arbitrated by their common (root here) parent:
 288
 289 .nf
 290 R (root) -\ 10mbit
 291
 292 A  \- 7mbit, then 3mbit
 293 A1 \- 5mbit, then 2mbit
 294 A2 \- 2mbit, then 1mbit
 295
 296 B  \- 3mbit, then 7mbit
 297 .fi
 298
 299 R arbitrates between left subtree (A) and right (B). Assume that A2 and B are
 300 constantly backlogged, and at some later point A1 becomes backlogged (when all
 301 other classes are in their 2nd linear part).
 302
 303 What happens now ? B (choice made by R) will \fIalways\fR get 7 mbit as R is
 304 only (obviously) concerned with the ratio between its direct children. Thus A
 305 subtree gets 3mbit, but its children would want (at the point when A1 became
 306 backlogged) 5mbit + 1mbit. That's of course impossible, as they can only get
 307 3mbit due to interface limitation.
 308
 309 In the left subtree \- we have the same situation as previously (fair split
 310 between A1 and A2, but violated guarantees), but in the whole tree \- there's
 311 no fairness (B got 7mbit, but A1 and A2 have to fit together in 3mbit) and
 312 there's no guarantees for all classes (only B got what it wanted). Even if we
 313 violated fairness in the A subtree and set A2's service curve to 0, A1 would
 314 still not get the required bandwidth.
 315 .RE
 316 .
 317 .SH "UPPERLIMIT CRITERION"
 318 .
 319 UL criterion is an extensions to LS one, that permits sending packets only
 320 if current real time is bigger than fit\-time ('ft'). So the modified LS
 321 criterion becomes: choose the smallest virtual time from all active children,
 322 such that fit\-time < current real time also holds. Fit\-time is calculated
 323 from F(), which is based on UL service curve. As you can see, it's role is
 324 kinda similar to E() used in RT criterion. Also, for obvious reasons \- you
 325 can't specify UL service curve without LS one.
 326
 327 Main purpose of UL service curve is to limit HFSC to bandwidth available on the
 328 upstream router (think adsl home modem/router, and linux server as
 329 nat/firewall/etc. with 100mbit+ connection to mentioned modem/router).
 330 Typically, it's used to create a single class directly under root, setting
 331 linear UL service curve to available bandwidth \- and then creating your class
 332 structure from that class downwards. Of course, you're free to add UL service
 333 (linear or not) curve to any class with LS criterion.
 334
 335 Important part about UL service curve is, that whenever at some point in time
 336 a class doesn't qualify for linksharing due to its fit\-time, the next time it
 337 does qualify, it will update its virtual time to the smallest virtual time of
 338 all active children fit for linksharing. This way, one of the main things LS
 339 criterion tries to achieve \- equality of all virtual times across whole
 340 hierarchy \- is preserved (in perfectly fluid system with only linear curves,
 341 all virtual times would be equal).
 342
 343 Without that, 'vt' would lag behind other virtual times, and could cause
 344 problems. Consider interface with capacity 10mbit, and following leaf classes
 345 (just in case you're skipping this text quickly \- this example shows behavior
 346 that \f(BIdoesn't happen\fR):
 347
 348 .nf
 349 A \- ls 5.0mbit
 350 B \- ls 2.5mbit
 351 C \- ls 2.5mbit, ul 2.5mbit
 352 .fi
 353
 354 If B was idle, while A and C were constantly backlogged, they would normally
 355 (as far as LS criterion is concerned) divide bandwidth in 2:1 ratio. But due
 356 to UL service curve in place, C would get at most 2.5mbit, and A would get the
 357 remaining 7.5mbit. The longer the backlogged period, the more virtual times of
 358 A and C would drift apart. If B became backlogged at some later point in time,
 359 its virtual time would be set to (A's\~vt\~+\~C's\~vt)/2, thus blocking A from
 360 sending any traffic, until B's virtual time catches up with A.
 361 .
 362 .SH "SEPARATE LS / RT SCs"
 363 .
 364 Another difference from original HFSC paper, is that RT and LS SCs can be
 365 specified separately. Moreover \- leaf classes are allowed to have only either
 366 RT SC or LS SC. For interior classes, only LS SCs make sense \- Any RT SC will
 367 be ignored.
 368 .
 369 .SH "CORNER CASES"
 370 .
 371 Separate service curves for LS and RT criteria can lead to certain traps,
 372 that come from "fighting" between ideal linksharing and enforced realtime
 373 guarantees. Those situations didn't exist in original HFSC paper, where
 374 specifying separate LS / RT service curves was not discussed.
 375
 376 Consider interface with capacity 10mbit, with following leaf classes:
 377
 378 .nf
 379 A \- ls 5.0mbit, rt 8mbit
 380 B \- ls 2.5mbit
 381 C \- ls 2.5mbit
 382 .fi
 383
 384 Imagine A and C are constantly backlogged. As B is idle, A and C would divide
 385 bandwidth in 2:1 ratio, considering LS service curve (so in theory \- 6.66 and
 386 3.33). Alas RT criterion takes priority, so A will get 8mbit and LS will be
 387 able to compensate class C for only 2 mbit \- this will cause discrepancy
 388 between virtual times of A and C.
 389
 390 Assume this situation lasts for a lot of time with no idle periods, and
 391 suddenly B becomes active. B's virtual time will be updated to
 392 (A's\~vt\~+\~C's\~vt)/2, effectively landing in the middle between A's and C's
 393 virtual time. The effect \- B, having no RT guarantees, will be punished and
 394 will not be allowed to transfer until C's virtual time catches up.
 395
 396 If the interface had higher capacity \- for example 100mbit, this example
 397 would behave perfectly fine though.
 398
 399 Let's look a bit closer at the above example \- it "cleverly" invalidates one
 400 of the basic things LS criterion tries to achieve \- equality of all virtual
 401 times across class hierarchy. Leaf classes without RT service curves are
 402 literally left to their own fate (governed by messed up virtual times).
 403
 404 Also - it doesn't make much sense. Class A will always be guaranteed up to
 405 8mbit, and this is more than any absolute bandwidth that could happen from its
 406 LS criterion (excluding trivial case of only A being active). If the bandwidth
 407 taken by A is smaller than absolute value from LS criterion, the unused part
 408 will be automatically assigned to other active classes (as A has idling periods
 409 in such case). The only "advantage" is, that even in case of low bandwidth on
 410 average, bursts would be handled at the speed defined by RT criterion. Still,
 411 if extra speed is needed (e.g. due to latency), non linear service curves
 412 should be used in such case.
 413
 414 In the other words - LS criterion is meaningless in the above example.
 415
 416 You can quickly "workaround" it by making sure each leaf class has RT service
 417 curve assigned (thus guaranteeing all of them will get some bandwidth), but it
 418 doesn't make it any more valid.
 419
 420 Keep in mind - if you use nonlinear curves and irregularities explained above
 421 happen \fIonly\fR in the first segment, then there's little wrong with
 422 "overusing" RT curve a bit:
 423
 424 .nf
 425 A \- ls 5.0mbit, rt 9mbit/30ms, then 1mbit
 426 B \- ls 2.5mbit
 427 C \- ls 2.5mbit
 428 .fi
 429
 430 Here, the vt of A will "spike" in the initial period, but then A will never get more
 431 than 1mbit, until B & C catch up. Then everything will be back to normal.
 432 .
 433 .SH "LINUX AND TIMER RESOLUTION"
 434 .
 435 In certain situations, the scheduler can throttle itself and setup so
 436 called watchdog to wakeup dequeue function at some time later. In case of HFSC
 437 it happens when for example no packet is eligible for scheduling, and UL
 438 service curve is used to limit the speed at which LS criterion is allowed to
 439 dequeue packets. It's called throttling, and accuracy of it is dependent on
 440 how the kernel is compiled.
 441
 442 There're 3 important options in modern kernels, as far as timers' resolution
 443 goes: \&'tickless system', \&'high resolution timer support' and \&'timer
 444 frequency'.
 445
 446 If you have \&'tickless system' enabled, then the timer interrupt will trigger
 447 as slowly as possible, but each time a scheduler throttles itself (or any
 448 other part of the kernel needs better accuracy), the rate will be increased as
 449 needed / possible. The ceiling is either \&'timer frequency' if \&'high
 450 resolution timer support' is not available or not compiled in, or it's
 451 hardware dependent and can go \fIfar\fR beyond the highest \&'timer frequency'
 452 setting available.
 453
 454 If \&'tickless system' is not enabled, the timer will trigger at a fixed rate
 455 specified by \&'timer frequency' \- regardless if high resolution timers are
 456 or aren't available.
 457
 458 This is important to keep those settings in mind, as in scenario like: no
 459 tickless, no HR timers, frequency set to 100hz \- throttling accuracy would be
 460 at 10ms. It doesn't automatically mean you would be limited to ~0.8mbit/s
 461 (assuming packets at ~1KB) \- as long as your queues are prepared to cover for
 462 timer inaccuracy. Of course, in case of e.g. locally generated udp traffic \-
 463 appropriate socket size is needed as well. Short example to make it more
 464 understandable (assume hardcore anti\-schedule settings \- HZ=100, no HR
 465 timers, no tickless):
 466
 467 .nf
 468 tc qdisc add dev eth0 root handle 1:0 hfsc default 1
 469 tc class add dev eth0 parent 1:0 classid 1:1 hfsc rt m2 10mbit
 470 .fi
 471
 472 Assuming packet of ~1KB size and HZ=100, that averages to ~0.8mbit \- anything
 473 beyond it (e.g. the above example with specified rate over 10x bigger) will
 474 require appropriate queuing and cause bursts every ~10 ms. As you can
 475 imagine, any HFSC's RT guarantees will be seriously invalidated by that.
 476 Aforementioned example is mainly important if you deal with old hardware \- as
 477 it's particularly popular for home server chores. Even then, you can easily
 478 set HZ=1000 and have very accurate scheduling for typical adsl speeds.
 479
 480 Anything modern (apic or even hpet msi based timers + \&'tickless system')
 481 will provide enough accuracy for superb 1gbit scheduling. For example, on one
 482 of basically cheap dual core AMD boards I have with following settings:
 483
 484 .nf
 485 tc qdisc add dev eth0 parent root handle 1:0 hfsc default 1
 486 tc class add dev eth0 paretn 1:0 classid 1:1 hfsc rt m2 300mbit
 487 .fi
 488
 489 And simple:
 490
 491 .nf
 492 nc \-u dst.host.com 54321 </dev/zero
 493 nc \-l \-p 54321 >/dev/null
 494 .fi
 495
 496 \&...will yield following effects over period of ~10 seconds (taken from
 497 /proc/interrupts):
 498
 499 .nf
 500 319: 42124229   0  HPET_MSI\-edge  hpet2 (before)
 501 319: 42436214   0  HPET_MSI\-edge  hpet2 (after 10s.)
 502 .fi
 503
 504 That's roughly 31000/s. Now compare it with HZ=1000 setting. The obvious
 505 drawback of it is that cpu load can be rather extensive with servicing that
 506 many timer interrupts. Example with 300mbit RT service curve on 1gbit link is
 507 particularly ugly, as it requires a lot of throttling with minuscule delays.
 508
 509 Also note that it's just an example showing capability of current hardware.
 510 The above example (essentially 300mbit TBF emulator) is pointless on internal
 511 interface to begin with \- you will pretty much always want regular LS service
 512 curve there, and in such scenario HFSC simply doesn't throttle at all.
 513
 514 300mbit RT service curve (selected columns from mpstat \-P ALL 1):
 515
 516 .nf
 517 10:56:43 PM  CPU  %sys     %irq   %soft   %idle
 518 10:56:44 PM  all  20.10    6.53   34.67   37.19
 519 10:56:44 PM    0  35.00    0.00   63.00    0.00
 520 10:56:44 PM    1   4.95   12.87    6.93   73.27
 521 .fi
 522
 523 So, in rare case you need those speeds with only RT service curve, or with UL
 524 service curve \- remember about drawbacks.
 525 .
 526 .SH "CAVEAT: RANDOM ONLINE EXAMPLES"
 527 .
 528 For reasons unknown (though well guessed), many examples you can google love to
 529 overuse UL criterion and stuff it in every node possible. This makes no sense
 530 and works against what HFSC tries to do (and does pretty damn well). Use UL
 531 where it makes sense - on the uppermost node to match upstream router's uplink
 532 capacity. Or - in special cases, such as testing (limit certain subtree to some
 533 speed) or customers that must never get more than certain speed. In the last
 534 case you can usually achieve the same by just using RT criterion without LS+UL
 535 on leaf nodes.
 536
 537 As for router case - remember it's good to differentiate between "traffic to
 538 router" (remote console, web config, etc.) and "outgoing traffic", so for
 539 example:
 540
 541 .nf
 542 tc qdisc add dev eth0 root handle 1:0 hfsc default 0x8002
 543 tc class add dev eth0 parent 1:0 classid 1:999 hfsc rt m2 50mbit
 544 tc class add dev eth0 parent 1:0 classid 1:1 hfsc ls m2 2mbit ul m2 2mbit
 545 .fi
 546
 547 \&... so "internet" tree under 1:1 and "router itself" as 1:999
 548 .
 549 .SH "LAYER2 ADAPTATION"
 550 .
 551 Please refer to \fBtc\-stab\fR(8)
 552 .
 553 .SH "SEE ALSO"
 554 .
 555 \fBtc\fR(8), \fBtc\-hfsc\fR(8), \fBtc\-stab\fR(8)
 556
 557 Please direct bugreports and patches to: <net...@vger.kernel.org>
 558 .
 559 .SH "AUTHOR"
 560 .
 561 Manpage created by Michal Soltys (sol...@ziu.info)