1 | <?xml version='1.0' encoding='utf-8'?>
|
2 | <!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent">
|
3 |
|
4 | <rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="info"
|
5 | ipr="trust200902" obsoletes="" updates="" submissionType="IETF"
|
6 | consensus="true" number="9999" xml:lang="en" tocInclude="true" symRefs="true" sortRefs="true" version="3">
|
7 |
|
8 | <!-- xml2rfc v2v3 conversion 2.23.0 -->
|
9 |
|
10 | <front>
|
<title abbrev="BGP-Prefix SID in large-scale DCs">BGP Prefix Segment in
Large-Scale Data Centers</title>
|
13 | <seriesInfo name="RFC" value="9999"/>
|
14 | <author fullname="Clarence Filsfils" initials="C." role="editor" surname="Filsfils">
|
15 | <organization>Cisco Systems, Inc.</organization>
|
16 | <address>
|
17 | <postal>
|
18 | <street/>
|
19 | <city>Brussels</city>
|
20 | <region/>
|
21 | <code/>
|
22 | <country>BE</country>
|
23 | </postal>
|
24 | <email>cfilsfil@cisco.com</email>
|
25 | </address>
|
26 | </author>
|
27 | <author fullname="Stefano Previdi" initials="S." surname="Previdi">
|
28 | <organization>Cisco Systems, Inc.</organization>
|
29 | <address>
|
30 | <postal>
|
31 | <street/>
|
32 | <city/>
|
33 | <code/>
|
34 | <country>Italy</country>
|
35 | </postal>
|
36 | <email>stefano@previdi.net</email>
|
37 | </address>
|
38 | </author>
|
39 | <author fullname="Gaurav Dawra" initials="G." surname="Dawra">
|
40 | <organization>LinkedIn</organization>
|
41 | <address>
|
42 | <postal>
|
43 | <street/>
|
44 | <city/>
|
45 | <code/>
|
46 | <country>USA</country>
|
47 | </postal>
|
48 | <email>gdawra.ietf@gmail.com</email>
|
49 | </address>
|
50 | </author>
|
51 | <author fullname="Ebben Aries" initials="E." surname="Aries">
|
52 | <organization>Juniper Networks</organization>
|
53 | <address>
|
54 | <postal>
|
55 | <street>1133 Innovation Way</street>
|
56 | <city>Sunnyvale</city>
|
57 | <code>CA 94089</code>
|
58 | <country>US</country>
|
59 | </postal>
|
60 | <email>exa@juniper.net</email>
|
61 | </address>
|
62 | </author>
|
63 | <author fullname="Petr Lapukhov" initials="P." surname="Lapukhov">
|
64 | <organization>Facebook</organization>
|
65 | <address>
|
66 | <postal>
|
67 | <street/>
|
68 | <city/>
|
69 | <code/>
|
70 | <country>US</country>
|
71 | </postal>
|
72 | <email>petr@fb.com</email>
|
73 | </address>
|
74 | </author>
|
75 | <date month="July" year="2019"/>
|
76 | <workgroup>Network Working Group</workgroup>
|
77 | <abstract>
|
78 | <t>This document describes the motivation and benefits for applying
|
79 | segment routing in BGP-based large-scale data-centers. It describes the
|
80 | design to deploy segment routing in those data-centers, for both the
|
81 | MPLS and IPv6 dataplanes.</t>
|
82 | </abstract>
|
83 | </front>
|
84 | <middle>
|
85 | <section anchor="INTRO" numbered="true" toc="default">
|
86 | <name>Introduction</name>
|
<t>Segment Routing (SR), as described in <xref target="I-D.ietf-spring-segment-routing" format="default"/>, leverages the source routing
paradigm. A node steers a packet through an ordered list of
instructions, called segments. A segment can represent any instruction,
topological or service-based. A segment can have a semantic local to an
SR node or global within an SR domain. SR allows a flow to be enforced
through any topological path while maintaining per-flow state only at
the ingress node of the SR domain. Segment Routing can be applied to the
MPLS and IPv6 data-planes.</t>
|
<t>The use-cases described in this document should be considered in the
context of the BGP-based large-scale data-center (DC) design described
in <xref target="RFC7938" format="default"/>. This document extends it by applying SR
with both the MPLS and IPv6 dataplanes.</t>
|
99 | </section>
|
100 | <section anchor="LARGESCALEDC" numbered="true" toc="default">
|
101 | <name>Large Scale Data Center Network Design Summary</name>
|
102 | <t>This section provides a brief summary of the informational document
|
103 | <xref target="RFC7938" format="default"/> that outlines a practical network design
|
104 | suitable for data-centers of various scales:</t>
|
105 | <ul spacing="normal">
|
106 | <li>Data-center networks have highly symmetric topologies with
|
107 | multiple parallel paths between two server attachment points. The
|
108 | well-known Clos topology is most popular among the operators (as
|
109 | described in <xref target="RFC7938" format="default"/>). In a Clos topology, the
|
110 | minimum number of parallel paths between two elements is determined
|
111 | by the "width" of the "Tier-1" stage. See <xref target="FIGLARGE" format="default"/>
|
112 | below for an illustration of the concept.</li>
|
<li>Large-scale data-centers commonly use a routing protocol, such as
BGP-4 <xref target="RFC4271" format="default"/>, in order to provide endpoint
connectivity. Recovery after a network failure is therefore driven
either by local knowledge of directly available backup paths or by
distributed signaling between the network devices.</li>
|
<li>Within data-center networks, traffic is load-shared using the
Equal Cost Multipath (ECMP) mechanism. With ECMP, every network
device implements a pseudo-random decision, mapping packets to one
of the parallel paths by means of a hash function calculated over
certain parts of the packet, typically a combination of various
packet header fields, as illustrated by the sketch after this
list.</li>
|
124 | </ul>
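<t>The following is a minimal sketch of such per-flow ECMP path
selection. The 5-tuple hash shown is an illustrative assumption;
actual devices use vendor-specific hash functions and inputs.</t>
<sourcecode type="python"><![CDATA[
# A minimal sketch (assuming a generic 5-tuple hash, not any vendor's
# actual algorithm) of per-flow ECMP path selection.
import hashlib

def ecmp_path(flow, parallel_paths):
    """Pin a flow 5-tuple to one of the parallel paths.

    Every packet of a flow hashes identically, which is why a few
    long-lived "elephant" flows can pile up on the same path.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(parallel_paths)
    return parallel_paths[index]

# Example: the two parallel paths between Node1 and Node2 of Figure 1.
paths = ["via Node3", "via Node4"]
flow = ("192.0.2.1", "192.0.2.2", 6, 49152, 179)  # src, dst, proto, ports
print(ecmp_path(flow, paths))  # same answer for every packet of this flow
]]></sourcecode>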
|
<t>The following is a schematic of a five-stage Clos topology, with four
devices in the "Tier-1" stage. Notice that the number of paths between
Node1 and Node12 equals four: the paths have to cross all of the Tier-1
devices. At the same time, the number of paths between Node1 and Node2
equals two, and the paths only cross Tier-2 devices. Other topologies
are possible, but for simplicity only the topologies that have a single
path from Tier-1 to Tier-3 are considered below. The rest could be
treated similarly, with a few modifications to the logic.</t>
|
133 | <section anchor="REFDESIGN" numbered="true" toc="default">
|
134 | <name>Reference design</name>
|
135 | <figure anchor="FIGLARGE">
|
136 | <name>5-stage Clos topology</name>
|
137 | <artwork name="" type="" align="left" alt=""><![CDATA[ Tier-1
|
138 | +-----+
|
139 | |NODE |
|
140 | +->| 5 |--+
|
141 | | +-----+ |
|
142 | Tier-2 | | Tier-2
|
143 | +-----+ | +-----+ | +-----+
|
144 | +------------>|NODE |--+->|NODE |--+--|NODE |-------------+
|
145 | | +-----| 3 |--+ | 6 | +--| 9 |-----+ |
|
146 | | | +-----+ +-----+ +-----+ | |
|
147 | | | | |
|
148 | | | +-----+ +-----+ +-----+ | |
|
149 | | +-----+---->|NODE |--+ |NODE | +--|NODE |-----+-----+ |
|
150 | | | | +---| 4 |--+->| 7 |--+--| 10 |---+ | | |
|
151 | | | | | +-----+ | +-----+ | +-----+ | | | |
|
152 | | | | | | | | | | |
|
153 | +-----+ +-----+ | +-----+ | +-----+ +-----+
|
154 | |NODE | |NODE | Tier-3 +->|NODE |--+ Tier-3 |NODE | |NODE |
|
155 | | 1 | | 2 | | 8 | | 11 | | 12 |
|
156 | +-----+ +-----+ +-----+ +-----+ +-----+
|
157 | | | | | | | | |
|
158 | A O B O <- Servers -> Z O O O
|
159 | ]]></artwork>
|
160 | </figure>
|
<t>In the reference topology illustrated in <xref target="FIGLARGE" format="default"/>,
it is assumed that:</t>
|
163 | <ul spacing="normal">
|
164 | <li>
|
165 | <t>Each node is its own AS (Node X has AS X). 4-byte AS numbers
|
166 | are recommended (<xref target="RFC6793" format="default"/>).</t>
|
167 | <ul spacing="normal">
|
168 | <li>For simple and efficient route propagation filtering,
|
169 | Node5, Node6, Node7 and Node8 use the same AS, Node3 and Node4
|
170 | use the same AS, Node9 and Node10 use the same AS.</li>
|
<li>In case 2-byte autonomous system numbers are used, and for
efficient usage of the scarce 2-byte Private Use AS pool,
different Tier-3 nodes might use the same AS.</li>
<li>Without loss of generality, these details are
simplified in this document: it is assumed that each node has its
own AS.</li>
|
177 | </ul>
|
178 | </li>
|
<li>Each node peers with its neighbors via a BGP session. If not
specified, eBGP is assumed. In a specific use-case, iBGP will be
used, but this will be called out explicitly in that case.</li>
|
182 | <li>
|
183 | <t>Each node originates the IPv4 address of its loopback interface
|
184 | into BGP and announces it to its neighbors. </t>
|
185 | <ul spacing="normal">
|
186 | <li>The loopback of Node X is 192.0.2.x/32.</li>
|
187 | </ul>
|
188 | </li>
|
189 | </ul>
|
190 | <t>In this document, the Tier-1, Tier-2 and Tier-3 nodes are referred
|
191 | to respectively as Spine, Leaf and ToR (top of rack) nodes. When a ToR
|
192 | node acts as a gateway to the "outside world", it is referred to as a
|
193 | border node.</t>
|
194 | </section>
|
195 | </section>
|
196 | <section anchor="OPENPROBS" numbered="true" toc="default">
|
197 | <name>Some open problems in large data-center networks</name>
|
<t>The data-center network design summarized above provides means for
moving traffic between hosts with reasonable efficiency. There are a few
open performance and reliability problems that arise in such a design:</t>
|
202 | <ul spacing="normal">
|
<li>ECMP routing is most commonly realized per-flow. This means that
large, long-lived "elephant" flows may affect performance of
smaller, short-lived "mouse" flows and reduce efficiency
of per-flow load-sharing. In other words, per-flow ECMP does not
perform efficiently when the flow lifetime distribution is heavy-tailed.
Furthermore, due to hash-function inefficiencies it is possible to
have frequent flow collisions, where more flows get placed on one
path than on others.</li>
|
<li>Shortest-path routing with ECMP implements an oblivious routing
model, which is not aware of the network imbalances. If the network
symmetry is broken, for example due to link failures, utilization
hotspots may appear. For example, if a link fails between Tier-1 and
Tier-2 devices (e.g., Node5 and Node9), Tier-3 devices Node1 and
Node2 will not be aware of that, since there are other paths
available from the perspective of Node3. They will continue sending
roughly equal traffic to Node3 and Node4 as if the failure didn't
exist, which may cause a traffic hotspot.</li>
|
220 | <li>Isolating faults in the network with multiple parallel paths and
|
221 | ECMP-based routing is non-trivial due to lack of determinism.
|
222 | Specifically, the connections from HostA to HostB may take a
|
223 | different path every time a new connection is formed, thus making
|
224 | consistent reproduction of a failure much more difficult. This
|
225 | complexity scales linearly with the number of parallel paths in the
|
226 | network, and stems from the random nature of path selection by the
|
227 | network devices.</li>
|
228 | </ul>
|
<t>The following sections explain how to apply SR in the DC, for both
the MPLS and IPv6 data-planes.</t>
|
231 | </section>
|
232 | <section anchor="APPLYSR" numbered="true" toc="default">
|
233 | <name>Applying Segment Routing in the DC with MPLS dataplane</name>
|
234 | <section anchor="BGPREFIXSEGMENT" numbered="true" toc="default">
|
235 | <name>BGP Prefix Segment (BGP-Prefix-SID)</name>
|
236 | <t>A BGP Prefix Segment is a segment associated with a BGP prefix. A
|
237 | BGP Prefix Segment is a network-wide instruction to forward the packet
|
238 | along the ECMP-aware best path to the related prefix.</t>
|
<t>The BGP Prefix Segment is defined as the BGP-Prefix-SID Attribute
in <xref target="I-D.ietf-idr-bgp-prefix-sid" format="default"/>, which contains an
index. Throughout this document, the BGP Prefix Segment Attribute is
referred to as the BGP-Prefix-SID and the encoded index as the
label-index.</t>
|
244 | <t>In this document, the network design decision has been made to
|
245 | assume that all the nodes are allocated the same SRGB (Segment Routing
|
246 | Global Block), e.g. [16000, 23999]. This provides operational
|
247 | simplification as explained in <xref target="SINGLESRGB" format="default"/>, but this
|
248 | is not a requirement.</t>
|
<t>For illustration purposes, when considering an MPLS data-plane, it
is assumed that the label-index allocated to prefix 192.0.2.x/32 is X.
As a result, a local label (16000+x) is allocated for prefix
192.0.2.x/32 by each node throughout the DC fabric.</t>
<t>When the IPv6 data-plane is considered, it is assumed that Node X is
allocated the IPv6 address (segment) 2001:DB8::X.</t>
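<t>The following hedged sketch illustrates this allocation convention
numerically; the SRGB base of 16000 and the index-equals-node-number
convention are the assumptions stated above, not a mandated scheme.</t>
<sourcecode type="python"><![CDATA[
# Sketch of the label allocation convention assumed in this document:
# with a common SRGB, every node derives the same label for a prefix.
SRGB_BASE, SRGB_SIZE = 16000, 8000   # SRGB [16000, 23999]

def label_for_index(index):
    """Label = SRGB base + label-index; the index must fit the SRGB."""
    if not 0 <= index < SRGB_SIZE:
        raise ValueError(f"label-index {index} outside the SRGB")
    return SRGB_BASE + index

def prefix_sid_for_node(x):
    """Node X: loopback 192.0.2.x/32, label-index X, label 16000+X."""
    return f"192.0.2.{x}/32", x, label_for_index(x)

for node in (5, 11):
    print(prefix_sid_for_node(node))
# ('192.0.2.5/32', 5, 16005)
# ('192.0.2.11/32', 11, 16011)
]]></sourcecode>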
|
255 | </section>
|
256 | <section anchor="eBGP8277" numbered="true" toc="default">
|
257 | <name>eBGP Labeled Unicast (RFC8277)</name>
|
258 | <t>Referring to <xref target="FIGLARGE" format="default"/> and <xref target="RFC7938" format="default"/>, the following design modifications are
|
259 | introduced:</t>
|
260 | <ul spacing="normal">
|
<li>Each node peers with its neighbors via an eBGP session with
extensions defined in <xref target="RFC8277" format="default"/> (named "eBGP8277"
throughout this document) and with the BGP-Prefix-SID attribute
extension as defined in <xref target="I-D.ietf-idr-bgp-prefix-sid" format="default"/>.</li>
|
265 | <li>The forwarding plane at Tier-2 and Tier-1 is MPLS.</li>
|
266 | <li>The forwarding plane at Tier-3 is either IP2MPLS (if the host
|
267 | sends IP traffic) or MPLS2MPLS (if the host sends MPLS-
|
268 | encapsulated traffic).</li>
|
269 | </ul>
|
270 | <t><xref target="FIGSMALL" format="default"/> zooms into a path from server A to server
|
271 | Z within the topology of <xref target="FIGLARGE" format="default"/>.</t>
|
272 | <figure anchor="FIGSMALL">
|
273 | <name>Path from A to Z via nodes 1, 4, 7, 10 and 11</name>
|
274 | <artwork name="" type="" align="left" alt=""><![CDATA[
|
275 | +-----+ +-----+ +-----+
|
276 | +---------->|NODE | |NODE | |NODE |
|
277 | | | 4 |--+->| 7 |--+--| 10 |---+
|
278 | | +-----+ +-----+ +-----+ |
|
279 | | |
|
280 | +-----+ +-----+
|
281 | |NODE | |NODE |
|
282 | | 1 | | 11 |
|
283 | +-----+ +-----+
|
284 | | |
|
285 | A <- Servers -> Z
|
286 | ]]></artwork>
|
287 | </figure>
|
<t>Referring to <xref target="FIGLARGE" format="default"/> and <xref target="FIGSMALL" format="default"/>, and assuming the IP address, AS, and
label-index allocation previously described, the following sections
detail the control plane operation and the data plane states for the
prefix 192.0.2.11/32 (loopback of Node11).</t>
|
292 | <section anchor="CONTROLPLANE" numbered="true" toc="default">
|
293 | <name>Control Plane</name>
|
294 | <t>Node11 originates 192.0.2.11/32 in BGP and allocates to it a
|
295 | BGP-Prefix-SID with label-index: index11 <xref target="I-D.ietf-idr-bgp-prefix-sid" format="default"/>.</t>
|
296 | <ul empty="true">
|
297 | <li><t>Node11 sends the following eBGP8277 update to Node10:</t>
|
298 | <dl spacing="compact">
|
299 | <dt>IP Prefix:</dt><dd>192.0.2.11/32</dd>
|
300 | <dt>Label:</dt><dd>Implicit-Null</dd>
|
301 | <dt>Next-hop:</dt><dd>Node11's interface address on the link to Node10</dd>
|
302 | <dt>AS Path:</dt><dd>{11}</dd>
|
303 | <dt>BGP-Prefix-SID:</dt><dd>Label-Index 11</dd>
|
304 | </dl>
|
305 | </li>
|
306 | </ul>
|
307 |
|
<t>Node10 receives the above update. As it is SR capable, Node10 is
able to interpret the BGP-Prefix-SID and hence understands that it
should allocate the label from its own SRGB, offset by the
Label-Index received in the BGP-Prefix-SID (16000+11, hence 16011), to
the NLRI instead of allocating a non-deterministic label out of a
dynamically allocated portion of the local label space. The
implicit-null label in the NLRI tells Node10 that it is the
penultimate hop and must pop the top label on the stack before
forwarding traffic for this prefix to Node11.</t>
|
317 | <ul empty="true">
|
318 | <li><t>Then, Node10 sends the following eBGP8277 update to Node7:</t>
|
319 | <dl spacing="compact">
|
320 | <dt>IP Prefix:</dt><dd>192.0.2.11/32</dd>
|
321 | <dt>Label:</dt><dd>16011</dd>
|
322 | <dt>Next-hop:</dt><dd>Node10's interface address on the link to Node7</dd>
|
323 | <dt>AS Path:</dt><dd>{10, 11}</dd>
|
324 | <dt>BGP-Prefix-SID:</dt><dd>Label-Index 11</dd>
|
325 | </dl>
|
326 | </li>
|
327 | </ul>
|
328 | <t>Node7 receives the above update. As it is SR capable, Node7 is
|
329 | able to interpret the BGP-Prefix-SID and hence allocates the local
|
330 | (incoming) label 16011 (16000 + 11) to the NLRI (instead of
|
331 | allocating a "dynamic" local label from its label
|
332 | manager). Node7 uses the label in the received eBGP8277 NLRI as the
|
333 | outgoing label (the index is only used to derive the local/incoming
|
334 | label).</t>
|
335 | <ul empty="true">
|
336 | <li><t>Node7 sends the following eBGP8277 update to Node4:</t>
|
<dl spacing="compact">
<dt>IP Prefix:</dt><dd>192.0.2.11/32</dd>
<dt>Label:</dt><dd>16011</dd>
|
339 | <dt>Next-hop:</dt><dd>Node7's interface address on the link to Node4</dd>
|
340 | <dt>AS Path:</dt><dd>{7, 10, 11}</dd>
|
341 | <dt>BGP-Prefix-SID:</dt><dd>Label-Index 11</dd>
|
342 | </dl>
|
343 | </li>
|
344 | </ul>
|
345 | <t>Node4 receives the above update. As it is SR capable, Node4 is
|
346 | able to interpret the BGP-Prefix-SID and hence allocates the local
|
347 | (incoming) label 16011 to the NLRI (instead of allocating a
|
348 | "dynamic" local label from its label manager). Node4
|
349 | uses the label in the received eBGP8277 NLRI as outgoing label (the
|
350 | index is only used to derive the local/incoming label).</t>
|
351 |
|
352 | <ul empty="true">
|
353 | <li><t>Node4 sends the following eBGP8277 update to Node1:</t>
|
354 | <dl spacing="compact">
|
355 | <dt>IP Prefix:</dt><dd>192.0.2.11/32</dd>
|
356 | <dt>Label:</dt><dd>16011</dd>
|
357 | <dt>Next-hop:</dt><dd>Node4's interface address on the link to Node1</dd>
|
358 | <dt>AS Path:</dt><dd>{4, 7, 10, 11}</dd>
|
359 | <dt>BGP-Prefix-SID:</dt><dd>Label-Index 11</dd>
|
360 | </dl>
|
361 | </li>
|
362 | </ul>
|
363 |
|
364 | <t>Node1 receives the above update. As it is SR capable, Node1 is
|
365 | able to interpret the BGP-Prefix-SID and hence allocates the local
|
366 | (incoming) label 16011 to the NLRI (instead of allocating a
|
367 | "dynamic" local label from its label manager). Node1
|
368 | uses the label in the received eBGP8277 NLRI as outgoing label (the
|
369 | index is only used to derive the local/incoming label).</t>
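<t>The walk above can be condensed into the following hedged sketch
(the Python update structure is illustrative, not a wire format): each
SR-capable node derives its incoming label as the SRGB base plus the
received label-index, keeps the label received in the NLRI as its
outgoing label, and re-advertises its own incoming label upstream.</t>
<sourcecode type="python"><![CDATA[
# Condensed sketch of the eBGP8277 control-plane walk of this section.
IMPLICIT_NULL = "Implicit-Null"
SRGB_BASE = 16000

def receive_update(node_as, update):
    """Return (local FIB entry, update re-advertised upstream)."""
    incoming = SRGB_BASE + update["label_index"]  # from the BGP-Prefix-SID
    outgoing = update["label"]                    # from the eBGP8277 NLRI
    fib = {"prefix": update["prefix"], "in": incoming, "out": outgoing}
    readvertised = dict(update, label=incoming,
                        as_path=[node_as] + update["as_path"])
    return fib, readvertised

# Node11 originates its loopback with Implicit-Null and label-index 11.
update = {"prefix": "192.0.2.11/32", "label": IMPLICIT_NULL,
          "label_index": 11, "as_path": [11]}

for node_as in (10, 7, 4, 1):   # the path Node10 -> Node7 -> Node4 -> Node1
    fib, update = receive_update(node_as, update)
    print(f"Node{node_as}: in={fib['in']} out={fib['out']}")
# Node10: in=16011 out=Implicit-Null  (penultimate hop: pop, then forward)
# Node7: in=16011 out=16011
# Node4: in=16011 out=16011
# Node1: in=16011 out=16011
]]></sourcecode>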
|
370 | </section>
|
371 | <section anchor="DATAPLANE" numbered="true" toc="default">
|
372 | <name>Data Plane</name>
|
373 | <t>Referring to <xref target="FIGLARGE" format="default"/>, and assuming all nodes
|
374 | apply the same advertisement rules described above and all nodes
|
375 | have the same SRGB (16000-23999), here are the IP/MPLS forwarding
|
376 | tables for prefix 192.0.2.11/32 at Node1, Node4, Node7 and
|
377 | Node10.</t>
|
378 | <table anchor="NODE1FIB" align="center">
|
379 | <name>Node1 Forwarding Table</name>
|
380 | <thead>
|
381 | <tr>
|
382 | <th align="center">Incoming label or IP destination</th>
|
383 | <th align="center">Outgoing label</th>
|
384 | <th align="center">Outgoing Interface</th>
|
385 | </tr>
|
386 | </thead>
|
387 | <tbody>
|
388 | <tr>
|
389 | <td align="center">16011</td>
|
390 | <td align="center">16011</td>
|
391 | <td align="center">ECMP{3, 4}</td>
|
392 | </tr>
|
393 | <tr>
|
394 | <td align="center">192.0.2.11/32</td>
|
395 | <td align="center">16011</td>
|
396 | <td align="center">ECMP{3, 4}</td>
|
397 | </tr>
|
398 | </tbody>
|
399 | </table>
|
400 |
|
401 | <table anchor="NODE4FIB" align="center">
|
402 | <name>Node4 Forwarding Table</name>
|
403 | <thead>
|
404 | <tr>
|
405 | <th align="center">Incoming label or IP destination</th>
|
406 | <th align="center">Outgoing label</th>
|
407 | <th align="center">Outgoing Interface</th>
|
408 | </tr>
|
409 | </thead>
|
410 | <tbody>
|
411 | <tr>
|
412 | <td align="center">16011</td>
|
413 | <td align="center">16011</td>
|
414 | <td align="center">ECMP{7, 8}</td>
|
415 | </tr>
|
416 | <tr>
|
417 | <td align="center">192.0.2.11/32</td>
|
418 | <td align="center">16011</td>
|
419 | <td align="center">ECMP{7, 8}</td>
|
420 | </tr>
|
421 | </tbody>
|
422 | </table>
|
423 |
|
424 | <table anchor="NODE7FIB" align="center">
|
425 | <name>Node7 Forwarding Table</name>
|
426 | <thead>
|
427 | <tr>
|
428 | <th align="center">Incoming label or IP destination</th>
|
429 | <th align="center">Outgoing label</th>
|
430 | <th align="center">Outgoing Interface</th>
|
431 | </tr>
|
432 | </thead>
|
433 | <tbody>
|
434 | <tr>
|
435 | <td align="center">16011</td>
|
436 | <td align="center">16011</td>
|
437 | <td align="center">10</td>
|
438 | </tr>
|
439 | <tr>
|
440 | <td align="center">192.0.2.11/32</td>
|
441 | <td align="center">16011</td>
|
442 | <td align="center">10</td>
|
443 | </tr>
|
444 | </tbody>
|
445 | </table>
|
446 |
|
<table anchor="NODE10FIB" align="center">
<name>Node10 Forwarding Table</name>
|
449 | <thead>
|
450 | <tr>
|
451 | <th align="center">Incoming label or IP destination</th>
|
452 | <th align="center">Outgoing label</th>
|
453 | <th align="center">Outgoing Interface</th>
|
454 | </tr>
|
455 | </thead>
|
456 | <tbody>
|
457 | <tr>
|
458 | <td align="center">16011</td>
|
459 | <td align="center">POP</td>
|
460 | <td align="center">11</td>
|
461 | </tr>
|
462 | <tr>
|
463 | <td align="center">192.0.2.11/32</td>
|
464 | <td align="center">N/A</td>
|
465 | <td align="center">11</td>
|
466 | </tr>
|
467 | </tbody>
|
468 | </table>
|
469 | </section>
|
470 | <section anchor="VARIATIONS" numbered="true" toc="default">
|
471 | <name>Network Design Variation</name>
|
472 | <t>A network design choice could consist of switching all the
|
473 | traffic through Tier-1 and Tier-2 as MPLS traffic. In this case, one
|
474 | could filter away the IP entries at Node4, Node7 and Node10. This
|
475 | might be beneficial in order to optimize the forwarding table
|
476 | size.</t>
|
<t>A network design choice could consist of allowing the hosts to
send MPLS-encapsulated traffic based on the Egress Peer Engineering
(EPE) use-case, as defined in <xref target="I-D.ietf-spring-segment-routing-central-epe" format="default"/>. For example,
|
480 | applications at HostA would send their Z-destined traffic to Node1
|
481 | with an MPLS label stack where the top label is 16011 and the next
|
482 | label is an EPE peer segment (<xref target="I-D.ietf-spring-segment-routing-central-epe" format="default"/>) at Node11
|
483 | directing the traffic to Z.</t>
|
484 | </section>
|
485 | <section anchor="FABRIC" numbered="true" toc="default">
|
486 | <name>Global BGP Prefix Segment through the fabric</name>
|
487 | <t>When the previous design is deployed, the operator enjoys global
|
488 | BGP-Prefix-SID and label allocation throughout the DC fabric.</t>
|
489 | <t>A few examples follow:</t>
|
490 | <ul spacing="normal">
|
<li>Normal forwarding to Node11: a packet with top label 16011
received by any node in the fabric will be forwarded along the
ECMP-aware BGP best-path towards Node11, and the label 16011 is
penultimate-popped at Node10 (or at Node9).</li>
<li>Traffic-engineered path to Node11: an application on a host
behind Node1 might want to restrict its traffic to paths via the
Spine node Node5. The application achieves this by sending its
packets with a label stack of {16005, 16011}. BGP-Prefix-SID
16005 directs the packet up to Node5 along the path (Node1,
Node3, Node5). BGP-Prefix-SID 16011 then directs the packet down
to Node11 along the path (Node5, Node9, Node11). A sketch of this
label-stack steering follows the list.</li>
|
502 | </ul>
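<t>A hedged sketch of the traffic-engineered example follows. The FIB
entries mirror the tables of <xref target="DATAPLANE" format="default"/>; single next-hops are shown for
brevity where the real fabric would offer ECMP sets.</t>
<sourcecode type="python"><![CDATA[
# Sketch of how the stack {16005, 16011} steers a packet from Node1
# to Node11 via Node5 (path Node1-Node3-Node5-Node9-Node11).
POP = "POP"

FIB = {  # (node, top label) -> (outgoing label, next hop)
    ("Node1", 16005): (16005, "Node3"),
    ("Node3", 16005): (POP, "Node5"),    # penultimate hop for SID 16005
    ("Node5", 16011): (16011, "Node9"),
    ("Node9", 16011): (POP, "Node11"),   # penultimate hop for SID 16011
}

def forward(node, stack):
    while stack:
        out_label, next_hop = FIB[(node, stack[0])]
        stack = stack[1:] if out_label == POP else [out_label] + stack[1:]
        print(f"{node} -> {next_hop}, remaining stack {stack}")
        node = next_hop
    return node

forward("Node1", [16005, 16011])
# Node1 -> Node3, remaining stack [16005, 16011]
# Node3 -> Node5, remaining stack [16011]
# Node5 -> Node9, remaining stack [16011]
# Node9 -> Node11, remaining stack []
]]></sourcecode>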
|
503 | </section>
|
504 | <section anchor="INCRDEP" numbered="true" toc="default">
|
505 | <name>Incremental Deployments</name>
|
506 | <t>The design previously described can be deployed incrementally.
|
507 | Let us assume that Node7 does not support the BGP-Prefix-SID and let
|
508 | us show how the fabric connectivity is preserved.</t>
|
509 | <t>From a signaling viewpoint, nothing would change: even though
|
510 | Node7 does not support the BGP-Prefix-SID, it does propagate the
|
511 | attribute unmodified to its neighbors.</t>
|
<t>From a label allocation viewpoint, the only difference is that
Node7 would allocate a dynamic (random) label to the prefix
192.0.2.11/32 (e.g., 123456) instead of the "hinted" label as
instructed by the BGP-Prefix-SID. The neighbors of Node7 adapt
automatically as they always use the label in the eBGP8277 NLRI as
outgoing label.</t>
|
518 | <t>Node4 does understand the BGP-Prefix-SID and hence allocates the
|
519 | indexed label in the SRGB (16011) for 192.0.2.11/32.</t>
|
520 | <t>As a result, all the data-plane entries across the network would
|
521 | be unchanged except the entries at Node7 and its neighbor Node4 as
|
522 | shown in the figures below.</t>
|
<t>The key point is that the end-to-end Label Switched Path (LSP) is
preserved because the outgoing label is always derived from the
received label within the eBGP8277 NLRI. The index in the
BGP-Prefix-SID is only used as a hint on how to allocate the local
label (the incoming label) but never for the outgoing label.</t>
|
528 | <table anchor="NODE7FIBINC" align="center">
|
529 | <name>Node7 Forwarding Table</name>
|
530 | <thead>
|
531 | <tr>
|
532 | <th align="center">Incoming label or IP destination</th>
|
533 | <th align="center">Outgoing label</th>
|
534 | <th align="center">Outgoing interface</th>
|
535 | </tr>
|
536 | </thead>
|
537 | <tbody>
|
538 | <tr>
|
<td align="center">123456</td>
|
540 | <td align="center">16011</td>
|
541 | <td align="center">10</td>
|
542 | </tr>
|
543 | </tbody>
|
544 | </table>
|
545 | <table anchor="NODE4FIBINC" align="center">
|
546 | <name>Node4 Forwarding Table</name>
|
547 | <thead>
|
548 | <tr>
|
549 | <th align="center">Incoming label or IP destination</th>
|
550 | <th align="center">Outgoing label</th>
|
551 | <th align="center">Outgoing interface</th>
|
552 | </tr>
|
553 | </thead>
|
554 | <tbody>
|
555 | <tr>
|
556 | <td align="center">16011</td>
|
<td align="center">123456</td>
|
558 | <td align="center">7</td>
|
559 | </tr>
|
560 | </tbody>
|
561 | </table>
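<t>The same hedged sketch as before, extended with a legacy node,
reproduces these two tables (the dynamic label value 123456 is the
example from the text above):</t>
<sourcecode type="python"><![CDATA[
# Variation of the earlier control-plane sketch with Node7 not
# supporting the BGP-Prefix-SID: Node7 allocates a dynamic label, yet
# the LSP survives because every node keeps using the label received
# in the NLRI as its outgoing label.
SRGB_BASE = 16000
DYNAMIC_LABEL = 123456       # example dynamic label from the text

def receive_update(node_as, update, sr_capable=True):
    incoming = SRGB_BASE + update["label_index"] if sr_capable else DYNAMIC_LABEL
    fib = {"in": incoming, "out": update["label"]}
    return fib, dict(update, label=incoming)

update = {"prefix": "192.0.2.11/32", "label": "Implicit-Null",
          "label_index": 11}
for node_as in (10, 7, 4, 1):
    fib, update = receive_update(node_as, update, sr_capable=(node_as != 7))
    print(f"Node{node_as}: in={fib['in']} out={fib['out']}")
# Node10: in=16011 out=Implicit-Null
# Node7: in=123456 out=16011    <- dynamic local label only
# Node4: in=16011 out=123456    <- neighbor adapts automatically
# Node1: in=16011 out=16011
]]></sourcecode>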
|
562 | <t>The BGP-Prefix-SID can thus be deployed incrementally one node at
|
563 | a time.</t>
|
564 | <t>When deployed together with a homogeneous SRGB (same SRGB across
|
565 | the fabric), the operator incrementally enjoys the global prefix
|
566 | segment benefits as the deployment progresses through the
|
567 | fabric.</t>
|
568 | </section>
|
569 | </section>
|
570 | <section anchor="iBGP3107" numbered="true" toc="default">
|
571 | <name>iBGP Labeled Unicast (RFC8277)</name>
|
<t>The same design as eBGP8277 is used, with the following
modifications:</t>
|
574 | <ul empty="true" spacing="normal">
|
575 | <li>All nodes use the same AS number.</li>
|
576 | <li>Each node peers with its neighbors via an internal BGP session
|
577 | (iBGP) with extensions defined in <xref target="RFC8277" format="default"/> (named
|
578 | "iBGP8277" throughout this document).</li>
|
<li>Each node acts as a route-reflector for each of its neighbors
and with the next-hop-self option. Next-hop-self is a well-known
operational feature which consists of rewriting the next-hop of a
BGP update prior to sending it to the neighbor. It is
common practice to apply next-hop-self behavior towards iBGP peers
for eBGP-learned routes. In the case outlined in this section, it
is proposed to use the next-hop-self mechanism also for iBGP-learned
routes.</li>
|
587 | <li>
|
588 | <figure anchor="IBGPFIG">
|
589 | <name>iBGP Sessions with Reflection and Next-Hop-Self</name>
|
590 | <artwork name="" type="" align="left" alt=""><![CDATA[
|
591 | Cluster-1
|
592 | +-----------+
|
593 | | Tier-1 |
|
594 | | +-----+ |
|
595 | | |NODE | |
|
596 | | | 5 | |
|
597 | Cluster-2 | +-----+ | Cluster-3
|
598 | +---------+ | | +---------+
|
599 | | Tier-2 | | | | Tier-2 |
|
600 | | +-----+ | | +-----+ | | +-----+ |
|
601 | | |NODE | | | |NODE | | | |NODE | |
|
602 | | | 3 | | | | 6 | | | | 9 | |
|
603 | | +-----+ | | +-----+ | | +-----+ |
|
604 | | | | | | |
|
605 | | | | | | |
|
606 | | +-----+ | | +-----+ | | +-----+ |
|
607 | | |NODE | | | |NODE | | | |NODE | |
|
608 | | | 4 | | | | 7 | | | | 10 | |
|
609 | | +-----+ | | +-----+ | | +-----+ |
|
610 | +---------+ | | +---------+
|
611 | | |
|
612 | | +-----+ |
|
613 | | |NODE | |
|
614 | Tier-3 | | 8 | | Tier-3
|
615 | +-----+ +-----+ | +-----+ | +-----+ +-----+
|
616 | |NODE | |NODE | +-----------+ |NODE | |NODE |
|
617 | | 1 | | 2 | | 11 | | 12 |
|
618 | +-----+ +-----+ +-----+ +-----+
|
619 | ]]></artwork>
|
620 | </figure>
|
621 | </li>
|
622 | <li>
|
623 | <t>For simple and efficient route propagation filtering and as
|
624 | illustrated in <xref target="IBGPFIG" format="default"/>: </t>
|
625 | <ul spacing="normal">
|
626 | <li>Node5, Node6, Node7 and Node8 use the same Cluster ID
|
627 | (Cluster-1)</li>
|
628 | <li>Node3 and Node4 use the same Cluster ID (Cluster-2)</li>
|
629 | <li>Node9 and Node10 use the same Cluster ID (Cluster-3)</li>
|
630 | </ul>
|
631 | </li>
|
632 | <li>The control-plane behavior is mostly the same as described in
|
633 | the previous section: the only difference is that the eBGP8277
|
634 | path propagation is simply replaced by an iBGP8277 path reflection
|
635 | with next-hop changed to self.</li>
|
636 | <li>The data-plane tables are exactly the same.</li>
|
637 | </ul>
|
638 | </section>
|
639 | </section>
|
640 | <section anchor="IPV6" numbered="true" toc="default">
|
641 | <name>Applying Segment Routing in the DC with IPv6 dataplane</name>
|
<t>The design described in <xref target="RFC7938" format="default"/> is reused with a
single modification. It is highlighted using the example of the
reachability to Node11 via spine node Node5.</t>
|
645 | <t>Node5 originates 2001:DB8::5/128 with the attached BGP-Prefix-SID for
|
646 | IPv6 packets destined to segment 2001:DB8::5 (<xref target="I-D.ietf-idr-bgp-prefix-sid" format="default"/>).</t>
|
647 | <t>Node11 originates 2001:DB8::11/128 with the attached BGP-Prefix-SID
|
648 | advertising the support of the SRH for IPv6 packets destined to segment
|
649 | 2001:DB8::11.</t>
|
650 | <t>The control-plane and data-plane processing of all the other nodes in
|
651 | the fabric is unchanged. Specifically, the routes to 2001:DB8::5 and
|
652 | 2001:DB8::11 are installed in the FIB along the eBGP best-path to Node5
|
653 | (spine node) and Node11 (ToR node) respectively.</t>
|
<t>An application on HostA that needs to send traffic to HostZ via only
Node5 (spine node) can do so by sending IPv6 packets with a Segment
Routing Header (SRH, <xref target="I-D.ietf-6man-segment-routing-header" format="default"/>). The destination
address, which is the active segment, is set to 2001:DB8::5. The next
and last segment is set to 2001:DB8::11.</t>
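<t>The following hedged sketch illustrates these SRH semantics for the
example above (per <xref target="I-D.ietf-6man-segment-routing-header" format="default"/>, the segment list is encoded in
reverse order and the Segments Left field indexes the active
segment):</t>
<sourcecode type="python"><![CDATA[
# Sketch of the SRH segment-list semantics used in the example above.
def build_srh_packet(segments):
    """segments: waypoints in forwarding order, e.g. [Node5, Node11]."""
    seg_list = list(reversed(segments))   # the SRH stores them reversed
    segments_left = len(seg_list) - 1     # index of the active segment
    return {"da": seg_list[segments_left], "srh": seg_list,
            "segments_left": segments_left}

def advance_segment(pkt):
    """Executed by the node whose address is the current destination."""
    pkt["segments_left"] -= 1
    pkt["da"] = pkt["srh"][pkt["segments_left"]]
    return pkt

pkt = build_srh_packet(["2001:DB8::5", "2001:DB8::11"])
print(pkt["da"], pkt["segments_left"])   # 2001:DB8::5 1
pkt = advance_segment(pkt)               # performed at Node5
print(pkt["da"], pkt["segments_left"])   # 2001:DB8::11 0
]]></sourcecode>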
|
<t>The application must only use IPv6 addresses that have been
advertised as capable of SRv6 segment processing (i.e., for which the
BGP prefix segment capability has been advertised). How applications
learn this (e.g., centralized controller and orchestration) is outside
the scope of this document.</t>
|
664 | </section>
|
665 | <section anchor="COMMHOSTS" numbered="true" toc="default">
|
666 | <name>Communicating path information to the host</name>
|
<t>There are two general methods for communicating path information to
the end-hosts: "proactive" and "reactive", also known as "push" and
"pull" models. There are multiple ways to implement either of these
methods. Here, it is noted that one way could be using a centralized
controller: the controller either tells the hosts of the prefix-to-path
mappings beforehand and updates them as needed (network event driven
push), or responds to the hosts making requests for a path to a specific
destination (host event driven pull). It is also possible to use a
hybrid model, i.e., pushing some state from the controller in response
to particular network events, while the host pulls other state on
demand.</t>
|
<t>It is also noted that, when disseminating network-related data to the
end-hosts, a trade-off is made to balance the amount of information
versus the level of visibility in the network state. This applies both
to push and pull models. In the extreme case, the host would request
path information on every flow, and keep no local state at all. On the
other end of the spectrum, information for every prefix in the network
along with available paths could be pushed and continuously updated on
all hosts.</t>
|
685 | </section>
|
686 | <section anchor="BENEFITS" numbered="true" toc="default">
|
687 | <name>Additional Benefits</name>
|
688 | <section anchor="MPLSIMPLE" numbered="true" toc="default">
|
689 | <name>MPLS Dataplane with operational simplicity</name>
|
690 | <t>As required by <xref target="RFC7938" format="default"/>, no new signaling protocol
|
691 | is introduced. The BGP-Prefix-SID is a lightweight extension to BGP
|
692 | Labeled Unicast <xref target="RFC8277" format="default"/>. It applies either to eBGP or
|
693 | iBGP based designs.</t>
|
694 | <t>Specifically, LDP and RSVP-TE are not used. These protocols would
|
695 | drastically impact the operational complexity of the Data Center and
|
696 | would not scale. This is in line with the requirements expressed in
|
697 | <xref target="RFC7938" format="default"/>.</t>
|
<t>Provided the same SRGB is configured on all nodes, all nodes use
the same MPLS label for a given IP prefix. This is simpler from an
operational standpoint, as discussed in <xref target="SINGLESRGB" format="default"/>.</t>
|
701 | </section>
|
702 | <section anchor="MINFIB" numbered="true" toc="default">
|
703 | <name>Minimizing the FIB table</name>
|
<t>The designer may decide to switch all the traffic at Tier-1 and
Tier-2 based on MPLS, hence drastically decreasing the IP table size
at these nodes.</t>
<t>This is easily accomplished by encapsulating the traffic either
directly at the host or at the source ToR node, by pushing the
BGP-Prefix-SID of the destination ToR for intra-DC traffic, or the
BGP-Prefix-SID of the border node for inter-DC or
DC-to-outside-world traffic.</t>
|
712 | </section>
|
713 | <section anchor="EPE" numbered="true" toc="default">
|
714 | <name>Egress Peer Engineering</name>
|
715 | <t>It is straightforward to combine the design illustrated in this
|
716 | document with the Egress Peer Engineering (EPE) use-case described in
|
717 | <xref target="I-D.ietf-spring-segment-routing-central-epe" format="default"/>.</t>
|
<t>In such a case, the operator is able to engineer its outbound traffic
on a per host-flow basis, without incurring any additional state at
intermediate points in the DC fabric.</t>
|
<t>For example, the controller only needs to inject a per-flow state
on HostA to force it to send its traffic destined to a specific
Internet destination D via a selected border node (say Node12 in <xref target="FIGLARGE" format="default"/> instead of another border node, Node11) and a
specific egress peer of Node12 (say peer AS 9999 of local PeerNode
segment 9999 at Node12 instead of any other peer which provides a path
to the destination D). Any packet matching this state at HostA would
be encapsulated with the SR segment list (label stack) {16012, 9999}.
16012 would steer the flow through the DC fabric, leveraging any ECMP,
along the best path to border node Node12. Once the flow gets to
border node Node12, the active segment is 9999 (because of PHP on the
upstream neighbor of Node12). This EPE PeerNode segment forces border
node Node12 to forward the packet to peer AS 9999, without any IP
lookup at the border node. There is no per-flow state for this
engineered flow in the DC fabric. A benefit of segment routing is that
the per-flow state is only required at the source.</t>
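<t>A hedged sketch of that host-local state follows; the policy table,
destination key, and encapsulation helper are purely illustrative:</t>
<sourcecode type="python"><![CDATA[
# Sketch of the per-flow steering state the controller would install
# at HostA; the fabric itself holds no per-flow state.
host_a_policies = {
    "D": [16012, 9999],  # 16012: BGP-Prefix-SID of border node Node12
                         # 9999:  EPE PeerNode segment at Node12
}

def encapsulate(dest, payload):
    """Push the policy's label stack; default to plain forwarding."""
    return {"labels": list(host_a_policies.get(dest, [])),
            "payload": payload}

pkt = encapsulate("D", b"flow towards Internet destination D")
print(pkt["labels"])  # [16012, 9999]: PHP pops 16012 before Node12,
                      # leaving 9999 to select the egress peer
]]></sourcecode>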
|
<t>As well as allowing full traffic engineering control, such a design
also offers FIB table minimization benefits, as the Internet-scale FIB
at border node Node12 is not required if all FIB lookups are avoided
there by using EPE.</t>
|
740 | </section>
|
741 | <section anchor="ANYCAST" numbered="true" toc="default">
|
742 | <name>Anycast</name>
|
743 | <t>The design presented in this document preserves the availability
|
744 | and load-balancing properties of the base design presented in <xref target="I-D.ietf-spring-segment-routing" format="default"/>.</t>
|
<t>For example, one could assign an anycast loopback 192.0.2.20/32 and
associate segment index 20 with it on the border nodes Node11 and Node12
(in addition to their node-specific loopbacks). Doing so, the EPE
|
748 | controller could express a default "go-to-the-Internet via any border
|
749 | node" policy as segment list {16020}. Indeed, from any host in the DC
|
750 | fabric or from any ToR node, 16020 steers the packet towards the
|
751 | border Node11 or Node12 leveraging ECMP where available along the best
|
752 | paths to these nodes.</t>
|
753 | </section>
|
754 | </section>
|
755 | <section anchor="SINGLESRGB" numbered="true" toc="default">
|
756 | <name>Preferred SRGB Allocation</name>
|
<t>In the MPLS case, it is recommended to use the same SRGB on all nodes.</t>
|
758 | <t>Different SRGBs in each node likely increase the complexity of the
|
759 | solution both from an operational viewpoint and from a controller
|
760 | viewpoint.</t>
|
<t>From an operational viewpoint, it is much simpler to have the same
global label at every node for the same destination (the MPLS
troubleshooting is then similar to the IPv6 troubleshooting, where this
global property is a given).</t>
|
765 | <t>From a controller viewpoint, this allows us to construct simple
|
766 | policies applicable across the fabric.</t>
|
767 | <t>Let us consider two applications A and B respectively connected to
|
768 | Node1 and Node2 (ToR nodes). A has two flows FA1 and FA2 destined to Z.
|
769 | B has two flows FB1 and FB2 destined to Z. The controller wants FA1 and
|
770 | FB1 to be load-shared across the fabric while FA2 and FB2 must be
|
771 | respectively steered via Node5 and Node8.</t>
|
<t>Assuming a consistent unique SRGB across the fabric, as described in
this document, the controller can simply do so by instructing A and B to
use {16011} respectively for FA1 and FB1 and by instructing A and B to
use {16005, 16011} and {16008, 16011} respectively for FA2 and FB2.</t>
|
776 | <t>Let us assume a design where the SRGB is different at every node and
|
777 | where the SRGB of each node is advertised using the Originator SRGB TLV
|
778 | of the BGP-Prefix-SID as defined in <xref target="I-D.ietf-idr-bgp-prefix-sid" format="default"/>: SRGB of Node K starts at value
|
779 | K*1000 and the SRGB length is 1000 (e.g. Node1's SRGB is [1000,
|
780 | 1999], Node2's SRGB is [2000, 2999], ...).</t>
|
<t>In this case, not only would the controller need to collect and store
all of these different SRGBs (e.g., through the Originator SRGB
TLV of the BGP-Prefix-SID), it would also need to adapt the
policy for each host. Indeed, the controller would instruct A to use
{1011} for FA1 while it would have to instruct B to use {2011} for FB1
(while with the same SRGB, both policies are the same {16011}).</t>
|
<t>Even worse, the controller would instruct A to use {1005, 5011} for
FA2 while it would instruct B to use {2008, 8011} for FB2 (while with
the same SRGB, the second segment is the same across both policies:
16011). When combining segments to create a policy, one needs to
carefully update the label of each segment. This is obviously more
error-prone, more complex and more difficult to troubleshoot.</t>
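<t>A hedged sketch of the two label-stack computations follows; the
per-node SRGB layout (Node K's SRGB starting at K*1000) is the
hypothetical one described above:</t>
<sourcecode type="python"><![CDATA[
# Sketch contrasting stack computation with a uniform SRGB versus
# per-node SRGBs; each label must come from the SRGB of the node that
# will interpret it (the source first, then each waypoint).
UNIFORM_BASE = 16000

def per_node_base(node):
    return node * 1000          # Node K's SRGB starts at K*1000

def stack_uniform(src, waypoint_indexes):
    return [UNIFORM_BASE + idx for idx in waypoint_indexes]

def stack_per_node(src, waypoint_indexes):
    labels, interpreter = [], src
    for idx in waypoint_indexes:
        labels.append(per_node_base(interpreter) + idx)
        interpreter = idx       # the next label is read at this waypoint
    return labels

# FA2: from Node1 via Node5 to Node11; FB2: from Node2 via Node8.
print(stack_uniform(1, [5, 11]), stack_uniform(2, [8, 11]))
# [16005, 16011] [16008, 16011] -> identical second segment everywhere
print(stack_per_node(1, [5, 11]), stack_per_node(2, [8, 11]))
# [1005, 5011] [2008, 8011]     -> every label depends on the path
]]></sourcecode>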
|
793 | </section>
|
794 | <section anchor="IANA" numbered="true" toc="default">
|
795 | <name>IANA Considerations</name>
|
796 | <t>This document does not make any IANA request.</t>
|
797 | </section>
|
798 | <section anchor="MANAGE" numbered="true" toc="default">
|
799 | <name>Manageability Considerations</name>
|
800 | <t>The design and deployment guidelines described in this document are
|
801 | based on the network design described in <xref target="RFC7938" format="default"/>.</t>
|
<t>The deployment model assumed in this document is based on a single
domain where the interconnected DCs are part of the same administrative
domain (which, of course, is split into different autonomous systems).
The operator has full control of the whole domain, and the usual
operational and management mechanisms and procedures are used in order
to prevent any information related to internal prefixes and topology
from being leaked outside the domain.</t>
|
809 | <t>As recommended in <xref target="I-D.ietf-spring-segment-routing" format="default"/>,
|
810 | the same SRGB should be allocated in all nodes in order to facilitate
|
811 | the design, deployment and operations of the domain.</t>
|
<t>When EPE (<xref target="I-D.ietf-spring-segment-routing-central-epe" format="default"/>) is used (as
explained in <xref target="EPE" format="default"/>), the same operational model is
assumed. EPE information is originated and propagated throughout the
domain towards an internal server, and unless explicitly configured by
the operator, no EPE information is leaked outside the domain
boundaries.</t>
|
818 | </section>
|
819 | <section anchor="SEC" numbered="true" toc="default">
|
820 | <name>Security Considerations</name>
|
<t>This document proposes to apply Segment Routing to a well-known
|
822 | scalability requirement expressed in <xref target="RFC7938" format="default"/> using the
|
823 | BGP-Prefix-SID as defined in <xref target="I-D.ietf-idr-bgp-prefix-sid" format="default"/>.</t>
|
<t>It has to be noted, as described in <xref target="MANAGE" format="default"/>, that the
designs illustrated in <xref target="RFC7938" format="default"/> and in this document
refer to a deployment model where all nodes are under the same
administration. In this context, it is assumed that the operator doesn't
want to leak outside of the domain any information related to internal
prefixes and topology. The internal information includes Prefix-SID and
EPE information. In order to prevent such leaking, the standard BGP
mechanisms (filters) are applied on the boundary of the domain.</t>
|
<t>Therefore, the solution proposed in this document does not introduce
any additional security concerns beyond those expressed in <xref target="RFC7938" format="default"/> and <xref target="I-D.ietf-idr-bgp-prefix-sid" format="default"/>. It
is assumed that the security and confidentiality of the prefix and
topology information is preserved by outbound filters at each peering
point of the domain as described in <xref target="MANAGE" format="default"/>.</t>
|
837 | </section>
|
838 | <section anchor="Acknowledgements" numbered="true" toc="default">
|
839 | <name>Acknowledgements</name>
|
840 | <t>The authors would like to thank Benjamin Black, Arjun Sreekantiah,
|
841 | Keyur Patel, Acee Lindem and Anoop Ghanwani for their comments and
|
842 | review of this document.</t>
|
843 | </section>
|
844 | <section anchor="Contributors" numbered="true" toc="default">
|
845 | <name>Contributors</name>
|
846 | <artwork><![CDATA[
|
847 | Gaya Nagarajan
|
848 | Facebook
|
849 | US
|
850 |
|
851 | Email: gaya@fb.com
|
852 |
|
853 |
|
854 | Gaurav Dawra
|
855 | Cisco Systems
|
856 | US
|
857 |
|
858 | Email: gdawra.ietf@gmail.com
|
859 |
|
860 |
|
861 | Dmitry Afanasiev
|
862 | Yandex
|
863 | RU
|
864 |
|
865 | Email: fl0w@yandex-team.ru
|
866 |
|
867 |
|
868 | Tim Laberge
|
869 | Cisco
|
870 | US
|
871 |
|
872 | Email: tlaberge@cisco.com
|
873 |
|
874 |
|
875 | Edet Nkposong
|
876 | Salesforce.com Inc.
|
877 | US
|
878 |
|
879 | Email: enkposong@salesforce.com
|
880 |
|
881 |
|
882 | Mohan Nanduri
|
883 | Microsoft
|
884 | US
|
885 |
|
886 | Email: mnanduri@microsoft.com
|
887 |
|
888 |
|
889 | James Uttaro
|
890 | ATT
|
891 | US
|
892 |
|
893 | Email: ju1738@att.com
|
894 |
|
895 |
|
896 | Saikat Ray
|
897 | Unaffiliated
|
898 | US
|
899 |
|
900 | Email: raysaikat@gmail.com
|
901 |
|
902 | Jon Mitchell
|
903 | Unaffiliated
|
904 | US
|
905 |
|
906 | Email: jrmitche@puck.nether.net
|
907 | ]]></artwork>
|
908 |
|
909 | </section>
|
910 | </middle>
|
911 | <back>
|
912 | <references>
|
913 | <name>References</name>
|
914 | <references>
|
915 | <name>Normative References</name>
|
916 |
|
917 | <reference anchor="RFC2119"
|
918 | target="https://www.rfc-editor.org/info/rfc2119"
|
919 | xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
|
920 | <front>
|
921 | <title>Key words for use in RFCs to Indicate Requirement
|
922 | Levels</title>
|
923 | <seriesInfo name="DOI" value="10.17487/RFC2119"/>
|
924 | <seriesInfo name="RFC" value="2119"/>
|
925 | <seriesInfo name="BCP" value="14"/>
|
926 | <author initials="S." surname="Bradner" fullname="S. Bradner">
|
927 | <organization/>
|
928 | </author>
|
929 | <date year="1997" month="March"/>
|
930 | <abstract>
|
931 | <t>In many standards track documents several words are used to
|
932 | signify the requirements in the specification. These words are
|
933 | often capitalized. This document defines these words as they
|
934 | should be interpreted in IETF documents. This document
|
935 | specifies an Internet Best Current Practices for the Internet
|
936 | Community, and requests discussion and suggestions for improvements.</t>
|
937 | </abstract>
|
938 | </front>
|
939 | </reference>
|
940 | <reference anchor="RFC8277"
|
941 | target="https://www.rfc-editor.org/info/rfc8277"
|
942 | xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8277.xml">
|
943 | <front>
|
944 | <title>Using BGP to Bind MPLS Labels to Address Prefixes</title>
|
945 | <seriesInfo name="DOI" value="10.17487/RFC8277"/>
|
946 | <seriesInfo name="RFC" value="8277"/>
|
947 | <author initials="E." surname="Rosen" fullname="E. Rosen">
|
948 | <organization/>
|
949 | </author>
|
950 | <date year="2017" month="October"/>
|
951 | <abstract>
|
952 | <t>This document specifies a set of procedures for using BGP to
|
953 | advertise that a specified router has bound a specified MPLS
|
954 | label (or a specified sequence of MPLS labels organized as a
|
955 | contiguous part of a label stack) to a specified address prefix.
|
956 | This can be done by sending a BGP UPDATE message whose Network
|
957 | Layer Reachability Information field contains both the prefix
|
958 | and the MPLS label(s) and whose Next Hop field identifies the
|
959 | node at which said prefix is bound to said label(s). This
|
960 | document obsoletes RFC 3107.</t>
|
961 | </abstract>
|
962 | </front>
|
963 | </reference>
|
964 | <reference anchor="RFC4271"
|
965 | target="https://www.rfc-editor.org/info/rfc4271"
|
966 | xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4271.xml">
|
967 | <front>
|
968 | <title>A Border Gateway Protocol 4 (BGP-4)</title>
|
969 | <seriesInfo name="DOI" value="10.17487/RFC4271"/>
|
970 | <seriesInfo name="RFC" value="4271"/>
|
971 | <author initials="Y." surname="Rekhter" fullname="Y. Rekhter" role="editor">
|
972 | <organization/>
|
973 | </author>
|
974 | <author initials="T." surname="Li" fullname="T. Li" role="editor">
|
975 | <organization/>
|
976 | </author>
|
977 | <author initials="S." surname="Hares" fullname="S. Hares" role="editor">
|
978 | <organization/>
|
979 | </author>
|
980 | <date year="2006" month="January"/>
|
981 | <abstract>
|
982 | <t>This document discusses the Border Gateway Protocol (BGP),
|
983 | which is an inter-Autonomous System routing protocol.</t>
|
984 | <t>The primary function of a BGP speaking system is to exchange
|
985 | network reachability information with other BGP systems. This
|
986 | network reachability information includes information on the
|
987 | list of Autonomous Systems (ASes) that reachability information
|
988 | traverses. This information is sufficient for constructing a
|
989 | graph of AS connectivity for this reachability from which
|
990 | routing loops may be pruned, and, at the AS level, some policy
|
991 | decisions may be enforced.</t>
|
992 | <t>BGP-4 provides a set of mechanisms for supporting Classless
|
993 | Inter-Domain Routing (CIDR). These mechanisms include support
|
994 | for advertising a set of destinations as an IP prefix, and
|
995 | eliminating the concept of network "class" within BGP. BGP-4
|
996 | also introduces mechanisms that allow aggregation of routes,
|
997 | including aggregation of AS paths.</t>
|
998 | <t>This document obsoletes RFC 1771. [STANDARDS-TRACK]</t>
|
999 | </abstract>
|
1000 | </front>
|
1001 | </reference>
|
1002 |
|
1003 | <reference anchor="RFC7938"
|
1004 | target="https://www.rfc-editor.org/info/rfc7938"
|
1005 | xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.7938.xml">
|
1006 | <front>
|
1007 | <title>Use of BGP for Routing in Large-Scale Data Centers</title>
|
1008 | <seriesInfo name="DOI" value="10.17487/RFC7938"/>
|
1009 | <seriesInfo name="RFC" value="7938"/>
|
1010 | <author initials="P." surname="Lapukhov" fullname="P. Lapukhov">
|
1011 | <organization/>
|
1012 | </author>
|
1013 | <author initials="A." surname="Premji" fullname="A. Premji">
|
1014 | <organization/>
|
1015 | </author>
|
1016 | <author initials="J." surname="Mitchell" fullname="J. Mitchell" role="editor">
|
1017 | <organization/>
|
1018 | </author>
|
1019 | <date year="2016" month="August"/>
|
1020 | <abstract>
|
1021 | <t>Some network operators build and operate data centers that
|
1022 | support over one hundred thousand servers. In this document,
|
1023 | such data centers are referred to as "large-scale" to
|
1024 | differentiate them from smaller infrastructures. Environments
|
1025 | of this scale have a unique set of network requirements with an
|
1026 | emphasis on operational simplicity and network stability. This
|
1027 | document summarizes operational experience in designing and
|
1028 | operating large-scale data centers using BGP as the only routing
|
1029 | protocol. The intent is to report on a proven and stable
|
1030 | routing design that could be leveraged by others in the
|
1031 | industry.</t>
|
1032 | </abstract>
|
1033 | </front>
|
1034 | </reference>
|
1035 | <reference anchor="I-D.ietf-spring-segment-routing"
|
1036 | target="http://www.ietf.org/internet-drafts/draft-ietf-spring-segment-routing-15.txt">
|
1037 | <front>
|
1038 | <title>Segment Routing Architecture</title>
|
1039 | <seriesInfo name="Internet-Draft"
|
1040 | value="draft-ietf-spring-segment-routing-15"/>
|
1041 | <author initials="C" surname="Filsfils" fullname="Clarence Filsfils">
|
1042 | <organization/>
|
1043 | </author>
|
1044 | <author initials="S" surname="Previdi" fullname="Stefano Previdi">
|
1045 | <organization/>
|
1046 | </author>
|
1047 | <author initials="L" surname="Ginsberg" fullname="Les Ginsberg">
|
1048 | <organization/>
|
1049 | </author>
|
1050 | <author initials="B" surname="Decraene" fullname="Bruno Decraene">
|
1051 | <organization/>
|
1052 | </author>
|
1053 | <author initials="S" surname="Litkowski" fullname="Stephane Litkowski">
|
1054 | <organization/>
|
1055 | </author>
|
1056 | <author initials="R" surname="Shakir" fullname="Rob Shakir">
|
1057 | <organization/>
|
1058 | </author>
|
1059 | <date month="January" day="25" year="2018"/>
|
1060 | <abstract>
|
1061 | <t>Segment Routing (SR) leverages the source routing paradigm.
|
1062 | A node steers a packet through an ordered list of instructions,
|
1063 | called segments. A segment can represent any instruction,
|
1064 | topological or service-based. A segment can have a semantic
|
1065 | local to an SR node or global within an SR domain. SR allows to
|
1066 | enforce a flow through any topological path while maintaining
|
1067 | per-flow state only at the ingress nodes to the SR domain.
|
1068 | Segment Routing can be directly applied to the MPLS
|
1069 | architecture with no change on the forwarding plane. A segment
|
1070 | is encoded as an MPLS label. An ordered list of segments is
|
1071 | encoded as a stack of labels. The segment to process is on the
|
1072 | top of the stack. Upon completion of a segment, the related
|
1073 | label is popped from the stack. Segment Routing can be applied
|
1074 | to the IPv6 architecture, with a new type of routing header. A
|
1075 | segment is encoded as an IPv6 address. An ordered list of
|
1076 | segments is encoded as an ordered list of IPv6 addresses in the
|
1077 | routing header. The active segment is indicated by the
|
1078 | Destination Address of the packet. The next active segment is
|
1079 | indicated by a pointer in the new routing header.</t>
|
1080 | </abstract>
|
1081 | </front>
|
1082 | </reference>
|
1083 |
|
1084 | <reference anchor="I-D.ietf-idr-bgp-prefix-sid"
|
1085 | target="http://www.ietf.org/internet-drafts/draft-ietf-idr-bgp-prefix-sid-27.txt">
|
1086 | <front>
|
1087 | <title>Segment Routing Prefix SID extensions for BGP</title>
|
1088 | <seriesInfo name="Internet-Draft"
|
1089 | value="draft-ietf-idr-bgp-prefix-sid-27"/>
|
1090 | <author initials="S" surname="Previdi" fullname="Stefano Previdi">
|
1091 | <organization/>
|
1092 | </author>
|
1093 | <author initials="C" surname="Filsfils" fullname="Clarence Filsfils">
|
1094 | <organization/>
|
1095 | </author>
|
1096 | <author initials="A" surname="Lindem" fullname="Acee Lindem">
|
1097 | <organization/>
|
1098 | </author>
|
1099 | <author initials="A" surname="Sreekantiah" fullname="Arjun Sreekantiah">
|
1100 | <organization/>
|
1101 | </author>
|
1102 | <author initials="H" surname="Gredler" fullname="Hannes Gredler">
|
1103 | <organization/>
|
1104 | </author>
|
1105 | <date month="June" day="26" year="2018"/>
|
1106 | <abstract>
|
1107 | <t>Segment Routing (SR) leverages the source routing paradigm.
|
1108 | A node steers a packet through an ordered list of instructions,
|
1109 | called segments. A segment can represent any instruction,
|
1110 | topological or service-based. The ingress node prepends an SR
|
1111 | header to a packet containing a set of segment identifiers
|
1112 | (SID). Each SID represents a topological or a service-based
|
1113 | instruction. Per-flow state is maintained only on the ingress
|
1114 | node of the SR domain. An SR domain is defined as a single
|
1115 | administrative domain for global SID assignment. This document
|
1116 | defines an optional, transitive BGP attribute for announcing BGP
|
1117 | Prefix Segment Identifiers (BGP Prefix-SID) information and the
|
1118 | specification for SR-MPLS SIDs.</t>
|
1119 | </abstract>
|
1120 | </front>
|
1121 | </reference>
|
1122 | <reference anchor="I-D.ietf-spring-segment-routing-central-epe"
|
1123 | target="http://www.ietf.org/internet-drafts/draft-ietf-spring-segment-routing-central-epe-10.txt">
|
1124 | <front>
|
1125 | <title>Segment Routing Centralized BGP Egress Peer
|
1126 | Engineering</title>
|
1127 | <seriesInfo name="Internet-Draft"
|
1128 | value="draft-ietf-spring-segment-routing-central-epe-10"/>
|
1129 | <author initials="C" surname="Filsfils" fullname="Clarence Filsfils">
|
1130 | <organization/>
|
1131 | </author>
|
1132 | <author initials="S" surname="Previdi" fullname="Stefano Previdi">
|
1133 | <organization/>
|
1134 | </author>
|
1135 | <author initials="G" surname="Dawra" fullname="Gaurav Dawra">
|
1136 | <organization/>
|
1137 | </author>
|
1138 | <author initials="E" surname="Aries" fullname="Ebben Aries">
|
1139 | <organization/>
|
1140 | </author>
|
1141 | <author initials="D" surname="Afanasiev" fullname="Dmitry Afanasiev">
|
1142 | <organization/>
|
1143 | </author>
|
1144 | <date month="December" day="21" year="2017"/>
|
1145 | <abstract>
|
1146 | <t>Segment Routing (SR) leverages source routing. A node steers
|
1147 | a packet through a controlled set of instructions, called
|
1148 | segments, by prepending the packet with an SR header. A segment
|
1149 | can represent any instruction topological or service-based. SR
|
1150 | allows to enforce a flow through any topological path while
|
1151 | maintaining per-flow state only at the ingress node of the SR
|
1152 | domain. The Segment Routing architecture can be directly
|
1153 | applied to the MPLS dataplane with no change on the forwarding
|
1154 | plane. It requires a minor extension to the existing link-state
|
1155 | routing protocols. This document illustrates the application of
|
1156 | Segment Routing to solve the BGP Egress Peer Engineering
|
1157 | (BGP-EPE) requirement. The SR-based BGP-EPE solution allows a
|
1158 | centralized (Software Defined Network, SDN) controller to
|
1159 | program any egress peer policy at ingress border routers or at
|
1160 | hosts within the domain.</t>
|
1161 | </abstract>
|
1162 | </front>
|
1163 | </reference>
|
1164 | </references>
|
1165 |
|
1166 | <references>
|
1167 | <name>Informative References</name>
|
1168 | <reference anchor="RFC6793"
|
1169 | target="https://www.rfc-editor.org/info/rfc6793"
|
1170 | xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.6793.xml">
|
1171 | <front>
|
1172 | <title>BGP Support for Four-Octet Autonomous System (AS) Number
|
1173 | Space</title>
|
1174 | <seriesInfo name="DOI" value="10.17487/RFC6793"/>
|
1175 | <seriesInfo name="RFC" value="6793"/>
|
1176 | <author initials="Q." surname="Vohra" fullname="Q. Vohra">
|
1177 | <organization/>
|
1178 | </author>
|
1179 | <author initials="E." surname="Chen" fullname="E. Chen">
|
1180 | <organization/>
|
1181 | </author>
|
1182 | <date year="2012" month="December"/>
|
1183 | <abstract>
|
1184 | <t>The Autonomous System number is encoded as a two-octet entity
|
1185 | in the base BGP specification. This document describes
|
1186 | extensions to BGP to carry the Autonomous System numbers as
|
1187 | four-octet entities. This document obsoletes RFC 4893 and
|
1188 | updates RFC 4271. [STANDARDS-TRACK]</t>
|
1189 | </abstract>
|
1190 | </front>
|
1191 | </reference>
|
1192 | <reference anchor="I-D.ietf-6man-segment-routing-header"
|
1193 | target="http://www.ietf.org/internet-drafts/draft-ietf-6man-segment-routing-header-21.txt">
|
1194 | <front>
|
1195 | <title>IPv6 Segment Routing Header (SRH)</title>
|
1196 | <seriesInfo name="Internet-Draft"
|
1197 | value="draft-ietf-6man-segment-routing-header-21"/>
|
1198 | <author initials="C" surname="Filsfils" fullname="Clarence Filsfils">
|
1199 | <organization/>
|
1200 | </author>
|
1201 | <author initials="D" surname="Dukes" fullname="Darren Dukes">
|
1202 | <organization/>
|
1203 | </author>
|
1204 | <author initials="S" surname="Previdi" fullname="Stefano Previdi">
|
1205 | <organization/>
|
1206 | </author>
|
1207 | <author initials="J" surname="Leddy" fullname="John Leddy">
|
1208 | <organization/>
|
1209 | </author>
|
1210 | <author initials="S" surname="Matsushima" fullname="Satoru Matsushima">
|
1211 | <organization/>
|
1212 | </author>
|
1213 | <author initials="d" surname="daniel.voyer@bell.ca"
|
1214 | fullname="daniel.voyer@bell.ca">
|
1215 | <organization/>
|
1216 | </author>
|
1217 | <date month="June" day="13" year="2019"/>
|
1218 | <abstract>
|
1219 | <t>Segment Routing can be applied to the IPv6 data plane using a
|
1220 | new type of Routing Extension Header. This document describes
|
1221 | the Segment Routing Extension Header and how it is used by
|
1222 | Segment Routing capable nodes.</t>
|
1223 | </abstract>
|
1224 | </front>
|
1225 | </reference>
|
1226 |
|
1227 | </references>
|
1228 | </references>
|
1229 | </back>
|
1230 | </rfc>
|