draft-ietf-bmwg-dcbench-terminology-19v3.original | draft-ietf-bmwg-dcbench-terminology-19v3output.txt | |||
---|---|---|---|---|
Internet Engineering Task Force L. Avramov | Internet Engineering Task Force L. Avramov | |||
INTERNET-DRAFT, Intended status: Informational Google | Internet-Draft Google | |||
Expires: December 24,2017 J. Rapp | Intended status: Informational J. Rapp | |||
June 22, 2017 VMware | Expires: December 24, 2017 VMware | |||
June 22, 2017 | ||||
Data Center Benchmarking Terminology | Data Center Benchmarking Terminology | |||
draft-ietf-bmwg-dcbench-terminology-19 | draft-ietf-bmwg-dcbench-terminology-19 | |||
Abstract | Abstract | |||
The purpose of this informational document is to establish definitions | The purpose of this informational document is to establish | |||
and describe measurement techniques for data center benchmarking, as | definitions and describe measurement techniques for data center | |||
well as to introduce new terminology applicable to performance | benchmarking, as well as to introduce new terminology | |||
evaluations of data center network equipment. This document establishes | applicable to performance evaluations of data center network | |||
the important concepts for benchmarking network switches and routers in | equipment. This document establishes the important concepts for | |||
the data center and is a prerequisite to the test methodology | benchmarking network switches and routers in the data center and is | |||
publication [draft-ietf-bmwg-dcbench-methodology]. Many of these terms | a prerequisite to the test methodology publication [draft-ietf-bmwg- | |||
and methods may be applicable to network equipment beyond this | dcbench-methodology]. Many of these terms and methods may be | |||
publication's scope as the technologies originally applied in the data | applicable to network equipment beyond this publication's scope as | |||
center are deployed elsewhere. | the technologies originally applied in the data center are deployed | |||
elsewhere. | ||||
Status of this Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the provisions | This Internet-Draft is submitted in full conformance with the | |||
of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering Task | Internet-Drafts are working documents of the Internet Engineering | |||
Force (IETF). Note that other groups may also distribute working | Task Force (IETF). Note that other groups may also distribute | |||
documents as Internet-Drafts. The list of current Internet-Drafts is at | working documents as Internet-Drafts. The list of current Internet- | |||
http://datatracker.ietf.org/drafts/current. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference material | time. It is inappropriate to use Internet-Drafts as reference | |||
or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on December 24, 2017. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2017 IETF Trust and the persons identified as the document | Copyright (c) 2017 IETF Trust and the persons identified as the | |||
authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal Provisions | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Relating to IETF Documents (http://trustee.ietf.org/license-info) in | Provisions Relating to IETF Documents | |||
effect on the date of publication of this document. Please review these | (http://trustee.ietf.org/license-info) in effect on the date of | |||
documents carefully, as they describe your rights and restrictions with | publication of this document. Please review these documents | |||
respect to this document. Code Components extracted from this document | carefully, as they describe your rights and restrictions with respect | |||
must include Simplified BSD License text as described in Section 4.e of | to this document. Code Components extracted from this document must | |||
the Trust Legal Provisions and are provided without warranty as | include Simplified BSD License text as described in Section 4.e of | |||
described in the Simplified BSD License. | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 | 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 | |||
1.2. Definition format . . . . . . . . . . . . . . . . . . . . . 4 | 1.2. Definition format . . . . . . . . . . . . . . . . . . . . 4 | |||
2. Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2.1. Definition . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 2.2. Discussion . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
2.3 Measurement Units . . . . . . . . . . . . . . . . . . . . . 6 | 2.3. Measurement Units . . . . . . . . . . . . . . . . . . . . 6 | |||
3 Jitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 3. Jitter . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 3.1. Definition . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
3.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 3.2. Discussion . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
3.3 Measurement Units . . . . . . . . . . . . . . . . . . . . . 7 | 3.3. Measurement Units . . . . . . . . . . . . . . . . . . . . 7 | |||
4 Physical Layer Calibration . . . . . . . . . . . . . . . . . . . 7 | 4. Physical Layer Calibration . . . . . . . . . . . . . . . . . 7 | |||
4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 4.1. Definition . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
4.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 4.2. Discussion . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
4.3 Measurement Units . . . . . . . . . . . . . . . . . . . . . 8 | 4.3. Measurement Units . . . . . . . . . . . . . . . . . . . . 8 | |||
5 Line rate . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 5. Line rate . . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 5.1. Definition . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
5.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 5.2. Discussion . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
5.3 Measurement Units . . . . . . . . . . . . . . . . . . . . . 10 | 5.3. Measurement Units . . . . . . . . . . . . . . . . . . . . 10 | |||
6 Buffering . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 | 6. Buffering . . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
6.1 Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 | 6.1. Buffer . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
6.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . 11 | 6.1.1. Definition . . . . . . . . . . . . . . . . . . . . . 11 | |||
6.1.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . 12 | 6.1.2. Discussion . . . . . . . . . . . . . . . . . . . . . 12 | |||
6.1.3 Measurement Units . . . . . . . . . . . . . . . . . . . 12 | 6.1.3. Measurement Units . . . . . . . . . . . . . . . . . . 13 | |||
6.2 Incast . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 | 6.2. Incast . . . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
6.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . 13 | 6.2.1. Definition . . . . . . . . . . . . . . . . . . . . . 13 | |||
6.2.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . 14 | 6.2.2. Discussion . . . . . . . . . . . . . . . . . . . . . 14 | |||
6.2.3 Measurement Units . . . . . . . . . . . . . . . . . . . 14 | 6.2.3. Measurement Units . . . . . . . . . . . . . . . . . . 14 | |||
7 Application Throughput: Data Center Goodput . . . . . . . . . . 14 | 7. Application Throughput: Data Center Goodput . . . . . . . . . 14 | |||
7.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . 14 | 7.1. Definition . . . . . . . . . . . . . . . . . . . . . . . 14 | |||
7.2. Discussion . . . . . . . . . . . . . . . . . . . . . . . . 14 | 7.2. Discussion . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
7.3. Measurement Units . . . . . . . . . . . . . . . . . . . . . 15 | 7.3. Measurement Units . . . . . . . . . . . . . . . . . . . . 15 | |||
8. Security Considerations . . . . . . . . . . . . . . . . . . . 16 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 16 | |||
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 | |||
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 16 | |||
10.1. Normative References . . . . . . . . . . . . . . . . . . 16 | 10.1. Normative References . . . . . . . . . . . . . . . . . . 16 | |||
10.2. Informative References . . . . . . . . . . . . . . . . . 17 | 10.2. Informative References . . . . . . . . . . . . . . . . . 17 | |||
10.3. Acknowledgments . . . . . . . . . . . . . . . . . . . . . 17 | Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
authors would like to thank Alfred Morton, Scott Bradner, Ian | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 17 | Cox, Tim Stevenson for their reviews and feedback. . . . . . . . 17 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 17 | ||||
1. Introduction | 1. Introduction | |||
Traffic patterns in the data center are not uniform and are | Traffic patterns in the data center are not uniform and are | |||
constantly changing. They are dictated by the nature and variety of | constantly changing. They are dictated by the nature and variety of | |||
applications utilized in the data center. They can be largely east-west | applications utilized in the data center. They can be largely east- | |||
traffic flows (server to server inside the data center) in one data | west traffic flows (server to server inside the data center) in one | |||
center and north-south (outside of the data center to server) in | data center and north-south (outside of the data center to server) in | |||
another, while some may combine both. Traffic patterns can be bursty | another, while some may combine both. Traffic patterns can be bursty | |||
in nature and contain many-to-one, many-to-many, or one-to-many | in nature and contain many-to-one, many-to-many, or one-to-many | |||
flows. Each flow may also be small and latency sensitive or large and | flows. Each flow may also be small and latency sensitive or large | |||
throughput sensitive while containing a mix of UDP and TCP traffic. | and throughput sensitive while containing a mix of UDP and TCP | |||
One or more of these may coexist in a single cluster and flow through | traffic. One or more of these may coexist in a single cluster and | |||
a single network device simultaneously. Benchmarking of network | flow through a single network device simultaneously. Benchmarking of | |||
devices has long used [RFC1242], [RFC2432], [RFC2544], [RFC2889] and | network devices has long used [RFC1242], [RFC2432], [RFC2544], | |||
[RFC3918]. These benchmarks have largely been focused around various | [RFC2889] and [RFC3918]. These benchmarks have largely been focused | |||
latency attributes and max throughput of the Device Under Test being | around various latency attributes and max throughput of the Device | |||
benchmarked. These standards are good at measuring theoretical max | Under Test being benchmarked. These standards are good at measuring | |||
throughput, forwarding rates and latency under testing conditions, | theoretical max throughput, forwarding rates and latency under | |||
but they do not represent real traffic patterns that may affect these | testing conditions, but they do not represent real traffic patterns | |||
networking devices. The data center networking devices covered are | that may affect these networking devices. The data center networking | |||
switches and routers. | devices covered are switches and routers. | |||
Currently, typical data center networking devices are characterized | Currently, typical data center networking devices are characterized | |||
by: | by: | |||
-High port density (48 ports or more) | -High port density (48 ports or more) | |||
-High speed (up to 100 Gb/s currently per port) | -High speed (up to 100 Gb/s currently per port) | |||
-High throughput (line rate on all ports for Layer 2 and/or Layer 3) | -High throughput (line rate on all ports for Layer 2 and/or Layer 3) | |||
-Low latency (in the microsecond or nanosecond range) | -Low latency (in the microsecond or nanosecond range) | |||
-Low amount of buffer (in the MB range per networking device) | -Low amount of buffer (in the MB range per networking device) | |||
-Layer 2 and Layer 3 forwarding capability (Layer 3 not mandatory) | -Layer 2 and Layer 3 forwarding capability (Layer 3 not mandatory) | |||
The following document defines a set of definitions, metrics and | The following document defines a set of definitions, metrics and | |||
terminologies including congestion scenarios, switch buffer analysis | terminologies including congestion scenarios, switch buffer analysis | |||
and redefines basic definitions in order to represent a wide mix of | and redefines basic definitions in order to represent a wide mix of | |||
traffic conditions. The test methodologies are defined in [draft- | traffic conditions. The test methodologies are defined in [draft- | |||
ietf-bmwg-dcbench-methodology]. | ietf-bmwg-dcbench-methodology]. | |||
1.1. Requirements Language | 1.1. Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in RFC 2119 [RFC2119]. | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||
1.2. Definition format | 1.2. Definition format | |||
Term to be defined. (e.g., Latency) | Term to be defined. (e.g., Latency) | |||
Definition: The specific definition for the term. | Definition: The specific definition for the term. | |||
Discussion: A brief discussion about the term, its application and | Discussion: A brief discussion about the term, its application and | |||
any restrictions on measurement procedures. | any restrictions on measurement procedures. | |||
Measurement Units: Methodology for the measure and units used to | Measurement Units: Methodology for the measure and units used to | |||
report measurements of this term, if applicable. | report measurements of this term, if applicable. | |||
2. Latency | 2. Latency | |||
2.1. Definition | 2.1. Definition | |||
Latency is the amount of time it takes a frame to transit the | Latency is the amount of time it takes a frame to transit the | |||
Device Under Test (DUT). Latency is measured in units of time | Device Under Test (DUT). Latency is measured in units of time | |||
(seconds, milliseconds, microseconds and so on). The purpose of | (seconds, milliseconds, microseconds and so on). The purpose of | |||
measuring latency is to understand the impact of adding a device in | measuring latency is to understand the impact of adding a device in | |||
the communication path. | the communication path. | |||
The Latency interval can be assessed between different combinations | The Latency interval can be assessed between different combinations | |||
of events, regardless of the type of switching device (bit forwarding | of events, regardless of the type of switching device (bit forwarding | |||
aka cut-through, or store-and-forward type of device). [RFC1242] | aka cut-through, or store-and-forward type of device). [RFC1242] | |||
defined Latency differently for each of these types of devices. | defined Latency differently for each of these types of devices. | |||
Traditionally the latency measurement definitions are: | Traditionally the latency measurement definitions are: | |||
FILO (First In Last Out): | FILO (First In Last Out): | |||
The time interval starting when the end of the first bit of the input | The time interval starting when the end of the first bit of the input | |||
frame reaches the input port and ending when the last bit of the | frame reaches the input port and ending when the last bit of the | |||
output frame is seen on the output port. | output frame is seen on the output port. | |||
FIFO (First In First Out): | FIFO (First In First Out): | |||
The time interval starting when the end of the first bit of the input | The time interval starting when the end of the first bit of the input | |||
frame reaches the input port and ending when the start of the first | frame reaches the input port and ending when the start of the first | |||
bit of the output frame is seen on the output port. [RFC1242] Latency | bit of the output frame is seen on the output port. [RFC1242] | |||
for bit forwarding devices uses these events. | Latency for bit forwarding devices uses these events. | |||
LILO (Last In Last Out): | LILO (Last In Last Out): | |||
The time interval starting when the last bit of the input frame | The time interval starting when the last bit of the input frame | |||
reaches the input port and ending when the last bit of the output frame is seen | reaches the input port and ending when the last bit of the output frame is seen | |||
on the output port. | on the output port. | |||
LIFO (Last In First Out): | LIFO (Last In First Out): | |||
The time interval starting when the last bit of the input frame | The time interval starting when the last bit of the input frame | |||
reaches the input port and ending when the first bit of the output | reaches the input port and ending when the first bit of the output | |||
frame is seen on the output port. [RFC1242] Latency for store and | frame is seen on the output port. [RFC1242] Latency for store and | |||
forward devices uses these events. | forward devices uses these events. | |||
Another way to summarize the four different definitions above | Another way to summarize the four different definitions above | |||
is to refer to the bit positions as they normally occur: input to | is to refer to the bit positions as they normally occur: input to | |||
output. | output. | |||
FILO is FL (First bit Last bit). FIFO is FF (First bit First bit). | FILO is FL (First bit Last bit). FIFO is FF (First bit First bit). | |||
LILO is LL (Last bit Last bit). LIFO is LF (Last bit First bit). | LILO is LL (Last bit Last bit). LIFO is LF (Last bit First bit). | |||
The definition given in this section, in the context of data center | The definition given in this section, in the context of data center | |||
switching benchmarking, is in lieu of the previous definition of | switching benchmarking, is in lieu of the previous definition of | |||
Latency in RFC 1242, section 3.8, which is quoted here: | Latency in RFC 1242, section 3.8, which is quoted here: | |||
For store and forward devices: The time interval starting when the | For store and forward devices: The time interval starting when the | |||
last bit of the input frame reaches the input port and ending when | last bit of the input frame reaches the input port and ending when | |||
the first bit of the output frame is seen on the output port. | the first bit of the output frame is seen on the output port. | |||
For bit forwarding devices: The time interval starting when the end | For bit forwarding devices: The time interval starting when the end | |||
of the first bit of the input frame reaches the input port and ending | of the first bit of the input frame reaches the input port and ending | |||
when the start of the first bit of the output frame is seen on the | when the start of the first bit of the output frame is seen on the | |||
output port. | output port. | |||
To accommodate both types of network devices and hybrids of the two | To accommodate both types of network devices and hybrids of the two | |||
types that have emerged, switch Latency measurements made according | types that have emerged, switch Latency measurements made according | |||
to this document MUST be measured with the FILO events. FILO will | to this document MUST be measured with the FILO events. FILO will | |||
include the latency of the switch and the latency of the frame as | include the latency of the switch and the latency of the frame as | |||
well as the serialization delay. It is a picture of the 'whole' | well as the serialization delay. It is a picture of the 'whole' | |||
latency going through the DUT. For applications which are latency | latency going through the DUT. For applications which are latency | |||
sensitive and can function with initial bytes of the frame, FIFO (or | sensitive and can function with initial bytes of the frame, FIFO (or | |||
RFC 1242 Latency for bit forwarding devices) MAY be used. In all | RFC 1242 Latency for bit forwarding devices) MAY be used. In all | |||
cases, the event combination used in Latency measurement MUST be | cases, the event combination used in Latency measurement MUST be | |||
reported. | reported. | |||
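As an illustration only, the four event combinations above can be computed from four per-frame timestamps. The sketch below is not part of this document's methodology: the class, field, and function names are hypothetical, and the timestamps are assumed to be in nanoseconds on a common clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameTimestamps:
    """Timestamps (nanoseconds, common clock) for one frame crossing the DUT."""
    first_bit_in: float   # first bit of the input frame reaches the input port
    last_bit_in: float    # last bit of the input frame reaches the input port
    first_bit_out: float  # first bit of the output frame seen on the output port
    last_bit_out: float   # last bit of the output frame seen on the output port


# Event combination -> (start event on input, end event on output).
_EVENTS = {
    "FILO": ("first_bit_in", "last_bit_out"),
    "FIFO": ("first_bit_in", "first_bit_out"),
    "LILO": ("last_bit_in", "last_bit_out"),
    "LIFO": ("last_bit_in", "first_bit_out"),
}


def latency(ts: FrameTimestamps, events: str = "FILO") -> float:
    """Return the latency for one event combination (FILO by default)."""
    try:
        start_name, end_name = _EVENTS[events]
    except KeyError:
        raise ValueError(f"unknown event combination: {events!r}") from None
    return getattr(ts, end_name) - getattr(ts, start_name)


if __name__ == "__main__":
    # Hypothetical cut-through frame: the first bit leaves before the last bit arrives.
    sample = FrameTimestamps(first_bit_in=0, last_bit_in=1200,
                             first_bit_out=800, last_bit_out=2000)
    for combo in ("FILO", "FIFO", "LILO", "LIFO"):
        print(combo, latency(sample, combo))
```

With the hypothetical sample frame, FILO is 2000 ns, FIFO and LILO are 800 ns, and LIFO is -400 ns. Whichever combination is used, it has to be reported with the result.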
2.2 Discussion | 2.2. Discussion | |||
As mentioned in section 2.1, FILO is the most important measuring | As mentioned in section 2.1, FILO is the most important measuring | |||
definition. | definition. | |||
Not all DUTs are exclusively cut-through or store-and-forward. Data | Not all DUTs are exclusively cut-through or store-and-forward. Data | |||
Center DUTs are frequently store-and-forward for smaller packet sizes | Center DUTs are frequently store-and-forward for smaller packet sizes | |||
and then adopt a cut-through behavior. The change of behavior | and then adopt a cut-through behavior. The change of behavior | |||
happens at specific larger packet sizes. The value of the packet size | happens at specific larger packet sizes. The value of the packet | |||
for the behavior to change MAY be configurable depending on the DUT | size for the behavior to change MAY be configurable depending on the | |||
manufacturer. FILO covers all scenarios: Store-and-forward or cut- | DUT manufacturer. FILO covers all scenarios: Store-and-forward or | |||
through. The threshold of behavior change does not matter for | cut-through. The threshold of behavior change does not matter for | |||
benchmarking since FILO covers both possible scenarios. | benchmarking since FILO covers both possible scenarios. | |||
The LIFO mechanism can be used with store-and-forward switches but | The LIFO mechanism can be used with store-and-forward switches but | |||
not with cut-through switches, as it will provide negative | not with cut-through switches, as it will provide negative | |||
latency values for larger packet sizes because LIFO removes the | latency values for larger packet sizes because LIFO removes the | |||
serialization delay. Therefore, this mechanism MUST NOT be used when | serialization delay. Therefore, this mechanism MUST NOT be used when | |||
comparing latencies of two different DUTs. | comparing latencies of two different DUTs. | |||
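A short worked example may help show why LIFO can turn negative on a cut-through device. The sketch below rests on simplifying assumptions that are not in the text above: input and output ports run at the same rate, the one-bit difference between "end of first bit" and "start of first bit" is ignored, and the function names and sample values are hypothetical. Under those assumptions, LIFO equals FIFO minus the serialization delay of the frame.

```python
def serialization_delay_ns(frame_bytes: int, rate_bps: float) -> float:
    """Time to clock frame_bytes onto the wire at rate_bps, in nanoseconds."""
    return frame_bytes * 8 * 1e9 / rate_bps


def lifo_from_fifo_ns(fifo_ns: float, frame_bytes: int, rate_bps: float) -> float:
    """LIFO latency derived from a FIFO measurement (same-rate ports assumed)."""
    return fifo_ns - serialization_delay_ns(frame_bytes, rate_bps)


if __name__ == "__main__":
    # Hypothetical cut-through DUT with an 800 ns FIFO latency on 10 Gb/s ports.
    for size in (64, 1500):
        print(size, lifo_from_fifo_ns(800, size, 10e9))
```

For the 1500-byte frame, the serialization delay (1200 ns) exceeds the assumed FIFO latency, so the derived LIFO value is negative, while the 64-byte frame still yields a positive value.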
2.3 Measurement Units | 2.3. Measurement Units | |||
The measuring methods to use for benchmarking purposes are as | The measuring methods to use for benchmarking purposes are as | |||
follows: | follows: | |||
1) FILO MUST be used as a measuring method, as this will include the | 1) FILO MUST be used as a measuring method, as this will include the | |||
latency of the packet; and today the application commonly needs to | latency of the packet; and today the application commonly needs to | |||
read the whole packet to process the information and take an action. | read the whole packet to process the information and take an action. | |||
2) FIFO MAY be used for certain applications able to process the data | 2) FIFO MAY be used for certain applications able to process the data | |||
as the first bits arrive, for example, on a Field-Programmable | as the first bits arrive, for example, on a Field-Programmable | |||
Gate Array (FPGA). | Gate Array (FPGA). | |||
3) LIFO MUST NOT be used, because it subtracts the latency of the | 3) LIFO MUST NOT be used, because it subtracts the latency of the | |||
packet, unlike all the other methods. | packet, unlike all the other methods. | |||
3 Jitter | 3. Jitter | |||
3.1 Definition | 3.1. Definition | |||
Jitter in the data center context is synonymous with the common term | Jitter in the data center context is synonymous with the common term | |||
Delay variation. It is derived from multiple measurements of one-way | Delay variation. It is derived from multiple measurements of one-way | |||
delay, as described in RFC 3393. The mandatory definition of Delay | delay, as described in RFC 3393. The mandatory definition of Delay | |||
Variation is the Packet Delay Variation (PDV) from section 4.2 of | Variation is the Packet Delay Variation (PDV) from section 4.2 of | |||
[RFC5481]. When considering a stream of packets, the minimum delay | [RFC5481]. When considering a stream of packets, the minimum delay | |||
over all packets in the stream is subtracted from the delay of each | over all packets in the stream is subtracted from the delay of each | |||
packet. This facilitates assessment of the range of delay variation | packet. This facilitates assessment of the range of delay variation | |||
(Max - Min), or a high percentile of PDV (99th percentile, for | (Max - Min), or a high percentile of PDV (99th percentile, for | |||
robustness against outliers). | robustness against outliers). | |||
When First-bit to Last-bit timestamps are used for Delay measurement, | When First-bit to Last-bit timestamps are used for Delay measurement, | |||
then Delay Variation MUST be measured using packets or frames of the | then Delay Variation MUST be measured using packets or frames of the | |||
same size, since the definition of latency includes the serialization | same size, since the definition of latency includes the serialization | |||
time for each packet. Otherwise if using First-bit to First-bit, the | time for each packet. Otherwise if using First-bit to First-bit, the | |||
size restriction does not apply. | size restriction does not apply. | |||
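The PDV computation described above can be sketched as follows. This is a minimal illustration, not a normative procedure: the function names are hypothetical, the delays are assumed to be one-way delays in a single unit, and the nearest-rank percentile method is an assumption (other percentile definitions exist).

```python
import math
from typing import List, Sequence


def pdv(delays: Sequence[float]) -> List[float]:
    """Packet Delay Variation: each delay minus the minimum delay in the stream."""
    d_min = min(delays)
    return [d - d_min for d in delays]


def pdv_range(delays: Sequence[float]) -> float:
    """Range of delay variation (Max - Min)."""
    return max(delays) - min(delays)


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (one common definition, assumed here)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical one-way delays in microseconds, all for frames of the same size.
    delays_us = [10, 12, 11, 15, 10]
    variation = pdv(delays_us)
    print(variation, pdv_range(delays_us), percentile(variation, 99))
```

Every PDV value is zero or positive by construction, which is what makes the range and a high percentile straightforward to read.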
3.2 Discussion | 3.2. Discussion | |||
In addition to PDV Range and/or a high percentile of PDV, Inter- | In addition to PDV Range and/or a high percentile of PDV, Inter- | |||
Packet Delay Variation (IPDV) as defined in section 4.1 of [RFC5481] | Packet Delay Variation (IPDV) as defined in section 4.1 of [RFC5481] | |||
(differences between two consecutive packets) MAY be used for the | (differences between two consecutive packets) MAY be used for the | |||
purpose of determining how packet spacing has changed during | purpose of determining how packet spacing has changed during | |||
transfer, for example, to see if the packet stream has become closely- | transfer, for example, to see if the packet stream has become closely- | |||
spaced or "bursty". However, the Absolute Value of IPDV SHOULD NOT be | spaced or "bursty". However, the Absolute Value of IPDV SHOULD NOT | |||
used, as this collapses the "bursty" and "dispersed" sides of the | be used, as this collapses the "bursty" and "dispersed" sides of the | |||
IPDV distribution together. | IPDV distribution together. | |||
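To illustrate why the sign of IPDV matters, the sketch below computes IPDV as the difference between the delays of consecutive packets. The function name and the sample delays are hypothetical, and the values are assumed to be one-way delays in a single unit.

```python
from typing import List, Sequence


def ipdv(delays: Sequence[float]) -> List[float]:
    """Inter-Packet Delay Variation: delay of packet i minus delay of packet i-1."""
    return [cur - prev for prev, cur in zip(delays, delays[1:])]


if __name__ == "__main__":
    # Hypothetical one-way delays in microseconds for five consecutive packets.
    delays_us = [10, 12, 11, 15, 10]
    signed = ipdv(delays_us)
    print(signed)                    # signed values keep the two sides apart
    print([abs(v) for v in signed])  # absolute values merge them
```

A negative value means a packet was delayed less than its predecessor, so the spacing between the two shrank ("bursty"); a positive value means the spacing grew ("dispersed"). Taking absolute values discards that distinction.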
3.3 Measurement Units | 3.3. Measurement Units | |||
The measurement of delay variation is expressed in units of seconds. | The measurement of delay variation is expressed in units of seconds. | |||
A PDV histogram MAY be provided for the population of packets | A PDV histogram MAY be provided for the population of packets | |||
measured. | measured. | |||
4 Physical Layer Calibration | 4. Physical Layer Calibration | |||
4.1 Definition | 4.1. Definition | |||
The calibration of the physical layer consists of defining and | The calibration of the physical layer consists of defining and | |||
measuring the latency of the physical devices used to perform tests | measuring the latency of the physical devices used to perform tests | |||
on the DUT. | on the DUT. | |||
It includes the list of all physical layer components used, as listed | It includes the list of all physical layer components used, as listed | |||
hereafter: | hereafter: | |||
-Type of device used to generate traffic / measure traffic | -Type of device used to generate traffic / measure traffic | |||
-Type of line cards used on the traffic generator | -Type of line cards used on the traffic generator | |||
-Type of transceivers on traffic generator | -Type of transceivers on traffic generator | |||
-Type of transceivers on DUT | -Type of transceivers on DUT | |||
-Type of cables | -Type of cables | |||
-Length of cables | -Length of cables | |||
-Software name, and version of traffic generator and DUT | -Software name, and version of traffic generator and DUT | |||
-List of enabled features on DUT MAY be provided and is recommended | -List of enabled features on DUT MAY be provided and is recommended | |||
(especially the control plane protocols such as Link Layer Discovery | (especially the control plane protocols such as Link Layer Discovery | |||
Protocol, Spanning-Tree etc.). A comprehensive configuration file MAY | Protocol, Spanning-Tree etc.). A comprehensive configuration file | |||
be provided to this effect. | MAY be provided to this effect. | |||
4.2 Discussion | 4.2. Discussion | |||
Physical layer calibration is part of the end to end latency, which | Physical layer calibration is part of the end to end latency, which | |||
should be taken into account while evaluating the DUT. Small | should be taken into account while evaluating the DUT. Small | |||
variations of the physical components of the test may impact the | variations of the physical components of the test may impact the | |||
latency being measured, therefore they MUST be described when | latency being measured, therefore they MUST be described when | |||
presenting results. | presenting results. | |||
4.3 Measurement Units | 4.3. Measurement Units | |||
It is RECOMMENDED that all cables be of the same type and the same | It is RECOMMENDED that all cables be of the same type and the same | |||
length and, when possible, from the same vendor. The cable specifications | length and, when possible, from the same vendor. The cable | |||
listed in section 4.1 MUST be documented along with the test results. | specifications listed in section 4.1 MUST be documented along with the test | |||
The test report MUST specify if the cable latency has been removed | results. The test report MUST specify if the cable latency has been | |||
from the test measures or not. The accuracy of the traffic generator | removed from the test measures or not. The accuracy of the traffic | |||
measure MUST be provided (this is usually a value in the 20ns range | generator measure MUST be provided (this is usually a value in the | |||
for current test equipment). | 20ns range for current test equipment). | |||
5 Line rate | 5. Line rate | |||
5.1 Definition | 5.1. Definition | |||
The transmit timing, or maximum transmitted data rate, is controlled | The transmit timing, or maximum transmitted data rate, is controlled | |||
by the "transmit clock" in the DUT. The receive timing (maximum | by the "transmit clock" in the DUT. The receive timing (maximum | |||
ingress data rate) is derived from the transmit clock of the | ingress data rate) is derived from the transmit clock of the | |||
connected interface. | connected interface. | |||
The line rate or physical layer frame rate is the maximum capacity to | The line rate or physical layer frame rate is the maximum capacity to | |||
send frames of a specific size at the transmit clock frequency of the | send frames of a specific size at the transmit clock frequency of the | |||
DUT. | DUT. | |||
The term "nominal value of Line Rate" defines the maximum speed | The term "nominal value of Line Rate" defines the maximum speed | |||
capability for the given port; for example 1GE, 10GE, 40GE, 100GE | capability for the given port; for example 1GE, 10GE, 40GE, 100GE | |||
etc. | etc. | |||
The frequency ("clock rate") of the transmit clock in any two | The frequency ("clock rate") of the transmit clock in any two | |||
connected interfaces will never be precisely the same; therefore, a | connected interfaces will never be precisely the same; therefore, a | |||
tolerance is needed. This is expressed as a Parts Per Million | tolerance is needed. This is expressed as a Parts Per Million | |||
(PPM) value. The IEEE standards allow a specific +/- variance in the | (PPM) value. The IEEE standards allow a specific +/- variance in the | |||
transmit clock rate, and Ethernet is designed to allow for small, | transmit clock rate, and Ethernet is designed to allow for small, | |||
normal variations between the two clock rates. This results in a | normal variations between the two clock rates. This results in a | |||
tolerance of the line rate value when traffic is generated from test | tolerance of the line rate value when traffic is generated from test | |||
equipment to a DUT. | equipment to a DUT. | |||
Line rate SHOULD be measured in frames per second. | Line rate SHOULD be measured in frames per second. | |||
5.2 Discussion | 5.2. Discussion | |||
For a transmit clock source, most Ethernet switches use "clock | For a transmit clock source, most Ethernet switches use "clock | |||
modules" (also called "oscillator modules") that are sealed, | modules" (also called "oscillator modules") that are sealed, | |||
internally temperature-compensated, and very accurate. The output | internally temperature-compensated, and very accurate. The output | |||
frequency of these modules is not adjustable because it is not | frequency of these modules is not adjustable because it is not | |||
necessary. Many test sets, however, offer a software-controlled | necessary. Many test sets, however, offer a software-controlled | |||
adjustment of the transmit clock rate. These adjustments SHOULD be | adjustment of the transmit clock rate. These adjustments SHOULD be | |||
used to compensate the test equipment in order to not send more than | used to compensate the test equipment in order to not send more than | |||
the line rate of the DUT. | the line rate of the DUT. | |||
To allow for the minor variations typically found in the clock rate | To allow for the minor variations typically found in the clock rate | |||
of commercially-available clock modules and other crystal-based | of commercially-available clock modules and other crystal-based | |||
oscillators, Ethernet standards specify the maximum transmit clock | oscillators, Ethernet standards specify the maximum transmit clock | |||
rate variation to be not more than +/- 100 PPM (parts per million) | rate variation to be not more than +/- 100 PPM (parts per million) | |||
from a calculated center frequency. Therefore a DUT must be able to | from a calculated center frequency. Therefore a DUT must be able to | |||
accept frames at a rate within +/- 100 PPM to comply with the | accept frames at a rate within +/- 100 PPM to comply with the | |||
standards. | standards. | |||
Very few clock circuits are precisely +/- 0.0 PPM because: | Very few clock circuits are precisely +/- 0.0 PPM because: | |||
1.The Ethernet standards allow a maximum of +/- 100 PPM (parts per | 1.The Ethernet standards allow a maximum of +/- 100 PPM (parts per | |||
million) variance over time. Therefore it is normal for the frequency | million) variance over time. Therefore it is normal for the | |||
of the oscillator circuits to experience variation over time and over | frequency of the oscillator circuits to experience variation over | |||
a wide temperature range, among external factors. | time and over a wide temperature range, among external factors. | |||
2.The crystals, or clock modules, usually have a specific +/- PPM | 2.The crystals, or clock modules, usually have a specific +/- PPM | |||
variance that is significantly better than +/- 100 PPM. Often times | variance that is significantly better than +/- 100 PPM. Often times | |||
this is +/- 30 PPM or better in order to be considered a | this is +/- 30 PPM or better in order to be considered a | |||
"certification instrument". | "certification instrument". | |||
When testing an Ethernet switch throughput at "line rate", any | When testing an Ethernet switch throughput at "line rate", any | |||
specific switch will have a clock rate variance. If a test set is | specific switch will have a clock rate variance. If a test set is | |||
running +1 PPM faster than a switch under test, and a sustained line | running +1 PPM faster than a switch under test, and a sustained line | |||
rate test is performed, a gradual increase in latency and eventually | rate test is performed, a gradual increase in latency and eventually | |||
packet drops as buffers fill and overflow in the switch can be | packet drops as buffers fill and overflow in the switch can be | |||
observed. Depending on how much clock variance there is between the | observed. Depending on how much clock variance there is between the | |||
two connected systems, the effect may be seen after the traffic | two connected systems, the effect may be seen after the traffic | |||
stream has been running for a few hundred microseconds, a few | stream has been running for a few hundred microseconds, a few | |||
milliseconds, or seconds. The same low latency and no-packet-loss can | milliseconds, or seconds. The same low latency and no-packet-loss | |||
be demonstrated by setting the test set link occupancy to slightly | can be demonstrated by setting the test set link occupancy to | |||
less than 100 percent link occupancy. Typically 99 percent link | slightly less than 100 percent link occupancy. Typically 99 percent | |||
occupancy produces excellent low-latency and no packet loss. No | link occupancy produces excellent low-latency and no packet loss. No | |||
Ethernet switch or router will have a transmit clock rate of exactly | Ethernet switch or router will have a transmit clock rate of exactly | |||
+/- 0.0 PPM. Very few (if any) test sets have a clock rate that is | +/- 0.0 PPM. Very few (if any) test sets have a clock rate that is | |||
precisely +/- 0.0 PPM. | precisely +/- 0.0 PPM. | |||
Test set equipment manufacturers are well-aware of the standards, and | Test set equipment manufacturers are well-aware of the standards, and | |||
allow a software-controlled +/- 100 PPM "offset" (clock-rate | allow a software-controlled +/- 100 PPM "offset" (clock-rate | |||
adjustment) to compensate for normal variations in the clock speed of | adjustment) to compensate for normal variations in the clock speed of | |||
DUTs. This offset adjustment allows engineers to determine the | DUTs. This offset adjustment allows engineers to determine the | |||
approximate speed at which the connected device is operating, and verify that | approximate speed at which the connected device is operating, and verify that | |||
it is within parameters allowed by standards. | it is within parameters allowed by standards. | |||
5.3 Measurement Units | 5.3. Measurement Units | |||
"Line Rate" can be measured in terms of "Frame Rate": | "Line Rate" can be measured in terms of "Frame Rate": | |||
Frame Rate = Transmit-Clock-Frequency / (Frame-Length*8 + Minimum_Gap | Frame Rate = Transmit-Clock-Frequency / (Frame-Length*8 + Minimum_Gap | |||
+ Preamble + Start-Frame Delimiter) | + Preamble + Start-Frame Delimiter) | |||
Minimum_Gap represents the inter-frame gap. This formula "scales up" | Minimum_Gap represents the inter-frame gap. This formula "scales up" | |||
or "scales down" to represent 1 Gb/s Ethernet, 10 Gb/s Ethernet and so | or "scales down" to represent 1 Gb/s Ethernet, 10 Gb/s Ethernet and so | |||
on. | on. | |||
Example for 1 Gb/s Ethernet speed with 64-byte frames: | Example for 1 Gb/s Ethernet speed with 64-byte frames: | |||
Frame Rate = 1,000,000,000 / (64*8 + 96 + 56 + 8) | Frame Rate = 1,000,000,000 / (64*8 + 96 + 56 + 8) | |||
Frame Rate = 1,000,000,000 / 672 | Frame Rate = 1,000,000,000 / 672 | |||
Frame Rate = 1,488,095.2 frames per second. | Frame Rate = 1,488,095.2 frames per second. | |||
Considering the allowance of +/- 100 PPM, a switch may "legally" | Considering the allowance of +/- 100 PPM, a switch may "legally" | |||
transmit traffic at a frame rate between 1,487,946.4 FPS and | transmit traffic at a frame rate between 1,487,946.4 FPS and | |||
1,488,244 FPS. Each 1 PPM variation in clock rate will translate to | 1,488,244 FPS. Each 1 PPM variation in clock rate will translate to | |||
a 1.488 frame-per-second frame rate increase or decrease. | a 1.488 frame-per-second frame rate increase or decrease. | |||
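The frame rate formula and the +/- 100 PPM allowance above can be checked with a short Python sketch. This is an illustration only: the function names frame_rate and ppm_bounds are hypothetical and do not come from this document, and the overhead constants simply restate the values used in the 64-byte example (96-bit minimum gap, 56-bit preamble, 8-bit start-frame delimiter).

```python
# Per-frame overhead in bits, as used in the 64-byte example above.
MIN_GAP_BITS = 96    # minimum inter-frame gap
PREAMBLE_BITS = 56   # preamble
SFD_BITS = 8         # start-frame delimiter


def frame_rate(clock_hz, frame_bytes):
    """Frames per second for a given transmit clock and frame length."""
    bits_per_frame = frame_bytes * 8 + MIN_GAP_BITS + PREAMBLE_BITS + SFD_BITS
    return clock_hz / bits_per_frame


def ppm_bounds(nominal_fps, ppm=100):
    """Lowest and highest frame rate allowed by a +/- ppm clock tolerance."""
    delta = nominal_fps * ppm / 1_000_000
    return nominal_fps - delta, nominal_fps + delta


nominal = frame_rate(1_000_000_000, 64)
low, high = ppm_bounds(nominal)
print(f"nominal={nominal:.1f} low={low:.1f} high={high:.1f}")
```

Calling ppm_bounds(nominal, ppm=1) gives the per-PPM step of about 1.488 frames per second quoted above.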
In a production network, it is very unlikely to see precise line rate | In a production network, it is very unlikely to see precise line rate | |||
over a very brief period. There is no observable difference between | over a very brief period. There is no observable difference between | |||
dropping packets at 99% of line rate and 100% of line rate. | dropping packets at 99% of line rate and 100% of line rate. | |||
Line rate can be measured at 100% of line rate with a -100PPM | Line rate can be measured at 100% of line rate with a -100PPM | |||
adjustment. | adjustment. | |||
Line rate SHOULD be measured at 99.98% with 0 PPM adjustment. | Line rate SHOULD be measured at 99.98% with 0 PPM adjustment. | |||
The PPM adjustment SHOULD only be used for a line rate type of | The PPM adjustment SHOULD only be used for a line rate type of | |||
measurement. | measurement. | |||
6 Buffering | 6. Buffering | |||
6.1 Buffer | 6.1. Buffer | |||
6.1.1 Definition | 6.1.1. Definition | |||
Buffer Size: The term buffer size represents the total amount of | Buffer Size: The term buffer size represents the total amount of | |||
frame buffering memory available on a DUT. This size is expressed in | frame buffering memory available on a DUT. This size is expressed in | |||
B (byte), KB (kilobyte), MB (megabyte) or GB (gigabyte). When the | B (byte), KB (kilobyte), MB (megabyte) or GB (gigabyte). When the | |||
buffer size is expressed it SHOULD be defined by a size metric stated | buffer size is expressed it SHOULD be defined by a size metric stated | |||
above. When the buffer size is expressed, an indication of the frame | above. When the buffer size is expressed, an indication of the frame | |||
MTU used for that measurement is also necessary, as well as the cos | MTU used for that measurement is also necessary, as well as the cos | |||
(class of service) or dscp (differentiated services code point) value | (class of service) or dscp (differentiated services code point) value | |||
set, because the buffers are often carved by the quality of service | set, because the buffers are often carved by the quality of service | |||
implementation. Please refer to the buffer efficiency section for | implementation. Please refer to the buffer efficiency section for | |||
further details. | further details. | |||
Example: Buffer Size of DUT when sending 1518 byte frames is 18 MB. | Example: Buffer Size of DUT when sending 1518 byte frames is 18 MB. | |||
Port Buffer Size: The port buffer size is the amount of buffer for a | Port Buffer Size: The port buffer size is the amount of buffer for a | |||
single ingress port, egress port or combination of ingress and egress | single ingress port, egress port or combination of ingress and egress | |||
buffering location for a single port. The three locations for the | buffering location for a single port. The three locations for the | |||
port buffer are mentioned because the DUT buffering | port buffer are mentioned because the DUT buffering | |||
scheme can be unknown or untested, and so knowing the buffer location | scheme can be unknown or untested, and so knowing the buffer location | |||
helps clarify the buffer architecture and consequently the total | helps clarify the buffer architecture and consequently the total | |||
buffer size. The Port Buffer Size is an informational value that MAY | buffer size. The Port Buffer Size is an informational value that MAY | |||
be provided from the DUT vendor. It is not a value that is tested by | be provided from the DUT vendor. It is not a value that is tested by | |||
benchmarking. Benchmarking will be done using the Maximum Port Buffer | benchmarking. Benchmarking will be done using the Maximum Port | |||
Size or Maximum Buffer Size methodology. | Buffer Size or Maximum Buffer Size methodology. | |||
Maximum Port Buffer Size: In most cases, this is the same as the Port | Maximum Port Buffer Size: In most cases, this is the same as the Port | |||
Buffer Size. In a switch architecture called SoC (switch on | Buffer Size. In a switch architecture called SoC (switch on | |||
chip), there is a port buffer and a shared buffer pool available for | chip), there is a port buffer and a shared buffer pool available for | |||
all ports. The Maximum Port Buffer Size, in terms of an SoC buffer, | all ports. The Maximum Port Buffer Size, in terms of an SoC buffer, | |||
represents the sum of the port buffer and the maximum value of shared | represents the sum of the port buffer and the maximum value of shared | |||
buffer allowed for this port, defined in terms of B (byte), KB | buffer allowed for this port, defined in terms of B (byte), KB | |||
(kilobyte), MB (megabyte), or GB (gigabyte). The Maximum Port Buffer | (kilobyte), MB (megabyte), or GB (gigabyte). The Maximum Port Buffer | |||
Size needs to be expressed along with the frame MTU used for the | Size needs to be expressed along with the frame MTU used for the | |||
measurement and the cos or dscp bit value set for the test. | measurement and the cos or dscp bit value set for the test. | |||
Example: A DUT has been measured to have 3 KB of port buffer for | Example: A DUT has been measured to have 3 KB of port buffer for | |||
1518-byte frames and a total of 4.7 MB of maximum port buffer for | 1518-byte frames and a total of 4.7 MB of maximum port buffer for | |||
1518-byte frames and a cos of 0. | 1518-byte frames and a cos of 0. | |||
Maximum DUT Buffer Size: This is the total size of buffer a DUT can | Maximum DUT Buffer Size: This is the total size of buffer a DUT can | |||
be measured to have. It is, most likely, different from the | be measured to have. It is, most likely, different from the | |||
Maximum Port Buffer Size. It can also be different from the sum of | Maximum Port Buffer Size. It can also be different from the sum of | |||
the Maximum Port Buffer Sizes. The Maximum DUT Buffer Size needs to be | the Maximum Port Buffer Sizes. The Maximum DUT Buffer Size needs to be | |||
expressed along with the frame MTU used for the measurement and along | expressed along with the frame MTU used for the measurement and along | |||
with the cos or dscp value set during the test. | with the cos or dscp value set during the test. | |||
Example: A DUT has been measured to have 3 KB of port buffer for | Example: A DUT has been measured to have 3 KB of port buffer for | |||
1518-byte frames and a total of 4.7 MB of maximum port buffer for | 1518-byte frames and a total of 4.7 MB of maximum port buffer for | |||
1518-byte frames. The DUT has a Maximum Buffer Size of 18 MB | 1518-byte frames. The DUT has a Maximum Buffer Size of 18 | |||
at 1500 B and a cos of 0. | MB at 1500 B and a cos of 0. | |||
Burst: The burst is a fixed number of packets sent over a percentage | Burst: The burst is a fixed number of packets sent over a percentage | |||
of line rate of a defined port speed. The frames sent are | of line rate of a defined port speed. The frames sent are | |||
evenly distributed across the interval, T. A constant, C, can be | evenly distributed across the interval, T. A constant, C, can be | |||
defined to provide the average time between two consecutive packets | defined to provide the average time between two consecutive packets | |||
evenly spaced. | evenly spaced. | |||
Microburst: A microburst is a burst in which packet drops occur | Microburst: A microburst is a burst in which packet drops occur | |||
when there is no sustained or noticeable congestion upon a link or | when there is no sustained or noticeable congestion upon a link or | |||
device. A characterization of microburst is when the Burst is not | device. A characterization of microburst is when the Burst is not | |||
evenly distributed over T, and is less than the constant C [C= | evenly distributed over T, and is less than the constant C [C= | |||
average time between two consecutive packets evenly spaced out]. | average time between two consecutive packets evenly spaced out]. | |||
Intensity of Microburst: This is a percentage, representing the level | Intensity of Microburst: This is a percentage, representing the level | |||
of microburst between 1 and 100%. The higher the number, the higher | of microburst between 1 and 100%. The higher the number, the higher | |||
the microburst is. | the microburst is. | |||
I = [1 - [((Tp2-Tp1) + (Tp3-Tp2) + ... + (TpN-Tp(N-1))) / Sum(packets)]] * 100 | I = [1 - [((Tp2-Tp1) + (Tp3-Tp2) + ... + (TpN-Tp(N-1))) / Sum(packets)]] * 100 | |||
The above definitions are not meant to comment on the ideal sizing of | The above definitions are not meant to comment on the ideal sizing of | |||
a buffer, rather on how to measure it. A larger buffer is not | a buffer, rather on how to measure it. A larger buffer is not | |||
necessarily better and can cause issues with buffer bloat. | necessarily better and can cause issues with buffer bloat. | |||
6.1.2 Discussion | 6.1.2. Discussion | |||
When measuring buffering on a DUT, it is important to understand the | When measuring buffering on a DUT, it is important to understand the | |||
behavior for each and all ports. This provides data for the total | behavior for each and all ports. This provides data for the total | |||
amount of buffering available on the switch. The buffer | amount of buffering available on the switch. The buffer | |||
efficiency terms here help one understand the optimum packet size for the | efficiency terms here help one understand the optimum packet size for the | |||
buffer, or the real volume of the buffer available for a specific | buffer, or the real volume of the buffer available for a specific | |||
packet size. This section does not discuss how to conduct the test | packet size. This section does not discuss how to conduct the test | |||
methodology; instead, it explains the buffer definitions and what | methodology; instead, it explains the buffer definitions and what | |||
metrics should be provided for a comprehensive data center device | metrics should be provided for a comprehensive data center device | |||
buffering benchmarking. | buffering benchmarking. | |||
6.1.3 Measurement Units | 6.1.3. Measurement Units | |||
When Buffer is measured: | When Buffer is measured: | |||
-The buffer size MUST be measured | -The buffer size MUST be measured | |||
-The port buffer size MAY be provided for each port | -The port buffer size MAY be provided for each port | |||
-The maximum port buffer size MUST be measured | -The maximum port buffer size MUST be measured | |||
-The maximum DUT buffer size MUST be measured | -The maximum DUT buffer size MUST be measured | |||
-The intensity of microburst MAY be mentioned when a microburst test | -The intensity of microburst MAY be mentioned when a microburst test | |||
is performed | is performed | |||
-The cos or dscp value set during the test SHOULD be provided | -The cos or dscp value set during the test SHOULD be provided | |||
6.2 Incast | 6.2. Incast | |||
6.2.1 Definition | 6.2.1. Definition | |||
The term Incast, very commonly utilized in the data center, refers to | The term Incast, very commonly utilized in the data center, refers to | |||
many-to-one or many-to-many traffic patterns. | many-to-one or many-to-many traffic patterns. | |||
It measures the number of ingress and egress ports and the level of | It measures the number of ingress and egress ports and the level of | |||
synchronization attributed, as defined in this section. Typically in | synchronization attributed, as defined in this section. Typically in | |||
the data center it would refer to many different ingress server ports | the data center it would refer to many different ingress server ports | |||
(many), sending traffic to a common uplink (many-to-one), or multiple | (many), sending traffic to a common uplink (many-to-one), or multiple | |||
uplinks (many-to-many). This pattern is generalized for any network | uplinks (many-to-many). This pattern is generalized for any network | |||
as many incoming ports sending traffic to one or few uplinks. | as many incoming ports sending traffic to one or few uplinks. | |||
Synchronous arrival time: When two, or more, frames of respective | Synchronous arrival time: When two, or more, frames of respective | |||
sizes L1 and L2 arrive at their respective one or multiple ingress | sizes L1 and L2 arrive at their respective one or multiple ingress | |||
ports, and there is an overlap of the arrival time for any of the | ports, and there is an overlap of the arrival time for any of the | |||
bits on the Device Under Test (DUT), then the frames L1 and L2 have | bits on the Device Under Test (DUT), then the frames L1 and L2 have | |||
synchronous arrival times. This is called Incast, whether in the | synchronous arrival times. This is called Incast, whether in the | |||
many-to-one (simpler) form or the many-to-many form. | many-to-one (simpler) form or the many-to-many form. | |||
Asynchronous arrival time: Any condition not defined by synchronous | Asynchronous arrival time: Any condition not defined by synchronous | |||
arrival time. | arrival time. | |||
Percentage of synchronization: This defines the level of overlap | Percentage of synchronization: This defines the level of overlap | |||
[amount of bits] between the frames L1,L2..Ln. | [amount of bits] between the frames L1,L2..Ln. | |||
Example: Two 64-byte frames, of lengths L1 and L2, arrive at ingress | Example: Two 64-byte frames, of lengths L1 and L2, arrive at ingress | |||
port 1 and port 2 of the DUT. There is an overlap of 6.4 bytes | port 1 and port 2 of the DUT. There is an overlap of 6.4 bytes | |||
between the two where L1 and L2 were at the same time on the | between the two where L1 and L2 were at the same time on the | |||
respective ingress ports. Therefore the percentage of synchronization | respective ingress ports. Therefore the percentage of | |||
is 10%. | synchronization is 10%. | |||
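The percentage of synchronization in the example above can be reproduced with a small Python sketch. The helper name sync_percentage is hypothetical, and the sketch assumes both frames have the same length, as in the example; this document does not state which frame length serves as the denominator when the lengths differ.

```python
def sync_percentage(overlap_bytes, frame_bytes):
    """Overlap between two equal-length frames, as a percentage of frame length.

    Assumption: both frames have the same length (frame_bytes).
    """
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    if not 0 <= overlap_bytes <= frame_bytes:
        raise ValueError("overlap_bytes must be between 0 and frame_bytes")
    return overlap_bytes / frame_bytes * 100


# Two 64-byte frames overlapping by 6.4 bytes, as in the example above.
print(f"{sync_percentage(6.4, 64):.1f}%")
```

A result of 0 corresponds to asynchronous arrival, which section 6.2.3 excludes by requiring a non-null percentage of synchronization.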
Stateful type traffic defines packets exchanged with a stateful | Stateful type traffic defines packets exchanged with a stateful | |||
protocol such as TCP. | protocol such as TCP. | |||
Stateless type traffic defines packets exchanged with a stateless | Stateless type traffic defines packets exchanged with a stateless | |||
protocol such as UDP. | protocol such as UDP. | |||
6.2.2 Discussion | 6.2.2. Discussion | |||
In this scenario, buffers are solicited on the DUT. In an ingress | In this scenario, buffers are solicited on the DUT. In an ingress | |||
buffering mechanism, the ingress port buffers would be solicited | buffering mechanism, the ingress port buffers would be solicited | |||
along with Virtual Output Queues, when available; whereas in an | along with Virtual Output Queues, when available; whereas in an | |||
egress buffer mechanism, the egress buffer of the one outgoing port | egress buffer mechanism, the egress buffer of the one outgoing port | |||
would be used. | would be used. | |||
In either case, regardless of where the buffer memory is located on | In either case, regardless of where the buffer memory is located on | |||
the switch architecture, the Incast creates buffer utilization. | the switch architecture, the Incast creates buffer utilization. | |||
When two or more frames have synchronous arrival times at the DUT, | When two or more frames have synchronous arrival times at the DUT, | |||
they are considered to form an Incast. | they are considered to form an Incast. | |||
6.2.3 Measurement Units | 6.2.3. Measurement Units | |||
It is a MUST to measure the number of ingress and egress ports. It is | It is a MUST to measure the number of ingress and egress ports. It | |||
a MUST to have a non-null percentage of synchronization, which MUST | is a MUST to have a non-null percentage of synchronization, which | |||
be specified. | MUST be specified. | |||
7 Application Throughput: Data Center Goodput | 7. Application Throughput: Data Center Goodput | |||
7.1. Definition | 7.1. Definition | |||
In Data Center Networking, a balanced network is a function of | In Data Center Networking, a balanced network is a function of | |||
maximal throughput and minimal loss at any given time. This is | maximal throughput and minimal loss at any given time. This is | |||
captured by the Goodput [4]. Goodput is the application-level | captured by the Goodput [4]. Goodput is the application-level | |||
throughput. For standard TCP applications, a very small loss can have | throughput. For standard TCP applications, a very small loss can | |||
a dramatic effect on application throughput. [RFC2647] has a | have a dramatic effect on application throughput. [RFC2647] has a | |||
definition of Goodput; the definition in this publication is a | definition of Goodput; the definition in this publication is a | |||
variant. | variant. | |||
Goodput is the number of bits per unit of time forwarded to the | Goodput is the number of bits per unit of time forwarded to the | |||
correct destination interface of the DUT, minus any bits | correct destination interface of the DUT, minus any bits | |||
retransmitted. | retransmitted. | |||
7.2. Discussion | 7.2. Discussion | |||
In data center benchmarking, the goodput is a value that SHOULD be | In data center benchmarking, the goodput is a value that SHOULD be | |||
measured. It provides a realistic idea of the usage of the available | measured. It provides a realistic idea of the usage of the available | |||
bandwidth. A goal in data center environments is to maximize the | bandwidth. A goal in data center environments is to maximize the | |||
goodput while minimizing the loss. | goodput while minimizing the loss. | |||
7.3. Measurement Units | 7.3. Measurement Units | |||
The Goodput, G, is then measured by the following formula: | The Goodput, G, is then measured by the following formula: | |||
G=(S/F) x V bytes per second | G=(S/F) x V bytes per second | |||
-S represents the payload bytes, which does not include packet or TCP | -S represents the payload bytes, which does not include packet or TCP | |||
headers | headers | |||
-F is the frame size | -F is the frame size | |||
-V is the speed of the media in bytes per second | -V is the speed of the media in bytes per second | |||
Example: A TCP file transfer over HTTP protocol on a 10 Gbit/s medium. | Example: A TCP file transfer over HTTP protocol on a 10 Gbit/s medium. | |||
The file cannot be transferred over Ethernet as a single continuous | The file cannot be transferred over Ethernet as a single continuous | |||
stream. It must be broken down into individual frames of 1500B when | stream. It must be broken down into individual frames of 1500B when | |||
the standard MTU (Maximum Transmission Unit) is used. Each packet | the standard MTU (Maximum Transmission Unit) is used. Each packet | |||
requires 20B of IP header information and 20B of TCP header | requires 20B of IP header information and 20B of TCP header | |||
information; therefore 1460B are available per packet for the file | information; therefore 1460B are available per packet for the file | |||
transfer. Linux based systems are further limited to 1448B as they | transfer. Linux based systems are further limited to 1448B as they | |||
also carry a 12B timestamp. Finally, the data is transmitted in this | also carry a 12B timestamp. Finally, the data is transmitted in this | |||
example over Ethernet which adds a 26B overhead per packet. | example over Ethernet which adds a 26B overhead per packet. | |||
G= 1460/1526 x 10 Gbit/s which is 9.567 Gbit per second or 1.196 GB | G= 1460/1526 x 10 Gbit/s which is 9.567 Gbit per second or 1.196 GB | |||
per second. | per second. | |||
Please note: This example does not take into consideration the | Please note: This example does not take into consideration the | |||
additional Ethernet overhead, such as the interframe gap (a minimum | additional Ethernet overhead, such as the interframe gap (a minimum | |||
of 96 bit times), nor collisions (which have a variable impact, | of 96 bit times), nor collisions (which have a variable impact, | |||
depending on the network load). | depending on the network load). | |||
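The Goodput example above can be reproduced with a minimal Python sketch of G = (S/F) x V. The function name goodput is hypothetical; the figures are those of the example (1460 B payload, 1500 B packet plus 26 B of Ethernet overhead, 10 Gbit/s medium). As in the example, the interframe gap and collisions are ignored.

```python
def goodput(payload_bytes, frame_bytes, media_rate):
    """G = (S / F) x V, returned in the same unit as media_rate."""
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    return payload_bytes / frame_bytes * media_rate


FRAME_BYTES = 1500 + 26          # MTU-sized packet plus Ethernet overhead
MEDIA_BITS_PER_SECOND = 10e9     # 10 Gbit/s medium

g_bits = goodput(1460, FRAME_BYTES, MEDIA_BITS_PER_SECOND)
g_bytes = g_bits / 8             # convert bit/s to byte/s

print(f"{g_bits / 1e9:.3f} Gbit/s, {g_bytes / 1e9:.3f} GB/s")
```

Passing 1448 instead of 1460 as the payload gives the corresponding figure for the Linux case with the 12 B timestamp.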
skipping to change at page 16, line 21 | skipping to change at page 16, line 23 | |||
The benchmarking network topology will be an independent test setup | The benchmarking network topology will be an independent test setup | |||
and MUST NOT be connected to devices that may forward the test | and MUST NOT be connected to devices that may forward the test | |||
traffic into a production network, or misroute traffic to the test | traffic into a production network, or misroute traffic to the test | |||
management network. | management network. | |||
Further, benchmarking is performed on a "black-box" basis, relying | Further, benchmarking is performed on a "black-box" basis, relying | |||
solely on measurements observable external to the DUT. | solely on measurements observable external to the DUT. | |||
Special capabilities SHOULD NOT exist in the DUT specifically for | Special capabilities SHOULD NOT exist in the DUT specifically for | |||
benchmarking purposes. Any implications for network security arising | benchmarking purposes. Any implications for network security arising | |||
from the DUT SHOULD be identical in the lab and in production | from the DUT SHOULD be identical in the lab and in production | |||
networks. | networks. | |||
9. IANA Considerations | 9. IANA Considerations | |||
NO IANA Action is requested at this time. | NO IANA Action is requested at this time. | |||
10. References | 10. References | |||
10.1. Normative References | 10.1. Normative References | |||
[draft-ietf-bmwg-dcbench-methodology] Avramov L. and Rapp J., "Data | [RFC1242] Bradner, S., "Benchmarking Terminology for Network | |||
Center Benchmarking Methodology", RFC "draft-ietf-bmwg-dcbench- | Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242, | |||
methodology", DATE (to be updated once published) | July 1991, <http://www.rfc-editor.org/info/rfc1242>. | |||
[RFC1242] Bradner, S. "Benchmarking Terminology for Network | ||||
Interconnection Devices", RFC 1242, July 1991, <http://www.rfc- | ||||
editor.org/info/rfc1242> | ||||
[RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for | [RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for | |||
Network Interconnect Devices", RFC 2544, March 1999, | Network Interconnect Devices", RFC 2544, | |||
<http://www.rfc-editor.org/info/rfc2544> | DOI 10.17487/RFC2544, March 1999, | |||
<http://www.rfc-editor.org/info/rfc2544>. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, | Requirement Levels", BCP 14, RFC 2119, | |||
March 1997, <http://www.rfc-editor.org/info/rfc2119> | DOI 10.17487/RFC2119, March 1997, | |||
<http://www.rfc-editor.org/info/rfc2119>. | ||||
[RFC5481] , Morton, A., "Packet Delay Variation Applicability | [RFC5481] Morton, A. and B. Claise, "Packet Delay Variation | |||
Statement", BCP 14, RFC 5481, March 2009, <http://www.rfc- | Applicability Statement", RFC 5481, DOI 10.17487/RFC5481, | |||
editor.org/info/rfc5481> | March 2009, <http://www.rfc-editor.org/info/rfc5481>. | |||
10.2. Informative References | 10.2. Informative References | |||
[RFC2889] Mandeville R. and Perser J., "Benchmarking | [RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology | |||
Methodology for LAN Switching Devices", RFC 2889, August 2000, | for LAN Switching Devices", RFC 2889, | |||
<http://www.rfc-editor.org/info/rfc2889> | DOI 10.17487/RFC2889, August 2000, | |||
<http://www.rfc-editor.org/info/rfc2889>. | ||||
[RFC3918] Stopp D. and Hickman B., "Methodology for IP Multicast | ||||
Benchmarking", RFC 3918, October 2004, <http://www.rfc- | ||||
editor.org/info/rfc3918> | ||||
[4] Yanpei Chen, Rean Griffith, Junda Liu, Randy H. Katz, Anthony D. | [RFC3918] Stopp, D. and B. Hickman, "Methodology for IP Multicast | |||
Joseph, "Understanding TCP Incast Throughput Collapse in | Benchmarking", RFC 3918, DOI 10.17487/RFC3918, October | |||
Datacenter Networks, | 2004, <http://www.rfc-editor.org/info/rfc3918>. | |||
"http://yanpeichen.com/professional/usenixLoginIncastReady.pdf" | ||||
[RFC2432] Dubray, K., "Terminology for IP Multicast | [RFC2432] Dubray, K., "Terminology for IP Multicast Benchmarking", | |||
Benchmarking", BCP 14, RFC 2432, DOI 10.17487/RFC2432, October | RFC 2432, DOI 10.17487/RFC2432, October 1998, | |||
1998, <http://www.rfc-editor.org/info/rfc2432> | <http://www.rfc-editor.org/info/rfc2432>. | |||
[RFC2647] Newman D. ,"Benchmarking Terminology for Firewall | [RFC2647] Newman, D., "Benchmarking Terminology for Firewall | |||
Performance" BCP 14, RFC 2647, August 1999, <http://www.rfc- | Performance", RFC 2647, DOI 10.17487/RFC2647, August 1999, | |||
editor.org/info/rfc2647> | <http://www.rfc-editor.org/info/rfc2647>. | |||
10.3. Acknowledgments | Acknowledgments | |||
The authors would like to thank Alfred Morton, Scott Bradner, | The authors would like to thank Alfred Morton, Scott Bradner, Ian Cox, Tim | |||
Ian Cox, Tim Stevenson for their reviews and feedback. | Stevenson for their reviews and feedback. | |||
Authors' Addresses | Authors' Addresses | |||
Lucien Avramov | Lucien Avramov | |||
1600 Amphitheatre Parkway | 1600 Amphitheatre Parkway | |||
Mountain View, CA 94043 | Mountain View, CA 94043 | |||
United States | United States | |||
Phone: +1 408 774 9077 | ||||
Email: lucien.avramov@gmail.com | ||||
Jacob Rapp | Phone: +1 408 774 9077 | |||
VMware | Email: lucien.avramov@gmail.com | |||
3401 Hillview Ave | ||||
Palo Alto, CA 94304 | Jacob Rapp | |||
United States | VMware | |||
Phone: +1 650 857 3367 | 3401 Hillview Ave | |||
Email: jrapp@vmware.com | Palo Alto, CA 94304 | |||
United States | ||||
Phone: +1 650 857 3367 | ||||
Email: jrapp@vmware.com | ||||
End of changes. 117 change blocks. | ||||
270 lines changed or deleted | 283 lines changed or added | |||