I am going to break this topic down into several chapters. This first one gives a general overview of interfaces and the decisions specific to taps and packet brokers. In the coming weeks we'll publish a follow-up chapter for each management system area (i.e., fault, configuration, accounting, performance monitoring, security and remote access). Those chapters will go into more detail about our decisions, the history and current state of the industry, and market expectations.
For this initial overview, let us start with the general set of protocols used by network and telecom equipment. We then temper that list by recognizing that taps and packet brokers are simple devices in the network, with a different role than a typical router, switch or server.
We will go into much more detail in the individual chapters, including the pros and cons of individual protocols. For now, let us start with a broad list of interface types and functions to consider.
Beginning with fault management, we can break the function down into three main areas: notification, history, and status reporting. Let us also define two basic terms: events and alarms. An event is a message or announcement indicating that a condition or state change occurred at a specific point in time. An alarm is a fault state that persists over time. When an alarm is raised (created), an event is sent. The alarm can then change state, which sends another event, and it clears when the alarm condition is resolved, which is also indicated by an event. Non-persistent faults may generate events only, with no alarm.
Fault Notification – events are presented to the user as notification messages. These can be sent over various protocols such as syslog or SNMP. They can also appear in a user's GUI or command line session, or be delivered by email or text message.
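As an illustration of syslog-based notification, here is a minimal sketch that formats a fault event as an RFC 5424 syslog line. The priority value is facility × 8 + severity, as the RFC specifies. The choice of the local0 facility, the hostname and the application name are illustrative assumptions. Transport over UDP, TCP or TLS is not shown.

```python
from datetime import datetime, timezone

# Syslog severity codes from RFC 5424 (0 = most severe).
SEVERITY = {
    "emergency": 0, "alert": 1, "critical": 2, "error": 3,
    "warning": 4, "notice": 5, "informational": 6, "debug": 7,
}
LOCAL0 = 16  # RFC 5424 facility code for local0 (an illustrative choice)


def format_syslog(severity: str, hostname: str, app: str, msg: str,
                  ts: datetime, facility: int = LOCAL0) -> str:
    """Build an RFC 5424 syslog line for one fault notification.

    PRI = facility * 8 + severity. PROCID, MSGID and STRUCTURED-DATA
    are left as the nil value "-" in this sketch.
    """
    pri = facility * 8 + SEVERITY[severity]
    stamp = (ts.astimezone(timezone.utc)
               .isoformat(timespec="milliseconds")
               .replace("+00:00", "Z"))
    return f"<{pri}>1 {stamp} {hostname} {app} - - - {msg}"
```

For example, a warning about a link going down on a hypothetical device "npb-01" at 12:04 UTC formats as `<132>1 2024-05-01T12:04:00.000Z npb-01 faultmgr - - - Port 7 link down`.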
Fault History – a historical record of the notification events. This information is useful for determining how many times an event occurred, when it occurred, and any trends in faults on the system. It also allows a user or system to “catch up” on any events they may have missed.
Fault Status – this is where the concept of an alarm comes into play. Seeing the history of fault notifications on a device is useful, but the more important question is usually whether a fault or issue is in progress right now. Generally, fault status is presented as a list of active faults (alarms) with information about each one, including when it began, its severity and which systems it affects. The fault status list shows only current issues, so once a fault has been resolved it is removed from the list (its history is still available in the fault history record).
Best practice is for all fault records (notifications, history records and status) to contain a basic set of common information. That set includes the device identifier, the function, component, or sub-component to which the record pertains, an accurate time stamp, and, most importantly, a severity indicator. We will discuss each of these in more detail in the fault chapter.
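The three fault functions (notification, history and status) and the event/alarm distinction fit together in the minimal Python sketch below. Every call emits an event carrying the common fields listed above, and each event is appended to the history and fanned out to notification subscribers. Only persistent alarms are kept in the active-alarm (status) table, and clearing an alarm removes it from that table. The class names, the condition names and the severity scale are hypothetical; the scale is only loosely modeled on ITU-T X.733 perceived severities.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical scale, loosely modeled on X.733; higher = more urgent.
    CLEARED = 0
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


@dataclass(frozen=True)
class FaultEvent:
    device_id: str
    component: str      # function, component or sub-component the record pertains to
    condition: str      # hypothetical condition name, e.g. "link-down"
    severity: Severity
    timestamp: datetime


@dataclass
class FaultManager:
    device_id: str
    history: list = field(default_factory=list)      # every event, oldest first
    active: dict = field(default_factory=dict)       # (component, condition) -> latest alarm event
    subscribers: list = field(default_factory=list)  # notification callbacks (syslog, SNMP, GUI...)

    def _emit(self, component, condition, severity, ts):
        event = FaultEvent(self.device_id, component, condition, severity,
                           ts if ts is not None else datetime.now(timezone.utc))
        self.history.append(event)          # the history keeps every notification
        for notify in self.subscribers:     # notification fan-out
            notify(event)
        return event

    def report(self, component, condition, severity, ts=None):
        """Raise, change or clear a persistent alarm; each call emits one event."""
        event = self._emit(component, condition, severity, ts)
        key = (component, condition)
        if severity == Severity.CLEARED:
            self.active.pop(key, None)      # resolved: removed from the status list
        else:
            self.active[key] = event        # raised, or severity changed
        return event

    def transient(self, component, condition, severity, ts=None):
        """Non-persistent fault: an event and a history record, but no alarm."""
        return self._emit(component, condition, severity, ts)

    def status(self):
        """Active alarms, most severe first, then oldest first."""
        return sorted(self.active.values(), key=lambda e: (-e.severity, e.timestamp))
```

Keying the active table on (component, condition) means a severity change replaces the existing alarm rather than adding a duplicate, while the history still records both events.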
The next area we will review is configuration. Configuration management consists of several major areas: initial setup of the device, loading of software and software updates, backup and restore of the configuration, and ongoing provisioning of individual device ports and functions. Configuration also includes synchronization with network services such as Time of Day (e.g., NTP) or DHCP.
Initial device setup – there are several ways to perform this on network devices. The most common is via a command line or graphical interface that prompts the user for a series of initial values, including IP address, device name and identifiers, administrative user accounts, initial device operational modes, etc. In some cases, some or all of the configuration data can also be preset at the factory or delivered by configuration services in the network using BootP, DHCP, TFTP, TR-069 or other protocols.
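Whether the initial values come from an interactive prompt or from a network configuration service, the device should check them before applying them. The minimal sketch below shows the kinds of checks involved, using Python's standard `ipaddress` module. The dictionary keys (`hostname`, `mgmt_ip`, `gateway`, `admin_user`) are hypothetical, and the hostname rule follows the RFC 1123 label syntax.

```python
import ipaddress
import re

# RFC 1123 label: letters, digits, hyphens; 1-63 chars; no leading/trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def validate_initial_setup(values: dict) -> list:
    """Return a list of problems with initial setup values (empty if all OK).

    Hypothetical keys: 'hostname', 'mgmt_ip' (address with prefix length,
    e.g. '192.0.2.10/24'), optional 'gateway', and 'admin_user'.
    """
    errors = []

    hostname = values.get("hostname", "")
    if (not hostname or len(hostname) > 253
            or not all(_LABEL.match(label) for label in hostname.split("."))):
        errors.append("hostname is not a valid DNS name")

    try:
        iface = ipaddress.ip_interface(values.get("mgmt_ip", ""))
    except ValueError:
        errors.append("mgmt_ip must be an address with prefix length, e.g. 192.0.2.10/24")
        iface = None

    gateway = values.get("gateway")
    if iface is not None and gateway:
        try:
            if ipaddress.ip_address(gateway) not in iface.network:
                errors.append("gateway is not on the management subnet")
        except ValueError:
            errors.append("gateway is not a valid IP address")

    if not values.get("admin_user"):
        errors.append("an administrative user account is required")
    return errors
```

Returning every problem at once, rather than stopping at the first one, lets an interactive prompt or a provisioning service report all the bad values in a single pass.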
Software initialization and updates – this category includes any type of software or firmware on the device, including the operating system, firmware, device drivers, application software, optional functional modules, etc. It also includes ongoing management of that software, including patches, updates and new versions, as well as rollback or reload of previous versions. Software and software updates are typically delivered to the user as a file or file set (a package of files such as a zip or tar bundle). The user then transfers them to the device using protocols such as HTTP, TFTP or SFTP, or via physical media such as an SD card or USB drive. In some cases, software loads or updates may be delivered over the Internet if a device has sufficient access.
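A common way to support rollback is to keep two software banks ("A/B" slots). The new version is installed into the idle slot and then activated, and the previous version stays in the other slot until it is overwritten. The sketch below models only that bookkeeping. It is an illustrative assumption, not a description of any particular vendor's scheme, and the version strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SoftwareSlots:
    """Illustrative two-bank (A/B) software store supporting rollback."""
    slots: dict = field(default_factory=lambda: {"A": None, "B": None})
    active: str = "A"

    def standby(self) -> str:
        return "B" if self.active == "A" else "A"

    def install(self, version: str) -> str:
        """Write a new version into the standby slot; the running one is untouched."""
        target = self.standby()
        self.slots[target] = version
        return target

    def activate(self) -> str:
        """Switch to the standby slot (e.g. on next reboot); the old version is kept."""
        if self.slots[self.standby()] is None:
            raise RuntimeError("no software loaded in the standby slot")
        self.active = self.standby()
        return self.slots[self.active]

    def rollback(self) -> str:
        """Return to the previous version, which is still in the other slot."""
        return self.activate()

    def running(self):
        return self.slots[self.active]
```

Because installation never overwrites the running image, a failed or unwanted update can be backed out by switching slots instead of transferring the old package again.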
Backup and restore – there are two main reasons to back up the configuration of a device. The first is recovery from a hardware or software failure. The device may be replaced with a new unit (hardware failure), or it may be reset/reloaded to a factory default state (software or configuration error). In either case, a configuration backup allows recovery to be done quickly and accurately, without requiring the user to re-enter all of the device's settings. The second is recovery from a user error or an attack against the device. A user may inadvertently disrupt the device's configuration through a provisioning action, such as accidentally deleting port or flow configurations that are in use, or by changing the configuration to an invalid state. Alternatively, an attacker may access the device to intentionally corrupt or usurp the configuration. A regular configuration backup allows the configuration stored before that event to be reloaded, quickly restoring the device to a functioning state. Best practice is to perform backups automatically to a secure network location using protocols such as SFTP, and to keep multiple historical copies so that earlier configurations can be rolled back to. Most devices also keep at least one backup copy on board for rapid rollback/recovery.
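The "multiple historical copies" practice amounts to a simple rotation scheme, sketched below. A local directory stands in for the secure network location; in practice the file would be pushed with SFTP or a similar protocol. The file naming pattern and the `keep` default are assumptions for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path


def save_backup(config_text: str, backup_dir: Path, keep: int = 5, now=None) -> Path:
    """Write a timestamped configuration backup and prune all but the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if now is None:
        now = datetime.now(timezone.utc)
    backup_dir.mkdir(parents=True, exist_ok=True)
    path = backup_dir / f"config-{now:%Y%m%dT%H%M%SZ}.cfg"
    path.write_text(config_text)
    # Zero-padded UTC timestamps sort chronologically, oldest first.
    backups = sorted(backup_dir.glob("config-*.cfg"))
    for old in backups[:-keep]:
        old.unlink()
    return path


def restore_backup(backup_dir: Path, generations_back: int = 0) -> str:
    """Return the newest backup, or an older generation for rollback."""
    backups = sorted(backup_dir.glob("config-*.cfg"))
    if generations_back >= len(backups):
        raise FileNotFoundError("not enough backup generations available")
    return backups[-1 - generations_back].read_text()
```

`restore_backup(d, 0)` covers the hardware-replacement case. A larger `generations_back` reaches past a bad provisioning change or tampering that may already have been captured in the most recent copy.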
Ongoing provisioning activities – depending on how the device is used in the network, it may need to be provisioned periodically, perhaps frequently. Typical reasons are to change its operational behavior, set up new ports or services, reconfigure filters or flows, or configure or adjust monitoring functions. It may also need to redirect flows in response to network changes or to work around network issues. Provisioning interfaces are generally provided via command line or graphical interfaces. In some cases, other systems in the network need to change the device's configuration, either autonomously or as part of a higher-level provisioning activity. Those cases use configuration protocols such as NETCONF, RESTCONF, SNMP or HTTP-based REST APIs carrying JSON or XML payloads.
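As an example of machine-driven provisioning, the sketch below builds a RESTCONF (RFC 8040) PATCH request that changes one flow on a packet broker, using only the Python standard library. Several details are assumptions: the YANG module name `example-npb`, its leaf names, the host and the `/restconf` root path (devices advertise their actual root via `/.well-known/host-meta`). The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request


def build_flow_update(host: str, flow_name: str, ingress: str,
                      egress: str, vlan: int) -> urllib.request.Request:
    """Build (but do not send) a RESTCONF PATCH that updates one flow entry.

    The "example-npb" module and its leaves are hypothetical; a real device
    publishes its own YANG data model.
    """
    body = {
        "example-npb:flow": [{
            "name": flow_name,
            "ingress-port": ingress,
            "egress-port": egress,
            "vlan-filter": vlan,
        }]
    }
    key = urllib.parse.quote(flow_name, safe="")  # list keys are percent-encoded in the URL
    url = f"https://{host}/restconf/data/example-npb:flows/flow={key}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/yang-data+json",  # RESTCONF JSON media type
            "Accept": "application/yang-data+json",
        },
        method="PATCH",  # plain PATCH in RESTCONF merges into the existing entry
    )
```

Sending the request would be a matter of `urllib.request.urlopen(req)` plus whatever authentication and TLS settings the device requires. Those settings are left out here because they depend on the deployment.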
Accounting covers protocols that report on services delivered via a device, so that customers/subscribers can be billed or receive reports of their activity. This is not to be confused with accounting of administrative user access activity, which is discussed in the security section*. Accounting generally does not apply to network taps and packet brokers, but it could be offered as an add-on service for reporting traffic types/volumes to various consumers. In those cases, taps and network packet brokers (NPBs) would use the performance reporting protocols (see below) instead of the more traditional telecom accounting protocols.
*In some routers and other network devices, ‘accounting’ may also refer to user tracking and logging, but I have grouped this into the security category along with the other AAA (Authentication, Authorization and Accounting/Auditing) functions.
Performance Monitoring includes the protocols used to monitor, record and report performance and statistical information about the device and the traffic passing through it. I have chosen not to include network flow monitoring in this category, since it is an analysis application in its own right rather than a device management function. Performance monitoring can be grouped into several categories, including device performance and traffic performance. Both are needed to understand the workload the device is experiencing.
Device performance – includes metrics such as CPU load, memory utilization, application, subsystem or function utilization, fault rates, and interface utilization. The device generally keeps counters for these key metrics, with thresholds that trigger a fault event or warning message when a particular high or low watermark is crossed. The device should also record these metrics over time so that trends and variations can be assessed. Threshold events are reported via the fault notification and fault status interfaces discussed above. Device performance history is typically reported via a table/list export interface. This can take the form of files (CSV, XML, etc.) sent over file transfer protocols such as SFTP or SCP, or of data query interfaces such as NETCONF or REST returning JSON, HTML, etc.
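The high-watermark check described above can be sketched as follows. This version adds a separate, lower clear level (hysteresis). That is a common design choice, but it is an assumption here rather than something every device does. Its purpose is to stop a metric hovering at the limit from flooding the fault interfaces with raise/clear events. The metric name and threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WatermarkThreshold:
    """Raise when a metric reaches `high`; clear only once it falls to `low`.

    The gap between `high` and `low` (hysteresis) suppresses repeated
    raise/clear events while the metric hovers near the limit.
    """
    metric: str
    high: float
    low: float
    raised: bool = False

    def update(self, value: float):
        """Return 'raise', 'clear', or None for each new sample."""
        if not self.raised and value >= self.high:
            self.raised = True
            return "raise"
        if self.raised and value <= self.low:
            self.raised = False
            return "clear"
        return None
```

With `high=90` and `low=75`, CPU samples of 50, 91, 88, 92, 80, 74 and 95 yield a raise at 91, a clear at 74 and a second raise at 95, with nothing in between. For a low-side threshold, such as free memory dropping too far, the comparisons are simply inverted.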
Traffic performance – traffic performance monitoring reports on the levels of traffic flowing through the device, both at the port level (ingress/egress) and at the flow/route level. These may include metrics by VLAN, by source/destination subnet or IP address, or by other criteria. As with device performance, most devices support configurable thresholds to flag important events to the user. Examples are no traffic flowing on a port that should have traffic, or traffic reaching or exceeding the maximum capacity of a port, a flow, or its target. Traffic performance metrics can also be used to identify trends. These may show when traffic on a given port should be moved to a higher-bandwidth connection, or reveal unusual traffic patterns at certain times of day or on specific days. As with device performance, this information can be delivered via the table and list export interfaces described above.
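Port-level traffic rates are usually derived from cumulative octet counters sampled at intervals. Examples are the 32-bit `ifInOctets` and 64-bit `ifHCInOctets` objects in the SNMP IF-MIB. The minimal sketch below converts two readings into bits per second, handles a single counter wrap, and expresses the result as a percentage of port speed. That percentage is the figure a capacity threshold or trend report would use. The function names are hypothetical.

```python
def rate_bps(prev_octets: int, curr_octets: int, interval_s: float,
             counter_bits: int = 64) -> float:
    """Bits per second between two octet-counter readings.

    Assumes at most one wrap of the counter occurred during the interval.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:                     # counter wrapped past its maximum value
        delta += 2 ** counter_bits
    return delta * 8 / interval_s


def utilization_pct(bps: float, port_speed_bps: float) -> float:
    """Share of port capacity in use, as a percentage."""
    return 100.0 * bps / port_speed_bps
```

The single-wrap assumption is why 64-bit counters are preferred on fast ports. A 32-bit octet counter on a 10 Gb/s link can wrap in a few seconds, well inside a typical polling interval.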
Security – security interfaces cover verification of user identity and authority, tracking of user or system activity, and securing and encrypting network protocols.
User identity verification – common identity verification interfaces include RADIUS, TACACS, Diameter, PAP/CHAP/EAP, etc. Many of these protocols also deliver the user's authority information to the device, that is, which functions of the device the user is permitted to access. The simplest devices keep user security on board in the form of user IDs and passwords. This becomes difficult to administer in a network with hundreds or thousands of devices and users, and in practice it makes a consistent security policy hard to maintain. We will discuss the pros and cons of various security practices and protocols in a later chapter.
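Once an AAA server has verified a user, the device still has to turn the returned authority information into allowed actions. The sketch below assumes the server returns a role name, and it maps each role to a hypothetical set of device functions with deny-by-default behavior. How a role actually arrives, for example through vendor-specific RADIUS attributes or TACACS+ authorization replies, varies by protocol and vendor and is not modeled here.

```python
# Hypothetical roles and device functions; real devices define their own.
ROLE_PERMISSIONS = {
    "read-only": {"view-status", "view-stats"},
    "operator": {"view-status", "view-stats", "edit-flows"},
    "admin": {"view-status", "view-stats", "edit-flows",
              "manage-users", "update-software"},
}


def is_authorized(role: str, function: str) -> bool:
    """Deny by default: unknown roles or functions get no access."""
    return function in ROLE_PERMISSIONS.get(role, set())
```

Keeping this table small and central is what makes server-based AAA attractive. Changing a user's role on the AAA server changes their access on every device at once, instead of editing local accounts device by device.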
Security activity tracking – another key part of security is maintaining a record, or audit log, of who or what accesses the system and what activities they perform. This data is essential for forensic analysis in the event of a security breach. It also helps in tracing the sources of misconfiguration. For example, suppose the device stopped sending a required flow at 12:04pm. The audit log shows that Mary made a configuration change to that flow at 12:03pm. She made the change based on a work order she received, and further investigation showed that the work order identified the wrong customer port, so the wrong flow was changed. (That is, it was not a network, device or software error, nor was it Mary's error in applying the change; the issue was introduced when the original work order was created.)
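The investigation in the example above boils down to a simple query: which changes touched the affected object shortly before the incident? The sketch below shows that query over an in-memory audit log. The entry fields, the 15-minute default window, and the user names and work order numbers used with it are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AuditEntry:
    when: datetime
    user: str
    action: str      # e.g. "modify"
    target: str      # object acted on, e.g. "flow/customer-42"
    reference: str   # work order or ticket the change was made under


def changes_before(log, target: str, incident: datetime,
                   window: timedelta = timedelta(minutes=15)):
    """Audit entries touching `target` in the window leading up to an incident."""
    start = incident - window
    return sorted((e for e in log if e.target == target and start <= e.when <= incident),
                  key=lambda e: e.when)
```

Recording the work order reference alongside each change is what lets the trail continue past the person who made the change to the document that requested it.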
Network protocols and encryption – another key aspect of device integrity is making sure the device is not being attacked or compromised. This includes:
- ensuring that encryption mechanisms are secure;
- monitoring the number and source of access attempts, particularly invalid ones;
- tracking how often traffic arrives at the device ports but fails encryption validation;
- creating and updating security tokens and keys.
While this is not a network protocol in its own right, it is a key aspect of all the other protocols discussed here.
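Monitoring invalid access attempts by source can be as simple as a sliding-window counter. The sketch below flags a source once it exceeds a limit of failures within the window. Timestamps are assumed to be seconds from a monotonic clock, and the limit and window values are illustrative. What the device does with the flag, such as raising an alarm or blocking the source, is left out.

```python
from collections import defaultdict, deque


class FailedAttemptMonitor:
    """Flag sources with too many invalid access attempts in a sliding window."""

    def __init__(self, limit: int = 5, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self._attempts = defaultdict(deque)  # source -> timestamps of recent failures

    def record_failure(self, source: str, t: float) -> bool:
        """Record one failed attempt at time t; True once the source exceeds the limit."""
        q = self._attempts[source]
        q.append(t)
        while q and q[0] <= t - self.window_s:   # forget failures older than the window
            q.popleft()
        return len(q) > self.limit
```

Counting per source rather than globally separates one persistent attacker from the scattered typos of many legitimate users.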
Remote access refers to interactive user sessions, specifically command line and graphical interfaces. Most devices have a local physical user interface via a serial or USB port to facilitate initial device setup and recovery. Typical day-to-day device management, however, is done via remotely accessible network protocols. Command line access is most commonly provided via SSH, but may also be provided via other protocols such as Telnet or X.25. Graphical interfaces in modern systems are almost always web (HTML) interfaces served over HTTP. In some cases they may be supplied via X11, RDP (Remote Desktop Protocol) or RFB (the Remote Framebuffer protocol used by VNC). Some systems use dedicated graphical presentation software running on the user's PC/workstation, which communicates with the device over the other protocols discussed above. This method has largely been replaced by on-board HTTP interfaces for normal craft user access. It may still be used in specialized cases such as network-wide configuration or initial device configuration applications.
Another set of protocols commonly associated with taps, packet brokers and routers covers packet sampling and flow monitoring. Since these protocols are, in effect, a service of their own, I have not grouped them with device management protocols but treat them as a separate category of monitoring and classification services. This category includes protocols such as sFlow, NetFlow and IPFIX. We will be publishing a separate blog post devoted specifically to this topic.