A central video management system cannot exist without fault tolerance mechanisms. CORTROL Global offers several redundancy features — database backup, management server mirroring, failover, fallback storage, replication, and edge recording. These combined efforts, help to reduce the disruption of the video recordings to zero by providing high availability service and keeping the recorded data integral.
In this article, we will take a closer look at failover mechanism for recording servers in the CORTROL Global system and see how it can be fine-tuned to meet high standard requirements for redundancy, for instance, those dictated by Security Monitoring Standards from Monitoring & Control Centre (MCC) of UAE.
The failover feature enables quick automated recovery of the recording system in case of a sudden failure. CORTROL Global server carefully tracks the operation of all the servers in the system by exchanging heartbeat messages with each one; if there is no response for too long, the central management server assumes that the target recording server is down and immediately replaces that server with a spare one.
What does happen when a faulty server is replaced? The central management server remembers the configuration of all the servers in the system; so, a secondary (spare) server is assigned a configuration that is identical to the one from the primary recording server. Thus, the fault has no consequences on everyone else in the system as all the live and recorded video streams are continuously available. When the malfunctioned machine comes back online, it is returned to operation either automatically or manually by the administrator, depending on the desired system configuration.
For rapid failover to be achieved, recording servers are split into groups — clusters — based on their location, purpose, importance, configuration or other attributes or their combinations. Inside the cluster, there may be any proportion of actual (primary) recording servers and ‘spare parts’ — failover nodes: thus, any required redundancy level can be reached. There are no limitations on the number of clusters or the number of servers in a cluster.
A faulty server is automatically replaced by a spare one thanks to the failover mechanism.
For both primary recorders and failover servers, the same installation package — CORTROL Global Recording Server — is used, with server roles assigned later in their settings. The Global server itself does not participate in the failover clustering and is therefore not covered by failover. For this reason, we do not recommend using the Global server for recording (although it is allowed) but rather for management only. The central management server high availability is achieved using a different redundancy feature called central server mirroring.
Failover is configured via CORTROL Console application. Generally, the plan is:
The order of these steps is not crucial; e.g., you can first add the servers and create the failover cluster later.
We shall consider a system with one cluster that contains two primary recording servers and two spare servers. All these servers (four of them) are located in the same local network and, as a result, have access to the same cameras. This network can be different from the Global server network.
A list of servers ready to be added to a failover cluster
Double-click a recording server to bring up its properties editing dialog box, and switch to the Failovertab (do not forget to fill in the rest of the tabs afterward). Here, it is necessary to choose the failover cluster that servers will belong to and define failover settings for this particular server.
Click Change to choose a pre-created failover cluster from the list. Do not worry if you have not created a cluster beforehand, and you can do so right now by clicking the button in the bottom of the cluster list.
Failover settings for a primary recording server
In the Current failover server field, it should be saying none at this point: here, the currently active failover node is shown, so this field will become non-empty once the target recording server fails and is replaced by a failover server.
The rest of the settings explained:
Repeat this for every primary recording server in the list.
For failover servers, the settings differ a bit. Double-click a server that is intended to be a failover node and stay on the Details tab: here, enable the Failover node role.
Enabling the failover node role
Next, switch to the Failover tab: note that there are fewer settings here this time as there is no need to define any auto-recovery settings. The Failover timeout parameter will be used when this server fails while operating with an assigned primary server configuration so that the faulty failover server is replaced with another (next) failover node: we shall set this to ten seconds, similarly to the regular recording server. Analogously, the Central server connection timeout parameter shall be one minute.
Failover settings for a failover node
Put the failover node into the same failover cluster and save the server configuration by clicking OK.
The failover functionality is now enabled within your CORTROL Global system for the specified servers.
The current server status and also hardware load (for connected machines) can be viewed in the Monitoring section of CORTROL Console, under Servers.
Also, for each primary recording server, its current failover substitute is displayed in the server settings, Failover tab (as shown in the snapshot above).
Once you have clustered the servers and configured them as described above, CORTROL Global is ready for the recording server misbehavior: whenever any of the two recording servers fails, it will be replaced by a failover server right away. In addition to the camera configuration, the failover server sustains the state of the server event & action configuration.
From the connected clients — the CORTROL Monitor, mobile apps, and others — all failover operations are transparent for the user’s convenience. Clients receive the requested live streams and recordings, and present these to their users, so the latter may not even suspect something might have gone wrong with any of the servers.
Recordings that have been made on failover servers remain there until they are erased, having reached one of the quotas. Individual archive duration quotas set for specific channels will affect all servers, i.e., the old recordings will be erased from the failover servers as well (provided, of course, that these servers are online and connected to the Global server).
There are a few hints that can make your experience with CORTROL Global failover even more exciting. ??
Now to you are ready to set this up for your CORTROL Global system. May your servers never break down!