Control System Safety

Overview

A UAV control system is a safety-critical system and must have a reasonable safety margin to be of any practical use. An out-of-control UAV can cause property damage and/or personal injury (not to mention the loss of the vehicle), even a small UAV of the kind the RAMA control system is intended for. This section describes the safety mechanisms incorporated in the RAMA UAV control system.

RAMA is designed so that for most failures there is an appropriate Failure Mode the system can engage to preserve the critical functions (i.e. steering) at the cost of losing some high-level functionality. Most failures would likely cause the mission to be aborted, but there is still a good chance of recovering the vehicle by some form of semi-manual or fully manual control. If a failure is so severe that it leaves no chance of saving the vehicle, RAMA should at least minimize the consequences: rather than letting the damaged craft swing around wildly and unpredictably, it throttles the engine down and moves the control surfaces to predefined positions (this is called the Critical Failure Mode, CFM). This feature is already implemented in the Spektrum DX-7 RC set.

The principles of graceful degradation are used to achieve these goals. The only critical node of the system is the Servo Control Unit (SCU), whose complete failure would certainly lead to grave consequences. Partial or complete failure of any other part of the system (the Main Control Computer, the Data Acquisition Module, or some or all of the sensors) would lead to the loss of some functionality, but would always preserve at least fully manual control of the vehicle (provided that the RC set is working).

It must be noted that there is no overall “failure control system” that would automatically put RAMA into an appropriate fail-safe mode in case of a failure. It is always the pilot’s responsibility to perform the reconfiguration by flipping a switch if he/she considers it necessary. The RAMA system only warns that something has gone wrong; it does not perform any reconfiguration on its own. This is intentional, as pilots generally strongly dislike any autonomous system outside their direct control that can dramatically change the behavior of the vehicle at its own discretion.

A fully automatic “failure control system” would also introduce failure modes of its own and increase the overall system complexity. RAMA does not need such a system, as it is not intended to operate autonomously: it is assumed that RAMA will always operate under the direct supervision of a human pilot.

The only failure mode that RAMA enters automatically is triggered by the loss of the wireless control signal from the ground. In this case the vehicle is naturally no longer under the pilot’s control, and the control system must act on its own.
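
The arbitration rule described above can be sketched in a few lines. This is a minimal illustration, not the RAMA firmware: the `Mode` names (in particular `RAMA` for normal operation) and `select_mode()` are hypothetical. What it shows is that health warnings never feed into the mode decision; only the pilot's switch and the loss of the control link do.

```python
from enum import Enum


class Mode(Enum):
    RAMA = "normal operation"       # hypothetical name: the MCC flies the vehicle
    MCM = "Manual Control Mode"
    CFM = "Critical Failure Mode"


def select_mode(pilot_selected: Mode, control_link_ok: bool) -> Mode:
    """Pick the active mode for one control cycle.

    Failure warnings are shown to the pilot but are intentionally not an input
    here: apart from a lost control link, only the pilot's switch changes the mode.
    """
    if not control_link_ok:
        return Mode.CFM  # the only automatic reconfiguration
    return pilot_selected
```

Because the function has no memory, the pilot's selected mode is restored in the first cycle after the link returns.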

Control System Failure Modes

Main Control Computer Failure Modes

safety_current.jpg

In case of an MCC failure (of any type), the only viable solution is to put the Servo Control Unit (SCU) into the Manual Control Mode (MCM). In this mode, the SCU takes direct control of the actuators and runs unaffected by the rest of the system: it ignores any commands that may come from the faulty MCC or other nodes and drives the control surfaces directly according to the control stick positions.
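
A minimal sketch of the MCM pass-through follows. Beyond what the text states, it assumes that the receiver delivers standard servo pulses of 1000 to 2000 µs centered at 1500 µs; `stick_to_actuator()` and `mcm_step()` are hypothetical names. The output uses the same actuator scale as the flight logs on this page, where 0 is neutral and ±0.5 is full throw.

```python
from typing import Optional


def stick_to_actuator(pulse_us: float, min_us: float = 1000.0, max_us: float = 2000.0) -> float:
    """Rescale a stick pulse width (assumed 1000-2000 us) to the actuator range [-0.5, 0.5]."""
    pulse_us = max(min_us, min(max_us, pulse_us))  # clamp glitches outside the valid range
    return (pulse_us - min_us) / (max_us - min_us) - 0.5


def mcm_step(stick_pulses: dict, mcc_commands: Optional[dict] = None) -> dict:
    """One SCU control cycle in MCM: MCC commands are received but deliberately ignored."""
    del mcc_commands  # a faulty MCC must not influence the actuators
    return {channel: stick_to_actuator(p) for channel, p in stick_pulses.items()}
```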

This solution is good enough at the current stage of RAMA’s development, because the hobby-grade Futaba GY-401 tail rate controller is still present (it is not used while RAMA is in control; it serves only as a backup for when the MCM is engaged), together with the Bell-Hiller roll and pitch rate stabilizer. So, in the MCM, the helicopter can still be flown like any other model helicopter, with the help of the mechanical Bell-Hiller system and the backup electronic tail rate stabilizer (the GY-401). The situation will change radically, however, once these remnants of hobby helicopter equipment are removed, as is ultimately planned. There will then be no backup for the disabled electronic rate stabilizers, which would be very unfortunate. Therefore, the SCU software is planned to be modified so that it can execute the rate stabilizing algorithm directly when needed. The SCU has more than enough computational power for this, and the modification would be relatively simple and straightforward.

Once this is completed, two failure modes will be available in the case of an MCC failure. The SCU can be reconfigured to take over the Angular Rates Control Layer (ARCL) from the MCC, provided that the Data Acquisition Module (DAM) and the Inertial Measurement Unit (IMU) are still running; this is the so-called Semi-Automatic Control Mode (SACM). Alternatively, should the IMU and/or the DAM fail together with the MCC, the SCU can be reconfigured into the Manual Control Mode (MCM), as it is now. Without the help of the rate stabilizers the human pilot’s task would be significantly harder, but he/she would still have a good chance of recovering the damaged vehicle, for example by autorotation.
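
The choice between the two future fallback modes, and what the SCU would do on one axis in each of them, might look like the sketch below. The proportional rate law, the gain `kp` and the full-stick rate `max_rate` are illustrative assumptions; the text does not specify which rate stabilizing algorithm the SCU would run.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    SACM = "Semi-Automatic Control Mode"  # SCU runs the angular rate loop (ARCL)
    MCM = "Manual Control Mode"           # SCU passes stick commands straight through


def mcc_fallback(dam_ok: bool, imu_ok: bool) -> Mode:
    """Best fallback the pilot can engage after an MCC failure."""
    # The rate loop needs IMU rates, which reach the SCU through the DAM.
    return Mode.SACM if dam_ok and imu_ok else Mode.MCM


def scu_axis_command(mode: Mode, stick: float, measured_rate: Optional[float] = None,
                     max_rate: float = 2.0, kp: float = 0.25) -> float:
    """Actuator command in [-0.5, 0.5] for one axis; stick is also in [-0.5, 0.5]."""
    if mode is Mode.MCM:
        u = stick
    else:
        if measured_rate is None:
            raise ValueError("SACM requires a measured angular rate")
        ref_rate = 2.0 * stick * max_rate      # full stick -> +/- max_rate (rad/s)
        u = kp * (ref_rate - measured_rate)    # proportional rate law (assumed)
    return max(-0.5, min(0.5, u))
```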

safety_final.jpg

Overall, in all cases the telemetry would be lost together with the higher control layers, but semi-automatic or at least fully manual control of the vehicle should always be preserved.

The MCC has no watchdog timer: restarting it would take too long to bring any benefit.

An MCC failure is not detected automatically, at least not directly. The Ground Station (GS) monitors the MCC indirectly via the telemetry data, and if it detects any erratic behavior (such as controller internal variables out of range, lost data packets or CRC errors), it informs the pilot that something is wrong with the telemetry (and possibly with the MCC). It is then up to the pilot to decide whether to engage the fail-safe mode (the MCM at present, or the Semi-Automatic Control Mode (SACM) once it is implemented).
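
The Ground Station checks listed above could be organized roughly as follows. This sketch assumes a JSON encoding protected by CRC-32 (`zlib.crc32`); the packet layout, `make_packet()` and the limits table are hypothetical, since the real RAMA telemetry format is not described here. The monitor only produces warnings for the pilot and never changes the control mode.

```python
import json
import zlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TelemetryPacket:
    seq: int
    variables: dict
    crc: int


def encode(seq, variables):
    # Canonical byte encoding over which the checksum is computed (an assumption).
    return json.dumps({"seq": seq, "vars": variables}, sort_keys=True).encode()


def make_packet(seq, variables):
    return TelemetryPacket(seq, variables, zlib.crc32(encode(seq, variables)))


class TelemetryMonitor:
    def __init__(self, limits):
        self.limits = limits  # variable name -> (low, high)
        self.last_seq = None

    def check(self, pkt):
        """Return a list of warnings for the pilot (empty if the packet looks fine)."""
        if zlib.crc32(encode(pkt.seq, pkt.variables)) != pkt.crc:
            return ["CRC error"]  # contents are untrustworthy, skip further checks
        warnings = []
        if self.last_seq is not None and pkt.seq > self.last_seq + 1:
            warnings.append(f"{pkt.seq - self.last_seq - 1} packet(s) lost")
        elif self.last_seq is not None and pkt.seq <= self.last_seq:
            warnings.append("duplicate or out-of-order packet")
        if self.last_seq is None or pkt.seq > self.last_seq:
            self.last_seq = pkt.seq
        for name, value in pkt.variables.items():
            low, high = self.limits.get(name, (float("-inf"), float("inf")))
            if not low <= value <= high:
                warnings.append(f"{name} out of range")
        return warnings
```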

Critical MCC failures are quite obvious to the pilot: the vehicle simply stops responding to his/her commands or behaves erratically. Again, it is up to the pilot to engage the fail-safe mode. The reconfiguration is completely independent of the MCC: the SCU receives the command and reconfigures itself into the MCM, so the reconfiguration works even in the case of a total MCC failure.

The state diagrams of the failure modes are shown in the figures: the current state of development above, and the proposed future solution, extended by the Semi-Automatic Control Mode, below.

Navigation Unit Failure Modes

A Data Acquisition Module (DAM) failure makes the system enter the Manual Control Mode (upon the pilot’s command). The DAM is equipped with a watchdog timer and brown-out detection (undervoltage reset) circuitry. A DAM restart after a watchdog reset would take only about 50 ms, which is fast compared to the vehicle dynamics.

If some of the sensors fail, the control layer that needs the missing data is turned off, along with all higher layers.
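
The rule that a failed sensor disables its layer and everything above it amounts to walking the layer stack from the bottom and stopping at the first layer whose inputs are missing. Only the ARCL is named in this text; the other layer names and the sensor dependencies below are illustrative assumptions.

```python
# Control layers ordered from the lowest (innermost) to the highest.
# Only "ARCL" appears in the text; the other names and all sensor
# dependencies are illustrative assumptions.
CONTROL_LAYERS = [
    ("ARCL", {"IMU"}),
    ("attitude", {"IMU", "TAM"}),
    ("velocity", {"IMU", "TAM", "GPS"}),
    ("position", {"IMU", "TAM", "GPS"}),
]


def active_layers(healthy_sensors: set) -> list:
    """Return the names of the layers that can stay active."""
    active = []
    for name, required in CONTROL_LAYERS:
        if not required <= healthy_sensors:
            break  # this layer lacks data, so it and every layer above it are turned off
        active.append(name)
    return active
```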

Sensor failures are detected by the DAM. The GPS and the IMU (Inertial Measurement Unit) are both off-the-shelf products with their own internal diagnostics, so they both issue an error message if anything is wrong with them. The GPS reports the estimated position errors and the number of satellites detected, while the IMU reports out-of-range measurements and other internal errors. A loss of communication with the GPS or IMU is detected by the DAM via timeouts and is reported as a critical GPS or IMU failure. Communication with both units is also protected by a CRC. CRC errors are reported by the DAM to the MCC and then via telemetry to the Ground Station (GS), and the affected data are not used in the control loops. The density of CRC errors is also monitored and reported to the GS.
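
A minimal sketch of the DAM-side link supervision for one unit (GPS or IMU) combines a timeout with a sliding-window CRC error density. The window size, the error-ratio threshold and the status strings are hypothetical; the CRC itself is assumed to be verified elsewhere and passed in as `crc_ok`.

```python
from collections import deque


class SensorLinkMonitor:
    """Supervises the serial link to one off-the-shelf unit (GPS or IMU)."""

    def __init__(self, timeout_s, window=100, max_error_ratio=0.05):
        self.timeout_s = timeout_s
        self.max_error_ratio = max_error_ratio
        self.crc_results = deque(maxlen=window)  # True = frame passed the CRC check
        self.last_valid_rx = None

    def on_frame(self, t, crc_ok):
        """Record a received frame; returns whether its data may be used by the control loops."""
        self.crc_results.append(crc_ok)
        if crc_ok:
            self.last_valid_rx = t
        return crc_ok

    def status(self, t):
        if self.last_valid_rx is None or t - self.last_valid_rx > self.timeout_s:
            return "CRITICAL"  # communication loss, reported as a critical unit failure
        error_ratio = self.crc_results.count(False) / len(self.crc_results)
        return "DEGRADED" if error_ratio > self.max_error_ratio else "OK"
```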

The Three-Axis Magnetometer (TAM) is an analog unit. Out-of-range measurements and unnaturally rapid changes in the data (gradients) are registered and reported by the DAM as errors.
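
The TAM plausibility checks can be sketched as a range test plus a rate-of-change test between consecutive samples. The limits (1.0 gauss per axis, 5 gauss/s) and the function name are illustrative assumptions, not RAMA parameters.

```python
def tam_errors(prev, curr, dt, max_abs=1.0, max_rate=5.0):
    """Check one TAM sample.

    prev/curr: (x, y, z) field readings in gauss (prev may be None for the first sample);
    dt: sample period in seconds. Limits are assumed values.
    """
    errors = []
    if any(abs(c) > max_abs for c in curr):
        errors.append("out of range")
    if prev is not None and any(abs(c - p) / dt > max_rate for p, c in zip(prev, curr)):
        errors.append("rate of change too high")
    return errors
```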

The MCC detects DAM failures only through erratic or completely lost communication (via CRC checks and timeouts). An unexpected DAM reset (e.g. by the watchdog) is also detected and reported by the MCC, since at start-up the DAM sends a boot-up message to the MCC. As long as the DAM communicates correctly, it is assumed to be working. DAM failures are reported by the MCC via telemetry to the GS.
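
Detecting an unexpected DAM reset from the boot-up message needs only a flag remembering that the first boot has already been seen. The `BOOT` message token and the `DamSupervisor` class are hypothetical.

```python
class DamSupervisor:
    """MCC-side detection of unexpected DAM resets (hypothetical message names)."""

    BOOT = "BOOT"  # assumed token for the DAM's boot-up message

    def __init__(self):
        self.seen_boot = False
        self.events = []  # to be forwarded to the GS via telemetry

    def on_message(self, msg: str) -> None:
        if msg == self.BOOT:
            if self.seen_boot:
                # A second boot-up message means the DAM restarted in flight,
                # e.g. after a watchdog or brown-out reset.
                self.events.append("unexpected DAM reset")
            self.seen_boot = True
```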

Servo Control Unit Failure Modes

A Servo Control Unit (SCU) failure would inevitably lead to the loss of the vehicle. If at least some of the actuators were still working (in the case of a partial SCU failure), the system would enter the Critical Failure Mode (CFM). The SCU is naturally the most rugged unit of the whole system, kept as simple as possible and thoroughly tested. It is equipped with a watchdog timer, ensuring a restart if the microcontroller hangs up. A full restart of the SCU is fast enough (it takes about 50 ms, like the DAM restart). It also has brown-out detection circuitry. Nevertheless, the SCU should be redesigned in the future to provide at least some redundancy (meaning that at least the Manual Control Mode should somehow be made redundant).

Communication Failure Modes

In case of a loss of data communication (detected by the TCP/IP protocol through missing acknowledgments), the telemetry data packets are stored in a buffer and sent as soon as the data link is re-established. Moreover, each data packet is assigned a unique number and a time stamp, which allows the consistency of the received data to be checked. Telemetry is not critical and can also be downloaded off-line after landing.
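
The store-and-forward behavior can be sketched as an ordered backlog that is flushed whenever the link accepts packets again. Here `transmit` is a hypothetical hook returning `False` while the link is down, and the capacity limit (which discards the oldest packets when full) is an assumption; the text does not state what happens when the buffer fills up.

```python
import itertools
from collections import deque


class TelemetrySender:
    def __init__(self, transmit, capacity=10000):
        # transmit(packet) -> bool is a hypothetical link hook that returns
        # False while the data link is down.
        self.transmit = transmit
        self.backlog = deque(maxlen=capacity)  # when full, the oldest packets are discarded
        self._seq = itertools.count()          # unique packet numbers 0, 1, 2, ...

    def send(self, timestamp, data):
        self.backlog.append((next(self._seq), timestamp, data))
        self.flush()

    def flush(self):
        # Send in order; stop at the first failure and keep the rest for later.
        while self.backlog:
            if not self.transmit(self.backlog[0]):
                return
            self.backlog.popleft()
```

On the receiving side, the sequence numbers reveal gaps and the time stamps allow the data to be re-aligned after a delayed flush.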

A failure of the wireless control link (detected via the loss of the control signal) currently makes the system enter the Critical Failure Mode (meaning that it sets the engine throttle to idle and moves the control surfaces to neutral positions). This is necessary because the higher control layers are not implemented yet, so the vehicle is not able to operate autonomously. In the future, when all control layers are fully functional, this mode may be changed so that the vehicle enters a hover (for a rotorcraft) or an orbit (for a fixed-wing aircraft) when this failure occurs. The system is able to recover from the Critical Failure Mode immediately once the control link is re-established, so the vehicle can still be saved if the failure is only momentary.
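
On the actuator scale used in the flight logs on this page (0 neutral, ±0.5 full throw), the CFM presets and the immediate recovery might be expressed as below. The value chosen for the "slightly positive" collective is an assumption, and `actuator_outputs()` is a hypothetical name.

```python
# CFM actuator presets on the [-0.5, 0.5] actuator scale.
CFM_PRESET = {
    "throttle": -0.5,    # engine to idle: full negative throw
    "collective": 0.05,  # "slightly positive": the exact value is an assumption
    "roll": 0.0,
    "pitch": 0.0,
    "yaw": 0.0,
}


def actuator_outputs(control_link_ok: bool, commanded: dict) -> dict:
    """Return the actuator positions for one control cycle."""
    if not control_link_ok:
        return dict(CFM_PRESET)
    # Link present again: normal commands are used immediately, with no latching.
    return dict(commanded)
```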

The wireless control link is, however, very reliable. It is redundant: the RC transmitter transmits the control signal on two separate channels and can switch dynamically to another channel if interference is detected. The system is equipped with two independent Wireless Control Units (receivers), with antennas oriented for mutually perpendicular polarizations. A single channel is sufficient to maintain full control over the vehicle. Digital data encoding, protected by a CRC, is used. The transmitter is identified by a unique ID code embedded in each data packet, which prevents any other transmitter from taking over control of the vehicle.
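
Acceptance of control frames from the two Wireless Control Units could look like the sketch below: a frame is used only if its CRC matches and it carries the bound transmitter ID, and the first valid frame from either receiver wins. The frame layout, the use of CRC-32 and all names are hypothetical; the actual Spektrum protocol is not described in this text.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RcFrame:
    tx_id: int       # transmitter ID embedded in every frame
    channels: tuple  # raw channel values (0..65535 in this sketch)
    crc: int


def frame_crc(tx_id, channels):
    return zlib.crc32(struct.pack(f"<I{len(channels)}H", tx_id, *channels))


def make_frame(tx_id, channels):
    return RcFrame(tx_id, tuple(channels), frame_crc(tx_id, channels))


def is_valid(frame: Optional[RcFrame], bound_id: int) -> bool:
    return (frame is not None
            and frame.tx_id == bound_id
            and frame.crc == frame_crc(frame.tx_id, frame.channels))


def select_frame(frame_a, frame_b, bound_id):
    """Use whichever receiver delivered a valid frame; None means control signal lost."""
    for frame in (frame_a, frame_b):
        if is_valid(frame, bound_id):
            return frame
    return None
```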

Power Failure Modes

Currently, there are no Battery Failure Modes: both batteries must be in good condition to keep the system running, and a failure of either of them would have catastrophic consequences. A redesign of the SCU is being considered that would allow it to run from either battery, so that it could draw power from the Actuator Battery (ACB) should the Avionics Battery (AVB) fail. It should also be able to re-route power from the AVB to the actuators in case of an ACB failure. This would bring power supply redundancy into the system without adding any mass or size.

The “backbone” power wiring and power switches (i.e. the lines whose failure would cause the whole system to fail due to power loss) are divided into two redundant parallel branches. This measure does not add much weight, while the safety benefits are obvious.

Fault-Tolerance of the RAMA System in Real Life

failsafe.jpg

Let us now describe two real-life cases that occurred unexpectedly during flight tests, in which the fail-safe ability of the RAMA system helped to prevent an accident. In the first case, the control signal was lost for approximately 600 ms (most likely because of radio interference). At the time, the vehicle was equipped with an older Radio Control (RC) set operating at 35 MHz with no redundancy. The control system entered the Critical Failure Mode (CFM) in response to the signal loss and recovered after the control signal reappeared.

The reaction of the RAMA control system to this failure can be seen in the figure, which shows the actuator positions over time (on the y axis, 0 means the actuator’s neutral position, while -0.5 means full throw to one side and 0.5 full throw to the other). At 193.1 seconds into the flight, the control signal is lost. The condition is immediately detected and the control system reacts by engaging the CFM, which means moving the actuators to preset positions: engine throttle to idle (full negative actuator throw), collective pitch to slightly positive, and the roll, pitch and yaw actuators to their neutral positions. After 600 milliseconds, control signal reception is restored and the control system recovers from the CFM. The failure occurred while the vehicle was banking left at full speed in forward flight, and although the pilot perceived it as unpleasant, it did not cause a crash thanks to the rapid recovery. Had the actuators not been set to their neutral positions but instead been subjected to uncontrolled noise during this critical phase, the vehicle would likely have crashed.

In the second case, the roll gyroscope malfunctioned because of excessive vibrations of the helicopter body, caused by a mechanical resonance mode that appeared during the flight. The root cause was later traced to a loose bolt in the rotor head damper. In this case, switching to the Manual Control Mode (MCM) undoubtedly saved the vehicle from crashing, as the roll rate control loop was no longer working, which led to a loss of vehicle stability. The figure shows the roll rate measurements, together with the reference signal and the actuator position. At 155 seconds into the flight (marked by the dashed line), the resonance occurred, causing the roll gyroscope to fail (a mean value shift, explained in section Inertial Measurement Issues, was experienced). The roll rate controller compensated for this shift, falsely believing that the vehicle was rolling, and thereby induced an undesired roll itself. The pilot immediately switched to the MCM and recovered the vehicle. Note that from the dashed line on, the roll rate gyro readings and the actuator position signal are no longer correlated. This is because the MCM was engaged: in this mode the sensor readings are ignored and the rescaled reference signal (issued by the human pilot) is fed directly to the actuator, so from the dashed line on, the actuator position signal is equivalent to the rescaled reference signal.
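
Why a shifted gyro mean makes the rate loop roll the vehicle can be shown with a toy simulation. It assumes a first-order roll-rate response and a proportional rate controller, neither of which is the actual RAMA model or controller, and all parameters are illustrative. With a gyro offset `bias` and a zero reference, the loop settles where the true roll rate is -kp*bias/(1 + kp) for unit plant gain, approaching -bias as the gain grows, whereas the MCM pass-through ignores the gyro and stays at rest.

```python
def simulate(bias, ref=0.0, manual=False, kp=4.0, plant_gain=1.0,
             tau=0.1, dt=0.001, steps=1000):
    """Return the true roll rate after `steps` cycles of a toy first-order model."""
    rate = 0.0
    for _ in range(steps):
        measured = rate + bias           # gyro with a mean-value shift
        if manual:
            u = ref / plant_gain         # MCM: rescaled reference, gyro ignored
        else:
            u = kp * (ref - measured)    # rate loop trusts the biased gyro
        u = max(-0.5, min(0.5, u))       # actuator limits
        rate += dt * (plant_gain * u - rate) / tau  # explicit Euler step
    return rate
```

For example, `simulate(0.1)` settles near -0.08 rad/s of roll that the pilot never commanded, while `simulate(0.1, manual=True)` stays at zero.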

roll_gyro_fail.jpg

The fail-safe ability and the early warnings issued by the control system via on-line telemetry were invaluable during flight tests and prevented potential vehicle loss many times. Once, a failing actuator battery would have led to a crash had it not been for an early warning issued by the control system. RAMA detected an unusually high voltage drop on the Actuator Battery (ACB), which had not shown up on the ground, because there the actuators were not loaded by aerodynamic forces. This triggered a warning to the human pilot. The vehicle was landed immediately, and the suspect battery actually failed only two minutes later. This case led to the decision to redesign the Servo Control Unit in the future, as described in section Power Failure Modes.
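
A battery check of this kind might compare a low-pass filtered voltage against the resting voltage measured on the ground before take-off. The filter constant, the thresholds and the `BatteryMonitor` class are illustrative assumptions; the text does not say how RAMA detects the voltage drop. The filter keeps short servo current spikes from triggering false warnings.

```python
class BatteryMonitor:
    def __init__(self, v_rest, max_sag, v_min, alpha=0.1):
        self.v_rest = v_rest      # resting voltage measured on the ground (assumed available)
        self.max_sag = max_sag    # largest acceptable drop under flight load (assumed)
        self.v_min = v_min        # absolute minimum voltage (assumed)
        self.alpha = alpha        # exponential low-pass factor per sample
        self.v_filtered = v_rest

    def update(self, v_sample):
        """Feed one voltage sample; return the list of warnings for the pilot."""
        self.v_filtered += self.alpha * (v_sample - self.v_filtered)
        warnings = []
        if self.v_rest - self.v_filtered > self.max_sag:
            warnings.append("excessive voltage drop")
        if self.v_filtered < self.v_min:
            warnings.append("low voltage")
        return warnings
```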

 
control_system_safety.txt · Last modified: 07/01/2010 09:14 by ondra
 