Robotics Research Group
Task Planning and Operations: Fault Tolerance
What is Fault Tolerance?
Why is Fault Tolerance necessary for robots?
How do we implement Fault Tolerance?
Present Development and Achievement
Videos
What is Fault Tolerance?
A fault tolerant system is one that can identify a failure, isolate that failure and provide a means of recovery.

Two animations illustrate fault tolerance implemented through redundancy. Both show a two DOF robot arm.

One uses the minimum torque criterion, which has the added effect of reducing the maximum torque to 0.35 of the normal maximum. The other uses the minimum velocity criterion to resolve the redundancy.

Both use the same end effector motion file, which defines motion only in the y direction. The x direction is left free, and that freedom is how the redundancy is resolved.
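The minimum velocity idea can be sketched for a planar two-link arm whose task is the y coordinate only. This is an illustrative sketch, not the code behind the animations: the unit link lengths, the function names, and the use of the pseudoinverse as the minimum velocity criterion are all assumptions. The minimum torque criterion is not shown because it would need a dynamic model of the arm.

```python
import math


def y_jacobian(q1, q2, l1=1.0, l2=1.0):
    """Row Jacobian dy/dq of a planar two-link arm (link lengths are assumed)."""
    # End effector height: y = l1*sin(q1) + l2*sin(q1 + q2)
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l2 * math.cos(q1 + q2))


def min_velocity_rates(q1, q2, y_dot, l1=1.0, l2=1.0):
    """Smallest-norm joint rates that produce the commanded y velocity."""
    j1, j2 = y_jacobian(q1, q2, l1, l2)
    norm_sq = j1 * j1 + j2 * j2
    if norm_sq < 1e-12:
        raise ValueError("y velocity cannot be produced at this configuration")
    # For a 1x2 Jacobian J, the pseudoinverse is J^T / (J J^T).
    scale = y_dot / norm_sq
    return (j1 * scale, j2 * scale)


# Joint rates for a 0.1 units/s climb in y; x is left unconstrained.
rates = min_velocity_rates(0.3, 0.8, 0.1)
print(rates)
```

Any other pair of joint rates that gives the same y velocity differs from this one by a multiple of the null vector (-j2, j1), so it has a larger norm.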
Why is Fault Tolerance necessary for robots?
Image: Robonaut removing an ORU on space station Freedom. Note that the "stinger" tail is attached to a rail while the arms remove the ORU. Also note the handrails on the space station’s superstructure.
Robots have found a niche working in dangerous environments. Robots today work in deep sea operations, space missions, nuclear cleanup, and bomb disposal.

Because these robots work where humans cannot safely step in to make repairs, a robot failure can be very expensive. In these failure critical missions, robotic systems must be fault tolerant.

The problem of failure recovery was first addressed in the aerospace community. In 1977, a pilot flying a Lockheed L-1011 landed the plane despite the complete lack of conventional pitch control [Montoya, 1982].

This was revolutionary because the pilot used redundant control (in this case the engines) to fly a plane that was uncontrollable in the traditional sense.
How do we implement Fault Tolerance?
Image: A redundant robot demonstrating its self motion, or null space. Note that there are infinitely many manipulator configurations for a single end effector position.
Using redundant serial robots for failure recovery seems simple: if an actuator fails, the controller locks the faulty joint and the remaining actuators continue operation. Unfortunately, this problem has proved to be deceptively difficult.

Following a joint failure, a robot may possess a sufficient number of actuators to address the end-effector space, but geometric singularities may leave the robot unable to work. In addition, locking a joint will change a robot's workspace.
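A small planar example shows how a locked joint can leave enough actuators on paper yet no usable motion. The sketch below assumes a three-link planar arm with unit link lengths and a two-dimensional position task. The function names are hypothetical, and the determinant test is only one simple way to detect a singularity.

```python
import math


def planar_jacobian(q, lengths):
    """2 x n position Jacobian of a planar serial arm with revolute joints."""
    n = len(q)
    # The absolute angle of each link is the running sum of the joint angles.
    abs_angles = []
    total = 0.0
    for qi in q:
        total += qi
        abs_angles.append(total)
    row_x, row_y = [], []
    for i in range(n):
        # Joint i moves every link from link i outward.
        row_x.append(-sum(lengths[k] * math.sin(abs_angles[k]) for k in range(i, n)))
        row_y.append(sum(lengths[k] * math.cos(abs_angles[k]) for k in range(i, n)))
    return [row_x, row_y]


def reduced_jacobian(jac, locked):
    """Drop the column of the locked joint, which can no longer move."""
    return [[v for i, v in enumerate(row) if i != locked] for row in jac]


def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


q = [0.4, 0.9, 0.0]          # the last joint is fully extended
jac = planar_jacobian(q, [1.0, 1.0, 1.0])
for locked in range(3):
    d = det2(reduced_jacobian(jac, locked))
    state = "singular" if abs(d) < 1e-9 else "usable"
    print("lock joint", locked, "->", state)
```

In this configuration, locking the first joint leaves two working joints whose Jacobian columns are parallel, so the arm cannot move the end effector in every direction. Locking either of the other joints leaves a usable pair.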

The problem can be summarized as:
Given that an actuator or sensor has failed, what is the impact of that failure, and how should the manipulator be reconfigured to continue operation?

The problem of post failure recovery revolves around a principal condition: the effective use of the null space to continue operation, maintain positional control and provide the best overall performance.
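The null space can be made concrete with a three-link planar arm that positions its end effector in two dimensions, leaving one degree of redundancy. The sketch below is a minimal illustration under assumed unit link lengths and hypothetical function names. It finds the joint rate direction that leaves the end effector stationary by taking the cross product of the two Jacobian rows, a shortcut that only applies to a 2 x 3 Jacobian.

```python
import math

LENGTHS = (1.0, 1.0, 1.0)  # assumed link lengths


def link_angles(q):
    """Absolute angle of each link from the three joint angles."""
    return (q[0], q[0] + q[1], q[0] + q[1] + q[2])


def end_effector(q):
    """Planar end effector position (x, y)."""
    a = link_angles(q)
    x = sum(l * math.cos(t) for l, t in zip(LENGTHS, a))
    y = sum(l * math.sin(t) for l, t in zip(LENGTHS, a))
    return (x, y)


def jacobian_rows(q):
    """The x row and y row of the 2 x 3 position Jacobian."""
    a = link_angles(q)
    row_x = [-sum(LENGTHS[k] * math.sin(a[k]) for k in range(i, 3)) for i in range(3)]
    row_y = [sum(LENGTHS[k] * math.cos(a[k]) for k in range(i, 3)) for i in range(3)]
    return row_x, row_y


def null_direction(q):
    """Joint rate direction n with J n = 0, i.e. pure self motion."""
    rx, ry = jacobian_rows(q)
    # The cross product of the two rows is orthogonal to both of them.
    return (rx[1] * ry[2] - rx[2] * ry[1],
            rx[2] * ry[0] - rx[0] * ry[2],
            rx[0] * ry[1] - rx[1] * ry[0])


q = (0.4, 0.9, -0.5)
n = null_direction(q)
q_new = tuple(qi + 1e-4 * ni for qi, ni in zip(q, n))
print(end_effector(q))
print(end_effector(q_new))  # nearly the same point, reached with different joint angles
```

A recovery scheme can add a multiple of this direction to the joint rates to move away from a joint limit or a failed joint without disturbing the end effector, which is the sense in which the null space is "used" after a failure.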
Present Development and Achievement
Pradeep [1988] described locking a joint and the resulting workspace effects for several commercially available robots. He limited his research to nonredundant robots and fault tolerant design issues.

Maciejewski [1990] introduced redundant robots as a solution to the workspace problem Pradeep described. The University of Texas [Tesar, Menon, Sreevijayan and Ting] went on to develop partial failure solutions that allow operation to continue under load and speed constraints rather than workspace constraints.

Sreevijayan [1992] established a unified framework for redundancy and decision making for fault tolerant robots. Ting [1993] evaluated optimal control algorithms for failure recovery and introduced time regulation and torque redistribution.

Menon [1994] described several online recovery strategies and introduced several performance criteria. Sreevijayan [1997] presented a reconfiguration algorithm based on generalized inverses of nonlinear operators mapping Hilbert spaces.