Probabilistic vs. Deterministic Models - Modeling Uncertainty
This page examines probabilistic vs. deterministic models -- the modeling of uncertainty in models and sensors. It is part of the section on
Model Based Reasoning in the white paper
A Guide to Fault Detection and Diagnosis.
Diagnostic systems inherently make assumptions about uncertainty. The only question is whether that uncertainty is explicit, hidden inside “black box” techniques, or just part of engineering judgment during tuning.
For models, we say they are deterministic if they include no representation of uncertainty. First principles engineering design models are generally deterministic. But the uncertainty representations used for estimation and diagnosis are usually extensions of the deterministic model. For instance, the uncertainty model may add noise terms to the measurements and to the process model. In the case of empirically derived models such as regression models, the uncertainty is generally available as a byproduct of the regression or other fitting procedure.
Where does uncertainty come from?
Some examples of uncertainties:
(1) Imperfect observations ("facts")
Analog sensor readings like temperature, position, etc., have random measurement noise, and systematic errors like biases. Those uncertainties are often characterized by variance, min/max error, etc. There are also various possible outright failures (with some probability of failure). The diagnostic system has to distinguish between process faults and sensor faults, which is usually the hardest problem of all. In some industries (such as the process industries), some of the sensors are less reliable than the equipment they are monitoring.
While it may be best to do as much analysis as possible with numerical models and sensor values first, when you eventually convert to "logical" values like "high temperature", you'll often want to carry over some measure of uncertainty, whether it's some form of probability, fuzzy value, confidence factor, or voting logic. You'll be combining evidence, and will need to know how much to believe one sensor (or human input) vs. another.
(2) Imperfect models (whether qualitative or quantitative).
This includes models in a very loose sense - they might be qualitative models like cause-effect models or rules, or quantitative models, etc. -- any part of the knowledge representation beyond the "facts". Think in terms of model error, which might have its own representation of uncertainties independent of measurement uncertainties.
(3) Inconsistencies caused by timing
Important when there are time delays between observations of both cause and effect. It also happens when drawing conclusions based on data that is filtered or averaged over time, if there are different sample times, filter coefficients or averaging periods for different sensors. This causes temporary inconsistencies when the system is in transition due to normal changes or due to faults.
(4) Missing observations
Sometimes sensors are temporarily out of service due to failures, calibration procedures, or other maintenance work in the area. Recovery depends on having some redundancy - in effect determining values another way, or estimating a number of state variables smaller than the number of measurements. You might also carry over previous values, or use a priori (assumed default) values. In these cases, the uncertainty should be higher than when observations are present, and it should grow over time.
(5) Fault masking
When there's one fault, you might not even be able to see the symptoms of other faults. For instance, if you lose network connectivity to a region, you can't see any symptoms of other problems in the region. If your car battery fails, you won't be able to see faults in the fuel pump when you try to start your car. This is really a special case of missing observations, one that arises only when there are multiple faults present. In systems where faults can exist a long time (like 1.5 years until the next plant shutdown, or weeks until a new part arrives for your autonomous robot), so that multiple faults are common, additional uncertainty arises. Models used for model-based reasoning can make it possible to spot such situations.
(6) Initial conditions/prior knowledge & assumptions errors.
For example, errors in an assumed initial location. Uncertainty here would normally be handled the same way as observation uncertainty.
(7) Extent of faults.
Many significant faults result in varying degrees of effect. For instance, a fault that is a leak of material or energy may be tiny or it may be very large. A heat exchanger or a temperature sensor may be “fouled”, meaning that there is a buildup of crud that provides some insulation. The amount of fouling may be tiny or very significant. In the case of sensor bias, the bias might be small or large. These sorts of variations make it difficult to pick thresholds for “normal” vs. “bad”.
An example illustrating some uncertainties - industrial cookie baking
An example with various uncertainties:
Consider an industrial expert system for monitoring the baking of batches of cookies, with the hope of avoiding burning them. You'll have an oven temperature measurement, and some standard for "burned", based on color measurement averaged over the cookies.
You might want to say something like
if the temperature exceeds 450 deg for more than 5 minutes, the cookies will be burned.
(You will be looking for root cause problems for burning, related to oven temperature. There might also be other root causes for burning, such as inadequate dough mixing, improper recipe, etc.)
But, that's just an approximation. The oven temperature distribution is uneven, and you won't have measurements of how hot each different location in the oven is. So, in the hot areas, you will get some burning even if the (spatial) average is less than 450. And, the composition of the cookies isn't 100% perfectly mixed, either. Some components like sugar burn more easily, so you'll get some areas burned where there happens to be more sugar. And you'll never know ahead of time what those areas are. In reality, you'll get some burning at 448 deg F or lower because of all of this "process noise".
So, given any temperature, at best you're really only able to give a probability that there's burning. If you're trying to avoid burning, and know the temperature (averaged across the oven) is 448, 449, 450, or 451, the best you could say is that there's some probability of burning, and some probability that there isn't, for each threshold.
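To make that concrete, here is a minimal sketch of turning a temperature reading into a burn probability. It assumes, purely for illustration, that the combined measurement and "process noise" is Gaussian with a 2 degree standard deviation; the function name and numbers are made up for this example:

```python
import math

def prob_burning(observed_temp, threshold=450.0, sigma=2.0):
    """Probability that the true temperature exceeds the burn
    threshold, assuming the true value is normally distributed
    around the observed reading with standard deviation sigma
    (an illustrative guess covering measurement noise and
    spatial variation, not a measured value)."""
    z = (observed_temp - threshold) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for t in (448, 449, 450, 451):
    print(f"{t} deg F -> P(burning) = {prob_burning(t):.2f}")
```

Near the threshold the answer is close to 50/50 either way, which is exactly why a hard yes/no rule at 450 is misleading.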
Humans are very subjective, so you'll get different numbers from different "experts". The ones who really want to avoid risk (or really prefer doughy cookies to burned cookies) might give more conservative numbers like 447, while the ones who really want "accuracy" or prefer crisp cookies might give a higher number. Some will parrot old numbers from operating guides that no one has updated to reflect the current recipes, and there probably wasn't any real lab work on the current recipe anyway.
And all this assumes you know the actual temperature. In reality, you NEVER know the actual temperature at ANY point. Instead, you have a measurement, which provides an estimate with its own set of errors, does some space & time averaging, and also some probability of failure and major errors. There could be measurement noise, bias or other systematic errors. The same applies to any measurements of color that you make. So any decision or rule relating temperature and burning has uncertainty.
So, ideally, the best you can do is talk in terms of probabilities. The physical system, any models you might have about it, the rules, and the measurements are all just approximations. And that's true for any physical system with analog variables that must be measured.
There are other problems with threshold events like T > 450. The original fault occurs, then there are delays and lags in the physical process. Then, because of sensor noise, it might take several samples before you get a high enough value. Some delays are caused by scanning times -- different variables may be scanned at different intervals, or even at the same intervals but in some sequential order specified in the scanning equipment, not the physical process.
Besides the noise issue, most faults cause a change in the variable -- but the delay until you reach the threshold depends on where you start. If the process is at T=449, and a failure increases T, the response is quick. But if the process is at T=400, and the fault starts T rising, it takes longer. So, you have a variable delay based on the process state before the failure.
Finally, a big factor people forget is any filtering, averaging or discretization that is done. When the sensor changes, it takes time to propagate through the filters based on the filter parameters. And if the discretization is large, you might not even see the change. (If T is at 449, and you only get notified when it changes by 5, you won't even realize it gets to 453).
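A small sketch illustrates both effects, using an assumed first-order exponential filter and a 5-degree reporting deadband (the parameter choices are illustrative):

```python
def exp_filter(samples, alpha=0.2):
    """First-order exponential filter: y += alpha * (x - y)."""
    y = samples[0]
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def deadband_report(samples, band=5.0):
    """Report a value only when it moves at least `band` away from
    the last reported value (change-based discretization)."""
    reported = [samples[0]]
    for x in samples[1:]:
        if abs(x - reported[-1]) >= band:
            reported.append(x)
    return reported

raw = [449.0] * 3 + [453.0] * 10     # a 4-degree step change
filtered = exp_filter(raw)
print(deadband_report(raw))          # the 4-degree change is never reported
```

The filtered value crosses 450 only a couple of samples after the step, and the 5-degree deadband hides the 449-to-453 change entirely.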
Always distinguish between actual values and observed values
Even experts sometimes get careless about the distinction between actual values and observed values. They're different things, because of both observation errors and the probability of failure of the observation mechanism, such as a sensor in an automated system. This is critical in many domains, such as continuous process plants, where the sensors are often less reliable than the equipment they monitor. Even when a human is answering questions, they are reading sensors, so the problem doesn't go away in manual systems in those domains.
In the cookie baking example, it's the actual temperature that makes the cookies burn, not the observed temperature. So if the observed temperature is high, you can't necessarily conclude that the cookies will burn. The real temperature might be high, or the temperature sensor might be reading high (biased), or even stuck at a high value.
Conversely, if you get burned cookies, and see that the temperature appears in the normal range, you can't conclusively rule out that the actual temperature was high, because the sensor might be reading low or stuck low. You still need to do other checks on the sensor, like looking for flatlined values or sudden jumps. You'd probably start looking at other data, e.g., comparing a model of power consumption over the batch vs. the integral of the temperature profile over time compared to normal (making some metric analogous to the "degree-days" used in HVAC). And then discrepancies in the temperature profile vs. time could be due to a biased or wrong power sensor...
Possible root causes of the burned cookies include various sensor failure modes, an incorrect setpoint (target) temperature entered by a human, a temperature controller in manual or otherwise broken, various physical properties of the cookie dough, etc.
In some cases, you simply can't observe enough variables to differentiate among multiple possible failures when you include those sensor failures. You can only report multiple possible causes. Ideally, you'd at least rank them, based partly on a priori failure probability, and ideally also partly on the strength of the connection between each failure and the observed problems.
It's important to use careful naming for everything, usually with adjectives like "observed", "estimated", "actual", etc., so there's never any doubt which you're talking about. People found this necessary in the analogous statistical estimation systems, e.g., putting little "hats" on symbols to indicate "estimated" or "predicted", distinguishing measurements and internal state by notation such as x vs. z, and so on.
Needless to say, "actual" values are never known when sensors are involved - they only can be inferred or predicted. Actual values might be directly testable with some sensor, or might not even be directly testable at all. (In the case of, say, "plugged distillation tray", you can't observe the fault directly at all without a plant shutdown -- you can only infer it indirectly based on pressure, flow, and temperature information, etc.).
Dealing with uncertainty
Even in the simple case of manually setting a simple alarm/event threshold for event detection, probability is a factor. If the threshold is set too far from normal operations, you will fail to detect some problems. If the threshold is set too close to normal operations, you risk getting “false positives” during small transient disturbances. Statistically, this is balancing type I error vs. type II error. These error probabilities could be estimated using process data and knowledge of whether the problem actually existed. But usually this tuning setting is just based on engineering judgment and experience -- the probabilities in that case are only implicit in the designer’s head, not ever formally stated.
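The tradeoff can be sketched numerically. Assuming, purely for illustration, Gaussian readings with known means for normal and faulted operation (the numbers below are made up, not from any real process), each candidate threshold implies a false alarm rate and a missed detection rate:

```python
import math

def norm_cdf(x, mu, sigma):
    """Cumulative distribution of a normal with mean mu, std sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def error_rates(threshold, normal_mu=430.0, fault_mu=455.0, sigma=5.0):
    """Per-sample error probabilities for a simple high alarm.
    Type I  (false alarm):  a normal reading exceeds the threshold.
    Type II (missed fault): a faulted reading stays below it."""
    false_alarm = 1.0 - norm_cdf(threshold, normal_mu, sigma)
    missed = norm_cdf(threshold, fault_mu, sigma)
    return false_alarm, missed

for thr in (440.0, 445.0, 450.0):
    fa, miss = error_rates(thr)
    print(f"threshold {thr}: false alarm {fa:.3f}, missed {miss:.3f}")
```

Raising the threshold trades false alarms for missed faults, and vice versa; the tuning is picking a point on that curve, whether or not the probabilities are ever written down.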
To work in the logical realm often means setting thresholds on analog variables to distinguish between a normal value, high value, extra-high value, etc. Given the noise involved with sensors, that means that when true values are near thresholds, the logical values can easily go one way or the other. (The result is what we used to call the "chattering rule"). (Faults in sensors themselves also obviously affect the results). That's why it's best to avoid things like decision trees when the values are very near the decision points - both paths can still be relevant.
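One standard defense against the chattering rule is hysteresis - separate ON and OFF thresholds so a noisy value hovering near a single limit doesn't flip the logical value back and forth. A minimal sketch, with illustrative threshold values:

```python
def hysteresis(samples, on_at=450.0, off_at=445.0):
    """Two-threshold discretization: switch ON above on_at, and back
    OFF only below off_at. A noisy value bouncing between the two
    thresholds leaves the logical value unchanged."""
    state, states = False, []
    for x in samples:
        if not state and x > on_at:
            state = True
        elif state and x < off_at:
            state = False
        states.append(state)
    return states

noisy = [449, 451, 449, 451, 448, 444, 449]
print(hysteresis(noisy))
```

A single-threshold comparison on the same samples would chatter between true and false; with hysteresis, the state switches on once and stays on until the value drops clearly below the band.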
You can usually get more sensitive fault detection and isolation by making use of some numerical models outside of basic logic. As an example, consider something like a stuck sensor. It may or may not be stuck in the middle of a "normal" range, so it may or may not trigger any rules. Even if you detect the problem, you often won't have enough redundant information for fault isolation - to distinguish which sensor is stuck (or which other process problem, like a leak, might have occurred). On the other hand, simply by monitoring the standard deviation or derivative of each sensor over time, you can quickly realize that the sensor is stuck. You detect and diagnose (isolate) this using input features based on this simple sort of time series analysis.
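A sketch of that stuck-sensor check, flagging a sensor when the standard deviation over a sliding window falls below a minimum (the window size and threshold are illustrative tuning parameters; a healthy analog reading always shows at least some noise):

```python
from collections import deque
import statistics

def stuck_detector(window=10, min_std=0.01):
    """Return an update function that flags a possibly stuck sensor
    when the standard deviation over the last `window` samples
    drops below min_std."""
    buf = deque(maxlen=window)
    def update(x):
        buf.append(x)
        if len(buf) < window:
            return False          # not enough history yet
        return statistics.pstdev(buf) < min_std
    return update

check = stuck_detector(window=5)
live = [check(v) for v in [20.1, 20.3, 19.9, 20.2, 20.0]]
stuck = [check(v) for v in [20.0] * 5]
print(live[-1], stuck[-1])   # healthy: False, flatlined: True
```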
Consider the role of uncertainty when estimating variable values. Models used in estimation include some form of uncertainty in the measurements, in the model, or both. Estimation is closely linked to diagnosis, and in many cases the diagnostic techniques are extensions of estimation techniques. When estimated values start to deviate significantly from measured values, that indicates a fault (or errors in the models or noise models).
In the case of estimation with a Kalman filter, the starting point is a deterministic model of the system state (differential or difference equations, plus an algebraic relation between states and measurements). But then the filter model adds uncertainty in the measurements (“measurement noise”), uncertainty in the model (“process noise”), and uncertainty in the initial condition estimate. The key tuning parameters are expressed as covariance matrices representing the measurement noise, process noise, and uncertainty in the initial conditions. One derivation of the Kalman filter equations is as the solution to a least squares problem minimizing a weighted combination of measurement adjustments and model prediction adjustments to achieve an optimal balance between the two. The weighting is based on the inverses of the covariance matrices for the measurement noise and process noise. The static case of data reconciliation is a special case, using only algebraic equations, and asserting that the process noise is zero. (More general steady state estimation could also allow process noise, and really should in most cases, because of model uncertainty introduced in physical properties and other calculations.)
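The scalar case shows the structure without the matrix algebra. This sketch assumes a trivially simple model (the state stays put except for random-walk process noise); q, r, and p0 are the variances encoding the uncertainty assumptions, and the numbers are illustrative:

```python
def kalman_1d(measurements, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter: process noise variance q, measurement
    noise variance r, initial estimate x0 with variance p0."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the model says x stays put; uncertainty grows by q.
        p += q
        # Update: blend prediction and measurement, weighted by
        # their variances (the less uncertain one counts more).
        k = p / (p + r)          # Kalman gain
        x += k * (z - x)
        p *= (1.0 - k)
        estimates.append(x)
    return estimates

est = kalman_1d([5.1, 4.9, 5.2, 5.0, 4.8], x0=0.0)
print(est)   # estimates move from the poor initial guess toward ~5
```

The gain k is exactly the weighted-least-squares balance described above: large measurement noise r pulls the estimate toward the model prediction, large process noise q pulls it toward the measurements.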
The model for estimation depends on explicit assumptions on the noise. These can be estimated from historical data, or simply set as tuning parameters using engineering judgment and rules of thumb. For instance, for measurements, we normally assume independent measurements (diagonal covariance matrix) and “typical” variances (starting with typical standard deviations as a percent of the instrument range) for different types of sensors, possibly adjusting for the type of service they are in.
Diagnostic techniques based on these models typically look at model residuals or measurement adjustments. In either case, the tests are based on the assumed uncertainties.
Other forms of models are stated directly in probabilistic terms. An example of that is Bayesian modeling.
Dealing with uncertainty, inconsistency, noise, etc., often involves some kind of weighted evidence combination, which generally requires numerical calculations. The uncertainty might be represented in various forms such as probabilities or noise parameters like variances, with evidence combined using fuzzy techniques, Bayesian analysis, distances between observations and patterns for different failure modes (in the case of diagnosis), or the "certainty factors" and Dempster-Shafer approaches of the early expert systems.
From a practical standpoint, we've built plenty of systems using simplistic sorts of thresholds, making judgments trading off the cost of "false alarms" vs. missed true failures. We usually try to have multiple, redundant predictors of things like "burned" to avoid false alarms. Use "AND" logic, voting logic, etc. False alarms become a major nuisance in an industrial setting, leading to applications being turned off.
An example of tradeoffs in estimation, fault detection, and robustness, using redundant data
There are tradeoffs among the goals of getting the optimal estimate, detecting sensor faults, and providing "robust" estimates. As an example, consider a mobile robot. Suppose you have 3 or more position readings from sensors as varied as cameras, sonar, and inertial guidance (take the last position, and update it based on estimated changes from motor power and wheel directions), etc. (A possible Kalman filter application.)
To eliminate obvious problems, you'll first eliminate spikes by a spike filter, and probably some rate-of-change limits based on what's possible for the robot. You'll eliminate high frequency noise by some combination of analog filter when appropriate, and short-time constant digital filter such as averaging or an exponential filter. You'll eliminate obvious major failures when the readings hit "reasonableness" limits you set. You'd also run checks for "flatline" by looking at the time history, checking either standard deviation or estimating the time derivative. The worst problems are due to systematic errors - e.g., bias errors, where a reading is consistently high or low. It might be steady, or it might slowly drift. Those low-frequency problems aren't eliminated so easily.
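The first layers of that cleanup can be sketched as a simple per-sample validation step, with illustrative range and rate-of-change limits (real limits would come from the physics of the robot):

```python
def validate(prev, new, lo=0.0, hi=100.0, max_rate=5.0):
    """Per-sample validation: reject readings outside reasonableness
    limits or changing faster than physically possible, falling back
    to the previous good value. Returns (value, ok)."""
    if not (lo <= new <= hi):
        return prev, False            # out of range: hold last good value
    if prev is not None and abs(new - prev) > max_rate:
        return prev, False            # spike / impossible rate of change
    return new, True

readings = [10.0, 10.4, 97.0, 10.8, 11.1, 150.0, 11.3]
good, flags = [], []
v = None
for r in readings:
    v, ok = validate(v, r)
    good.append(v)
    flags.append(ok)
print(good)   # the spike (97.0) and out-of-range value (150.0) are held back
```

Note this catches spikes and gross failures, but not the slow bias and drift problems mentioned above - those need redundancy or model-based checks.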
Assuming no faults (including no biases), the optimal estimate (under mild conditions, via minimum variance, maximum likelihood, or intuitive least squares approaches) for this linear case is a weighted sum of all the readings, with the weights inversely proportional to their measurement variances.
But, a simple weighted average is sensitive to errors in any sensor, so it's not robust to failures. You might think you could check the differences between the optimal estimate and each measurement, and discard the measurements with big deviations. But that doesn't work -- the biggest difference between the measured value and the "optimal" estimate goes to the sensor whose variance specification was the largest, not necessarily the worst sensor. However, you can look at the average of these deviations over time. If the errors are truly random, the average should be zero. When it's not, you can suspect the sensor, even for low-frequency errors.
A simple approach is to take the median value, and discard the rest. That is not optimal, but it is much more robust, tolerating up to 2 failures for 3 sensors. Furthermore, you can compare each measurement to the final estimate, looking at the behavior over time as above to flag suspect measurements.
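A sketch contrasting the optimal inverse-variance weighted estimate with the robust median, using made-up readings and variance specifications:

```python
import statistics

def inverse_variance_estimate(readings, variances):
    """Minimum-variance fusion of redundant readings of the same
    quantity, assuming independent unbiased noise: weight each
    reading by 1/variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * r for w, r in zip(weights, readings)) / sum(weights)

readings = [10.1, 9.8, 10.3]
variances = [0.1, 0.4, 0.2]    # illustrative sensor specs

optimal = inverse_variance_estimate(readings, variances)
robust = statistics.median(readings)

# Now inject a gross failure on the third sensor:
bad = [10.1, 9.8, 55.0]
print(inverse_variance_estimate(bad, variances), statistics.median(bad))
```

One gross failure drags the weighted average far off, while the median barely moves - the robustness vs. optimality tradeoff in miniature.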
The above redundancy may seem extreme. But redundancy is common even though it isn't always so obvious. Consider the example of the 3 electrical current measurements in and out of a node. There aren't literally duplicate measurements, but they are "analytically redundant" because there is an equation linking 3 variables. The equation sops up a degree of freedom, giving you the redundancy. With this, you can come up with revised, better current estimates (assuming no faults, just noise), and you can also detect a single fault by calculating the imbalance around the node -- a big imbalance means a fault. Those detectable faults are any that affect the sensors, plus the possibility of shorts or other faults that introduce an unmeasured current leakage from the node, because that additional fault's current path invalidates the assumed model.
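A sketch of that analytic redundancy for the current node: the one balance equation gives both a fault-detection residual and (assuming no faults and, purely for illustration, equal measurement variances) reconciled estimates. The tolerance value is an illustrative tuning parameter:

```python
def node_imbalance(i_in, i_out1, i_out2, tolerance=0.5):
    """Kirchhoff's current law gives one redundant equation:
    i_in should equal i_out1 + i_out2. A residual well outside the
    combined measurement noise flags a fault somewhere (any of the
    three sensors, or an unmeasured leakage path), without
    isolating which."""
    residual = i_in - (i_out1 + i_out2)
    return residual, abs(residual) > tolerance

def reconcile(i_in, i_out1, i_out2):
    """Equal-variance least-squares reconciliation: spread the
    residual across the three measurements so the balance holds."""
    r = i_in - (i_out1 + i_out2)
    return i_in - r / 3.0, i_out1 + r / 3.0, i_out2 + r / 3.0

print(node_imbalance(10.0, 6.1, 3.8))   # small residual: no fault flagged
print(node_imbalance(10.0, 6.1, 2.0))   # large residual: fault detected
```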
Without additional consideration in this case (such as watching time behavior, bringing in other measurements like voltage, looking at adjacent nodes, etc.), you can't determine which one happened - you can't isolate the fault, just report that there's at least one of the 4 types of faults present. If there's multiple faults, they might even cancel out.
The median value approach for triply-redundant sensors won't apply to this electrical example, but the other comments do. An approach for both estimation and fault detection for these sorts of systems described by algebraic equations is called data reconciliation. That includes steady state electrical circuits, and networks with flows of material and energy.
A personal history of attempts to deal with uncertainty in real-time expert systems
We (meaning me as both application implementer and product developer, my companies, and customers) tried a variety of approaches to making real time applications more robust and adaptable in the face of uncertainties. We were never completely satisfied.
For basic rule processing, such as those T > 450 thresholds for baking cookies, all such limits are somewhat arbitrary and probabilistic, and can change over time. One obvious thing is to make sure those thresholds are variables that can be changed while the system is running (real time systems should rarely shut down). This includes changes through the human user interface, and changes made by the system itself. Of course, you need to think through any adaptive or learning strategy for changing the thresholds to be sure it will be a stable process, and impose some ultimate limits. Statistical Process Control is a good approach, whether done by a person or by the system itself.
For systems that needed adaptation to changing states and parameters, we used a lot of tricks like using heavily-filtered values as averages, and looking at deviations from those averages. When possible, we also looked for invariants under normal operations that were less likely to change with typical harmless process changes. Simple things like monitoring product to feed ratios around splitters like distillation columns instead of just the product flow rate (people and the controls changed them together, so the monitoring system reflected that). Use dimensionless variables whenever possible, e.g., dividing values by the maximum possible. Relationships like pump curves look similar for pumps of different sizes, so you can get better starting points.
While the G2 environment had issues (cost, complexity, support issues, etc.), its rule engine portion did support simple fuzzy value combination for uncertainty. Logical values took a numerical range from true to false. This didn't use the full fuzzy logic set membership functions many people think of -- it was much simpler. Expressions evaluated using OR as a maximum operation and AND as a minimum operation. That works, but it's sensitive to single errors. And it doesn't provide a satisfying increase in belief as you accumulate positive evidence from multiple sources. But arbitrary expressions and functions could be used in the rules, so people used their own schemes, such as requiring multiple conditions ANDed for a conclusion as a fault tolerance approach.
We never used classic expert system uncertainty approaches (MYCIN, Dempster-Schafer, etc.) They all seemed like too much computation, or of debatable value or theoretical justification. In any case it wasn't clear how to mesh that with real time data handling. It seemed best to think in terms of probability for logical variables, and variances for continuous measurements, tying into well-developed disciplines. Many people just used the extreme true or false values, effectively ignoring the fuzzy values in between.
For real-time data samples and conclusions, an exponentially growing uncertainty after the sample time or conclusion time sounds good in principle. But, that requires constant re-computation of every variable, not very practical for large-scale systems. Instead, data in G2 and GDA was time stamped and assigned an expiration time based on a user-specified validity interval. (After the validity interval, the value expired, becoming unknown unless updated before that). The inference engine propagated changes in fuzzy values, time stamps, and expiration time (with the obvious minimization of expiration time for AND, maximization for OR). So variables held their fuzzy values until they expired or otherwise changed. Simple, with a lot less computation.
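A sketch of that scheme - min/max fuzzy combination plus expiration times. The truth values and timestamps are made up, and this is my paraphrase of the behavior described above, not actual G2/GDA code:

```python
def combine_with_expiration(inputs, now):
    """Fuzzy AND over timestamped values.
    inputs: list of (truth_value, expiration_time) pairs.
    An expired value becomes unknown, so no conclusion can be drawn;
    otherwise the AND takes the minimum truth value and expires at
    the earliest input expiration (the min-expiration rule for AND)."""
    live = [(v, t) for v, t in inputs if t > now]
    if len(live) < len(inputs):
        return None, None            # some input unknown -> no conclusion
    truth = min(v for v, _ in live)  # fuzzy AND = minimum
    expires = min(t for _, t in live)
    return truth, expires

print(combine_with_expiration([(0.9, 105), (0.6, 110)], now=100))
print(combine_with_expiration([(0.9, 105), (0.6, 110)], now=106))
```

The appeal is that nothing needs recomputation until a value changes or expires - far cheaper than continuously growing an uncertainty estimate for every variable.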
When numerical models were needed, people could have assigned statistical variances to measurements, doing their own calculations to propagate variances as in Kalman filtering. But the customers were often in the process industries (where the numerical models were complex and low quality) or in telecom, where those kinds of math models rarely applied, so those kinds of methods just weren't popular. People couldn't justify the kind of time needed.
We built graphical languages (implemented in G2), both to simplify applications and also offer different approaches. One (GDA) combined a dataflow language (forward chaining only) with a workflow/procedural language.
The dataflow portion started with data acquisition, with numerical, time-series processing, SPC, and filtering blocks so application developers could attenuate noise and reduce sensitivity to errors. Additional filtering could in effect occur when converting to logical variables - e.g., requiring time patterns like m of the last n samples to be greater than some limit. These sorts of approaches to reducing the effect of noise are described in the filtering guide.
Then, for logic, there were blocks for AND, OR, NOT, m-of-n inputs, sequence detection, latching, counters, alarms and meta-alarms, etc. The default logic gates copied the G2 style fuzzy values, but there were also options for full fuzzy set membership and associated fuzzy combination gates. There was also a nonlinear "evidence combiner" that did reasonably represent an S-shaped curve of output belief vs. percentage of true inputs, intuitively appealing but without any particular theoretical justification.
Even with all that capability, most people ignored the fuzzy set membership stuff for real applications-- too much trouble. They dealt with noise by filtering, various time patterns like m of the last n samples, explicit ANDing of multiple conditions to conclude anything (or an m-of-n inputs true to get a true output), hysteresis, latching, etc.
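For example, an m-of-the-last-n voting filter is only a few lines; this sketch is illustrative, not the actual GDA block:

```python
from collections import deque

def m_of_n(m, n):
    """Require at least m of the last n samples to be true before
    concluding true: a cheap noise filter on logical values."""
    buf = deque(maxlen=n)
    def update(flag):
        buf.append(flag)
        return sum(buf) >= m
    return update

vote = m_of_n(3, 5)
results = [vote(f) for f in [True, False, True, True, False, True]]
print(results)   # stays False until 3 of the last 5 samples are True
```

A single noisy sample can no longer flip the conclusion; it takes sustained evidence to turn the output on (and sustained absence to turn it off).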
The next generation of tools had a different focus, on large event-oriented systems, where numerical filtering, processing and subsequent creation of events with symbolic values was outside of the tool. That includes the CDG tool based on causal models as well as a graphical procedural language called OPAC, introduced as part of the Integrity product described in “Using Expert Systems to manage Diverse Networks & Systems.”
CDG by default used simple fuzzy values as in G2. It suffered the same weakness as the other tools in that it was sensitive to single errors due to the min/max logic for AND & OR. Its philosophy on conflict resolution was to believe the last value entered, propagating the new value upstream and downstream in the models, overwriting old conclusions as necessary. This made sense based on the idea that the newest events (that arrived asynchronously) represented more recent state knowledge.
OPAC, like CDG, worked on events, so it also didn't address filtering and so on. It did many operations based on combining information from queries of the event history, so fault tolerance was often achieved by counting events of certain categories, applied to specific domain objects, over a specific time period. As a procedural language, it didn't by default contain any logical variables or maintain any truth values - those were implicit in the branches taken. (State could be maintained as desired in the domain model and in local variables defined for the procedure instances.) People could use it to build decision trees and state diagrams, as well as test procedures and corrective actions.
A different group built a Bayesian belief network toolkit. Bayesian methods offer some major advantages:
(1) Accounting for a priori probability estimates (knowledge that some things were more likely to fail than others, and accounting for that in the final results)
(2) Evidence combination following the laws of probability. In particular, you can better resolve conflicting data. Also, as you accumulate more positive evidence, your belief in an outcome really does increase.
(3) Coming up with a ranked list of the potential causes of problems
(4) Having a more solid theoretical basis
Those were all valid points unaddressed by the other tools above. But there were downsides:
(1) People had to understand and specify a lot of conditional probabilities, and understand just what things like "noisy OR gates" really meant - not an easy sell.
(2) Scalability concerns.
(3) Naive Bayesian methods implicitly make a single-fault assumption. That's not realistic for many applications where there may be many ongoing faults, partly because of waiting a potentially long time to repair. That happens in large-scale applications like process plants, as well as in autonomous vehicles.
This toolkit wasn't popular, although that was probably due mainly to non-technical reasons: poor marketing, arriving at a bad time, high cost, and being tied to a complex language with limited market size.
More recently, I have been working on a newer approach to reasoning with causal models, combining evidence in a way that approximates Bayesian methods, to get most of the above benefits without the above problems. Unlike the previous G2-based tools, it is Java based, running on a Google App Engine server. But it's not ready.
Copyright 2010 - 2014, Greg Stanley
(Return to A Guide to Fault Detection and Diagnosis)