STRUCTURE AND FUNCTIONS OF INFORMATION PROCESSING/ANALYSIS SYSTEM FOR ENVIRONMENTAL EPIDEMIOLOGY (IS)

PROBLEMS TO BE SOLVED BY IS
INFORMATION FLOWS IN IS
THE COMPONENTS OF IS
PROCESSES INSIDE AND OUTSIDE IS
THE USAGE OF IS

PROBLEMS TO BE SOLVED BY IS
   IS is a branch of environmental control. It handles informational control, in contrast to direct control by force. It is this informational branch that is presently underdeveloped in the Russian Federation (RF).

   Russia collects a lot of data on health and the environment. However, their potential is only minimally used in decision-making, because raw data are indigestible and the decision-maker receives only a tiny portion of their informational content. IS will handle this bottleneck by using deeper-than-usual information processing and a higher-capacity tool, the environmental health scenario, for communicating with the decision-maker.

   Like any information system, IS can be split into 3 stages:

Information input:
1. Information input (primarily, monitoring data) is full of errors and gaps. How should it be refined and where should the refined information be stored? (IS answer: by filtering through the model.)
2. The monitoring data for different territories etc. are usually not comparable in their raw form. Also, data may have several levels - from an individual to RF as a whole. What common denominator can be found for comparison? (IS answer: use a coherent data processing technology based on multilevel models of a unified form.)
3. The existing monitoring system has hardly any element that does not call for improvement. What should be improved first? (IS answer: whatever IS calculates to be the most important for decision-making.)
Information processing:
1. Monitoring provides numerical data that is processed numerically, too. However, experts have a lot of qualitative, logical information that could potentially fill the gaps that exist in figures. How can it be used? (IS answer: by using only those models and processing algorithms that have both a numerical and a logical version.)
2. There are two approaches to the impact of pollution on health. Toxicology relies on pollutant concentrations and toxicities measured in the laboratory. Epidemiology relies on real-world disease incidence data. Both have merits and drawbacks. How can they be combined, so that the output information stands on both feet? (IS answer: the toxicological approach is more universal and is chosen for the skeleton of IS, an automatic information processing pipeline. Epidemiological studies serve as plug-ins to the pipeline and are used to refine the environmental epidemiological (EE) models in IS.)
3. The correlations between pollution and health within the total bundle of factors are too weak. When a single causal chain "pollutant->disease" is extracted from the bundle, it may prove reliable. However, one cannot afford to extract all chains in this way. On what, then, should data analysis be based? (IS answer: on stable patterns of factors that, on the one hand, go together and, on the other hand, do not need special extraction efforts.)
Information output:
1. How should the cost of EE decisions be taken into account? (IS answer: through the balance of medical and economic criteria for choosing a plan of action appropriate for each hazard. A set of model plans, each with its own costs and expected effects, will be supported by IS.)
2. What should be left to the decision-maker's choice and what should be optimized in advance? (IS answer: the decision-maker should input into IS his/her criteria for evaluating the EE situation and possible actions. IS should produce a scenario optimized along lines consistent with these criteria. Where there is not enough data for automatic optimization, IS should leave the choice to the decision-maker.)
3. There are several stakeholders in the decision-making process, each with their own viewpoint and interests. How should this multiplicity be handled? (IS answer: by searching for a multicriteria balance of interests within a bundle of model scenarios. The search can be directed by IS or by the stakeholders themselves.)
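
The output-stage answers above (model plans with costs and effects, and a balance of interests among several parties) can be illustrated with a small sketch. It is only one possible reading, not the method used in IS: the Plan fields, the weights, the normalisation and the "best worst-case score" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    name: str
    cost: float            # hypothetical cost units
    health_effect: float   # hypothetical expected reduction in cases

def pareto_front(plans):
    """Keep plans that no other plan beats on both cost and effect."""
    front = []
    for p in plans:
        dominated = any(
            q.cost <= p.cost and q.health_effect >= p.health_effect
            and (q.cost < p.cost or q.health_effect > p.health_effect)
            for q in plans
        )
        if not dominated:
            front.append(p)
    return front

def compromise(plans, weights):
    """Pick the plan whose worst score over all parties is the highest."""
    max_cost = max(p.cost for p in plans)
    max_effect = max(p.health_effect for p in plans)

    def score(p, w):
        # Both terms are scaled to [0, 1]; w is the weight a party
        # puts on health effect, 1 - w the weight it puts on cheapness.
        cheapness = 1.0 - p.cost / max_cost
        effect = p.health_effect / max_effect
        return w * effect + (1.0 - w) * cheapness

    return max(plans, key=lambda p: min(score(p, w) for w in weights.values()))

plans = [
    Plan("filters", cost=10.0, health_effect=40.0),
    Plan("relocation", cost=50.0, health_effect=90.0),
    Plan("monitoring only", cost=2.0, health_effect=5.0),
    Plan("bad deal", cost=60.0, health_effect=30.0),
]
front = pareto_front(plans)   # "bad deal" is dominated by "filters"
weights = {"health authority": 0.9, "plant owner": 0.2}   # assumed criteria
choice = compromise(front, weights)
print([p.name for p in front], "->", choice.name)
```

The first step removes plans that are worse on every criterion; the second looks for the option that is least bad for the party it suits worst. A round table could equally well browse the front by hand.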

Each stage poses its own specific problems, which are too difficult to be handled in isolation. IS will provide an integrated solution for them, which will, hopefully, be more effective than partial solutions.

INFORMATION FLOWS IN IS
Each hazard has several components:

pollution source;
recipient organism;
social factors;
possible organizational countermeasures.

The goal of IS is to connect the components into a self-regulating system, henceforth called a controlled hazard. Presently, informational feedback from the monitoring of a hazard to its control is too weak for self-regulation to appear. IS should amplify this branch by means of a powerful informational channel - a model.

Fig. 1

Fig. 1 shows the transition from the present information flow structure to the one characteristic of IS. The width of the lines indicates the capacity of the respective information channels. The overall capacity is, of course, determined by bottlenecks. One such bottleneck is formed by experts and decision-makers, but in IS they cease to be a serial link in the information chain and become a sort of controller sitting on a much more powerful information flow from hazard monitoring data to hazard control measures.
Direct information transfer, shown by the dotted line, is justified only in an emergency regime when IS is not fast enough to respond, so that the monitoring/control chain bypasses the model.

THE COMPONENTS OF IS
   As shown in Fig. 1, IS can be thought of as a part of the controlled hazard. The core of IS is formed by the model of the hazard. Although the environmental health problem is, in principle, a single whole, it can be naturally split into more or less independent hazards according to territory, pollutant, etc. The models split in the same way. If hazard control is permanent, each model object participating in the hazard - plant, social group, etc. - acquires its own "mini-IS" that connects it to other components of the hazard control cycle.
   A "mini-IS" may exist just within a computer, as a relatively independent part of the total model. However, it can also be embodied in an organization, e.g., in a territorial center for IS. It is important to distinguish between the organizational and the informational structure of IS. Both have several levels, but in entirely different ways.
   Organizationally, IS is a network of computerized centers for data analysis served by experts. Regional centers are situated in regional institutions, e.g., of Sanepidnadzor. They are linked by computer networks to the federal center situated in one of the federal institutions.
   The federal center supports only the models and data necessary for problem-solving at the federal level, e.g. norm-setting on the federal scale. Regional centers can access the full variety of EE information on the region, including individual health registers. They also interface with databases that belong to other institutions. If necessary, the federal center sends the regional ones a request for scenarios and forecasts related to any regional hazard, even down to the scale of an individual plant.
   Informationally, the skeleton of IS is formed by the information processing pipeline. It consists of submodels that correspond to the components of a hazard - pollution sources, population groups, etc. These submodels process information and pass it to one another in sequence. Submodels are hierarchical and include several levels of objects: from a plant to an industrial branch, from an individual to a population, etc. Information processing may involve a transition from one level to another.
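
As a rough illustration of such a pipeline, the sketch below chains a few toy submodels and includes one level transition (plant to industrial branch). All function names, formulas and numbers are hypothetical placeholders chosen for the example, not the actual IS submodels.

```python
from collections import defaultdict

def emissions_by_branch(plants):
    """Level transition: aggregate plant emissions to industrial branches."""
    totals = defaultdict(float)
    for plant in plants:
        totals[plant["branch"]] += plant["emission"]
    return dict(totals)

def concentration(emission_total, dilution):
    """Pollution source submodel: assumed linear dilution."""
    return emission_total / dilution

def exposure(conc, hours_outdoors):
    """Population group submodel: assumed share of the day spent outdoors."""
    return conc * hours_outdoors / 24.0

def risk(dose, slope):
    """Health submodel: assumed linear dose-response, capped at 1."""
    return min(1.0, slope * dose)

def run_pipeline(value, stages):
    """Pass information from one submodel to the next in sequence."""
    for stage in stages:
        value = stage(value)
    return value

plants = [
    {"name": "A", "branch": "metallurgy", "emission": 120.0},
    {"name": "B", "branch": "metallurgy", "emission": 80.0},
    {"name": "C", "branch": "chemistry", "emission": 50.0},
]
branches = emissions_by_branch(plants)
stages = [
    lambda e: concentration(e, dilution=100.0),
    lambda c: exposure(c, hours_outdoors=6.0),
    lambda d: risk(d, slope=0.1),
]
group_risk = run_pipeline(sum(branches.values()), stages)
print(branches, group_risk)
```

Each stage only needs the output of the previous one, which is what allows a submodel to be refined or replaced without touching the rest of the chain.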

Fig.2 The structure of main informational pipeline.

The model objects should be as stable as their real prototypes. So, they must include elements of self-regulation. The most important part of any model object is its internal dynamics (or its internal method, in the programmer's terms). It determines what is stable in an object (say, in a population) or, alternatively, what is maximized by an object (say, cost-effect relation is maximized by the object "action plan").
There are two types of stability:

with respect to forcing that models the real-world interaction and is applied by one model object to another;
with respect to "informational forcing" produced by monitoring data; this type of stability lets IS discard unreliable data.

In addition to internal self-regulation, the state of the model objects is regulated through external informational chains. The most important of these is the one called "mini-IS" above. Involving an object in this chain is the main objective of IS.
The informational pipeline has its inputs and outputs. The inputs are:

the monitoring data;
the choices made by a decision-maker in scenarios of environmental action;
information-refining rules and data from experts.

The outputs are:

hazard representation as a model structure for experts;
hazard representation as possible scenarios for decision-makers;
hazard control levers plugged into IS.

Of hazard control levers, two are the most important ones:

direct output of hazard control plan that results from a scenario approved by decision-makers;
indirect influence on decision-maker through scenarios produced by IS, which stimulate actions (e.g., norm-setting) that IS has no power to implement on its own or whose consequences IS cannot model.

One important organizational implementation of "mini-IS" is autonomous pilot software that runs without a direct computer link to reality. It can run on an isolated computer, on fixed databases that are not updated systematically, and it serves as a sort of library of typical scenarios for decision-makers. It can control hazards only indirectly, through scenarios presented to decision-makers.
Such software might prove valuable for adjusting the data processing methodology, for training experts and decision-makers, and for introducing IS to new regions.

PROCESSES INSIDE AND OUTSIDE IS
   Being an information system, IS perceives reality only through its informational counterpart - monitoring data, action plans, etc.
   There is a standard method of interfacing with this informational environment - the adaptation method, available to each model object as a predictor/corrector algorithm. It runs along the same lines both for importing information into IS, e.g. in the form of monitoring data, and for exporting information, e.g. as a hazard control plan.
   For import, a model of the measuring device is necessary. It relates measurements to some unknown true state, which is to be modeled. Then, a model forecast of this state is produced by the internal method of a model object and is transformed into the expected value of the measurement. The discrepancy between the latter and the real measurement is assimilated into the model in such a way as to decrease the expected subsequent discrepancies. If, however, the discrepancy is too large, the measurement is rejected as erroneous. The rejection threshold reflects the relative information content of the model and the measurement and is continuously adapted by both the model and the experts.
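
A minimal sketch of the import step, assuming a scalar state, a linear device model and a Kalman-type correction, is given below. The text does not say which predictor/corrector IS uses, so the update formula, the fixed rejection gate (which in IS would itself be adapted) and all names here are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class ModelObject:
    state: float      # current estimate of the unknown true state
    variance: float   # uncertainty of that estimate

    def forecast(self, growth=1.0, process_noise=0.0):
        """Internal method (assumed linear dynamics): predict the next state."""
        self.state *= growth
        self.variance = growth * growth * self.variance + process_noise

    def assimilate(self, measurement, device_gain, device_noise, gate=3.0):
        """Correct the forecast with one measurement; return False if rejected."""
        expected = device_gain * self.state       # device model: state -> reading
        discrepancy = measurement - expected
        spread = device_gain ** 2 * self.variance + device_noise
        if discrepancy ** 2 > gate ** 2 * spread:
            return False                          # too far off: treat as erroneous
        k = self.variance * device_gain / spread  # weight given to the measurement
        self.state += k * discrepancy
        self.variance *= 1.0 - k * device_gain
        return True

obj = ModelObject(state=10.0, variance=4.0)
obj.forecast()
accepted = obj.assimilate(12.0, device_gain=1.0, device_noise=4.0)
outlier_accepted = obj.assimilate(40.0, device_gain=1.0, device_noise=4.0)
print(accepted, outlier_accepted, obj.state, obj.variance)
```

The weight k compares the uncertainty of the model with that of the device, which is one way to express "the relative information content of the model and the measurement". The second reading lies far outside the expected spread and leaves the model untouched.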
   For export, the internal method of a model object first generates a target value for the object state at the points of time and space where IS has some levers of control over the situation. Then, the discrepancy between the target state and the monitored real state is transformed into an action applied to the lever, which is expected to drive the object towards the target. A model of the lever, analogous to the model of the measuring device, is used in the process. Then, the reaction of the object is observed, and if it does not correspond to expectations, the model object is corrected.
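
The export step can be sketched in the same spirit. The example assumes the simplest possible lever model, a single gain (state change per unit of action), and a relaxation rule for correcting it; both are assumptions for illustration, as are the names and numbers.

```python
from dataclasses import dataclass

@dataclass
class Lever:
    gain: float  # lever model: expected state change per unit of action

def plan_action(target, monitored, lever):
    """Turn the target/monitored discrepancy into an action on the lever."""
    return (target - monitored) / lever.gain

def correct_lever(lever, action, before, after, rate=0.5):
    """Move the modelled gain toward the response actually observed."""
    if action == 0:
        return
    observed_gain = (after - before) / action
    lever.gain += rate * (observed_gain - lever.gain)

# Hypothetical numbers: one unit of action is expected to lower the
# monitored concentration by 2 units.
lever = Lever(gain=-2.0)
target, monitored = 30.0, 50.0
action = plan_action(target, monitored, lever)

# The real object reacts more weakly than expected (true gain is -1.5).
after = monitored + (-1.5) * action
correct_lever(lever, action, before=monitored, after=after)
next_action = plan_action(target, after, lever)
print(action, after, lever.gain, next_action)
```

After one cycle the modelled gain has moved halfway from the assumed value to the observed one, so the next action is sized with a more realistic expectation of the object's response.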
   This type of adaptation has been thoroughly tested in many branches of science and technology, and the conditions under which it converges are well known.
   Since IS may contain copies of "mini-IS", model objects can represent the informational environment for other model objects. Therefore, adaptation can be used to harmonize information contained in different objects. E.g., the influence of a higher-level object "population" on a lower-level object "individual" can be represented as the export of social parameters. Conversely, the formation of the average population value from individual values can be represented as import.
   The informational environment also contains external models produced by independent organizations. They can be used in the same way as monitoring data: by adapting internal models to them.
   Special EE studies can be viewed as a sort of external model with respect to the informational pipeline. They produce the relation between pollution and health. The problem is that they use a definition of health risk different from that used in toxicological risk assessment. EE risks are more realistic, but more difficult to extrapolate to other study conditions and other cohorts.
   IS considers EE studies as a sort of measuring device that characterizes the same risk, but in a different reference frame (a narrower one, adapted to a specific cohort). There should be a model of this "device" that describes how the subset of a hazard chosen for EE studies relates to objects modeling the "full" hazard. Then, the models can be adapted to EE study results as if these latter were pointwise measurements. Extrapolation to other "points" (times, cohorts, etc.) is provided by the model automatically.
   A converse process is also possible: refining the set used in EE studies by adapting it to the model. It is a conceptual analog of monitoring schedule optimization. Both form a feedback from IS to its informational environment.
   This "monitoring-optimizing" feedback runs when the model has insufficient data for choosing between two possible scenarios. Then, the most sensitive link in the scenario deduction chain is sought. It can be monitoring data, or expert knowledge and rules, or calculations by external models. A request for information refinement is addressed to this link.
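
One simple way to look for the most sensitive link is to shift each input by its own uncertainty and see how much the gap between the two competing scenarios moves. The sketch below does exactly that; the gap function, the input names and the uncertainty values are hypothetical, and IS may use a different sensitivity measure.

```python
def most_sensitive_link(score_gap, inputs, uncertainties):
    """Return the input whose uncertainty moves the scenario gap the most."""
    base = score_gap(inputs)
    swings = {}
    for name, sigma in uncertainties.items():
        perturbed = dict(inputs)
        perturbed[name] = inputs[name] + sigma   # shift one link at a time
        swings[name] = abs(score_gap(perturbed) - base)
    return max(swings, key=swings.get), swings

def score_gap(x):
    """Assumed toy gap: benefit of scenario A minus benefit of scenario B."""
    return x["monitored_conc"] * x["expert_toxicity"] - x["external_cost"]

inputs = {"monitored_conc": 2.0, "expert_toxicity": 3.0, "external_cost": 5.5}
uncertainties = {"monitored_conc": 0.5, "expert_toxicity": 0.1, "external_cost": 0.2}

link, swings = most_sensitive_link(score_gap, inputs, uncertainties)
print(link, swings)
```

Here the monitoring data dominate the uncertainty of the choice, so the request for refinement would go to the monitoring service rather than to the experts or to the external model.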
   Scenarios produced by IS for decision-makers may also be considered as a sort of request - for the criteria used in choosing an action plan. "Action" may consist in organizing additional monitoring. This option will be included in scenarios when requests for additional monitoring data become at least as important as the right choice of criteria. Like all other options, this one will also include economic aspects - the price of information.

THE USAGE OF IS
There are 4 main types of IS users:

the decision-makers' round table striving for a coordinated action plan;
an individual decision-maker who studies the possible scenarios within his/her own domain of responsibilities;
an institutional expert who supports a decision-maker and controls the appropriateness of scenarios produced by IS;
a dedicated IS expert who checks the internal model, supplies the necessary knowledge and rules, conducts EE research, etc.

   For decision-makers, the main IS product is a scenario. This term means a "portrait" of a hazard based on its model and including the hazard-combating measures known to IS. A scenario extrapolates the hazard to domains where there are no direct data on it: to the future, to other cohorts, territories, pollutants, etc. Importantly, it includes a characterization of extrapolation uncertainty, which shows where the causal chain on which the scenario is built runs thin.
   A scenario bundle presented to the round table includes options optimized for each individual decision-maker, i.e., according to his/her criteria. The round table should study different combinations of these options, look for points where they hurt the interests of some stakeholder, and find a mutually acceptable combination. The process can be self-guided or directed by IS.
   An individual decision-maker's first question would probably be "what if?". Various uncontrollable factors, such as wind direction, can be experimented with, and various options of environmental control studied. Optimization is confined here to a fixed system of criteria specific to this decision-maker.
   An institutional expert would advise the respective decision-maker on the overall reliability of a scenario and its weak points, up to and including returning a scenario to IS for refinement. The expert's responsibility is also to check the real results of an action plan against the forecast ones and to order an accordingly corrected scenario from IS.
   An IS expert is the only user who has direct access to the internal EE model. However, arbitrary correction of the model is still impossible. The "model anatomy" presented to the expert will have its "hot points" for correction, just as a scenario has. Corrections will be assimilated only if they improve the quality of data representation by the model. Also, IS can approach an expert on its own initiative in search of knowledge that removes the uncertainty preventing good scenario production.

Webmaster:
Stalnaya М.


© Space Research Institute, RAS, 1998-2001
