
24 Data flow diagrams

24.1 Purpose
24.2 Strengths, weaknesses, and limitations
24.3 Inputs and related ideas
24.4 Concepts
24.4.1 Data flow diagram symbols
24.4.2 Conventions
24.4.3 Underlying principles
24.4.4 The context (level 0) diagram
24.4.5 The level 1 data flow diagram
24.4.6 Documenting the model
24.4.7 Verifying the model
24.4.8 Exploding the processes
24.4.9 Functional primitives
24.4.10 The configuration item level
24.4.11 The complete logical model
24.4.12 Logical and physical data flow diagrams
24.5 Key terms
24.6 Software
24.7 References

24.1 Purpose

A data flow diagram is a logical model of the flow of data through a system that shows how the system’s boundaries, processes, and data entities are logically related.

24.2 Strengths, weaknesses, and limitations

A data flow diagram is an excellent tool for summarizing and organizing detailed information about a system’s boundaries, processes, and data entities, providing the analyst with a logical map of the system. Documenting the system’s boundaries by drawing a context diagram helps the analyst, the user, and the responsible managers visualize alternative high-level logical system designs. The elements of a data flow diagram lead directly into physical design, with processes suggesting programs and procedures, data flows suggesting composites, and data stores suggesting data entities, files, and databases.

Creating a data flow diagram is a process-driven task. Consequently, it is relatively easy to overlook key data elements and composites. Balancing a data flow diagram verifies the model’s internal consistency, but does not necessarily reveal missing elements. Attempting to balance a significant logical model without appropriate software (such as CASE software) is difficult at best and can be misleading. Beginners and users often confuse data flow diagrams with process flowcharts.

24.3 Inputs and related ideas

The first step in creating a data flow diagram is to prepare a list of the system’s boundaries, data, and processes using the tools covered in Part II. Data flow diagrams are a significant part of the structured analysis and design methodology (Chapter 3). A data flow diagram is sometimes created in conjunction with an entity-relationship diagram (Chapter 26) or data normalization (Chapter 28). Processes are documented using one or more of the process description tools in Part VI (Chapters 55 through 60). The data elements and data composites are documented in the data dictionary (Chapter 25). The data flow diagram is sometimes included in the requirements specification (Chapter 35). A completed data flow diagram is required by the automation boundaries technique described in Chapter 36.

24.4 Concepts

A data flow diagram is a logical model that shows the flow of data through a system.

24.4.1 Data flow diagram symbols


Figure 24.1  
Using Gane and Sarson’s notation [4], four primary symbols are used to create a data flow diagram (Figure 24.1). A source or destination (sink) is represented by a (shaded) square. Sources and destinations define the system’s boundaries; each one represents a person, organization, or other system that supplies data to the system, gets data from the system, or both. A process, or transform (a round-cornered rectangle), identifies an activity that changes, moves, or otherwise transforms data. A data store (an open-ended, horizontal rectangle) represents data at rest and implies that the data are held (for some logical reason) between processes. A data flow (an arrow) represents data in motion. Additionally, Gane and Sarson use thick arrows to show physical or material flows.

Using Yourdon [7] and DeMarco’s [2] notation, sources and sinks are represented as rectangles, processes as circles, and data stores as horizontal rectangles open at both ends (two parallel horizontal lines). Data flows are shown as arrows. There is no symbol for a material flow.

24.4.2 Conventions

The following conventions are used.

24.4.2.1 Legal and illegal data flows

All data flows must begin and/or end with a process (Figure 24.2). Data cannot legally flow directly from a source to a destination or between a source/destination and a data store unless they pass through an intermediate process.


Figure 24.2  All data flows must begin and/or end with a process.
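The legality rule can be expressed as a one-line test. The following is a minimal sketch, assuming each component is tagged with one of three hypothetical kinds (`process`, `store`, `external`); the sample flows are illustrative and are not taken from the figures.

```python
# Component kinds used in a data flow diagram (labels are illustrative).
PROCESS, STORE, EXTERNAL = "process", "store", "external"

def is_legal_flow(from_kind: str, to_kind: str) -> bool:
    """A data flow is legal only if at least one of its ends is a process."""
    return PROCESS in (from_kind, to_kind)

# Hypothetical flows: (source name, source kind, destination name, destination kind).
flows = [
    ("CUSTOMER", EXTERNAL, "Sell appliance", PROCESS),  # legal
    ("Sell appliance", PROCESS, "SALES", STORE),        # legal
    ("CUSTOMER", EXTERNAL, "SALES", STORE),             # illegal: no process
    ("SUPPLIER", EXTERNAL, "CUSTOMER", EXTERNAL),       # illegal: no process
]

illegal = [(src, dst) for src, sk, dst, dk in flows if not is_legal_flow(sk, dk)]
print(illegal)
```

Under this rule a flow directly between two data stores is rejected as well, because neither end is a process.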

24.4.2.2 Data flow lines

Multiple data flows between two components can be shown by two data flow lines or by a two-headed arrow. Some analysts use two flow lines when the input and output data flows are different and a single two-headed arrow when they are the same. For example, a process that gets data from a store, updates the data, and then sends the same data elements back to the store calls for a two-headed arrow.

24.4.2.3 Naming

A process name consists of a verb followed by a noun. By convention, the names of the sources, destinations, and data stores are capitalized, while process names and data flows are shown mixed case.

24.4.2.4 Numbering

By convention, the processes in a level 1 data flow diagram are numbered 1, 2, 3, and so on. The numbers do not imply sequence; they are for reference only.

The sub-processes in an exploded data flow diagram are assigned numbers starting with the parent process’s number. For example, level 1 process 4 might be exploded into level 2 processes 4.1, 4.2, 4.3, and so on, while level 2 process 4.3 might be decomposed into level 3 processes 4.3.1, 4.3.2, 4.3.3, and so on.
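Because the numbering scheme is purely positional, a process number encodes both its parent and its level. A short sketch follows; the function names are hypothetical helpers, not part of any CASE product.

```python
from typing import List, Optional

def child_numbers(parent: str, count: int) -> List[str]:
    """Numbers for `count` sub-processes of the process numbered `parent`."""
    return [f"{parent}.{i}" for i in range(1, count + 1)]

def parent_of(number: str) -> Optional[str]:
    """Parent process number, or None for a level 1 process."""
    head, sep, _ = number.rpartition(".")
    return head if sep else None

def level_of(number: str) -> int:
    """Diagram level on which the process appears (4 -> 1, 4.3 -> 2, 4.3.1 -> 3)."""
    return number.count(".") + 1

print(child_numbers("4", 3))   # ['4.1', '4.2', '4.3']
print(parent_of("4.3.1"))      # 4.3
print(level_of("4.3.1"))       # 3
```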

Many analysts use the letter D followed by a number to identify the data stores. For example, in an inventory system, INVENTORY might be D1, SALES might be D2, and so on. Some analysts identify the sources and destinations as well.

24.4.2.5 Duplicate symbols

Symbols can be repeated if doing so makes the diagram easier to read. For example, duplicating a symbol might be clearer than drawing lengthy or crossing data flows. Duplicate symbols are usually marked in some way; for example, source/destinations might be marked with a slash in the lower-left corner and data stores might be marked with an extra vertical line.

24.4.3 Underlying principles

Two general principles guide the creation of a data flow diagram: the principle of data conservation and the principle of iteration.

24.4.3.1 The principle of data conservation

There are no miracles, and there are no black holes. A given process can neither lose nor create data. Any data that flow into a process must be used by or output by that process. Any data output by a process must be input to or created by an algorithm within that process. Except for constants, any data used by an algorithm within a process must first flow into the process. Finally, any data created by an algorithm must either be used by another algorithm within the same process or output by the process.
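The four conservation rules translate naturally into set differences. The sketch below assumes that, for a single process, the analyst can list the data elements that flow in, flow out, are used by its algorithms, and are created by its algorithms; constants are assumed to be left out of the `used` list. The category names and the sample elements are hypothetical.

```python
def conservation_errors(inputs, outputs, used, created):
    """Return the data elements that violate data conservation for one process."""
    inputs, outputs, used, created = (set(x) for x in (inputs, outputs, used, created))
    return {
        # Flows in but is neither used nor passed on (a "black hole").
        "black_hole": inputs - used - outputs,
        # Flows out but was neither input nor created (a "miracle").
        "miracle": outputs - inputs - created,
        # Used by an algorithm but never input or created inside the process.
        "unsourced": used - inputs - created,
        # Created by an algorithm but never used or output.
        "dead_end": created - used - outputs,
    }

# Hypothetical process: "Unit price" is output without any origin.
errors = conservation_errors(
    inputs={"Stock on hand", "Reorder point", "Supplier name"},
    outputs={"Supplier name", "Reorder quantity", "Unit price"},
    used={"Stock on hand", "Reorder point"},
    created={"Reorder quantity"},
)
print(errors["miracle"])  # {'Unit price'}
```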

24.4.3.2 The principle of iteration

High-level processes are decomposed into lower-level processes. At the lowest level are primitive processes that perform a single function (or algorithm). Note that a lower-level process gets its data from its higher-level parent.

24.4.4 The context (level 0) diagram

A context (level 0) diagram documents the system’s boundaries by highlighting its sources and destinations. Documenting the system’s boundaries by drawing a context diagram helps the analyst, the user, and the responsible managers visualize alternative high-level logical system designs.

For example, Figure 24.3 shows a context diagram for a typical inventory system. The system itself is shown as a single process. It provides data to the FINANCIAL SYSTEM. It both provides data to and gets data from MANAGER, SUPPLIER, and CUSTOMER. Note that the data flows are labeled with (at this level) composite names.

Moving the boundaries significantly changes the system, and the ability to visualize the implications of different boundary assumptions is a powerful reason for creating a context diagram. For example, in Figure 24.3 the financial system and the inventory system are independent. An alternative logical design might move the financial system inside the inventory system (or vice versa), effectively integrating them. The result would be a somewhat more complex (but perhaps more efficient) system.


Figure 24.3  A context diagram.

24.4.5 The level 1 data flow diagram

A level 1 data flow diagram shows the system’s primary processes, data stores, sources, and destinations linked by data flows. Generally, a system’s primary processes are independent and thus separated from each other by intermediate data stores that suggest the data are held in some way between processes.

For example, Figure 24.4 shows a level 1 data flow diagram for an inventory system. Start at the upper left with source/destination FINANCIAL SYSTEM. Data flow to FINANCIAL SYSTEM from process 9, Report cash flow. Data enter process 9 from data store D1, SALES. Data enter D1 from process 2, Sell appliance. Process 2 gets its data from CUSTOMER and from data stores D1, D3, D5, and D6, and so on. Note how intermediate data stores serve to insulate the primary processes from each other and thus promote process independence.

A level 1 process is a composite item that might incorporate related programs, routines, manual procedures, hardware-based procedures, and other activities. For example, process 2, Sell appliance, might imply (in one alternative) a set of sales associate’s guidelines, while another alternative might include a point-of-sale terminal equipped with a bar code scanner and necessary support software. In effect, the level 1 process Sell appliance represents all the hardware, software, and procedures associated with selling an appliance. As the data flow diagram is decomposed, the various sub-processes are eventually isolated and defined.

24.4.6 Documenting the model

The data flow diagram shows the data flows between the system’s sources, destinations, processes, and data stores.

24.4.6.1 The data dictionary

The data elements are recorded in the data dictionary (Chapter 25). As work progresses, the data elements that occupy the same data store or share a data flow form composite items or data structures that are also documented in the data dictionary. For example, Supplier name, Supplier address, Description, Reorder quantity, and other data elements flow to SUPPLIER from process 4 and form a data structure that might be called Reorder.
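One simple way to picture a composite is as a dictionary entry that lists its parts. The sketch below is illustrative only: the Reorder entry contains just the four elements named above (the full structure has others), and treating Supplier address as a sub-composite of Street, City, and Postal code is an assumption made for the example.

```python
# A toy data dictionary: composite name -> list of component names.
DATA_DICTIONARY = {
    "Reorder": ["Supplier name", "Supplier address", "Description", "Reorder quantity"],
    "Supplier address": ["Street", "City", "Postal code"],  # hypothetical breakdown
}

def expand(name, dictionary):
    """Recursively expand a composite into its elementary data elements."""
    parts = dictionary.get(name)
    if parts is None:          # not a composite, so it is a data element
        return [name]
    elements = []
    for part in parts:
        elements.extend(expand(part, dictionary))
    return elements

print(expand("Reorder", DATA_DICTIONARY))
```

Expanding composites to their elements in this way is also a convenient first step before the element-level checks used in verification and balancing.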


Figure 24.4  A level 1 data flow diagram.

24.4.6.2 Process descriptions

Each process is defined in a process description that notes its input and output data elements and composites and briefly describes the tasks or activities it performs. (Process description tools are described in Part VI.) Process (or data transform) descriptions are sometimes recorded in the data dictionary.

24.4.6.3 The CASE repository

In most CASE products (Chapter 5), the data descriptions and process descriptions are stored in the CASE repository.

24.4.7 Verifying the model

The point of verification is to ensure that the model is complete and internally consistent.

24.4.7.1 Syntax checking

Every data flow must begin and/or end with a process and have at least one arrowhead to define the direction of data movement. Every process and every data store must have at least one input data flow and at least one output data flow. If the inflow is missing, the source of the data is unknown. If the outflow is missing, that process or store acts like a black hole. In either case, something is wrong.

Other syntax checks involve judgement. Process names should imply their function. Component names should be unique because redundant names are confusing.
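The mechanical part of syntax checking (as opposed to the judgement calls) is easy to automate. The following sketch assumes the diagram is held as a dictionary of component kinds and a list of directed flows; the component names are borrowed from the inventory example, but the flows themselves are made up so that two errors appear.

```python
def missing_flows(components, flows):
    """Find processes and stores that lack an input or an output data flow.

    components: dict mapping name -> kind ('process', 'store', or 'external')
    flows: iterable of (source, destination) pairs
    Returns (no_input, no_output) as lists of component names.
    """
    has_in, has_out = set(), set()
    for src, dst in flows:
        has_out.add(src)
        has_in.add(dst)
    # Sources and destinations are exempt; only processes and stores are checked.
    checked = [n for n, kind in components.items() if kind in ("process", "store")]
    no_input = [n for n in checked if n not in has_in]
    no_output = [n for n in checked if n not in has_out]
    return no_input, no_output

components = {
    "CUSTOMER": "external",
    "Sell appliance": "process",
    "SALES": "store",
    "Report cash flow": "process",
    "FINANCIAL SYSTEM": "external",
}
# Deliberately incomplete: the flow from SALES to Report cash flow is missing.
flows = [
    ("CUSTOMER", "Sell appliance"),
    ("Sell appliance", "SALES"),
    ("Report cash flow", "FINANCIAL SYSTEM"),
]
no_input, no_output = missing_flows(components, flows)
print(no_input)   # ['Report cash flow']  (source of its data is unknown)
print(no_output)  # ['SALES']             (acts like a black hole)
```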

24.4.7.2 Tracing data elements

Following the principle of data conservation, each data element in a level 1 data flow diagram must be rigorously traced from its destination, through the model, back to its source. If the source of every data element is accounted for, the data flow diagram is internally consistent.
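Tracing can be sketched as a backward walk over the flows that carry a given element. The example assumes each flow is labeled with the set of data elements it carries; the flows and the element Cash total are hypothetical. An element that cannot be traced to any source must either be created by an algorithm inside a process along the path or be flagged as an error.

```python
def trace_to_source(element, destination, flows, sources):
    """Walk backward from `destination` along flows that carry `element`.

    flows: list of (from_node, to_node, set_of_elements)
    sources: set of source names
    Returns the set of sources from which the element can be traced.
    """
    reached, seen, stack = set(), set(), [destination]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for src, dst, elements in flows:
            if dst == node and element in elements:
                if src in sources:
                    reached.add(src)
                else:
                    stack.append(src)
    return reached

flows = [
    ("CUSTOMER", "Sell appliance", {"Stock number"}),
    ("Sell appliance", "SALES", {"Stock number"}),
    ("SALES", "Report cash flow", {"Stock number"}),
    ("Report cash flow", "FINANCIAL SYSTEM", {"Stock number", "Cash total"}),
]
sources = {"CUSTOMER"}

print(trace_to_source("Stock number", "FINANCIAL SYSTEM", flows, sources))  # {'CUSTOMER'}
print(trace_to_source("Cash total", "FINANCIAL SYSTEM", flows, sources))    # set()
```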

24.4.7.3 Cross referencing

On the data flow diagram, each data element, data store, and data flow must appear in the data dictionary, and each process must have a matching process description.

In the data dictionary, each logical data structure must match a data flow or a data store, and each data element must appear at least once on the data flow diagram. Additionally, each data element and each logical data structure must appear in the input or output list of at least one process description. There are two possible explanations for unused data elements: Either they are not needed by the system, or the analyst overlooked them.

Each process description must match a process on the data flow diagram, and the input and output lists must match the data flows. Every data element entering or leaving a process must appear in the data dictionary. Unused processes may have been overlooked when the data flow diagram was created. If not, they are unnecessary.
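Cross referencing amounts to comparing lists in both directions. A minimal sketch, assuming the names on the diagram, in the data dictionary, and in the process descriptions have been collected into plain sets (the sample names are illustrative):

```python
def cross_reference(diagram_items, dictionary_items, diagram_processes, described_processes):
    """Report mismatches between the diagram, the data dictionary,
    and the set of process descriptions."""
    diagram_items, dictionary_items = set(diagram_items), set(dictionary_items)
    diagram_processes = set(diagram_processes)
    described_processes = set(described_processes)
    return {
        "missing_from_dictionary": sorted(diagram_items - dictionary_items),
        "unused_in_diagram": sorted(dictionary_items - diagram_items),
        "process_without_description": sorted(diagram_processes - described_processes),
        "description_without_process": sorted(described_processes - diagram_processes),
    }

report = cross_reference(
    diagram_items={"Stock number", "Reorder quantity", "Supplier name"},
    dictionary_items={"Stock number", "Supplier name", "Supplier address"},
    diagram_processes={"Sell appliance", "Reorder stock"},
    described_processes={"Sell appliance"},
)
for problem, names in report.items():
    print(problem, names)
```

An empty list in every category means the three documents agree; it does not show that nothing was overlooked.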

24.4.7.4 Tracing objectives

Note that if a significant feature of the system was overlooked, verification will not necessarily find the error. Consequently, the logical model should always be checked against the system objectives and the process or processes that contribute to meeting each one identified. If an objective cannot be matched with at least one process, that objective may have been overlooked. If a process cannot be matched with at least one objective, that process might be unnecessary.
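The objective check is a two-way match as well. In the sketch below, the objectives and the process Print labels are invented for illustration; the analyst supplies the mapping from each objective to the processes that support it.

```python
def trace_objectives(objective_to_processes, all_processes):
    """Return (objectives with no supporting process,
               processes that support no objective)."""
    unmet = [obj for obj, procs in objective_to_processes.items() if not procs]
    covered = {p for procs in objective_to_processes.values() for p in procs}
    unjustified = [p for p in all_processes if p not in covered]
    return unmet, unjustified

objectives = {
    "Track sales": ["Sell appliance"],
    "Avoid stockouts": ["Reorder stock"],
    "Forecast demand": [],              # no process contributes to this one
}
processes = ["Sell appliance", "Reorder stock", "Print labels"]

unmet, unjustified = trace_objectives(objectives, processes)
print(unmet)        # ['Forecast demand']  (possibly overlooked)
print(unjustified)  # ['Print labels']     (possibly unnecessary)
```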

24.4.8 Exploding the processes

A level 1 data flow diagram is a high-level logical map of the system. It shows the key relationships but hides most of the details. Consequently, the next step is to explode the processes by taking advantage of the principle of iteration. The act of exploding a data flow diagram is sometimes called functional decomposition.

24.4.8.1 Level 2

Each level 1 process consists of several sub-processes that are listed on the process description. To explode the data flow diagram, the analyst creates an independent level 2 data flow diagram for each level 1 process.

For example, Figure 24.5 shows a level 2 data flow diagram for process 4, Reorder stock (Figure 24.4). Note the numbering scheme. Processes 4.1, 4.2, 4.3, 4.4, and 4.5 are sub-processes of level 1 process 4.


Figure 24.5  A level 2 data flow diagram for process 4.

24.4.8.2 Local and global data

Global data are shared by two or more higher-level processes. Local data are known only within one part of the system; intermediate computations are a good example. For example, in Figure 24.5, the data elements in data store D7, REORDER DATA, are known only within the level 2 explosion of process 4 (and its sub-processes).

Mistakes made while working with local data tend to be limited in scope, but global data errors can ripple throughout the system. Local data elements should be recorded in the data dictionary and identified as local. If they already exist, they might not be local; perhaps a global data element was overlooked.

24.4.8.3 Balancing the level 2 explosion

An exploded data flow diagram must be balanced by accounting for each input from the parent level and each output to the parent level. Checking to ensure that an explosion is balanced is similar to tracing data elements from their destination (output) back to their source (input). The only difference is that the higher-level process’s outputs are traced back to the higher-level process’s inputs through the exploded data flow diagram.

Every global data element (or composite) input to the lower level must be used by at least one lower-level sub-process. Every global data element (or composite) output to the higher level must either be input to the lower level or generated by an algorithm within a lower-level sub-process. Each data element or composite input to or used by an exploded process must be defined in the higher-level process.

Note that a higher-level composite might be decomposed into data elements or sub-composites at the lower level. Local data (by definition) are neither input to nor output from the explosion.
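The balancing rules can be sketched with set arithmetic. This example assumes that composites have already been expanded into data elements and that the analyst can list which elements the sub-processes use, which they create, and which are local; the element names (including Reorder flag) are hypothetical.

```python
def balance_errors(parent_inputs, parent_outputs, child_used, child_created, local=()):
    """Check a lower-level explosion against its parent process."""
    p_in, p_out = set(parent_inputs), set(parent_outputs)
    used, created, local = set(child_used), set(child_created), set(local)
    return {
        # Input from the parent level but used by no sub-process.
        "unused_inputs": p_in - used,
        # Output to the parent level but neither input nor generated below.
        "unexplained_outputs": p_out - p_in - created,
        # Used below, not local, not generated below, and not defined at the parent.
        "undefined_at_parent": used - p_in - created - local,
    }

result = balance_errors(
    parent_inputs={"Stock number", "Stock on hand", "Supplier name"},
    parent_outputs={"Supplier name", "Stock number", "Reorder quantity"},
    child_used={"Stock number", "Stock on hand", "Reorder point",
                "Supplier name", "Reorder flag"},
    child_created={"Reorder quantity", "Reorder flag"},
    local={"Reorder flag"},
)
print(result["undefined_at_parent"])  # {'Reorder point'}
```

Here the explosion uses Reorder point although the parent process never receives it, so the diagrams are out of balance until the parent level is corrected.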

24.4.9 Functional primitives

A functional primitive is a process (or transform) that requires no further decomposition. The process description for a functional primitive is sometimes called a mini-spec. The system’s discrete physical components lie one step below a functional primitive.

24.4.10 The configuration item level

The functional primitives and the data stores that appear at the lowest level of decomposition are called configuration items. A configuration item is a composite rather than a specific physical component; for example, a composite item might represent a program and the computer on which it runs, or a database and the device on which it resides. In a complete logical model, all the processes are decomposed down to the configuration item level, an imaginary line that links the system’s configuration items.

24.4.11 The complete logical model

A logical model consists of a complete set of balanced data flow diagrams, a data dictionary, and one process description for each process at each level down to the configuration item level. Note that some processes will be exploded only to level 2, others to level 3, and so on, so the configuration item level does not necessarily correspond to a single, consistent data flow diagram level.
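Given the process numbering convention, the functional primitives are simply the processes that have no children, and their levels can differ. A brief sketch, using a made-up set of process numbers:

```python
def functional_primitives(process_numbers):
    """Return the processes that are not the parent of any other process."""
    numbers = set(process_numbers)
    parents = {n.rpartition(".")[0] for n in numbers if "." in n}
    leaves = [n for n in numbers if n not in parents]
    # Sort numerically by each dotted component (so 4.10 follows 4.9).
    return sorted(leaves, key=lambda n: tuple(int(part) for part in n.split(".")))

# Hypothetical model: process 3 is exploded to level 2, process 4.3 to level 3.
numbers = ["3", "3.1", "3.2", "4", "4.1", "4.2", "4.3", "4.3.1", "4.3.2"]

primitives = functional_primitives(numbers)
levels = sorted({n.count(".") + 1 for n in primitives})
print(primitives)  # ['3.1', '3.2', '4.1', '4.2', '4.3.1', '4.3.2']
print(levels)      # [2, 3]
```

The primitives sit on levels 2 and 3, which illustrates why the configuration item level does not line up with a single diagram level.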

The documentation package for a large system can be quite lengthy. Processes above the configuration item level are purely logical; their process descriptions consist of little more than lists of sub-processes. Those sub-processes can be obtained from the exploded data flow diagram, so some organizations exclude the process descriptions above the configuration item level from the finished model.

The configuration item level processes will decompose into the system’s programs and procedures. The data stores will map into files and databases. The data flows will become reports, screens, forms, and dialogues. Above the configuration item level, the logical relationships between the components support planning, coordination, and control.

24.4.12 Logical and physical data flow diagrams

A logical data flow diagram’s symbols are used to describe logical, not physical, entities. A process might eventually be implemented as a computer program, a subroutine, or a manual procedure. A data store might represent a database, a file, a book, a folder in a filing cabinet, or even notes on a sheet of paper. Data flows show how the data move between the system’s components, but they do not show the flow of control. The idea is to create a logical model that focuses on what the system does while disregarding the physical details of how it works.

A physical data flow diagram uses data flow diagram symbols to represent the system’s physical processes (programs, manual procedures) and physical data stores (files, databases, reports, screens, etc.) and shows how the system works. Some analysts like to start the analysis process by preparing a physical data flow diagram of the present system. Following the analysis stage, physical data flow diagrams can be used to document alternative solutions.

24.5 Key terms

Balance —
A characteristic of an exploded data flow diagram in which each input from and output to the parent level is accounted for.
Composite —
A set of related data elements; a data structure.
Configuration item —
A functional primitive or data store that appears at the lowest level of decomposition.
Configuration item level —
An imaginary line that links the system’s configuration items.
Context diagram (level 0 data flow diagram) —
A data flow diagram that documents the system’s boundaries by highlighting its sources and destinations.
Data flow —
Data in motion.
Data flow diagram —
A logical model of the flow of data through a system.
Data store —
Data at rest; implies that the data are held between processes.
Data structure —
A set of related data elements; a composite.
Destination (sink) —
A person, organization, or other system that gets data from the target system; a destination defines a system boundary.
Explode —
To decompose a process in a data flow diagram to a lower level.
Functional decomposition —
The act of exploding a data flow diagram.
Functional primitive —
A process (or transform) that requires no further decomposition.
Global data —
Data elements or composites that are shared by two or more processes.
Level 1 data flow diagram —
A data flow diagram that shows the system’s primary processes, data stores, sources, and destinations linked by data flows.
Level 2 data flow diagram —
An explosion of a level 1 process.
Local data —
Data elements or composites that are known only within one part of the system.
Logical data flow diagram —
A data flow diagram that does not suggest physical references but shows the system’s components as logical entities.
Mini-spec —
The process description for a functional primitive.
Physical data flow diagram —
A data flow diagram that identifies the system’s physical processes and physical data stores.
Process (transform) —
An activity that changes, moves, or otherwise transforms data.
Source —
A person, organization, or other system that supplies data to the target system; a source defines a system boundary.

24.6 Software

Many CASE products support creating, modifying, maintaining, and balancing data flow diagrams. Charting programs, such as Visio and Micrografx’s Flowcharter, can be used to create data flow diagrams. The data flow diagrams in this chapter were created using Visio.

24.7 References

1.  Davis, W. S., Business Systems Analysis and Design, Wadsworth, Belmont, CA, 1994.
2.  DeMarco, T., Structured Analysis and System Specification, Yourdon, New York, 1978.
3.  Gane, C., Rapid System Development, Rapid System Development, New York, 1987.
4.  Gane, C. and Sarson, T., Structured Systems Analysis: Tools and Techniques, Prentice-Hall, Englewood Cliffs, NJ, 1979.
5.  Martin, J. and McClure, C., Diagramming Techniques for Analysts and Programmers, Prentice-Hall, Englewood Cliffs, NJ, 1985.
6.  Thayer, R. H. and Dorfman, M., System and Software Requirements Engineering, IEEE Computer Society Press, Los Alamitos, CA, 1990.
7.  Yourdon, E. and Constantine, L. L., Structured Design, Prentice-Hall, Englewood Cliffs, NJ, 1979.