The *watershed graph* is an undirected graph containing one
node for each watershed, and edges between watersheds that are
adjacent. The edges are labeled with the lowest elevation that occurs
along the boundary between the two watersheds corresponding to that
edge.

In order to identify watersheds we must route flow for each cell in the (unfilled) terrain. Flow routing on the unfilled terrain can be accomplished for most cells by simply inspecting the neighboring cells. In order to assign directions in plateaus, it is necessary to examine the whole flat region. Plateaus have one or more cells on their boundary which have downslope neighbors. On each plateau, directions are assigned by doing a breadth-first search (BFS) starting from all the spill points of the plateau. When a node is discovered for the first time, it sets its flow direction towards the node which discovered it. This partitions the total flow between the multiple spill points.

Once we have determined flow directions for slopes and plateaus we
can identify the sinks, which are the only cells with flow directions
undefined. We give each sink a *watershed label*, which we now
need to assign to each cell in the corresponding watershed. We
*sweep* the terrain bottom-up with a horizontal plane,
propagating watershed labels from each cell to its neighbors which
flow into it. The sweep plane touches the cells in the grid in
*reverse topological order* of the flow directions; when a cell
*u* is processed, all the cells that it flows into have already
been touched by the sweep plane and hence have set the watershed label
of *u* to their value.

During the sweep, we propagate the watershed labels up, from each
cell up to all neighbors that flow into it. The straightforward
solution would be to keep a watershed label grid *W* and to
access individual cells *W*_{ij}as needed along
the way. However, this may result in *O*(*N*) scattered
accesses to *W* because cells are processed in topological order
and are not necessarily well clustered spatially in this order.
Instead we maintain an *I/O efficient priority queue* which
contains the watershed label (``sent forward'' to the cell from one of
its downslope neighbors). When we process a grid cell during the
sweep, its watershed label is propagated to its upslope neighbors by
inserting an element for each neighbor into the priority queue with
key equal to the priority of the neighbor. The watershed label can
then be found by performing `extract_min` operations on the
priority queue. Because a cell can have multiple outgoing edges, we
may need to perform several `extract_min` operations. If a
cell receives multiple watershed labels, it must be on a boundary
between watersheds. In this case we add an edge to the watershed graph
labeled to indicate that the two watersheds merge at the elevation of
this cell. We later eliminate all but the lowest edge between two
watersheds. The algorithm for identifying watersheds and building the
watershed graph is outlined below:

- Assign flow directions on the slopes.
- Identify the flat areas (plateaus and sinks).
- Assign flow directions on the plateaus.
- Sort the grid in reverse topological order of flow directions (bottom to top of terrain).
- Assign to each sink a different watershed label. Scan through
the cells in the sorted grid using a priority queue to propagate
the watershed labels up. For each cell:
- Determine the watershed label of the current cell. When two watersheds meet at a cell, add an edge to the watershed graph labeled with the height.
- Propagate the current label to all neighbors which flow into the current cell.

- Sort the edges in the watershed graph and retain the lowest edge between each pair of adjacent watersheds.

Step 1 uses *Scan(N)* I/Os. Steps 2, 3, 4 and 6 use
*Sort(N)* I/Os. Step 5 can also be done in *Sort(N)* by using
a standard I/O-efficient priority queue. Thus, partitioning the
terrain into watersheds and constructing the watershed graph can be
done in *Sort(N)* I/Os.

<laura@cs.duke.edu> Last modified: Tue Apr 17 11:32:34 2001 by laura.