# Identifying watersheds and constructing the Watershed Graph

The watershed graph is an undirected graph containing one node for each watershed, and edges between watersheds that are adjacent. The edges are labeled with the lowest elevation that occurs along the boundary between the two watersheds corresponding to that edge.

In order to identify watersheds we must route flow for each cell in the (unfilled) terrain. Flow routing on the unfilled terrain can be accomplished for most cells by simply inspecting the neighboring cells. In order to assign directions in plateaus, it is necessary to examine the whole flat region. Plateaus have one or more cells on their boundary which have downslope neighbors. On each plateau, directions are assigned by doing a breadth-first search (BFS) starting from all the spill points of the plateau. When a node is discovered for the first time, it sets its flow direction towards the node which discovered it. This partitions the total flow between the multiple spill points.

Once we have determined flow directions for slopes and plateaus we can identify the sinks, which are the only cells with flow directions undefined. We give each sink a watershed label, which we now need to assign to each cell in the corresponding watershed. We sweep the terrain bottom-up with a horizontal plane, propagating watershed labels from each cell to its neighbors which flow into it. The sweep plane touches the cells in the grid in reverse topological order of the flow directions; when a cell u is processed, all the cells that it flows into have already been touched by the sweep plane and hence have set the watershed label of u to their value.

During the sweep, we propagate the watershed labels up, from each cell up to all neighbors that flow into it. The straightforward solution would be to keep a watershed label grid W and to access individual cells Wijas needed along the way. However, this may result in O(N) scattered accesses to W because cells are processed in topological order and are not necessarily well clustered spatially in this order. Instead we maintain an I/O efficient priority queue which contains the watershed label (``sent forward'' to the cell from one of its downslope neighbors). When we process a grid cell during the sweep, its watershed label is propagated to its upslope neighbors by inserting an element for each neighbor into the priority queue with key equal to the priority of the neighbor. The watershed label can then be found by performing extract_min operations on the priority queue. Because a cell can have multiple outgoing edges, we may need to perform several extract_min operations. If a cell receives multiple watershed labels, it must be on a boundary between watersheds. In this case we add an edge to the watershed graph labeled to indicate that the two watersheds merge at the elevation of this cell. We later eliminate all but the lowest edge between two watersheds. The algorithm for identifying watersheds and building the watershed graph is outlined below:

1. Assign flow directions on the slopes.
2. Identify the flat areas (plateaus and sinks).
3. Assign flow directions on the plateaus.
4. Sort the grid in reverse topological order of flow directions (bottom to top of terrain).
5. Assign to each sink a different watershed label. Scan through the cells in the sorted grid using a priority queue to propagate the watershed labels up. For each cell:
• Determine the watershed label of the current cell. When two watersheds meet at a cell, add an edge to the watershed graph labeled with the height.
• Propagate the current label to all neighbors which flow into the current cell.
6. Sort the edges in the watershed graph and retain the lowest edge between each pair of adjacent watersheds.

Step 1 uses Scan(N) I/Os. Steps 2, 3, 4 and 6 use Sort(N) I/Os. Step 5 can also be done in Sort(N) by using a standard I/O-efficient priority queue. Thus, partitioning the terrain into watersheds and constructing the watershed graph can be done in Sort(N) I/Os.

<laura@cs.duke.edu>