GraphGridLayout: public GraphLayout¶
Graph layout algorithm on layered graph layout approach. For simplicity all the nodes are placed in a grid.
Basic familiarity with graph algorithms is recommended.
Vertex, node, block - read description of graph for definition. Within this text vertex and node are used interchangeably with block due to code being written for visualizing basic block control flow graph.
edge - read description of graph for definition for precise definition.
DAG - directed acyclic graph, graph using directed edges which doesn’t have cycles. DAG may contain loops if following them would require going in both directions of edges. Example 1->2 1->3 3->2 is a DAG, 2->1 1->3 3->2 isn’t a DAG.
DFS - depth first search, a graph traversal algorithm
toposort - topological sorting, process of ordering a DAG vertices that all edges go from vertices earlier in the toposort order to vertices later in toposort order. There are multiple algorithms for implementing toposort operation. Single DAG can have multiple valid topological orderings, a toposort algorithm can be designed to prioritize a specific one from all valid toposort orders. Example: for graph 1->4, 2->1, 2->3, 3->4 valid topological orders are [2,1,3,4] and [2,3,1,4].
High level structure of the algorithm
select subset of edges that form a DAG (remove cycles)
toposort the DAG
choose a subset of edges that form a tree and assign layers
assign node positions within grid using tree structure, child subtrees are placed side by side with parent on top
perform edge routing
calculate column and row pixel positions based on node sizes and amount edges between the rows
[optional] layout compacting
Contrary to many other layered graph drawing algorithm this implementation doesn’t perform node reordering to minimize edge crossing. This simplifies implementation, and preserves original control flow structure for conditional jumps ( true jump on one side, false jump on other). Due to most of control flow being result of structured programming constructs like if/then/else and loops, resulting layout is usually readable without node reordering within layers.
Description of grid.
To simplify the layout algorithm initial steps assume that all nodes have the same size and edges are zero width. After placing the nodes and routing the edges it is known which nodes are in in which row and column, how many edges are between each pair of rows. Using this information positions are converted from the grid cells to pixel coordinates. Routing 0 width edges between rows can also be interpreted as every second row and column being reserved for edges. The row numbers in code are using first interpretation. To allow better centering of nodes one above other each node is 2 columns wide and 1 row high.
1-2 Cycle removal and toposort
Cycle removal and toposort are done at the same time during single DFS traversal. In case entrypoint is part of a loop DFS started from entrypoint. This ensures that entrypoint is at the top of resulting layout if possible. Resulting toposort order is used in many of the following layout steps that require calculating some property of a vertex based on child property or the other way around. Using toposort order such operations can be implemented iteration through array in either forward or reverse direction. To prevent running out of stack memory when processing large graphs DFS is implemented non-recursively.
Rows are assigned in toposort order from top to bottom, with nodes row being max(predecessor.row)+1. This ensures that loop edges are only ones going from deeper levels to previous layers.
To further simply node placement a subset of edges is selected which forms a tree. This turns DAG drawing problem into a tree drawing problem. For each node in level n following nodes which have level exactly n+1 are greedily assigned as child nodes in tree. If a node already has parent assigned then corresponding edge is not part of tree.
Node position assignment
Since the graph has been reduced to a tree, node placement is more or less putting subtrees side by side with parent on top. There is some room for interpretation what exactly side by side means and where exactly on top is. Drawing the graph either too dense or too big may make it less readable so there are configuration options which allow choosing these things resulting in more or less dense layout.
Once the subtrees are placed side by side. Parent node can be placed either in the middle of horizontal bounds or in the middle of direct children. First option results in narrower layout and more vertical columns. Second option results in nodes being more spread out which may help seeing where each edge goes.
In more compact mode two subtrees are placed side by side taking into account their shape. In wider mode bounding box of shorter subtree is used instead of exact shape. This gives slightly sparse layout without it being too wide.
Edge routing can be split into: main column selection, rough routing, segment offset calculation.
Transition from source to target row is done using single vertical segment. This is called main column.
Rough routing creates the path of edge using up to 5 segments using grid coordinates. Due to nodes being placed in a grid. Horizontal segments of edges can’t intersect with any nodes. The path for edges is chosen so that it consists of at most 5 segments, typically resulting in sideways U shape or square Z shape.
short vertical segment from node to horizontal line
move to empty column
vertical segment between starting row and end row, an empty column can always be found, in the worst case there are empty columns at the sides of drawing
horizontal segment to target node column
short vertical segment connecting to target node
There are 3 special cases:
source and target nodes are in the same column with no nodes between - single vertical segment
column bellow stating node is empty - segments 1-3 are merged
column above target node is empty - segments 3-5 are merged Vertical segment intersection with nodes is prevented using a 2d array marking which vertical segments are blocked and naively iterating through all rows between start and end at the desired column.
After rough routing segment offsets are calculated relative to their corresponding edge column. This ensures that two segments don’t overlap. Segment offsets within each column are assigned greedily with some heuristics for assignment order to reduce amount of edge crossings and result in more visually pleasing output for a typical CFG graph. Each segment gets assigned an offset that is maximum of previously assigned offsets overlapping with current segment + segment spacing. Assignment order is chosen based on: direction of previous and last segment - helps reducing crossings and place the segments between nodes segment length - reduces crossing when segment endpoints have the same structure as valid parentheses expression edge length - establishes some kind of order when single node is connected to many edges, typically a block with switch statement or block after switch statement.
Doing the layout within a grid causes minimal spacing to be limited by widest and tallest block within each column and row. One common case is block with function entrypoint being wider due to function name causing wide horizontal space between branching blocks. Another case is rows in two parallel columns being aligned.
Both problems are mitigated by squishing graph. Compressing in each of the two direction is done separately. The process is defined as liner program. Each variable represents a position of edge segment or node in the direction being optimized.
Following constraints are used
Keep the order with nearest segments.
If the node has two outgoing edges, one to the node on left side and other to the right, keep them on the corresponding side of node’s center.
For all edges keep the node which is above above. This helps when vertical block spacing is set bigger than double edge spacing and edge shadows relationship between two blocks.
Equality constraint to keep relative position between nodes and and segments directly connected to them.
Equality constraint to keep the node centered when control flow merges In the vertical direction objective function minimizes y positions of nodes and lengths of vertical segments. In the horizontal direction objective function minimizes lengths of horizontal segments.
In the resulting linear program all constraints beside x_i >= 0 consist of exactly two variables: either x_i - x_j <= c_k or x_i = x_j + c_k.
Since it isn’t necessary get perfect solution and to avoid worst case performance current implementation isn’t using a general purpose linear programming solver. Each variable is changed until constraint is reached and afterwards variables are grouped and changed together.
CalculateLayout(Graph &blocks, ut64 entry, int &width, int &height) const override¶