This contribution addresses knowledge-based image understanding. The knowledge is coded declaratively in a production system. Applying this knowledge to a large set of primitives may incur high computational effort. A particular accumulating parsing scheme trades soundness for feasibility. By default, this scheme uses bottom-up control based on a quality assessment of the object instances. The focus of this work is the description of top-down control rationales that accelerate the search dramatically. Top-down strategies are divided into two types: (i) global control and (ii) localized focus-of-attention and inhibition methods. These are discussed and empirically compared using a particular landmark recognition system and representative aerial image data from Google Earth.