Our study shows that when two state-of-the-art models, one that describes images and another that regenerates them, interact in a loop without human input, they converge toward a small set of highly conventional visual motifs, such as lighthouses, cathedrals, and palatial interiors. This finding reveals that, even without additional training, autonomous AI feedback loops drift toward common attractors: very generic-looking images that we call “visual elevator music.”
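
To make the idea of a describe-and-regenerate loop settling on an attractor concrete, here is a minimal toy sketch in Python. It is not the study's pipeline: the names `run_loop`, `toy_describe`, and `toy_generate` and the `FAMILIARITY` and `COMPANION` tables are hypothetical stand-ins invented for illustration, and an "image" is reduced to a tuple of tags. In a real experiment, the two callables would wrap an image-captioning model and an image-generation model.

```python
from typing import Callable, List, Tuple

# Toy stand-in (assumption): an "image" is just a tuple of visual tags.
Image = Tuple[str, ...]

def run_loop(image: Image,
             describe: Callable[[Image], str],
             generate: Callable[[str], Image],
             max_steps: int = 20) -> List[str]:
    """Alternate describe -> generate until the caption stops changing."""
    captions: List[str] = []
    for _ in range(max_steps):
        caption = describe(image)
        if captions and caption == captions[-1]:
            break  # fixed point: the loop has settled on one motif
        captions.append(caption)
        image = generate(caption)
    return captions

# Hypothetical numbers and pairings, chosen only to make the drift visible.
FAMILIARITY = {"lighthouse": 3, "sea": 2, "boat": 1}
COMPANION = {"boat": "sea", "sea": "lighthouse", "lighthouse": "sea"}

def toy_describe(image: Image) -> str:
    # The toy captioner names only the most familiar tag it sees.
    return max(image, key=lambda tag: FAMILIARITY.get(tag, 0))

def toy_generate(caption: str) -> Image:
    # The toy generator adds the tag's most conventional companion.
    return (caption, COMPANION.get(caption, caption))

if __name__ == "__main__":
    print(run_loop(("boat", "rust"), toy_describe, toy_generate))
    print(run_loop(("sea", "gull"), toy_describe, toy_generate))
```

Under these made-up tables, both starting images end on the same caption, which is the toy analogue of a common attractor. Stopping when the caption repeats is only one possible convergence check; with real models, one would more plausibly compare embeddings of successive images or captions.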