We walk through the evolution from simple feedforward networks to transformers and graph models, showing how richer data structures demand more sophisticated architectures. More importantly, we show how embedding spaces allow different modalities to interact.