But how do we use this in a cross-language setting?
Document-aligned corpora!
If we want an English-French system, we need to ``train'' it on a collection of English documents (e.g., paragraphs), each paired with a semantically equivalent document in French.
We only need to know which documents go together; no finer-grained alignment, such as sentence- or word-level correspondence, is required.
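Here is a minimal sketch of one common way to use such a corpus. It assumes the method being adapted is a term-document model such as latent semantic indexing (LSI). Each English document is merged with its French counterpart into a single column of the term-document matrix, so the pairing is the only alignment information used. The toy corpus, the language-tag prefixes on tokens, and the function names are hypothetical and only for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: each entry pairs an English document with a
# semantically equivalent French document. Only the pairing is known;
# there is no sentence- or word-level alignment.
aligned_corpus = [
    ("the cat sleeps", "le chat dort"),
    ("the dog barks", "le chien aboie"),
    ("the cat and the dog", "le chat et le chien"),
]

def tokenize(text, lang):
    """Lowercase, split on whitespace, and tag each token with its language
    so that identical spellings in the two languages stay distinct terms."""
    return [f"{lang}:{tok}" for tok in text.lower().split()]

def joint_term_document_matrix(pairs):
    """Build a term-document count matrix in which each column is the
    concatenation of one English document and its French counterpart.

    Returns (vocab, matrix), where matrix[i][j] is the count of vocab[i]
    in aligned pair j.
    """
    columns = [
        Counter(tokenize(en_doc, "en") + tokenize(fr_doc, "fr"))
        for en_doc, fr_doc in pairs
    ]
    # Joint vocabulary: the union of English and French terms.
    vocab = sorted(set().union(*columns))
    # A Counter returns 0 for terms absent from a column.
    matrix = [[col[term] for col in columns] for term in vocab]
    return vocab, matrix

vocab, matrix = joint_term_document_matrix(aligned_corpus)
rows = dict(zip(vocab, matrix))
print(rows["en:cat"], rows["fr:chat"])
```

Here "en:cat" and "fr:chat" get identical rows ([1, 0, 1]) even though no word-level alignment was given. They have the same rows because they occur in the same aligned pairs. A subsequent dimensionality reduction, such as the truncated SVD used in LSI, can exploit this co-occurrence to place translation-equivalent terms near each other in the shared latent space.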