Cross-language LSI

But how do we use this in a cross-language setting?

Document-aligned corpora!

If we want an English-French system, we need to "train" it on a bunch of English documents (paragraphs), along with semantically equivalent documents in French.

We only need to know which documents go together. No finer-grained alignment (such as sentence or word equivalence) is required.
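
To make the training setup concrete, here is a minimal sketch in Python with NumPy. It assumes the common CL-LSI recipe of concatenating each English document with its French mate into one dual-language training document before taking the SVD; the slide itself does not spell this step out. The toy corpus, the choice k = 2, and the helper names fold_in and cosine are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy corpus: each pair is (English document, French equivalent).
# Only document-level alignment is used; no sentence or word alignment.
aligned_pairs = [
    ("the cat sleeps", "le chat dort"),
    ("the dog eats", "le chien mange"),
    ("the cat eats", "le chat mange"),
]

# Assumed recipe: merge each aligned pair into one dual-language document.
merged_docs = [(en + " " + fr).split() for en, fr in aligned_pairs]

# Shared vocabulary over both languages.
vocab = sorted({w for doc in merged_docs for w in doc})
index = {w: i for i, w in enumerate(vocab)}

# Term-by-document count matrix (rows: terms, columns: merged documents).
A = np.zeros((len(vocab), len(merged_docs)))
for j, doc in enumerate(merged_docs):
    for w in doc:
        A[index[w], j] += 1

# Truncated SVD: keep the k largest singular triplets (k = 2 is arbitrary).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]

def fold_in(text):
    """Project a monolingual text into the k-dimensional LSI space."""
    q = np.zeros(len(vocab))
    for w in text.split():
        if w in index:  # words unseen in training are ignored
            q[index[w]] += 1
    return (q @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity between two vectors in the LSI space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In this toy setting, "cat" and "chat" occur in exactly the same merged documents, so their term vectors coincide in the reduced space, and cosine(fold_in("cat"), fold_in("chat")) can be used to compare an English query against French text. Real corpora are far noisier, so this should be read as an illustration of the idea rather than a working retrieval system.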

