Having a bilingual document-aligned training corpus makes it
possible to use adaptations of other IR approaches:
- GVSM (generalized vector space model): Represent each term in either
language by its pattern of occurrence across the documents of the
training corpus.
- Pseudo-relevance feedback: Use the query in language X to retrieve
training documents in language X. Replace the returned documents by
their aligned equivalents in language Y. Use the top documents in
language Y as a query against the full document collection in language Y.
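
Both adaptations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CMU group's implementation: the toy corpus, the function names (gvsm_rank, prf_rank), the raw-count weighting, and the cutoff k are all hypothetical choices made for the example.

```python
import math
from collections import Counter

# Hypothetical toy data: train_x[i] and train_y[i] are aligned translations.
train_x = ["dog barks loud", "cat sleeps", "dog chases cat"]
train_y = ["chien aboie fort", "chat dort", "chien poursuit chat"]
# Target collection in language Y (not part of the training corpus).
collection_y = ["le chien aboie", "le chat dort"]

def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(scores):
    """Indices sorted by descending score (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])

def gvsm_vector(text, train_docs):
    """Map text into training-document space: component i counts how often
    the words of `text` occur in training document i."""
    counts = [Counter(d.split()) for d in train_docs]
    return [sum(c[w] for w in text.split()) for c in counts]

def gvsm_rank(query_x, docs_y):
    """GVSM: the query uses the X side of the corpus, documents use the Y
    side; both land in the same space, indexed by aligned document pair."""
    q = gvsm_vector(query_x, train_x)
    return rank([cosine(q, gvsm_vector(d, train_y)) for d in docs_y])

def bow_cosine(a, b):
    """Ordinary monolingual bag-of-words cosine between two texts."""
    ca, cb = Counter(a.split()), Counter(b.split())
    vocab = sorted(set(ca) | set(cb))
    return cosine([ca[w] for w in vocab], [cb[w] for w in vocab])

def prf_rank(query_x, docs_y, k=2):
    """Pseudo-relevance feedback: retrieve in X, swap to the aligned Y
    documents, and use their text as the query against the Y collection."""
    top = rank([bow_cosine(query_x, d) for d in train_x])[:k]
    query_y = " ".join(train_y[i] for i in top)
    return rank([bow_cosine(query_y, d) for d in docs_y])

if __name__ == "__main__":
    print(gvsm_rank("dog", collection_y))
    print(prf_rank("dog", collection_y))
```

Neither function needs a bilingual dictionary: the only link between the two languages is the document alignment of the training corpus. A realistic system would presumably use weighted term counts and a much larger corpus.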
Paper describing these ideas by a group at CMU's Language Technologies Institute.