Aligning French and German

One group provided a (noisy) alignment of SDA-G and SDA-F, which was possible because these corpora share a common set of descriptor tags. The alignment used date, place, and descriptor tags.
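As a rough illustration of tag-based alignment, the sketch below links each German story to the French story with the same date and place that shares the most descriptor tags. The `Story` record, the `align` function, the `min_shared` threshold, and the toy data are all assumptions made for this example; the actual procedure used by the group that supplied the alignment is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    # Hypothetical record layout; the real SDA fields may differ.
    doc_id: str
    date: str
    place: str
    descriptors: frozenset


def align(german, french, min_shared=1):
    """Link each German story to the same-date, same-place French story
    that shares the most descriptor tags (assumed heuristic)."""
    pairs = []
    for g in german:
        best, best_overlap = None, 0
        for f in french:
            if f.date != g.date or f.place != g.place:
                continue
            overlap = len(g.descriptors & f.descriptors)
            if overlap > best_overlap:
                best, best_overlap = f, overlap
        if best is not None and best_overlap >= min_shared:
            pairs.append((g.doc_id, best.doc_id))
    return pairs


# Invented toy data, for illustration only.
german = [
    Story("g1", "day-1", "Jerusalem", frozenset({"tag-a", "tag-b"})),
    Story("g2", "day-1", "Bern", frozenset({"tag-c"})),
]
french = [
    Story("f1", "day-1", "Jerusalem", frozenset({"tag-a"})),
    Story("f2", "day-2", "Bern", frozenset({"tag-c"})),
]
pairs = align(german, french)
```

A heuristic like this is noisy by construction: two different stories filed from the same place on the same day with overlapping tags can be linked to each other.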

Some bad links (e.g., two different terrorist stories in Jerusalem), and some variation in story content between linked stories (more details are added during the day). We didn't want these to throw off LSI.

``Bootstrapping'': Start with 80k aligned French-German pairs, train, identify bad matches, remove them, repeat. At 40k pairs, mate-retrieval performance peaked, so we considered that set sufficiently well aligned.
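The train / prune / repeat loop can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure actually used: the fraction dropped per round (`drop_fraction`), the stopping rule, and the function names are hypothetical, and the toy `train`, `pair_score`, and `mate_retrieval` stand-ins use Jaccard overlap on tag sets instead of LSI, purely so the example is self-contained.

```python
def bootstrap_alignment(pairs, train, pair_score, mate_retrieval,
                        drop_fraction=0.25, min_pairs=2):
    """Repeatedly train, drop the worst-scoring pairs, and remember the
    pair set on which mate-retrieval performance peaked."""
    current = list(pairs)
    best_pairs, best_perf = list(current), float("-inf")
    while len(current) >= min_pairs:
        model = train(current)
        perf = mate_retrieval(model, current)
        if perf > best_perf:
            best_pairs, best_perf = list(current), perf
        # Lowest-scoring pairs are treated as the likely bad matches.
        ranked = sorted(current, key=lambda p: pair_score(model, p))
        n_drop = max(1, int(len(ranked) * drop_fraction))
        current = ranked[n_drop:]
    return best_pairs, best_perf


# --- Toy stand-ins (assumptions; a real system would train LSI here) ---
def jaccard(a, b):
    return len(a & b) / len(a | b)


def train(pairs):
    return None  # placeholder for an LSI model trained on the pairs


def pair_score(model, pair):
    fr, de = pair
    return jaccard(fr, de)


def mate_retrieval(model, pairs):
    """Fraction of French documents whose best-matching German document
    is the one they are paired with."""
    hits = 0
    for fr, de in pairs:
        best = max(pairs, key=lambda p: jaccard(fr, p[1]))
        hits += best[1] == de
    return hits / len(pairs)


good = [
    (frozenset({"a", "b"}), frozenset({"a", "b"})),
    (frozenset({"c", "d"}), frozenset({"c", "d"})),
    (frozenset({"e", "f"}), frozenset({"e", "f"})),
]
bad = (frozenset({"a", "b", "x"}), frozenset({"c", "d", "y"}))
kept, perf = bootstrap_alignment(good + [bad], train, pair_score,
                                 mate_retrieval)
```

On this toy input the mismatched pair is dropped in the first round, and the three remaining pairs are returned as the best-aligned set. Keeping the set at the performance peak, instead of pruning until nothing is left, mirrors the decision to stop at 40k pairs.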

