Is Multi-task Learning (MTL) Always Helpful to Improve the Original Model’s Performance?

Master's Defense
Speaker Name
Cheng Chen
Date and Time
North 311

Convolutional neural networks (CNNs) have recently become popular for image recognition tasks because they outperform earlier approaches. One limitation of CNNs, however, is that they require substantially more hand-labeled training imagery than other models before they achieve this performance advantage. To address this limitation, multi-task learning (MTL) has been proposed, in which a single CNN is trained to perform several recognition tasks simultaneously. This typically increases the amount of available training data dramatically, and MTL has become widely used in the literature. It is known that the performance advantages of MTL depend upon the similarity between the added tasks and the original desired task. In this work we ask the following more general question: does MTL always improve the performance of the original model?

To answer this question, we conduct controlled experiments that compare multi-task learning with conventional learning, termed single-task learning (STL), while varying the similarity of the tasks on which the CNNs are trained. Specifically, we conduct three groups of experiments (baseline, STL, and MTL), all focused on the problem of detecting building footprints in overhead imagery (e.g., satellite imagery). In the baseline experiment, a standard CNN is trained on the desired task, called the “source” task, using only data for that task. In the STL experiments, six CNNs are trained on six larger datasets, each generated by pooling the source data with one of six different auxiliary datasets. For comparison, in the MTL experiments, an MTL model is trained on the source data and the auxiliary data at the same time. By comparing the evaluation results of the trained models from the three groups, we can determine whether MTL always improves the performance of the original model, and whether MTL models always perform better than the corresponding STL models. Finally, we explore several variables that may affect how the performance of MTL models changes after the auxiliary data are added.
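
The three experiment groups described above can be sketched as a small data-arrangement routine. This is only an illustrative sketch under stated assumptions, not the setup used in the thesis: the `Sample` type, the function names, and the dataset names `aux_1` to `aux_6` are hypothetical, and it assumes that STL treats the pooled data as a single task while MTL keeps a per-sample task tag so that a shared network could route each sample to a task-specific output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Sample:
    image_id: str  # hypothetical placeholder for an image tile
    task: str      # name of the task whose labels this sample carries


def baseline_set(source: List[Sample]) -> List[Sample]:
    # Baseline: train on the source task's data only.
    return list(source)


def stl_pooled_set(source: List[Sample], auxiliary: List[Sample]) -> List[Sample]:
    # STL (assumed reading): pool source and auxiliary data and treat
    # everything as one task, so a single output sees all samples.
    return [Sample(s.image_id, "source") for s in list(source) + list(auxiliary)]


def mtl_set(source: List[Sample], auxiliary: List[Sample]) -> List[Sample]:
    # MTL (assumed reading): use the same samples, but keep each sample's
    # task tag so a shared model can use a task-specific output per sample.
    return list(source) + list(auxiliary)


def build_experiments(
    source: List[Sample], auxiliaries: Dict[str, List[Sample]]
) -> Dict[str, List[Sample]]:
    # One baseline, plus one STL and one MTL training set per auxiliary dataset.
    experiments = {"baseline": baseline_set(source)}
    for name, aux in auxiliaries.items():
        experiments[f"stl+{name}"] = stl_pooled_set(source, aux)
        experiments[f"mtl+{name}"] = mtl_set(source, aux)
    return experiments


if __name__ == "__main__":
    # Toy data: two source samples and six auxiliary datasets of three samples.
    source = [Sample("s0", "source"), Sample("s1", "source")]
    auxiliaries = {
        f"aux_{i}": [Sample(f"a{i}_{j}", f"aux_{i}") for j in range(3)]
        for i in range(1, 7)
    }
    for name, samples in build_experiments(source, auxiliaries).items():
        print(name, len(samples), sorted({s.task for s in samples}))
```

Under these assumptions, each STL and MTL pair sees exactly the same samples, so any difference in source-task performance between them reflects how the auxiliary data are used rather than how much data there is.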

Co-advisors: John Board and Jordan Malof
Committee: Carlo Tomasi