Computational (Optimal) Transport for Domain Adaptation

Modern machine learning tasks often require access to large volumes of labelled data to achieve strong and reliable performance. Even with ample data and computing resources, it can still be challenging to apply a well-trained model to new data sets in real-world applications. The difficulty can be attributed to differences between the distribution of the new data and that of the data used to train the model. A simple solution is to collect more observations from the new data sets, but this can be expensive and requires continuous acquisition, which may be burdensome or infeasible. Domain adaptation is a sub-field of machine learning that focuses on transferring information learnt from labelled source data to target data with unknown labels. In particular, it deals with cases where the distributions of the source and target domains differ. Such differences (called domain drift) can arise for multiple reasons, often with physical interpretations. In computer vision, the drift can occur due to changes in angle, lighting, background, or random noise, or simply due to different acquisition devices. In the task of detecting epilepsy from electroencephalogram (EEG) recordings, deploying a predictive model developed with one patient’s data to other patients can be hindered by differences in the patients’ conditions.

In this work, we investigate unsupervised domain adaptation with a single source domain that has associated outputs/labels and a single unlabelled target domain. This setting is distinct from semi-supervised domain adaptation, in which a few outputs/labels are known in the target domain, and from multiple domain adaptation, which involves multiple source and target domains. More specifically, we explore approaches based on the least effort principle, under the assumption that the domain drift takes the form of a transformation from the source to the target domain that is minimal with respect to some cost metric. Under this principle, the domain adaptation problem can be expressed as first finding a transformation that makes the source distribution similar to the target distribution, and then using the learnt transformation to estimate the target outputs/labels. The problem can then be formulated within the framework of Optimal Transportation (OT) theory, which has been well studied and applied in many fields owing to its ability to compute distances between probability distributions whose supports may not overlap. Using two different formulations of the domain adaptation problem under the OT framework, we examine their properties on several examples to gain deeper insight into these formulations.
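The two-step idea described above (transport the source samples towards the target, then predict target labels from the transported samples) can be illustrated with a minimal sketch. It is not one of the formulations studied in this work: it assumes a toy two-class data set with a pure translation as the drift, a squared Euclidean cost, an entropic-regularised solver (Sinkhorn iterations) standing in for exact OT, a barycentric mapping, and a 1-nearest-neighbour classifier. The helper name `sinkhorn_plan` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def sinkhorn_plan(a, b, cost, reg=0.05, n_iter=500):
    """Entropic-regularised transport plan between weight vectors a and b."""
    K = np.exp(-cost / reg)              # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # rescale to match the target marginal
        u = a / (K @ v)                  # rescale to match the source marginal
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
n = 50

# Toy labelled source domain: two Gaussian blobs (assumed data).
Xs = np.vstack([rng.normal([-2.0, 0.0], 0.3, (n, 2)),
                rng.normal([2.0, 0.0], 0.3, (n, 2))])
ys = np.repeat([0, 1], n)

# Toy target domain: same blobs under an assumed translation (the drift).
# The target labels yt are used only to check the result, never for fitting.
shift = np.array([3.0, 1.0])
Xt = np.vstack([rng.normal([-2.0, 0.0], 0.3, (n, 2)),
                rng.normal([2.0, 0.0], 0.3, (n, 2))]) + shift
yt = np.repeat([0, 1], n)

# Uniform weights on the samples and a normalised squared Euclidean cost.
a = np.full(len(Xs), 1.0 / len(Xs))
b = np.full(len(Xt), 1.0 / len(Xt))
cost = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()

plan = sinkhorn_plan(a, b, cost)

# Step 1: barycentric mapping sends each source point to the weighted
# average of the target points it is coupled with.
Xs_mapped = (plan @ Xt) / plan.sum(axis=1, keepdims=True)

# Step 2: label each target point with the label of its nearest
# transported source point (1-nearest-neighbour).
dist = ((Xt[:, None, :] - Xs_mapped[None, :, :]) ** 2).sum(axis=-1)
yt_pred = ys[dist.argmin(axis=1)]
accuracy = (yt_pred == yt).mean()
print(f"target accuracy after transport: {accuracy:.2f}")
```

The regularisation strength `reg` trades accuracy of the plan against numerical stability: smaller values approach the unregularised OT plan but can underflow in the kernel `K`, which is why the cost is normalised before use.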

Thanh Dat Tran
Monash University