Recent decades have seen significant advances in the development and application of machine learning (ML) algorithms. As we increasingly rely on these algorithms in AI-based automated systems, objective evaluation of their performance has become essential to justify their real-world use. Worryingly, potential inadequacies in the currently accepted methodology for testing classification algorithms have been reported for over a decade. Unbiased evaluation of classifiers, and a better understanding of their behaviour, require a more diverse and expansive space of test instances. Where such a space does not exist, it must be created. The aim of this project is therefore to generate new test instances that enrich the diversity of the instance space. Datasets will be generated using Gaussian Mixture Models (GMMs) so that their features place them in sparse regions of the instance space or extend its boundaries. Finding GMM parameters that produce datasets with controllable properties is a continuous black-box minimization problem; solving it will allow test instances to be placed at target locations in the instance space. The resulting instance space may offer greater insight than the current UCI repository can afford.
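To make the black-box framing concrete, here is a minimal sketch under stated assumptions; it is not the project's method. The two-component isotropic 2-D GMM (one component per class), the `separation` meta-feature, the target value and the (1+1) random-search optimiser are all hypothetical stand-ins chosen for brevity. The generator reuses fixed standard-normal draws, so the objective is a deterministic function of the GMM parameters that the optimiser can only query, never differentiate.

```python
import math
import random


def make_noise(n_per_class, seed=0):
    """Fixed standard-normal draws, reused for every candidate so the objective is deterministic."""
    rng = random.Random(seed)
    return [[(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_per_class)]
            for _ in range(2)]


def sample_dataset(params, noise):
    """Hypothetical two-component 2-D GMM, one component per class.

    params = (dx, dy, log_s0, log_s1): component 0 is centred at the origin,
    component 1 at (dx, dy); each component has isotropic std exp(log_s).
    """
    dx, dy, log_s0, log_s1 = params
    means = [(0.0, 0.0), (dx, dy)]
    stds = [math.exp(log_s0), math.exp(log_s1)]
    data = []
    for label in range(2):
        (mx, my), s = means[label], stds[label]
        data.extend((mx + s * z1, my + s * z2, label) for z1, z2 in noise[label])
    return data


def separation(data):
    """Hypothetical meta-feature: centroid distance divided by pooled within-class std."""
    centroids, variances = [], []
    for label in range(2):
        pts = [(x, y) for x, y, c in data if c == label]
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        # Per-coordinate variance, averaged over the two coordinates.
        var = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts) / (2 * len(pts))
        centroids.append((cx, cy))
        variances.append(var)
    return math.dist(centroids[0], centroids[1]) / math.sqrt(sum(variances) / 2)


def objective(params, noise, target):
    """Black-box loss: distance of the generated dataset's feature from the target location."""
    return abs(separation(sample_dataset(params, noise)) - target)


def random_search(noise, target, iters=2000, step=0.2, seed=1):
    """(1+1) random search: keep a Gaussian perturbation only if it lowers the loss."""
    rng = random.Random(seed)
    best = [0.0, 0.0, 0.0, 0.0]
    best_loss = objective(best, noise, target)
    for _ in range(iters):
        cand = [p + rng.gauss(0.0, step) for p in best]
        loss = objective(cand, noise, target)
        if loss < best_loss:
            best, best_loss = cand, loss
    return best, best_loss


if __name__ == "__main__":
    noise = make_noise(100)
    params, loss = random_search(noise, target=2.0)
    print("params:", [round(p, 3) for p in params], "loss:", round(loss, 4))
```

In a real instance space the single feature would be replaced by a vector of meta-features and the target by a point in the projected space, and a stronger derivative-free optimiser (for example CMA-ES or Nelder-Mead) would likely replace random search; the structure of the loop stays the same.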
The University of Melbourne
Kulunu Dharmakeerthi is a third-year Mathematics major at the University of Melbourne. Having previously worked on applied mathematics research at the University of California, Berkeley, Kulunu is eager to continue engaging in research that tackles emerging real-world problems. Recently, he has been drawn to rapidly developing areas of machine learning and statistical theory, and he is keen to explore the frontiers of these fields.