18 February 2008
University of Ottawa
Stan Matwin

A practical application (digital games-based learning) challenged us with a learning task in which there is a limited set of labeled instances and a large set of unlabeled ones. This setting is often known as semi-supervised learning. Co-training is an effective semi-supervised learning method. In the co-training process, random sampling is used to select some unlabeled instances from a large pool of unlabeled instances. Other research, particularly in active learning, has shown that random sampling often yields suboptimal performance. We explore the use of a confidence-based sampling method for co-training. Experimental results show that this method may significantly improve the original co-training algorithm of Blum and Mitchell. The approach is of interest when one needs to grow additional labeled data to learn a classifier with certain properties (e.g. interpretability), while any classifier can be used to generate this additional data. This is joint work with J. Sayyad Shirabad, J. Huang, and J. Su.

Stan Matwin is a professor at the School of Information Technology and Engineering, University of Ottawa, where he directs the Text Analysis and Machine Learning (TAMALE) lab. His research is in machine learning, data mining, and their applications, and in data privacy. Stan has worked at universities in Canada, the U.S., Europe and Latin America, where in 1997 he held the UNESCO Distinguished Chair in Science and Sustainable Development. He is a former president of the Canadian Society for the Computational Studies of Intelligence (CSCSI) and of the IFIP Working Group 12.2 (Machine Learning), and a recipient of the Communications and Information Technology Ontario Champion of Innovation Award. He is a co-founder of Distil Interactive Inc.

Institution department: 
School of Information Technology and Engineering