RESEARCH STATEMENT: Kuan-ming Lin, Feb. 5, 2006.

Ever since I took the Database course taught by Prof. Yang, I have been drawn to data mining. Broadly speaking, I am interested in finding useful patterns in stores of data so large that they can be handled only by computers. In earlier years, data at this scale were collected by businesses or organizations and kept in proprietary data warehouses. As a result, many data mining techniques were specialized to individual cases, and their generalizability has not been studied in depth. Today, the Internet has become the largest information store in the world, and the public can perform data analysis without access to proprietary databases, so this is the right time for me to take part in data mining research. In particular, I am interested in mining two categories of publicly accessible electronic data: biological databases and Web documents.

In recent years, a number of biological data repositories have been published on the Internet, freely available to the global community. Notably, the emergence of high-throughput methods has produced plentiful public databases that measure different aspects of biological processes. Taken together, these data are believed to carry more information than was extracted in the individual studies that produced them. For example, we could assign functions to unknown genes by looking at how their expression levels correlate across multiple microarrays. Extracting complementary information across heterogeneous datasets is therefore an appealing data mining problem for discovering unknown biological relations. Since there is usually no underlying theoretical model for biological datasets, this type of data mining is very challenging and worth studying.

With the rapid growth of textual information on the Web, text mining on Web documents is receiving much attention.
Retrieving interesting information from Web documents could have high commercial value. In academic research, Web text mining poses a challenge distinct from traditional information retrieval, because Web documents are by nature highly unstructured yet interconnected through hyperlinks. Designing new techniques for mining Web documents could therefore lead to valuable applications, and I would be happy to be involved in related research.

Over the past year, I have started to develop skills in the data mining subfields that interest me: my courses and my second-year initiation project focus on bioinformatics and data analysis, and I gained experience in Web text mining through my research internship in Taiwan. This year, I hope to continue my research in data mining and to find a related research group at Duke that can support me throughout my PhD study.