CAREER: Mining Genome-Wide Chemical Structure-Activity Relationships in Emergent Chemical Genomics Databases

NSF Award: 0845931
PI: Jun Huan
University of Kansas
Lawrence, KS, 66046


The objective of this research is to develop an integrated research and education program that advances the theoretical and computational principles underlying data mining in emergent chemical genomics databases. In particular, our group will focus on the following three core technical challenges:

  • Developing effective kernel-based and non-kernel-based representations of discrete structures that capture the intrinsic characteristics of the chemical space
  • Designing methods for similarity search in large chemical databases and methods for constructing predictive models that connect the chemical space to a biological space
  • Deriving application-oriented validations

A key feature of this work is the application of the theoretical and computational advancements to real-world problems, namely genome-wide protein-chemical interaction prediction, chemical toxicity prediction based on microarray gene expression profiles, and high-throughput drug screening. We are working closely with our collaborators in academia, industry, and government agencies to evaluate the insights discovered through data mining. Though the project focuses primarily on applications in the bioinformatics domain, the data mining knowledge gained is applicable to a wide range of problems where input data or output results have intrinsic structure; examples include social network analysis, stream data analysis, and wireless sensor network analysis.