BIOS6005 Pharmaceutical Bioinformatics

The course will provide a broad overview of, and introduction to, bioinformatics and its applications in the pharmaceutical industry. Topics will cover (1) basic bioinformatics methods: hierarchical clustering, lasso, random forests, LDA, PCA, boosting, bootstrapping, etc.; (2) data sequencing and management: microarray data, GWAS data, raw data processing and analysis methods, batch effects and normalization, and parallel programming in R; (3) phylogenetic analysis; and (4) chemobioinformatics modelling: 3D structure and chemical–protein relationships leading to drug discovery.

Topics: contents and fundamental concepts
  1. Basic bioinformatics methods
  • Hierarchical clustering
  • Penalized regression, lasso
  • Trees, random forest
  • Support vector machines
  • Factor analysis, LDA, PCA
  • Bootstrapping, cross-validation
  • Adaptive boosting
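The sketch below illustrates the first method on the list: agglomerative hierarchical clustering in plain Python. Python is used here only for illustration. The sketch assumes Euclidean distance and average linkage, although single, complete and Ward linkage are equally common. It is a naive loop that recomputes every pairwise cluster distance at each merge. In practice you would call a library routine such as R's `hclust`. The function names are hypothetical.

```python
import math
from itertools import combinations


def average_linkage(c1, c2, points):
    """Mean Euclidean distance between every member of c1 and every member of c2."""
    total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Returns the final clusters (sorted lists of point indices) and the merge
    history as (cluster_a, cluster_b, linkage_distance) tuples.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")
    clusters = [[i] for i in range(len(points))]  # start with singletons
    history = []
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average-linkage distance.
        (a, b), d = min(
            (((i, j), average_linkage(clusters[i], clusters[j], points))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        history.append((clusters[a], clusters[b], d))
        merged = sorted(clusters[a] + clusters[b])
        # combinations() gives a < b, so delete b first to keep index a valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return sorted(clusters), history


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    clusters, history = agglomerative(pts, 2)
    print(clusters)
    for step in history:
        print(step)
```

Stopping at a different `n_clusters` is equivalent to cutting the dendrogram at a different height. The merge history records the heights at which each join happens.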


  2. Data sequencing and management
  • Gene expression data sequencing, processing, and analysis
  • GWAS data sequencing, processing, and analysis
  • Data normalization and batch effect removal
  • Parallel programming in R
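Two small Python functions below illustrate the normalization and batch-effect items. The first is quantile normalization, a standard normalization for microarray data. The second is per-gene batch mean-centering, a deliberately simplified stand-in for dedicated methods such as ComBat. The list-of-lists data layout and the function names are assumptions made for this sketch.

```python
def quantile_normalize(samples):
    """Quantile-normalize expression data.

    `samples` is a list of samples, each a list of values for the same genes
    in the same order. Every sample is mapped onto a shared reference
    distribution: the mean of the k-th smallest value across all samples.
    Ties keep gene order (stable sort) instead of being averaged.
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must measure the same genes")
    # For each sample, gene indices ordered from smallest to largest value.
    orders = [sorted(range(n_genes), key=s.__getitem__) for s in samples]
    # Reference distribution: one averaged value per rank.
    reference = [
        sum(s[order[k]] for s, order in zip(samples, orders)) / len(samples)
        for k in range(n_genes)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]  # give each gene its rank's reference value
        normalized.append(out)
    return normalized


def center_batches(values, batches):
    """Remove a simple additive batch effect from one gene.

    `values` holds the gene's measurements across samples and `batches` the
    matching batch label of each sample. Each batch is shifted so that its
    mean equals the overall mean.
    """
    overall = sum(values) / len(values)
    batch_means = {}
    for label in set(batches):
        members = [v for v, b in zip(values, batches) if b == label]
        batch_means[label] = sum(members) / len(members)
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]


if __name__ == "__main__":
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
    print(center_batches([1.0, 3.0, 11.0, 13.0], ["A", "A", "B", "B"]))
```

After quantile normalization, every sample has exactly the same sorted values. Mean-centering is only safe when batches are not confounded with the biological groups being compared. If they are confounded, it removes real signal along with the batch effect.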


  3. Phylogenetic analysis
  • RNA and DNA sequence formats: FASTA, PHYLIP
  • Translation to protein sequences; alignment with CLUSTAL
  • Substitution matrices and distance calculation
  • Building phylogenetic trees: PHYLIP, Bioconductor
  • Maximum parsimony trees, Bayesian phylogenetic trees
  • Tree bootstrapping and significance
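The sketch below connects the format and distance items. It parses FASTA text and then computes two distances between two already aligned nucleotide sequences: the raw p-distance and the Jukes-Cantor corrected distance. Jukes-Cantor is one simple substitution model, chosen here for illustration. Python, the function names, and the choice to skip gap columns are all assumptions.

```python
import math


def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first word after '>'; multi-line sequences are joined.
    """
    records = {}
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in sites) / len(sites)


def jukes_cantor(a, b):
    """Jukes-Cantor distance for nucleotides: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturation: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    fasta = "\n".join([">seq1 example", "ACGTACGTAC", ">seq2 example", "ACGTACGT", "TC"])
    seqs = parse_fasta(fasta)
    print(seqs)
    print(p_distance(seqs["seq1"], seqs["seq2"]))
    print(jukes_cantor(seqs["seq1"], seqs["seq2"]))
```

Filling a matrix with these pairwise distances gives the input for distance-based tree building, for example neighbor joining or UPGMA.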


  4. Chemobioinformatics modelling
  • 3D structure representation
  • Chemical–protein relationships leading to drug discovery
  • Application of bioinformatics data in clinical trials
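The sketch below illustrates 3D structure representation and one simple chemical–protein relationship. Atoms are stored with Cartesian coordinates in angstroms, and ligand–protein contacts are the atom pairs that fall within a distance cutoff. The `Atom` class, the 4.0 angstrom default cutoff, and the demo coordinates are all hypothetical. Real work would read coordinates from PDB or SDF files using a dedicated toolkit.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One atom with a label, element symbol and Cartesian coordinates (angstroms)."""
    name: str
    element: str
    x: float
    y: float
    z: float

    def coords(self):
        return (self.x, self.y, self.z)


def centroid(atoms):
    """Geometric center (unweighted mean position) of a set of atoms."""
    n = len(atoms)
    return tuple(sum(getattr(a, axis) for a in atoms) / n for axis in ("x", "y", "z"))


def contacts(ligand, protein, cutoff=4.0):
    """List (ligand atom, protein atom, distance) for pairs within `cutoff` angstroms."""
    found = []
    for la in ligand:
        for pa in protein:
            d = math.dist(la.coords(), pa.coords())
            if d <= cutoff:
                found.append((la.name, pa.name, round(d, 2)))
    return found


if __name__ == "__main__":
    # Hypothetical toy coordinates, not taken from a real structure.
    ligand = [Atom("C1", "C", 0.0, 0.0, 0.0), Atom("O1", "O", 1.2, 0.0, 0.0)]
    protein = [
        Atom("N", "N", 0.0, 3.0, 0.0),
        Atom("CA", "C", 0.0, 10.0, 0.0),
        Atom("OG", "O", 4.2, 0.0, 0.0),
    ]
    print(centroid(ligand))
    print(contacts(ligand, protein))
```

A plain distance cutoff ignores atom types and geometry. It is only a first step toward richer descriptors such as hydrogen-bond or docking scores.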