BIOS6005 Pharmaceutical Bioinformatics

The course provides a broad overview of and introduction to bioinformatics and its applications in the pharmaceutical industry. Topics cover (1) basic bioinformatics methods: hierarchical clustering, lasso, random forests, LDA, PCA, boosting, bootstrapping, etc.; (2) data sequencing and management: microarray data, GWAS data, raw data processing and analysis methods, normalization and batch-effect removal, and parallel programming in R; (3) phylogenetic analysis; and (4) chemobioinformatics modelling: 3D structure and chemical-protein relationships leading to drug discovery.

Topic	Contents/fundamental concepts
Basic bioinformatics methods	Hierarchical clustering; Penalized regression, lasso; Trees, random forests; Support vector machines; Factor analysis, LDA, PCA; Bootstrapping, cross-validation; Adaptive boosting
Data sequencing and management	Gene expression data sequencing, processing, and analysis; GWAS data sequencing, processing, and analysis; Data normalization and removal of batch effects; Parallel programming in R
Phylogenetic analysis	RNA and DNA sequence formats (FASTA, PHYLIP); Translation to protein sequence, alignment with CLUSTAL; Substitution matrix and distance calculation; Building phylogenetic trees with PHYLIP and Bioconductor; Maximum parsimony trees, Bayesian phylogenetic trees; Tree bootstrapping and significance
Chemobioinformatics modelling	3D structure representation; Chemical-protein relationships leading to drug discovery; Application of bioinformatics data in clinical trials