This course provides a broad introduction to bioinformatics and its applications in the pharmaceutical industry. Topics cover (1) basic bioinformatics methods: hierarchical clustering, lasso, random forest, LDA, PCA, boosting, bootstrapping, etc.; (2) data sequencing and management: microarray data, GWAS data, raw data treatment and analysis methods, batch effects and normalization, and parallel programming in R; (3) phylogenetic analysis; and (4) chemobioinformatics modelling: 3D structure and chemical-protein relations leading to drug discovery.
| Topic | Contents/fundamental concepts |
| --- | --- |
| Basic bioinformatics methods | Hierarchical clustering; penalized regression, lasso; trees, random forest; support vector machines; factor analysis, LDA, PCA; bootstrapping, cross-validation; adaptive boosting |
| Data sequencing and management | Gene expression data sequencing, treatment, and analysis; GWAS data sequencing, treatment, and analysis; data normalization and removal of batch effects; parallel programming in R |
| Phylogenetic analysis | RNA and DNA sequence formatting, FASTA, PHYLIP; translation to protein sequence, CLUSTAL; substitution matrix calculation, distance; building phylogenetic trees, PHYLIP, Bioconductor; maximum parsimony tree, Bayesian phylogenetic tree; tree bootstrapping and significance |
| Chemobioinformatics modelling | 3D structure representation; chemical-protein relations leading to drug discovery; application of bioinformatics data in a clinical trial |