The course provides a broad introduction to bioinformatics and its applications in the pharmaceutical industry. Topics cover (1) basic bioinformatics methods: hierarchical clustering, lasso, random forest, LDA, PCA, boosting, bootstrapping, etc.; (2) data sequencing and management: microarray data, GWAS data, raw data processing and analysis methods, batch effects and normalization, and parallel programming in R; (3) phylogenetic analysis; (4) chemobioinformatics modelling: 3D structure and chemical-protein relationships leading to drug discovery.
| Topic | Contents/fundamental concepts |
|---|---|
| Basic bioinformatics methods | Hierarchical clustering; penalized regression, lasso; trees, random forest; support vector machines; factor analysis, LDA, PCA; bootstrapping, cross-validation; adaptive boosting |
| Data sequencing and management | Gene expression data sequencing, processing, and analysis; GWAS data sequencing, processing, and analysis; data normalization and batch-effect removal; parallel programming in R |
| Phylogenetic analysis | RNA and DNA sequence formats, FASTA, PHYLIP; translation to protein sequence, CLUSTAL; substitution matrix calculation, distance; building phylogenetic trees, PHYLIP, Bioconductor; maximum parsimony tree, Bayesian phylogenetic tree; tree bootstrapping and significance |
| Chemobioinformatics modelling | 3D structure representation; chemical-protein relationships leading to drug discovery; application of bioinformatics data in a clinical trial |