Research Title

Genetic Variation in miRNA Primary Transcripts

Research Objectives

Investigate genetic variation in miRNA primary transcripts across human populations, using data from Phase III of the 1000 Genomes Project.
Identify SNPs that are unique to certain populations and are therefore candidates for positive selection.
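As a rough illustration of the second objective (not the repository's R implementation), population-unique sites can be found by comparing site lists. The subsetting step under Preparing Data uses --min-ac=1, which keeps only sites where a population carries at least one non-reference allele. So a site present in one population's files and absent from all the others is private to that population within this dataset. The sketch below assumes one site list per population, with one CHROM, POS, REF, ALT line per site. Such lists could be produced with bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n'. The function name private_sites is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the sites in the
# first site list that appear in none of the other site lists.
# Each list holds one site per line, e.g. "1<TAB>10177<TAB>A<TAB>AC".
private_sites() {
  if [ "$#" -lt 2 ]; then
    echo "usage: private_sites TARGET_SITES OTHER_SITES..." >&2
    return 1
  fi
  local target="$1" others
  shift
  others=$(mktemp)
  # comm needs both inputs sorted with the same collation, hence LC_ALL=C.
  LC_ALL=C sort -u "$@" > "$others"
  # -23 drops lines found only in the other populations and lines shared by
  # both inputs, leaving the sites seen only in the target population.
  LC_ALL=C sort -u "$target" | LC_ALL=C comm -23 - "$others"
  rm -f "$others"
}
```

For example, private_sites ACB.sites ASW.sites YRI.sites prints the ACB sites missing from the ASW and YRI lists (file names are illustrative). A site reported this way is private only relative to the populations compared, not to all humans.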

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

The R code requires a specific directory structure. Create an R project (anywhere) from this repository and follow the structure below:

 ─── miRNA                   # The project directory.
     ├── R                   # The folder containing the .R files.
     ├── miRNA.gff3          # The file containing the miRNA genome coordinates.
     └── VCF                 # The folder containing the VCF files. All VCF files for a population must be
                               placed in a folder named after the population code.
           ├── ALL           # The folder containing the VCF data for all populations, downloaded from the
                               1000 Genomes Project.
           ├── ACB           # The folder containing the VCF files for the ACB population. Each population
                               folder must contain its VCF files split by chromosome (chr1 to chr22).
               ├── chr1.vcf.gz
               ├── chr2.vcf.gz
               ├── chr3.vcf.gz
               .
               .
               .
               └── chr22.vcf.gz
           ├── ASW
           ├── BEB
           .
           .
           .
           └── YRI
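Before running the analysis, one quick way to confirm that the layout matches is to check every population folder for the 22 chromosome files. The sketch below is one possible check, assuming it is run from the miRNA project directory. The function name check_vcf_layout is hypothetical. The ALL folder is skipped because it holds the original downloads rather than a population subset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report population folders under a VCF directory
# (default: VCF) that lack any of chr1.vcf.gz ... chr22.vcf.gz.
# Prints one line per missing file plus a total, and returns non-zero
# if anything is missing.
check_vcf_layout() {
  local vcf_dir="${1:-VCF}"
  local missing=0 pop_dir pop chr
  for pop_dir in "$vcf_dir"/*/; do
    [ -d "$pop_dir" ] || continue      # the glob matched nothing
    pop=$(basename "$pop_dir")
    [ "$pop" = "ALL" ] && continue     # original downloads, not a population subset
    for chr in $(seq 1 22); do
      if [ ! -f "${pop_dir}chr${chr}.vcf.gz" ]; then
        echo "$pop: missing chr${chr}.vcf.gz"
        missing=$((missing + 1))
      fi
    done
  done
  echo "missing files: $missing"
  [ "$missing" -eq 0 ]
}
```

Running check_vcf_layout VCF from the project directory lists any gaps. A zero exit status means every population folder has all 22 files.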

Preparing Data

Obtain the VCF folder, with the correct directory structure, from this repository. Each population folder in it contains a samples.txt file listing the sample names for that population.
Download the original VCF files for chromosomes 1-22 from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/, and store them in a folder named ALL under the VCF folder.
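Run the following commands in Bash from the VCF directory. They subset the original VCF files to the samples of one population. -S reads the sample names from samples.txt. --force-samples only warns about listed samples that are missing from a file. --min-ac=1 keeps only sites where at least one non-reference allele is observed in that population. Each output file keeps the name of its input file, so the files in ALL should be named chr1.vcf.gz to chr22.vcf.gz to match the layout above. Run the commands once for each population, setting pop to the population code (e.g. 'ACB').
```
   pop=ACB   # population code, e.g. 'ACB'
   for file in ALL/*.vcf.gz; do echo "Subsetting $(basename "$file")"; bcftools view --min-ac=1 --force-samples -Oz -S "$pop/samples.txt" "$file" > "$pop/$(basename "$file")"; done
```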
Download miRNA.gff3.zip from this repository and unzip it under the miRNA directory.
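Running the subsetting command by hand for each population is repetitive. The sketch below first renames the downloaded autosome files to chrN.vcf.gz so that the subset files match the layout under Prerequisites. It then runs the same bcftools call in a loop over every population folder that contains a samples.txt file. Both helpers are meant to be run from the VCF directory, and the function names are hypothetical. The rename step assumes the downloads start with the prefix ALL.chrN. (check your download). Only chromosomes 1-22 are renamed and subset, so any other downloaded files (for example chrX) are left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "Preparing Data" steps; run them from the VCF directory.

# Rename ALL/ALL.chrN.<...>.vcf.gz to ALL/chrN.vcf.gz for N = 1..22.
# Assumes the downloaded files start with "ALL.chrN." (an assumption).
rename_autosome_downloads() {
  local chr src
  for chr in $(seq 1 22); do
    for src in ALL/ALL.chr"$chr".*.vcf.gz; do
      [ -f "$src" ] || continue          # no file matched this chromosome
      mv "$src" "ALL/chr$chr.vcf.gz"
    done
  done
}

# Subset ALL/chr1..chr22.vcf.gz to every population folder that has a
# samples.txt, with the same bcftools options as the single-population command.
subset_all_populations() {
  local samples pop chr
  for samples in */samples.txt; do
    [ -f "$samples" ] || continue        # no population folders found
    pop=$(dirname "$samples")
    for chr in $(seq 1 22); do
      [ -f "ALL/chr$chr.vcf.gz" ] || continue
      echo "Subsetting chr$chr.vcf.gz for $pop"
      bcftools view --min-ac=1 --force-samples -Oz -S "$samples" \
        "ALL/chr$chr.vcf.gz" > "$pop/chr$chr.vcf.gz"
    done
  done
}
```

Call rename_autosome_downloads once after downloading, then subset_all_populations. The ALL folder has no samples.txt, so the loop does not treat it as a population.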

You are now ready to run the analysis using main.R.

Running the analysis

Simply follow main.R to run the analysis.

bagherig/miRNA
