In this exercise, we will use DNAse hypersensitivity data to explore range operations.
AnnotationHub package to find the
goldenpath/hg19/encodeDCC/wgEncodeUwDnase/wgEncodeUwDnaseK562PkRep1.narrowPeak.gz
GRanges object. Load that into R as the variable
dnase.library(AnnotationHub)
ah = AnnotationHub()
query(ah, "goldenpath/hg19/encodeDCC/wgEncodeUwDnase/wgEncodeUwDnaseK562PkRep1.narrowPeak.gz")
# the thing above should have only one record, so we can
# just grab it
dnase = query(ah, "goldenpath/hg19/encodeDCC/wgEncodeUwDnase/wgEncodeUwDnaseK562PkRep1.narrowPeak.gz")[[1]]
dnase
class(dnase)
mcols(dnase)
library(GenomicFeatures)
table(seqnames(dnase))
summary(width(dnase))
seqlevelsStyle to adjust the sequence names.seqlevels(dnase)
seqlevelsStyle(dnase)
seqlevelsStyle(dnase) = 'ensembl'
seqlevelsStyle(dnase)
seqlevels(dnase)
sum(width(dnase))
sum(seqlengths(dnase))
sum(width(dnase))/sum(seqlengths(dnase))
In this exercise, we are going to load the second DNAse hypersensitivity replicate to investigate overlaps with the first replicate.
goldenpath/hg19/encodeDCC/wgEncodeUwDnase/wgEncodeUwDnaseK562PkRep1.narrowPeak.gz.
Load that as dnase2.query(ah, "goldenpath/hg19/encodeDCC/wgEncodeUwDnase/wgEncodeUwDnaseK562PkRep2.narrowPeak.gz")
# the thing above should have only one record, so we can
# just grab it
dnase2 = query(ah, "goldenpath/hg19/encodeDCC/wgEncodeUwDnase/wgEncodeUwDnaseK562PkRep2.narrowPeak.gz")[[1]]
dnase and
dnase2? Are there are similar number?length(dnase)
length(dnase2)
dnase2?summary(width(dnase2))
dnase2 cover?sum(width(dnase))/sum(seqlengths(dnase))
dnase that overlap with
dnase2.sum(dnase %over% dnase2)
resize (and pay attention to the
fix argument) and width to do this expansion
to dnase and call the new GRanges object “dnase_wide”.w = width(dnase)
dnase_wide = resize(dnase, width=w+100, fix='center') #make a copy
width(dnase_wide)
In this exercise, we are going to look at the overlap of DNAse sites
relative to genes/transcripts. To get started, install and load the
TxDb.Hsapiens.UCSC.hg19.knownGene txdb object.
BiocManager::install("TxDb.Hsapiens.UCSC.hg19.knownGene")
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
kg = TxDb.Hsapiens.UCSC.hg19.knownGene
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
kg = TxDb.Hsapiens.UCSC.hg19.knownGene
tx = transcripts(kg)
class(tx)
length(tx)
flank method for GRanges objects. How
could you use that to get the “promoter” regions of the transcripts?
Let’s assume that the promoter region is 2kb upstream of the gene.flank(tx,2000)
?promoters).proms = promoters(kg)
summary(countOverlaps(proms))
reduce.prom_regions = reduce(proms)
summary(countOverlaps(prom_regions))
sum(dnase %over% prom_regions)
# if you notice no overlap, check the seqlevels
# and seqlevelsStyle
seqlevelsStyle(dnase) = "UCSC"
sum(dnase %over% prom_regions)
sum(dnase2 %over% prom_regions)
prop_proms = sum(width(prom_regions))/sum(seqlengths(prom_regions))
prop_dnase = sum(width(dnase))/sum(seqlengths(prom_regions))
# Iff the dnase and promoter regions are
# not related, then we would expect this number
# of DNAse overlaps with promoters.
prop_proms * prop_dnase * length(dnase)