In this exercise, we will use DNAse hypersensitivity data to explore range operations.
AnnotationHub
package to find the
goldenpath/hg19/encodeDCC/wgEncodeUwDnase/wgEncodeUwDnaseK562PkRep1.narrowPeak.gz
GRanges
object. Load that into R as the variable
dnase.library(AnnotationHub)
ah = AnnotationHub()
query(ah, "goldenpath/hg19/encodeDCC/wgEncodeUwDnase/wgEncodeUwDnaseK562PkRep1.narrowPeak.gz")
# the thing above should have only one record, so we can
# just grab it
dnase = query(ah, "goldenpath/hg19/encodeDCC/wgEncodeUwDnase/wgEncodeUwDnaseK562PkRep1.narrowPeak.gz")[[1]]
dnase
class(dnase)
mcols(dnase)
library(GenomicFeatures)
table(seqnames(dnase))
summary(width(dnase))
seqlevelsStyle
to adjust the sequence names.seqlevels(dnase)
seqlevelsStyle(dnase)
seqlevelsStyle(dnase) = 'ensembl'
seqlevelsStyle(dnase)
seqlevels(dnase)
sum(width(dnase))
sum(seqlengths(dnase))
sum(width(dnase))/sum(seqlengths(dnase))
In this exercise, we are going to load the second DNAse hypersensitivity replicate to investigate overlaps with the first replicate.
goldenpath/hg19/encodeDCC/wgEncodeUwDnase/wgEncodeUwDnaseK562PkRep1.narrowPeak.gz
.
Load that as dnase2
.query(ah, "goldenpath/hg19/encodeDCC/wgEncodeUwDnase/wgEncodeUwDnaseK562PkRep2.narrowPeak.gz")
# the thing above should have only one record, so we can
# just grab it
dnase2 = query(ah, "goldenpath/hg19/encodeDCC/wgEncodeUwDnase/wgEncodeUwDnaseK562PkRep2.narrowPeak.gz")[[1]]
dnase
and
dnase2
? Are there are similar number?length(dnase)
length(dnase2)
dnase2
?summary(width(dnase2))
dnase2
cover?sum(width(dnase))/sum(seqlengths(dnase))
dnase
that overlap with
dnase2
.sum(dnase %over% dnase2)
resize
(and pay attention to the
fix
argument) and width
to do this expansion
to dnase and call the new GRanges
object “dnase_wide”.w = width(dnase)
dnase_wide = resize(dnase, width=w+100, fix='center') #make a copy
width(dnase_wide)
In this exercise, we are going to look at the overlap of DNAse sites
relative to genes/transcripts. To get started, install and load the
TxDb.Hsapiens.UCSC.hg19.knownGene
txdb object.
BiocManager::install("TxDb.Hsapiens.UCSC.hg19.knownGene")
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
kg = TxDb.Hsapiens.UCSC.hg19.knownGene
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
kg = TxDb.Hsapiens.UCSC.hg19.knownGene
tx = transcripts(kg)
class(tx)
length(tx)
flank
method for GRanges objects. How
could you use that to get the “promoter” regions of the transcripts?
Let’s assume that the promoter region is 2kb upstream of the gene.flank(tx,2000)
?promoters
).proms = promoters(kg)
summary(countOverlaps(proms))
reduce
.prom_regions = reduce(proms)
summary(countOverlaps(prom_regions))
sum(dnase %over% prom_regions)
# if you notice no overlap, check the seqlevels
# and seqlevelsStyle
seqlevelsStyle(dnase) = "UCSC"
sum(dnase %over% prom_regions)
sum(dnase2 %over% prom_regions)
prop_proms = sum(width(prom_regions))/sum(seqlengths(prom_regions))
prop_dnase = sum(width(dnase))/sum(seqlengths(prom_regions))
# Iff the dnase and promoter regions are
# not related, then we would expect this number
# of DNAse overlaps with promoters.
prop_proms * prop_dnase * length(dnase)