Introduction

Finding and using public genomics data such as browser or chip-seq tracks; annotation for genes, exons, transcripts; gene ontology and functional gene information; etc. often requires quite a bit of work. Bioconductor has done some of this work already by

  1. Finding and curating popular genomic resources
  2. Using “recipes” to create R object versions of these resources
  3. Make those resources available as a web service that is accessible from R

The AnnotationHub server provides easy R / Bioconductor access to large collections of publicly available whole genome resources, e.g,. ENSEMBL genome fasta or gtf files, UCSC chain resources, ENCODE data tracks at UCSC, etc.

To get started, make sure that you have the AnnotationHub package installed:

BiocManager::install('AnnotationHub')

AnnotationHub objects

The AnnotationHub package provides a client interface to resources stored at the AnnotationHub web service.

library(AnnotationHub)

The AnnotationHub package is straightforward to use. Create an AnnotationHub object

ah = AnnotationHub()
## snapshotDate(): 2022-04-21

Now at this point you have already done everything you need in order to start retrieving annotations. For most operations, using the AnnotationHub object should feel a lot like working with a familiar list or data.frame.

Lets take a minute to look at the show method for the hub object ah

ah
## AnnotationHub with 64948 records
## # snapshotDate(): 2022-04-21
## # $dataprovider: Ensembl, BroadInstitute, UCSC, ftp://ftp.ncbi.nlm.nih.gov/g...
## # $species: Homo sapiens, Mus musculus, Drosophila melanogaster, Bos taurus,...
## # $rdataclass: GRanges, TwoBitFile, BigWigFile, EnsDb, Rle, OrgDb, ChainFile...
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["AH5012"]]' 
## 
##              title                                                         
##   AH5012   | Chromosome Band                                               
##   AH5013   | STS Markers                                                   
##   AH5014   | FISH Clones                                                   
##   AH5015   | Recomb Rate                                                   
##   AH5016   | ENCODE Pilot                                                  
##   ...        ...                                                           
##   AH104604 | Zonotrichia_albicollis.Zonotrichia_albicollis-1.0.1.ncrna.2bit
##   AH104605 | Zosterops_lateralis_melanops.ASM128173v1.cdna.all.2bit        
##   AH104606 | Zosterops_lateralis_melanops.ASM128173v1.dna_rm.toplevel.2bit 
##   AH104607 | Zosterops_lateralis_melanops.ASM128173v1.dna_sm.toplevel.2bit 
##   AH104608 | Zosterops_lateralis_melanops.ASM128173v1.ncrna.2bit

You can see that it gives you an idea about the different types of data that are present inside the hub. You can see where the data is coming from (dataprovider), as well as what species have samples present (species), what kinds of R data objects could be returned (rdataclass). We can take a closer look at all the kinds of data providers that are available by simply looking at the contents of dataprovider as if it were the column of a data.frame object like this:

unique(ah$dataprovider)
##  [1] "UCSC"                                                                                                      
##  [2] "Ensembl"                                                                                                   
##  [3] "RefNet"                                                                                                    
##  [4] "Inparanoid8"                                                                                               
##  [5] "NHLBI"                                                                                                     
##  [6] "ChEA"                                                                                                      
##  [7] "Pazar"                                                                                                     
##  [8] "NIH Pathway Interaction Database"                                                                          
##  [9] "Haemcode"                                                                                                  
## [10] "BroadInstitute"                                                                                            
## [11] "PRIDE"                                                                                                     
## [12] "Gencode"                                                                                                   
## [13] "CRIBI"                                                                                                     
## [14] "Genoscope"                                                                                                 
## [15] "MISO, VAST-TOOLS, UCSC"                                                                                    
## [16] "UWashington"                                                                                               
## [17] "Stanford"                                                                                                  
## [18] "dbSNP"                                                                                                     
## [19] "BioMart"                                                                                                   
## [20] "GeneOntology"                                                                                              
## [21] "KEGG"                                                                                                      
## [22] "URGI"                                                                                                      
## [23] "EMBL-EBI"                                                                                                  
## [24] "MicrosporidiaDB"                                                                                           
## [25] "FungiDB"                                                                                                   
## [26] "TriTrypDB"                                                                                                 
## [27] "ToxoDB"                                                                                                    
## [28] "AmoebaDB"                                                                                                  
## [29] "PlasmoDB"                                                                                                  
## [30] "PiroplasmaDB"                                                                                              
## [31] "CryptoDB"                                                                                                  
## [32] "TrichDB"                                                                                                   
## [33] "GiardiaDB"                                                                                                 
## [34] "The Gene Ontology Consortium"                                                                              
## [35] "ENCODE Project"                                                                                            
## [36] "SchistoDB"                                                                                                 
## [37] "NCBI/UniProt"                                                                                              
## [38] "GENCODE"                                                                                                   
## [39] "http://www.pantherdb.org"                                                                                  
## [40] "RMBase v2.0"                                                                                               
## [41] "snoRNAdb"                                                                                                  
## [42] "tRNAdb"                                                                                                    
## [43] "NCBI"                                                                                                      
## [44] "DrugAge, DrugBank, Broad Institute"                                                                        
## [45] "DrugAge"                                                                                                   
## [46] "DrugBank"                                                                                                  
## [47] "Broad Institute"                                                                                           
## [48] "HMDB, EMBL-EBI, EPA"                                                                                       
## [49] "STRING"                                                                                                    
## [50] "OMA"                                                                                                       
## [51] "OrthoDB"                                                                                                   
## [52] "PathBank"                                                                                                  
## [53] "EBI/EMBL"                                                                                                  
## [54] "NCBI,DBCLS"                                                                                                
## [55] "FANTOM5,DLRP,IUPHAR,HPRD,STRING,SWISSPROT,TREMBL,ENSEMBL,CELLPHONEDB,BADERLAB,SINGLECELLSIGNALR,HOMOLOGENE"
## [56] "WikiPathways"                                                                                              
## [57] "UCSC Jaspar"                                                                                               
## [58] "VAST-TOOLS"                                                                                                
## [59] "pyGenomeTracks "                                                                                           
## [60] "NA"                                                                                                        
## [61] "UoE"                                                                                                       
## [62] "mitra.stanford.edu/kundaje/akundaje/release/blacklists/"                                                   
## [63] "ENCODE"                                                                                                    
## [64] "TargetScan,miRTarBase,USCS,ENSEMBL"                                                                        
## [65] "TargetScan"                                                                                                
## [66] "QuickGO"                                                                                                   
## [67] "ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/"

In the same way, you can also see data from different species inside the hub by looking at the contents of species like this:

head(unique(ah$species))
## [1] "Homo sapiens"         "Vicugna pacos"        "Dasypus novemcinctus"
## [4] "Otolemur garnettii"   "Papio hamadryas"      "Papio anubis"

And this will also work for any of the other types of metadata present. You can learn which kinds of metadata are available by simply hitting the tab key after you type ‘ah$’. In this way you can explore for yourself what kinds of data are present in the hub right from the command line. This interface also allows you to access the hub programatically to extract data that matches a particular set of criteria.

Another valuable types of metadata to pay attention to is the rdataclass.

head(unique(ah$rdataclass))
## [1] "GRanges"          "data.frame"       "Inparanoid8Db"    "TwoBitFile"      
## [5] "ChainFile"        "SQLiteConnection"

The rdataclass allows you to see which kinds of R objects the hub will return to you. This kind of information is valuable both as a means to filter results and also as a means to explore and learn about some of the kinds of annotation objects that are widely available for the project. Right now this is a pretty short list, but over time it should grow as we support more of the different kinds of annotation objects via the hub.

Now lets try getting the Chain Files from UCSC using the query and subset methods to selectively pare down the hub based on specific criteria. The query method lets you search rows for specific strings, returning an AnnotationHub instance with just the rows matching the query.

From the show method, one can easily see that one of the dataprovider is UCSC and there is a rdataclass for ChainFile

One can get chain files for Drosophila melanogaster from UCSC with:

dm <- query(ah, c("ChainFile", "UCSC", "Drosophila melanogaster"))
dm
## AnnotationHub with 45 records
## # snapshotDate(): 2022-04-21
## # $dataprovider: UCSC
## # $species: Drosophila melanogaster
## # $rdataclass: ChainFile
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["AH15102"]]' 
## 
##             title                     
##   AH15102 | dm3ToAnoGam1.over.chain.gz
##   AH15103 | dm3ToApiMel3.over.chain.gz
##   AH15104 | dm3ToDm2.over.chain.gz    
##   AH15105 | dm3ToDm6.over.chain.gz    
##   AH15106 | dm3ToDp3.over.chain.gz    
##   ...       ...                       
##   AH15142 | dm2ToDroVir3.over.chain.gz
##   AH15143 | dm2ToDroWil1.over.chain.gz
##   AH15144 | dm2ToDroYak1.over.chain.gz
##   AH15145 | dm2ToDroYak2.over.chain.gz
##   AH15146 | dm1ToDm2.over.chain.gz

Query has worked and you can now see that the only species present is Drosophila melanogaster.

The metadata underlying this hub object can be retrieved by you

df <- mcols(dm)
# what is df?
class(df)
## [1] "DFrame"
## attr(,"package")
## [1] "S4Vectors"
head(df[,1:5])
## DataFrame with 6 rows and 5 columns
##                          title dataprovider                species taxonomyid
##                    <character>  <character>            <character>  <integer>
## AH15102 dm3ToAnoGam1.over.ch..         UCSC Drosophila melanogas..       7227
## AH15103 dm3ToApiMel3.over.ch..         UCSC Drosophila melanogas..       7227
## AH15104 dm3ToDm2.over.chain.gz         UCSC Drosophila melanogas..       7227
## AH15105 dm3ToDm6.over.chain.gz         UCSC Drosophila melanogas..       7227
## AH15106 dm3ToDp3.over.chain.gz         UCSC Drosophila melanogas..       7227
## AH15107 dm3ToDp4.over.chain.gz         UCSC Drosophila melanogas..       7227
##              genome
##         <character>
## AH15102         dm3
## AH15103         dm3
## AH15104         dm3
## AH15105         dm3
## AH15106         dm3
## AH15107         dm3

By default the show method will only display the first 5 and last 5 rows. There are already thousands of records present in the hub.

length(ah)
## [1] 64948

Lets look at another example, where we pull down only Inparanoid8 data from the hub and use subset to return a smaller base object (here we are finding cases where the genome column is set to panda).

ahs <- query(ah, c('inparanoid8', 'ailuropoda'))
ahs
## AnnotationHub with 1 record
## # snapshotDate(): 2022-04-21
## # names(): AH10451
## # $dataprovider: Inparanoid8
## # $species: Ailuropoda melanoleuca
## # $rdataclass: Inparanoid8Db
## # $rdatadateadded: 2014-03-31
## # $title: hom.Ailuropoda_melanoleuca.inp8.sqlite
## # $description: Inparanoid 8 annotations about Ailuropoda melanoleuca
## # $taxonomyid: 9646
## # $genome: inparanoid8 genomes
## # $sourcetype: Inparanoid
## # $sourceurl: http://inparanoid.sbc.su.se/download/current/Orthologs/A.melan...
## # $sourcesize: NA
## # $tags: c("Inparanoid", "Gene", "Homology", "Annotation") 
## # retrieve record with 'object[["AH10451"]]'

We can also look at the AnnotationHub object in a browser using the display() function. We can then filter the AnnotationHub object for _chainFile__ by either using the Global search field on the top right corner of the page or the in-column search field for `rdataclass’.

d <- display(ah)

Displaying and filtering the Annotation Hub object in a browser

By default 1000 entries are displayed per page, we can change this using the filter on the top of the page or navigate through different pages using the page scrolling feature at the bottom of the page.

We can also select the rows of interest to us and send them back to the R session using ‘Return rows to R session’ button ; this sets a filter internally which filters the AnnotationHub object. The names of the selected AnnotationHub elements displayed at the top of the page.

Using AnnotationHub to retrieve data

Looking back at our chain file example, if we are interested in the file dm1ToDm2.over.chain.gz, we can gets its metadata using

dm
## AnnotationHub with 45 records
## # snapshotDate(): 2022-04-21
## # $dataprovider: UCSC
## # $species: Drosophila melanogaster
## # $rdataclass: ChainFile
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["AH15102"]]' 
## 
##             title                     
##   AH15102 | dm3ToAnoGam1.over.chain.gz
##   AH15103 | dm3ToApiMel3.over.chain.gz
##   AH15104 | dm3ToDm2.over.chain.gz    
##   AH15105 | dm3ToDm6.over.chain.gz    
##   AH15106 | dm3ToDp3.over.chain.gz    
##   ...       ...                       
##   AH15142 | dm2ToDroVir3.over.chain.gz
##   AH15143 | dm2ToDroWil1.over.chain.gz
##   AH15144 | dm2ToDroYak1.over.chain.gz
##   AH15145 | dm2ToDroYak2.over.chain.gz
##   AH15146 | dm1ToDm2.over.chain.gz
dm["AH15146"]
## AnnotationHub with 1 record
## # snapshotDate(): 2022-04-21
## # names(): AH15146
## # $dataprovider: UCSC
## # $species: Drosophila melanogaster
## # $rdataclass: ChainFile
## # $rdatadateadded: 2014-12-15
## # $title: dm1ToDm2.over.chain.gz
## # $description: UCSC liftOver chain file from dm1 to dm2
## # $taxonomyid: 7227
## # $genome: dm1
## # $sourcetype: Chain
## # $sourceurl: http://hgdownload.cse.ucsc.edu/goldenpath/dm1/liftOver/dm1ToDm...
## # $sourcesize: NA
## # $tags: c("liftOver", "chain", "UCSC", "genome", "homology") 
## # retrieve record with 'object[["AH15146"]]'

We can download the file using

dm[["AH15146"]]
## loading from cache
## require("rtracklayer")
## Chain of length 11
## names(11): chr2L chr2R chr3L chr3R chr4 chrX chrU chr2h chr3h chrXh chrYh

Each file is retrieved from the AnnotationHub server and the file is also cache locally, so that the next time you need to retrieve it, it should download much more quickly.

Accessing Genome-Scale Data

Non-model organism gene annotations

Bioconductor offers pre-built org.* annotation packages for model organisms, with their use described in the OrgDb section of the Annotation work flow. Here we discover available OrgDb objects for less-model organisms.

The query() interface allows us to do full-text searching of AnnotationHub objects. How many OrgDb packages are represented in AnnotationHub?

library(AnnotationHub)
## create the annotationhub object
ah <- AnnotationHub()
## snapshotDate(): 2022-04-21
## Query the annotationhub metadata
query(ah, "OrgDb")
## AnnotationHub with 1830 records
## # snapshotDate(): 2022-04-21
## # $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
## # $species: Escherichia coli, greater Indian_fruit_bat, Zootoca vivipara, Zo...
## # $rdataclass: OrgDb
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["AH100399"]]' 
## 
##              title                                    
##   AH100399 | org.Ag.eg.db.sqlite                      
##   AH100400 | org.At.tair.db.sqlite                    
##   AH100401 | org.Bt.eg.db.sqlite                      
##   AH100402 | org.Cf.eg.db.sqlite                      
##   AH100403 | org.Gg.eg.db.sqlite                      
##   ...        ...                                      
##   AH102596 | org.Lobosporangium_transversale.eg.sqlite
##   AH102597 | org.Sulfolobus_acidocaldarius.eg.sqlite  
##   AH102598 | org.Penicillium_rugulosum.eg.sqlite      
##   AH102599 | org.Talaromyces_rugulosus.eg.sqlite      
##   AH102600 | org.Metallosphaera_sedula.eg.sqlite

Let’s assume that we are working with yeast (Saccharomyces cerevisiae) and get the OrgDb package.

sub_ah = query(ah, c("OrgDb", "cerevisiae"))
sub_ah
## AnnotationHub with 1 record
## # snapshotDate(): 2022-04-21
## # names(): AH100415
## # $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
## # $species: Saccharomyces cerevisiae
## # $rdataclass: OrgDb
## # $rdatadateadded: 2022-04-18
## # $title: org.Sc.sgd.db.sqlite
## # $description: NCBI gene ID based annotations about Saccharomyces cerevisiae
## # $taxonomyid: 559292
## # $genome: NCBI genomes
## # $sourcetype: NCBI/ensembl
## # $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/p...
## # $sourcesize: NA
## # $tags: c("NCBI", "Gene", "Annotation") 
## # retrieve record with 'object[["AH100415"]]'
orgdb <- query(sub_ah, "OrgDb")[[1]]
## loading from cache
## Loading required package: AnnotationDbi
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Attaching package: 'Biobase'
## The following object is masked from 'package:AnnotationHub':
## 
##     cache

Look at the orgdb object.

orgdb
## OrgDb object:
## | DBSCHEMAVERSION: 2.1
## | Db type: OrgDb
## | Supporting package: AnnotationDbi
## | DBSCHEMA: YEAST_DB
## | ORGANISM: Saccharomyces cerevisiae
## | SPECIES: Yeast
## | YGSOURCENAME: Yeast Genome
## | YGSOURCEURL: http://sgd-archive.yeastgenome.org
## | YGSOURCEDATE: 2019-Oct25
## | CENTRALID: ORF
## | TAXID: 559292
## | KEGGSOURCENAME: KEGG GENOME
## | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
## | KEGGSOURCEDATE: 2011-Mar15
## | GOSOURCENAME: Gene Ontology
## | GOSOURCEURL: http://current.geneontology.org/ontology/go-basic.obo
## | GOSOURCEDATE: 2022-03-10
## | EGSOURCEDATE: 2022-Mar17
## | EGSOURCENAME: Entrez Gene
## | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | ENSOURCEDATE: 2021-Dec21
## | ENSOURCENAME: Ensembl
## | ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
## | UPSOURCENAME: Uniprot
## | UPSOURCEURL: http://www.UniProt.org/
## | UPSOURCEDATE: Fri Apr  1 15:07:54 2022
## 
## Please see: help('select') for usage information

The orgdb object works like a little database. We can look at the columns in the database.

columns(orgdb)
##  [1] "ALIAS"        "COMMON"       "DESCRIPTION"  "ENSEMBL"      "ENSEMBLPROT" 
##  [6] "ENSEMBLTRANS" "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL" 
## [11] "GENENAME"     "GO"           "GOALL"        "INTERPRO"     "ONTOLOGY"    
## [16] "ONTOLOGYALL"  "ORF"          "PATH"         "PFAM"         "PMID"        
## [21] "REFSEQ"       "SGD"          "SMART"        "UNIPROT"

Not all columns can always be used for “lookup”, though. We need to know which columns represent keys that we can use to retrieve data.

keytypes(orgdb)
##  [1] "ALIAS"        "COMMON"       "DESCRIPTION"  "ENSEMBL"      "ENSEMBLPROT" 
##  [6] "ENSEMBLTRANS" "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL" 
## [11] "GENENAME"     "GO"           "GOALL"        "INTERPRO"     "ONTOLOGY"    
## [16] "ONTOLOGYALL"  "ORF"          "PATH"         "PFAM"         "PMID"        
## [21] "REFSEQ"       "SGD"          "SMART"        "UNIPROT"

You can examine the values available for lookup by using the keys() method.

head(keys(orgdb, keytype="ORF"))
## [1] "AIP5"    "ARS1001" "ARS1002" "ARS1003" "ARS1004" "ARS1005"
head(keys(orgdb, keytype="GO"))
## [1] "GO:0003674" "GO:0005199" "GO:0005575" "GO:0009277" "GO:0030437"
## [6] "GO:0031505"

Notice that there are two columns that look useful for the derisi data–ORF and GENENAME. Let’s assume that we had not been given the gene name as part of the derisi data, but only the ORF. We can use the OrgDb select() interface to retrieve the gene names for these five ORF ids, just to get a sense of how select() works to get information from an OrgDb object.

orfids = c("YAL001C", "YAL002W", "YAL003W", "YAL004W", "YAL005C")
select(orgdb, keys = orfids, columns = "GENENAME", keytype="ORF")
## 'select()' returned 1:1 mapping between keys and columns
##       ORF        SGD GENENAME
## 1 YAL001C S000000001     TFC3
## 2 YAL002W S000000002     VPS8
## 3 YAL003W S000000003     EFB1
## 4 YAL004W S000002136     <NA>
## 5 YAL005C S000000004     SSA1
select(orgdb, keys = orfids, columns = "GO", keytype="ORF")
## 'select()' returned 1:many mapping between keys and columns
##        ORF        SGD         GO EVIDENCE ONTOLOGY
## 1  YAL001C S000000001 GO:0000127      IBA       CC
## 2  YAL001C S000000001 GO:0000127      IDA       CC
## 3  YAL001C S000000001 GO:0001002      IDA       MF
## 4  YAL001C S000000001 GO:0001003      IDA       MF
## 5  YAL001C S000000001 GO:0003677      IEA       MF
## 6  YAL001C S000000001 GO:0005634      IEA       CC
## 7  YAL001C S000000001 GO:0005739      HDA       CC
## 8  YAL001C S000000001 GO:0005739      IEA       CC
## 9  YAL001C S000000001 GO:0006383      IDA       BP
## 10 YAL001C S000000001 GO:0006384      IBA       BP
## 11 YAL001C S000000001 GO:0008301      IDA       MF
## 12 YAL001C S000000001 GO:0042791      IBA       BP
## 13 YAL001C S000000001 GO:0042791      IDA       BP
## 14 YAL001C S000000001 GO:0042791      IMP       BP
## 15 YAL001C S000000001 GO:0071168      IMP       BP
## 16 YAL002W S000000002 GO:0005770      IBA       CC
## 17 YAL002W S000000002 GO:0005770      IDA       CC
## 18 YAL002W S000000002 GO:0005794      IEA       CC
## 19 YAL002W S000000002 GO:0005795      IEA       CC
## 20 YAL002W S000000002 GO:0006623      IBA       BP
## 21 YAL002W S000000002 GO:0006623      IMP       BP
## 22 YAL002W S000000002 GO:0006886      IEA       BP
## 23 YAL002W S000000002 GO:0015031      IEA       BP
## 24 YAL002W S000000002 GO:0016020      IDA       CC
## 25 YAL002W S000000002 GO:0016192      IEA       BP
## 26 YAL002W S000000002 GO:0030897      IBA       CC
## 27 YAL002W S000000002 GO:0032511      IMP       BP
## 28 YAL002W S000000002 GO:0033263      IDA       CC
## 29 YAL002W S000000002 GO:0034058      IBA       BP
## 30 YAL002W S000000002 GO:0043495      IMP       MF
## 31 YAL002W S000000002 GO:0046872      IEA       MF
## 32 YAL002W S000000002 GO:0051020      IDA       MF
## 33 YAL002W S000000002 GO:0051020      IPI       MF
## 34 YAL003W S000000003 GO:0003746      IEA       MF
## 35 YAL003W S000000003 GO:0005085      IBA       MF
## 36 YAL003W S000000003 GO:0005085      IDA       MF
## 37 YAL003W S000000003 GO:0005829      IBA       CC
## 38 YAL003W S000000003 GO:0005853      IEA       CC
## 39 YAL003W S000000003 GO:0005853      IMP       CC
## 40 YAL003W S000000003 GO:0006412      IEA       BP
## 41 YAL003W S000000003 GO:0006414      IBA       BP
## 42 YAL003W S000000003 GO:0006414      IEA       BP
## 43 YAL003W S000000003 GO:0006414      IMP       BP
## 44 YAL003W S000000003 GO:0006449      IGI       BP
## 45 YAL003W S000000003 GO:0032232      IDA       BP
## 46 YAL003W S000000003 GO:1990145      IMP       BP
## 47 YAL004W S000002136       <NA>     <NA>     <NA>
## 48 YAL005C S000000004 GO:0000049      IDA       MF
## 49 YAL005C S000000004 GO:0000166      IEA       MF
## 50 YAL005C S000000004 GO:0000209      IDA       BP
## 51 YAL005C S000000004 GO:0000329      IDA       CC
## 52 YAL005C S000000004 GO:0002181      IMP       BP
## 53 YAL005C S000000004 GO:0005524      IBA       MF
## 54 YAL005C S000000004 GO:0005524      IEA       MF
## 55 YAL005C S000000004 GO:0005576      IEA       CC
## 56 YAL005C S000000004 GO:0005618      IEA       CC
## 57 YAL005C S000000004 GO:0005634      HDA       CC
## 58 YAL005C S000000004 GO:0005634      IBA       CC
## 59 YAL005C S000000004 GO:0005634      IDA       CC
## 60 YAL005C S000000004 GO:0005737      HDA       CC
## 61 YAL005C S000000004 GO:0005737      IBA       CC
## 62 YAL005C S000000004 GO:0005737      IDA       CC
## 63 YAL005C S000000004 GO:0005737      IEA       CC
## 64 YAL005C S000000004 GO:0005829      HDA       CC
## 65 YAL005C S000000004 GO:0005829      IBA       CC
## 66 YAL005C S000000004 GO:0005844      IDA       CC
## 67 YAL005C S000000004 GO:0005886      HDA       CC
## 68 YAL005C S000000004 GO:0005886      IBA       CC
## 69 YAL005C S000000004 GO:0006457      IDA       BP
## 70 YAL005C S000000004 GO:0006606      IDA       BP
## 71 YAL005C S000000004 GO:0006606      IGI       BP
## 72 YAL005C S000000004 GO:0006616      IBA       BP
## 73 YAL005C S000000004 GO:0006616      IDA       BP
## 74 YAL005C S000000004 GO:0006626      IMP       BP
## 75 YAL005C S000000004 GO:0006986      IBA       BP
## 76 YAL005C S000000004 GO:0009277      IDA       CC
## 77 YAL005C S000000004 GO:0016192      IBA       BP
## 78 YAL005C S000000004 GO:0016887      IBA       MF
## 79 YAL005C S000000004 GO:0016887      IDA       MF
## 80 YAL005C S000000004 GO:0031072      IBA       MF
## 81 YAL005C S000000004 GO:0034620      IBA       BP
## 82 YAL005C S000000004 GO:0035617      IDA       BP
## 83 YAL005C S000000004 GO:0042026      IBA       BP
## 84 YAL005C S000000004 GO:0042026      IDA       BP
## 85 YAL005C S000000004 GO:0043161      IGI       BP
## 86 YAL005C S000000004 GO:0043161      IMP       BP
## 87 YAL005C S000000004 GO:0044183      IBA       MF
## 88 YAL005C S000000004 GO:0051082      IBA       MF
## 89 YAL005C S000000004 GO:0051082      IDA       MF
## 90 YAL005C S000000004 GO:0051085      IBA       BP
## 91 YAL005C S000000004 GO:0051787      IBA       MF
## 92 YAL005C S000000004 GO:0071470      IDA       BP
## 93 YAL005C S000000004 GO:0072318      IDA       BP
## 94 YAL005C S000000004 GO:0072671      IMP       BP
## 95 YAL005C S000000004 GO:0090344      IMP       BP

Exercise: Use the derisi ORF column to look up the associated gene names.

Exercise: Use the derisi ORF column to look up the gene ontology GO annotations for the ORFs. What is the general relationship between GO terms and ORFs?

Bonus Exercise: Read the documentation associated with the GO.db package here. Use the GO.db package and the select interface to look up the descriptions and terms for the gene ontology terms from the previous exercise.

Roadmap Epigenomics Project

All Roadmap Epigenomics files are hosted here. If one had to download these files on their own, one would navigate through the web interface to find useful files, then use something like the following R code.

url <- "http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/broadPeak/E001-H3K4me1.broadPeak.gz"
filename <-  basename(url)
download.file(url, destfile=filename)
if (file.exists(filename))
   data <- import(filename, format="bed")

This would have to be repeated for all files, and the onus would lie on the user to identify, download, import, and manage the local disk location of these files.

AnnotationHub reduces this task to just a few lines of R code

library(AnnotationHub)
ah = AnnotationHub()
## snapshotDate(): 2022-04-21
epiFiles <- query(ah, "EpigenomeRoadMap")

A look at the value returned by epiFiles shows us that 18248 roadmap resources are available via AnnotationHub. Additional information about the files is also available, e.g., where the files came from (dataprovider), genome, species, sourceurl, sourcetypes.

epiFiles
## AnnotationHub with 18248 records
## # snapshotDate(): 2022-04-21
## # $dataprovider: BroadInstitute
## # $species: Homo sapiens
## # $rdataclass: BigWigFile, GRanges, data.frame
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["AH28856"]]' 
## 
##             title                                 
##   AH28856 | E001-H3K4me1.broadPeak.gz             
##   AH28857 | E001-H3K4me3.broadPeak.gz             
##   AH28858 | E001-H3K9ac.broadPeak.gz              
##   AH28859 | E001-H3K9me3.broadPeak.gz             
##   AH28860 | E001-H3K27me3.broadPeak.gz            
##   ...       ...                                   
##   AH49540 | E058_mCRF_FractionalMethylation.bigwig
##   AH49541 | E059_mCRF_FractionalMethylation.bigwig
##   AH49542 | E061_mCRF_FractionalMethylation.bigwig
##   AH49543 | E081_mCRF_FractionalMethylation.bigwig
##   AH49544 | E082_mCRF_FractionalMethylation.bigwig

A good sanity check to ensure that we have files only from the Roadmap Epigenomics project is to check that all the files in the returned smaller hub object come from Homo sapiens and the hg19 genome

unique(epiFiles$species)
## [1] "Homo sapiens"
unique(epiFiles$genome)
## [1] "hg19"

Broadly, one can get an idea of the different files from this project looking at the sourcetype

table(epiFiles$sourcetype)
## 
##    BED BigWig    GTF    tab    Zip 
##   8298   9932      3      1     14

To get a more descriptive idea of these different files one can use:

sort(table(epiFiles$description), decreasing=TRUE)
## 
##                       Bigwig File containing -log10(p-value) signal tracks from EpigenomeRoadMap Project 
##                                                                                                     6881 
##                       Bigwig File containing fold enrichment signal tracks from EpigenomeRoadMap Project 
##                                                                                                     2947 
##                          Narrow ChIP-seq peaks for consolidated epigenomes from EpigenomeRoadMap Project 
##                                                                                                     2894 
##                           Broad ChIP-seq peaks for consolidated epigenomes from EpigenomeRoadMap Project 
##                                                                                                     2534 
##                          Gapped ChIP-seq peaks for consolidated epigenomes from EpigenomeRoadMap Project 
##                                                                                                     2534 
##                              Narrow DNasePeaks for consolidated epigenomes from EpigenomeRoadMap Project 
##                                                                                                      131 
##                                           15 state chromatin segmentations from EpigenomeRoadMap Project 
##                                                                                                      127 
##      Broad domains on enrichment for DNase-seq for consolidated epigenomes from EpigenomeRoadMap Project 
##                                                                                                       78 
##                                         RRBS fractional methylation calls from EpigenomeRoadMap Project  
##                                                                                                       51 
##                       Whole genome bisulphite fractional methylation calls from EpigenomeRoadMap Project 
##                                                                                                       37 
##                               MeDIP/MRE(mCRF) fractional methylation calls from EpigenomeRoadMap Project 
##                                                                                                       16 
## GencodeV10 gene/transcript coordinates and annotations corresponding to hg19 version of the human genome 
##                                                                                                        3 
##                                       RNA-seq read count matrix for intronic protein-coding RNA elements 
##                                                                                                        2 
##                                                      RNA-seq read counts matrix for ribosomal gene exons 
##                                                                                                        2 
##                                                          RPKM expression matrix for ribosomal gene exons 
##                                                                                                        2 
##                                                                    Metadata for EpigenomeRoadMap Project 
##                                                                                                        1 
##                                                           RNA-seq read counts matrix for non-coding RNAs 
##                                                                                                        1 
##                                                      RNA-seq read counts matrix for protein coding exons 
##                                                                                                        1 
##                                                      RNA-seq read counts matrix for protein coding genes 
##                                                                                                        1 
##                                                           RNA-seq read counts matrix for ribosomal genes 
##                                                                                                        1 
##                                                               RPKM expression matrix for non-coding RNAs 
##                                                                                                        1 
##                                                          RPKM expression matrix for protein coding exons 
##                                                                                                        1 
##                                                          RPKM expression matrix for protein coding genes 
##                                                                                                        1 
##                                                                RPKM expression matrix for ribosomal RNAs 
##                                                                                                        1

The ‘metadata’ provided by the Roadmap Epigenomics Project is also available. Note that the information displayed about a hub with a single resource is quite different from the information displayed when the hub references more than one resource.

metadata.tab <- query(ah , c("EpigenomeRoadMap", "Metadata"))
metadata.tab
## AnnotationHub with 1 record
## # snapshotDate(): 2022-04-21
## # names(): AH41830
## # $dataprovider: BroadInstitute
## # $species: Homo sapiens
## # $rdataclass: data.frame
## # $rdatadateadded: 2015-05-11
## # $title: EID_metadata.tab
## # $description: Metadata for EpigenomeRoadMap Project
## # $taxonomyid: 9606
## # $genome: hg19
## # $sourcetype: tab
## # $sourceurl: http://egg2.wustl.edu/roadmap/data/byFileType/metadata/EID_met...
## # $sourcesize: 18035
## # $tags: c("EpigenomeRoadMap", "Metadata") 
## # retrieve record with 'object[["AH41830"]]'

So far we have been exploring information about resources, without downloading the resource to a local cache and importing it into R. One can retrieve the resource using [[ as indicated at the end of the show method

## loading from cache
metadata.tab <- ah[["AH41830"]]
## loading from cache

The metadata.tab file is returned as a data.frame. The first 6 rows of the first 5 columns are shown here:

metadata.tab[1:6, 1:5]
##    EID    GROUP   COLOR          MNEMONIC
## 1 E001      ESC #924965            ESC.I3
## 2 E002      ESC #924965           ESC.WA7
## 3 E003      ESC #924965            ESC.H1
## 4 E004 ES-deriv #4178AE ESDR.H1.BMP4.MESO
## 5 E005 ES-deriv #4178AE ESDR.H1.BMP4.TROP
## 6 E006 ES-deriv #4178AE       ESDR.H1.MSC
##                                     STD_NAME
## 1                                ES-I3 Cells
## 2                               ES-WA7 Cells
## 3                                   H1 Cells
## 4 H1 BMP4 Derived Mesendoderm Cultured Cells
## 5 H1 BMP4 Derived Trophoblast Cultured Cells
## 6          H1 Derived Mesenchymal Stem Cells

One can keep constructing different queries using multiple arguments to trim down these 18248 to get the files one wants. For example, to get the ChIP-Seq files for consolidated epigenomes, one could use

bpChipEpi <- query(ah , c("EpigenomeRoadMap", "broadPeak", "chip", "consolidated"))

To get all the bigWig signal files, one can query the hub using

allBigWigFiles <- query(ah, c("EpigenomeRoadMap", "BigWig"))

To access the 15 state chromatin segmentations, one can use

seg <- query(ah, c("EpigenomeRoadMap", "segmentations"))

If one is interested in getting all the files related to one sample

E126 <- query(ah , c("EpigenomeRoadMap", "E126", "H3K4ME2"))
E126
## AnnotationHub with 6 records
## # snapshotDate(): 2022-04-21
## # $dataprovider: BroadInstitute
## # $species: Homo sapiens
## # $rdataclass: GRanges, BigWigFile
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["AH29817"]]' 
## 
##             title                                  
##   AH29817 | E126-H3K4me2.broadPeak.gz              
##   AH30868 | E126-H3K4me2.narrowPeak.gz             
##   AH31801 | E126-H3K4me2.gappedPeak.gz             
##   AH32990 | E126-H3K4me2.fc.signal.bigwig          
##   AH34022 | E126-H3K4me2.pval.signal.bigwig        
##   AH40177 | E126-H3K4me2.imputed.pval.signal.bigwig

Hub resources can also be selected using $, subset(), and display().

Hub resources are imported as the appropriate Bioconductor object for use in further analysis. For example, peak files are returned as GRanges objects.

## loading from cache
peaks <- E126[['AH29817']]
## loading from cache
seqinfo(peaks)
## Seqinfo object with 298 sequences (2 circular) from hg19 genome:
##   seqnames       seqlengths isCircular genome
##   chr1            249250621      FALSE   hg19
##   chr2            243199373      FALSE   hg19
##   chr3            198022430      FALSE   hg19
##   chr4            191154276      FALSE   hg19
##   chr5            180915260      FALSE   hg19
##   ...                   ...        ...    ...
##   chrUn_gl000245      36651      FALSE   hg19
##   chrUn_gl000246      38154      FALSE   hg19
##   chrUn_gl000247      36422      FALSE   hg19
##   chrUn_gl000248      39786      FALSE   hg19
##   chrUn_gl000249      38502      FALSE   hg19

BigWig files are returned as BigWigFile objects. A BigWigFile is a reference to a file on disk; the data in the file can be read in using rtracklayer::import(), perhaps querying these large files for particular genomic regions of interest as described on the help page ?import.bw.

Each record inside AnnotationHub is associated with a unique identifier. Most GRanges objects returned by AnnotationHub contain the unique AnnotationHub identifier of the resource from which the GRanges is derived. This can come handy when working with the GRanges object for a while, and additional information about the object (e.g., the name of the file in the cache, or the original sourceurl for the data underlying the resource) that is being worked with.

metadata(peaks)
## $AnnotationHubName
## [1] "AH29817"
## 
## $`File Name`
## [1] "E126-H3K4me2.broadPeak.gz"
## 
## $`Data Source`
## [1] "http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/broadPeak/E126-H3K4me2.broadPeak.gz"
## 
## $Provider
## [1] "BroadInstitute"
## 
## $Organism
## [1] "Homo sapiens"
## 
## $`Taxonomy ID`
## [1] 9606
ah[metadata(peaks)$AnnotationHubName]$sourceurl
## [1] "http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/broadPeak/E126-H3K4me2.broadPeak.gz"

Configuring AnnotationHub objects

Most users will not need to do any specific configuration, but this section is included for completeness.

When you create the AnnotationHub object, it will set up the object for you with some default settings. See ?AnnotationHub for ways to customize the hub source, the local cache, and other instance-specific options, and ?getAnnotationHubOption to get or set package-global options for use across sessions.

If you look at the object you will see some helpful information about it such as where the data is cached and where online the hub server is set to.

ah
## AnnotationHub with 64948 records
## # snapshotDate(): 2022-04-21
## # $dataprovider: Ensembl, BroadInstitute, UCSC, ftp://ftp.ncbi.nlm.nih.gov/g...
## # $species: Homo sapiens, Mus musculus, Drosophila melanogaster, Bos taurus,...
## # $rdataclass: GRanges, TwoBitFile, BigWigFile, EnsDb, Rle, OrgDb, ChainFile...
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["AH5012"]]' 
## 
##              title                                                         
##   AH5012   | Chromosome Band                                               
##   AH5013   | STS Markers                                                   
##   AH5014   | FISH Clones                                                   
##   AH5015   | Recomb Rate                                                   
##   AH5016   | ENCODE Pilot                                                  
##   ...        ...                                                           
##   AH104604 | Zonotrichia_albicollis.Zonotrichia_albicollis-1.0.1.ncrna.2bit
##   AH104605 | Zosterops_lateralis_melanops.ASM128173v1.cdna.all.2bit        
##   AH104606 | Zosterops_lateralis_melanops.ASM128173v1.dna_rm.toplevel.2bit 
##   AH104607 | Zosterops_lateralis_melanops.ASM128173v1.dna_sm.toplevel.2bit 
##   AH104608 | Zosterops_lateralis_melanops.ASM128173v1.ncrna.2bit

By default the AnnotationHub object is set to the latest snapshotData and a snapshot version that matches the version of Bioconductor that you are using. You can also learn about these data with the appropriate methods.

snapshotDate(ah)
## [1] "2022-04-21"

If you are interested in using an older version of a snapshot, you can list previous versions with the possibleDates() like this:

pd <- possibleDates(ah)
pd
##   [1] "2013-03-19" "2013-03-21" "2013-03-26" "2013-04-04" "2013-04-29"
##   [6] "2013-06-24" "2013-06-25" "2013-06-26" "2013-06-27" "2013-10-29"
##  [11] "2013-11-20" "2013-12-19" "2014-02-12" "2014-02-13" "2014-03-31"
##  [16] "2014-04-27" "2014-05-11" "2014-05-13" "2014-05-14" "2014-05-22"
##  [21] "2014-07-02" "2014-07-09" "2014-12-15" "2014-12-24" "2015-01-08"
##  [26] "2015-01-14" "2015-03-09" "2015-03-11" "2015-03-12" "2015-03-25"
##  [31] "2015-03-26" "2015-05-06" "2015-05-07" "2015-05-08" "2015-05-11"
##  [36] "2015-05-14" "2015-05-21" "2015-05-22" "2015-05-26" "2015-07-17"
##  [41] "2015-07-27" "2015-07-31" "2015-08-10" "2015-08-13" "2015-08-14"
##  [46] "2015-08-17" "2015-08-26" "2015-12-28" "2015-12-29" "2016-01-25"
##  [51] "2016-03-07" "2016-05-03" "2016-05-25" "2016-06-06" "2016-07-20"
##  [56] "2016-08-15" "2016-10-11" "2016-11-03" "2016-11-08" "2016-11-09"
##  [61] "2016-11-13" "2016-11-14" "2016-12-22" "2016-12-28" "2017-01-05"
##  [66] "2017-02-07" "2017-04-03" "2017-04-04" "2017-04-05" "2017-04-10"
##  [71] "2017-04-11" "2017-04-13" "2017-04-24" "2017-04-25" "2017-05-31"
##  [76] "2017-06-06" "2017-06-07" "2017-06-08" "2017-06-29" "2017-08-28"
##  [81] "2017-08-31" "2017-09-07" "2017-10-18" "2017-10-23" "2017-10-24"
##  [86] "2017-10-27" "2017-11-24" "2017-10-26" "2017-10-20" "2017-12-21"
##  [91] "2018-01-18" "2018-02-20" "2018-04-11" "2018-04-13" "2018-04-16"
##  [96] "2018-04-19" "2018-04-20" "2018-04-23" "2018-04-30" "2018-06-27"
## [101] "2018-07-31" "2018-08-01" "2018-08-20" "2018-10-04" "2018-10-11"
## [106] "2018-10-15" "2018-10-16" "2018-10-18" "2018-10-22" "2018-10-24"
## [111] "2018-12-20" "2019-01-14" "2019-02-14" "2019-02-15" "2019-04-15"
## [116] "2019-04-18" "2019-04-22" "2019-04-29" "2019-05-01" "2019-05-02"
## [121] "2019-09-17" "2019-10-08" "2019-10-17" "2019-10-21" "2019-10-22"
## [126] "2019-10-28" "2019-10-29" "2020-01-28" "2020-02-28" "2020-10-14"
## [131] "2020-05-01" "2020-04-27" "2020-05-11" "2020-06-18" "2020-07-20"
## [136] "2020-09-03" "2020-10-20" "2020-10-26" "2020-10-27" "2020-12-16"
## [141] "2021-01-14" "2021-02-24" "2021-04-12" "2021-04-19" "2021-04-30"
## [146] "2021-05-04" "2021-05-14" "2021-05-17" "2021-05-18" "2021-06-15"
## [151] "2021-08-03" "2021-09-10" "2021-09-23" "2021-10-08" "2021-10-13"
## [156] "2021-10-15" "2021-10-18" "2021-10-20" "2022-01-20" "2022-01-31"
## [161] "2022-04-12" "2022-04-18" "2022-04-21"

Set the dates like this:

snapshotDate(ah) <- pd[1]

Session info

sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur/Monterey 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] AnnotationDbi_1.58.0 Biobase_2.56.0       rtracklayer_1.56.0  
##  [4] GenomicRanges_1.48.0 GenomeInfoDb_1.32.0  IRanges_2.30.0      
##  [7] S4Vectors_0.34.0     AnnotationHub_3.4.0  BiocFileCache_2.4.0 
## [10] dbplyr_2.1.1         BiocGenerics_0.42.0  BiocStyle_2.24.0    
## 
## loaded via a namespace (and not attached):
##  [1] MatrixGenerics_1.8.0          httr_1.4.2                   
##  [3] sass_0.4.1                    bit64_4.0.5                  
##  [5] jsonlite_1.8.0                bslib_0.3.1                  
##  [7] shiny_1.7.1                   assertthat_0.2.1             
##  [9] interactiveDisplayBase_1.34.0 BiocManager_1.30.17          
## [11] blob_1.2.3                    BSgenome_1.64.0              
## [13] GenomeInfoDbData_1.2.8        Rsamtools_2.12.0             
## [15] yaml_2.3.5                    BiocVersion_3.15.2           
## [17] lattice_0.20-45               pillar_1.7.0                 
## [19] RSQLite_2.2.12                glue_1.6.2                   
## [21] digest_0.6.29                 promises_1.2.0.1             
## [23] XVector_0.36.0                Matrix_1.4-1                 
## [25] htmltools_0.5.2               httpuv_1.6.5                 
## [27] XML_3.99-0.9                  pkgconfig_2.0.3              
## [29] zlibbioc_1.42.0               purrr_0.3.4                  
## [31] xtable_1.8-4                  later_1.3.0                  
## [33] BiocParallel_1.30.0           tibble_3.1.6                 
## [35] KEGGREST_1.36.0               generics_0.1.2               
## [37] ellipsis_0.3.2                cachem_1.0.6                 
## [39] withr_2.5.0                   SummarizedExperiment_1.26.0  
## [41] cli_3.3.0                     magrittr_2.0.3               
## [43] crayon_1.5.1                  mime_0.12                    
## [45] memoise_2.0.1                 evaluate_0.15                
## [47] fansi_1.0.3                   tools_4.2.0                  
## [49] matrixStats_0.62.0            BiocIO_1.6.0                 
## [51] lifecycle_1.0.1               stringr_1.4.0                
## [53] DelayedArray_0.22.0           Biostrings_2.64.0            
## [55] compiler_4.2.0                jquerylib_0.1.4              
## [57] rlang_1.0.2                   grid_4.2.0                   
## [59] RCurl_1.98-1.6                rjson_0.2.21                 
## [61] rappdirs_0.3.3                bitops_1.0-7                 
## [63] rmarkdown_2.14                restfulr_0.0.13              
## [65] DBI_1.1.2                     curl_4.3.2                   
## [67] R6_2.5.1                      GenomicAlignments_1.32.0     
## [69] knitr_1.39                    dplyr_1.0.8                  
## [71] fastmap_1.1.0                 bit_4.0.4                    
## [73] utf8_1.2.2                    filelock_1.0.2               
## [75] stringi_1.7.6                 parallel_4.2.0               
## [77] Rcpp_1.0.8.3                  vctrs_0.4.1                  
## [79] png_0.1-7                     tidyselect_1.1.2             
## [81] xfun_0.30