For packages that live on GitHub, we can mine further details. This function returns the GitHub details for the listed packages.
githubDetails(pkgs, sleep = 0)
pkgs: a character() vector of username/repo for one or more
GitHub repos, such as seandavi/GEOquery.
sleep: numeric() denoting the number of seconds to sleep
between GitHub API calls. Because GitHub rate-limits its
APIs, it might be necessary either to query small chunks of
packages iteratively or to supply a non-zero value here.
See the details section for a better solution using
GitHub tokens.
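The chunked approach mentioned for the sleep argument can be sketched roughly as follows. This is a minimal illustration, not part of the package: `chunk_pkgs` is a hypothetical helper, the chunk size is arbitrary, and every repo name other than seandavi/GEOquery is made up. The calls to githubDetails are left commented out because they need network access.

```r
# Split a character vector into consecutive chunks of at most `size` elements.
# `chunk_pkgs` is a hypothetical helper, not provided by the package.
chunk_pkgs <- function(pkgs, size = 20) {
  stopifnot(is.character(pkgs), size >= 1)
  if (length(pkgs) == 0) return(list())
  unname(split(pkgs, ceiling(seq_along(pkgs) / size)))
}

# Repo names for illustration; all but the first are invented.
repos <- c("seandavi/GEOquery", "user2/repoB", "user3/repoC",
           "user4/repoD", "user5/repoE")
chunks <- chunk_pkgs(repos, size = 2)
lengths(chunks)

# Query one chunk at a time, pausing between API calls (needs network access):
# results <- lapply(chunks, function(ch) githubDetails(ch, sleep = 1))
# ghd <- do.call(c, results)
```

Smaller chunks and a longer sleep both reduce the chance of hitting the rate limit, at the cost of a slower run.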
The gh function from the gh package is used to do the
fetching. If the number of packages supplied to this
function is large (more than 40 or so), it is possible to
run into problems with API rate limits. The gh package
reads the environment variable "GITHUB_PAT" (for personal
access token) to authenticate, and authenticated requests
get higher rate limits. If you run into problems with rate
limits, set sleep to a small positive number to slow the
queries. Alternatively, create a Personal Access Token on
GitHub and register it. See the gh package for details.
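Before running a large query, it may help to check whether a token is visible to the R session. The sketch below uses only base R; `has_github_pat` is a hypothetical helper (not part of gh or this package), and the token value shown is a placeholder.

```r
# Report whether the given environment variable holds a non-empty value.
# `has_github_pat` is a hypothetical helper, not part of gh or this package.
has_github_pat <- function(var = "GITHUB_PAT") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (!has_github_pat()) {
  message("GITHUB_PAT is not set; unauthenticated rate limits apply.")
}

# To register a token for the current session only (placeholder value):
# Sys.setenv(GITHUB_PAT = "<your-personal-access-token>")
```

Setting the variable with Sys.setenv() lasts only for the current session; the gh package documentation describes more durable ways to store a token.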
pkglist = biocPkgList()
#> 'getOption("repos")' replaces Bioconductor standard repositories, see
#> 'help("repositories", package = "BiocManager")' for details.
#> Replacement repositories:
#> CRAN: https://cloud.r-project.org
# example of "pkgs" format.
head(pkglist$URL)
#> [1] NA NA NA NA NA NA
gh_list = githubURLParts(pkglist$URL)
# keep only packages with a parsed GitHub user/repo (drop NA entries)
gh_list = gh_list[!is.na(gh_list$user_repo),]
head(gh_list$user_repo)
ghd = githubDetails(gh_list$user_repo[1:5])
lapply(ghd, '[[', "stargazers")