Where to go with Bioconductor
Data availability can be thought of as a service. The service might simply be the file system, but it could be more complex.
Data services vary in the complexity of operations they enable. A file system, for example, has no “smarts”: it simply delivers bits and bytes as asked. A more complex system might include indexing capabilities (for example, indexed VCF files or HDF5). More complexity again comes with database systems like MongoDB and PostgreSQL. At the level of BigQuery, parallelism and rudimentary machine learning are possible. At the extreme are projects like Apache Spark that can serve as distributed analysis engines over arbitrarily large datasets.
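The difference between a service that only delivers bytes and one that can answer questions can be sketched in a few lines. This is a minimal illustration, not a recommendation of any particular backend: it uses Python's built-in sqlite3 module as a stand-in for a queryable data service, and the records, table name, column names, and function names are all hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical toy data: (sample, gene, n_reads) records.
RECORDS = [
    ("s1", "BRCA1", 12),
    ("s1", "TP53", 40),
    ("s2", "BRCA1", 7),
    ("s2", "TP53", 55),
]


def read_from_filesystem(path):
    """The file system returns every byte; the client parses all of it."""
    with open(path, "rb") as handle:
        raw = handle.read()
    rows = [line.split("\t") for line in raw.decode("utf-8").splitlines()]
    return [(s, g, int(n)) for s, g, n in rows]


def total_for_gene_via_files(path, gene):
    # Filtering and aggregation both happen on the client side.
    return sum(n for _, g, n in read_from_filesystem(path) if g == gene)


def total_for_gene_via_database(conn, gene):
    # The service filters and aggregates; only one number comes back.
    row = conn.execute(
        "SELECT SUM(n_reads) FROM counts WHERE gene = ?", (gene,)
    ).fetchone()
    return row[0]


def build_demo():
    """Write the toy records to a temp file and to an indexed in-memory table."""
    fd, path = tempfile.mkstemp(suffix=".tsv")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for s, g, n in RECORDS:
            handle.write(f"{s}\t{g}\t{n}\n")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counts (sample TEXT, gene TEXT, n_reads INTEGER)")
    conn.execute("CREATE INDEX idx_gene ON counts (gene)")
    conn.executemany("INSERT INTO counts VALUES (?, ?, ?)", RECORDS)
    return path, conn


if __name__ == "__main__":
    path, conn = build_demo()
    print(total_for_gene_via_files(path, "TP53"))
    print(total_for_gene_via_database(conn, "TP53"))
    conn.close()
    os.remove(path)
```

Both functions return the same total, but the file-based version moves and parses the whole file to get it, while the database version pushes the filter and the sum into the service.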
Loom is an efficient file format based on HDF5 for very large omics datasets, consisting of a main matrix, optional additional layers, a variable number of row and column annotations, and sparse graph objects.
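The components listed above can be pictured as a single container with shape constraints between its parts. The sketch below is a plain-Python model of that layout, assuming that layers share the main matrix's shape, that each row or column annotation has one value per row or column, and that graphs are stored as (i, j, weight) edge lists. The class and its field names are hypothetical and are not the API of any Loom library.

```python
from dataclasses import dataclass, field


@dataclass
class LoomLikeDataset:
    """Toy in-memory stand-in for the parts of a Loom file (hypothetical)."""

    matrix: list                                    # main matrix, list of rows
    layers: dict = field(default_factory=dict)      # name -> matrix, same shape
    row_attrs: dict = field(default_factory=dict)   # name -> one value per row
    col_attrs: dict = field(default_factory=dict)   # name -> one value per column
    row_graphs: dict = field(default_factory=dict)  # name -> [(i, j, weight)]
    col_graphs: dict = field(default_factory=dict)  # name -> [(i, j, weight)]

    @property
    def shape(self):
        n_rows = len(self.matrix)
        n_cols = len(self.matrix[0]) if n_rows else 0
        return n_rows, n_cols

    def validate(self):
        """Check that every component is consistent with the main matrix."""
        n_rows, n_cols = self.shape
        if any(len(row) != n_cols for row in self.matrix):
            raise ValueError("main matrix is ragged")
        for name, layer in self.layers.items():
            if len(layer) != n_rows or any(len(r) != n_cols for r in layer):
                raise ValueError(f"layer {name!r} does not match the matrix")
        for name, values in self.row_attrs.items():
            if len(values) != n_rows:
                raise ValueError(f"row annotation {name!r} has wrong length")
        for name, values in self.col_attrs.items():
            if len(values) != n_cols:
                raise ValueError(f"column annotation {name!r} has wrong length")
        for graphs, size in ((self.row_graphs, n_rows), (self.col_graphs, n_cols)):
            for name, edges in graphs.items():
                for i, j, _weight in edges:
                    if not (0 <= i < size and 0 <= j < size):
                        raise ValueError(f"graph {name!r} has an edge out of range")
        return True


if __name__ == "__main__":
    dataset = LoomLikeDataset(
        matrix=[[1, 0, 3], [0, 2, 0]],
        layers={"spliced": [[1, 0, 2], [0, 1, 0]]},
        row_attrs={"Gene": ["BRCA1", "TP53"]},
        col_attrs={"CellID": ["c1", "c2", "c3"]},
        col_graphs={"KNN": [(0, 1, 0.8), (1, 2, 0.4)]},
    )
    print(dataset.shape, dataset.validate())
```

A real Loom file keeps these parts in HDF5 so that slices of the matrix can be read without loading the whole dataset; the model here only illustrates how the pieces relate to one another.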
Columnar memory layout allows applications to avoid unnecessary IO and accelerate analytical processing performance on modern CPUs and GPUs.
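To make the idea concrete, the following sketch contrasts a row-oriented table with a column-oriented one using only Python's standard array module. It is an illustration of the layout principle, not of any specific columnar library; the toy records and function names are hypothetical.

```python
from array import array

# Hypothetical toy table: (gene, n_reads, score) per record.
ROWS = [
    ("BRCA1", 12, 0.5),
    ("TP53", 40, 1.5),
    ("EGFR", 7, 2.0),
]


def to_columnar(rows):
    """Store each numeric field in its own contiguous typed buffer."""
    return {
        "gene": [r[0] for r in rows],
        "n_reads": array("q", (r[1] for r in rows)),  # signed 64-bit integers
        "score": array("d", (r[2] for r in rows)),    # 64-bit floats
    }


def sum_reads_row_oriented(rows):
    # Every record is visited in full, although only one field is needed.
    return sum(r[1] for r in rows)


def sum_reads_columnar(columns):
    # Only the n_reads buffer is read; the other columns are never touched.
    return sum(columns["n_reads"])


if __name__ == "__main__":
    columns = to_columnar(ROWS)
    print(sum_reads_row_oriented(ROWS))
    print(sum_reads_columnar(columns))
```

With the columnar form, an aggregate over one field reads a single contiguous buffer. On disk or over a network, that is what lets a reader skip the columns it does not need.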
From the TileDB website:
Building such services will usually require a custom API that is built and maintained centrally.