The Real Data Corpus (RDC) is a collection of disk images extracted from secondary storage devices that were acquired from second-hand markets around the world. In total, the RDC currently consists of 58 TiB of data contained in 3,127 disk images from 29 countries. A variety of devices are represented, including magnetic media and solid-state storage from laptops, desktops, mobile phones, USB memory sticks, and other media. The dataset is hosted on the HPC infrastructure at the Naval Postgraduate School, as well as in AWS GovCloud.

Potential Uses

The Real Data Corpus is a one-of-a-kind scientific resource for:


Please be aware that, due to limited staff, we cannot always accommodate all requests. Efforts are underway to develop infrastructure that will allow us to meet a wider range of research requirements without unduly increasing privacy risks.