Please contact Santi Thompson with any further questions.
Organizing and structuring files at the beginning of a project will ease the research process and prevent losses and mix-ups.
For a list of best practices: Stanford University’s File-naming best practices
Visual Diagram example: Sample File-Naming Convention Visual
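As a rough illustration of what a file-naming convention can look like in practice, the sketch below assembles names from a date, a project label, a short description, and a zero-padded version number. The element order, the separators, and the function name `build_filename` are assumptions made for this example; they are not taken from the Stanford guidance linked above, so adapt them to whatever convention your group agrees on.

```python
import re
from datetime import date


def build_filename(project, description, when, version, extension):
    """Assemble a name such as 20240315_soilstudy_field-notes_v02.csv."""

    def clean(text):
        # Lowercase the text and replace runs of anything other than
        # letters or digits with a single hyphen (no spaces or symbols).
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return "{}_{}_{}_v{:02d}.{}".format(
        when.strftime("%Y%m%d"),   # YYYYMMDD sorts chronologically
        clean(project),
        clean(description),
        version,                   # zero-padded so v02 sorts before v10
        extension.lstrip("."),
    )


print(build_filename("SoilStudy", "Field notes", date(2024, 3, 15), 2, ".csv"))
# prints: 20240315_soilstudy_field-notes_v02.csv
```

Generating names from a function rather than typing them by hand is one way to keep every collaborator's files consistent.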
Versioning is the strategy we employ to keep track of the changes to files over time.
During collaborative work, versioning is essential and more complex.
Data Management Plans will often include methods for managing versions of data.
Manually - For basic small-scale needs:
Tools for Version control:
These are better suited to groups, larger projects, and work that involves models, code, and similar material.
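For the manual, small-scale approach, one possible sketch is a helper that copies a file to the next numbered version instead of overwriting it. The `_vNN` suffix and the function names `next_version_path` and `save_new_version` are assumptions for illustration only; a dedicated version control tool remains the better fit for groups and code.

```python
import re
import shutil
from pathlib import Path

# Matches a stem that ends in _v followed by digits, e.g. "analysis_v03".
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)$")


def next_version_path(path):
    """Return the path with its _vNN suffix increased by one (or _v01 added)."""
    path = Path(path)
    match = VERSION_PATTERN.match(path.stem)
    if match:
        stem, number = match.group("stem"), int(match.group("num")) + 1
    else:
        stem, number = path.stem, 1
    return path.with_name("{}_v{:02d}{}".format(stem, number, path.suffix))


def save_new_version(path):
    """Copy the file to the next version name, leaving the original untouched."""
    target = next_version_path(path)
    if target.exists():
        # Refuse to overwrite an existing version.
        raise FileExistsError(target)
    shutil.copy2(path, target)  # copy2 also preserves timestamps
    return target
```

Calling `save_new_version("analysis_v03.csv")` would create `analysis_v04.csv` next to the original, so earlier versions stay available for comparison.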
Workflows are the steps you take to move from start to finish in your research activities.
Things to consider:
Basic elements of work:
Individual data collection
Parts of workflows may be computational processes automated via the use of scripts.
Environments and circumstances contextualize decisions and processes.
Documenting the workflow aids your ability to pick up where you left off, and to communicate effectively with collaborators.
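One lightweight way to automate parts of a workflow and document it at the same time is to run the steps from a script that records each step as it completes. The following is a minimal sketch under that assumption; `run_workflow` and the two example steps are hypothetical stand-ins for real collection, cleaning, and analysis code.

```python
from datetime import datetime


def run_workflow(steps, data, log):
    """Run each (name, function) step in order and record what was done."""
    for name, step in steps:
        data = step(data)
        timestamp = datetime.now().isoformat(timespec="seconds")
        log.append("{} {}".format(timestamp, name))
    return data


# Hypothetical steps for illustration.
def drop_missing(values):
    return [v for v in values if v is not None]


def mean(values):
    return sum(values) / len(values)


log = []
result = run_workflow(
    [("drop missing readings", drop_missing), ("compute mean", mean)],
    [2.0, None, 4.0],
    log,
)
print(result)  # prints: 3.0
for entry in log:
    print(entry)
```

Saving the log alongside the outputs gives collaborators a record of which steps ran and in what order.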
Three key practices - Justin Kitzes, The Basic Reproducible Workflow Template
Jupyter Notebooks - an open-source application allowing researchers to generate and collaborate on documents containing live code, equations, visualizations and text.
Electronic Lab Notebooks (ELNs) - See the Harvard ELN matrix for information
Docker - containers for computational environments
Storage is where data and research materials reside during the process of collection and analysis.
Back-up strategies are ways to ensure that files are intact and up to date.
Use the "3-2-1" Rule: 3 copies, 2 different media, 1 copy off-site.
Don't rely solely on the cloud; make sure you also keep a local back-up.
Designate one copy as the working copy and sync or update at designated intervals.
Automate back-up whenever possible.
Test your backups periodically.
Document these locations and who is responsible.
Pay special attention to raw data files - they are most valuable.
Keep a sharp eye out for vulnerabilities, both internal and external.
| | Jane Dough - Dept IT |
| UH MS OneDrive | John Smyth - Post doc |
| My External Hard Drive | Sal E. Mander - PI |
For additional storage and back-up tips: Ways to Avoid a Data-Storage Disaster by Jeffrey Perkel, Nature 568, 131-132 (2019)
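Testing a back-up can be partly automated. The sketch below compares SHA-256 checksums of every file in a working folder against the corresponding file in a back-up folder and reports anything missing or different. The function names `sha256_of` and `verify_backup` and the folder layout (the back-up mirrors the working folder) are assumptions for this example, not a prescribed tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(working_dir, backup_dir):
    """Return (relative path, problem) pairs for files missing or changed in the back-up."""
    working_dir, backup_dir = Path(working_dir), Path(backup_dir)
    problems = []
    for source in sorted(working_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(working_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif sha256_of(source) != sha256_of(copy):
            problems.append((relative.as_posix(), "differs"))
    return problems
```

An empty list means every working file has an identical copy in the back-up. Note that a file flagged as "differs" may simply have been edited since the last back-up, so check the timing before assuming corruption.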
Storage within the UH network ensures compliance. Information security adheres to specific protocols designed to keep university systems secure.
Consider choosing an additional trusted cloud option for one of your storage solutions. (Do not rely on this as your sole storage.)
Free options you might consider: Box, Dropbox, Google Drive
The growing scale of data is one of the biggest challenges we face in research and data services.
Network attached storage (NAS)
These devices combine storage with management software, much like a small computer with a large amount of storage capacity. They are internet-accessible, which lets you centralize data collected in multiple ways and then access files for analysis in one place. Most models contain multiple hard drives and are set up with RAID to protect against data loss if a hard drive fails. (Cost varies widely, roughly $300-500.)
Cloud Storage Services
Beyond the free and institutional storage, there are varying levels of cloud storage services options available, some with additional back-up features.
Amazon Web Services is one of the most common choices, but there may be other options more suitable to your needs and budget.
We advise keeping a document that lays out the following:
A list of data files, average size, and format
Three storage locations
Medium of storage
Methods of back-up (Manual, automatic, software used, etc.)
Timing - Daily, weekly, or monthly, depending on your output
Log of back-up dates (Verify that back-up is complete)
For Groups: Contingency plans should someone leave
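The items above could also be kept in a structured form so that the plan can be checked automatically. The sketch below is one possible layout, with a small check against the "3-2-1" rule; the class `StorageLocation`, its field names, and the example entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StorageLocation:
    name: str            # where the copy lives
    medium: str          # e.g. "internal disk", "external drive", "cloud"
    off_site: bool       # True if the copy is kept away from the main site
    responsible: str     # who looks after this copy
    backup_method: str   # manual or automatic, and any software used
    frequency: str       # daily, weekly, or monthly
    backup_log: list = field(default_factory=list)  # dates of verified back-ups


def check_3_2_1(locations):
    """Return a list of ways the plan falls short of the 3-2-1 rule."""
    gaps = []
    if len(locations) < 3:
        gaps.append("fewer than 3 copies")
    if len({loc.medium for loc in locations}) < 2:
        gaps.append("fewer than 2 different media")
    if not any(loc.off_site for loc in locations):
        gaps.append("no off-site copy")
    return gaps


# Hypothetical example plan.
plan = [
    StorageLocation("lab workstation", "internal disk", False, "Person A", "working copy", "daily"),
    StorageLocation("external hard drive", "external drive", False, "Person B", "manual", "weekly"),
    StorageLocation("institutional cloud storage", "cloud", True, "Person C", "automatic sync", "daily"),
]
print(check_3_2_1(plan))      # prints: []
print(check_3_2_1(plan[:2]))  # prints: ['fewer than 3 copies', 'no off-site copy']
```

Recording a responsible person for each location also supports the contingency planning mentioned for groups, since it shows which copies need a new owner when someone leaves.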