Analysis modules
OpenScPCA organizes individual analyses into analysis modules.
Each analysis module is a folder with files containing all code, computing environment specifications, and documentation needed to run and interpret an analysis.
For example, an analysis to perform cell type annotation on Ewing sarcoma samples would be a single analysis module, and it might be named celltype-ewings-sarcoma.
This section explains the structure of analysis modules.
Skeleton analysis module contents
You can create a starting point for your analysis module with the script create-analysis-module.py.
This script will create a skeleton analysis module with the following file structure (depending on how you run the script, there may be more files here too!).
├── scripts
│ └── ...
├── results
│ └── README.md
├── plots
│ └── ...
├── scratch
│ └── ...
├── README.md
├── Dockerfile
├── .gitignore
└── .dockerignore
Please refer to the documentation on creating analysis modules when you are ready to make your first analysis folder and begin contributing to OpenScPCA!
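As a quick way to see the skeleton layout in code form, the sketch below checks whether a module folder contains the entries shown in the tree above. It is only an illustration and is not part of create-analysis-module.py or any OpenScPCA tooling; the names EXPECTED_DIRS, EXPECTED_FILES, and missing_skeleton_entries are hypothetical.

```python
from pathlib import Path

# Entries copied from the skeleton tree above. These constant names and the
# function below are hypothetical, not part of the OpenScPCA tooling.
EXPECTED_DIRS = ("scripts", "results", "plots", "scratch")
EXPECTED_FILES = (
    "README.md",
    "Dockerfile",
    ".gitignore",
    ".dockerignore",
    "results/README.md",
)


def missing_skeleton_entries(module_dir):
    """Return the expected skeleton paths that are absent from module_dir, sorted."""
    root = Path(module_dir)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return sorted(missing)
```

Calling missing_skeleton_entries("path/to/celltype-ewings-sarcoma") on a freshly created module would be expected to return an empty list, assuming the skeleton matches the tree above.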
These are the main files and folders you will interact with when writing your analysis:
scripts
- You can save any scripts (e.g., .R, .py, or .sh) that you write for your analysis module in this folder.
- If you choose, you can also save any notebook files (e.g., R Markdown or Jupyter) in this folder. Or, depending on the goals of your analysis module, you may prefer to save notebooks in the root of the analysis module folder.
results
- Any result files (e.g., TSV files) that your code produces should be saved to this results folder.
- Git will ignore the contents of this folder, except for its README.md file, which you can use to document the results files themselves. This means that only its README.md file will be present in the remote repository.
plots
- Any plots that your code produces should be saved to this plots folder.
scratch
- You can optionally use this folder to store intermediate files that your code produces but that are not meant to live in results or plots.
- We have set up Git to ignore the contents of this folder, so anything you save to this folder will only be stored locally and not in the remote repository.
README.md
- Use this markdown file to document your analysis module. Your README.md file should have enough information for other contributors or repository users to learn the following:
  - The scientific goals of the module
  - The input and output of the module and its computational resource requirements
  - How to run the module, including steps needed to set up the module's software environment
- Please see the documentation on documenting your analysis module for more information about adding to this README.md file.
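To make the folder conventions above concrete, here is a minimal sketch of how a Python script might direct its output to results and scratch. The helper names module_output_dirs and write_tsv, and the file names in the usage note, are made up for illustration and are not part of OpenScPCA.

```python
import csv
from pathlib import Path


def module_output_dirs(module_root):
    """Return the results, plots, and scratch folders of a module, creating any that are missing."""
    root = Path(module_root)
    dirs = {name: root / name for name in ("results", "plots", "scratch")}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


def write_tsv(rows, header, out_path):
    """Write a header line followed by rows to a tab-separated file."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
```

From a script saved in the scripts folder, the module root could be found with Path(__file__).resolve().parent.parent. A final table would then go to dirs["results"] / "example-labels.tsv", and an intermediate file to dirs["scratch"] / "example-scores.tsv", where dirs is the dictionary returned by module_output_dirs.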
There are also some additional files in the skeleton that are useful to be aware of:
Dockerfile
- This is the analysis module's Dockerfile and contains the commands that Docker uses to build the module's Docker image.
- For more information on how OpenScPCA uses Docker images, please see our Docker documentation.
Hidden files .gitignore and .dockerignore
- We have set up these files to tell Git and Docker, respectively, to ignore certain files that do not belong in version control or in the module's Docker image.
- These files will likely be automatically hidden from you, and you don't really have to worry about them. Just be aware that they are there and working behind the scenes to help manage the module!
Additional files you will add to your module
While you write your analysis, you may add other files too:
- Scripts and analysis notebooks, e.g., R Markdown files or Jupyter notebooks
  - We recommend saving scripts in the scripts folder, as described above.
  - You are also welcome to save notebook files in the root of your analysis module folder. Feel free to choose what location makes the most sense for your analysis, as long as it is all documented in the module's README.md file!
  - Please see the documentation on structuring your analysis notebooks for more information about how to write your analysis notebooks.
Naming your files
If your module has multiple scripts or notebooks, we recommend naming them in the order they should be run.
For example, you might have these files in your module (the script names are conceptual!):
scripts/01_script-to-prepare-data.R
scripts/02_script-to-analyze-data.R
03_notebook-to-visualize-data.Rmd
Check out these slides from Jenny Bryan to learn more about how we in the Data Lab think about naming our files!
- A script to run all code in the module
  - If your module has multiple scripts or notebooks, we recommend adding a script (for example a shell script) to the root of your analysis module folder that will run all scripts in order.
  - You can name this file, for example, run_{module-name}.sh and document how to run it in the module's README.md file.
- Additional environment files
  - When you create a module, you can choose to include files that manage the module's software environment in the analysis module skeleton.
  - In this case, your module may also contain R-specific (e.g., renv.lock) and/or Python-specific (e.g., environment.yml) files or folders.
- Documentation for your analysis module
  - A template README.md file will be created for you when you create a new analysis module. You will not need to specifically create any new files for documenting your module, but you will need to fill in the README.md file with information about the goals and content of the module, as well as instructions for how to run it.
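The numbered file names and the run-all script described above fit together naturally. The text suggests a shell script for this; the sketch below shows the same idea in Python instead, collecting files with a numeric prefix from the module root and the scripts folder and running them in order. The RUNNERS mapping and all function names are hypothetical. It also assumes that Rscript, python, and bash are on the PATH, that the rmarkdown R package is installed for .Rmd files, and that file paths contain no single quotes.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical mapping from file extension to the command that runs a file.
# Assumes Rscript, python, and bash are available on the PATH.
RUNNERS = {
    ".R": lambda p: ["Rscript", str(p)],
    ".py": lambda p: ["python", str(p)],
    ".sh": lambda p: ["bash", str(p)],
    # Rendering a notebook assumes the rmarkdown R package is installed.
    ".Rmd": lambda p: ["Rscript", "-e", f"rmarkdown::render('{p}')"],
}

# Matches names such as 01_script-to-prepare-data.R
NUMBERED = re.compile(r"^(\d+)_")


def ordered_steps(module_root):
    """Return numbered scripts and notebooks from the module root and its scripts folder, sorted by prefix."""
    root = Path(module_root)
    candidates = list(root.glob("*")) + list((root / "scripts").glob("*"))
    steps = [
        p for p in candidates
        if p.is_file() and p.suffix in RUNNERS and NUMBERED.match(p.name)
    ]
    return sorted(steps, key=lambda p: (int(NUMBERED.match(p.name).group(1)), p.name))


def build_commands(module_root):
    """Build one command (as an argument list) per step, in run order."""
    return [RUNNERS[p.suffix](p) for p in ordered_steps(module_root)]


def run_module(module_root):
    """Run every step in order, stopping at the first failure."""
    for command in build_commands(module_root):
        subprocess.run(command, check=True)
```

Keeping build_commands separate from run_module makes it possible to inspect the planned order without executing anything, which can be handy when documenting the run order in the module's README.md file.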
Example analysis modules
To help you get started, we've created two analysis modules that you can use as references:
- hello-R is an example R-based analysis module
- hello-python is an example Python-based analysis module