Tutorial 4. Jupyter notebook for computational chemistry

Learning objectives

  • Get familiar with different ways to run Python
  • Set up your personal computer to run Jupyter Notebook
  • Learn to run Jupyter Notebook on a remote cluster through port forwarding

Introduction

Python is useful for computational chemistry in several ways. In Tutorial 1, we demonstrated how to use Python to generate input files and analyze output files. Python can also be used for big data analysis, machine learning, and generating publication-quality figures.

Common ways to run Python include:

Run Python in …     | Use case
command line        | building a package and running it on remote computer clusters
interactive session | simple tasks, such as using Python as a calculator or checking whether a package can be loaded
Jupyter Notebook    | integrating different uses of Python in one place
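
As a small illustration of the three rows above, the sketch below defines one function that can be used in all three settings. The file name convert_energy.py and the function names are hypothetical, chosen only for this example; the conversion factor is the commonly quoted 627.509 kcal/mol per Hartree.

```python
# convert_energy.py (hypothetical file name, used only for illustration)
import sys

# Commonly quoted conversion factor: 1 Hartree is about 627.509 kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.509


def hartree_to_kcal_per_mol(energy_hartree):
    """Convert an energy from Hartree to kcal/mol."""
    return energy_hartree * HARTREE_TO_KCAL_PER_MOL


def main(argv):
    """Command-line entry point: expects one number, the energy in Hartree."""
    if len(argv) != 1:
        print("usage: python convert_energy.py ENERGY_IN_HARTREE")
        return 1
    try:
        energy = float(argv[0])
    except ValueError:
        print(f"not a number: {argv[0]!r}")
        return 1
    print(f"{energy} Hartree = {hartree_to_kcal_per_mol(energy):.3f} kcal/mol")
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

From the command line this would be run as `python convert_energy.py 0.5`. In an interactive session or a notebook cell, the same function can be reused with `from convert_energy import hartree_to_kcal_per_mol`.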

For the first two options, please refer to this tutorial. We will focus on the Jupyter Notebook.

Try and Learn

Here we will demonstrate a few different ways to run Jupyter Notebook.

1. Use a readily available online service.

For example, the XSEDE computational chemistry website offers one at https://chemcompute.org/jupyterhub. That website has many good examples of using Jupyter Notebooks for computational chemistry. Please check them out.

Pro: No need to set up your own computer environment. Good for beginners.

Con: You need to upload your files, which may not be convenient for daily code development.

2. Set up your own computer to run Jupyter Notebook.

Please follow the guidance on this website.

Con: More complicated setup

Pro: You can customize the conda environment freely to meet your development needs.

Special tip: To switch between different conda environments from within Jupyter Notebook, you need to install nb_conda and nb_conda_kernels in your conda environment, as described here.
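
After switching kernels, it can be helpful to confirm which environment a notebook is actually using. The cell below is a minimal sketch using only the standard library; the function name describe_kernel_environment is hypothetical. Note that the CONDA_DEFAULT_ENV variable reflects the environment that was active when Jupyter was launched, so it may differ from the kernel's own environment; sys.prefix is usually the more reliable indicator.

```python
import os
import sys


def describe_kernel_environment():
    """Collect a few facts about the Python interpreter behind this kernel."""
    return {
        # Full path of the Python executable running this kernel.
        "executable": sys.executable,
        # Root directory of the environment the interpreter belongs to.
        "prefix": sys.prefix,
        # Set by `conda activate`; may be missing or refer to the launching shell.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(not set)"),
    }


for key, value in describe_kernel_environment().items():
    print(f"{key}: {value}")
```

If the printed prefix does not point to the environment you expected, the notebook is probably still attached to a different kernel.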

3. Use port forwarding to run a Jupyter Notebook on a remote computer cluster.

Detailed instructions can be found in many online resources, such as this website.
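
As a rough sketch of the idea: the notebook server is started on the cluster without a browser (for example with `jupyter notebook --no-browser --port=8888`), and an SSH tunnel then forwards a port on your own computer to that port on the cluster. The helper below only assembles the SSH command for the tunnel. The function name, the user name, the host cluster.example.edu, and port 8888 are all placeholders, and many clusters need extra site-specific steps (for instance, an additional hop to a compute node), so treat this as an illustration rather than a recipe for any particular cluster.

```python
import shlex


def build_tunnel_command(user, host, local_port=8888, remote_port=8888):
    """Build an ssh command that forwards local_port to remote_port on host.

    -N tells ssh not to run a remote command (tunnel only);
    -L sets up the local port forwarding.
    """
    forwarding = f"{local_port}:localhost:{remote_port}"
    return ["ssh", "-N", "-L", forwarding, f"{user}@{host}"]


# Placeholder user name and host name; replace with your own.
command = build_tunnel_command("your_username", "cluster.example.edu")
print(shlex.join(command))
# prints: ssh -N -L 8888:localhost:8888 your_username@cluster.example.edu
```

With the tunnel open, pointing a local browser at http://localhost:8888 would reach the notebook server on the cluster, which typically asks for the token it printed at startup.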

Con: Even more complicated setup

Pro: Some tasks can only be done this way. For example, if you have a very big dataset (millions of entries) on the remote cluster, working on your own computer may be unrealistic. Typical reasons to run on the cluster include:

  1. Your computer doesn’t have enough disk space to hold a copy of the dataset.
  2. Your computer doesn’t have enough memory to load the dataset, or can only load it by becoming extremely slow or crashing.
  3. You would like to submit computing jobs directly to the queue through Jupyter Notebook.
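
For the first reason, a quick check like the one below can tell you whether copying a dataset to your own computer is even feasible. It is a minimal sketch using the standard library; the function name fits_on_disk, the 20% safety margin, and the 250 GiB dataset size are assumptions made for illustration.

```python
import shutil


def fits_on_disk(dataset_bytes, target_dir=".", safety_factor=1.2):
    """Return True if dataset_bytes (plus a safety margin) fits in target_dir."""
    free_bytes = shutil.disk_usage(target_dir).free
    return dataset_bytes * safety_factor <= free_bytes


# Hypothetical dataset size of 250 GiB; replace with the real size.
dataset_bytes = 250 * 1024**3
if fits_on_disk(dataset_bytes):
    print("The dataset should fit on this disk.")
else:
    print("Not enough free space; consider working on the cluster instead.")
```
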

Written on September 12, 2020