TDM 30100: Project 8 — 2022
Motivation: Python is an incredible language that enables users to write code quickly and efficiently. When using Python to write code, you are typically making a tradeoff between developer speed (the time it takes to write a functioning program) and program speed (how fast your code runs). Python code does not have the advantage of easily being compiled to machine code and shared. In Python, you need to learn how to use virtual environments, and it is good to have an understanding of how to build and push a package to PyPI.
Context: This is the first in a series of 3 projects that focuses on setting up and using virtual environments, and creating a package. This is not intended to teach you everything, but rather, give you some exposure to the topics.
Scope: Python, virtual environments, PyPI
Dataset(s)
The following questions will use the following dataset(s):
- /anvil/projects/tdm/data/movies_and_tv/imdb.db
Questions
Question 1
This project will be focused on creating, updating, and understanding Python virtual environments. Since this is The Data Mine, we will pepper in some small data-related tasks, like writing functions to operate on data, but the focus is on virtual environments.
Let’s get started.
Use this article as your reference. First things first. We have a Jupyter notebook that we tend to work in, running our bash code in bash cells. This is very different from your typical environment. For this reason, let’s start by popping open a terminal, and working in the terminal.
You can open a terminal in JupyterLab by clicking on the blue "+" button in the upper left-hand corner of the Jupyter interface. Scroll down to the last row and click on the button that says "Terminal".
Start by taking a look at which python3 you are running. Run the following in the terminal.
which python3
Take a look at the available packages as follows.
python3 -m pip list
This doesn’t look right. It doesn’t look like our f2022-s2023 environment, does it? It doesn’t even have pandas installed. This is because we don’t have JupyterLab configured to have our f2022-s2023 version of Python pre-loaded in a fresh terminal session. In fact, with this project, we aren’t going to use that environment!
Instead, we are going to use the non-containerized version of Python that is running the JupyterLab instance itself! To load up this environment, run the following.
module load python/jupyterlab
Then, check out how things have changed.
which python3
python3 -m pip list
Looks like we are getting there! Let’s back up a bit and explain some things.
What does which python3 do? The which command prints the absolute path of the command that would be executed. In this case, running python3 would be the same as executing /anvil/projects/tdm/apps/python/3.10.5/bin/python3.
What does the python3 -m pip mean? The -m stands for module-name. In a nutshell, this ensures that the correct pip (the pip associated with the current python3) is used! This is important because, if you have many versions of Python installed on your system and environment variables aren’t correctly set, you could end up using a completely different pip associated with a completely different version of Python, which could cause all sorts of errors! To prevent this, it is safer to run python3 -m pip instead of just pip.
What does python3 -m pip list do? The python3 -m pip is the same as before. The list command is an argument you can pass to pip that lists the packages installed in the current environment.
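The same checks can be made from inside Python, which may help make the idea concrete. This is only a sketch: the helper name pip_command is made up for this example, and it builds the command rather than running it.

```python
import shutil
import sys

# Path that the shell would resolve for "python3" (None if it is not on PATH).
# This mirrors what `which python3` reports.
resolved_python3 = shutil.which("python3")

# Path of the interpreter that is running this very script.
current_interpreter = sys.executable


def pip_command(*args: str) -> list[str]:
    """Build a pip invocation tied to the running interpreter (hypothetical helper)."""
    return [current_interpreter, "-m", "pip", *args]


print("which python3  ->", resolved_python3)
print("sys.executable ->", current_interpreter)
print("pip command    ->", pip_command("list"))
```

Because the command starts with sys.executable, the pip that runs always belongs to the interpreter you are currently using, which is the whole point of the python3 -m pip habit.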
Perform the following operations.
- Use venv to create a new virtual environment called question01.
- Confirm that the virtual environment has been created by running the following.
  source question01/bin/activate
- This should activate your virtual environment. You will notice that python3 now points to an interpreter in your virtual environment directory.
  which python3
  Output:
  /path/to/question01/bin/python3
- In addition, you can see the blank slate when it comes to installed Python packages.
  python3 -m pip list
  Output:
  Package    Version
  ---------- -------
  pip        22.0.4
  setuptools 58.1.0
  WARNING: You are using pip version 22.0.4; however, version 22.2.2 is available.
  You should consider upgrading via the '/home/x-kamstut/question01/bin/python3 -m pip install --upgrade pip' command.
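If you are curious what a virtual environment actually is on disk, the standard library venv module can also be driven from Python. The sketch below is not the command this question asks for. It assumes a throwaway directory name (scratch_env, not part of this project) inside a temporary folder, and it skips installing pip to keep things quick.

```python
import tempfile
import venv
from pathlib import Path


def make_env(parent: Path, name: str) -> Path:
    """Create a bare virtual environment under `parent` and return its path."""
    env_dir = parent / name
    # with_pip=False skips bootstrapping pip, so this runs quickly and offline.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = make_env(Path(tmp), "scratch_env")
    # pyvenv.cfg is the small file that marks a directory as a virtual environment.
    cfg_text = (env_dir / "pyvenv.cfg").read_text()
    top_level = sorted(p.name for p in env_dir.iterdir())

print(cfg_text)
print(top_level)
```

The pyvenv.cfg file records which base interpreter the environment was created from, which becomes relevant later in this project.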
See here on how to properly include a screenshot in your Jupyter notebook. If you do not properly submit the screenshot, you will likely lose points, so please take a minute to read it.
- Screenshot showing source question01/bin/activate output.
- Screenshot showing which python3 output after activating the virtual environment.
- Screenshot showing python3 -m pip list output after activating the virtual environment.
Question 2
Okay, in question (1) you ran some commands and supposedly created your own virtual environment. You are possibly still confused on what you did or why — that is okay! Things will hopefully become more clear as you progress.
Read this section of the article provided in question (1). In your own words, explain 2 good reasons why virtual environments are important when using Python. Place these explanations in a markdown cell in your notebook.
We are going to create and modify and destroy environments quite a bit! Don’t be intimidated by messing around with your environment.
Okay, now that you’ve grokked why virtual environments are important, let’s try to see a virtual environment in action.
Activate your empty virtual environment from question (1) (if it is not already active). If you were to try to import the requests package, what do you expect would happen? If you were to deactivate your virtual environment and then try to import the requests package, what do you expect would happen?
Test out both! First activate your virtual environment from question (1), and then run python3 and try to import requests. Next, run deactivate to deactivate your virtual environment. Run python3 and try to import requests. Were the results what you expected? Please include 2 screenshots, 1 for each attempt at importing requests.
As you should hopefully see, virtual environments do work! When a certain environment is active, only a certain set of packages is made available! Pretty cool!
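If you want to check whether a package is importable without triggering an error, a small sketch like the following can help. The function name is_importable is made up for this example. The result for requests depends entirely on which environment is active when you run it, so no particular answer is claimed here.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the active interpreter can locate the top-level module."""
    # find_spec returns None when nothing on the search path provides the module.
    return importlib.util.find_spec(module_name) is not None


for name in ("sys", "requests"):
    print(f"{name}: importable = {is_importable(name)}")
```

Running this once with your environment active and once after deactivate is a quick way to compare the two situations side by side.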
- 1-2 sentences, per reason, on why virtual environments are important when using Python.
- 1 screenshot showing the attempt to import the requests library from within your question01 virtual environment.
- 1 screenshot showing the attempt to import the requests library from outside the question01 virtual environment.
Question 3
Create a Python script called imdb.py that accepts a single argument, id, and prints out the following.
python3 imdb.py imdb tt4236770
Title: Yellowstone
Rating: 8.6
You can use the following as your skeleton.
#!/usr/bin/env python3

import argparse
import sqlite3
import sys

from rich import print


def get_info(iid: str) -> None:
    """
    Given an imdb id, print out some basic info about the title.
    """
    conn = sqlite3.connect("/anvil/projects/tdm/data/movies_and_tv/imdb.db")
    cur = conn.cursor()

    # make a query (fill in code here)

    # print results
    print(f"Title: [bold blue]{title}[/bold blue]\nRating: [bold green]{rating}[/bold green]")


def main():
    parser = argparse.ArgumentParser()
    subparsers = parser.add_subparsers(help="possible commands", dest="command")
    some_parser = subparsers.add_parser("imdb", help="")
    some_parser.add_argument("id", help="id to get info about")

    if len(sys.argv) == 1:
        parser.print_help()
        sys.exit(1)

    args = parser.parse_args()

    if args.command == "imdb":
        get_info(args.id)


if __name__ == "__main__":
    main()
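If you have not written a parameterized sqlite3 query before, here is the general pattern as a minimal sketch. It runs against a tiny in-memory database, and the table name (demo_titles), its columns, and the sample row are all invented for this example. They are not the schema of imdb.db, which you will need to explore yourself.

```python
import sqlite3
from typing import Optional, Tuple


def lookup(conn: sqlite3.Connection, iid: str) -> Optional[Tuple[str, float]]:
    """Return (title, rating) for an id, or None if the id is not present."""
    cur = conn.cursor()
    # The ? placeholder lets sqlite3 handle quoting of the user-supplied value.
    cur.execute("SELECT title, rating FROM demo_titles WHERE id = ?", (iid,))
    return cur.fetchone()


# Build a throwaway in-memory database with a made-up table and row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_titles (id TEXT PRIMARY KEY, title TEXT, rating REAL)")
conn.execute("INSERT INTO demo_titles VALUES (?, ?, ?)", ("tt0000001", "Example Show", 7.5))

row = lookup(conn, "tt0000001")
if row is not None:
    title, rating = row
    print(f"Title: {title}\nRating: {rating}")
```

Passing the id as a parameter, rather than pasting it into the SQL string, is the habit worth carrying over to your own script.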
Deactivate any environment you may have active.
deactivate
Confirm that the proper python3 is active.
which python3
/anvil/projects/tdm/apps/python/3.10.5/bin/python3
Now test out your script by running the following.
python3 imdb.py imdb tt4236770
What happens? Well, the package rich should not be installed in our current environment. Easy enough to fix, right? After all, we know how to make our own virtual environments now!
Create a virtual environment called question03. This time, when creating your virtual environment, add an additional flag --copies to the very end of the command. Activate your virtual environment and confirm that we are using the correct environment.
source question03/bin/activate
which python3
Immediately trying the script again should fail, since we still don’t have the rich package installed.
python3 imdb.py imdb tt4236770
ModuleNotFoundError: No module named 'rich'
Okay! Use pip (using our python3 -m pip trick) to install rich and try to run the script again!
Not only should the script now work, but, if you take a look at the packages installed in your environment, there should be some new additions.
python3 -m pip list
Package    Version
---------- -------
commonmark 0.9.1
pip        22.0.4
Pygments   2.13.0
rich       12.6.0
setuptools 58.1.0
That is awesome! You just solved the issue of not being able to run some Python code because a package was not installed for you. You did this by first creating your own custom Python virtual environment, installing the required package to your virtual environment, and then executing the code that wasn’t previously working!
- Screenshot showing the activation of the question03 virtual environment, the pip install, and successful output of the script.
- Screenshot showing the resulting set of packages, python3 -m pip list, for the question03 virtual environment.
Question 4
Okay, let’s take a tiny step back to peek at a few underlying details of our question01 and question03 virtual environments.
Specifically, start with the question01 environment. The entire environment lives within that question01 directory, doesn’t it? Or does it!?
ls -la question01/bin
Notice anything about the contents of the question01 bin directory? They are symbolic links! python3 actually points to the same interpreter that was active when we created the virtual environment, the /anvil/projects/tdm/apps/python/3.10.5/bin/python3 interpreter! But wait, how do we have a different set of packages then, if we are using the same Python interpreter? The answer is that your Python interpreter looks in a variety of locations for your packages. Activating your virtual environment puts question01/bin first on your PATH, and a python3 started from there adds the environment’s own site-packages directory to the list of locations it searches.
If you run the following, you will see the list of directories that Python searches for packages, when importing.
import sys
sys.path
['', '/anvil/projects/tdm/apps/python/3.10.5/lib/python3.10/site-packages', '/anvil/projects/tdm/apps/python/3.10.5/lib/python3.10', '/anvil/projects/tdm/apps/python/3.10.5/lib/python310.zip', '/anvil/projects/tdm/apps/python/3.10.5/lib/python3.10', '/anvil/projects/tdm/apps/python/3.10.5/lib/python3.10/lib-dynload', '/home/x-kamstut/question01/lib/python3.10/site-packages']
sys.path is initialized from the PYTHONPATH environment variable, plus some additional installation-dependent defaults. If you take a peek in question01/lib/python3.10/site-packages, you will see where rich is located. So, even if you look in /anvil/projects/tdm/apps/python/3.10.5/lib/python3.10/site-packages and see that rich is not installed in that location, because Python searches all of those locations for rich and rich is installed in question01/lib/python3.10/site-packages, it will be successfully imported!
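To poke at this yourself, the sketch below prints the search path in order, reports whether the interpreter thinks it is running inside a virtual environment, and shows which file a given module would be loaded from. The helper names are made up for this example, and json is used only because it ships with Python.

```python
import importlib.util
import sys
from typing import Optional


def in_virtual_environment() -> bool:
    """True when the running interpreter was started from a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the original installation.
    return sys.prefix != sys.base_prefix


def module_location(name: str) -> Optional[str]:
    """Return the file a module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("Inside a virtual environment:", in_virtual_environment())
print("Search path, in order:")
for position, entry in enumerate(sys.path):
    print(f"  {position}: {entry!r}")
print("json would be loaded from:", module_location("json"))
```

Swapping "json" for the name of a package you installed shows exactly which site-packages directory supplied it.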
This raises the question: what if /anvil/projects/tdm/apps/python/3.10.5/lib/python3.10/site-packages has an older version of rich installed? Which version will be imported? Well, let’s test this out!
If you look at plotly in the jupyterlab environment, you will see it is version 5.8.2.
import plotly
plotly.__version__
5.8.2
Activate your question03 environment and install plotly==5.10.0. Re-run the following code.
import plotly
plotly.__version__
What is your output? Is that expected?
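To reason about your result, it can help to run a toy experiment on how Python resolves a name that exists in more than one directory on sys.path. The sketch below writes two copies of a made-up module (shadow_demo_mod, invented for this example) into two temporary directories, puts both directories on the search path, and reports which copy gets imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def first_match_version(module_name: str = "shadow_demo_mod") -> str:
    """Import a module that exists in two sys.path entries and report which one won."""
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        # Same module name, different version string in each directory.
        Path(first, f"{module_name}.py").write_text('__version__ = "1.0"\n')
        Path(second, f"{module_name}.py").write_text('__version__ = "2.0"\n')

        saved_path = list(sys.path)
        sys.path[:0] = [first, second]  # `first` now precedes `second`
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(module_name)
            return module.__version__
        finally:
            # Restore the interpreter state so the experiment leaves no trace.
            sys.path[:] = saved_path
            sys.modules.pop(module_name, None)


found_version = first_match_version()
print("Imported version:", found_version)
```

Compare the order of the two directories here with the order of the entries in your own sys.path output when you explain what you observed with plotly.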
We modified this question Thursday, October 27 due to a mistake by your instructor (Kevin). If you previously did this problem, no worries, you will get credit either way.
- Screenshots of your operations performed from start to finish for this question.
- 1-2 sentences explaining where Python looks for packages.
Question 5
Last, but certainly not least, is the important topic of pinning dependencies. This practice will allow someone else to replicate the exact set of packages needed to run your Python application.
By default, python3 -m pip install numpy will install the newest compatible version of numpy to your current environment. Sometimes, that version could be too new and create issues with old code. This is why pinning is important.
You can choose to install an exact version of a package by specifying the version. For example, you could install numpy version 1.16, even though the newest version is (as of writing) 1.23. Just run python3 -m pip install numpy==1.16.
This is great, but is there an easy way to pass an entire list of all of the packages in your current virtual environment? Yes! Yes there is! Try it out.
python3 -m pip freeze > requirements.txt
cat requirements.txt
That’s pretty cool! That is a specially formatted list containing a pinned set of packages. You could do the reverse as well. Create a new file called requirements.txt with the following contents copied and pasted.
commonmark==0.9.1
plotly==5.10.0
Pygments==2.13.0
requests==2.2.1
rich==12.6.0
tenacity==8.1.0
thedatamine==0.1.3
You can use the -r option of pip to install all of those pinned packages to an environment. Test it out! Create another new virtual environment called question05, activate the environment, and use the -r option and the requirements.txt file to install all of the packages, with the exact same versions. Double check that the results are the same, and that the installed packages are identical to the requirements.txt file.
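One way to do that double check without eyeballing two lists is sketched below. It assumes every line in the file has the simple name==version form used above (real requirements files can contain other syntax that this ignores), and the function names are made up for this example.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Extract name -> version from lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a simple exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def compare_with_installed(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each pinned name to (wanted version, installed version or None)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed in the active environment
        report[name] = (wanted, installed)
    return report


example_text = "commonmark==0.9.1\nrich==12.6.0\n"
pins = parse_pins(example_text)
for name, (wanted, installed) in compare_with_installed(pins).items():
    status = "OK" if wanted == installed else "MISMATCH"
    print(f"{name}: wanted {wanted}, installed {installed} -> {status}")
```

To use it on your own file, replace example_text with the contents of requirements.txt, for instance by reading the file with pathlib.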
Great job! Now, with some Python code and a requirements.txt file, you should be able to set up a virtual environment and run your friend’s or co-worker’s code! Very cool!
Unfortunately, there is more to this mess than meets the eye, and a lot more that can go wrong. But these basics will serve you well and help you solve lots and lots of problems!
- Screenshots showing the results of running the bash commands from the start of this question to the end.
Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.