References

Amalgamation of code, commands, formulas, or tidbits I find myself repeatedly googling the syntax of

Python

Dict <–> JSON

import json
with open('data.json', 'r') as fp:
    data = json.load(fp)

with open('data.json', 'w') as fp:
    json.dump(data, fp, sort_keys=True, indent=4)

Read a .jsonl

import json

file_path = 'data.jsonl'
with open(file_path, 'r', encoding='utf-8') as file:
    for line in file:
        data = json.loads(line)
        print(data)

Write a .jsonl

import json

with open('output.jsonl', 'w') as outfile:
    for dict in list_of_dicts:
        json.dump(dict, outfile)
        outfile.write('\n')

Pretty print a dictionary/JSON

print(json.dumps(ur_dict, indent=4))

Page through a compressed JSON (credit to levy5674!)

import gzip
import json
import boto3

def stream_objects(filename):
    with gzip.open(filename, 'rt') as f:
        for line in f:
            yield json.loads(line)

json_path = "json_path.ndjson.gz"
for i, line in enumerate(stream_objects(json_filepath)):
    print(line.keys())
    break

List comprehension with conditionals why are they like this w/ different conditionals this annoys me

new_list = [x for x in list]
new_list = [x for x in list if x <3]
new_list = [x if x <3 else y for x in list ]

Debugging Interpreter

import IPython; IPython.embed()

Time something in seconds

import time
start = time.time()
function_to_time()
duration = time.time() - start
print(duration)

Or with a decorator! (credit to Suresh Kumar!)

from functools import wraps
import time


def timeit(func):
    @wraps(func)
    def timeit_wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        total_time = end_time - start_time
        print(f'Function {func.__name__}{args} {kwargs} Took {total_time:.4f} seconds')
        return result
    return timeit_wrapper


@timeit
def calculate_something(num):
    """
    Simple function that returns sum of all numbers up to the square of num.
    """
    total = sum((x for x in range(0, num**2)))
    return total

if __name__ == '__main__':
    calculate_something(10)
    calculate_something(100)
    calculate_something(1000)
    calculate_something(5000)
    calculate_something(10000)

Jupyter Magic Commands

%matplotlib inline

%load_ext autoreload
%autoreload 2

Startup Script (#!/bin/bash not needed if it’s not a script)

#!/bin/bash
jupyter lab \
    --port=8888 \
    --allow-root \
    --NotebookApp.token='' \
    --NotebookApp.password=''

Add conda env to jupyter

python -m ipykernel install --user --name <ur_env_name_here> --display-name "<ur display name here>"

Bash

find a file with wildcards

find . -name "*.csv"

unzip file

unzip file.zip -d destination_folder

Conda

Create new env

conda create --name <ur_env_name_here> python=3.8

Delete old env

conda env remove --name <ur_env_name_here> 

Git commands

Remove large file from commit history - careful with this one:

git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch <filepath>’ HEAD

Checkout branch from remote

git checkout -b <branch_name> origin/<branch_name>

See staged changes

git diff --cached

See changes from latest commit

git diff HEAD~ HEAD

See changes from last X commits

git diff HEAD~X HEAD

Copy files to and from remote

From your local machine:

scp /path/to/local/file user@example.com:/home/name/dir

scp user@example.com:/home/name/dir/file /path/to/local/dir

You can replace user@example.com with predefined aliases in .ssh/config

F1 vs Precision vs Recall vs Accuracy

Precision - percentage of positive predictions that were correct - True Positive / (False Positives + True Positives)

Recall - percentage of positive class that was correctly identified - True Positive / (False Negatives + True Positives)

Accuracy - percentage of predictions that were correct - True Positive + True Negatives / (False Positives + False Negatives + True Positives + True Negatives)

F1 - 2 * (Precision * Recall) / (Precision + Recall) - harmonic mean of precision and recall

i.e. vs e.g.

I.e. stands for id est or ‘that is’ and is used to clarify the statement before it. E.g. means exempli gratia or ‘for example. ‘