On Organizing Machine Learning Code

Data scientists who break their notebooks into standard functions (or modules) can improve their process without interrupting their workflow.

The following proposed organization of code would aid in:

  • faster deploy times
  • less code refactoring
  • interchangeable functions
  • easier code handoffs
  • reproducible results

How do you run this? Try the makeup framework (pypi.org), which is designed for exactly this.

"""
Code taken from: https://scikit-learn.org/stable/tutorial/statistical_inference/supervised_learning.html

It has been modified to encapsulate and label sections of code.
"""
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier


def load(**kwargs):
    """
    Get the data. Keeping this function sparse lets a framework
    swap out different data sets easily.
    """
    iris_X, iris_y = datasets.load_iris(return_X_y=True)
    return iris_X, iris_y


def features(X, y, **kwargs):
    """Optional method to transform data, add or remove columns"""
    # If I had pandas here, I would do something!
    return X, y


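def split(X, y, **kwargs):
    """Shuffle with a fixed seed and hold out the last 10 samples for testing"""
    np.random.seed(0)  # fixed seed keeps the split reproducible
    indices = np.random.permutation(len(X))
    X_train = X[indices[:-10]]
    y_train = y[indices[:-10]]
    X_test = X[indices[-10:]]
    y_test = y[indices[-10:]]

    return (X_train, y_train), (X_test, y_test)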


def train(X, y, **kwargs):
    """Create and fit a nearest-neighbor classifier"""
    model = KNeighborsClassifier()
    model.fit(X, y)

    return model


def predict(model, X, **kwargs):
    return model.predict(X)
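
The five functions above share a simple contract: load returns X and y, features and split pass them along, train returns a model, and predict consumes it. As a rough sketch of how a runner could chain them, the block below wires the stages together by name. It is not the makeup framework's API: run_pipeline, the stage dictionary, and the toy_* stand-in stages are all hypothetical names invented for illustration, and the stand-ins use plain Python lists so the sketch runs without scikit-learn installed.

```python
from typing import Any, Callable, Dict


def run_pipeline(stages: Dict[str, Callable[..., Any]], **kwargs: Any) -> Dict[str, Any]:
    """Chain the five stages in order (hypothetical helper, not makeup's API)."""
    X, y = stages["load"](**kwargs)
    X, y = stages["features"](X, y, **kwargs)
    (X_train, y_train), (X_test, y_test) = stages["split"](X, y, **kwargs)
    model = stages["train"](X_train, y_train, **kwargs)
    y_pred = stages["predict"](model, X_test, **kwargs)
    return {"model": model, "y_test": y_test, "y_pred": y_pred}


# Toy stand-in stages (assumptions for illustration only). They follow the
# same signatures as load/features/split/train/predict in the module above.
def toy_load(**kwargs):
    X = [[0.0], [0.1], [0.9], [1.0], [0.2], [0.8]]
    y = [0, 0, 1, 1, 0, 1]
    return X, y


def toy_features(X, y, **kwargs):
    return X, y  # no-op, like the original features()


def toy_split(X, y, n_test=2, **kwargs):
    # Hold out the last n_test samples; no shuffling in this toy version.
    return (X[:-n_test], y[:-n_test]), (X[-n_test:], y[-n_test:])


def toy_train(X, y, **kwargs):
    # The "model" is a threshold halfway between the two class means.
    mean0 = sum(x[0] for x, label in zip(X, y) if label == 0) / y.count(0)
    mean1 = sum(x[0] for x, label in zip(X, y) if label == 1) / y.count(1)
    return (mean0 + mean1) / 2


def toy_predict(model, X, **kwargs):
    return [int(x[0] > model) for x in X]


def scaled_features(X, y, **kwargs):
    # A drop-in replacement for the features stage: rescale the one column.
    return [[x[0] * 10] for x in X], y


stages = {
    "load": toy_load,
    "features": toy_features,
    "split": toy_split,
    "train": toy_train,
    "predict": toy_predict,
}

result = run_pipeline(stages)
print(result["y_pred"], result["y_test"])  # prints [0, 1] [0, 1]

# Swapping one stage leaves the other four untouched.
swapped = run_pipeline({**stages, "features": scaled_features})
print(swapped["y_pred"])  # prints [0, 1]
```

To run the scikit-learn version instead, the same dictionary could map "load", "features", "split", "train", and "predict" to the functions defined in the module above. Because every stage accepts **kwargs, a setting meant for one stage passes through the others without breaking them.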
