Deploying a Data Science Project

In this short tutorial I want to show how to deploy a small data science project. The project itself is only a dummy that simulates a forecast. The focus is on deploying a REST API and a simple frontend in a container on GCP. No GCP-specific features such as Vertex AI are used; this is only about a basic deployment. Likewise, the frontend is a plain HTML page, without Streamlit, Gradio, or other tools.

Setup

For the project I created a new GitHub repository and cloned it locally. How to do that is explained here, for example.

You can find the code for the project here: https://github.com/gochxx/restapi/

We also need a Python environment. I create it as follows:

conda create --name restapi python=3.10

The environment can be activated with

conda activate restapi

For the project we need to install a few packages:

pip install numpy
pip install pandas
pip install scikit-learn
pip install flask
pip install requests
pip install pytest

So that we can reinstall the dependencies later in the container, we generate a requirements.txt:

pip freeze > requirements.txt

Logging

To be able to trace errors cleanly, I use a logger in the project. It is initialized like this:

import logging

# Logger configuration
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.StreamHandler(),  # Output to the console
        logging.FileHandler("forecast.log")  # Output to a file
    ]
)

logger = logging.getLogger(__name__)  # Module-specific logger
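
To make the effect of the log level and the format string more tangible, here is a minimal sketch. It is not part of the project: the logger name `forecast_demo` and the in-memory stream are assumptions for illustration, and the timestamp is left out of the format so that the output stays reproducible.

```python
import io
import logging

# Hypothetical in-memory stream, so the demo does not create forecast.log
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))

logger = logging.getLogger("forecast_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo records away from the root logger

logger.debug("Not written, DEBUG is below the INFO threshold.")
logger.info("ForecastModel initialized.")
logger.warning("No training data provided.")

# Collect what the handler actually wrote
log_lines = stream.getvalue().splitlines()
print(log_lines)
```

Only the INFO and WARNING records end up in the stream, each prefixed with the logger name and level.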

The Forecast

For the forecast I create a class. When the class is initialized, a simple regression model is created.

class ForecastModel:
    """
    A class to represent a simple forecasting model using Linear Regression.

    Attributes:
        model (LinearRegression): The underlying scikit-learn Linear Regression model.
        is_trained (bool): Indicates whether the model has been trained.
    """

    def __init__(self) -> None:
        """
        Initializes the ForecastModel with a LinearRegression model.
        """
        self.model: LinearRegression = LinearRegression()
        self.is_trained: bool = False
        logger.info("ForecastModel initialized.")

In the next step we add a method that trains the model. Training data can either be passed in, or the method falls back to built-in dummy data. It first checks whether the model has already been trained. If not, the data is split into training and test sets, the model is fitted, and the MSE on the test set is computed.

    def train(self, X: Optional[np.ndarray] = None, y: Optional[np.ndarray] = None) -> None:

        """
        Trains the linear regression model with provided or default data.

        Args:
            X (Optional[np.ndarray]): The input features for training. Defaults to dummy data.
            y (Optional[np.ndarray]): The target values for training. Defaults to dummy data.

        Raises:
            ValueError: If the number of samples in X and y does not match.
        """

        if self.is_trained: # Check if the model is already trained
            logger.info("Model is already trained. Skipping re-training.")
            return

        if X is None or y is None: # Use default dummy data if no training data is provided
            logger.warning("No training data provided. Using default dummy data.")
            X = np.array([[i] for i in range(10)])  # Features
            y = np.array([2 * i + 1 for i in range(10)])  # Labels
        else: # Validate the input data
            if X.shape[0] != y.shape[0]: # Check if the number of samples in X and y match
                logger.error("The number of samples in X and y must match.")
                raise ValueError("The number of samples in X and y must match.")

        # Split data into training and testing sets
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

        logger.info("Starting model training...")
        self.model.fit(X_train, y_train) # Train the model 
        self.is_trained = True
        logger.info("Model training completed.")

        # Evaluate the model
        predictions = self.model.predict(X_test) # Make predictions
        mse = mean_squared_error(y_test, predictions) # Calculate Mean Squared Error
        logger.info(f"Model evaluation completed. Mean Squared Error: {mse:.4f}")
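
To see what to expect from the dummy data, here is a small sketch of the same idea (fit on a training part, compute the MSE on a held-out part) in plain Python. It deliberately does not use scikit-learn: `fit_line` is a hypothetical helper that implements one-feature least squares, and the split is a fixed 80/20 cut without shuffling, unlike `train_test_split`, which shuffles by default.

```python
from statistics import mean

# Same dummy data as in train(): y = 2 * x + 1 for x = 0..9
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]

# Assumption: fixed 80/20 split without shuffling
x_train, x_test = xs[:8], xs[8:]
y_train, y_test = ys[:8], ys[8:]


def fit_line(x, y):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    variance = sum((a - mx) ** 2 for a in x)
    slope = covariance / variance
    return slope, my - slope * mx


slope, intercept = fit_line(x_train, y_train)

# Evaluate on the held-out part, as train() does with X_test and y_test
predictions = [slope * x + intercept for x in x_test]
mse = mean((p - t) ** 2 for p, t in zip(predictions, y_test))

print(f"slope={slope}, intercept={intercept}, test MSE={mse}")
```

Because the dummy data lies exactly on a line, any training subset recovers a slope of 2 and an intercept of 1, and the test MSE is zero up to floating-point error.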

Besides training, we also add a predict method. It applies the model to produce a forecast from new data.

    def predict(self, features: List[List[float]]) -> List[float]:
        """
        Predicts target values for the given features.

        Args:
            features (List[List[float]]): A list of feature vectors for prediction.

        Returns:
            List[float]: A list of predicted values.

        Raises:
            ValueError: If the model is not trained or if the input format is invalid.
        """
        if not self.is_trained:
            logger.error("Prediction attempted on an untrained model.")
            raise ValueError("Model is not trained yet. Please train the model before making predictions.")

        # Validate and prepare input features
        features_array = np.array(features)
        if features_array.ndim != 2 or features_array.shape[1] != 1:
            logger.error(f"Invalid input format for prediction: {features}")
            raise ValueError("Input features must be a list of lists, each containing a single value.")

        # Make predictions
        logger.info(f"Making predictions for input: {features}")
        predictions = self.model.predict(features_array)

        # Validate predictions
        if not isinstance(predictions, np.ndarray) or not np.issubdtype(predictions.dtype, np.number):
            logger.error(f"Unexpected model output: {predictions}")
            raise ValueError(f"Unexpected model output: {predictions}")

        logger.info(f"Predictions completed successfully: {predictions.tolist()}")
        return predictions.tolist()

The REST API

We build the REST API with Flask. It is one of the simplest ways to serve a REST API in Python.

from flask import Flask, request, jsonify
from forecast import ForecastModel
import logging
from typing import Any, Dict, Tuple

# Logging configuration
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")


app = Flask(__name__)

# Create and train the model
model = ForecastModel()
model.train()


@app.route("/predict", methods=["POST"])
def predict() -> Tuple[Dict[str, Any], int]:
    """
    Predict endpoint for the Flask API.

    Expects a JSON payload with a 'features' key containing a list of lists.
    Returns a JSON response with the model predictions or an error message.

    Returns:
        Tuple[Dict[str, Any], int]: A JSON response and HTTP status code.
    """
    data = request.get_json()

    # Input validation
    if not data or "features" not in data:
        logging.error("Invalid input: Missing 'features' key")
        return jsonify({"error": "Invalid input, 'features' key required"}), 400

    try:
        # Validate the features
        features = data["features"]
        if not isinstance(features, list) or not all(isinstance(f, list) and len(f) == 1 for f in features):
            logging.error(f"Invalid input format: {features}")
            return jsonify({"error": "Features must be a list of lists, each containing a single value."}), 400

        # Prediction
        prediction = model.predict(features)

        # Validate the output
        if not isinstance(prediction, list) or not all(isinstance(p, float) for p in prediction):
            logging.error(f"Unexpected model output: {prediction}")
            return jsonify({"error": "Unexpected model output. Please contact support."}), 500

        return jsonify({"prediction": prediction}), 200

    except ValueError as e:
        logging.error(f"Value error: {str(e)}")
        return jsonify({"error": f"Value error: {str(e)}"}), 400

    except Exception as e:
        logging.error(f"Unexpected error: {str(e)}")
        return jsonify({"error": "An unexpected error occurred. Please contact support."}), 500


if __name__ == "__main__":
    app.run(debug=True)
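
The shape rule the endpoint enforces (a list of lists, each holding exactly one value) can also be pulled out into a small function, which makes it easy to try without starting Flask. This is only a sketch: `is_valid_features` is a hypothetical helper that does not exist in the project, and the numeric check is an extra assumption on my part that the endpoint above does not perform.

```python
from numbers import Real
from typing import Any


def is_valid_features(features: Any) -> bool:
    """Return True if features is a list of single-element lists of real numbers."""
    if not isinstance(features, list):
        return False
    return all(
        isinstance(row, list)
        and len(row) == 1
        and isinstance(row[0], Real)  # assumption: stricter than the endpoint
        and not isinstance(row[0], bool)  # bool is a subclass of int in Python
        for row in features
    )


print(is_valid_features([[5], [10], [15]]))  # well-formed payload
print(is_valid_features([[5], [10, 20]]))    # second row has two values
print(is_valid_features("5"))                # not a list at all
```

Note that an empty list passes this check, just as it passes the check in the endpoint; there it is rejected one step later by the shape check in predict.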

Manually Testing the REST API

To test the REST API locally and manually, you can create the following Python script. It has to run in a separate terminal while the Flask app is running in another terminal.

"""
This script is for testing the API manually.
To run it, the Flask app in "app.py" must be started first. This script then has to run in a separate terminal.
"""

import json

import requests

# API endpoint (base URL of the Flask app)
BASE_URL = "http://127.0.0.1:5000"

# Example data for the request
data = {
    "features": [[5], [10], [15]]  # Input data for the prediction
}

# Send the POST request
try:
    response = requests.post(f"{BASE_URL}/predict", json=data)
    response.raise_for_status()  # Raises an exception if the server returned an HTTP error status

    # Print the response
    print("Response from the server:")
    print(json.dumps(response.json(), indent=4))  # Pretty-printed JSON response
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
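
If requests is not available, the same call can be sketched with the standard library alone. The following is an illustration under assumptions, not part of the project: `build_predict_request` and `send` are hypothetical helper names, and `send` only works while the Flask app is running locally, so it is defined here but not called.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # Flask's default local address


def build_predict_request(features):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=5.0):
    """Send the request and decode the JSON response; needs the running app."""
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_predict_request([[5], [10], [15]])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With the app running, `send(req)` should return the same JSON that the requests-based script prints.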

The forecast can be tested manually as well. For that I use the following script:

"""
This script is for manually testing the forecast in the script "forecast.py".
"""

import numpy as np

from forecast import ForecastModel

# Instantiate the model
model = ForecastModel()

# Prepare example data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

# Train the model
print("Training starts...")
model.train(X, y)
print("Training finished.")

# Test the prediction
print("Running predictions...")
features = [[6], [7], [8]]
predictions = model.predict(features)

print(f"Features: {features}")
print(f"Predictions: {predictions}")

Testing

This script tests the code. It uses the Python package pytest.

import pytest
from forecast import ForecastModel
from flask.testing import FlaskClient


@pytest.fixture
def trained_model() -> ForecastModel:
    """
    Fixture that provides a trained ForecastModel.

    Returns:
        ForecastModel: A trained instance of ForecastModel.
    """
    model = ForecastModel()
    model.train()
    return model


def test_model_training(trained_model: ForecastModel) -> None:
    """
    Tests whether the model is correctly marked as trained after training.

    Args:
        trained_model (ForecastModel): A trained model.
    """
    assert trained_model.is_trained is True, "Model should be trained after calling train()"


def test_model_prediction_shape(trained_model: ForecastModel) -> None:
    """
    Tests whether the predictions have the same length as the input.

    Args:
        trained_model (ForecastModel): A trained model.
    """
    features = [[5], [10], [15]]
    predictions = trained_model.predict(features)
    assert len(predictions) == len(features), "Predictions should match the number of inputs"


def test_model_prediction_values(trained_model: ForecastModel) -> None:
    """
    Tests whether the predictions are a list of floats.

    Args:
        trained_model (ForecastModel): A trained model.
    """
    features = [[5], [10]]
    predictions = trained_model.predict(features)
    assert isinstance(predictions, list), "Predictions should be a list"
    assert all(isinstance(x, float) for x in predictions), "All predictions should be floats"


def test_untrained_model_prediction() -> None:
    """
    Tests whether an error is raised when an untrained model is asked to predict.
    """
    model = ForecastModel()
    with pytest.raises(ValueError, match="Model is not trained yet."):
        model.predict([[5]])


@pytest.fixture
def client() -> FlaskClient:
    """
    Fixture that provides a test client for the Flask API.

    Returns:
        FlaskClient: A test client of the Flask application.
    """
    from app import app
    with app.test_client() as client:
        yield client


def test_api_valid_input(client: FlaskClient) -> None:
    """
    Tests the API with valid input data.

    Args:
        client (FlaskClient): A test client of the Flask application.
    """
    response = client.post("/predict", json={"features": [[5], [10]]})
    assert response.status_code == 200
    data = response.get_json()
    assert "prediction" in data
    assert isinstance(data["prediction"], list)


def test_api_invalid_input(client: FlaskClient) -> None:
    """
    Tests the API with invalid input data.

    Args:
        client (FlaskClient): A test client of the Flask application.
    """
    response = client.post("/predict", json={"wrong_key": [[5], [10]]})
    assert response.status_code == 400
    data = response.get_json()
    assert "error" in data

def test_api_invalid_input_values(client: FlaskClient) -> None:
    """
    Tests the API with invalid input data; here the JSON values are invalid rather than the key.

    Args:
        client (FlaskClient): A test client of the Flask application.
    """
    response = client.post("/predict", json={"features": [[5], [10,20]]})
    assert response.status_code == 400
    data = response.get_json()
    assert "error" in data

def test_api_unexpected_error(client: FlaskClient, monkeypatch: pytest.MonkeyPatch) -> None:
    """
    Tests the API when the model is not available.

    Args:
        client (FlaskClient): A test client of the Flask application.
        monkeypatch (pytest.MonkeyPatch): Restores the patched attribute after the test.
    """
    import app as app_module
    # Temporarily remove the model; monkeypatch restores it when the test ends
    monkeypatch.setattr(app_module, "model", None)
    response = client.post("/predict", json={"features": [[5], [10]]})
    assert response.status_code == 500
    data = response.get_json()
    assert "error" in data

The Frontend

I show how to build a frontend here.

Deployment

I show how to package everything into a container here, and how to deploy that container to the cloud here.
