
Neuralk-AI Classification Expert Module

The previous examples showed how to use the Neuralk Classifier to solve classification problems with Neuralk’s foundational model. However, as we saw, the Classifier is a generic building block, and for complex use cases it should be integrated into a full data-processing pipeline. One option is to build our own pipeline locally and use the Classifier in it. An alternative is to rely on one of the fully integrated, end-to-end workflows available on the Neuralk platform (at the moment, classification and product categorization).

This is a different usage of the platform, in which we rely on it more heavily to create and manage projects and datasets, run data-processing workflows and store the resulting models and prediction results.

Here we illustrate the practical aspects of the platform (how to connect to it, upload data etc.) on a toy example. More information on use-cases and support is available on the Neuralk website.

WARNING

For this example to run, the environment variables NEURALK_USERNAME and NEURALK_PASSWORD must be defined. They will be used to connect to the Neuralk API.

The first step is to create a Neuralk client that we will use to interact with the platform. Note that we chose to make our username and password available through environment variables, but you can use other approaches to load them.

import os
from neuralk import Neuralk

client = Neuralk(os.environ['NEURALK_USERNAME'], os.environ['NEURALK_PASSWORD'])
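If either variable is missing, the `os.environ[...]` lookup above fails with a bare `KeyError`. As a minimal sketch of one alternative, the hypothetical helper below (`load_credentials` is not part of the neuralk package) reads both variables and reports every missing one in a single, clearer error.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password), or raise if either variable is missing.

    Hypothetical helper for illustration; it is not part of neuralk.
    """
    names = ("NEURALK_USERNAME", "NEURALK_PASSWORD")
    # Treat unset and empty variables the same way.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variable(s): " + ", ".join(missing)
        )
    return env[names[0]], env[names[1]]
```

With this helper, the client could be created as `Neuralk(*load_credentials())`, which passes the same two positional arguments as the call above.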

Next, we load the two-moons dataset that ships with the neuralk package; it follows scikit-learn’s make_moons and simulates a binary classification task.

from neuralk.datasets import two_moons

moons_data = two_moons()
print(moons_data["path"])
/Users/aabraham/neuralk/src/neuralk/datasets/data/moons.csv

We want to learn to classify this data with Neuralk. All analyses that run on the platform happen within a “project”, so we must first create a project and upload a dataset to it.

project = client.projects.create("MoonsExample", exist_ok=True)

dataset = client.datasets.create(
    project,
    "MoonsExample",
    moons_data["path"],
)

client.datasets.wait_until_complete(dataset, verbose=True)
Dataset(id='e416598a-84b3-419e-8047-a0cecf8e177f', name='MoonsExample', file_name='moons.csv', status='OK', analysis_list=[])
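The upload is processed on the platform, so the script waits until the dataset reports a final status (`'OK'` in the output above). The client's own `wait_until_complete` is the right tool for this. Purely to illustrate the general idea, here is a generic polling loop; it says nothing about how the library implements waiting, and `fetch_status`, the default interval and the timeout are all assumptions made for this sketch.

```python
import time


def wait_for_status(fetch_status, done=("OK",), poll_interval=1.0, timeout=60.0):
    """Call fetch_status() until it returns a value in `done`, or time out.

    `fetch_status` is a hypothetical zero-argument callable that returns
    the current status string of some remote job.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        # Sleep between checks so we do not flood the server with requests.
        time.sleep(poll_interval)
```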

Now, we fit a classification pipeline. Note that no long-running training is happening, as the core of the pipeline is the pretrained foundational Neuralk model.

We specify the column to predict (“label”) and the features to use.

analysis_fit = client.analysis.create_classifier_fit(
    dataset,
    "Two Moons Classifier",
    target_column="label",
    feature_column_name_list=["feature1", "feature2"],
)

analysis_fit = client.analysis.wait_until_complete(analysis_fit, verbose=True)
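The target and feature names passed to the fit must match columns in the uploaded file. A quick local check before uploading can catch typos early. The sketch below uses only the standard library; `missing_columns` is a hypothetical helper, and it assumes a comma-separated file whose first row is the header.

```python
import csv


def missing_columns(csv_path, required):
    """Return the names in `required` that are absent from the CSV header.

    Hypothetical helper; assumes the first row of the file is the header.
    """
    with open(csv_path, newline="") as f:
        # An empty file yields an empty header, so every name is reported.
        header = next(csv.reader(f), [])
    return [name for name in required if name not in header]
```

For the example above, `missing_columns(moons_data["path"], ["feature1", "feature2", "label"])` would be expected to return an empty list.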

Once the fit has completed, we can refer to it to make predictions on unseen data. Here, to keep the example simple, we just apply it to the same data that we used for training.

analysis_predict = client.analysis.create_classifier_predict(
    dataset, "Two Moons Prediction", analysis_fit
)
analysis_predict = client.analysis.wait_until_complete(analysis_predict, verbose=True)
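Predicting on the training data gives an optimistic picture of model quality. In a real use case we would hold out some rows for evaluation. The sketch below shows one way to split a CSV file into train and test files locally, using only the standard library. `split_csv`, its default test fraction and its seed are assumptions made for illustration, not part of the neuralk package.

```python
import csv
import random


def split_csv(src_path, train_path, test_path, test_fraction=0.25, seed=0):
    """Shuffle the rows of a CSV file and write train/test files.

    Both output files keep the original header.
    Returns (n_train, n_test). Hypothetical helper for illustration.
    """
    with open(src_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    for path, subset in ((test_path, rows[:n_test]), (train_path, rows[n_test:])):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(subset)
    return len(rows) - n_test, n_test
```

Each file could then presumably be uploaded as its own dataset with `client.datasets.create`, fitting on the train dataset and predicting on the test dataset. That workflow has not been verified here.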

Finally, we can download the prediction results.

import tempfile
from pathlib import Path

from sklearn.metrics import accuracy_score
import polars as pl

with tempfile.TemporaryDirectory() as results_dir:
    client.analysis.download_results(analysis_predict, folder_path=results_dir)
    results_file = next(Path(results_dir).iterdir())
    y_pred = pl.read_parquet(results_file)["label"].to_numpy()

X = pl.read_csv(moons_data["path"])
y = X["label"].to_numpy()
X = X.drop("label").to_numpy()

acc = accuracy_score(y, y_pred)
print(f"Accuracy of classification: {acc}")
Accuracy of classification: 0.984
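For reference, the accuracy reported above is simply the fraction of rows whose predicted label equals the true label. A plain-Python sketch of the same computation, with made-up labels that are not the moons data:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of empty inputs")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels for illustration only: 3 of the 4 predictions match.
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```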

We finish by plotting the results.

import numpy as np
import matplotlib.pyplot as plt

plt.rcParams.update(
    {
        "axes.edgecolor": "#4d4d4d",
        "axes.linewidth": 1.2,
        "axes.facecolor": "#f5f5f5",
        "figure.facecolor": "white",
    }
)

fig, axes = plt.subplots(1, 2, figsize=(11, 5), dpi=120)
titles = ["Ground Truth", f"Model Prediction\nAccuracy: {acc:.2f}"]
colors = ["#1a73e8", "#ffa600"] # Professional blue & orange

for idx, ax in enumerate(axes):
    # Left panel shows the true labels, right panel the predicted ones
    labels = y if idx == 0 else y_pred
    for lab in np.unique(labels):
        ax.scatter(
            X[labels == lab, 0],
            X[labels == lab, 1],
            s=70,
            marker="o",
            c=colors[lab],
            edgecolors="white",
            linewidths=0.8,
            alpha=0.9,
            label=f"Class {lab}" if idx == 0 else None,  # Legend only on first panel
            zorder=3,
        )

    # Aesthetics
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_aspect("equal")
    ax.set_title(titles[idx], fontsize=14, weight="bold", pad=12)
    ax.grid(False)

    # Pad the axes limits so points do not touch the frame
    x_margin = 0.4
    y_margin = 0.4
    ax.set_xlim(X[:, 0].min() - x_margin, X[:, 0].max() + x_margin)
    ax.set_ylim(X[:, 1].min() - y_margin, X[:, 1].max() + y_margin)

    # Panel annotation (A, B)
    ax.text(
        0.05,
        0.98,
        chr(ord("A") + idx),
        transform=ax.transAxes,
        fontsize=16,
        fontweight="bold",
        va="top",
        ha="right",
    )

# Shared legend beneath the plots
handles, labels_ = axes[0].get_legend_handles_labels()
fig.legend(
    handles,
    labels_,
    loc="lower center",
    ncol=2,
    frameon=False,
    fontsize=12,
    bbox_to_anchor=(0.5, 0.02),
)

fig.tight_layout()
plt.subplots_adjust(bottom=0.05)
plt.show()

Total running time of the script: (0 minutes 21.339 seconds)
