Using the Neuralk Classifier
The Classifier is the simplest way to use Neuralk’s in-context
learning model for classification. It exposes the usual scikit-learn classifier
interface, so it can be dropped into any machine-learning pipeline.
WARNING
For this example to run, the environment variables NEURALK_USERNAME and
NEURALK_PASSWORD must be defined. They will be used to connect to the
Neuralk API.
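Before running the example, it can help to confirm that both variables are visible to the Python process. The check below is a minimal sketch that uses only the standard library; the helper name missing_credentials is hypothetical and not part of the neuralk package.

```python
import os

# The two variable names come from the warning above.
REQUIRED_VARS = ("NEURALK_USERNAME", "NEURALK_PASSWORD")


def missing_credentials(environ=os.environ):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; not part of the neuralk package.
    """
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_credentials()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("Neuralk credentials found in the environment.")
```

The check only inspects the environment; it does not contact the Neuralk API or validate the credentials.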
Simple example on toy data
We start by using the Classifier on simple data that needs no preprocessing.
Generate simple data:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification()
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(f"{X_train.shape=} {y_train.shape=} {X_test.shape=} {y_test.shape=}")
X_train.shape=(75, 20) y_train.shape=(75,) X_test.shape=(25, 20) y_test.shape=(25,)
Now we apply Neuralk’s classifier.
from sklearn.metrics import accuracy_score
from neuralk import Classifier
# Note: nothing actually happens during fit() -- in-context learning models are
# pretrained but require no fitting on our specific dataset.
classifier = Classifier().fit(X_train, y_train)
predictions = classifier.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")
Accuracy: 0.92
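Since the Classifier follows the scikit-learn classifier interface, it should be usable as the final step of a scikit-learn Pipeline. The sketch below shows that pattern with LogisticRegression as a stand-in estimator so it runs without Neuralk credentials. Swapping in neuralk's Classifier() is an assumption based on the interface described above, not something verified here; the scaling step and the fixed random_state values are also illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Fixed seeds make the sketch reproducible (an illustrative choice).
X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LogisticRegression is a stand-in for the final step. Assumption: an
# estimator with the scikit-learn interface, such as neuralk's Classifier(),
# could replace it here.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X_train, y_train)

predictions = pipeline.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")
```

Wrapping the estimator in a pipeline keeps any preprocessing fitted on the training split only, which matters once the data needs encoding or scaling.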
Working with non-numeric data
The Neuralk Classifier performs no preprocessing of its own. To handle more complex datasets, we first need to encode non-numeric columns as numbers and, where there are many features, possibly reduce the feature dimension. The example below shows a simple pipeline that yields good results for most datasets.
The example dataset contains descriptions and sale prices of houses. The prediction target is the sale price, binned into classes to turn the problem into a classification task.
import skrub
from neuralk import datasets
X, y = datasets.housing()
skrub.TableReport(X.assign(Sale_Price=y), max_plot_columns=100)
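The target returned by datasets.housing() is already binned. For readers who want to see what such a transformation can look like, here is a minimal sketch using quantile binning with pandas.qcut. The synthetic prices, the number of bins, and the choice of quantile binning are all assumptions made for illustration; the source does not say how the housing target was actually binned.

```python
import numpy as np
import pandas as pd

# Synthetic, log-normally distributed prices (illustrative data only,
# unrelated to the actual housing dataset).
rng = np.random.default_rng(0)
prices = pd.Series(
    rng.lognormal(mean=12.0, sigma=0.4, size=200), name="Sale_Price"
)

# Quantile binning: each class receives roughly the same number of rows.
# labels=False returns integer class codes 0..4 instead of intervals.
# The number of bins (5) is an arbitrary choice for this sketch.
price_class = pd.qcut(prices, q=5, labels=False)

print(price_class.value_counts().sort_index())
```

Quantile bins give balanced classes, which keeps accuracy easy to interpret; fixed-width bins (pandas.cut) would instead preserve the price scale at the cost of unbalanced classes.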