Using LogisticRegression in a Pipeline
This notebook shows how to use LogisticRegression in a scikit-learn Pipeline with a preprocessing step, such as StandardScaler, for binary classification.
Setup
We use a synthetic dataset for reproducibility.
[1]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from glmpynet import LogisticRegression
# Generate synthetic dataset
X, y = make_classification(n_samples=200, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and fit pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('logistic_net', LogisticRegression())
])
pipeline.fit(X_train, y_train)
# Predict and evaluate
y_pred = pipeline.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Pipeline Accuracy: {accuracy:.2f}")
Pipeline Accuracy: 0.88
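A fitted pipeline keeps each step accessible by the name given at construction, which is useful for checking what the scaler learned or for getting class probabilities instead of hard labels. The sketch below is illustrative only: it uses scikit-learn's own LogisticRegression as a stand-in so that it runs without glmpynet installed, and it assumes (not confirmed here) that the glmpynet estimator exposes predict_proba in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression  # stand-in (assumption), not glmpynet
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same synthetic data and split as in the cell above
X, y = make_classification(n_samples=200, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('logistic_net', LogisticRegression()),
])
pipeline.fit(X_train, y_train)

# Look up a fitted step by the name used when building the pipeline
scaler = pipeline.named_steps['scaler']
scaler_means = scaler.mean_  # per-feature means learned from X_train only

# The classifier inside the pipeline sees the scaled features, not the raw ones
X_test_scaled = scaler.transform(X_test)

# Class probabilities, one row per test sample and one column per class
proba = pipeline.predict_proba(X_test)
print(proba[:3])
```

Because the scaler is fitted inside pipeline.fit, its statistics come from the training split alone, so no information from the test split leaks into preprocessing.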
Explanation
The pipeline combines StandardScaler for feature scaling and LogisticRegression for classification.
The dataset is the same as in the basic example, ensuring consistency.
Accuracy is similar to the basic example but may improve slightly due to scaling.
With glmnet, expect comparable integration but potentially better performance on high-dimensional data.
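One way to check the claim about scaling is to compare cross-validated accuracy with and without the scaler. The following is a minimal sketch, again using scikit-learn's LogisticRegression as a stand-in (an assumption made so the example runs without glmpynet); the cv=5 and max_iter=1000 settings are illustrative choices, not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression  # stand-in (assumption), not glmpynet
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same synthetic dataset as above
X, y = make_classification(n_samples=200, n_features=20, n_classes=2, random_state=42)

# Pipeline with scaling: the scaler is refit on the training part of every fold
scaled_model = Pipeline([
    ('scaler', StandardScaler()),
    ('logistic_net', LogisticRegression(max_iter=1000)),
])

# Baseline without scaling
unscaled_model = LogisticRegression(max_iter=1000)

scaled_scores = cross_val_score(scaled_model, X, y, cv=5, scoring='accuracy')
unscaled_scores = cross_val_score(unscaled_model, X, y, cv=5, scoring='accuracy')

print(f"With scaling:    {scaled_scores.mean():.2f} +/- {scaled_scores.std():.2f}")
print(f"Without scaling: {unscaled_scores.mean():.2f} +/- {unscaled_scores.std():.2f}")
```

Averaging over folds gives a steadier comparison than a single 80/20 split, where a difference of one or two test samples can move the accuracy by a few points.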