Phase 3: Scikit-learn Compatible API
Objective: To create a high-level, user-friendly Python class that matches the Scikit-learn API and uses the C++ binding for its computations.
Model Name: LogisticRegression
The primary user-facing class will be named LogisticRegression. This
decision is a conscious choice to signal to users that the class is intended
as a direct, high-performance, drop-in replacement for
sklearn.linear_model.LogisticRegression.
While this creates the potential for a name conflict if both are imported into the same namespace, this is a standard and well-understood aspect of the Python import system. Users who need to compare both can use a standard aliasing convention:
from glmpynet import LogisticRegression as GlmnetLogisticRegression
from sklearn.linear_model import LogisticRegression as SklearnLogisticRegression
The benefit of immediate user familiarity and seamless integration into existing Scikit-learn workflows far outweighs this manageable risk.
API Design Philosophy: A Hybrid Approach
A key design challenge for this project is balancing the user familiarity of
the Scikit-learn API with the unique, high-performance capabilities of the
underlying glmnet engine. This project will adopt a hybrid API to
provide the best of both worlds: simplicity by default, with power on demand.
Rationale:
This approach is superior because it serves two distinct user groups without compromising the experience for either:
The Scikit-learn User: For the majority of users, the goal is seamless integration. They want to use our
LogisticRegressionclass in their existingPipelineandGridSearchCVworkflows. By accepting standard parameters likeCandpenalty, we provide a frictionless, “drop-in” experience.The `glmnet` Power User: A user familiar with the R glmnet package knows that its real power lies in computing the entire regularization path efficiently. Our API provides an “escape hatch” for these users, allowing them to bypass the Scikit-learn conventions and pass
glmnet-native parameters directly to the C++ engine for maximum performance and control.
Implementation:
The internal fit method will be responsible for translating the user-provided
parameters into the format required by the C++ binding, prioritizing the
glmnet-native parameters if they are provided.
API Contract
The class will implement the standard Scikit-learn estimator interface.
``__init__(self, …)``: The constructor will accept both Scikit-learn and
glmnet-style parameters.def __init__(self, # --- Scikit-learn Style Parameters (The Default) --- penalty='l2', C=1.0, # --- Glmnet-Style Parameters (The "Escape Hatch") --- alpha=None, lambda_path=None, nlambda=100, # --- Other Glmnet Features --- standardize=True, # ... other glmnet parameters ): # ...
- Core Methods: The class will implement all standard methods:
fit(self, X, y)predict(self, X)predict_proba(self, X)get_params()/set_params()
Future Expansion
The glmnet library is capable of more than just logistic regression
(e.g., linear, Poisson, Cox regression). The API will be designed with
this in mind.
A potential future architecture would involve:
- A base class,
GlmNetEstimator, that handles the common logic of parameter translation and interaction with the C++ binding.
- A base class,
- Specific child classes for different models, such as
LogisticRegression, ElasticNet, etc., that inherit from the base class.
- Specific child classes for different models, such as
This ensures that as we expand the library’s functionality, we can do so in a clean, modular, and maintainable way.