.. _phase_3_sklearn_api:

Phase 3: Scikit-learn Compatible API
====================================

**Objective:** To create a high-level, user-friendly Python class that matches the Scikit-learn API and uses the C++ binding for its computations.

Model Name: ``LogisticRegression``
----------------------------------

The primary user-facing class will be named ``LogisticRegression``. The name is chosen deliberately to signal that the class is intended as a direct, high-performance, drop-in replacement for ``sklearn.linear_model.LogisticRegression``.

This creates the potential for a name conflict if both classes are imported into the same namespace, but that is a standard and well-understood aspect of the Python import system. Users who need to compare both can use a standard aliasing convention:

.. code-block:: python

   from glmpynet import LogisticRegression as GlmnetLogisticRegression
   from sklearn.linear_model import LogisticRegression as SklearnLogisticRegression

The benefit of immediate user familiarity and seamless integration into existing Scikit-learn workflows far outweighs this manageable risk.

API Design Philosophy: A Hybrid Approach
----------------------------------------

A key design challenge for this project is balancing the user familiarity of the Scikit-learn API with the unique, high-performance capabilities of the underlying ``glmnet`` engine. This project will adopt a **hybrid API** to provide the best of both worlds: simplicity by default, with power on demand.

**Rationale:**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This approach serves two distinct user groups without compromising the experience for either:

1. **The Scikit-learn User:** For the majority of users, the goal is seamless integration. They want to use our ``LogisticRegression`` class in their existing ``Pipeline`` and ``GridSearchCV`` workflows. By accepting standard parameters like ``C`` and ``penalty``, we provide a frictionless, "drop-in" experience.

2. **The glmnet Power User:** A user familiar with the R ``glmnet`` package knows that its real power lies in computing the entire regularization path efficiently. Our API provides an "escape hatch" for these users, allowing them to bypass the Scikit-learn conventions and pass ``glmnet``-native parameters directly to the C++ engine for maximum performance and control.

**Implementation:**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The internal ``fit`` method will be responsible for translating the user-provided parameters into the format required by the C++ binding. If ``glmnet``-native parameters are provided, they take priority over the Scikit-learn style parameters.

API Contract
------------

The class will implement the standard Scikit-learn estimator interface.

* ``__init__(self, ...)``: The constructor will accept both Scikit-learn and ``glmnet``-style parameters.

  .. code-block:: python

     def __init__(self,
                  # --- Scikit-learn Style Parameters (The Default) ---
                  penalty='l2',
                  C=1.0,

                  # --- Glmnet-Style Parameters (The "Escape Hatch") ---
                  alpha=None,
                  lambda_path=None,
                  nlambda=100,

                  # --- Other Glmnet Features ---
                  standardize=True,
                  # ... other glmnet parameters
                  ):
         ...  # constructor body elided in this design sketch

* **Core Methods:** The class will implement all standard methods:

  * ``fit(self, X, y)``
  * ``predict(self, X)``
  * ``predict_proba(self, X)``
  * ``get_params()`` / ``set_params()``

Future Expansion
----------------

The ``glmnet`` library is capable of more than just logistic regression (e.g., linear, Poisson, Cox regression). The API will be designed with this in mind. A potential future architecture would involve:

* A base class, ``GlmNetEstimator``, that handles the common logic of parameter translation and interaction with the C++ binding.
* Specific child classes for different models, such as ``LogisticRegression``, ``ElasticNet``, etc., that inherit from the base class.

This ensures that as we expand the library's functionality, we can do so in a clean, modular, and maintainable way.
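
The base-class idea and the parameter translation described earlier can be illustrated with a minimal sketch. It is not the project's implementation: the method name ``_translate_params``, the ``family`` attribute, and the returned dictionary layout are all hypothetical. The mapping ``lambda = 1 / (C * n_samples)`` is one plausible translation, assuming the usual Scikit-learn objective (penalty plus ``C`` times the summed loss) and the usual ``glmnet`` objective (mean loss plus ``lambda`` times the elastic-net penalty). It ignores differences in standardization and intercept handling. Leaving ``lambda_path`` unset in ``glmnet`` mode so that the engine builds its own path is likewise an assumption.

.. code-block:: python

   class GlmNetEstimator:
       """Hypothetical base class holding the shared translation logic."""

       family = None  # hypothetical attribute; set by child classes

       def __init__(self, penalty="l2", C=1.0, alpha=None,
                    lambda_path=None, nlambda=100, standardize=True):
           self.penalty = penalty
           self.C = C
           self.alpha = alpha
           self.lambda_path = lambda_path
           self.nlambda = nlambda
           self.standardize = standardize

       def _translate_params(self, n_samples):
           """Map user parameters to glmnet-style values (illustrative only)."""
           glmnet_mode = self.alpha is not None or self.lambda_path is not None

           # glmnet-native alpha takes priority over the sklearn-style penalty.
           if self.alpha is not None:
               alpha = float(self.alpha)
           elif self.penalty == "l1":
               alpha = 1.0  # pure lasso
           elif self.penalty == "l2":
               alpha = 0.0  # pure ridge
           else:
               raise ValueError(f"Unsupported penalty: {self.penalty!r}")

           # glmnet-native lambda_path takes priority over the sklearn-style C.
           if self.lambda_path is not None:
               lambda_path = [float(v) for v in self.lambda_path]
           elif glmnet_mode:
               lambda_path = None  # assumption: engine builds nlambda values
           else:
               lambda_path = [1.0 / (self.C * n_samples)]

           return {
               "family": self.family,
               "alpha": alpha,
               "lambda_path": lambda_path,
               "nlambda": self.nlambda,
               "standardize": self.standardize,
           }


   class LogisticRegression(GlmNetEstimator):
       family = "binomial"  # glmnet's family name for logistic regression


   # Scikit-learn style: a single lambda derived from C and the sample count.
   sklearn_style = LogisticRegression(penalty="l1", C=0.5)
   print(sklearn_style._translate_params(n_samples=200))

   # glmnet style: native parameters are passed through unchanged.
   glmnet_style = LogisticRegression(alpha=0.5, lambda_path=[0.1, 0.01])
   print(glmnet_style._translate_params(n_samples=200))

Because the sample count is only known once data arrives, a translation of this kind would naturally live inside ``fit`` rather than the constructor, which matches the responsibility assigned to ``fit`` in this document.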