Tips and Tricks for Working with scikit-learn, an ML Library in Python

Bilal Muhammad

10 months ago

1. Importing scikit-learn modules

from sklearn import module_name  # replace module_name with a real submodule, e.g. datasets or metrics

2. Splitting data into training and testing sets

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
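The snippet above assumes X and y already exist. As a minimal runnable sketch, the version below uses the bundled iris dataset as a stand-in for your own data and adds stratify=y, an optional argument that keeps class proportions the same in both splits.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in data: 150 samples, 4 features, 3 balanced classes
X, y = load_iris(return_X_y=True)

# stratify=y preserves the class proportions in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```

Stratifying is mostly useful for classification problems with small or imbalanced classes; for regression targets it is usually left out.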

3. Preprocessing data using scalers

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
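The key point in the snippet above is that the scaler is fitted on the training data only and then reused on the test data. The sketch below illustrates this with a tiny made-up array (the numbers are illustrative, not from any real dataset).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from training data
X_test_scaled = scaler.transform(X_test)        # reuses the training mean and std

print(scaler.mean_)                  # per-feature means learned from X_train
print(X_train_scaled.mean(axis=0))   # approximately zero for each feature
print(X_test_scaled)                 # test values expressed in training-set units
```

Calling fit_transform on the test set instead would leak test statistics into preprocessing and make the evaluation look better than it should.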

4. Creating and fitting a model

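from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
model = RandomForestClassifier(n_estimators=100, random_state=42)
# Estimate generalization with 5-fold cross-validation on the training set
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
# Fit the final model on the full training set
model.fit(X_train, y_train)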
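cross_val_score returns one score per fold, so it is common to report the mean and spread. Here is a self-contained sketch, again assuming the iris dataset as placeholder data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)  # placeholder data
model = RandomForestClassifier(n_estimators=100, random_state=42)

# cv=5 trains and scores the model on five different train/validation folds
scores = cross_val_score(model, X, y, cv=5)

print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Note that cross_val_score fits clones of the estimator, so model itself is still unfitted afterwards and needs its own fit call before predicting.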

5. Evaluating model performance

from sklearn.metrics import accuracy_score, confusion_matrix
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # a separate name avoids shadowing the imported function
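To make the two metrics concrete, here is a small worked example with hand-written labels (the label values are made up for illustration). Rows of the confusion matrix are true classes and columns are predicted classes.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for a binary problem
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

accuracy = accuracy_score(y_true, y_pred)  # 4 of 6 predictions are correct
cm = confusion_matrix(y_true, y_pred)

print(round(accuracy, 3))  # 0.667
print(cm)
# [[2 1]
#  [1 2]]
```

Here two class-0 samples and two class-1 samples are classified correctly, with one mistake in each direction.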

6. Hyperparameter tuning using GridSearchCV

from sklearn.model_selection import GridSearchCV
param_grid = {'n_estimators': [100, 200, 300], 'max_depth': [None, 5, 10]}
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
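After fitting, the search object exposes more than best_params_. The sketch below, which assumes the iris dataset and a deliberately small grid to keep it fast, shows how best_score_ and best_estimator_ might be used.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Small illustrative grid: 2 x 2 = 4 candidate settings
param_grid = {"n_estimators": [10, 50], "max_depth": [None, 3]}
grid_search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)  # best combination found
print(grid_search.best_score_)   # mean cross-validated score of that combination

# With the default refit=True, best_estimator_ is refitted on all of X_train
best_model = grid_search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
```

The test set is only used once, after the search, so the reported test accuracy is not influenced by the tuning.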

7. Handling imbalanced datasets with resampling

from sklearn.utils import resample
# Upsample the minority class (sample with replacement up to the majority size)
X_minority_upsampled, y_minority_upsampled = resample(X_minority, y_minority, replace=True, n_samples=len(X_majority), random_state=42)
# Downsample the majority class (sample without replacement down to the minority size)
X_majority_downsampled, y_majority_downsampled = resample(X_majority, y_majority, replace=False, n_samples=len(X_minority), random_state=42)
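The snippet above assumes the data has already been split by class. A minimal end-to-end sketch of upsampling follows, using synthetic data with a hypothetical 90/10 class split.

```python
import numpy as np
from sklearn.utils import resample

# Synthetic imbalanced data: 90 samples of class 0, 10 of class 1
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# Split by class
X_majority, y_majority = X[y == 0], y[y == 0]
X_minority, y_minority = X[y == 1], y[y == 1]

# Upsample the minority class with replacement to match the majority size
X_min_up, y_min_up = resample(
    X_minority, y_minority, replace=True, n_samples=len(X_majority), random_state=42
)

# Recombine into one balanced dataset
X_balanced = np.vstack([X_majority, X_min_up])
y_balanced = np.concatenate([y_majority, y_min_up])

print(np.bincount(y_balanced))  # [90 90]
```

Resampling is best applied to the training split only, so that the test set keeps the original class distribution.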

8. Saving and loading models

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
# Save the trained model
joblib.dump(model, 'model.pkl')
# Load the saved model
loaded_model = joblib.load('model.pkl')
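A quick way to check that a round trip through disk worked is to compare predictions before and after loading. The sketch below assumes the iris dataset and writes to a temporary directory so it leaves no files behind.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)  # placeholder data
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, path)          # serialize the fitted model
    loaded_model = joblib.load(path)  # restore it from disk

# The restored model should reproduce the original predictions exactly
same_predictions = bool((loaded_model.predict(X) == model.predict(X)).all())
print(same_predictions)  # True
```

Saved models are pickles under the hood, so only load files from sources you trust, and ideally load them with the same scikit-learn version that produced them.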

9. Feature selection using SelectKBest

from sklearn.feature_selection import SelectKBest, f_classif
selector = SelectKBest(score_func=f_classif, k=10)
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)
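To see which features were kept, the fitted selector provides get_support(). The following sketch assumes the iris dataset and uses k=2, since that dataset has only four features.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

data = load_iris()  # placeholder data with named features
X, y = data.data, data.target

selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

# Boolean mask over the original columns: True where a feature was kept
mask = selector.get_support()
selected_names = [name for name, keep in zip(data.feature_names, mask) if keep]

print(X_selected.shape)  # (150, 2)
print(selected_names)
```

The per-feature ANOVA F-scores are available in selector.scores_ if you want to inspect the ranking rather than only the final selection.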

10. Handling missing values with SimpleImputer

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X_train_imputed = imputer.fit_transform(X_train)
X_test_imputed = imputer.transform(X_test)
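As with scaling, the imputer learns its fill values from the training data and reuses them on the test data. A small sketch with made-up numbers shows the effect:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data with missing entries marked as NaN
X_train = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])
X_test = np.array([[np.nan, np.nan]])

imputer = SimpleImputer(strategy="mean")
X_train_imputed = imputer.fit_transform(X_train)  # column means: 2.0 and 6.0
X_test_imputed = imputer.transform(X_test)        # filled with the training means

print(X_train_imputed)
# [[1. 6.]
#  [3. 4.]
#  [2. 8.]]
print(X_test_imputed)
# [[2. 6.]]
```

Other strategies such as "median" or "most_frequent" may be a better fit when a column has outliers or holds categorical codes.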

Conclusion

These are just a few tips and tricks to get you started with scikit-learn. Remember to refer to the scikit-learn documentation for more detailed information and additional functionality.