How to Use Scikit-learn in Excel (Python)

This guide will explain how to use the Scikit-Learn library in Excel.

Scikit-learn, also known as sklearn, is a popular Python library used in machine learning. The library includes a variety of functions useful for data analysis, modeling, and predictive analytics.

Beginner analysts who are already familiar with Excel may want to explore more advanced data analysis techniques available in Scikit-learn.

Luckily, Excel has recently launched a feature that allows users to run Python code in Excel using the PY function. 

With this function, users can take advantage of the Scikit-learn library to help with various tasks, including data pre-processing, building models, evaluating models, and more.

In this guide, we will provide a step-by-step tutorial on how to use the Scikit-learn library in Excel.

The Anatomy of the PY Function

The syntax of the PY function is as follows:

=PY(python_code,return_type) 

Let’s look at each argument to understand how to use the PY function.

  • PY() refers to our PY function. This function allows us to run Python code on a secure Microsoft Cloud runtime.
  • python_code refers to the Python code you wish to run. This value must be static text.
  • The return_type parameter allows us to specify the desired return type. A value of 0 indicates an Excel value, while a value of 1 indicates a Python object.
  • Do note that the PY function cannot be used with any other Excel functions.

A Real Example of Using Scikit-Learn in Excel

Let’s explore a basic classification use case where we can apply the Scikit-learn library in Excel.

Importing Scikit-learn library

To import the Scikit-Learn library, we can select a cell and type “=PY(“ to convert the formula bar into a Python editor.

We can type “import sklearn” to add the Scikit-Learn library into our Microsoft Cloud Python environment.

import scikit-learn in excel as a library

In the example above, we’ve added a line in our Python code to import Scikit-learn, numpy, and pandas. 

The latter two are popular Python libraries that will help us with handling our data later.

Training and Evaluating a Model

Now that we have imported the Scikit-learn library, we can start performing data analytics on a dataset.

For this example, we’ll use the Iris dataset that comes with the Scikit-Learn library. 

Our goal is to train and evaluate a model that will classify a flower’s species based on the following features:

  • Sepal length
  • Sepal width
  • Petal length
  • Petal width

We’ll use the following Python code to load our dataset, train a model, and later evaluate the accuracy of our model:

# Load the Iris dataset

iris = sklearn.datasets.load_iris()

X = iris.data  # Features (sepal length, sepal width, petal length, petal width)

y = iris.target  # Target variable (species)

#Split dataset

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.85, random_state=42)

# Initialize the K-Nearest Neighbors (KNN) classifier with k=3

knn_classifier = sklearn.neighbors.KNeighborsClassifier(n_neighbors=3)

# Train the model on the training data

knn_classifier.fit(X_train, y_train)

# Make predictions on the test data

y_pred = knn_classifier.predict(X_test)

# Calculate accuracy

accuracy = sklearn.metrics.accuracy_score(y_test, y_pred)

accuracy

add python code

After evaluating the code, we’ve determined that our model has a 0.96 (96%) accuracy in predicting the species based on the provided features.

Click on the link below to create your own copy of our examples.

Head to the next section to read our step-by-step tutorial on how to use the Scikit-learn library in Excel.

How to Use Scikit-Learn in Excel

In this step-by-step guide, we will use Scikit-learn to train and evaluate a basic classification model.

  1. To access the Python Editor within Excel, select a blank cell and type “=PY(“ to set the cell to Edit mode. You may also use the keyboard shortcut Ctrl+Shift+Alt+P.
    to use scikit-learn in Excel, type the PY functionYou may also Insert Python code in Excel by selecting Formulas > Insert Python > Python in Excel.
    select Insert Python option
  2. Next, we’ll import the necessary libraries for our basic classification task.
    use scikit-learn in Excel by importing the library
  3. We’ll then add Python code to load the Iris dataset. The X variable will contain features such as sepal length, while the y variable will contain the target value or the value we want to predict.
    load iris dataset
  4. Next, we’ll use the train_test_split function to split our dataset into training and testing data.
    use scikit-learn in Excel to split datasetThe training data will be used to train our model, and the testing data is what we’ll use to determine the model’s accuracy.
  5. Choose the model you would like to train. In our example, we’ll use a K-Nearest Neighbors classifier. We’ll use the classifier’s fit() function to train the classifier on our training data.
    select the machine learning model to use
  6. Once we’ve trained our classifier, we can use the predict() function to predict the target variable given our test set.
    use the predict functionWe can then use the accuracy_score function to determine how accurate our prediction is.

These are all the steps you need to know to start using Scikit-Learn in Excel.

FAQs

  1. Why is the PY Function not accessible to me?
    As of September 2023, you’ll need to join the Microsoft 365 Insider Program to use the Python in Excel feature. This program allows all subscribers to have access to the latest features in Excel. Once you’ve signed up for the Insider program, choose the Beta Channel Insider level to access Python in Excel features. 

  2. Do I need to install Python to use Python in Excel?
    No, it is not required for users to have Python installed on their device to use the PY function. The Python in Excel feature runs the code in the Microsoft Cloud. Because of this implementation, the PY function requires internet access to run.

To learn more about using Excel for data analytics, you can read our post on how to add a regression line to a scatterplot graph. You may also be interested in our guide on how to perform multiple linear regression in Excel.

That’s all for this guide! Be sure to check out our library of spreadsheet resources, tips, and tricks! 

Get emails from us about Excel.

Our goal this year is to create lots of rich, bite-sized tutorials for Excel users like you. If you liked this one, you'd love what we are working on! Readers receive early access to new content.

0 Shares:
Leave a Reply

Your email address will not be published. Required fields are marked *

You May Also Like