Tuesday, January 10, 2023

Machine learning in e-learning - k-means clustering

Is it possible to use machine learning in e-learning? There are several aspects of e-learning where machine learning (ML) can be applied. It can be used directly in the learning content or on analytics. Since an LMS captures a large amount of data every day about the content (courses or modules), learners and trainers, analytics is where machine learning can be applied most readily.

Before we get to that, let’s understand what machine learning is in simple terms. Machine learning is a subset of Artificial Intelligence (AI). In machine learning, you write algorithms that allow the system to learn from the data provided. The system discovers features in the data and then uses them to draw a conclusion, perform an action or improve itself.

ML use cases in e-learning

Let’s take a simple example where machine learning can be applied in e-learning to get more insight from analytics. Let's assume we have an LMS that has 100 courses, 1000 learners and 20 trainers. Trainers are responsible for delivering blended learning to learners: they build e-learning content as well as conduct ILT sessions. We want to help trainers deliver more effectively, so we want to find out how many types of trainers we have. To do this, we need to cluster the trainers and then conduct a tailored train-the-trainer program for each cluster.

We have two data points for each trainer. The first is the average rating received from learners, which is a number from 0 to 10. The second is the performance of learners in topics handled by the trainer. This comes from the scores in the assessments at the end of those topics. Since this is a percentage score, it will be a number from 0 to 100. We will divide it by 10 to bring it to the same range as the ratings, i.e., 0 to 10.
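As a minimal sketch in Python, the scaling could look like the following. The function name `to_feature` and the sample values are made up for illustration; they are not taken from any real LMS data.

```python
def to_feature(avg_rating, avg_quiz_score_pct):
    """Return (rating, scaled quiz score), both on a 0 to 10 scale."""
    if not 0 <= avg_rating <= 10:
        raise ValueError("rating must be between 0 and 10")
    if not 0 <= avg_quiz_score_pct <= 100:
        raise ValueError("quiz score must be between 0 and 100")
    # Divide the percentage by 10 so both values share the same range.
    return (avg_rating, avg_quiz_score_pct / 10)


# Hypothetical values, for illustration only: (average rating, average quiz score %)
trainers = {"James": (8.2, 74.0), "Linda": (6.5, 91.0)}
features = {name: to_feature(r, s) for name, (r, s) in trainers.items()}
```

Keeping both values on the same scale matters because clustering compares trainers by distance. Without it, the quiz score would outweigh the rating simply because its numbers are larger.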

At this point, we do not know how many types of trainers we have. If two trainers have similar characteristics (a similar pair of values), we could say that they are of the same type for the purpose of training them.

E-learning data analytics

In order to analyse the data in the above scenario, we’ll use an algorithm called k-means clustering. It’s a simple technique used to group data, but we don’t tell the program how we want the data grouped. Instead, we let the program figure that out. If we told the program how to group the data, the result would reflect our own bias.

The table below contains the data of the 20 trainers. The data is plotted on the graph on the right side. To change the data and the plot, you can either adjust the values individually or click on the Randomize Data button. Once you are satisfied with the arrangement of the data on the plot, click on the Cluster by k-means button. To try again, change the values or randomize them to get a new set.

Trainer Name | Average quiz score of learners (in topics taught by them) | Average rating received (from learners)
James
Linda
William
Susan
Richard
Mary
Thomas
Charles
Lisa
Nancy
Emily
Sandra
Steven
Paul
Andrew
Kevin
Brian
George
Amy
Helen

Data plot:

Observations: The program grouped the data into the best clusters it could find, in an unsupervised manner. It is unsupervised because the program was not told what to look for; it simply puts the trainers into groups based on how the data is arranged. Since we did not create any labels, the program cannot tell whether a particular group of trainers is effective or not. It is now up to us to decide what kind of feedback and training each cluster of trainers should be given so that they can perform more effectively.

Note: There are several ways to determine the best number of clusters. This program uses a method called the knee / elbow technique, and its implementation may have some limitations.
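To give a feel for the elbow idea, here is a small Python sketch. It is one common heuristic and not necessarily what this program does. It assumes we already have a total variation for each number of clusters (for example, the sum of squared distances of every point from the mean of its cluster), and it picks the number of clusters at which the drop in variation slows down the most. The function name `elbow_k` and the numbers are hypothetical.

```python
def elbow_k(variation_by_k):
    """Pick the k at which the decrease in total variation slows the most.

    variation_by_k maps each number of clusters k to its total variation.
    """
    ks = sorted(variation_by_k)
    if len(ks) < 3:
        raise ValueError("need at least three values of k")
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = variation_by_k[prev_k] - variation_by_k[k]
        drop_after = variation_by_k[k] - variation_by_k[next_k]
        bend = drop_before - drop_after  # large when the curve flattens after k
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical totals, for illustration only.
totals = {2: 120.0, 3: 60.0, 4: 25.0, 5: 21.0, 6: 18.0, 7: 16.0}
chosen = elbow_k(totals)
```

With these made-up numbers the variation falls sharply up to 4 clusters and only slightly after that, so the sketch picks 4. Real data rarely has such a clean bend, which is one reason elbow-based choices have limitations.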

k-means clustering steps

Since we do not know how many good clusters are possible, the program will need to identify this for us. We will start off by trying 2 clusters and go all the way up to 10. We probably couldn't run more than 10 separate train-the-trainer programs anyway. The number of clusters is denoted as k. The term k-means comes from this and from the mean value computed for each cluster.
The basic construct of a k-means clustering algorithm is the following:

  1. Start off by trying to create 2 clusters. For this, take 2 random points on the chart.
  2. Now, take each trainer, find out which of these points they are closer to, and put them in that cluster.
  3. Take the positions of the trainers in each cluster that we just created and find their mean position. Then repeat step 2, i.e., look at each trainer again and recreate the clusters based on which mean position they are closer to.
  4. Repeat the process until the clusters no longer change. This will be our solution in this iteration.
  5. To check the quality of the clusters in the current solution, compute the variation in each cluster (for example, the sum of squared distances of its trainers from the cluster mean) and add them up to get the total variation of that solution.
  6. Repeat steps 1 to 5 several times (say 5-10 times), each time from new random points, and check which solution has the least total variation. This is the best solution for that value of k.
  7. Now do the above steps for different values of k. The total variation reduces as the value of k increases, but after a certain point each additional cluster reduces it by much less. The value of k at that point is taken as the optimal number of clusters.
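The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code behind the demo on this page. It assumes that the starting points are picked from the trainers' own positions (the steps only say "random points"), that "variation" means the sum of squared distances to the cluster mean, and that an empty cluster simply keeps its previous center. All function names and sample points are our own.

```python
import math
import random


def assign(points, centers):
    """Step 2: for each point, the index of the closest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


def mean_point(members):
    """Mean position of a non-empty list of points."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))


def total_variation(points, centers, labels):
    """Step 5: sum of squared distances of points to their cluster center."""
    return sum(math.dist(p, centers[c]) ** 2 for p, c in zip(points, labels))


def kmeans_once(points, k, rng, max_rounds=100):
    """Steps 1 to 4: one run from random starting points."""
    centers = rng.sample(points, k)  # assumption: start from k of the data points
    labels = assign(points, centers)
    for _ in range(max_rounds):
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(mean_point(members) if members else centers[c])
        centers = new_centers
        new_labels = assign(points, centers)
        if new_labels == labels:  # step 4: the clusters no longer change
            break
        labels = new_labels
    return centers, labels


def kmeans(points, k, restarts=10, seed=0):
    """Step 6: run several times and keep the solution with least variation."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        centers, labels = kmeans_once(points, k, rng)
        score = total_variation(points, centers, labels)
        if best is None or score < best[0]:
            best = (score, centers, labels)
    return best


# Hypothetical (rating, scaled quiz score) pairs, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
score, centers, labels = kmeans(points, k=2)
```

For step 7, you would call `kmeans` for each value of k from 2 to 10 and compare the total variation returned for each.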

k-means clustering real world examples

Clustering can reveal insights that you might otherwise miss. Let’s look at a couple of simple examples in e-learning:

  • Clustering learners by content preference

    There are different ways a learner can learn. This could be through watching videos, taking e-learning courses, attending classroom lectures, reading (books, articles, guides and reference material), interacting with simulations, playing games and performing role-plays, working on case studies or attending practice sessions and labs. When we create learning journeys, we usually assume that all learners would learn from and enjoy the same format of content, so we push content of the same format to all learners. However, that assumption is not correct. Learners prefer different formats. If we could cluster learners based on their preferences and provide them learning content in the format that they are most receptive to, our training could become much more effective.
  • Clustering learners by skill

    Simulations and scenario-based learning are environments in which the system responds to user behavior. In other words, they are not linear in nature. The combination of actions performed by learners when interacting with these types of modules can be captured as data points. Based on these data points, you are highly likely to find clusters of learners who act in similar ways, which reflects their skills or knowledge. These clusters can help you understand your learner profile better, improve the simulation, or re-train some clusters of learners differently.
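As a rough sketch of how actions could be captured as data points, the Python snippet below turns a learner's action log into a list of numbers that a clustering algorithm can compare. The action names, the function name `action_vector` and the sample log are all hypothetical; a real simulation would define its own.

```python
from collections import Counter


def action_vector(actions, vocabulary):
    """Share of each known action in a learner's log, in a fixed order."""
    counts = Counter(actions)
    total = len(actions) or 1  # avoid dividing by zero for an empty log
    return tuple(counts[action] / total for action in vocabulary)


# Hypothetical actions in a customer-service simulation.
vocabulary = ("greet", "probe", "offer_discount", "escalate")
log = ["greet", "probe", "probe", "escalate"]
vector = action_vector(log, vocabulary)
```

Using shares instead of raw counts keeps learners who took many steps comparable with those who took few. Whether that is the right choice depends on the simulation.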

Concluding thoughts…

By using simple machine learning techniques such as clustering in e-learning, we can gain insight into our data that was not available to us before. Maybe there is something out there that we have never known, hidden in some clusters. Let’s start implementing these techniques in some small way and change the way our learners learn.
