Machine Learning Explainability: complete walkthrough using a detailed use-case

Do machine learning models really act like black boxes? For most people, especially those who are not data scientists, the answer is “Yes”.

However, this is not completely true. With proper structuring and critical thinking, one can explain the predictions or decisions made by a machine learning model. In this article, I share a hybrid framework, built on the concepts of machine learning explainability, that can be used to explain a so-called black-box model. The framework works from interpretations derived from a trained model. Unlike descriptive analysis, which looks for key insights in the raw data, this approach relies on model behaviours and characteristics such as relative feature importances, permutation importances, partial dependencies, and SHAP values.

Understanding What Makes Crowdfunding Projects Successful Using ML Explainability

Crowdfunding is the practice of funding a project or venture by raising monetary contributions from many people across the globe. A number of organisations, such as DonorsChoose.org, Patreon, and Kickstarter, host crowdfunding projects on their platforms. Kickstarter has hosted more than 250,000 projects on its website, with more than $4 billion raised collectively.

Crowdfunding is one of the most popular ways to raise funds, but not every project reaches its goal. In fact, on Kickstarter, only about 35% of all projects have been successfully funded in the past. This raises an important question: which projects are able to achieve their goal? In other words, can project owners know in advance which key project characteristics increase the chances of success?

In many studies, researchers and analysts have used descriptive analysis on crowdfunding data to obtain insights related to project success, while many others have applied predictive modelling to estimate the probability of project success. However, both approaches have fundamental problems: descriptive analysis gives only surface-level insights, while in predictive analysis the models act as black boxes.

Contents

  1. Business Use-Case and Problem Statement
  2. Hypothesis Generation
  3. Dataset Preparation
  4. Modelling Project Success
  5. Model Interpretation: Insights Generation
    a. Most important features of a project? ( Relative Feature Importance )
    b. Features having the biggest impact on success? ( Permutation Importance )
    c. How do changes in features affect project success? ( Partial Dependencies )
    d. Digging deeper into the decisions made by the model ( SHAP values )
  6. Final Conclusions

1. Understanding the Business Use Case

The essential business use-cases in the crowdfunding scenario can be considered from two different perspectives: the project owner’s and the company’s.

a. From the project owner’s perspective, it is highly beneficial to know the key characteristics that greatly influence a project’s success. For instance, it would be useful to know the answers to the following questions in advance:

  • What is an ideal and optimal range of the funding goal for my project?
  • On which day of the week should I post the project on Kickstarter?
  • How many keywords should I use in my project title?
  • What should be the total length of my project description?

b. Companies that host crowdfunding projects, such as DonorsChoose.org, Patreon, and Kickstarter, receive hundreds of thousands of project proposals every year. A large amount of manual effort is required to screen each project before it is approved to be hosted on the platform. This creates challenges related to scalability, consistency of project vetting across volunteers, and identification of projects which require special assistance.

Because of these two perspectives, there is a need to dig deeper and find more intuitive insights related to project success. Using these insights, more people can get their projects funded more quickly and at less cost to the hosting companies. It also allows the hosting companies to optimize their processes and channel even more funding directly to projects.

2. Hypothesis Generation

Hypothesis generation is a powerful technique that helps an analyst structure an insightful and relevant solution to a business problem. It is the process of building an intuition about the business problem before even looking at the available data. Whenever I start on a new business problem, I try to make a comprehensive list of all the factors that could drive the final output: for example, which features should affect my predictions, or which values of those features will give the best possible result. In the case of crowdfunding, the question is which features matter most in deciding whether a project will be successful.

So, to generate hypotheses for this use-case, we write down a list of factors (without looking at the available data) that could be important for modelling project success.

  1. Total amount to be raised: a larger goal may decrease the chances that the project will be successful.
  2. Total duration of the project: projects that are active for very short or very long periods may be less successful.
  3. Theme of the project: people may be more willing to donate to a project with a good cause or theme.
  4. Writing style of the project description: if the message is not clear, the project may not get fully funded.
  5. Length of the project description: very long texts may not perform as well as shorter, crisper ones.
  6. Project launch time: a project launched on a weekday, as compared to a weekend or holiday, may not reach its full funding amount.

This is an incomplete list of the factors we can think of at this stage that may influence project success. Using machine learning interpretability, we can now try to understand not only which features are actually important but also which values of those features matter.

3. Dataset Preparation

Dataset Used: https://www.kaggle.com/kemical/kickstarter-projects

In this dataset, a number of features describe the active stage of a project, meaning the project has already been launched on a particular date and a partial amount has already been raised. Our problem statement is a little different: we want to focus on the stage at which the project has not been launched yet, predict whether it will succeed, and find the most important features (and feature values) that influence this outcome. So we perform some pre-processing in this step, which includes the following:

  • Get rid of unwanted columns (active stage columns)
  • Feature Engineering (driven from our hypothesis generation)
  • Remove Duplicates
  • Handle Missing Values
  • Encode the Categorical Features

Feature Engineering (Driven from Hypothesis Generation)

  1. Project Name / Description Features: In our hypotheses, we suggested that how the project name or description is written may affect the success of the project. This dataset does not include project descriptions, so we create features from the project name only: Number of Words Used, Number of Characters Used, Number of Syllables Used (Difficult Words).
  2. Project Launched Date Features: We also suggested that when the project is first launched can affect its success. So we create some date- and time-related features: Launched Day, Month, Quarter, Week; Total Duration of the Project; whether the project was launched on a weekday or a weekend; whether the project was launched on a holiday or a regular day.
  3. Project Category Features: These are high-level features that describe the category and sub-category of the project. We also add some extra information about each category, such as its popularity, calculated from the total number of projects posted in it. Category Count and Sub-Category Count: how many projects are generally posted in those categories. This indicates whether the project belongs to a more generic category or is more of a rare project. Category / Sub-Category Mean Goal: the average goal set in those categories or sub-categories. This indicates whether the project’s goal is much higher or much lower than the standard mean goal of that category.
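The name and date features listed above could be derived roughly as follows. This is only a minimal sketch using the Python standard library: the function names, the dictionary keys, and the example project are made up for illustration, and the syllable counter is a crude vowel-group heuristic rather than whatever the original notebook used. The holiday flag is left out because it needs a holiday calendar.

```python
import re
from datetime import date


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def name_features(name: str) -> dict:
    """Features derived from the project name only."""
    words = name.split()
    return {
        "num_words": len(words),
        "num_chars": len(name),
        "syllable_count": sum(count_syllables(w) for w in words),
    }


def date_features(launched: date, deadline: date) -> dict:
    """Features derived from the launch date and the deadline."""
    return {
        "launched_day": launched.weekday(),            # 0 = Monday ... 6 = Sunday
        "launched_month": launched.month,
        "launched_quarter": (launched.month - 1) // 3 + 1,
        "launched_week": launched.isocalendar()[1],    # ISO week number
        "is_weekend": int(launched.weekday() >= 5),
        "duration_days": (deadline - launched).days,
    }


# Hypothetical project, used only to exercise the functions.
example = {
    **name_features("Solar Powered Backpack"),
    **date_features(date(2017, 3, 4), date(2017, 4, 3)),
}
print(example)
```

Keeping each feature group in its own function makes it easy to trace every column back to the hypothesis that motivated it.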

For Category and Main Category, I have used LabelEncoder. Some people may argue that a label encoder is not a perfect choice here and that a one-hot encoder should be used instead. But in our use-case we are just trying to understand the effect of a column as a whole, so a label encoder is sufficient. Now we can generate the count- and aggregation-based features for the main category and sub-category.
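To make the encoding and the aggregation features concrete, here is a small standard-library sketch. The five projects and the field names are invented for illustration. The integer codes follow sorted category order, which is also how scikit-learn's LabelEncoder assigns them.

```python
from collections import Counter, defaultdict

# Hypothetical projects; only the fields needed for this sketch.
projects = [
    {"category": "Music", "goal": 2000.0},
    {"category": "Music", "goal": 4000.0},
    {"category": "Games", "goal": 10000.0},
    {"category": "Music", "goal": 3000.0},
    {"category": "Games", "goal": 20000.0},
]

# Label encoding: one integer code per distinct category, in sorted order.
codes = {cat: i for i, cat in enumerate(sorted({p["category"] for p in projects}))}

# Count-based feature: how many projects were posted in each category.
counts = Counter(p["category"] for p in projects)

# Aggregation-based feature: mean goal per category.
goal_sums = defaultdict(float)
for p in projects:
    goal_sums[p["category"]] += p["goal"]
mean_goal = {cat: goal_sums[cat] / counts[cat] for cat in counts}

for p in projects:
    cat = p["category"]
    p["category_code"] = codes[cat]
    p["category_count"] = counts[cat]
    p["mean_category_goal"] = mean_goal[cat]
    # How far this project's goal sits from the typical goal in its category.
    p["diff_mean_category_goal"] = p["goal"] - mean_goal[cat]

print(projects[0])
```

The same pattern applies to the sub-category column; only the grouping key changes.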

4. Modelling Project Success

Now, with all these features prepared, we are ready to train our model. We will train a single random forest regression model for this task. There are of course many other models available, such as LightGBM or XGBoost, but in this article I am focusing not on the evaluation metric but on the insights from predictive modelling. We now have a model which predicts the probability that a given project will be successful. In the next section, we will interpret the model and its predictions. In other words, we will try to prove or disprove our hypotheses.
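A training step along these lines might look like the sketch below. It makes several assumptions: it needs NumPy and scikit-learn, the data is synthetic (three made-up columns standing in for goal, duration, and number of words), and it uses RandomForestClassifier with predict_proba to obtain a success probability. A regressor fitted on 0/1 labels would give a comparable score.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared feature table (assumption, not the real data).
rng = np.random.default_rng(0)
n = 500
goal = rng.uniform(500, 50_000, n)
duration = rng.integers(10, 91, n)
num_words = rng.integers(1, 15, n)
X = np.column_stack([goal, duration, num_words])

# Synthetic label: smaller goals and shorter durations succeed more often.
score = -goal / 25_000 - duration / 60 + num_words / 10 + rng.normal(0, 0.5, n)
y = (score > np.median(score)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Probability of the "successful" class for each held-out project.
success_prob = model.predict_proba(X_test)[:, 1]
print(success_prob[:5])
```

The fitted model is all the later interpretation steps need; they only query its predictions.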

5. Insights from Predictive Modelling

5.1 Which are the most important features (relatively) of a project? (Relative Feature Importance)

In tree-based models such as random forest, a number of decision trees are trained. During the tree-building process, one can compute how much each feature decreases the weighted impurity (or increases the information gain) in a tree. In a random forest, the impurity decrease from each feature is averaged across the trees, and the features are ranked according to this measure. This is called relative feature importance. The more an attribute is used to make key decisions in the decision trees, the higher its relative importance. A high value indicates that the feature is one of the important features required to make accurate predictions.
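The quantity being averaged, the weighted impurity decrease of a single split, can be computed by hand. The sketch below uses Gini impurity on six made-up projects with binary success labels. The data and the thresholds are assumptions chosen so that one feature separates the classes perfectly and the other barely helps.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)  # equals 1 - p^2 - (1 - p)^2 for two classes


def impurity_decrease(values, labels, threshold):
    """Parent impurity minus the size-weighted impurity of the two children."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted_children


# Hypothetical toy data: success depends on the goal, not on the launch day.
goal = [1000, 2000, 3000, 20000, 30000, 40000]
launched_day = [0, 5, 1, 6, 2, 4]
success = [1, 1, 1, 0, 0, 0]

goal_gain = impurity_decrease(goal, success, threshold=10000)
day_gain = impurity_decrease(launched_day, success, threshold=3)
print(goal_gain, day_gain)
```

A forest sums such decreases over every split that uses a feature and averages them over its trees, which is why a feature that is split on often, and high in the trees, ranks near the top.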

Most Important (Relative): Goal | NumChars | LaunchedWeek | DiffMeanCategoryGoal | Duration | SyllableCount | LaunchedMonth | NumWords | LaunchedDay | MeanCategoryGoal

Least Important (Relative): Music | Theater | Fashion | Comics | Games | Publishing | Technology | Film & Video | Food | Crafts | Design | Dance | Art | Photography | Journalism

  • From the graph, it is clear that the features most important for predicting project success are the project goal, the length of the project name, the launched week, the duration, and the number of syllables in the name. The least important features are mostly related to the project categories.
  • What does this mean for the project owner? Anyone looking to raise funds should evaluate the ideal project goal and duration. A high or medium-high project goal may well lead to failure. Additionally, the number of characters used in the project title also affects whether the project succeeds or fails.
  • What does this mean for the company? The company can identify high-importance projects based on their meta-features, such as the length of the project.

With this approach, we obtained the high-level factors to look at, but we still need to answer what the optimal values of these features are. This will be answered when we apply other techniques in the next sections. Before moving on to those techniques, I wanted to explore relative feature importance a little more from a graph theory perspective.

5.2 Which features have the biggest impact on project success? (Permutation Importance)

In the last section, we identified, at a very high level, which features are relatively important to the model outcome. In this section, we will go a little deeper and understand which features have the biggest impact on the model predictions (in an absolute sense). One way to identify such behaviour is to use permutation importance.

The idea of permutation importance is very straightforward. After a model is trained, its outcomes are obtained. The values of one feature are then randomly shuffled, and the most important features are those for which shuffling leads to the biggest drops in the accuracy of the model outcomes. Let’s look at the permutation importance of the features of our model.

  • This is an interesting plot. The features shown at the top, in green, are the most important: if their values are randomized, the outcome performance suffers.
  • The top features are mostly the ones we saw in the relative importance section, but with this graph we can quantify the amount of importance associated with each of them. We can also identify the least important ones, for example the launched week or whether the launch fell on a weekend.
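The shuffling procedure behind these numbers is short enough to write out. The sketch below is a from-scratch illustration, not the code used for the plot: predict is a hypothetical hand-written rule standing in for the trained model, the data is randomly generated, and the importance of a column is the average drop in accuracy over several shuffles.

```python
import random


def predict(row):
    """Stand-in for a trained model (hypothetical rule, not the real forest)."""
    goal, duration, launched_day = row
    return 1 if goal < 10_000 and duration <= 45 else 0


def accuracy(rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(rows, labels, col, n_repeats=20, seed=0):
    """Mean drop in accuracy when the values of one column are shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with only the chosen column replaced.
        permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(permuted, labels))
    return sum(drops) / n_repeats


# Synthetic projects: (goal, duration in days, launch weekday).
data_rng = random.Random(42)
rows = [
    (data_rng.uniform(500, 30_000), data_rng.randint(10, 90), data_rng.randint(0, 6))
    for _ in range(300)
]
labels = [predict(r) for r in rows]

importances = {
    name: permutation_importance(rows, labels, i)
    for i, name in enumerate(["goal", "duration", "launched_day"])
}
print(importances)
```

Because the stand-in rule never looks at the launch day, shuffling that column costs nothing, while shuffling goal or duration breaks some predictions. scikit-learn ships the same idea as sklearn.inspection.permutation_importance for fitted estimators.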

With this method, we obtained the importance of a feature in a more absolute sense rather than a relative sense. Let’s assume that our feature space forms a majority of the universe. Now, it will be interesting to plot both permutation and relative feature importance and make some key observations.

  • Very interesting insights can be obtained from the above plot. Some features showed up high in relative feature importance, but their permutation importance shows that they are not important.
  • From this plot, we can again observe that our hypotheses are mostly confirmed: the project goal, duration, number of characters, and number of words are the most important features to look at while creating a new project page.

Which keywords have the biggest impact on the predictions?

Using permutation importance, we can also evaluate which keywords have the biggest impact on the model prediction. Let’s train another model which also uses the keywords in the project name and observe the permutation importance.

From the first plot, we can observe that certain keywords, when used in the project name, are likely to increase a project’s probability of success, for example “project”, “film”, and “community”. On the other hand, keywords like “game”, “love”, and “fashion” are likely to attract less interest. This suggests that crowdfunding projects related to games or to entertainment themes such as love or fashion may not be as successful as those related to art, design, etc.

5.3 How do changes in features lead to changes in model outcome? (Partial Dependencies)

So far we have only talked about which features are most or least important from a pool of many features. For example, we observed that Project Goal, Project Duration, Number of Characters used, etc. are some of the important features related to project success. In this section, we will look at the specific values or ranges of features which lead to project success or failure. Specifically, we will observe how increasing or decreasing a feature’s value affects the model outcome. These effects can be seen by plotting the partial dependency plots of different features.
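A partial dependency curve can be computed without any plotting library: fix one feature to a grid value for every row, average the model's predictions, and repeat for each grid value. In the sketch below, predict_proba is a hypothetical stand-in for the trained model, deliberately shaped to be flat for short names, rising, and then saturating, and the five rows are invented.

```python
def predict_proba(row):
    """Hypothetical stand-in for the trained model's success probability."""
    num_words, duration = row
    # Flat up to 3 words, rising by 0.04 per extra word, saturating at 10 words.
    word_effect = min(max(num_words - 3, 0), 7) * 0.04
    # Durations beyond 20 days lower the probability.
    duration_effect = -0.003 * max(duration - 20, 0)
    return 0.35 + word_effect + duration_effect


def partial_dependence(rows, col, grid):
    """Average prediction when column `col` is forced to each grid value."""
    curve = []
    for value in grid:
        modified = [r[:col] + (value,) + r[col + 1:] for r in rows]
        curve.append(sum(predict_proba(r) for r in modified) / len(modified))
    return curve


# Hypothetical projects: (number of words in the name, duration in days).
rows = [(2, 30), (5, 15), (8, 60), (12, 45), (6, 90)]
word_grid = list(range(1, 15))
pd_words = partial_dependence(rows, col=0, grid=word_grid)

for value, avg in zip(word_grid, pd_words):
    print(value, round(avg, 3))
```

Each point on the curve is an average over all rows, so it shows the effect of the feature with the other features held at their observed values.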

Project Name — Features

We observe that projects with few words (<= 3) in the name show no improvement in predicted success. However, as the number of words in the project name increases, the predicted success also increases roughly linearly. For projects with more than 10 words in the name, the model saturates and shows similar predictions. Hence, the ideal number of words is somewhere around 7–10.

Number of Characters

From the second plot, we observe that if the total number of characters is less than 20, the predicted success falls below the normal value. Increasing the number of characters in the name also increases the predicted success roughly linearly.

Let’s also plot the interaction between the number of words and characters used.

From the above plot, it can be observed that about 40–65 characters and 10–14 words are good numbers for the project name.

Project Launched Day and Duration

For shorter project durations (less than 20 days), the chances that the project will be successful are higher. However, if the duration of a project is increased to, say, 60–90 days, it is less likely to achieve its goal.

We learned from the permutation importance that launched month has little impact, which the partial dependency plots confirm. But I wanted to see whether there are specific months in which the chances of project success are higher. It looks like towards the last quarter of the year (months 9–12) the success rate of projects is slightly higher, while it is slightly lower in quarter 3.

For launch day, the predicted success is lower when the launch day is Friday to Sunday than when it is Monday to Wednesday.

Project Main Category

By definition, category count is a feature which acts as a proxy for the popularity of a project category. For example, if a large number of projects are posted in the Travel category, its category_count will be higher, so it is a popular category on Kickstarter. On the other hand, if projects are rarely added in the Entertainment category, its category_count will be lower, and so will its popularity. From the plot, we can observe that the chances that a project will be successful are higher if it belongs to a popular category. The same holds true for the main category.

How about specific categories?

By plotting the pdp_isolate graph we can also identify the effect of specific project categories.

From the partial dependency plot for project category, we observe that the predicted chance of project success increases if the project belongs to the “Music”, “Comics”, “Theater”, or “Dance” categories. It decreases if it belongs to “Crafts”, “Fashion”, or “Film & Video”. The same insights are backed by the actual predictions plot.

5.4 Understanding the decisions made by the Model (using SHAP)

In this section, we make the final predictions from our model and interpret them. For this purpose, we will use SHAP values, which are the average of the marginal contributions of individual feature values across all possible coalitions. Let’s try to understand this in layman’s terms. Consider a random project from the dataset with the following features:

  • Title contains 8 words
  • Title contains “machine learning”
  • The project goal is US$10,000
  • The project is launched on a weekday

The trained model predicts that this project is likely to be successful with a probability of 75%. But someone asks: why does this project have a success probability of 75%, and not 95%? To answer this question, we obtain the SHAP values for the prediction made by the model for this project. SHAP values indicate the amount of increase or decrease in the model outcome relative to the average prediction over the entire dataset. For example:

  • Without any information about the project’s features, the prediction would be the dataset average of 45%.
  • Due to the presence of 8 words in the project title, the success probability increases to 60%.
  • Since the title contains the bigram “machine learning”, the success probability further increases to 88%.
  • Since the project is launched on a weekday, the success probability increases by 2 percentage points to 90%.
  • However, since the project goal is too high (compared to the average of the universe), the success probability decreases from 90% to 75%.
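The phrase “average of marginal contributions across all possible coalitions” can be made concrete with a brute-force computation. The sketch below is an illustration under a strong assumption: it uses a toy, purely additive model whose effects are the percentage-point changes from the example above, and it averages each feature's marginal contribution over every possible ordering of the features. The feature names are made up.

```python
from itertools import permutations
from math import factorial

BASE = 45  # average prediction in percentage points, from the example above

# Toy additive effects (assumption): the changes described in the bullets above.
EFFECTS = {
    "title_has_8_words": 15,
    "title_mentions_machine_learning": 28,
    "launched_on_weekday": 2,
    "goal_is_high": -15,
}


def value(coalition):
    """Toy model output when only the features in `coalition` are known."""
    return BASE + sum(EFFECTS[f] for f in coalition)


def shapley_values(features):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {f: 0 for f in features}
    for order in permutations(features):
        seen = []
        for f in order:
            before = value(seen)
            seen.append(f)
            totals[f] += value(seen) - before
    n_orders = factorial(len(features))
    return {f: totals[f] / n_orders for f in features}


phi = shapley_values(list(EFFECTS))
prediction = BASE + sum(phi.values())
print(phi)
print(prediction)
```

For an additive toy model each Shapley value simply equals the feature's own effect, and the base value plus all contributions recovers the 75% prediction. With a real model the features interact, so the contribution of a feature depends on which others are already known, which is why the average over coalitions is needed. Enumerating every ordering is only feasible for a handful of features.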

Let’s see the model predictions on the entire dataset.

In a sample of around 6,500 crowdfunding projects, the model predicts that about 4,200 will fail and only about 2,300 will be successful. Now we want to understand what is driving the success and failure of these projects. Let’s plot the individual feature effects on some of these predictions to make sense of them.

For this particular project, the prediction value is increased to 0.58 from the base value of 0.4178. This implies that the presence of certain features and their corresponding values in this project makes it more likely to be successful. For instance, the duration is 44, the number of characters in the project name is 27, and the difference in goal amount from the mean goal amount of the category is about 50K. These features increase the probability.

For this particular project, apart from the number of characters and the duration, the goal amount of 2000 also increases the probability, from the base value of 0.4178 to 0.72. Not many features decrease the probability significantly.

For this project, the probability is not increased much compared to other projects. In fact, the probability is decreased by the number of characters (17), a high goal value, and the duration of 29 days.

For this project, the duration of 25 days and a small goal of 1500 significantly increase the project’s chances. However, the small number of words (only 4) and the difference from the mean category goal amount decrease the probability by almost as much.

For this project, a long duration and the presence of a particular category decrease the chances significantly. Few features or feature values are able to increase the project’s chances of success.

Now, we can aggregate the SHAP values for every feature over every prediction made by the model. This helps us understand the overall aggregated effect of the model features. Let’s plot the summary plot.
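One common way to aggregate, and the ordering that SHAP summary plots use by default, is the mean absolute SHAP value per feature. The sketch below shows that aggregation on a tiny table of made-up SHAP values. The numbers are purely hypothetical and are not taken from the model in this article.

```python
# Hypothetical SHAP values: one dict per prediction, one entry per feature.
shap_rows = [
    {"goal": 0.10, "duration": 0.04, "num_chars": 0.02, "launched_month": 0.00},
    {"goal": -0.20, "duration": -0.06, "num_chars": 0.03, "launched_month": 0.01},
    {"goal": 0.15, "duration": 0.05, "num_chars": -0.04, "launched_month": -0.01},
]

features = list(shap_rows[0])

# Mean absolute contribution: positive and negative pushes both count as impact.
mean_abs = {
    f: sum(abs(row[f]) for row in shap_rows) / len(shap_rows) for f in features
}

# Rank features from most to least influential overall.
ranking = sorted(features, key=mean_abs.get, reverse=True)
print(ranking)
```

Taking absolute values matters: a feature that pushes some predictions up and others down would otherwise average out to roughly zero and look unimportant.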

We can observe that SHAP values are higher for the same set of features which we saw in relative feature importance and permutation importance. This confirms that the features which most influence project success are project characteristics such as duration and goal.

6. Final Conclusions

After applying these different techniques, we understood that there are certain factors which increase or decrease a project’s chances of successfully raising its funds. From both the project owner’s and the company’s perspective, it is important to set optimal values for the project goal and the duration. A project with a long duration or a very large goal may not be fully funded. At the same time, it is important to choose the right number of words and characters in the name of the project. For example, a project name with very few or very many words and characters may become less intuitive and less self-explanatory. Similarly, the project category also plays a crucial role. Some categories on the platform contain a very large number of projects; these are the so-called popular categories. The chances of success may be higher if the project is posted in a popular category rather than a rare one.

In this article, I shared a general framework which I follow while solving any data science or analytics problem. This framework can be applied to many other use-cases as well. The main focus of this article was the different techniques related to machine learning explainability: relative feature importance, permutation importance, partial dependencies, and SHAP values. In my opinion, here are some pros and cons of these techniques. Apart from these techniques, there are other alternatives as well (such as Skater and LIME).

Thanks for reading. The complete code is available at this link.


Machine Learning Explainability: complete walkthrough using a detailed use-case was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.
