Predictive Modeling 101: A Beginner's Guide to Data Science
Taken from my book, "Predictive Analytics Study Guide --2021",
Chapter 8.1 Modeling Vocabulary:
Modeling notation is sloppy because many words mean the same thing.
The number of observations will be denoted by N. When we refer to the size of a data set, we are referring to N. Each row of the data is called an observation or record. Observations tend to be people, cars, buildings, or other insurable things. These are assumed to be independent in that they do not influence one another. Because computers have limited power, N tends to be less than 100,000. Each observation has known attributes called variables, features, or predictors. We use P to refer to the number of input variables that are used in the model.
The target, response, label, dependent variable, or outcome variable is the unknown quantity that is being predicted. We use Y for this. This can be either a number, in which case we are performing regression, or a category, in which case we are performing classification.
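As a rough illustration of the regression versus classification split, here is a minimal Python sketch. The function name task_type is hypothetical, and the rule it applies is a simplification: it only looks at whether every target value is numeric, so categories stored as numeric codes would be mislabeled.

```python
def task_type(target_values):
    """Guess the modeling task from the target values (simplified rule)."""
    def is_number(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    # All-numeric target -> regression; anything else is treated as categories.
    if all(is_number(value) for value in target_values):
        return "regression"
    return "classification"


# Annual health costs are numbers, so predicting them is regression.
costs_task = task_type([2100.50, 38900.00, 6200.75])

# A yes/no label is a category, so predicting it is classification.
label_task = task_type(["yes", "no", "no"])

print(costs_task, label_task)
```

In practice the choice is made by the modeler based on what Y means, not inferred from the data type alone.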
For example, say that you are a health insurance company that wants to set the premiums for a group of people. The premiums for people who are likely to incur high health costs need to be higher than those likely to be low-cost.
Older people tend to use more of their health benefits than younger people, but there are always exceptions for those who are very physically active and healthy. Those who have an unhealthy Body Mass Index (BMI) tend to have higher costs than those who have a healthy BMI, but this has less impact on younger people.
In short, we want to predict the future health costs of a person by taking into account many of their attributes at once.
This can be done with the health_insurance data by fitting a model to predict the annual health costs of a person. The target variable is Y = charges, and the predictor variables are age, sex, bmi, children, smoker, and region. These six variables mean that P = 6. The data is collected from 1,338 patients, which means that N = 1338.
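To make the notation concrete, here is a small Python sketch that counts N and P for a table laid out like the health_insurance data. The four rows are made up for illustration and are not real records, and the helper name describe is hypothetical. With the full data set the same counting would give N = 1338 and P = 6.

```python
import csv
import io

# Four invented rows using the same column names as the health_insurance data.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
23,female,24.1,0,no,northeast,2100.50
45,male,31.7,2,yes,southwest,38900.00
37,female,27.3,1,no,southeast,6200.75
61,male,29.9,3,no,northwest,14100.20
"""

TARGET = "charges"  # the target variable Y


def describe(csv_text, target):
    """Return N, P, the predictor names, and the target values Y."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Every column except the target is a predictor.
    predictors = [name for name in rows[0] if name != target]
    y = [float(row[target]) for row in rows]
    return len(rows), len(predictors), predictors, y


n, p, predictors, y = describe(SAMPLE_CSV, TARGET)
print(f"N = {n}, P = {p}")
print("predictors:", predictors)
```

Each row is one observation, so N is the row count, and P is the column count minus the target column.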
▶️ If you enjoy this video, please like it and share it.
▶️ Don't forget to subscribe to this channel for more updates.
▶️ Subscribe now: https://www.youtube.com/@predictiveanalyst?sub_confirmation=1
Music used with permission from the YouTube Audio Library, "W. A. Mozart, Symphony No.38 in D major - A Far Cry" from https://www.youtube.com/watch?v=Da4eikl5wfk&ab_channel=AFarCry-Topic
⚡️ 𝐂𝐎𝐍𝐍𝐄𝐂𝐓 𝐖𝐈𝐓𝐇 𝐌𝐄:
▶️ Website: www.PredictiveInsightsAI.com
▶️ Linkedin: https://www.linkedin.com/in/sdcastillo/
🎬 𝐖𝐀𝐓𝐂𝐇 𝐎𝐔𝐑 𝐎𝐓𝐇𝐄𝐑 𝐕𝐈𝐃𝐄𝐎𝐒:
▶️ https://youtu.be/gUISSMCFgXI
▶️ https://youtu.be/2ORYiCwyvGI
▶️ https://youtu.be/j0AqIb412RE
▶️ https://youtu.be/ZdOcMJdAdqI
▶️ https://youtu.be/vtegCXjxwUg
✖️ 𝐂𝐎𝐏𝐘𝐑𝐈𝐆𝐇𝐓 𝐍𝐎𝐓𝐈𝐂𝐄: This video and my YouTube channel contain dialog, music, and images that are the property of "Channel name". You are authorized to share the video link and channel and to embed this video in your website or others as long as a link back to my YouTube Channel is provided.
© PredictiveAnalyst
Please share with your friends and family. Also don't forget to like, subscribe, and hit the notification bell to notify you if I post a new video. Much love and positive thoughts.
▶️ 𝐑𝐄𝐋𝐀𝐓𝐄𝐃 𝐇𝐀𝐒𝐓𝐀𝐆𝐒:-
#PredictiveModeling
#DataScience
#MachineLearning
#DataAnalysis
#BigData
#AI
#TechTutorial
#DataScienceTutorial
#PythonProgramming
#DataVisualization
#Analytics
#DataScienceCommunity
#LearnDataScience
#Coding
#Technology