An engineering company wants to take out a loan. To assess its credit risk, the bank builds a credit check from historical data on other companies. The data record each company's ID, years in operation, size (big or small), loan amount, number of times the company was overdue on payments, and whether or not the company defaulted.
Company_ID  Years  Size  Loan amount  Times overdue  Default
1           5      B     R60000       1              NO
2           9      B     R90000       12             YES
3           3      S     R40000       7              YES
4           15     B     R20000       15             YES
5           4      S     R10000       0              NO
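A minimal Python sketch of how this table might be loaded for analysis. It assumes the "R" in the loan amounts is a currency prefix that must be stripped before the column can be treated as a number; the record field names are illustrative choices, not part of the original data.

```python
# Rows of the credit table, with the loan amount kept exactly as printed.
RAW_ROWS = [
    # (company_id, years, size, loan_amount, times_overdue, default)
    (1, 5, "B", "R60000", 1, "NO"),
    (2, 9, "B", "R90000", 12, "YES"),
    (3, 3, "S", "R40000", 7, "YES"),
    (4, 15, "B", "R20000", 15, "YES"),
    (5, 4, "S", "R10000", 0, "NO"),
]


def parse_amount(text):
    """Drop a leading currency prefix 'R' and return the amount as an int."""
    return int(text[1:] if text.startswith("R") else text)


def load_records(rows):
    """Turn raw tuples into dictionaries with a numeric loan amount."""
    return [
        {
            "company_id": cid,
            "years": years,
            "size": size,
            "loan_amount": parse_amount(amount),
            "times_overdue": overdue,
            "default": default,
        }
        for cid, years, size, amount, overdue, default in rows
    ]


records = load_records(RAW_ROWS)
print([r["loan_amount"] for r in records])  # [60000, 90000, 40000, 20000, 10000]
```

Size and Default are still text labels at this point; they need a numeric coding before a logistic regression can use them.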
i. Which variable is the dependent variable?
ii. Categorize the variables numerically for analysis purposes.
iii. What are the assumptions of Logistic regression?
(1) The bank needs to determine whether a company is likely to default, based on the other available information: years in operation, size, loan amount, and times overdue.
Hence the dependent variable is the last column, "Default."
(2) Two variables are categorical and must be converted to numerical form for analysis: "Size" and "Default."
For size, we can assign the value 1 to a big company (B) and the value 0 to a small company (S).
The size variable then becomes:
size=(1,1,0,1,0)
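For the dependent variable, we can assign the value 0 to YES and the value 1 to NO. Either coding is valid: with this choice the model estimates the probability that a company does not default, whereas the more common convention of coding the event of interest (YES) as 1 would model the probability of default.
The dependent variable then becomes: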
defaulted=(1,0,0,0,1)
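The encoding step can be sketched in Python as a simple lookup from each label to its code. The helper name `encode` is hypothetical, and the "conventional" coding at the end is shown only for comparison; it is not the coding used in the answer.

```python
def encode(values, codes):
    """Map each category to its numeric code; raises KeyError on an unknown label."""
    return tuple(codes[v] for v in values)


SIZE = ["B", "B", "S", "B", "S"]              # Size column, top to bottom
DEFAULT = ["NO", "YES", "YES", "YES", "NO"]   # Default column, top to bottom

size = encode(SIZE, {"B": 1, "S": 0})             # big -> 1, small -> 0
defaulted = encode(DEFAULT, {"YES": 0, "NO": 1})  # coding used in the answer

# Hypothetical alternative for comparison: code the event (default) as 1.
defaulted_conventional = encode(DEFAULT, {"YES": 1, "NO": 0})

print(size)                    # (1, 1, 0, 1, 0)
print(defaulted)               # (1, 0, 0, 0, 1)
print(defaulted_conventional)  # (0, 1, 1, 1, 0)
```

Using an explicit mapping dictionary makes the chosen coding visible and fails loudly if a label outside B/S or YES/NO appears.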
(3) Assumptions:
(a) The dependent variable is binary. Here it takes only two values: 0 and 1.
(b) The observations are independent of one another. Each row describes a different company, and one company's outcome should not influence another's.
(c) There is little or no multicollinearity among the independent variables; that is, years, size, loan amount, and times overdue should not be highly correlated with one another.
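(d) There are no extreme outliers or highly influential observations.
(e) Each continuous independent variable has a linear relationship with the log-odds (logit) of the dependent variable.
(f) The sample is reasonably large. With only five companies, this data set falls far short of that requirement, so any model fitted to it is illustrative at best.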
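Some of these assumptions can be checked roughly in code. The Python sketch below tests assumption (a) directly, screens for multicollinearity (c) with pairwise Pearson correlations, and screens for outliers (d) with Tukey fences. The 0.8 correlation threshold and the k = 1.5 fence multiplier are common rules of thumb assumed here, not values from the answer; a variance inflation factor (VIF) would be a more thorough multicollinearity diagnostic. Independence (b) depends on how the data were collected and is not checked.

```python
from itertools import combinations
from math import sqrt
from statistics import quantiles

# Numerically encoded data from the table (Size: B=1, S=0; Default: YES=0, NO=1).
PREDICTORS = {
    "years": [5, 9, 3, 15, 4],
    "size": [1, 1, 0, 1, 0],
    "loan_amount": [60000, 90000, 40000, 20000, 10000],
    "times_overdue": [1, 12, 7, 15, 0],
}
DEFAULTED = [1, 0, 0, 0, 1]


def is_binary(values):
    """(a) True if the variable takes only the values 0 and 1, and both occur."""
    return set(values) == {0, 1}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_pairs(predictors, threshold=0.8):
    """(c) Predictor pairs whose absolute correlation exceeds the threshold."""
    flagged = []
    for (name_a, xs), (name_b, ys) in combinations(predictors.items(), 2):
        r = pearson(xs, ys)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


def tukey_outliers(values, k=1.5):
    """(d) Values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


print(is_binary(DEFAULTED))          # True
print(correlated_pairs(PREDICTORS))  # [('years', 'times_overdue', 0.824)]
for name in ("years", "loan_amount", "times_overdue"):
    print(name, tukey_outliers(PREDICTORS[name]))  # [] for each column
```

On this table the correlation screen flags years in operation against times overdue (r ≈ 0.82), a hint of possible multicollinearity, while no value falls outside the Tukey fences. With only five observations, both results are indicative rather than conclusive.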