To create a good project, follow these steps:
Step 1) Decide what your dependent variable will be. As you already know, the dependent variable is the variable that you want to predict or whose average value you want to estimate. In both cases, you have two options: pick either a numerical dependent variable or a binary dependent variable (a two-level categorical variable such as Yes/No or 0/1).
Example 1: Numerical: monthly sales of a company
Example 2: Binary: a student's grade with two levels, good or bad
Step 2) Find predictor or explanatory variables that you think might have a relationship with the dependent variable. You need at least 4 predictors for this project.
Example 1: number of products offered, product prices, incentives, competitors' prices, time of the year
Example 2: study time, number of books offered, age, employment status
Step 3) Use the week 2 lecture to create summary statistics and graphs that tell you more about your data. (This step is optional.)
Example 1: Create a bar chart to see if the amount of sales is different for different types of incentives.
Example 2: Create a mosaic plot to see if employment status (employed vs unemployed) changes the probability of getting a good grade.
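As a rough illustration of Step 3 for Example 1, the sketch below groups sales by incentive type, computes summary statistics, and prints a crude text bar chart. The records, the incentive labels, and the function name summarize_by_group are made up for illustration, and only the Python standard library is used; in your project you would normally use the software from the lectures.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records (incentive type, monthly sales in thousands); not real data.
records = [
    ("discount", 120.0), ("discount", 135.0), ("discount", 128.0),
    ("coupon", 110.0), ("coupon", 104.0), ("coupon", 115.0),
    ("none", 95.0), ("none", 99.0), ("none", 91.0),
]

def summarize_by_group(rows):
    """Return the count, mean, and sample standard deviation per group."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    return {
        label: {"n": len(values), "mean": mean(values), "sd": stdev(values)}
        for label, values in groups.items()
    }

summary = summarize_by_group(records)
for label, stats in summary.items():
    # One '#' per 10 units of mean sales gives a rough text bar chart.
    bar = "#" * round(stats["mean"] / 10)
    print(f"{label:<10}{stats['mean']:8.1f}  {bar}")
```

If the group means differ noticeably relative to the standard deviations, the incentive type is probably worth keeping as a predictor.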
Step 4) Create your model with all the predictors.
Example 1: If you chose a continuous (numerical) dependent variable, use multiple regression or a regression tree for modeling.
Example 2: If you chose a binary dependent variable, use logistic regression or a classification tree.
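To show what multiple regression does under the hood, here is a minimal sketch that fits a least-squares model by solving the normal equations. It assumes a small, noise-free, made-up data set with only two predictors (your project needs at least 4), and the helper names solve_linear_system and fit_ols are hypothetical. In practice a statistics package does this for you and also reports standard errors and p-values.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(X, y):
    """Return [intercept, b1, b2, ...] that minimize the squared error."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical predictors: (number of products offered, product price).
X = [(5, 2.0), (8, 2.5), (6, 1.5), (10, 3.0), (7, 2.0), (12, 3.5)]
# Made-up sales generated exactly as 10 + 2 * products - 3 * price (no noise).
y = [14.0, 18.5, 17.5, 21.0, 18.0, 23.5]

coefficients = fit_ols(X, y)
print([round(c, 3) for c in coefficients])
```

Because the data were generated without noise, the fitted coefficients should match the generating values; real data will not fit this cleanly.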
Step 5) Perform variable selection and choose the best predictors to include in the model. (You will learn about this in future lectures.)
Step 6) Create the model again, this time with only the best subset of predictors.
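One common way to compare candidate subsets is adjusted R-squared, which penalizes every extra predictor. The sketch below assumes this criterion (the lectures may use a different one), and the R-squared values, predictor names, and sample size are placeholders, not results from any real model.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Placeholder R-squared values for four hypothetical candidate models.
candidates = {
    ("price",): 0.62,
    ("price", "incentives"): 0.71,
    ("price", "incentives", "season"): 0.715,
    ("price", "incentives", "season", "competitor_price"): 0.716,
}
n_obs = 30  # assumed number of observations

scores = {
    subset: adjusted_r2(r2, n_obs, len(subset))
    for subset, r2 in candidates.items()
}
best_subset = max(scores, key=scores.get)
print(best_subset, round(scores[best_subset], 4))
```

With these placeholder numbers, the last two predictors raise R-squared so little that the penalty outweighs the gain, so the two-predictor model scores highest.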
Step 7) Draw the diagnostic plots and check for assumption violations. For example, for a multiple regression, check whether the residuals are normally distributed and whether their variance is constant.
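The two checks named in Step 7 can also be sketched numerically. The code below assumes you already have fitted values and residuals from your model (the numbers here are made up, with a deliberate funnel shape). It computes the points of a normal Q-Q plot and a crude variance ratio between the upper and lower halves of the fitted values. The function names are hypothetical, and neither quantity is a formal test; they only mirror what you would look for in the plots.

```python
from statistics import NormalDist, mean, pstdev, variance

def qq_points(residuals):
    """Pair each theoretical normal quantile with a sorted standardized residual."""
    n = len(residuals)
    center, spread = mean(residuals), pstdev(residuals)
    observed = sorted((r - center) / spread for r in residuals)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

def variance_ratio(fitted, residuals):
    """Residual variance in the upper half of fitted values over the lower half."""
    ordered = [r for _, r in sorted(zip(fitted, residuals))]
    half = len(ordered) // 2
    return variance(ordered[-half:]) / variance(ordered[:half])

# Made-up model output: the residual spread grows with the fitted value.
fitted = [10, 12, 14, 16, 18, 20, 22, 24]
residuals = [0.5, -0.4, 0.6, -0.7, 1.5, -1.8, 2.2, -1.9]

points = qq_points(residuals)
ratio = variance_ratio(fitted, residuals)
for theo, obs in points:
    print(f"{theo:7.3f} {obs:7.3f}")
print("variance ratio:", round(ratio, 2))
```

Q-Q points that stray far from the line y = x suggest non-normal residuals, and a variance ratio far from 1 suggests non-constant variance.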
Step 8) If there is any violation, try to fix it using transformations or polynomial regression.
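As one example of a transformation, the sketch below takes the logarithm of a dependent variable that grows roughly exponentially and compares how linear the relationship is before and after. The data are made up, pearson_r is a hypothetical helper, and the log is only one option; which transformation helps depends on your data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data with roughly exponential growth over eight months.
months = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [2.7, 7.4, 20.1, 54.6, 148.4, 403.4, 1096.6, 2981.0]

log_sales = [math.log(s) for s in sales]  # transformed dependent variable

r_raw = pearson_r(months, sales)
r_log = pearson_r(months, log_sales)
print("correlation with raw sales:", round(r_raw, 3))
print("correlation with log sales:", round(r_log, 3))
```

A correlation much closer to 1 after the transformation indicates that a straight-line model fits the transformed variable better than the raw one.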
Step 9) It is best to fix the violation, but some violations can only be resolved with more complicated models. If you could not fix it, describe in your report the steps you tried and note that they did not work. (It is better to consult with me at this stage.)
Step 10) Start writing your report. Your report should include the following sections:
Introduction and statement of the problem being addressed.
Outline of the analysis.
Sources of data; method of collection.
Presentation of the data and the data analysis (all relevant software output).
Conclusions and recommendations for further study.
Step 11) Congratulations. You completed a data analytics project on your own. Not many people in this world are capable of performing a deep analysis like this.