Make a presentation of yourself doing the following. Use Zoom to record a video of the presentation and submit the video, or a link to it, to Canvas.
Download the Breast Cancer Wisconsin (Diagnostic) Data Set at this link and import it into SAS Miner.
Reject all the text variables. Set diagnosis as the binary target.
Split the data 70:30:0 for Training: Validation: Test. Create the following decision trees.
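For intuition only, a 70:30:0 partition can be sketched outside SAS Miner as below. This is a minimal sketch, not what the Data Partition node does internally: the function name `partition`, the seed, and the ten placeholder row ids are all assumptions, and the sketch uses a plain random shuffle with no stratification.

```python
import random

def partition(rows, train_pct=70, seed=12345):
    """Shuffle rows and split them into (training, validation).

    Hypothetical helper: a plain random split, no stratification.
    The test share is 0, so everything not in training goes to validation.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100  # integer arithmetic avoids float rounding
    return shuffled[:n_train], shuffled[n_train:]

# Ten placeholder row ids stand in for the real observations.
train, valid = partition(range(10))
print(len(train), len(valid))
```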
a decision tree that uses the Gini index for splitting (Nominal Target Criterion)
a decision tree that uses entropy for splitting (Nominal Target Criterion)
a decision tree that uses the chi-squared test for splitting (Nominal Target Criterion)
a decision tree with 4 leaves
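To make the three splitting criteria above concrete, the sketch below scores one candidate binary split by Gini reduction, entropy reduction, and the Pearson chi-squared statistic. It is an illustration under assumptions: the function names are hypothetical, the eight labels are made up (M for malignant, B for benign), and SAS Miner's own calculations may differ in detail, for example by ranking chi-squared splits by p-value rather than by the raw statistic.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def impurity_reduction(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two branches."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

def chi_square(left, right):
    """Pearson chi-squared statistic for the branch-by-class table."""
    classes = sorted(set(left) | set(right))
    n = len(left) + len(right)
    stat = 0.0
    for branch in (left, right):
        counts = Counter(branch)
        for cls in classes:
            class_total = left.count(cls) + right.count(cls)
            expected = len(branch) * class_total / n
            stat += (counts[cls] - expected) ** 2 / expected
    return stat

# Made-up node with 4 malignant and 4 benign cases, split into two branches.
left = ["M", "M", "M", "B"]
right = ["M", "B", "B", "B"]
parent = left + right

print(impurity_reduction(parent, left, right, gini))     # Gini reduction
print(impurity_reduction(parent, left, right, entropy))  # entropy reduction
print(chi_square(left, right))                           # chi-squared statistic
```

A larger value on any of the three measures indicates a split that separates the two classes better.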
Compare the trees in terms of misclassification rate and ROC-Index.
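As a reminder of what the two comparison statistics measure, here is a minimal sketch that computes both by hand. The function names, the 0.5 cutoff, and the six labels and scores are made up for illustration. The ROC index is computed as the share of (malignant, benign) pairs in which the malignant case gets the higher score, with ties counting as half.

```python
def misclassification_rate(actual, predicted):
    """Fraction of cases whose predicted class differs from the actual class."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

def roc_index(actual, scores, positive="M"):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up data: actual diagnosis and a tree's predicted probability of "M".
actual = ["M", "M", "B", "B", "M", "B"]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
predicted = ["M" if s >= 0.5 else "B" for s in scores]  # assumed 0.5 cutoff

print(misclassification_rate(actual, predicted))
print(roc_index(actual, scores))
```

A lower misclassification rate and a higher ROC index both indicate a better tree, and the two measures need not rank the trees in the same order.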
Show the confusion matrix (classification table) of the Entropy tree.
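For reference, the four cells of a binary confusion matrix can be tallied as in the sketch below. The function name and the six example cases are made up, and "M" (malignant) is assumed to be the event level.

```python
def confusion_matrix(actual, predicted, positive="M"):
    """Count true/false positives and negatives for a binary target."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts

# Made-up actual and predicted diagnoses, not results from the data set.
actual = ["M", "M", "B", "B", "M", "B"]
predicted = ["M", "M", "M", "B", "M", "B"]

cm = confusion_matrix(actual, predicted)
print(cm)
```

The misclassification rate is the sum of the FN and FP cells divided by the total number of cases.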
Which tree is the best in terms of misclassification rate? In that tree, which variables are the most important?
How many leaves does the tree in 6 have? Explain why the tree ended up with that number of leaves (i.e., explain the Subtree Assessment Plot).
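One way to read the Subtree Assessment Plot is that it shows an assessment measure for each candidate subtree size, and the final tree is the smallest subtree with the best value on the validation data. The sketch below illustrates that selection rule under the assumption that the measure is validation misclassification rate. The function name is hypothetical and the error values are made up, not results from the data set.

```python
def best_subtree(validation_error_by_leaves):
    """Return the leaf count with the lowest validation error.

    Ties are broken in favor of the smaller subtree.
    """
    return min(
        validation_error_by_leaves,
        key=lambda leaves: (validation_error_by_leaves[leaves], leaves),
    )

# Made-up validation misclassification rates, keyed by number of leaves.
validation_error = {1: 0.37, 2: 0.10, 3: 0.08, 4: 0.06, 5: 0.06, 6: 0.07}

print(best_subtree(validation_error))
```

In this made-up case the error stops improving after four leaves, so growing the tree further adds complexity without any gain on the validation data.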