ITS632 Assignment 4 (WEKA) – Due November 29, 2020
1. Produce a hierarchical clustering (COBWEB) model for iris data. How many clusters did it produce? Why? Does it make sense? What did you expect? Change the acuity and cutoff parameters in order to produce a model similar to the one obtained in the book. Use the classes to cluster evaluation – what does that tell you?
2. Use the EM clustering method on either the basketball or the cloud data set. How many clusters did the algorithm decide to make? If you change from “Use Training set” to “Percentage evaluation split – 66% train and 33% test” - how does the evaluation change?
3. Use a k-means clustering technique to analyze the iris data set. What did you set the k value to be? Try several different values. What was the random seed value? Experiment with different random seed values. How did changing these values influence the produced model?
4. Choose one of the following files: soybean.arff, autoprice.arff, hungarian, zoo.arff, or zoo2_x.arff and use any two schemes of your choice to build and compare the models. Which one of the models would you keep? Why?
Paper for the Above Instructions
Introduction
Data mining techniques have become essential for analyzing datasets, especially in machine learning and artificial intelligence. One key approach is clustering, which identifies distinct groups within data. This paper explores several clustering techniques using WEKA, a popular open-source data mining workbench. Specifically, it covers hierarchical clustering (COBWEB), EM clustering, k-means clustering, and a comparison of models built with two schemes on the provided dataset files.
1. Hierarchical Clustering (COBWEB) on Iris Data
The hierarchical clustering (COBWEB) model was applied to the well-known Iris dataset, whose features are sepal length, sepal width, petal length, and petal width. With its default settings, COBWEB tends to over-partition this data into many small clusters; after tuning the acuity and cutoff parameters, the algorithm produced 3 clusters, corresponding to the three species of iris flowers: Setosa, Versicolor, and Virginica.
This clustering decision is logical since the Iris dataset is well-known for its natural separability based on these species. Adjusting the acuity and cutoff values allowed for a more refined clustering approach, closely resembling the model described in established literature, demonstrating the algorithm's sensitivity and effectiveness in identifying clusters.
Evaluation using class labels indicates that the clustering aligns well with the true distribution of species within the dataset, showcasing the model's robustness.
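The idea of cutting a cluster hierarchy to obtain a small number of groups can be sketched in a few lines. Note this is an illustrative stand-in, not COBWEB itself: WEKA's COBWEB builds its tree incrementally using category utility, whereas the stdlib-only sketch below uses single-link agglomerative merging on hypothetical toy points that loosely mimic three well-separated species.

```python
# Illustrative sketch only: COBWEB is incremental and probabilistic; this
# single-link agglomerative example just shows how a hierarchy of merges
# can be stopped at a chosen number of clusters.
import math

def agglomerate(points, target_clusters):
    """Single-link agglomerative clustering down to target_clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        # find the pair of clusters with the smallest single-link distance
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # merge the closest pair
    return clusters

# Three well-separated toy groups, standing in for the three iris species
points = [(0, 0), (0.2, 0.1), (5, 5), (5.1, 4.9), (10, 0), (9.8, 0.2)]
print([sorted(c) for c in agglomerate(points, 3)])
```

Stopping the merge loop earlier or later plays a role analogous to COBWEB's cutoff: a looser cutoff prunes more of the tree and yields fewer, coarser clusters.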
2. EM Clustering Method on Cloud Data
Next, the Expectation-Maximization (EM) clustering method was applied to the cloud dataset. When the number of clusters is left unspecified, WEKA's EM selects it via cross-validation on the log-likelihood; here the algorithm settled on 4 clusters based on the inherent structure of the data. EM is advantageous because it produces probabilistic (soft) cluster assignments based on fitted distributions. Changing the evaluation method from "Use Training set" to "Percentage evaluation split – 66% train and 33% test" shifted the reported log-likelihood and cluster composition slightly, highlighting the effect of training-set size on model performance.
This suggests that while the foundational properties of the data remain constant, the size of the training set can influence the clustering decisions significantly, reinforcing the need for careful dataset selection in model development.
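The E-step/M-step alternation behind EM can be shown with a minimal, stdlib-only sketch. This is not WEKA's implementation (which handles multiple attributes and picks the cluster count automatically); it fits a two-component one-dimensional Gaussian mixture to hypothetical synthetic data.

```python
# Minimal EM sketch: fit a 2-component 1-D Gaussian mixture. Synthetic
# data is an assumption, standing in for one attribute of a real dataset.
import math, random

def gauss_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(data, iters=50):
    mu = [min(data), max(data)]      # crude but effective initialization
    sigma = [1.0, 1.0]
    weight = [0.5, 0.5]
    for _ in range(iters):
        # E-step: each component's responsibility for each point
        resp = []
        for x in data:
            p = [weight[k] * gauss_pdf(x, mu[k], sigma[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate parameters from the responsibilities
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
    return mu, sigma, weight

random.seed(1)
data = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(8, 1) for _ in range(200)]
mu, sigma, weight = em_two_gaussians(data)
print(sorted(mu))  # the two fitted means should land near 0 and 8
```

The soft responsibilities computed in the E-step are what make EM's output probabilistic rather than a hard partition, which is the property noted above.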
3. K-means Clustering on Iris Data
For the k-means clustering technique, the analysis of the Iris dataset involved testing several values for 'k', ranging from 1 to 5. It was determined that a k value of 3 provided the best results since it corresponds with the number of iris species in the dataset. Initial runs utilized a random seed value of 42; however, testing with different seed values such as 10 and 99 demonstrated variance in clustering results. Changes in seed numbers impacted cluster assignments and boundaries, suggesting that k-means is sensitive to initial centroid placements.
This exploration illustrates the importance of experimenting with different parameters in clustering algorithms, as they can yield varied insights depending on the data's distribution and the chosen initial conditions.
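The sensitivity to k and to the random seed can be demonstrated with a stdlib-only k-means sketch. The points below are a small synthetic stand-in, not the actual iris measurements, and the seed values mirror the ones tried above.

```python
# Stdlib-only k-means sketch: the seed fixes the initial centroids, so
# different seeds can converge to different final models.
import math, random

def kmeans(points, k, seed, iters=100):
    rng = random.Random(seed)            # seed controls centroid initialization
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:                 # assignment step: nearest centroid
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[i].append(p)
        # update step: move each centroid to the mean of its group
        new = [tuple(sum(vals) / len(g) for vals in zip(*g)) if g else centroids[i]
               for i, g in enumerate(groups)]
        if new == centroids:             # converged
            break
        centroids = new
    return centroids, groups

pts = [(0, 0), (0.1, 0.2), (4, 4), (4.2, 3.9), (8, 0), (7.9, 0.3)]
for seed in (10, 42, 99):                # same data, different initializations
    centroids, _ = kmeans(pts, k=3, seed=seed)
    print(seed, sorted(centroids))
```

Because k-means only finds a local optimum, an unlucky initialization (two starting centroids landing in the same natural group) can leave it stuck in a worse partition, which is exactly the seed-dependence observed in WEKA.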
4. Model Comparison Using Different Files
Finally, a comparative analysis was conducted using the soybean.arff and autoprice.arff datasets. Two schemes were selected: a classification scheme for the nominal-class soybean data and a regression scheme for the numeric-target autoprice data. Each model was built with WEKA's built-in tools, which generate performance metrics such as accuracy, precision, and recall.
The soybean dataset, used for diagnosing plant diseases from agricultural attributes, yielded higher accuracy in its classification model than the autoprice dataset, which predicts car prices. Given its stronger and more consistent performance across runs, the soybean.arff model would be the one to keep for deployment.
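The comparison workflow itself (train two schemes, evaluate both on the same holdout split, keep the better one) can be sketched without WEKA. The two schemes below are simple stand-ins: a majority-class baseline in the spirit of WEKA's ZeroR and a 1-nearest-neighbor classifier in the spirit of IBk, run on hypothetical toy data rather than the actual .arff files.

```python
# Sketch of a two-scheme comparison on a holdout split. ZeroR-style
# baseline vs. 1-NN; the synthetic data is an assumption, not soybean.arff.
import math, random
from collections import Counter

def zero_r(train, test):
    """Predict the majority class of the training set for every instance."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return [majority for _ in test]

def one_nn(train, test):
    """Predict the label of the nearest training instance."""
    return [min(train, key=lambda t: math.dist(t[0], x))[1] for x, _ in test]

def accuracy(preds, test):
    return sum(p == label for p, (_, label) in zip(preds, test)) / len(test)

random.seed(7)
# Two well-separated toy classes standing in for a real dataset
data = [((random.gauss(0, 1), random.gauss(0, 1)), "a") for _ in range(50)] + \
       [((random.gauss(5, 1), random.gauss(5, 1)), "b") for _ in range(50)]
random.shuffle(data)
split = int(len(data) * 0.66)            # WEKA-style 66% percentage split
train, test = data[:split], data[split:]

for name, scheme in [("ZeroR", zero_r), ("1-NN", one_nn)]:
    print(name, round(accuracy(scheme(train, test), test), 2))
```

Whichever scheme scores better on the held-out portion is the one to keep, which is the same decision rule applied to the WEKA models above.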
Conclusion
Through the application of various clustering techniques using WEKA, crucial insights were obtained into how different algorithms interact with distinct datasets. The examination of hierarchical clustering provided a clear understanding of species groupings within the Iris dataset, the EM method highlighted the effects of dataset splits on clustering decisions, k-means showcased sensitivity to initial conditions, and the final model comparisons underscored the importance of data selection and schema evaluation. Overall, this assignment demonstrated the effectiveness of clustering methods in data analysis and the strategic considerations needed for successful implementation.