{"id":205,"date":"2017-11-25T00:43:23","date_gmt":"2017-11-25T00:43:23","guid":{"rendered":"http:\/\/martinsiron.com\/?p=205"},"modified":"2017-11-25T00:43:23","modified_gmt":"2017-11-25T00:43:23","slug":"using-tensorflows-cnn-vs-sklearns-decision-tree-regressor","status":"publish","type":"post","link":"http:\/\/martinsiron.com\/index.php\/2017\/11\/25\/using-tensorflows-cnn-vs-sklearns-decision-tree-regressor\/","title":{"rendered":"Using TensorFlow\u2019s CNN vs. SKLearn\u2019s Decision Tree Regressor"},"content":{"rendered":"<p><strong>Background<\/strong><\/p>\n<p>After a semester at UC Berkeley learning various machine learning and data science tools, I\u2019ve decided to re-examine the model I built half a year ago to predict the remainder of the primary elections at the time.<\/p>\n<p>I will be using the same key data:<\/p>\n<ul>\n<li>Geographic Region<\/li>\n<li>Election type (primary vs. Caucus, open vs. Closed)<\/li>\n<li>Demographics<\/li>\n<\/ul>\n<p><a href=\"http:\/\/martinsiron.com\/2016\/04\/30\/predicting-a-us-primary-election\/\">In the previous model,<\/a> I was using overall state-based demographic data since I did not have the computational skills at the time to handle more than 50 rows of data. However, with the Python skills I acquired over the semester, I decided to improve my model by adding more demographic and election data by using county level information provided by the US Census Bureau.<\/p>\n<p>Instead of manually deciding which variables I think would exert the most influence on my model, I decided to let the model figure it out. 
I tried both TensorFlow\u2019s Convolutional Neural Network (CNN), via the Keras wrapper, and SKLearn\u2019s Decision Tree Regressor.<\/p>\n<p><strong>Explanations of Algorithms<\/strong><\/p>\n<p>There is a key difference between the two algorithms:<\/p>\n<table>\n<tbody>\n<tr>\n<td width=\"234\"><strong>Decision Trees<\/strong><\/td>\n<td width=\"234\"><strong>Convolutional Neural Networks<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"234\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-206\" src=\"http:\/\/24.144.91.142\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.39.21.png\" alt=\"\" width=\"568\" height=\"296\" srcset=\"http:\/\/martinsiron.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.39.21.png 568w, http:\/\/martinsiron.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.39.21-300x156.png 300w\" sizes=\"auto, (max-width: 568px) 100vw, 568px\" \/><\/td>\n<td width=\"234\">\u00a0<img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/4\/46\/Colored_neural_network.svg\/300px-Colored_neural_network.svg.png\" width=\"228\" height=\"274\" \/><\/td>\n<\/tr>\n<tr>\n<td width=\"234\"><u>How it works:<\/u><\/p>\n<p>Decision trees can be thought of as a collection of \u2018if-then\u2019 statements. They take an input data set and try to match the output data set through a tree-like structure of if-then statements. Each terminal node of the tree is known as a \u2018leaf,\u2019 and in a regression tree each leaf assigns a value. The algorithm finds the best place to create a split in order to minimize the loss function (error) between the actual output and the output the decision tree creates.
This is very similar to the game of <em>20 questions<\/em> \u2013 you have to find the best questions in order to optimize the tree for new, unseen data.<\/td>\n<td width=\"234\">&nbsp;<\/p>\n<p>Neural networks are fundamentally different from decision trees. Rather than following a single chain of if-then splits from input to output, the data flows through layers of interconnected \u2018neurons,\u2019 each applying a weighted sum and an activation function, before reaching the output layer. However, very large inputs create a very large number of connections between the input and output layers. To reduce the number of parameters, we create a convolutional neural network. In a CNN, each neuron connects only to a local region of the previous layer and shares its weights, so the representation shrinks as it progresses through the layers. Additionally, certain variables can carry more weight than others in the network.<\/td>\n<\/tr>\n<tr>\n<td width=\"234\"><u>Problems:<\/u><\/p>\n<p>One major problem with decision trees is over-fitting your training data. While over-fitting might result in 100% accuracy on your training data, it can lead to catastrophic results on unseen data. One way to limit overfitting is to limit the depth of the tree, or to prune it (remove leaves that add little predictive value) after training.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-207\" src=\"http:\/\/24.144.91.142\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.39.48.png\" alt=\"\" width=\"580\" height=\"472\" srcset=\"http:\/\/martinsiron.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.39.48.png 580w, http:\/\/martinsiron.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.39.48-300x244.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><\/td>\n<td width=\"234\"><u>Problems:<\/u><\/p>\n<p>CNNs are an active area of research and are still poorly understood. They are often over-specified for very specific data and might not work well on new data.
This is because it is hard to predict which type of layer or activation function will work best for a given application. Often people will build two completely different CNN architectures that work well with some data sets but fall short on others.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>I decided to use a CNN instead of a Recurrent Neural Network (RNN) because I believed my input features did not have much inter-correlation. However, I will be testing an RNN in the future because I am still curious about the possible results.<\/p>\n<p><strong>Model Summary<\/strong><\/p>\n<p>I began by creating a data set that combines the county vote results with the demographic and election data. I then separated the data into states that had already held their elections by March 22<sup>nd<\/sup> and states that had yet to hold an election. I only took into consideration the Democratic primary results. I further split the data with an 80\/20 train-test ratio to avoid overfitting either model.<\/p>\n<p>For the CNN model, I built a network of 4 dense layers using sigmoid, softmax, and hyperbolic tangent activations, which are friendly to continuous regression data. This created a model with <span style=\"text-decoration: underline;\">almost 70 thousand parameters.<\/span><\/p>\n<p>For the decision tree regression model, I set the max depth to 30, so as not to over-fit, and set the maximum features per split to the square root of the number of input features. I also used sklearn\u2019s AdaBoostRegressor. This helps with continuous data by superimposing multiple decision trees (in my model, 1000 decision trees), producing a smoother output instead of a step-function output.<\/p>\n<p><strong>Results<\/strong><\/p>\n<p>To visualize the results, I created an output graph for each model of the predicted vs. actual election results.
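To make the decision-tree setup concrete, here is a minimal sketch using sklearn's AdaBoostRegressor wrapped around a depth-capped DecisionTreeRegressor, as described above. The feature matrix below is random stand-in data with assumed dimensions, not my actual county data set:

```python
# Rough sketch of the boosted decision-tree regressor described above.
# The feature matrix is random stand-in data, not the real county data.
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(500, 40)   # assumed shape: 500 counties x 40 census features
y = rng.rand(500)       # target: fraction of the county vote for a candidate

# 80/20 train-test split, as in the post
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Depth capped at 30 and sqrt(n_features) per split to limit over-fitting;
# AdaBoost superimposes 1000 trees for a smoother, less step-like output.
model = AdaBoostRegressor(
    DecisionTreeRegressor(max_depth=30, max_features="sqrt"),
    n_estimators=1000,
    random_state=0,
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Signed mean error, analogous to the "Mean Error" rows reported below
mean_error = float(np.mean(predictions - y_test))
```

The Keras model would be an analogous `Sequential` stack of four `Dense` layers; its exact layer sizes are not reproduced in this post, so I omit it here.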
The more accurate the model, the closer the slope approaches unity:<\/p>\n<table>\n<tbody>\n<tr>\n<td width=\"234\"><strong>Decision Tree Regressor<\/strong><\/td>\n<td width=\"234\"><strong>Convolutional Neural Network<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"234\">\u00a0<img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-208\" src=\"http:\/\/24.144.91.142\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.40.12.png\" alt=\"\" width=\"548\" height=\"370\" srcset=\"http:\/\/martinsiron.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.40.12.png 548w, http:\/\/martinsiron.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.40.12-300x203.png 300w\" sizes=\"auto, (max-width: 548px) 100vw, 548px\" \/><\/td>\n<td width=\"234\">\u00a0<img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-209\" src=\"http:\/\/24.144.91.142\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.40.31.png\" alt=\"\" width=\"578\" height=\"364\" srcset=\"http:\/\/martinsiron.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.40.31.png 578w, http:\/\/martinsiron.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.40.31-300x189.png 300w\" sizes=\"auto, (max-width: 578px) 100vw, 578px\" \/><\/td>\n<\/tr>\n<tr>\n<td width=\"234\"><u>Mean Error<\/u><\/td>\n<td width=\"234\"><u>Mean Error<\/u><\/td>\n<\/tr>\n<tr>\n<td width=\"234\">+1.19%<\/td>\n<td width=\"234\">-6.54%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Here is the state-by-state prediction error for both models:<\/p>\n<table width=\"470\">\n<tbody>\n<tr>\n<td width=\"118\"><strong>State<\/strong><\/td>\n<td width=\"118\"><strong>DTR Predict Err (%)<\/strong><\/td>\n<td width=\"118\"><strong>CNN Predict Err (%)<\/strong><\/td>\n<td width=\"118\"><strong>Actual (%)*<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"118\">California<\/td>\n<td width=\"118\"><span style=\"color: #ff6600;\">+17.6<\/span><\/td>\n<td
width=\"118\">+2.44<\/td>\n<td width=\"118\">41.5<\/td>\n<\/tr>\n<tr>\n<td width=\"118\">Delaware<\/td>\n<td width=\"118\"><span style=\"color: #ff6600;\">+7.45<\/span><\/td>\n<td width=\"118\"><span style=\"color: #ff6600;\">-10.63<\/span><\/td>\n<td width=\"118\">39.3<\/td>\n<\/tr>\n<tr>\n<td width=\"118\">Indiana<\/td>\n<td width=\"118\">-1.86<\/td>\n<td width=\"118\">-4.50<\/td>\n<td width=\"118\">54.3<\/td>\n<\/tr>\n<tr>\n<td width=\"118\">Kentucky<\/td>\n<td width=\"118\"><span style=\"color: #ff6600;\">-10.16<\/span><\/td>\n<td width=\"118\"><span style=\"color: #ff6600;\">-15.54<\/span><\/td>\n<td width=\"118\">49.5<\/td>\n<\/tr>\n<tr>\n<td width=\"118\">Maryland<\/td>\n<td width=\"118\">-0.66<\/td>\n<td width=\"118\"><span style=\"color: #ff6600;\">-9.93<\/span><\/td>\n<td width=\"118\">30.2<\/td>\n<\/tr>\n<tr>\n<td width=\"118\">Montana<\/td>\n<td width=\"118\">+0.95<\/td>\n<td width=\"118\">+1.85<\/td>\n<td width=\"118\">48.3<\/td>\n<\/tr>\n<tr>\n<td width=\"118\">New Jersey<\/td>\n<td width=\"118\"><span style=\"color: #ff6600;\">+10.80<\/span><\/td>\n<td width=\"118\">-3.02<\/td>\n<td width=\"118\">36.5<\/td>\n<\/tr>\n<tr>\n<td width=\"118\">Oregon<\/td>\n<td width=\"118\"><span style=\"color: #ff6600;\">+8.51<\/span><\/td>\n<td width=\"118\">-0.15<\/td>\n<td width=\"118\">58.7<\/td>\n<\/tr>\n<tr>\n<td width=\"118\">Pennsylvania<\/td>\n<td width=\"118\">+4.11<\/td>\n<td width=\"118\">-5.11<\/td>\n<td width=\"118\">34.3<\/td>\n<\/tr>\n<tr>\n<td width=\"118\">Rhode Island<\/td>\n<td width=\"118\">-1.87<\/td>\n<td width=\"118\"><span style=\"color: #ff6600;\">-21.67<\/span><\/td>\n<td width=\"118\">50.4<\/td>\n<\/tr>\n<tr>\n<td width=\"118\">South Dakota<\/td>\n<td width=\"118\"><span style=\"color: #ff6600;\">+8.92<\/span><\/td>\n<td width=\"118\">-0.57<\/td>\n<td width=\"118\">49.8<\/td>\n<\/tr>\n<tr>\n<td width=\"118\">West Virginia<\/td>\n<td width=\"118\"><span style=\"color: #ff6600;\">-16.27<\/span><\/td>\n<td width=\"118\"><span style=\"color: 
#ff6600;\">-21.90<\/span><\/td>\n<td width=\"118\">52.8<\/td>\n<\/tr>\n<tr>\n<td width=\"118\">Wisconsin<\/td>\n<td width=\"118\">-2.33<\/td>\n<td width=\"118\"><span style=\"color: #ff6600;\">-12.05<\/span><\/td>\n<td width=\"118\">59.2<\/td>\n<\/tr>\n<tr>\n<td width=\"118\">Wyoming<\/td>\n<td width=\"118\">-3.44<\/td>\n<td width=\"118\">-3.97<\/td>\n<td width=\"118\">56.3<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>While the error for the DTR model was more centered about zero, it provided more catastrophic results (above 5% error) than the CNN model. If the CNN model was linearly calibrated by 6% at the very end, it would have had two less catastrophic results, and would have been significantly better. Overall, both of these models resulted in more problems than the state-wide analysis.<u> I attribute more data to more error as perhaps a case of Simpson\u2019s Paradox.<\/u><\/p>\n<p>However, perhaps by combining a linear combination of these two models, an even better model could be made than the previous model with just state-wide data. There are many more variables that I could further explore in the DTR and CNN library as well, that could perhaps optimize this model further.<\/p>\n<p><em>*This calculation was achieved by weighing each county\u2019s population with their votes, which may not be the same results from the published voting results by the state but is more accurate for the data used in these models.<\/em><\/p>\n<p><strong>Interesting connection \u2013 GOP vs. 
DNC<\/strong><\/p>\n<p>Out of curiosity, I decided to run the same CNN and DTR models on the GOP:<\/p>\n<table>\n<tbody>\n<tr>\n<td width=\"234\"><strong>Decision Tree Regressor<\/strong><\/td>\n<td width=\"234\"><strong>Convolutional Neural Network<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"234\">\u00a0<img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-210\" src=\"http:\/\/24.144.91.142\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.42.02.png\" alt=\"\" width=\"554\" height=\"360\" srcset=\"http:\/\/martinsiron.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.42.02.png 554w, http:\/\/martinsiron.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.42.02-300x195.png 300w\" sizes=\"auto, (max-width: 554px) 100vw, 554px\" \/><\/td>\n<td width=\"234\">\u00a0<img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-211\" src=\"http:\/\/24.144.91.142\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.42.19.png\" alt=\"\" width=\"574\" height=\"364\" srcset=\"http:\/\/martinsiron.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.42.19.png 574w, http:\/\/martinsiron.com\/wp-content\/uploads\/2017\/11\/Screen-Shot-2017-11-24-at-16.42.19-300x190.png 300w\" sizes=\"auto, (max-width: 574px) 100vw, 574px\" \/><\/td>\n<\/tr>\n<tr>\n<td width=\"234\"><u>Mean Error<\/u><\/td>\n<td width=\"234\"><u>Mean Error<\/u><\/td>\n<\/tr>\n<tr>\n<td width=\"234\">-18.02%<\/td>\n<td width=\"234\">-17.4%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Both of these models produced predictions with significantly greater error.
I interpret this to mean that Democratic voters fit more \u2018neatly\u2019 into the specific demographic groups outlined by the census data than GOP voters do.<\/p>\n<p><strong>Interesting connection \u2013 weights<\/strong><\/p>\n<p>I decided to further analyze which variables in the DTR model were most prominent in calculating the percentage of votes received by each candidate in the DNC primary.<\/p>\n<p>I printed the weights from the trained DTR model:<\/p>\n<table width=\"462\">\n<tbody>\n<tr>\n<td width=\"154\"><strong>Variable<\/strong><\/td>\n<td width=\"154\"><strong>Description<\/strong><\/td>\n<td width=\"154\"><strong>Weight<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"154\">Type<\/td>\n<td width=\"154\">What type of election was<br \/>\nheld (primary or caucus)<\/td>\n<td width=\"154\">0.025561<\/td>\n<\/tr>\n<tr>\n<td width=\"154\">RHI125214<\/td>\n<td width=\"154\">White percent<\/td>\n<td width=\"154\">0.027758<\/td>\n<\/tr>\n<tr>\n<td width=\"154\">AGE135214<\/td>\n<td width=\"154\">Persons under 5 yrs, percent<\/td>\n<td width=\"154\">0.028681<\/td>\n<\/tr>\n<tr>\n<td width=\"154\">Open<\/td>\n<td width=\"154\">Whether the election was<br \/>\nopen or closed<\/td>\n<td width=\"154\">0.028841<\/td>\n<\/tr>\n<tr>\n<td width=\"154\">LND110210<\/td>\n<td width=\"154\">Land area<\/td>\n<td width=\"154\">0.029259<\/td>\n<\/tr>\n<tr>\n<td width=\"154\">RHI525214<\/td>\n<td width=\"154\">Native Hawaiian\/Other<br \/>\nPacific Islander percentage<\/td>\n<td width=\"154\">0.036283<\/td>\n<\/tr>\n<tr>\n<td width=\"154\">RTN131207<\/td>\n<td width=\"154\">Retail sales per capita<\/td>\n<td width=\"154\">0.040708<\/td>\n<\/tr>\n<tr>\n<td width=\"154\">HSG495213<\/td>\n<td width=\"154\">Median value of housing<br \/>\nunits<\/td>\n<td width=\"154\">0.050376<\/td>\n<\/tr>\n<tr>\n<td width=\"154\">RHI225214<\/td>\n<td width=\"154\">Black or African American<br \/>\npercentage of population<\/td>\n<td width=\"154\">0.069465<\/td>\n<\/tr>\n<tr>\n<td width=\"154\">Region<\/td>\n<td width=\"154\">In which geographic
region<br \/>\nof the US the election was<br \/>\nheld<\/td>\n<td width=\"154\">0.249174<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The region of the voter has a significantly greater impact (by one order of magnitude) on the results. As expected, the geographic South and West voted very differently in the Democratic primary. Perhaps less expected was the importance of the election type (open, closed, caucus, primary), as well as the racial make-up and certain odd economic factors (retail sales, median value of housing) of each county.<\/p>\n<p><strong>If you would like to see my Jupyter notebook or data set, <a href=\"http:\/\/martinsiron.com\/contact\/\">please contact me<\/a>!<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Background After a semester at UC Berkeley learning various machine learning and data science tools, I\u2019ve decided to re-examine the model I built half a year ago to predict the remainder of the primary elections at the time. I will be using the same key data: Geographic Region Election type (primary vs. Caucus, open vs.&hellip;<a href=\"http:\/\/martinsiron.com\/index.php\/2017\/11\/25\/using-tensorflows-cnn-vs-sklearns-decision-tree-regressor\/\" class=\"button\">Read more <span class=\"screen-reader-text\">Using TensorFlow\u2019s CNN vs.
SKLearn\u2019s Decision Tree Regressor<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-205","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"http:\/\/martinsiron.com\/index.php\/wp-json\/wp\/v2\/posts\/205","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/martinsiron.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/martinsiron.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/martinsiron.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/martinsiron.com\/index.php\/wp-json\/wp\/v2\/comments?post=205"}],"version-history":[{"count":0,"href":"http:\/\/martinsiron.com\/index.php\/wp-json\/wp\/v2\/posts\/205\/revisions"}],"wp:attachment":[{"href":"http:\/\/martinsiron.com\/index.php\/wp-json\/wp\/v2\/media?parent=205"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/martinsiron.com\/index.php\/wp-json\/wp\/v2\/categories?post=205"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/martinsiron.com\/index.php\/wp-json\/wp\/v2\/tags?post=205"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
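For reference, the variable weights in the table above can be read off a fitted sklearn tree via its `feature_importances_` attribute. This is a minimal sketch with hypothetical feature names and synthetic data, not the real Census columns:

```python
# Sketch of extracting the variable weights shown in the table above.
# Feature names and data here are hypothetical stand-ins for the Census columns.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

feature_names = ["Type", "Open", "Region", "RHI225214", "HSG495213"]
rng = np.random.RandomState(1)
X = rng.rand(300, len(feature_names))
# Make the 'Region' column dominate the target so it earns the largest weight
y = 3.0 * X[:, 2] + 0.1 * rng.rand(300)

model = DecisionTreeRegressor(max_depth=30, random_state=0).fit(X, y)

# feature_importances_ sums to 1.0 across all input variables
weights = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(weights.items(), key=lambda kv: kv[1])
for name, w in ranked:
    print(f"{name}\t{w:.6f}")
```

The real model would rank the full county feature matrix; only the extraction and sorting of the weights is shown here.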