{"id":3018,"date":"2015-10-06T10:56:51","date_gmt":"2015-10-06T17:56:51","guid":{"rendered":"http:\/\/blogs.reed.edu\/ed-tech\/?p=3018"},"modified":"2015-10-07T08:45:45","modified_gmt":"2015-10-07T15:45:45","slug":"creating-nice-tables-using-r-markdown","status":"publish","type":"post","link":"https:\/\/blogs.reed.edu\/ed-tech\/2015\/10\/creating-nice-tables-using-r-markdown\/","title":{"rendered":"Creating nice tables using R Markdown"},"content":{"rendered":"<p><script type=\"text\/javascript\">\/\/ <![CDATA[\nif (window.hljs &#038;&#038; document.readyState &#038;&#038; document.readyState === \"complete\") {    window.setTimeout(function() {       hljs.initHighlighting();    }, 0); }\n\/\/ ]]><\/script>One of the neat tools available via a variety of packages in <strong>R<\/strong> is the creation of beautiful tables using data frames stored in <strong>R<\/strong>. In what follows, I\u2019ll discuss these different options using data on departing flights from Seattle and Portland in 2014. (More information and the source code for this <strong>R<\/strong> package is available at <a class=\"uri\" href=\"https:\/\/github.com\/ismayc\/pnwflights14\">https:\/\/github.com\/ismayc\/pnwflights14<\/a>.)<!--more--><\/p>\n<p>We begin by ensuring the needed packages are installed and then load them into our <strong>R<\/strong> session.<\/p>\n<div class=\"container-fluid main-container\">\n<div class=\"container-fluid main-container\">\n<pre class=\"r\"><code># List of packages required for this analysis\r\npkg &lt;- c(\"dplyr\", \"knitr\", \"devtools\", \"DT\", \"xtable\")\r\n\r\n# Check if packages are not installed and assign the\r\n# names of the packages not installed to the variable new.pkg\r\nnew.pkg &lt;- pkg[!(pkg %in% installed.packages())]\r\n\r\n# If there are any packages in the list that aren't installed,\r\n# install them\r\nif (length(new.pkg)) {\r\n  install.packages(new.pkg, repos = \"http:\/\/cran.rstudio.com\")\r\n}\r\n\r\n# Load the packages into R\r\nlibrary(dplyr)\r\nlibrary(knitr)\r\nlibrary(DT)\r\nlibrary(xtable)\r\n\r\n# Install Chester's pnwflights14 package (if not already)\r\nif (!require(pnwflights14)){\r\n  library(devtools)\r\n  devtools::install_github(\"ismayc\/pnwflights14\")\r\n  }\r\nlibrary(pnwflights14)\r\n\r\n# Load the flights dataset\r\ndata(\"flights\", package = \"pnwflights14\")<\/code><\/pre>\n<p>The dataset provides for the development of a lot of interesting questions. Here I will delve further into some of the questions I addressed in two recent workshops I led in the Fall 2015 Data @ Reed Research Skills Workshop Series. (Slides available at <a class=\"uri\" href=\"http:\/\/rpubs.com\/cismay\">http:\/\/rpubs.com\/cismay<\/a>.)<\/p>\n<p>The questions I will analyze by creating tables are<\/p>\n<p>1. Which destinations had the worst arrival delays (on average) from the two PNW airports?<\/p>\n<p>2. How does the maximum departure delay vary by month for each of the two airports?<\/p>\n<p>3. How many flights departed for each airline from each of the airports?<\/p>\n<div id=\"the-kable-function-in-the-knitr-package\" class=\"section level4\">\n<h4><strong>The <code>kable<\/code> function in the <code>knitr<\/code> package<\/strong><\/h4>\n<p>To address the first question, we will use the <code>dplyr<\/code> package written by Hadley Wickham as below. We\u2019ll use the <code>top_n<\/code> function to isolate the 5 worst mean arrival delays.<\/p>\n<pre class=\"r\"><code>worst_arr_delays &lt;- flights %&gt;% group_by(dest) %&gt;%\r\n  summarize(mean_arr_delay = mean(arr_delay, na.rm = TRUE)) %&gt;%\r\n  arrange(desc(mean_arr_delay)) %&gt;%\r\n  top_n(n = 5, wt = mean_arr_delay)<\/code><\/pre>\n<p>This information is helpful but you may not necessarily know to which airport each of these FAA airport codes refers. One of the other data sets included in the <code>pnwflights14<\/code> package is <code>airports<\/code> that lists the names. Here we will do a match to identify the names of these airports using the <code>inner_join<\/code> function in <code>dplyr<\/code>.<\/p>\n<pre class=\"r\"><code>data(\"airports\", package = \"pnwflights14\")\r\njoined_worst &lt;- inner_join(worst_arr_delays, airports, by = c(\"dest\" = \"faa\")) %&gt;%\r\n  select(name, dest, mean_arr_delay) %&gt;%\r\n  rename(\"Airport Name\" = name, \"Airport Code\" = dest, \"Mean Arrival Delay\" = mean_arr_delay)<\/code><\/pre>\n<p>Lastly we output this table cleanly using the <code>kable<\/code> function.<\/p>\n<pre class=\"r\"><code>kable(joined_worst)<\/code><\/pre>\n<table>\n<thead>\n<tr class=\"header\">\n<th align=\"left\">Airport Name<\/th>\n<th align=\"left\">Airport Code<\/th>\n<th align=\"right\">Mean Arrival Delay<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr class=\"odd\">\n<td align=\"left\">Cleveland Hopkins Intl<\/td>\n<td align=\"left\">CLE<\/td>\n<td align=\"right\">26.150000<\/td>\n<\/tr>\n<tr class=\"even\">\n<td align=\"left\">William P Hobby<\/td>\n<td align=\"left\">HOU<\/td>\n<td align=\"right\">10.250000<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td align=\"left\">Metropolitan Oakland Intl<\/td>\n<td align=\"left\">OAK<\/td>\n<td align=\"right\">10.067460<\/td>\n<\/tr>\n<tr class=\"even\">\n<td align=\"left\">San Francisco Intl<\/td>\n<td align=\"left\">SFO<\/td>\n<td align=\"right\">8.864937<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td align=\"left\">Bellingham Intl<\/td>\n<td align=\"left\">BLI<\/td>\n<td align=\"right\">8.673913<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Oddly enough, flights to Cleveland (from PDX and SEA) had the worst arrival delays in 2014. Houston also had around a 10 minute delay on average. Surprisingly, the airport in Bellingham, WA (only around 100 miles north of SEA) had the fifth largest mean arrival delay.<\/p>\n<\/div>\n<div id=\"the-dt-package\" class=\"section level4\">\n<h4><strong>The DT package<\/strong><\/h4>\n<p>In order to answer the second question, we\u2019ll again make use of the various functions in the <code>dplyr<\/code> package.<\/p>\n<pre class=\"r\"><code>dep_delays_by_month &lt;- flights %&gt;% group_by(origin, month) %&gt;%\r\nsummarize(max_delay = max(dep_delay, na.rm = TRUE))<\/code><\/pre>\n<p>The <code>DT<\/code> package provides a nice interface for viewing data frames in <strong>R<\/strong>. I\u2019ve specified a few extra options here to show all 12 months by default and to automatically set the width. Go ahead and play around with the filter boxes at the top of each column too. (An excellent tutorial on <code>DT<\/code> is available at <a class=\"uri\" href=\"https:\/\/rstudio.github.io\/DT\/\">https:\/\/rstudio.github.io\/DT\/<\/a>.)<\/p>\n<pre class=\"r\"><code>datatable(dep_delays_by_month,\r\n          filter = 'top', options = list(\r\n            pageLength = 12, autoWidth = TRUE\r\n          ))<\/code><\/pre>\n<p>The created table in HTML is available <a href=\"http:\/\/www.reed.edu\/data-at-reed\/software\/R\/blogposts\/DT.html\">here<\/a>.<\/p>\n<p>If you click on the <code>max_delay<\/code> column header, you should see that the maximum departure delay for PDX was in March and for Seattle was in May.<\/p>\n<\/div>\n<div id=\"the-xtable-package-to-produce-nice-tables-in-a-pdf\" class=\"section level4\">\n<h4><strong>The <code>xtable<\/code> package to produce nice tables in a PDF<\/strong><\/h4>\n<p>Again, we find ourselves using the extremely helpful <code>dplyr<\/code> package to answer this question and to create the underpinnings of our table to display. We merge the flights data with the <code>airlines<\/code> data to get the names of the airlines from the two letter carrier code.<\/p>\n<pre class=\"r\"><code>data(\"airlines\", package = \"pnwflights14\")\r\nby_airline &lt;- flights %&gt;% group_by(origin, carrier) %&gt;%\r\n  summarize(count = n()) %&gt;%\r\n  inner_join(x = ., y = airlines, by = \"carrier\") %&gt;%\r\n  arrange(desc(count))<\/code><\/pre>\n<p>The <code>xtable<\/code> package and its <code>xtable<\/code> function (and also the <code>kable<\/code> function you saw earlier) provide the functionality to generate HTML code or <span class=\"math inline\">\\(\\LaTeX\\)<\/span> code to produce a table. We will focus on producing the <span class=\"math inline\">\\(\\LaTeX\\)<\/span> code in this example.<\/p>\n<pre class=\"r\"><code>print(xtable(by_airline),\r\n      comment = FALSE)<\/code><\/pre>\n<p>\\begin{table}[ht]<br \/>\n\\centering<br \/>\n\\begin{tabular}{rllrl}<br \/>\n\\hline<br \/>\n&amp; origin &amp; carrier &amp; count &amp; name \\\\<br \/>\n\\hline<br \/>\n1 &amp; PDX &amp; AS &amp; 12844 &amp; Alaska Airlines Inc. \\\\<br \/>\n2 &amp; PDX &amp; WN &amp; 11193 &amp; Southwest Airlines Co. \\\\<br \/>\n3 &amp; PDX &amp; OO &amp; 9841 &amp; SkyWest Airlines Inc. \\\\<br \/>\n4 &amp; PDX &amp; UA &amp; 6061 &amp; United Air Lines Inc. \\\\<br \/>\n5 &amp; PDX &amp; DL &amp; 5168 &amp; Delta Air Lines Inc. \\\\<br \/>\n6 &amp; PDX &amp; US &amp; 2361 &amp; US Airways Inc. \\\\<br \/>\n7 &amp; PDX &amp; AA &amp; 2187 &amp; American Airlines Inc. \\\\<br \/>\n8 &amp; PDX &amp; F9 &amp; 1362 &amp; Frontier Airlines Inc. \\\\<br \/>\n9 &amp; PDX &amp; B6 &amp; 1287 &amp; JetBlue Airways \\\\<br \/>\n10 &amp; PDX &amp; VX &amp; 666 &amp; Virgin America \\\\<br \/>\n11 &amp; PDX &amp; HA &amp; 365 &amp; Hawaiian Airlines Inc. \\\\<br \/>\n12 &amp; SEA &amp; AS &amp; 49616 &amp; Alaska Airlines Inc. \\\\<br \/>\n13 &amp; SEA &amp; WN &amp; 12162 &amp; Southwest Airlines Co. \\\\<br \/>\n14 &amp; SEA &amp; DL &amp; 11548 &amp; Delta Air Lines Inc. \\\\<br \/>\n15 &amp; SEA &amp; UA &amp; 10610 &amp; United Air Lines Inc. \\\\<br \/>\n16 &amp; SEA &amp; OO &amp; 8869 &amp; SkyWest Airlines Inc. \\\\<br \/>\n17 &amp; SEA &amp; AA &amp; 5399 &amp; American Airlines Inc. \\\\<br \/>\n18 &amp; SEA &amp; US &amp; 3585 &amp; US Airways Inc. \\\\<br \/>\n19 &amp; SEA &amp; VX &amp; 2606 &amp; Virgin America \\\\<br \/>\n20 &amp; SEA &amp; B6 &amp; 2253 &amp; JetBlue Airways \\\\<br \/>\n21 &amp; SEA &amp; F9 &amp; 1336 &amp; Frontier Airlines Inc. \\\\<br \/>\n22 &amp; SEA &amp; HA &amp; 730 &amp; Hawaiian Airlines Inc. \\\\<br \/>\n\\hline<br \/>\n\\end{tabular}<br \/>\n\\end{table}<\/p>\n<p>If you don\u2019t know <span class=\"math inline\">\\(\\LaTeX\\)<\/span>, I\u2019ve also duplicated a similar table using <code>kable<\/code> for you to compare:<\/p>\n<pre class=\"r\"><code>kable(by_airline)<\/code><\/pre>\n<table>\n<thead>\n<tr class=\"header\">\n<th align=\"left\">origin<\/th>\n<th align=\"left\">carrier<\/th>\n<th align=\"right\">count<\/th>\n<th align=\"left\">name<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr class=\"odd\">\n<td align=\"left\">PDX<\/td>\n<td align=\"left\">AS<\/td>\n<td align=\"right\">12844<\/td>\n<td align=\"left\">Alaska Airlines Inc.<\/td>\n<\/tr>\n<tr class=\"even\">\n<td align=\"left\">PDX<\/td>\n<td align=\"left\">WN<\/td>\n<td align=\"right\">11193<\/td>\n<td align=\"left\">Southwest Airlines Co.<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td align=\"left\">PDX<\/td>\n<td align=\"left\">OO<\/td>\n<td align=\"right\">9841<\/td>\n<td align=\"left\">SkyWest Airlines Inc.<\/td>\n<\/tr>\n<tr class=\"even\">\n<td align=\"left\">PDX<\/td>\n<td align=\"left\">UA<\/td>\n<td align=\"right\">6061<\/td>\n<td align=\"left\">United Air Lines Inc.<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td align=\"left\">PDX<\/td>\n<td align=\"left\">DL<\/td>\n<td align=\"right\">5168<\/td>\n<td align=\"left\">Delta Air Lines Inc.<\/td>\n<\/tr>\n<tr class=\"even\">\n<td align=\"left\">PDX<\/td>\n<td align=\"left\">US<\/td>\n<td align=\"right\">2361<\/td>\n<td align=\"left\">US Airways Inc.<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td align=\"left\">PDX<\/td>\n<td align=\"left\">AA<\/td>\n<td align=\"right\">2187<\/td>\n<td align=\"left\">American Airlines Inc.<\/td>\n<\/tr>\n<tr class=\"even\">\n<td align=\"left\">PDX<\/td>\n<td align=\"left\">F9<\/td>\n<td align=\"right\">1362<\/td>\n<td align=\"left\">Frontier Airlines Inc.<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td align=\"left\">PDX<\/td>\n<td align=\"left\">B6<\/td>\n<td align=\"right\">1287<\/td>\n<td align=\"left\">JetBlue Airways<\/td>\n<\/tr>\n<tr class=\"even\">\n<td align=\"left\">PDX<\/td>\n<td align=\"left\">VX<\/td>\n<td align=\"right\">666<\/td>\n<td align=\"left\">Virgin America<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td align=\"left\">PDX<\/td>\n<td align=\"left\">HA<\/td>\n<td align=\"right\">365<\/td>\n<td align=\"left\">Hawaiian Airlines Inc.<\/td>\n<\/tr>\n<tr class=\"even\">\n<td align=\"left\">SEA<\/td>\n<td align=\"left\">AS<\/td>\n<td align=\"right\">49616<\/td>\n<td align=\"left\">Alaska Airlines Inc.<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td align=\"left\">SEA<\/td>\n<td align=\"left\">WN<\/td>\n<td align=\"right\">12162<\/td>\n<td align=\"left\">Southwest Airlines Co.<\/td>\n<\/tr>\n<tr class=\"even\">\n<td align=\"left\">SEA<\/td>\n<td align=\"left\">DL<\/td>\n<td align=\"right\">11548<\/td>\n<td align=\"left\">Delta Air Lines Inc.<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td align=\"left\">SEA<\/td>\n<td align=\"left\">UA<\/td>\n<td align=\"right\">10610<\/td>\n<td align=\"left\">United Air Lines Inc.<\/td>\n<\/tr>\n<tr class=\"even\">\n<td align=\"left\">SEA<\/td>\n<td align=\"left\">OO<\/td>\n<td align=\"right\">8869<\/td>\n<td align=\"left\">SkyWest Airlines Inc.<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td align=\"left\">SEA<\/td>\n<td align=\"left\">AA<\/td>\n<td align=\"right\">5399<\/td>\n<td align=\"left\">American Airlines Inc.<\/td>\n<\/tr>\n<tr class=\"even\">\n<td align=\"left\">SEA<\/td>\n<td align=\"left\">US<\/td>\n<td align=\"right\">3585<\/td>\n<td align=\"left\">US Airways Inc.<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td align=\"left\">SEA<\/td>\n<td align=\"left\">VX<\/td>\n<td align=\"right\">2606<\/td>\n<td align=\"left\">Virgin America<\/td>\n<\/tr>\n<tr class=\"even\">\n<td align=\"left\">SEA<\/td>\n<td align=\"left\">B6<\/td>\n<td align=\"right\">2253<\/td>\n<td align=\"left\">JetBlue Airways<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td align=\"left\">SEA<\/td>\n<td align=\"left\">F9<\/td>\n<td align=\"right\">1336<\/td>\n<td align=\"left\">Frontier Airlines Inc.<\/td>\n<\/tr>\n<tr class=\"even\">\n<td align=\"left\">SEA<\/td>\n<td align=\"left\">HA<\/td>\n<td align=\"right\">730<\/td>\n<td align=\"left\">Hawaiian Airlines Inc.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>With the originating airport duplicating across all of the airlines, it would be nice if we could reduce this duplication and just bold PDX or SEA and have each appear once. Awesomely enough, the <code>rle<\/code> function in <strong>R<\/strong> will be of great help to us in this endeavor. It counts how many times a value is repeated in a table. We will then make a call to the <code>multirow<\/code> function in <span class=\"math inline\">\\(\\LaTeX\\)<\/span> in a sneaky way of pasting the appropriate text in addition to using the <code>force<\/code> option for sanitizing the text into <span class=\"math inline\">\\(\\LaTeX\\)<\/span>.<\/p>\n<p>We add in a few options to make the output of the table a little nicer by specifying horizontal lines and removing the default rownames.<\/p>\n<pre class=\"r\"><code>rle.lengths &lt;- rle(by_airline$origin)$lengths\r\nfirst &lt;- !duplicated(by_airline$origin)\r\nby_airline$origin[!first] &lt;- \"\"\r\nby_airline$origin[first] &lt;- paste(\"\\\\multirow{\", \r\n                                  rle.lengths,\r\n                                  \"}{*}{\\\\textbf{\",\r\n                                  by_airline$origin[first], \"}}\")\r\n\r\nprint(xtable(by_airline),\r\n              comment = FALSE,\r\n              hline.after=c(-1,0,nrow(by_airline), 11),\r\n              sanitize.text.function = force,\r\n              include.rownames = FALSE)<\/code><\/pre>\n<p>The resulting table produced by <span class=\"math inline\">\\(\\LaTeX\\)<\/span> can be found at Overleaf.com at <a class=\"uri\" href=\"https:\/\/www.overleaf.com\/read\/wvrpxpwrbvnk\">https:\/\/www.overleaf.com\/read\/wvrpxpwrbvnk<\/a>.<\/p>\n<p>We see that Alaska Airlines had the most flights out of both airports with Southwest coming in second at both airports.<\/p>\n<p>(The generating R Markdown file for this HTML document\u2014saved in the .Rmd extension\u2014is available <a href=\"http:\/\/reed.edu\/data-at-reed\/software\/R\/blogposts\/tables_blogpost.Rmd\">here<\/a>.)<\/p>\n<\/div>\n<\/div>\n<\/div>\n<p><!--more--><\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the neat tools available via a variety of packages in R is the creation of beautiful tables using data frames stored in R. In what follows, I\u2019ll discuss these different options using data on departing flights from Seattle and Portland in 2014. (More information and the source code for this R package is &hellip; <a href=\"https:\/\/blogs.reed.edu\/ed-tech\/2015\/10\/creating-nice-tables-using-r-markdown\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Creating nice tables using R Markdown&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1244,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[43,32,33],"tags":[],"class_list":["post-3018","post","type-post","status-publish","format-standard","hentry","category-general-instructional-technology","category-programming","category-quantitative-analysis"],"_links":{"self":[{"href":"https:\/\/blogs.reed.edu\/ed-tech\/wp-json\/wp\/v2\/posts\/3018","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.reed.edu\/ed-tech\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.reed.edu\/ed-tech\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.reed.edu\/ed-tech\/wp-json\/wp\/v2\/users\/1244"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.reed.edu\/ed-tech\/wp-json\/wp\/v2\/comments?post=3018"}],"version-history":[{"count":17,"href":"https:\/\/blogs.reed.edu\/ed-tech\/wp-json\/wp\/v2\/posts\/3018\/revisions"}],"predecessor-version":[{"id":3035,"href":"https:\/\/blogs.reed.edu\/ed-tech\/wp-json\/wp\/v2\/posts\/3018\/revisions\/3035"}],"wp:attachment":[{"href":"https:\/\/blogs.reed.edu\/ed-tech\/wp-json\/wp\/v2\/media?parent=3018"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.reed.edu\/ed-tech\/wp-json\/wp\/v2\/categories?post=3018"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.reed.edu\/ed-tech\/wp-json\/wp\/v2\/tags?post=3018"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}