University of Texas STA 301 Elementary Statistics Final Exam. Q&A. Score 100%

Normal Distribution

Which of the following statements is true about the normal random walk model, Y[t] = Y[t-1] + e[t], where e[t] is a normal shock with mean zero and standard deviation sigma?
-This random walk model depends upon the initial state of the system.
-Each shock is independent of all prior shocks.

Which of the following is true of the normal distribution model? Select all correct answers.
-The normal distribution has "thin tails" because large outliers are unlikely to occur.
-The area under a normal density curve represents probability.

Probability Models

A manufacturer of video game controllers is concerned that their controller may be difficult for left-handed users. Suppose that 22% of the population is left-handed. Consider a sample of 12 customers. Can the number of left-handed gamers in the sample be modeled as a Binomial random variable?
-Yes, if we assume that each customer has a 22% chance of being left-handed.

Which of the following is a discrete random variable?
-The number of customers waiting in line at Franklin BBQ when it opens tomorrow morning.
-The count of typos on a page.

Model Selection

Which of the following is true of this set of models?
-The model with the best predictive performance is Model 2.
-Here we see evidence that simpler models show less degradation in performance (than do more complex models) when moving from in-sample to out-of-sample data.

Which of the following is true of Akaike information criterion (AIC)? Select all correct answers.
-AIC is among the most common measures of predictive model performance.

Machine Learning

Match each of the following with Statistics, Machine Learning, or both.
-Fundamentally about learning from data = Both
-Uses lots of regression analysis = Both

Machine Learning

Suppose we're looking at COVID-19 data from every county in the US, and we're trying to understand the relationship between various social-distancing measures taken in that county (x) and the growth rate of the virus in that county. We build a regression model that relates each county's COVID-19 growth rate (y) to several predictors that measure the extent of each county's social distancing behavior. Our goal is to understand how different measures of social distancing seem related to the COVID-19 growth rate. Does this sound more like we need the tools of statistics or machine learning? Why?
-Statistics, because we care chiefly about helping stakeholders (policy-makers, health professionals, etc.) understand and interpret an important partial relationship.

Regression in the Real World

Which of the following are among the guidelines for data scientists on what variables to include in fitting multiple regression models?
-It is essential to incorporate variables that directly affect both the outcome (Y) and the particular X predictor of interest.
-It is beneficial, but not strictly essential, to include variables that affect Y even if they are not correlated with a particular X predictor of interest.

Multiple Regression

This is one of multiple questions about the same model. In 1990, the United Nations created a single measure that ranges between zero and one -- the Human Development Index (HDI) -- to summarize health, education, and economic status for world countries. The following is a fitted model to predict HDI from life expectancy and expected years of schooling:

Which of the following are correct interpretations of the (beta) coefficient for LifeExpectancy? Select all correct answers.
-A partial slope.
-The change in HDI associated with a one-year increase in life expectancy, holding other predictors constant.
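
The "holding other predictors constant" reading of a partial slope can be checked numerically. The sketch below is only an illustration: the fitted HDI equation is not reproduced in this preview, so the coefficient values and the names `B0`, `B_LIFE`, `B_SCHOOL`, and `predict_hdi` are hypothetical.

```python
# Hypothetical coefficients for illustration only; these are NOT the
# fitted values from the exam's HDI model, which this preview omits.
B0 = 0.05        # intercept (assumed)
B_LIFE = 0.006   # partial slope for life expectancy (assumed)
B_SCHOOL = 0.02  # partial slope for expected years of schooling (assumed)


def predict_hdi(life_expectancy, expected_schooling):
    """Linear prediction: intercept plus one term per predictor."""
    return B0 + B_LIFE * life_expectancy + B_SCHOOL * expected_schooling


def change_for_one_more_year_of_life(life_expectancy, expected_schooling):
    """Change in predicted HDI when life expectancy rises by one year
    and expected schooling is held fixed."""
    before = predict_hdi(life_expectancy, expected_schooling)
    after = predict_hdi(life_expectancy + 1, expected_schooling)
    return after - before


# The change matches the partial slope no matter where we start.
for life, school in [(60, 8), (75, 12), (82, 16)]:
    delta = change_for_one_more_year_of_life(life, school)
    print(life, school, round(delta, 6))
```

In a model without interaction terms, the change is the same at every starting point, which is why the coefficient can be read as a single "per one-year increase" effect with schooling held fixed.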
Grouped and Numerical Data

This is one of multiple questions about the same scenario. NCAA coaches may be compensated with salary as well as with an annual bonus. The amount of that bonus may be contingent on team performance as well as meeting off-field thresholds with respect to players' grades and conduct. The distribution of salary and bonus is something that coaches negotiate with the Athletic Director at an NCAA school. A college football coach-who-will-not-be-named is evaluating the job market and wants to predict what sort of bonus compensation he might expect to earn at various NCAA schools. Using NCAA Salaries data, he fits a model to predict the maximum annual bonus (in millions of dollars) from salary (in millions of dollars) and whether the school is in the Southeastern Conference (SEC = 1) or is not (SEC = 0).

The model equation is:

Interactions and ANOVA

Reasons to include an interaction term in our model include which of the following?
-To estimate context-specific effects of some predictor variable on the outcome (y).
-Looking at an ANOVA table suggests that an interaction term noticeably improves the predictive power of the model.

Grouped Variables

A data scientist fits a groupwise linear model to predict Revenue from research grants in terms of affiliation of the research team with one of the eight academic institutions in the University of Texas system (UT Arlington, UT Austin, UT Dallas, UT El Paso, UT Permian Basin, UT Rio Grande Valley, UT San Antonio, or UT Tyler). If this model uses "baseline/offset" form, how many dummy variables should this model use to encode the variable Institution?

Observational Studies

Match each of the following study designs with its corresponding rank in the "hierarchy" of study designs discussed during class.
-Randomized controlled trials = 1
-Quasi-experiments with good mechanisms for randomization = 2
-Prospective cohort studies = 3
-Retrospective cohort studies = 4
-Other study types = 5

Experiments

Which of the following is true of randomization in the context of an experiment?
-Randomization ensures balance, on average, even for possible confounding factors of which the experimenter is not aware.

Why do we need a control group in experimental design? Choose all correct answers.
-To rule out alternative explanations for the outcome of the experiment related to placebo effects.
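
The claim that randomization balances even unobserved confounders "on average" can be illustrated with a small simulation. This is a minimal sketch under assumed settings: the hidden trait, its distribution (mean 50, standard deviation 10), the group sizes, and the helper name `randomized_imbalance` are all invented for the example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def randomized_imbalance(n_units, rng):
    """Randomly split units into two equal arms and return the difference
    in the mean of a hidden trait (treatment minus control)."""
    # Hidden trait the experimenter never measures (simulated here).
    hidden = [rng.gauss(50, 10) for _ in range(n_units)]
    order = list(range(n_units))
    rng.shuffle(order)  # random assignment to arms
    half = n_units // 2
    treated = [hidden[i] for i in order[:half]]
    control = [hidden[i] for i in order[half:]]
    return mean(treated) - mean(control)


rng = random.Random(301)  # fixed seed so the run is repeatable
diffs = [randomized_imbalance(200, rng) for _ in range(2000)]
average_imbalance = mean(diffs)
largest_imbalance = max(abs(d) for d in diffs)

print("average imbalance across experiments:", round(average_imbalance, 3))
print("largest imbalance in a single experiment:", round(largest_imbalance, 3))
```

Any single experiment can still show some imbalance in the hidden trait, but the imbalance averages out to roughly zero across repeated randomizations, which is the sense of "on average" in the answer above.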

Last updated: 1 year ago