University of California, Berkeley DATA MISC

Homework 9: Central Limit Theorem

Reading:
* Why the mean matters

Please complete this notebook by filling in the cells provided. Before you begin, execute the following cell to load the provided tests. Each time you start your server, you will need to execute this cell again to load the tests.

Homework 9 is due Thursday, 11/1 at 11:59pm. You will receive an early submission bonus point if you turn in your final submission by Wednesday, 10/31 at 11:59pm. Start early so that you can come to office hours if you’re stuck. Check the website for the office hours schedule. Late work will not be accepted as per the policies of this course.

Directly sharing answers is not okay, but discussing problems with the course staff or with other students is encouraged. Refer to the policies page to learn more about how to learn cooperatively.

For all problems that require you to write explanations and sentences, you must provide your answer in the designated space. Moreover, throughout this homework and all future ones, please be sure not to re-assign variables throughout the notebook! For example, if you use max_temperature in your answer to one question, do not reassign it later on.

In [1]:

    # Don't change this cell; just run it.
    import numpy as np
    from datascience import *

    # These lines do some fancy plotting magic.
    import matplotlib
    %matplotlib inline
    import matplotlib.pyplot as plt
    plt.style.use('fivethirtyeight')
    import warnings
    warnings.simplefilter('ignore', FutureWarning)

    from client.api.notebook import Notebook
    ok = Notebook('hw09.ok')
    _ = ok.auth(inline=True)

    =====================================================================
    Assignment: Homework 9: Central Limit Theorem
    OK, version v1.12.5
    =====================================================================
    Successfully logged in as [email protected]

1. The Bootstrap and The Normal Curve

In this exercise, we will explore a dataset that includes the safety inspection scores for restaurants in the city of Austin, Texas. We will be interested in determining the average restaurant score for the city from a random sample of the scores; scores are out of 100. We’ll compare two methods for computing a confidence interval for that quantity: the bootstrap resampling method, and an approximation based on the Central Limit Theorem.

In [2]:

    # Just run this cell.
    pop_restaurants = Table.read_table('restaurant_inspection_scores.csv').drop(5,6)
    pop_restaurants

Often it is impossible to find complete datasets like this. Imagine we instead had access only to a random sample of 100 restaurant inspections, called restaurant_sample. That table is created below. We are interested in using this sample to estimate the population mean.

Question 3

Complete the function bootstrap_scores below. It should take no arguments. It should simulate drawing 5000 resamples from restaurant_sample and computing the mean restaurant score in each resample. It should return an array of those 5000 resample means.

In [8]:

    def bootstrap_scores():
        resampled_means = make_array()
        for i in range(5000):
            # sample() with no arguments draws a resample of the same size, with replacement.
            resampled_mean = np.mean(restaurant_sample.sample().column(3))
            resampled_means = np.append(resampled_means, resampled_mean)
        return resampled_means

    resampled_means = bootstrap_scores()
    resampled_means

Question 4

Compute a 95 percent confidence interval for the average restaurant score using the array resampled_means.

In [11]:

    # The middle 95% of the resampled means runs from the 2.5th to the 97.5th percentile.
    lower_bound = percentile(2.5, resampled_means)
    upper_bound = percentile(97.5, resampled_means)
    print("95% confidence interval for the average restaurant score, computed by bootstrapping:\n(", lower_bound, ",", upper_bound, ")")

    95% confidence interval for the average restaurant score, computed by bootstrapping:
    ( 90.98 , 93.56 )

Question 5

Does the distribution of the resampled mean scores look normally distributed? State "yes" or "no" and describe in one sentence why you would expect that result.

Yes, since the Central Limit Theorem states that the distribution of the averages of large random samples tends to be roughly normal.

Question 6

Does the distribution of the sampled scores look normally distributed? State "yes" or "no" and describe in one sentence why you should expect this result.

Hint: Remember that we are no longer talking about the resampled means!

No, since the sampled scores are distributed like the population scores, and the population scores are not normally distributed.

For the last question, you’ll need to recall two facts.

1. If a group of numbers has a normal distribution, around 95% of them lie within 2 standard deviations of their mean.
2. The Central Limit Theorem tells us the quantitative relationship between the following:
   * the standard deviation of an array of numbers.
   * the standard deviation of an array of means of samples taken from those numbers.

Question 7

Without referencing the array resampled_means or performing any new simulations, calculate an interval around the sample_mean that covers approximately 95% of the numbers in the resampled_means array. You may use the following values to compute your result, but you should not perform additional resampling. Think about how you can use the CLT to accomplish this.

In [12]:

    sample_mean = np.mean(restaurant_sample.column(3))
    sample_sd = np.std(restaurant_sample.column(3))
    sample_size = restaurant_sample.num_rows

    # By the CLT, the SD of the sample means is the sample SD divided by sqrt(sample size).
    mean_sd = sample_sd / sample_size**0.5
    lower_bound_normal = sample_mean - 2 * mean_sd
    upper_bound_normal = sample_mean + 2 * mean_sd
    print("95% confidence interval for the average restaurant score, computed by a normal approximation:\n(", lower_bound_normal, ",", upper_bound_normal, ")")

    95% confidence interval for the average restaurant score, computed by a normal approximation:
    ( 90.9258714979737 , 93.6341285020263 )

This confidence interval should look very similar to the one you computed in Question 4.

2. Testing the Central Limit Theorem

The Central Limit Theorem tells us that the probability distribution of the sum or average of a large random sample drawn with replacement will be roughly normal, regardless of the distribution of the population from which the sample is drawn.

That’s a pretty big claim, but the theorem doesn’t stop there. It further states that the standard deviation of this normal distribution is given by

$$\frac{\text{sd of the original distribution}}{\sqrt{\text{sample size}}}$$

In other words, suppose we start with any distribution that has standard deviation $x$, take a sample of size $n$ (where $n$ is a large number) from that distribution with replacement, and compute the mean of that sample. If we repeat this procedure many times, then those sample means will have a roughly normal distribution with standard deviation $\frac{x}{\sqrt{n}}$.

That’s an even bigger claim than the first one! The proof of the theorem is beyond the scope of this class, but in this exercise, we will be exploring some data to see the CLT in action.
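The standard deviation claim above can be checked with a short simulation. The following is a minimal sketch, not part of the homework: it assumes a synthetic, skewed (exponential) population instead of the restaurant data, uses plain NumPy rather than the datascience library, and the names `sd_of_sample_means` and `population` are hypothetical.

```python
import numpy as np

def sd_of_sample_means(population, sample_size, repetitions=2000, seed=0):
    """Draw `repetitions` samples with replacement and return the SD of their means."""
    rng = np.random.default_rng(seed)
    # One row per repetition, one column per sampled value.
    samples = rng.choice(population, size=(repetitions, sample_size), replace=True)
    return np.std(samples.mean(axis=1))

# Hypothetical skewed population (assumption): exponential, so clearly not normal.
population = np.random.default_rng(42).exponential(scale=10, size=100_000)
population_sd = np.std(population)

for n in (25, 100, 400):
    simulated = sd_of_sample_means(population, n)
    predicted = population_sd / np.sqrt(n)  # the CLT formula: population SD / sqrt(n)
    print(f"n={n:>3}  simulated SD={simulated:.3f}  CLT prediction={predicted:.3f}")
```

If the claim holds, the two columns should agree closely for each sample size, and quadrupling the sample size should roughly halve the SD of the sample means.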

**About the document**

Uploaded on: Oct 02, 2022

Number of pages: 32
