
Find the probability of someone winning for deals 1 through 6 ...

#### Explanation:

Let ${D}_{n}$ for $n = 1 , 2 , 3 , 4 , 5 , 6$ be a winning deal

Also, note that Deal 7 MUST be a win if it gets that far.

$P \left({D}_{1}\right) = 2 \left(\frac{13}{52}\right) \left(\frac{39}{51}\right) = \frac{13}{34} \approx 0.382352941$

[Note: multiply by 2 because either player could win]

Now, moving on to the second deal, which is only reached if neither player won the first deal.

$P \left({D}_{2}\right) = \left(1 - \frac{13}{34}\right) \left(\frac{13}{34}\right) \approx 0.23615917$

Continuing for ${D}_{3} \text{ through } {D}_{6}$ ...

$P \left({D}_{3}\right) = {\left(1 - \frac{13}{34}\right)}^{2} \left(\frac{13}{34}\right)$

$P \left({D}_{4}\right) = {\left(1 - \frac{13}{34}\right)}^{3} \left(\frac{13}{34}\right)$

$P \left({D}_{5}\right) = {\left(1 - \frac{13}{34}\right)}^{4} \left(\frac{13}{34}\right)$

$P \left({D}_{6}\right) = {\left(1 - \frac{13}{34}\right)}^{5} \left(\frac{13}{34}\right)$

Finally, since ${D}_{7}$ MUST be a win for one of the players, this concludes the game and the probability of getting to ${D}_{7}$ is the complement of the sum of the probabilities for ${D}_{1} \text{ through } {D}_{6}$

$P \left({D}_{7}\right) = 1 - {\sum}_{1}^{6} P \left({D}_{n}\right)$

The table below summarizes the probability distribution and the expected value for D which is equal to approximately 2.5 deals
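As a rough check on the distribution and the expected value, here is a minimal Python sketch. It assumes only the per-deal win probability $\frac{13}{34}$ derived above and the rule that deal 7 always ends the game; the names `p_win`, `deal_distribution` and `expected_deals` are illustrative.

```python
from fractions import Fraction

# Per-deal win probability from the answer: 2 * (13/52) * (39/51) = 13/34.
p_win = 2 * Fraction(13, 52) * Fraction(39, 51)
q = 1 - p_win  # probability that a given deal produces no winner


def deal_distribution(max_deal=7):
    """Return {n: P(D_n)}; the last deal absorbs all remaining probability."""
    dist = {n: q ** (n - 1) * p_win for n in range(1, max_deal)}
    dist[max_deal] = 1 - sum(dist.values())
    return dist


dist = deal_distribution()
expected_deals = sum(n * prob for n, prob in dist.items())

for n, prob in dist.items():
    print(f"D_{n}: {float(prob):.6f}")
print(f"E[D] = {float(expected_deals):.4f}")
```

Exact fractions are used so that the seven probabilities sum to exactly 1 before anything is rounded for display.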

Hope that helped!

A Pearson's chi-square test can refer to a test of independence or a goodness-of-fit test.

#### Explanation:

When we refer to a "Pearson's chi-square test," we may be referring to one of two tests: the Pearson's chi-square test of independence or the Pearson's chi-square goodness-of-fit test.

Goodness of fit tests determine whether a data set's distribution differs significantly from a theoretical distribution. The data must be unpaired.

Tests of independence determine if unpaired observations of two variables are independent of one another.

[Tables: observed values; expected values]

Using the chi-square formula, you determine your chi-square statistic, your degrees of freedom, and your level of significance, and compare your results to a chi-square distribution table. For the data presented above, we could use the chi-square test to determine if males and females differ in the amount of time (more or less than fifteen hours per week) spent on homework.
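To make the procedure concrete, here is a minimal sketch of a test of independence on a 2×2 table. The counts are hypothetical (they are not the data from the original answer), and `chi_square_independence` is an illustrative helper, not a library function.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees_of_freedom, expected) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Hypothetical counts, for illustration only:
# rows = male / female, columns = under 15 h / 15 h or more of homework per week.
observed = [[30, 20],
            [20, 30]]

statistic, dof, expected = chi_square_independence(observed)
CRITICAL_0_05_DF1 = 3.841  # standard table value for alpha = 0.05, 1 degree of freedom
print(statistic, dof, statistic > CRITICAL_0_05_DF1)
```

With these made-up counts every expected cell is 25, so the statistic is 4.0 on 1 degree of freedom, which exceeds the tabled 3.841 and would lead you to reject independence at the 5% level.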

Both tests analyze unpaired, categorical data and are used when data is nonparametric. Note: by unpaired, we mean that your categories are independent of one another. These tests also can't be used with very small cell counts, such as expected values lower than five.

The results of your chi-square test will only tell you whether or not your observed values fit your expected values (whether those values are to fit an expected distribution or if your two variables are independent of one another). These tests will not tell you how your observed values differ.

There's a very good tutorial here that walks you through an example in detail.

## What is the probability of getting a sum of either 7, 11, or 12 on a roll of two dice?

The probability is 25%.

#### Explanation:

Let's first take a look at the probability for one of those sums.

There are $6 \times 6 = 36$ different results of a roll of two dice:

$\left(1 , 1\right) , \left(1 , 2\right) , \ldots , \left(1 , 6\right)$
$\left(2 , 1\right) , \left(2 , 2\right) , \ldots , \left(2 , 6\right)$
$\ldots$
$\left(6 , 1\right) , \left(6 , 2\right) , \ldots , \left(6 , 6\right)$

The probability of each one of those is $\frac{1}{36}$.

• How many possible combinations of two dice will give you a sum of $7$? There are $6$ combinations: $\left(1 , 6\right)$, $\left(6 , 1\right)$, $\left(2 , 5\right)$, $\left(5 , 2\right)$, $\left(3 , 4\right)$ and $\left(4 , 3\right)$.

$\implies P \left(\text{sum} = 7\right) = 6 \cdot \frac{1}{36} = \frac{6}{36} = \frac{1}{6}$

• For a sum of $11$, there are $2$ combinations: $\left(5 , 6\right)$ and $\left(6 , 5\right)$.

$\implies P \left(\text{sum} = 11\right) = 2 \cdot \frac{1}{36} = \frac{2}{36} = \frac{1}{18}$

• For a sum of $12$, there is just $1$ combination: $\left(6 , 6\right)$.

$\implies P \left(\text{sum} = 12\right) = \frac{1}{36}$

Now, how do you combine those three probabilities?

The events "$\text{sum} = 7$", "$\text{sum} = 11$" and "$\text{sum} = 12$" are mutually exclusive, since no two of them can occur on the same roll.

For mutually exclusive events $A$ and $B$ it holds

$P \left(A \text{ or } B\right) = P \left(A\right) + P \left(B\right)$

Thus, our probability is

$P = P \left(\text{sum} = 7\right) + P \left(\text{sum} = 11\right) + P \left(\text{sum} = 12\right)$

$= \frac{6}{36} + \frac{2}{36} + \frac{1}{36} = \frac{9}{36}$

$= \frac{1}{4}$

= 25%
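The count can also be checked by brute force. This small sketch enumerates all 36 equally likely rolls and keeps those whose sum is 7, 11 or 12; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first die, second die) of rolling two dice.
rolls = list(product(range(1, 7), repeat=2))

targets = {7, 11, 12}
favourable = [roll for roll in rolls if sum(roll) in targets]

probability = Fraction(len(favourable), len(rolls))
print(len(rolls), len(favourable), probability)
```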

## In a survey of 375 dog and cat owners, there were 215 dog owners and 193 cat owners. How many in the survey own a dog and no cat?

Draw a Venn Diagram (see below)

#### Explanation:

If you sum the dog and cat owners ...

$215 + 193 = 408$

$408$ is greater than $375$ because the intersection of the two categories (see Venn diagram below) was counted twice. The value of the intersection (both cat and dog owners) is

$408 - 375 = 33$ own both a cat and a dog.

The count of owners of a dog and no cat is $215 - 33 = 182$
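The same inclusion-exclusion step can be written as a short function. This is a sketch that assumes, as the answer does, that every respondent owns at least one of the two animals; `dog_only` is an illustrative name.

```python
def dog_only(total, dog_owners, cat_owners):
    """Owners of a dog and no cat, assuming everyone owns a dog, a cat, or both."""
    both = dog_owners + cat_owners - total  # overlap that was counted twice
    return dog_owners - both


print(dog_only(375, 215, 193))
```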

Hope that helped!

## What is the variance of {12, 6, 7, 0, 3, -12}?

Population variance: $56.556$
Sample variance: $67.867$
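Both values can be reproduced with a few lines of Python. This is a minimal sketch of the usual recipe (mean, squared differences from the mean, then division by $n$ or by $n - 1$); the variable names are illustrative.

```python
data = [12, 6, 7, 0, 3, -12]
n = len(data)

mean = sum(data) / n
squared_diffs = [(x - mean) ** 2 for x in data]
sum_of_squares = sum(squared_diffs)

population_variance = sum_of_squares / n      # data is the whole population
sample_variance = sum_of_squares / (n - 1)    # data is a sample

print(round(population_variance, 3), round(sample_variance, 3))
```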

#### Explanation:

To calculate the variance:

1. Calculate the arithmetic average (the mean)
2. For each data value square the difference between that data value and the mean
3. Calculate the sum of the squared differences

If your data represents the entire population:
4. Divide the sum of the squared differences by the number of data values to get the population variance

If your data represents only a sample taken from a larger population:
4. Divide the sum of the squared differences by 1 less than the number of data values to get the sample variance

## How many different two-person teams can be made from 6 people?

A 2 person team can be chosen in one of fifteen ways.

#### Explanation:

The question is not precise because if you treated it literally the answer would be 3. First you choose one team, 4 people are left in the group, the second team takes another 2 people and the remaining create the third team.

But I assume the question means "In how many ways can a 2-person team be chosen from 6 people?"

That question has the answer $15$. The first member is chosen from 6 people (6 possibilities) and the second from the remaining 5, which gives $6 \cdot 5 = 30$ ordered picks. Each team is counted twice that way, because the same 2 people can be picked in 2 orders, so you divide by 2: choosing Ann first and then John gives the same team as choosing John first and then Ann.

There is also another way of calculating the number. A team of 2 chosen from six people is (in mathematics) a 2 element combination of a six element set.
The number of such combinations can be calculated as:

$C_{6}^{2} = \binom{6}{2} = \frac{6!}{4! \, 2!} = \frac{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 1 \cdot 2} = 15$
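Both counting arguments can be checked in Python. In this sketch the names other than Ann and John are placeholders added for illustration.

```python
from itertools import combinations
from math import comb

# Ann and John come from the explanation; the other names are placeholders.
people = ["Ann", "John", "P3", "P4", "P5", "P6"]

# Unordered pairs: ("Ann", "John") appears once, ("John", "Ann") does not.
teams = list(combinations(people, 2))

ordered_picks = 6 * 5  # first member, then second member
print(len(teams), ordered_picks // 2, comb(6, 2))
```

All three printed numbers agree: listing the pairs, halving the ordered picks, and evaluating the binomial coefficient.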

## What are the variance and standard deviation of {2,9,3,2,7,7,12}?

Variance (population): ${\sigma}_{\text{pop}}^{2} = 12.57$
Standard Deviation (population): ${\sigma}_{\text{pop}} = 3.55$

#### Explanation:

The Sum of the data values is $42$

The Mean ($\mu$) of the data values is $\frac{42}{7} = 6$

For each of the data values we can calculate the difference between the data value and the mean and then square that difference.

The sum of the squared differences divided by the number of data values gives the population variance (${\sigma}_{\text{pop}}^{2}$).

The square root of the population variance gives the population standard deviation (${\sigma}_{\text{pop}}$).

Note: I've assumed the data values represent the entire population.
If the data values are only a sample from a larger population then you should calculate the sample variance, ${s}^{2}$, and sample standard deviation, $s$, using the method above with the only difference being that the division to find the variance needs to be by (1 less than the number of data values).
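For comparison, Python's standard `statistics` module provides both versions directly. This sketch uses `pvariance` and `pstdev` for the population case and `variance` and `stdev` for the sample case.

```python
import statistics

data = [2, 9, 3, 2, 7, 7, 12]

# Population versions divide by n.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample versions divide by n - 1.
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(round(pop_var, 2), round(pop_sd, 2))
print(round(sample_var, 2), round(sample_sd, 2))
```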

Note 2: Normal statistical analysis is done with the aid of computers (e.g. using Excel) with built-in functions to provide these values.

## What to do if a problem has a non-tabled degrees of freedom?

Choose the closest value in the table.

#### Explanation:

When the degrees of freedom is very high, the value of the inverse function changes very slowly. This is the case for Student's t-distributions and Chi-Squared where we are normally using the table to look up a value which corresponds to some cumulative probability being met.

These tests are most sensitive when the number of degrees of freedom is low; that's where all the action is. What they are telling us is that if we gather more data (more degrees of freedom) then our answer will get better. But there is a diminishing return as we gather more data, to the point that another point really doesn't change the test by a significant amount. This is reflected in our tables, where at some point they start skipping values and taking larger steps. This can be seen in a graph of the Student's t critical value for $\alpha = 0.95$. As the degrees of freedom get very large the change becomes insignificant, so most tables jump from some high number, like 30, to $\infty$.

So the rule of thumb is to choose the table row closest to the degrees of freedom that you have. The error in doing so will be small, but you can, if you like, interpolate between the values.

If your degrees of freedom is larger than the largest integer entry in the table, use the value for $\infty$. If there is no entry for $\infty$, use the largest valued entry.
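The rule of thumb and the interpolation option can be sketched in Python. The table below is a short excerpt of one-sided 0.95 critical values as printed in standard Student's t tables; the helper names `nearest_row` and `interpolated` are illustrative, and the excerpt only covers 30 degrees of freedom and up.

```python
import math

# Excerpt of a standard Student's t table: one-sided 0.95 critical values.
T_TABLE_095 = {30: 1.697, 40: 1.684, 60: 1.671, 120: 1.658, math.inf: 1.645}
FINITE_ROWS = sorted(k for k in T_TABLE_095 if k != math.inf)


def nearest_row(df):
    """Rule of thumb: use the closest tabled row, or infinity past the last one."""
    if df < FINITE_ROWS[0]:
        raise ValueError("this excerpt starts at 30 degrees of freedom")
    if df > FINITE_ROWS[-1]:
        return T_TABLE_095[math.inf]
    closest = min(FINITE_ROWS, key=lambda k: abs(k - df))
    return T_TABLE_095[closest]


def interpolated(df):
    """Linear interpolation between the two neighbouring finite rows."""
    if df < FINITE_ROWS[0]:
        raise ValueError("this excerpt starts at 30 degrees of freedom")
    if df > FINITE_ROWS[-1]:
        return T_TABLE_095[math.inf]
    for lo, hi in zip(FINITE_ROWS, FINITE_ROWS[1:]):
        if lo <= df <= hi:
            weight = (df - lo) / (hi - lo)
            return T_TABLE_095[lo] + weight * (T_TABLE_095[hi] - T_TABLE_095[lo])


print(nearest_row(45), interpolated(45))
```

For 45 degrees of freedom the two approaches differ only in the third decimal place, which illustrates how small the error from picking the nearest row is.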

## How can you determine if a difference between the means of two samples is significant?

Form a new statistic, the difference between the two sample means, and then ask significance questions about that statistic.

#### Explanation:

In this question we want to know about the difference of two means. This is a function of random variables, i.e.

${\mu}_{1 - 2} = {\mu}_{1} - {\mu}_{2}$

or, in our case the function is of the sample means:

${m}_{1 - 2} = {m}_{1} - {m}_{2}$

There are many assumptions that go into the next steps (see Stat Trek: difference between means for details), but for now let's assume that the two distributions we are sampling are approximately normal, and that we only have relatively few sampling points for each (otherwise we would be certain of the values of the means and therefore the difference).

Given this, we can calculate the sample variance of the difference of means from the sample variances of the two samples:

${s}_{1 - 2}^{2} = {s}_{1}^{2} / {n}_{1} + {s}_{2}^{2} / {n}_{2}$

Note that what goes into this calculation is the sample variances divided by the number of points, which is the variance of the calculated means and follows the expected form of the central limit theorem (see Variance of sample mean).

Before we ask questions about the new distribution using t-statistics, we need to know the degrees of freedom, which can be approximated with the Welch–Satterthwaite equation:

$D . F . = {\left({s}_{1}^{2} / {n}_{1} + {s}_{2}^{2} / {n}_{2}\right)}^{2} / \left({\left({s}_{1}^{2} / {n}_{1}\right)}^{2} / \left({n}_{1} - 1\right) + {\left({s}_{2}^{2} / {n}_{2}\right)}^{2} / \left({n}_{2} - 1\right)\right)$

This equation allows for a different significance of each point from the two distributions based on their variance and a different number of samples from each. If the distributions have the same variance and we take the same number of samples, $n$, from each, this simplifies to $D . F . = 2 n - 2$

Given all of these, we can use the Student's t-distribution to ask questions about the probability of the statistic ${m}_{1 - 2}$ taking on specific values using:

$t = \frac{{m}_{1 - 2} - d}{{s}_{1 - 2}}$

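Here $d$ is the proposed difference between the two means (for example, $d = 0$ when testing whether the means are equal), and ${s}_{1 - 2}$ is the square root of ${s}_{1 - 2}^{2}$.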
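The whole calculation fits in a short function. This is a sketch under the assumptions stated above (roughly normal populations, independent samples); `welch_t` is an illustrative name and the two samples are made-up numbers chosen so the arithmetic is easy to follow.

```python
from statistics import mean, variance


def welch_t(sample1, sample2, d=0.0):
    """Return (t, degrees_of_freedom) for the difference of two sample means."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1  # s_1^2 / n_1
    v2 = variance(sample2) / n2  # s_2^2 / n_2
    se_squared = v1 + v2         # s_{1-2}^2

    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = se_squared ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    t = (mean(sample1) - mean(sample2) - d) / se_squared ** 0.5
    return t, dof


# Made-up samples with equal variance and equal size n = 5.
a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 6]
t, dof = welch_t(a, b)
print(t, dof)
```

Because the two made-up samples have the same variance and the same size, the degrees of freedom come out as $2 n - 2 = 8$, matching the simplification noted above. The resulting $t$ is then compared against a Student's t table at that number of degrees of freedom.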

## Suppose that U has a uniform distribution on [0, 1] and that, conditional on U = u, the distribution of V is uniform on [0, u]. What is the probability density function of V?

Probability density of random variable $V$ is
$f \left(x\right) = {\int}_{x}^{1} \frac{\mathrm{dy}}{y} = - \ln \left(x\right)$

#### Explanation:

The probability density $f \left(x\right)$ of random variable $V$ is a result of a combination of two factors:
(a) random variable $U$ should take some value $y$ greater than $x$ (with probability density $1$) and, for each such value $y$,
(b) random variable $V$ should take a value $x$ with probability density $\frac{1}{y}$.

These two factors combine as a marginal density and a conditional density: the joint density of $U$ and $V$ at $\left(y , x\right)$ is the density of $U$ at $y$ multiplied by the conditional density of $V$ given $U = y$.

So the probability density of $U$ (which is $1$) is multiplied by the conditional probability density of $V$ (which is $\frac{1}{y}$) and the product is integrated over $y$ from $x$ to $1$:
$f \left(x\right) = {\int}_{x}^{1} \frac{\mathrm{dy}}{y} = - \ln \left(x\right)$

Just to check, integral of this probability density from $0$ to $1$ should be equal to $1$:
${\int}_{0}^{1} \left[- \ln \left(x\right)\right] \mathrm{dx} = {\left[x - x \cdot \ln \left(x\right)\right]}_{0}^{1} = 1$
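A quick simulation gives another sanity check. This sketch draws $U$ uniformly on $[0, 1]$, then $V$ uniformly on $[0, U]$, and compares the empirical value of $P(V \le 0.5)$ with the value obtained by integrating $- \ln \left(x\right)$ from $0$ to $0.5$. The sample size, the seed and the function names are arbitrary choices for illustration.

```python
import math
import random


def simulate_v(n_samples, seed=0):
    """Draw U ~ Uniform[0, 1], then V ~ Uniform[0, U], n_samples times."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        u = rng.random()
        samples.append(rng.uniform(0.0, u))
    return samples


def cdf_v(x):
    """P(V <= x) = x - x*ln(x), the integral of -ln(t) from 0 to x, for 0 < x <= 1."""
    return x - x * math.log(x)


samples = simulate_v(200_000)
x = 0.5
empirical = sum(v <= x for v in samples) / len(samples)
print(empirical, cdf_v(x))
```

The two printed numbers should agree to about two decimal places; the exact value is $0.5 - 0.5 \ln \left(0.5\right) \approx 0.8466$.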

Graphically, the probability density of random variable $V$ looks like this:

$f \left(x\right) = - \ln \left(x\right)$
[Graph of $- \ln \left(x\right)$ for $0 < x \le 1$]