# Sampling from the Stars

An interactive statistics tutorial using real astronomical data from the HYG star catalog (~120,000 stars).

## Topics Covered

- **Sampling variability**: why different samples give different estimates
- **Standard Error**: measuring the precision of your estimate
- **Central Limit Theorem**: why sample means are approximately normally distributed when the sample is large enough
- **Confidence Intervals**: quantifying uncertainty in your estimates

## Getting Started

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Run the Tutorial

**Option A: Python Script**
```bash
python star_statistics.py
```

**Option B: Jupyter Notebook** (recommended for interactive exploration)
```bash
jupyter notebook star_statistics.ipynb
```

## Files

| File | Description |
|------|-------------|
| `star_statistics.py` | Python script version of the tutorial |
| `star_statistics.ipynb` | Jupyter notebook version (interactive) |
| `data/hyg_v42.csv` | HYG star catalog with ~120,000 stars |
| `requirements.txt` | Python dependencies |

## Data Source

The HYG Database (v4.2) combines data from:
- Hipparcos Catalog
- Yale Bright Star Catalog
- Gliese Catalog of Nearby Stars

Each star includes position (RA/Dec), magnitude (brightness), spectral type, and more.

## Key Concepts

### Standard Error
$$SE = \frac{s}{\sqrt{n}}$$

where $s$ is the sample standard deviation and $n$ is the sample size.
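
As a minimal sketch of the formula, the snippet below computes the standard error for one random sample. It uses only the Python standard library and a synthetic, normally distributed `population` as a hypothetical stand-in for a magnitude column; the tutorial itself works with the HYG catalog, and the helper name `standard_error` is an assumption made for illustration.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for a column of star magnitudes; the real
# tutorial would read its values from data/hyg_v42.csv instead.
population = [random.gauss(8.0, 1.5) for _ in range(120_000)]


def standard_error(sample):
    """Return s / sqrt(n), where s is the sample standard deviation."""
    s = statistics.stdev(sample)  # uses the n - 1 denominator
    return s / math.sqrt(len(sample))


sample = random.sample(population, 100)  # draw without replacement
se = standard_error(sample)
print(f"sample mean = {statistics.mean(sample):.3f}, SE = {se:.3f}")
```

With a population standard deviation of 1.5 and $n = 100$, the printed SE should land near $1.5 / \sqrt{100} = 0.15$, though the exact value depends on the sample drawn.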

### 95% Confidence Interval
$$CI = \bar{x} \pm 2 \times SE$$

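About 95% of confidence intervals constructed this way will contain the true population mean. The multiplier 2 is a rounded form of 1.96, the critical value of the standard normal distribution for 95% coverage.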
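
One way to check the 95% claim is to simulate it: draw many samples, build an interval from each, and count how often the interval contains the population mean. The sketch below does this with the standard library on a synthetic population, which is an assumption made to keep the example self-contained; the function name `confidence_interval`, the sample size of 50, and the 1,000 trials are illustrative choices, not values taken from the tutorial.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic population standing in for a catalog column (an assumption).
population = [random.gauss(8.0, 1.5) for _ in range(20_000)]
true_mean = statistics.mean(population)


def confidence_interval(sample, multiplier=2.0):
    """Return (low, high) for mean +/- multiplier * SE."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - multiplier * se, mean + multiplier * se


trials = 1_000
hits = 0
for _ in range(trials):
    low, high = confidence_interval(random.sample(population, 50))
    if low <= true_mean <= high:
        hits += 1  # this interval captured the true mean

coverage = hits / trials
print(f"{coverage:.1%} of {trials} intervals contain the true mean")
```

The observed coverage should come out close to 95%, with some run-to-run wobble because the simulation is itself a random sample of intervals.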

### The Trade-off
To cut your CI width in half, you need **4× as many observations** (because of the √n in the denominator).
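
The arithmetic behind the trade-off can be sketched directly from the formulas above. The helper `ci_width` and the value `s = 1.5` are hypothetical, and $s$ is held fixed here for illustration; in real samples $s$ changes slightly from draw to draw, so the halving is approximate.

```python
import math


def ci_width(s, n, multiplier=2.0):
    """Full width of the interval mean +/- multiplier * s / sqrt(n)."""
    return 2 * multiplier * s / math.sqrt(n)


s = 1.5  # hypothetical sample standard deviation, held fixed
for n in (100, 400, 1600):
    print(f"n = {n:5d}  CI width = {ci_width(s, n):.3f}")
```

Each fourfold increase in $n$ halves the width: 0.600 at $n = 100$, 0.300 at $n = 400$, and 0.150 at $n = 1600$.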
