Trying to survey an entire population of people is hard. How could you possibly reach every woman between the ages of 20 and 29? Every student at your college? Every person in Germany? It takes a lot of time and money to get data from every single person, and by the time you finally do, that data might have changed! To combat this problem, researchers might use methods like cluster sampling or stratified sampling to collect data from groups or individuals that represent the larger population. These two methods are often confused, so this page explains cluster sampling vs. stratified sampling.
Note that these are not the only two sampling methods available. Other sampling methods include:
- Simple random sampling
- Systematic sampling
- Convenience sampling
- Snowball sampling
What Are the Differences Between Cluster Sampling and Stratified Sampling?
Cluster sampling and stratified sampling are two sampling methods that break up populations into smaller groups and take samples based on those groups. In cluster sampling, whole groups (“clusters”) are randomly selected, and the sample comes only from the selected clusters. In stratified sampling, every group (“stratum”) is included, and individuals are randomly selected from within each one.
What Is Cluster Sampling?
Cluster sampling is a type of sampling in which a larger population is divided up into different clusters, or groups. Clusters, rather than individuals, are randomly selected as the sample. Certain rules and principles of cluster sampling help ensure that researchers still get a representative sample.
Types of Cluster Sampling
Depending on the resources available to the researchers, clusters may undergo a series of random selections before the final sample is chosen. Researchers may put the clusters together themselves or use natural borders or groupings that divide clusters (state borders, age, homeroom, etc.).
Once the clusters are identified and assigned a number, researchers may engage in single-stage cluster sampling: they randomly select a few clusters and include every member of those clusters in the sample. Researchers may choose the clusters through simple random sampling, for example by generating random numbers and picking the clusters assigned those numbers.
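As a rough illustration of single-stage cluster sampling, here is a minimal Python sketch. The homeroom data and the function name `single_stage_cluster_sample` are hypothetical and exist only for this example; the random selection uses the standard library's `random.sample`.

```python
import random

def single_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and return every member of each one."""
    rng = random.Random(seed)
    # Simple random sampling over cluster labels, not over individuals.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Every member of a chosen cluster joins the sample.
    sample = [member for label in chosen for member in clusters[label]]
    return chosen, sample

# Hypothetical data: 6 homerooms of 4 students each.
homerooms = {
    f"homeroom_{i}": [f"student_{i}_{j}" for j in range(1, 5)]
    for i in range(1, 7)
}

chosen, sample = single_stage_cluster_sample(homerooms, n_clusters=2, seed=42)
print(chosen)       # the two homerooms that were drawn
print(len(sample))  # 8: two whole homerooms of four students each
```

Passing a `seed` only makes the draw repeatable for demonstration; a real study would document how its random selection was made.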
Maybe this creates too large a sample. In this case, the researchers will enter a second stage and randomly choose a group of people within each selected cluster to use in the sample. Two-stage cluster sampling generally produces a less precise sample than surveying every member of the selected clusters, but it is more convenient if resources are limited.
Researchers can narrow their samples down even further by randomly choosing people from the random group they’ve chosen from the random clusters. Each added stage of multistage cluster sampling moves the sample farther away from the population and makes it less accurate, but the approach may be worthwhile if resources are truly limited.
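The second stage described above can be sketched in Python as follows. This is only an illustration under assumed data: the neighborhoods, resident names, and the function `two_stage_cluster_sample` are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Pick clusters at random, then pick individuals at random within each."""
    rng = random.Random(seed)
    # Stage 1: randomly choose which clusters enter the sample.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for label in chosen:
        members = clusters[label]
        # Stage 2: randomly choose individuals inside each chosen cluster.
        k = min(n_per_cluster, len(members))
        sample[label] = rng.sample(members, k)
    return sample

# Hypothetical data: 10 neighborhoods of 20 residents each.
neighborhoods = {
    f"neighborhood_{i}": [f"resident_{i}_{j}" for j in range(1, 21)]
    for i in range(1, 11)
}

two_stage = two_stage_cluster_sample(
    neighborhoods, n_clusters=3, n_per_cluster=5, seed=7
)
for label, people in two_stage.items():
    print(label, people)
```

Here 15 residents stand in for a population of 200, which shows why the approach saves resources and why each extra stage leaves less information about the population.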
Rules of Cluster Sampling
To ensure that clusters can represent the entire population, researchers follow a few rules and principles of cluster sampling:
Clusters must be similar. The people within each cluster can be very different from each other, but the clusters as a whole should look very similar across the population. When this happens, you are more likely to choose a cluster that represents the whole population.
Let’s say you want to choose a sample from a zip code, and you put “clusters” together by neighborhood. As you look at the different neighborhoods, you see that some are filled with people of one age group or income bracket, whereas other neighborhoods have a majority of another income bracket. Some neighborhoods have a good mix, but each cluster looks wildly different from the next. Cluster sampling would not be a good choice in this scenario.
Clusters should be mutually exclusive. Individuals should only belong to one cluster.
Clusters must represent the whole population. This is the ultimate goal. Cluster sampling is usually chosen due to limited resources or convenience. The goal should not change: you want to understand the whole population better.
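The mutual-exclusivity rule is easy to check before any sampling happens. The sketch below assumes clusters are stored as a dictionary of member lists; the names and the helper `overlapping_members` are hypothetical.

```python
from collections import Counter

def overlapping_members(clusters):
    """Return the members that appear in more than one cluster."""
    counts = Counter(
        member
        for members in clusters.values()
        for member in set(members)  # ignore repeats inside a single cluster
    )
    return sorted(m for m, c in counts.items() if c > 1)

# Hypothetical clusters: "ben" was accidentally placed in two of them.
clusters = {
    "A": ["ann", "ben"],
    "B": ["cat", "dan"],
    "C": ["ben", "eve"],
}

print(overlapping_members(clusters))  # ['ben']
```

An empty list means every individual belongs to exactly one cluster.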
Examples of Cluster Sampling
- You own an apple orchard, and you want to determine how large the apples in your orchard are. There is no time to measure every single apple in the orchard, so you decide to use cluster sampling. All of your apple trees are lined up in neat rows, numbered 1 to 20. Each row contains a mix of apple trees, but the rows look the same and the trees are the same age. You use simple random sampling to select rows 3, 6, and 17, and you measure the apples in those rows.
- Do the kids in your school want more options at lunch? Do they bring their lunch to school or buy lunch at the cafeteria? You want to use cluster sampling to find out the answer. Each homeroom contains around the same number of students, so you randomly select a few homerooms from across the school and send the students in those homerooms a survey. (Choosing only the homerooms in the East Wing because they are nearby would be convenience sampling, not cluster sampling.)
- Do people who attend the YMCA believe that the organization offers enough resources to the neighborhood? One way to find out is to randomly select one state and survey the members of all of the YMCAs in that state. You will have to do some research beforehand to ensure that the state’s members represent the larger population in terms of income bracket, age, etc.
As you read through these examples of cluster sampling, you may be asking yourself, “Can’t I just choose the groups I want to sample to ensure I represent the whole population?” You can, but then you might not be using cluster sampling to put together your sample. Stratified sampling is very similar to cluster sampling, but the small differences between them can determine how accurate or biased your sample becomes.
What Is Stratified Sampling?
Stratified random sampling is a sampling method that intentionally divides the population into different strata, then randomly selects individuals from each stratum to ensure that all groups are accounted for in the sample. Strata can be anything from race to age to zip code.
Stratified random sampling can prevent the problems that come with cluster sampling when clusters are imbalanced. Remember the example with the neighborhoods that looked very different from one another? Researchers could still consider each neighborhood as an individual stratum, but then select individuals from each neighborhood as a way to get a set of data that reflects the population as a whole.
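To contrast this with cluster sampling, here is a minimal Python sketch of stratified random sampling, in which every stratum contributes individuals. The neighborhood names, sizes, and the function `stratified_sample` are hypothetical, and the sketch draws the same number from each stratum purely for simplicity.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Randomly select individuals from every stratum."""
    rng = random.Random(seed)
    sample = {}
    # Unlike cluster sampling, no stratum is skipped.
    for label in sorted(strata):
        members = strata[label]
        k = min(n_per_stratum, len(members))
        sample[label] = rng.sample(members, k)
    return sample

# Hypothetical strata: three neighborhoods of different sizes.
strata = {
    "north": [f"north_{i}" for i in range(30)],
    "south": [f"south_{i}" for i in range(50)],
    "west": [f"west_{i}" for i in range(20)],
}

picked = stratified_sample(strata, n_per_stratum=4, seed=1)
for label, people in picked.items():
    print(label, people)
```

Every neighborhood appears in the result, so no group can be left out by an unlucky draw.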
Rules and Principles of Stratified Sampling
Do your research before dividing the population into strata. Consider the population you’re sampling and how different demographics may affect their answers or accessibility. If you’re surveying a group of college students, it may not make sense to divide them up by individual age, but it may be helpful to consider the number of freshmen, sophomores, juniors, and seniors and whether they could make good strata. If your population contains college students across multiple colleges, it may make more sense to use the different colleges as strata and let the random sampling handle the proportion of ages present in the sample.
Consider the proportions of each stratum as you randomly select individuals. Let’s say you want to put together a survey of your neighborhood, and you divide your population by age. There are significantly more people in the 40-59 age group than in the 20-39 age group. Your sample should reflect this: select more individuals from the 40-59 age group than from the 20-39 age group, in proportion to each group’s share of the population.
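One common way to account for stratum sizes is proportional allocation, where each stratum's share of the sample matches its share of the population. The sketch below is one possible implementation; the population counts are made up for illustration, and `proportional_allocation` is a hypothetical helper that hands out any leftover slots by largest fractional remainder.

```python
def proportional_allocation(stratum_sizes, total_sample_size):
    """Split a total sample size across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    # Exact (possibly fractional) share for each stratum.
    exact = {
        label: total_sample_size * size / population
        for label, size in stratum_sizes.items()
    }
    # Start from the rounded-down shares...
    allocation = {label: int(share) for label, share in exact.items()}
    # ...then give leftover slots to the largest fractional remainders.
    leftover = total_sample_size - sum(allocation.values())
    by_remainder = sorted(
        exact, key=lambda label: exact[label] - allocation[label], reverse=True
    )
    for label in by_remainder[:leftover]:
        allocation[label] += 1
    return allocation

# Hypothetical neighborhood: three times as many people aged 40-59 as 20-39.
age_groups = {"20-39": 300, "40-59": 900}

print(proportional_allocation(age_groups, 40))  # {'20-39': 10, '40-59': 30}
```

The resulting counts can then be used as the number of individuals to draw at random from each stratum.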
Your sample must represent the whole population. Again, this is the ultimate goal. If your sampling method seems to leave out or disproportionately represent certain facets of the population, you may want to evaluate your method and try again.
Examples of Stratified Sampling
- You want to know who people in your state are going to vote for in the next presidential election. One way to divide the state is by county, but not all counties have the same population. After you gather the number of people in each county, you randomly select a number of people from each county that is proportionate to that county’s share of the state’s population. You may select 10 people from one county and 50 people from a more populous county.
- Remember the apple orchard from above? Let’s say you grow both red and green apples in your orchard. You know that you grow twice as many red apples as you do green apples. You randomly pick five green apples and ten red apples to weigh for your sample.
- You want to conduct an online survey on how many people are familiar with a certain beauty brand. You have data that allows you to reach people by age, location, and income bracket. Using a random number generator, you select people from each age group, location, and income bracket and target ads to them to recruit respondents.
Sampling will never be exactly perfect. The only way to gather data that accurately reflects a whole population is to gather data on the whole population. When this isn’t possible, cluster sampling, stratified sampling, and other sampling methods may get the job done.