Rule of thumb: Minimum population size to calculate sample size for data that needs to be disaggregated

As a rule of thumb, what should be the minimum population size required to calculate a reliable sample size for a survey, given that the survey responses need to be disaggregated by various demographic groups (race, gender, age, zip codes, etc.)? Also, does the minimum size of a specific demographic group within the population matter when calculating the sample size, assuming the overall population size is above the threshold?

Great question, @johnjy, thanks for asking!

Figuring out what sample size you might need for a project is complex, and there are many misconceptions and myths about it.

In terms of the relationship between the size of the population and the size of the sample you need, the two are mostly unrelated. Unless your sample will be close to the size of your population, or your population is very small (say, fewer than 1,000 people), the population size is not what determines how large a sample you need. I know this is very counterintuitive! :upside_down_face:
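
To make that a bit more concrete, here is a minimal Python sketch using Cochran's formula for estimating a proportion, with the finite population correction applied. The function name, the 5% margin of error, the 95% confidence level (z = 1.96), and the worst-case proportion of 0.5 are all my assumptions for illustration, not a recommendation for any particular survey.

```python
import math

def sample_size_for_proportion(population, margin_of_error=0.05, z=1.96, p=0.5):
    """Completed responses needed to estimate a proportion (illustrative sketch).

    Assumes simple random sampling. p=0.5 is the most conservative guess
    about variability; z=1.96 corresponds to 95% confidence.
    """
    # Sample size if the population were effectively infinite
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Finite population correction: only matters when n0 is a large share of the population
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# Hypothetical population sizes, chosen only to show the pattern
for population in (500, 1_000, 10_000, 100_000, 10_000_000):
    print(f"population={population:>10,}  sample size={sample_size_for_proportion(population)}")
```

If you run this, the required sample size changes a lot between the smallest populations and then barely moves once the population is in the tens of thousands or more.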

What you need to consider when deciding on a sample size is how precise you need your answer to be, how representative you need it to be, how much variability you think there is in the population, and how large the effect or phenomenon is that you’re trying to capture in your data. There isn’t really a rule of thumb for this that has stood the test of time, so you do need to consider each of these items and probably run some formal calculations.
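
As a rough illustration of the precision piece, and of why breaking results down by demographic group is demanding, here is a small Python sketch of the standard margin-of-error formula for a proportion. The 400 respondents and the subgroup shares are hypothetical numbers I made up, and the calculation assumes simple random sampling with a worst-case proportion of 0.5.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion estimated from n responses."""
    return z * math.sqrt(p * (1 - p) / n)

total_respondents = 400  # hypothetical number of completed surveys

# Hypothetical shares of respondents falling in each group
subgroup_shares = {
    "all respondents": 1.00,
    "group that is 25% of respondents": 0.25,
    "group that is 5% of respondents": 0.05,
}

for label, share in subgroup_shares.items():
    n = round(total_respondents * share)
    print(f"{label}: n={n}, margin of error is about +/-{margin_of_error(n):.1%}")
```

The takeaway is that precision for a subgroup is driven by how many responses you have from that subgroup, not by the overall sample size, so the smallest group you want to report on tends to set the requirement.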

I will add some links to resources here in a little bit.

If you want to share a specific situation, we’re happy to help walk through the calculations.

Heather, thanks for your thoughtful response!

I work in local government for a city of 150,000 people, and I’m developing a guide to improve survey practices across departments. My goal is to help staff determine when to use sampling, what response rate to aim for, and when data can be reliably broken down by demographics (e.g., race, gender, age). This effort is to improve our program performance management system.

One section of the guide will assist staff who organize events in determining appropriate response rates and demographic analysis based on event sizes (for example, small: <200 attendees, medium: 200–500 attendees, large: >500 attendees). Similarly, for surveys aimed at different levels—community, neighborhood, or city-wide—the sample size would be based on the population data from the census.

For example, if I arbitrarily choose 500 attendees as the minimum threshold for reliable demographic breakdowns, two departments hosting events might take different approaches. Department A, with 300 attendees, should aim for maximum responses but understand that demographic analysis would be unreliable in this case. Meanwhile, Department B, with 850 attendees, could use stratified sampling for more reliable subgroup data.

I’m looking for advice on how to set such thresholds, based on either research or industry rules of thumb. Also, I would appreciate tips on determining appropriate response rates for various event sizes.