Hi All,
We’re seeking your insights and recommendations on a challenge we’re facing with an existing dataset: when the categories of the gender or race variable are not aggregated, some subgroups have very small sample sizes.
The aim is to include all categories of gender and race in the quantitative analyses (e.g., analysis of variance, regression, multivariate analysis). However, if we do not aggregate certain race categories, the sample sizes for specific underrepresented groups become exceedingly small (e.g., counts of 1, 7, 11, and others below 20). When gender and race are crossed, some subgroups have 0 participants. Such small cells pose significant challenges for multiple comparisons: the estimates may be imprecise, and the tests may be statistically underpowered.
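To make the sparse-cell problem concrete, here is a minimal sketch of how the gender-by-race cells could be counted and flagged, including the empty ones. The records, the category labels, and the `MIN_CELL` threshold are all made up for illustration and are not our actual data or a recommended cutoff.

```python
from collections import Counter
from itertools import product

# Hypothetical (gender, race) records, for illustration only.
records = [
    ("Woman", "White"), ("Woman", "White"), ("Man", "White"),
    ("Man", "Black/African"), ("Woman", "Asian"),
]

MIN_CELL = 2  # illustrative threshold, not a recommendation


def sparse_cells(records, min_cell=MIN_CELL):
    """Return {(gender, race): count} for every cell below min_cell.

    Cells with no participants are included with a count of 0.
    """
    counts = Counter(records)
    genders = sorted({g for g, _ in records})
    races = sorted({r for _, r in records})
    return {
        cell: counts[cell]  # Counter returns 0 for unseen cells
        for cell in product(genders, races)
        if counts[cell] < min_cell
    }


flagged = sparse_cells(records)
for (gender, race), n in flagged.items():
    print(f"{gender} x {race}: n = {n}")
```

Enumerating every gender-by-race combination, rather than only the observed ones, is what surfaces the 0-participant cells.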
To address this issue, one approach is to collapse some race categories. For instance, we might label the collapsed category “Underrepresented” and provide a note detailing the specific groups it contains (such as American Indian/Alaska Native, Asian, Pacific Islander/Hawaiian, Middle Eastern/North African, etc.). For groups with sufficient sample sizes (e.g., multiracial, White, Black/African, Hispanic/Latinx), we plan to analyze them as distinct categories. The analysis plan will include a rationale for collapsing these categories, and the descriptive statistics will list all categories, even those with very small sample sizes.
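As a rough sketch of the collapsing step described above, categories below a cutoff could be merged into one label while the list of merged groups is kept for the explanatory note. The counts, the `MIN_N` cutoff of 20, and the function name are assumptions for illustration only; the actual cutoff would need its own justification.

```python
from collections import Counter

MIN_N = 20  # illustrative cutoff, not an established rule
COLLAPSED_LABEL = "Underrepresented"


def collapse_small_categories(values, min_n=MIN_N, label=COLLAPSED_LABEL):
    """Relabel every category with fewer than min_n cases as `label`.

    Returns the recoded list and the sorted names of the merged
    categories, so they can be reported in a note.
    """
    counts = Counter(values)
    small = sorted(cat for cat, n in counts.items() if n < min_n)
    collapsed = [label if v in small else v for v in values]
    return collapsed, small


# Made-up counts, for illustration only.
race = (
    ["White"] * 60
    + ["Black/African"] * 25
    + ["Asian"] * 11
    + ["American Indian/Alaska Native"] * 7
    + ["Middle Eastern/North African"] * 4
    + ["Pacific Islander/Hawaiian"] * 1
)

collapsed, merged = collapse_small_categories(race)
print(Counter(collapsed))  # counts after collapsing
print(merged)              # groups to list in the note
```

The original variable stays untouched, so the descriptive statistics can still report every category separately.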
Could you please share your advice on this approach and on the following questions? We greatly appreciate your suggestions and insights!
- Do you recommend collapsing some categories of the targeted variable in this context?
- If yes, could you please share any supporting, evidence-based resources?
- If no, what alternative strategies and resources would you suggest to tackle this issue?
Thank you very much!!!