How do random variations in the seed affect the overall expansion? If small variations in the seed (sampling uncertainty) substantially change the overall expansion, this raises concerns about the consistency of the expansion.
Related question: how can we measure the robustness of the expansion step?
We emulate data uncertainty by generating a sequence of random draws from the annotated seed, each draw keeping a fixed share of the seed. We can then compare the resulting expansions and examine how data uncertainty affects the overall expansion.
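The resampling step can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes the seed is a sequence of family identifiers and that each draw samples a fixed share of the seed without replacement. `draw_seed_samples` and its parameters are hypothetical names; the defaults mirror the setting reported in the results (10 draws, 50% of the seed).

```python
import random
from typing import Hashable, List, Sequence, Set


def draw_seed_samples(
    seed: Sequence[Hashable],
    n_draws: int = 10,
    share: float = 0.5,
    rng_seed: int = 0,
) -> List[Set[Hashable]]:
    """Draw `n_draws` random subsamples of the annotated seed.

    Hypothetical helper: each subsample holds `share` of the seed elements,
    sampled without replacement.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    rng = random.Random(rng_seed)  # fixed RNG seed so the draws are reproducible
    k = max(1, round(share * len(seed)))  # number of seed elements per draw
    population = list(seed)
    return [set(rng.sample(population, k)) for _ in range(n_draws)]
```

Each subsample would then be passed to the expansion step, which is not shown here. Fixing the RNG seed makes the whole experiment reproducible.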
Using the generated expansions, we compute two overlap measures:
- pairwise overlap: for every pair of expansions (n choose 2 pairs for n expansions), we compute the share of families that appear in both expansions, and report moments of the resulting distribution;
- batch overlap: for each expansion, we compute the share of its families that appear in all expansions, and report moments of the resulting distribution.
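The two measures can be sketched in Python. The text does not state the denominator of each share, so this sketch assumes the pairwise share is taken relative to the union of the two expansions (Jaccard index) and the batch share relative to the size of each individual expansion. All function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, median, pstdev
from typing import Dict, Hashable, List, Set


def pairwise_overlap(expansions: List[Set[Hashable]]) -> List[float]:
    """Share of families present in both expansions, for every pair.

    Assumption: share = |A & B| / |A | B| (Jaccard index).
    """
    shares = []
    for a, b in combinations(expansions, 2):
        union = a | b
        shares.append(len(a & b) / len(union) if union else 1.0)
    return shares


def batch_overlap(expansions: List[Set[Hashable]]) -> List[float]:
    """For each expansion, share of its families found in every expansion.

    Assumption: the denominator is the size of that expansion.
    """
    if not expansions:
        return []
    common = set.intersection(*expansions)  # families present in all expansions
    return [len(common) / len(e) if e else 1.0 for e in expansions]


def summarize(values: List[float]) -> Dict[str, float]:
    """Summary statistics of an overlap distribution."""
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "min": min(values),
        "median": median(values),
        "max": max(values),
    }
```

With n expansions, `pairwise_overlap` returns n(n-1)/2 values (45 for 10 draws) and `batch_overlap` returns n values; `summarize` can be applied to either distribution.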
Overall, we find that the expansion is robust to data uncertainty, even under particularly conservative conditions. When we draw 10 different random samples of the seed, each containing only 50% of it, the resulting expansions exhibit a median family overlap ranging from 76% to 94%. As expected, the median (and other moments) of the overlap distribution increase as the sampled share of the seed grows.
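To check that the overlap grows with the sampled share, the experiment can be repeated over several shares. The sketch below is self-contained and relies on `toy_expand`, a hypothetical stand-in for the real expansion step built on an arbitrary toy graph; its output says nothing about the figures reported above. The pairwise share again assumes a Jaccard denominator.

```python
import random
from itertools import combinations
from statistics import median
from typing import Dict, List, Sequence, Set


def toy_expand(seed: Set[int], universe: int = 200) -> Set[int]:
    """Hypothetical stand-in for the expansion step: for every seed family f,
    add families (f + 1) and (2 * f), modulo a toy universe of ids."""
    expanded = set(seed)
    for f in seed:
        expanded.add((f + 1) % universe)
        expanded.add((2 * f) % universe)
    return expanded


def median_overlap_by_share(
    seed: Sequence[int],
    shares: List[float],
    n_draws: int = 10,
    rng_seed: int = 0,
) -> Dict[float, float]:
    """For each share, draw `n_draws` subsamples of the seed, expand each one,
    and return the median pairwise (Jaccard) overlap of the expansions."""
    rng = random.Random(rng_seed)
    population = list(seed)
    result = {}
    for share in shares:
        k = max(1, round(share * len(population)))
        expansions = [toy_expand(set(rng.sample(population, k))) for _ in range(n_draws)]
        overlaps = [len(a & b) / len(a | b) for a, b in combinations(expansions, 2)]
        result[share] = median(overlaps)
    return result
```

For example, `median_overlap_by_share(list(range(100)), [0.5, 0.7, 0.9, 1.0])` sweeps four shares. At a share of 1.0 every draw is the full seed, so all expansions coincide and the median overlap is exactly 1.0, which gives a simple sanity check on the upper end of the sweep.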