In my original data pool I had 600 samples and I calculated pricing info and average distribution for those clusters.
I now have a lot more sample points and I am looking to add another 600 samples to my original 600.
How would I go about selecting the second 600 in a way that does not disrupt the distribution ratios between the clusters or the pricing info?
Also, how would I test for similarity between both sample sets?
Thanks!
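You can use stratified sampling. It is used when we might reasonably expect the measurement of interest to vary between subgroups and we want to ensure representation from all of them. Here the clusters are the subgroups (strata): draw the new samples from each cluster in proportion to that cluster's share of the original 600, so the ratios between clusters stay the same.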
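As a rough sketch of proportional stratified sampling, assuming each record is a hypothetical `(cluster_label, price)` tuple and that you know how many of the original 600 fell into each cluster (the labels, counts, and prices below are made up for illustration):

```python
import random
from collections import Counter, defaultdict

def stratified_sample(pool, original_counts, n_new, seed=0):
    """Draw n_new records from pool, matching the original cluster proportions.

    pool: list of (cluster_label, price) tuples not used in the first sample.
    original_counts: dict mapping cluster_label -> count in the original sample.
    """
    rng = random.Random(seed)
    total = sum(original_counts.values())

    # Group the candidate records by cluster (the strata).
    by_cluster = defaultdict(list)
    for record in pool:
        by_cluster[record[0]].append(record)

    # Proportional allocation with largest remainders, so the
    # per-cluster quotas add up to exactly n_new.
    exact = {c: n_new * k / total for c, k in original_counts.items()}
    alloc = {c: int(x) for c, x in exact.items()}
    leftover = n_new - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1

    # Simple random sample without replacement inside each cluster.
    sample = []
    for c, k in alloc.items():
        if k > len(by_cluster[c]):
            raise ValueError(f"cluster {c!r} has too few candidates for {k} draws")
        sample.extend(rng.sample(by_cluster[c], k))
    return sample

# Made-up example: three clusters with a 3:2:1 split in the original 600.
original_counts = {"A": 300, "B": 200, "C": 100}
gen = random.Random(1)
pool = [(c, gen.uniform(10, 100)) for c in "ABC" for _ in range(1000)]
second = stratified_sample(pool, original_counts, 600)
print(Counter(c for c, _ in second))
```

This only fixes the cluster ratios. Within each cluster the draw is random, so the pricing figures should stay close to the original ones on average, but that still needs to be checked.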
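To test for similarity between the two sample sets, you can compare their means with a t-test. A paired t-test only applies when each observation in one set is matched to a specific observation in the other; for two independently drawn sets, use the two-sample (independent) t-test instead.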
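One way to check similarity without extra libraries is sketched below. It is not the t-test itself: it swaps in a permutation test on the difference in mean price, which answers the same question for two independently drawn sets, and it also reports the cluster shares so they can be compared side by side. The `(cluster_label, price)` record format and the helper names are assumptions made for this example.

```python
import random
from collections import Counter

def cluster_shares(sample):
    """Fraction of records in each cluster."""
    counts = Counter(c for c, _ in sample)
    n = len(sample)
    return {c: k / n for c, k in counts.items()}

def permutation_test_mean(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the difference in means
    of two independent samples x and y."""
    rng = random.Random(seed)
    n_x, n_y = len(x), len(y)
    observed = abs(sum(x) / n_x - sum(y) / n_y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_x]) / n_x - sum(pooled[n_x:]) / n_y)
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

# Made-up data: two sets drawn from the same price distribution.
gen = random.Random(42)
first = [("A", gen.gauss(50, 10)) for _ in range(200)]
second = [("A", gen.gauss(50, 10)) for _ in range(200)]

print(cluster_shares(first), cluster_shares(second))
p_value = permutation_test_mean([p for _, p in first], [p for _, p in second])
print(p_value)
```

A small p-value suggests the mean prices differ more than chance would explain; a large one only means no difference was detected. It makes sense to run the comparison per cluster as well as overall, since the pricing info was calculated per cluster.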