Sampling Methods in Research
Sampling is that part of statistical practice concerned with the selection of an
unbiased or random subset of individual observations within a
population of individuals intended to yield some knowledge about the
population of concern, especially for the purposes of making predictions
based on statistical inference. Sampling is an important aspect of data
collection.
There are two basic approaches to sampling: probabilistic and non-probabilistic sampling.
A probability sampling scheme is one in which every unit in the
population has a chance (greater than zero) of being selected in the
sample, and this probability can be accurately determined. The
combination of these traits makes it possible to produce unbiased
estimates of population totals, by weighting sampled units according to
their probability of selection. Example: We want to estimate the total
income of adults living in a given street. We visit each household in
that street, identify all adults living there, and randomly select one
adult from each household. (For example, we can allocate each person a
random number, generated from a uniform distribution between 0 and 1,
and select the person with the highest number in each household). We
then interview the selected person and find their income. People living
on their own are certain to be selected, so we simply add their income
to our estimate of the total. But a person living in a household of two
adults has only a one-in-two chance of selection. To reflect this, when
we come to such a household, we would count the selected person’s income
twice towards the total. (In effect, the person who is selected from
that household is taken as representing the person who isn’t selected.)
In the above example, not everybody has the same probability of
selection; what makes it a probability sample is the fact that each
person’s probability is known. When every element in the population does
have the same probability of selection, this is known as an ‘equal
probability of selection’ (EPS) design. Such designs are also referred
to as ‘self-weighting’ because all sampled units are given the same
weight.
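The household example above can be sketched in a few lines of Python. This is a minimal illustration, not a survey-grade estimator: the data layout (a list of households, each a list of adult incomes) and the function name `estimate_street_total` are assumptions made for the sketch. Each selected person's income is weighted by the inverse of their selection probability, which here is simply the number of adults in the household.

```python
import random

def estimate_street_total(households, rng):
    """Estimate total adult income on a street from one random adult per household.

    `households` is a hypothetical layout: a list of households, each given
    as a list of the incomes of the adults living there.
    """
    total = 0.0
    for incomes in households:
        selected_income = rng.choice(incomes)  # each adult chosen with probability 1/len(incomes)
        weight = len(incomes)                  # weight = 1 / probability of selection
        total += weight * selected_income
    return total

# One single-adult household and one two-adult household (made-up incomes).
street = [[30000], [20000, 40000]]
estimate = estimate_street_total(street, random.Random(0))
```

A single estimate will be either 70000 or 110000 for this street, depending on who is selected in the second household; averaged over many repetitions the estimates centre on the true total of 90000, which is what "unbiased" means here.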
Nonprobability sampling is
any sampling method where some elements of the population have no chance
of selection (these are sometimes referred to as ‘out of
coverage’ or ‘undercovered’), or where the probability of selection can’t
be accurately determined. It involves the selection of elements based on
assumptions regarding the population of interest, which form the
criteria for selection. Hence, because the selection of elements is
nonrandom, nonprobability sampling does not allow the estimation of
sampling errors. These conditions place limits on how much information a
sample can provide about the population. Information about the
relationship between sample and population is limited, making it
difficult to extrapolate from the sample to the population. Example: We
visit every household in a given street, and interview the first person
to answer the door. In any household with more than one occupant, this
is a nonprobability sample, because some people are more likely to
answer the door (e.g. an unemployed person who spends most of their time
at home is more likely to answer than an employed housemate who might
be at work when the interviewer calls) and it’s not practical to
calculate these probabilities. In addition, non-response effects may
turn any probability design into a non-probability design if the
characteristics of non-response are not well understood, since
non-response effectively modifies each element’s probability of being
sampled.
Let us look at the various types of sampling under each category:
Probability Sampling
- Simple random sampling
- Systematic sampling
- Stratified sampling
- Multistage cluster sampling
Non-probability Sampling
- Convenience sampling
- Quota sampling
- Judgment sampling
- Snowball sampling
1. Probability Sampling Methods
Probability sampling is sampling in which every member of the population
has a calculable and non-zero probability of being included in the sample.
These inclusion probabilities, together with methods of random selection
consistent with them, are used in forming estimates from the sample. The
probability of selection need not be equal for all members of the
population. If the purpose of the research is to arrive at conclusions or
make predictions affecting the population as a whole, then a probabilistic
sampling approach is desirable.
1.1. Simple Random Sampling:
Simple random sampling is a sampling process in which each element in the
target population has an equal chance of inclusion in the sample. For
example, if a sample of 15000 names is to be drawn from the telephone
directory, then each entry in the directory has an equal chance of being
selected. The serial numbers of the entries can be randomly generated by a
computer or picked out of a box, and then matched with the corresponding
names to complete the list. In small populations, random sampling is done
without replacement so that no unit is sampled more than once.
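As a rough sketch of the telephone directory example, the serial numbers can be drawn with Python's standard `random.sample`, which samples without replacement. The directory size of 1,000,000 entries is an assumption made purely for illustration; the source only gives the sample size of 15000.

```python
import random

def simple_random_sample(population_size, sample_size, rng):
    """Draw serial numbers 1..population_size without replacement.

    Every serial number has the same chance of inclusion, and no number
    can be drawn twice.
    """
    return rng.sample(range(1, population_size + 1), sample_size)

DIRECTORY_ENTRIES = 1_000_000  # hypothetical directory size
serials = simple_random_sample(DIRECTORY_ENTRIES, 15000, random.Random(42))
```

The resulting serial numbers would then be matched against the directory to obtain the names.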
The benefits of simple random sampling can be reaped when the target
population is small and homogeneous, the sampling frame is clearly
defined, and not much information is available regarding the population.
It is advantageous in that it is free of classification error and requires
minimum advance knowledge of the population. Two striking features are the
elimination of human bias and non-dependency on the availability of the
element. For large populations, however, it is seldom put into practice
because of the problems in applying it: every item in the population must
be listed prior to sampling, which requires constructing a very large
sampling frame and results in extensive sampling calculations and
excessive costs.
1.2. Systematic Sampling:
Systematic sampling involves the selection of every kth element from a
sampling frame. Here ‘k’ represents the skip interval and is calculated
using the following formula.
Skip interval (k) = Population size / Sample size
Often used as a substitute for simple random sampling, it involves the
selection of units from a list using a skip interval (k) so that every kth
element on the list, following a random start between 1 and k, is included
in the sample. For example, if k were to equal 6 and the random start were
2, then the sample would consist of the 2nd, 8th, 14th, 20th, … elements
of the sampling frame.
Note that if the skip interval is not a whole number, it is rounded off to
the nearest whole number. This sampling method can be used in industrial
operations where the equipment and machinery in the production line are
checked for proper functioning as per the specifications. The manufacturer
can select every kth item to ensure consistent quality or to detect
defects: the first item is selected at random as the starting point, and
every kth item after it is evaluated against the specifications. The
method is also applicable when questioning people in a sample survey,
where the interviewer may approach every 10th person entering a particular
shop. In both cases, the researcher has to determine the skip interval,
select the starting item at random, and thereafter follow the skip
interval. This method is more economical and less time consuming than
simple random sampling.
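The procedure described above (compute the skip interval, pick a random start between 1 and k, then take every kth element) might be sketched as follows. The function names are illustrative, and the frame of 120 items is a made-up example.

```python
import math
import random

def skip_interval(population_size, sample_size):
    """Skip interval k = population size / sample size, rounded to the nearest whole number."""
    return max(1, math.floor(population_size / sample_size + 0.5))

def systematic_positions(population_size, k, start):
    """1-based positions of every kth element, beginning at position `start`."""
    return list(range(start, population_size + 1, k))

def systematic_sample(frame, sample_size, rng):
    """Select every kth element of `frame` after a random start between 1 and k."""
    k = skip_interval(len(frame), sample_size)
    start = rng.randint(1, k)  # inclusive on both ends
    return [frame[p - 1] for p in systematic_positions(len(frame), k, start)]

# The example from the text: k = 6 with a random start of 2.
positions = systematic_positions(24, 6, 2)  # [2, 8, 14, 20]

# A hypothetical frame of 120 items from which 20 are wanted.
frame = list(range(1, 121))
sample = systematic_sample(frame, 20, random.Random(1))
```

When the population size is not an exact multiple of the sample size, rounding k means the realized sample can be slightly larger or smaller than the target.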
1.3. Stratified Sampling:
Stratification is the process of grouping the members of the population
into homogeneous groups (strata) before sampling. Each element in the
population should be assigned to one stratum only. Random sampling is then
applied within each stratum independently. This often improves the
representativeness of the sample by reducing the sampling error.
The number of units drawn from each stratum depends on the homogeneity of
its elements. A smaller sample can be drawn from a stratum known to have
elements with similar values, whereas a proportionately larger sample
should be drawn from a stratum where the values are known to differ. In
the former case, the information from a small number of respondents can be
generalized to the whole stratum. In the latter case, with much
variability among the elements, the larger number of sampled elements
keeps the sampling error to a minimum. The smaller errors arise because
all groups are adequately represented when the strata are combined.
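One common way to turn "sample more from the more variable strata" into numbers is Neyman allocation, in which a stratum's share of the sample is proportional to its size times its standard deviation. The text does not name this rule, so treat the sketch below as one possible choice rather than the method the author has in mind. The stratum names, sizes and standard deviations are invented for illustration.

```python
import random

def neyman_allocation(stratum_sizes, stratum_stdevs, total_n):
    """Allocate total_n across strata in proportion to size * standard deviation.

    Assumes at least one stratum has a non-zero standard deviation.
    Allocations are rounded, so they may not sum exactly to total_n.
    """
    weights = {h: stratum_sizes[h] * stratum_stdevs[h] for h in stratum_sizes}
    total_weight = sum(weights.values())
    return {
        h: min(stratum_sizes[h], round(total_n * w / total_weight))
        for h, w in weights.items()
    }

def stratified_sample(strata, allocation, rng):
    """Draw a simple random sample independently within each stratum."""
    return {h: rng.sample(units, allocation[h]) for h, units in strata.items()}

# Two hypothetical strata of equal size; stratum B is three times as variable.
strata = {"A": list(range(0, 100)), "B": list(range(100, 200))}
allocation = neyman_allocation({"A": 100, "B": 100}, {"A": 1.0, "B": 3.0}, 40)
sample = stratified_sample(strata, allocation, random.Random(3))
```

With these inputs the homogeneous stratum A receives 10 of the 40 units and the more variable stratum B receives 30.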
1.4. Multistage Cluster Sampling:
Clustering involves grouping the population into various clusters and
selecting a few clusters for study. Cluster sampling is suitable for
conducting research studies that cover a large geographic area. Once the
clusters are formed, the researcher can go for one-stage, two-stage, or
multistage cluster sampling. In single-stage sampling, all the elements
from each selected cluster are studied, whereas in two-stage sampling, the
researcher uses random sampling to select a few elements from each
selected cluster. Multistage sampling involves selecting a sample in two
or more successive stages; here the clusters selected in the first stage
can be divided into smaller cluster units.
For example, consider the case where a company decides to interview 400
households about the likeability of its new detergent in a metropolitan
city. To minimize resources and time, the researchers divide the city into
separate blocks, say 40, with each block consisting of heterogeneous
units. The researcher may opt for two-stage cluster sampling if he finds
that the individual clusters differ little from one another. Similarly,
multistage cluster sampling involves three or more sampling steps. Cluster
sampling differs from stratified sampling in that the sampling is done on
clusters rather than on elements within strata: in stratified sampling,
elements are randomly selected from every stratum, whereas in cluster
sampling only the selected clusters are studied.
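A two-stage version of the detergent example could look like the sketch below. The source fixes only the 400 households and the 40 blocks; the block size of 50 households and the split into 20 blocks of 20 households each are assumptions chosen so the numbers work out.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters at random. Stage 2: pick elements within each chosen cluster.

    `clusters` maps a cluster id to the list of elements it contains.
    """
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in chosen
    }

# 40 city blocks with 50 households each (block size is hypothetical).
blocks = {b: [f"block{b}-household{i}" for i in range(50)] for b in range(40)}

# Assumed design: 20 blocks, then 20 households per block, giving 400 interviews.
selected = two_stage_cluster_sample(blocks, 20, 20, random.Random(7))
total_households = sum(len(households) for households in selected.values())
```

Setting `n_per_cluster` to the full block size would turn this into single-stage cluster sampling, since every household in each selected block would then be studied.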
2. Non-probability Sampling Methods
Non-probability sampling involves the selection of units based on factors
other than random chance. It is also known as deliberate sampling and
purposive sampling. For example, a scheme whereby units are selected
purposefully would yield a non-random sample. In a general sense, it is an
umbrella term that includes any sample that does not conform to the
requirements of probability sampling. Convenience sampling, quota
sampling, judgment sampling and snowball sampling are a few examples of
non-probability sampling.
2.1. Convenience Sampling:
The selection of units from the population based on their easy
availability and accessibility to the researcher is known as convenience
sampling. For example, imagine a company that surveys a sample of its
employees to gauge the acceptance of a new flavor of potato chips that it
plans to introduce in the market. This is a typical example of convenience
sampling, as the criterion for selecting the sample is convenience and
availability. Although this type of research is easy and cost effective,
the findings of the sample survey cannot be generalized to the entire
population, as the sample is not representative. As there is no set
criterion for selecting the sample, there is scope for the research being
influenced by the bias of the researcher: in the above example, the
company surveys its own employees to find out whether the market would
accept the product.
2.2. Quota Sampling:
In quota sampling, the entire population is segmented into mutually
exclusive groups. The number of respondents (quota) to be drawn from each
category is specified in advance, and the final selection of respondents
is left to the interviewer, who proceeds until the quota for each category
is filled. Quota sampling finds extensive use in commercial research where
the main objective is to ensure that the sample represents, in relative
proportion, the people in the various categories of the population, such
as gender, age group, social class, ethnicity, and region of residence.
For example, if a researcher wants to segment the entire population based
on gender, then he would have two categories of respondents, that is,
males and females. If he plans to collect a sample of 30, he may allot a
quota of 15 for male and 15 for female respondents. The researcher will
therefore stop administering the questionnaire to females once he has
interviewed the 15th female respondent, that is, when the quota of 15
females is filled.
Quota sampling is subject to interviewer bias that may result in:
- The quota reflecting the population only in terms of superficial characteristics.
- The researcher selecting the respondents based on availability rather than on their suitability to the study.
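The quota-filling rule from the gender example can be sketched as a simple loop over respondents in the order the interviewer meets them. The function name, the tuple layout of a respondent and the arrival order are all assumptions for illustration. Note that nothing in the loop is random, which is exactly why the result is a non-probability sample.

```python
def fill_quotas(respondents, quotas, category_of):
    """Accept respondents in arrival order until every category's quota is filled.

    `quotas` maps a category to the number of respondents wanted;
    `category_of` returns the category of a respondent.
    """
    counts = {category: 0 for category in quotas}
    selected = []
    for person in respondents:
        category = category_of(person)
        if category in counts and counts[category] < quotas[category]:
            selected.append(person)
            counts[category] += 1
        if all(counts[c] >= quotas[c] for c in quotas):
            break  # every quota is filled, stop interviewing
    return selected

# Hypothetical arrival order: 40 women followed by 40 men.
arrivals = [("F", i) for i in range(40)] + [("M", i) for i in range(40)]
picked = fill_quotas(arrivals, {"M": 15, "F": 15}, lambda person: person[0])
```

Here the 16th woman onward is turned away because the female quota is already full, while whoever happens to arrive first is accepted, illustrating the availability bias noted above.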
2.3. Judgment Sampling:
The selection of a unit from the population based on the judgment of an
experienced researcher is known as judgment or purposive sampling. Here,
the sample units are selected based on the population’s parameters.
Companies frequently select certain preferred cities when test marketing
their products, because they consider the population of those cities to be
representative of the total population of the country. The same is the
case with the selection of specific shopping malls that, in the
researcher’s judgment, attract a reasonable number of customers from
different sections of society. Polling results predicted on television are
also a result of judgment sampling: researchers select those districts
whose voting patterns in the previous year were close to those of the
overall state or country. The judgment of the researcher rests on the
assumption that the past voting trends of the selected districts are still
representative of the political behavior of the state’s population. For
example, certain companies test market their new product launches in
cities like Mumbai and Bangalore, because the profile of these cities is
considered representative of the total Indian population.
2.4. Snowball Sampling:
Sampling procedures in which additional respondents are selected on the
basis of referrals from the initial respondents are known as snowball
sampling. This sampling technique is used for low-incidence or rare
populations. Sampling is a big problem in such cases, as a defined
population from which the sample can be drawn is not available. Therefore,
the sampling process depends on a chain system of referrals. Suppose SG
sports Ltd., a manufacturer of sports equipment, plans to survey 100
senior squash players through its new website to get their feedback on the
quality of its products.
However, keeping track of such senior squash players can be very
difficult, as they are rare. Therefore, the company collects the details
of the first 200 visitors to its website to find out whether any of them
is a squash player or knows a squash player. If the visitor is a squash
player, he is requested to refer the names of at least 3 other players
known to him. The referred players are then contacted for further
referrals, and this goes on until the sample size of 100 senior players is
reached. Although small sample sizes and low costs are the clear
advantages of snowball sampling, bias is one of its disadvantages. The
people referred by those sampled in the initial stages may be similar to
those initially sampled. Therefore, the sample may not represent a
cross-section of the total population. It may also happen that visitors to
the site or interviewees may refuse to disclose the names of those whom
they know.
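The chain of referrals can be modelled, very roughly, as a breadth-first walk over a "who refers whom" mapping. The mapping, the names in it and the function name are invented for the sketch; in practice the referrals only become known as each respondent is interviewed.

```python
from collections import deque

def snowball_sample(seeds, referrals, target_size):
    """Follow referral chains from the seed respondents until target_size is reached.

    `referrals` maps a respondent to the people they refer. Each person is
    sampled at most once, and the walk stops early if referrals run out.
    """
    queue = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep their order
    seen = set(queue)
    sample = []
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:
                seen.add(referred)
                queue.append(referred)
    return sample

# Hypothetical referral network among players.
referrals = {
    "asha": ["bala", "chen"],
    "bala": ["chen", "dev"],
    "chen": ["asha"],
    "dev": ["esha"],
}
first_four = snowball_sample(["asha"], referrals, 4)
everyone = snowball_sample(["asha"], referrals, 100)
```

Because every sampled person is reachable from the seeds, the sample can only ever cover the seeds' own network, which is the source of the bias described above.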
Sampling Errors in Research
Error is defined as “an act, assertion, or belief that unintentionally deviates from what is correct, right, or true”. In a business research process,
there is sure to be some error in the results because of the involvement
of human intelligence and the use of sampling methods that may not always
be accurate. The absolute value of the difference between an unbiased
point estimate and the corresponding population parameter is known as a
sampling error. It arises because the data is collected from a part,
rather than the whole, of the population. The sampling error can be
reduced by increasing the sample size. Total survey errors are of two
types: random sampling error and non-sampling error.
- Random Sampling Error: Random sampling error, or sampling error, is the difference between the sample results and the results of a census conducted by identical procedures. Even when a representative sample is taken, there is always a slight deviation between the true population value and the sample value, because the selected sample is never perfectly representative of the population. As the sampling error is the outcome of chance, the laws of probability are applicable to it. The sampling error varies inversely with the sample size: as the sample size increases, the sampling error decreases (for a sample mean, roughly in proportion to the square root of the sample size). Although sampling errors cannot be avoided altogether, they can be controlled through careful sample designs, large samples, and multiple contacts to assure representative response. Random sampling error represents how accurately the sample mean (x sample) represents the population’s true mean (X population).
- Non-Sampling Error: Non-sampling errors, also known as systematic errors, occur due to the nature of the study’s design and the correctness of its execution. Non-sampling error includes non-observation errors and measurement errors. Non-observation errors occur when data cannot be collected from a sampling unit or on a variable. Measurement errors arise from various sources such as respondents, interviewers, supervisors, and even data processing systems. Non-observation error is further divided into non-coverage error and non-response error. In probability sampling, each element of the population has a non-zero chance of selection into the sample; non-coverage error occurs when an element in the target population has no chance of being selected. Non-response error occurs when data cannot be collected from an element actually selected into the sample. This may be due to the element’s refusal or inability to cooperate because of a language barrier, a health limitation, or the non-availability of the element during the survey period. Selection of a faulty sampling frame may also result in a non-sampling error: sampling frame error is said to occur when ineligible respondents are included in the sampling frame and eligible respondents are left out.
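The relationship between sample size and random sampling error can be checked with a small simulation. The sketch below uses a synthetic population (10,000 values drawn from a normal distribution with mean 50 and standard deviation 10, all of which are arbitrary choices) and measures the average absolute gap between the sample mean and the population mean at two sample sizes.

```python
import random
import statistics

def mean_abs_sampling_error(population, sample_size, trials, rng):
    """Average absolute difference between sample mean and population mean over repeated samples."""
    population_mean = statistics.fmean(population)
    errors = [
        abs(statistics.fmean(rng.sample(population, sample_size)) - population_mean)
        for _ in range(trials)
    ]
    return statistics.fmean(errors)

rng = random.Random(2024)
population = [rng.gauss(50, 10) for _ in range(10_000)]  # synthetic data

error_n25 = mean_abs_sampling_error(population, 25, 500, rng)
error_n400 = mean_abs_sampling_error(population, 400, 500, rng)
```

Increasing the sample size sixteen-fold, from 25 to 400, should cut the typical error by a factor of roughly four rather than sixteen, reflecting the square-root relationship for a sample mean. Non-sampling errors such as non-coverage or non-response are not captured by a simulation like this, and a larger sample does nothing to reduce them.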