Artificial selection. Long before Darwin and Wallace, farmers and breeders were using the idea of selection to cause major changes in the features of their plants. We present a new analysis of the power of artificial selection experiments to detect and localize quantitative trait loci. This analysis uses a simulation framework. A simulation tool was developed for estimating the power to detect artificial selection acting directly on single loci. The simulation tool should be.
This linkage disequilibrium results in interference, where the effect of selection on each locus is decreased (Hill and Robertson). Previous studies of the power of artificial selection experiments to detect trait loci have taken the traditional population genetics approach, in which selection is parameterized using selection coefficients that remain constant each generation.
For example, previous work by Kim and Stephan analyzed a single locus under a constant selection coefficient, using a diffusion approximation, and two recent simulation studies employed forward simulations with loci parameterized by constant selection coefficients to obtain power estimates (Baldwin-Brown et al.).
While there are ways to translate truncation selection intensity parameters to selection coefficients see, for example, Falconer and Mackay, Chap. We introduce a new simulation framework to investigate the power of artificial selection experiments to detect and localize QTL contributing to a quantitative trait. Our simulations employ a whole-genome quantitative genetic model of loci underlying a trait, and we explicitly model artificial selection of individuals each generation based on trait values.
As in Baldwin-Brown et al. A common assumption used in theoretical studies e. In addition, Orr showed that the distribution of fitness effect sizes of alleles fixed during adaptation is approximately exponential and that this result is nearly independent of the distribution of mutational effects. In light of these considerations, we follow the rationale of Otto and Jones and model effect sizes of QTL alleles as exponentially distributed. Additional support for this assumption comes from empirical studies of several quantitative traits whose genetic variation has been shown to depend on a few loci of large effect and many loci of small effect see Mackay for a review.
Because we assume no relationship between starting allele frequency and effect size, this implicitly assumes that the trait variation in the founder population is neutral or effectively neutral prior to the onset of artificial selection.
For traits under stabilizing selection, we might expect variants of large effect to have reduced frequencies. The simulation framework we have developed easily generalizes to such alternative cases, but as a starting point we begin with the simplifying assumption of a neutral trait with exponentially distributed QTL effect sizes. Our simulations show that forward simulations of a locus assuming a constant selection coefficient do not fully capture the allele frequency dynamics of a QTL under artificial selection on a quantitative trait.
In contrast, explicit quantitative genetic modeling of the trait leads to insights regarding the effect of the trait architecture on the allele frequency trajectory of a QTL.
For instance, by simulating the entire genome of individuals, we demonstrate the important role that recombination plays in decreasing interference between QTL, as well as reducing linkage disequilibrium between QTL and neighboring neutral loci. Our results emphasize that designing the artificial selection experiment to allow more opportunity for recombination increases the ability to detect and localize QTL. Finally, previous work has suggested that when founder sequence information is available, one can obtain more accurate allele frequency estimates by estimating local haplotype frequencies from pooled read data Long et al.
We show that these improved allele frequency estimates, when compared to estimates calculated directly from read counts, can lead to an increase in power, although the magnitude of improvement is condition dependent. We used the program forqs (Kessner and Novembre) for all forward simulations. In our simulations of artificial selection on a quantitative trait, individuals had three chromosomes, with lengths matching Drosophila chromosomes X, 2, and 3.
For each set of experimental parameters we simulated populations using replicates each of 12 different canonical architectures: For a given number of QTL and heritability level, QTL positions and effect sizes were generated randomly for each simulation run see next section.
The random trait generation was implemented in an auxiliary program that produces trait description files, which are included by forqs configuration files that specify the experimental setup. Hence, we did not include de novo mutations in our simulations. Each generation, forqs calculates trait values for all individuals based on the effect sizes of the alleles they carry at QTL and a random environmental effect with variance determined by the heritability of the trait.
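The per-generation step described above (trait value from QTL alleles plus an environmental effect whose variance follows from the heritability, then selection on trait values) can be sketched in a few lines. This is a minimal illustration, not the forqs implementation: the function names are hypothetical, the genetic model is assumed to be purely additive, and the environmental effect is assumed to be Gaussian with variance V_E chosen so that V_G / (V_G + V_E) equals the heritability.

```python
import math
import random


def environmental_variance(genetic_variance, heritability):
    """Return V_E such that V_G / (V_G + V_E) equals the heritability."""
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    return genetic_variance * (1.0 - heritability) / heritability


def trait_value(allele_counts, effect_sizes, env_variance, rng):
    """Additive genetic value plus a Gaussian environmental deviation.

    allele_counts holds 0, 1, or 2 copies of the effect allele per QTL
    (an assumed encoding for this sketch).
    """
    genetic = sum(c * a for c, a in zip(allele_counts, effect_sizes))
    return genetic + rng.gauss(0.0, math.sqrt(env_variance))


def truncation_select(values, proportion, high=True):
    """Indices of the top (or bottom) `proportion` of individuals by trait value."""
    n_keep = max(1, round(proportion * len(values)))
    order = sorted(range(len(values)), key=lambda i: values[i])
    return order[-n_keep:] if high else order[:n_keep]
```

For example, with a genetic variance of 2.0 and a heritability of 0.5, `environmental_variance` returns 2.0, so half of the total trait variance is environmental.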
All configuration files and analysis scripts for all simulations are freely available online at https: To investigate how the genetic architecture of a trait affects the behavior of QTL allele frequencies under artificial selection, we developed a method to generate random trait architectures with specified parameters. In particular, we were interested in how the number of QTL contributing to the trait and the heritability of the trait affect the power to detect QTL. In addition, we wanted to investigate how the trait architecture affects a focal QTL with a specified effect size and initial allele frequency.
Finally, we wanted to ensure that linkage disequilibrium in the simulated starting population is similar to that found in populations used in experimental settings. In the following, we describe our procedure for generating the founding population and trait architecture in our simulations. Starting with founder individuals for which we have full-genome haplotypes, we run neutral forward simulations with forqs to recombine the haplotypes and expand the population size. This results in a mixed population that is two to three times larger than the desired starting population for the selection experiment.
Our procedure is similar to the experimental procedure used by Turner and Miller , where individuals from the DGRP inbred lines are allowed to mate randomly for several generations to create a mixed population with genetic variation and linkage disequilibrium similar to those of the natural population from which the inbred lines were derived. In some cases we additionally specify a focal QTL, for which we specify the locus, effect size, and initial allele frequency. We create a starting population by randomly selecting a subset of individuals from the mixed population.
We ensure that the focal QTL has the desired allele frequency by choosing individuals in Hardy-Weinberg proportions according to their focal QTL genotype. From the heritability and total variance parameters that we specify, we calculate a target genetic variance for the trait. We choose the remaining QTL positions uniformly at random across the genome, until we have the specified number of QTL.
We then scale the QTL effect sizes so that the genetic variance is equal to the target genetic variance. In the case where we have a focal QTL, we keep the focal QTL effect size constant and scale the other QTL effect sizes; this requires iteration until the genetic variance is within a specified tolerance of the target genetic variance. We first recall the deterministic model of selection on a single locus in a diploid population. Suppose a locus has two alleles, A_0 and A_1.
If p is the frequency of A_1, then the allele frequency change is given by. In the special case where A_1 has additive selection coefficient s, this becomes. By solving the above equation for s, we obtain a formula for the realized selection coefficient:
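Because the equations themselves did not survive in this copy of the text, the following sketch shows one standard form of the single-locus model under stated assumptions. The genotype fitnesses for A_0A_0, A_0A_1, and A_1A_1 are assumed to be 1, 1 + s, and 1 + 2s in the additive case; this parameterization and the function names are assumptions of the sketch, and the source may use a different convention. Under that assumption, Δp = s p (1 − p) / (1 + 2 s p), and solving for s gives s = Δp / (p (1 − p − 2Δp)).

```python
def delta_p_general(p, w00, w01, w11):
    """One-generation change in the frequency p of A_1 given genotype fitnesses."""
    q = 1.0 - p
    mean_fitness = p * p * w11 + 2.0 * p * q * w01 + q * q * w00
    return p * q * (p * (w11 - w01) + q * (w01 - w00)) / mean_fitness


def delta_p_additive(p, s):
    """Special case with assumed fitnesses 1, 1 + s, 1 + 2s."""
    return s * p * (1.0 - p) / (1.0 + 2.0 * s * p)


def realized_selection_coefficient(p, p_next):
    """The s that would produce the change p -> p_next under the additive model."""
    dp = p_next - p
    denominator = p * (1.0 - p - 2.0 * dp)
    if denominator == 0.0:
        raise ValueError("realized selection coefficient is undefined here")
    return dp / denominator
```

Applying `realized_selection_coefficient` to a change generated by `delta_p_additive` recovers the s that produced it, which is the sense in which the realized coefficient summarizes a single-generation allele frequency change.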
We explored several methods for calculating the power and false-positive rate associated with the detection of QTL by allele frequency differences between populations. First, we note that calculating the false-positive rate as the proportion of neutral non-QTL variants detected does not capture the effect of detected variants clustering due to linkage disequilibrium.
In addition, significantly diverged variants are not necessarily causal, but indicate that the region nearby may contain a causal variant. To address these issues, we define a detection region to include all variants within a specified radius of any variant whose absolute allele frequency difference value exceeds a given threshold see Results, Measurement of power to detect and localize QTL.
This leads to a natural definition of false-positive rate: Illustration of linkage disequilibrium between QTL and neutral loci. In plots of allele frequency differences between high and low populations, QTL peaks are narrower when extra generations of neutral mixing are introduced.
As an example, we indicate a threshold of 0. Note that the true region, consisting of sites within 10 kb of either SNP, is too small to be seen at this scale; thus, the size of the orange bars represents the local false-positive rate at this threshold.
Also note that the threshold of 0. (A) Twenty generations of selection; (B) 20 generations of selection, 4 generations per selection event (80 generations total). We calculated the false-positive rate in this way with radii of 10 kb, kb, and 1 Mb. We found that using a radius of kb or 1 Mb led to true regions that covered a substantial portion of the genome in the cases where there were a large number of QTL.
Using a 10-kb radius for the detection region gave the most interpretable results. With regard to power, the simplest method is to calculate the proportion of QTL detected. However, we feel that a more relevant measure of power is the proportion of the genetic variance in the initial population that is explained by the detected QTL.
This measure appropriately gives greater weight to QTL responsible for more of the genetic variance. Figure 3, A and B illustrates the difference between these two measures of power: Comparison of three methods for calculating and interpreting power and false-positive rate.
(A) Power is measured by the proportion of QTL detected, and false-positive rate is measured by the proportion of neutral variant sites detected. (B) Power is measured by the proportion of genetic variance in the founder population explained by the detected QTL, and false-positive rate is measured as in A. (C) Power is measured as in B, and false-positive rate is measured by the proportion of the neutral genome covered by the detection region.
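The region-based measures in (C) can be made concrete with a short sketch. It is an illustration under assumptions rather than the authors' code: it treats a single chromosome of known length, represents regions as merged base-pair intervals, takes the neutral genome to be everything outside the true region, and assumes each QTL comes with its share of the founder genetic variance. All function and parameter names are hypothetical.

```python
def merged_intervals(centers, radius, length):
    """Merge the windows [c - radius, c + radius], clipped to the chromosome."""
    windows = sorted((max(0, c - radius), min(length, c + radius)) for c in centers)
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    return sum(end - start for start, end in intervals)


def overlap_length(a, b):
    """Total overlap between two sorted lists of disjoint intervals."""
    i = j = 0
    total = 0
    while i < len(a) and j < len(b):
        low = max(a[i][0], b[j][0])
        high = min(a[i][1], b[j][1])
        if high > low:
            total += high - low
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def power_and_fpr(positions, d_values, qtl_shares, threshold, radius, length):
    """qtl_shares maps QTL position -> share of founder genetic variance."""
    hits = [pos for pos, d in zip(positions, d_values) if abs(d) > threshold]
    detection = merged_intervals(hits, radius, length)
    true_region = merged_intervals(list(qtl_shares), radius, length)
    # Power: variance share of QTL that fall inside the detection region.
    power = sum(share for pos, share in qtl_shares.items()
                if any(start <= pos <= end for start, end in detection))
    # False-positive rate: fraction of the neutral genome that is covered.
    neutral_length = length - total_length(true_region)
    false_cover = total_length(detection) - overlap_length(detection, true_region)
    return power, false_cover / neutral_length
```

Measuring coverage in base pairs rather than in variant counts is one possible reading of "proportion of the neutral genome"; counting neutral variant sites inside the detection region would be an equally simple alternative.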
Going forward, we use the 10-kb radius for the detection region, together with measuring power as the proportion of variance explained, to provide the most interpretable results. The forward simulator forqs efficiently simulates the entire genome of each individual by tracking haplotype chunks. In our simulation framework, we use two forward simulations.
The first simulation creates a mixed population from the founder haplotypes, after which we generate the random trait architecture. The second simulation represents the selection experiment. Individuals in the final populations are mosaics of individuals in the mixed population, which are in turn mosaics of the founders.
We implemented a custom program to handle this two-step propagation of neutral variation. Given founder sequences, the mixed population, and the final population, the program calculates the allele frequency in the final population of each variant in the genome.
After calculating allele frequencies for each population, we calculate the allele frequency difference D for each high-low population pair under consideration (multiple pairs for the replication analyses).
After sorting the D values, we begin with the highest D value and iteratively decrease the threshold to obtain data equivalent to a receiver operating characteristic (ROC) curve (power and false-positive rates) for that simulation run. We note that this step depends on the method for calculating power and false-positive rate, so we performed it once for each method we described above in Calculation of power and false-positive rate.
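As a rough sketch of the threshold sweep, the code below walks down the sorted |D| values and records one ROC point per site. It uses the simplest per-site definitions (proportion of QTL detected and proportion of neutral sites detected), not the region-based ones, and it ignores ties in D. The helper `power_at` shows how a single run's power could be read off at a fixed false-positive rate by linear interpolation, so that curves from replicate runs can be averaged on a common grid. All names here are hypothetical.

```python
def roc_points(d_values, is_qtl):
    """(false-positive rate, power) pairs as the threshold decreases."""
    n_qtl = sum(is_qtl)
    n_neutral = len(is_qtl) - n_qtl
    order = sorted(range(len(d_values)), key=lambda i: -abs(d_values[i]))
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for i in order:
        if is_qtl[i]:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / n_neutral, true_pos / n_qtl))
    return points


def power_at(points, fpr):
    """Linearly interpolate power at `fpr`, taking the upper value at vertical steps."""
    segments = list(zip(points, points[1:]))
    for (x0, y0), (x1, y1) in reversed(segments):
        if x1 > x0 and x0 <= fpr <= x1:
            return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
    return points[-1][1]


def average_roc(runs, fpr_grid):
    """Mean power over runs at each false-positive rate in the grid."""
    return [sum(power_at(points, f) for points in runs) / len(runs)
            for f in fpr_grid]
```

Scanning the segments in reverse is a design choice for this sketch: where the curve rises vertically at a given false-positive rate, it reports the highest power attained there.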
We obtain average ROC curves by calculating the average power over replicate simulation runs at regularly spaced false-positive rates, where the power for a particular run at a given false-positive rate is obtained by linear interpolation between points on its ROC curve. To investigate the power increase due to the use of haplotype-based allele frequency estimates, we needed to simulate pooled sequence reads from a population, followed by haplotype frequency estimation in sliding windows across the genome, using the harp method and software detailed in Kessner et al.
Because this procedure is computationally expensive, and due to the large number of simulations involved in this study, it was not feasible to do this for each simulated experiment. As an alternative, we obtained empirical error distributions, which we later used to add random errors to true allele frequencies. Because errors in the haplotype frequency estimation depend on the length scale of recombination, we ran replicate neutral simulations for varying numbers of generations from 40 to Haplotypes surrounding selected QTL are expected to be longer than in neutral regions, so our empirical error distributions are conservative.
Haplotype frequency estimation was performed with harp in overlapping sliding kb windows within a single 1-Mb region. We then calculated allele frequencies at variant sites within the region, using read counts. By considering allele frequencies in bins of size 0. Similarly, we also derived allele frequency estimates from the local haplotype frequencies, from which we obtained frequency-dependent empirical error distributions for each generation count.
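One way to picture the frequency-dependent empirical error distributions is as a table of observed errors keyed by allele-frequency bin, from which an error is drawn and added to a true frequency. The sketch below assumes this resampling scheme. The bin width of 0.05 is a placeholder because the value is missing from this copy of the text, and the function names, the clamping to [0, 1], and the fallback for empty bins are likewise assumptions.

```python
import random
from collections import defaultdict


def build_error_bins(true_freqs, estimated_freqs, bin_width=0.05):
    """Group estimation errors (estimate - truth) by bin of the true frequency."""
    n_bins = round(1.0 / bin_width)
    bins = defaultdict(list)
    for truth, estimate in zip(true_freqs, estimated_freqs):
        index = min(int(truth / bin_width), n_bins - 1)
        bins[index].append(estimate - truth)
    return bins


def add_empirical_error(true_freq, bins, rng, bin_width=0.05):
    """Perturb a true frequency with an error drawn from its frequency bin."""
    n_bins = round(1.0 / bin_width)
    index = min(int(true_freq / bin_width), n_bins - 1)
    errors = bins.get(index)
    if not errors:
        return true_freq  # no observations for this bin: leave unchanged
    noisy = true_freq + rng.choice(errors)
    return min(1.0, max(0.0, noisy))  # keep the result a valid frequency
```

In this scheme one table would be built per generation count and per estimation method (read counts or haplotype-based), matching the way the text describes the error distributions as depending on both.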
We performed forward simulations of populations, using the program forqs (Kessner and Novembre), which models whole genomes of individuals and selection on quantitative traits. Thus, initial allele frequencies of QTL were randomly distributed according to the allele frequency distribution of DGRP variant sites. Similarly, linkage disequilibrium patterns reflect the patterns observed in the DGRP populations. To simulate various trait architectures, we considered 12 scenarios by simulating 2, 5, 10, or QTL at initial heritability levels of 0.
Following previous theoretical and empirical studies Orr; Otto and Jones; Mackay; Thornton et al. These scenarios span settings that are relatively straightforward for genetic mapping such as an oligogenic trait with 2 QTL and a high heritability of 0. Our simulations began with several generations of neutral mixing, emulating laboratory procedures that use inbred founder lines to create larger experimental populations with increased genetic variation and reduced linkage disequilibrium e.
Using this procedure we simulated populations of various sizes, which we used as the initial populations for the artificial selection simulations. To first illustrate the importance of modeling quantitative traits explicitly, we show examples of allele frequency trajectories of a focal QTL contributing to a quantitative trait under truncation selection in comparison to a single locus with two alleles with a constant additive selection coefficient. For these simulations, we depart from our general procedure described above and fix the starting allele frequency of a focal QTL to 0.
The remaining QTL are chosen with a random distribution on starting allele frequencies and effect sizes see Methods. Selection is assumed to be in the direction of increasing trait values. As seen in Figure 4, A and C , the allele frequency trajectories of the focal QTL under truncation selection are qualitatively different from the trajectories under a constant selection coefficient.
While a strong selection coefficient leads to nearly deterministic allele frequency trajectories that monotonically increase, trajectories of a focal QTL under strong truncation selection are dependent on the underlying trait architecture. This effect can be seen in the trajectories where the focal QTL decreases in frequency at first, due to repulsion linkage disequilibrium i. Qualitative differences between fixed selection coefficient and truncation selection on a quantitative trait.
A focal QTL exhibits fundamentally different behavior under a constant selection coefficient, compared to truncation selection on a quantitative trait. Shown are allele frequency trajectories A and realized selection coefficient distributions B of a locus under two different selection coefficients 0.
C and D show the same for a focal QTL effect sizes 0. In addition, once an allele with a constant selection coefficient reaches high frequency, it only gradually increases in the final generations before finally going to fixation.
In contrast, the focal QTL under truncation selection tended to become fixed quickly after reaching high frequency in the population. This behavior is not surprising, because after a few generations of selection, the upper tail of the population trait value distribution is highly enriched for individuals carrying high-effect variants. To further illustrate these qualitative differences, we analyzed the realized selection coefficient of the trajectories, which represents the selection coefficient that would result in a given single-generation allele frequency change under a deterministic model see Methods.
Under a constant selection coefficient, the mean realized selection coefficient tracks the true selection coefficient closely during the selection phase, after which it decreases to zero during the drift phase (Figure 4B). Under truncation selection, the behavior of the mean realized selection coefficient depends on the underlying genetic architecture of the trait.
When the effect size is low, the realized selection coefficient increases each generation. This is because selection acts on the larger-effect QTL first and then has a greater effect on the focal QTL after the larger-effect QTL have reached fixation. On the other hand, when the effect size is higher, the focal QTL experiences very strong selection initially, which weakens as the focal QTL rises to fixation (Figure 4D).
Another effect of the underlying trait architecture can be seen in the fixation times of the focal QTL Figure 5. For a given effect size and heritability, the fixation time of the focal QTL increases with total number of QTL due to interference. While these two forms of simulation explicit QTL simulation vs. Effect of genetic architecture on fixation times. Fixation times of a focal QTL for a trait under truncation selection decrease with increasing effect size and heritability.
In all of the following analyses, we examine the power to detect QTL through allele frequency differences at variant sites.
There is currently no consensus regarding the choice of test statistic for the analysis of artificial selection experiments. Both for simplicity and because of its use in practice Parts et al.
We calculate D for each variant site in the genome, and we call a site detected if its D value exceeds a threshold value. By varying the threshold, we obtain ROC curves showing the relationship between power (true-positive rate) and the false-positive rate. Due to linkage and strong selection, detected QTL will generally have neighboring neutral variants whose allele frequency differences also exceed the detection threshold.
Because of this, detection and localization of a QTL are necessarily intertwined. In an actual experimental setting, the entire genomic region surrounding the significantly diverged loci would often be chosen for follow-up studies.
We explored several methods for calculating and interpreting power and false-positive rate. We present our results using the method that we found to be most interpretable, which we summarize here (see Methods for full details on the different methods). For a given D-value threshold, we determine a detection region that consists of all variants within a specified radius of any variant above the threshold (10 kb for the results presented here; see Figure 2 for an illustration). Power is calculated as the proportion of genetic variance in the founder population explained by the QTL located within the detection region.
We define the true region to consist of all variants within the specified radius of any QTL i. The false-positive rate is calculated as the proportion of the neutral genome covered by the detection region.
Thus, a false-positive rate of 0. We note also that the false-positive rate represents the combined size of regions surrounding loci above the threshold; the detection region surrounding a single locus will be one to two orders of magnitude smaller than this i. Use of this technique presumes that allele frequency differences between the high and low lines will be more pronounced at QTL contributing to the trait than, for example, differences between the high line and a control population that has been evolving neutrally.
To compare the power obtained by a divergent selection experiment to the power obtained by selection in a single direction, we simulated three populations originating from a single founder population, where one population was selected for high values, one selected for low values, and one allowed to evolve neutrally.
Comparison between the high and low populations leads to a substantial increase in power over the comparison between the high and neutral populations. Increase in power due to divergent selection. Comparison of two populations divergently selected for extreme values of a trait has greater power to detect QTL than comparison between selected and control populations.
Another technique available in artificial selection experiments is the use of replicate high and low populations to increase confidence that allele frequency differences between diverged populations are due to selection rather than genetic drift.
We then calculated the average power to detect QTL, using subsets of the data representing population replicates from one to five pairs. We found that using two replicate populations substantially increases power to detect QTL Figure 7.
For example, at the low false-positive rate (neutral genome proportion) of 0. Adding further replicate populations continues to increase power, but with diminishing returns. Increase in power due to replicate populations. Adding replicate pairs of divergently selected populations increases power to detect QTL. It is well known that selection on a single locus, defined by a selection coefficient s, acts more efficiently in larger populations, as can be seen in the dependence of fixation probabilities and fixation times on the population-scaled selection coefficient 4Ns (Ewens). In all cases we found a substantial increase in power to detect and localize QTL as we increased the population size.
Increase in power due to population size. We next investigated the effects of the length of the experiment and the strength of selection on the power to detect QTL. Each simulation ran for 80 generations, and we examined snapshots of each population at generations 20, 40, 60, and In these simulations, the populations consisted of individuals, and the trait had 10 QTL, with a heritability of 0.
However, the lower effective population size induced by strong selection results in the fixation of many neutral variants, which are then falsely detected as QTL. Effects of length of experiment and selection strength. In addition, letting the experiment run for a greater number of generations at this lower selection pressure increases the maximum power attained.
These observations suggest that recombination plays a large role in the power to detect and localize QTL: Increased recombination will also reduce interference between QTL, which should allow lower-effect QTL to be selected and detected.
This led us to investigate the effects of recombination further in our subsequent analyses.

What is artificial selection or selective breeding?

There are other types of selection, in addition to natural selection, at work in the world.
Think about some decisions you make about the types of pets you want or what kind of foods you prefer to eat.

An example of artificial selection - Dog breeding

Around 30,000 to 40,000 years ago, humans began domesticating wolves. Nowadays, these domesticated animals are what we call dogs!
Domestication is the act of separating a small group of organisms (wolves, in this case) from the main population and selecting for desired traits through breeding. Over thousands of years, the domestication of wolves resulted in the loss of some of the more aggressive traits, like the instinctual, defensive behavior in the presence of humans (barking or howling, baring their teeth, poising to attack, or running away), as well as changes in the size and shape of their teeth. Now humans select for a variety of traits in dogs based on personal preference and companionship, instead of as a way to increase human survival.
Dog breeding is a perfect example of how humans select for desirable or fashionable traits. There are three different types of breeds:

A purebred is a dog that comes from a lineage of a single breed and has never been mated with another breed. For example, a purebred German shepherd is all German shepherd and nothing else.

A cross-breed is the offspring of two different purebreds. For example, if a purebred German shepherd mates with a purebred husky, the resulting offspring would be a cross-breed of half German shepherd, half husky.

Finally, mixed-breeds are a combination of multiple breeds, where the parents were not purebreds. There are too many possible combinations to count!

In purebreds, since there is only one lineage, these mistakes are often more apparent and can make purebred dogs prone to certain diseases.

An example of artificial selection - Genetically modified organisms
Recently we have started to artificially select traits at a molecular level, where we mix DNA from different plant or animal species to make genetically modified organisms (GMOs). To genetically modify an organism, genetic information (the blueprint of the organism) is added, removed, or replaced by information from another organism that has a trait we desire.
If you could identify the genetic information that codes for drought resistance in another plant, then you could insert that into the blueprint of your corn species to make it more resistant to drought! GMOs are used in agriculture to help crops become more resistant to drought, cold, salinity, pests, and diseases. This is advantageous for us because it allows us to feed our growing population by farming in places that are usually less than ideal or where farming was not possible.
With more areas available for agriculture, we have larger agricultural production to feed ourselves.

Common misconceptions about evolution

Evolution is not the same as adaptation or natural selection.
Imagine a scenario where one trait might be highly advantageous in one environment, but highly detrimental in another. A good example of this is the fur color of mice.
In the forest, it will be more likely that mice take on a darker color to match the earth. Can beneficial traits arise in more than one area by accident? When multiple environments favor the existence of a trait, these beneficial traits can pop up through mutation and spread throughout their individual populations completely independently.