SMARTPOP

User Manual

SMARTPOP Manual

July, 2014
1 Introduction to SMARTPOP
2 Requirements
3 Installation
4 SMARTPOP features
  4.1 Input
   4.1.1 Via the command line
   4.1.2 Via input files
   4.1.3 Windows executable
  4.2 Simulation parameters
   4.2.1 Verbose
   4.2.2 Random seed
   4.2.3 Population size
   4.2.4 Sample size
   4.2.5 Number of simulations
   4.2.6 Number of generations to run
   4.2.7 Mating system and number of offspring
   4.2.8 Sibling matings
   4.2.9 Demography parameters
   4.2.10 Mutation rates
   4.2.11 Burn-in phase
   4.2.12 DNA sequences
   4.2.13 Choice of outputs
   4.2.14 Save/Load
  4.3 Outputs
   4.3.1 Diversity tables
   4.3.2 Fasta
   4.3.3 Arlequin
  4.4 Running SMARTPOP in parallel mode
  4.5 Setting up a complex scheme (change of parameters through time)
  4.6 Starting conditions: burn-in and pre-run
5 Examples
A Default parameters
B References

1 Introduction to SMARTPOP

SMARTPOP is a fast and flexible forward-in-time simulator for population genetics. Designed for speed, it is available in both serial and parallel versions. Developed for anthropological inference on human populations, SMARTPOP simulates individuals with sequences of sex-linked DNA (mitochondrial, X and Y chromosomes) and autosomes. Its flexible demographic models and social mating rules make it possible to study social dynamics.
If you use SMARTPOP or reuse its source code, please cite:
Guillot and Cox, 2014. SMARTPOP: inferring the impact of social dynamics on genetic diversity through high speed simulations. BMC Bioinformatics 15:175.

2 Requirements

SMARTPOP has been developed in C++ using the Boost C++ library. To build the software from sources you need a C++ compiler such as g++ or Visual Studio installed on your computer. To compile the parallel version of SMARTPOP, we recommend using mpic++.

3 Installation

You can directly download a binary (executable) version of SMARTPOP compatible with your OS at http://smartpop.sourceforge.net/download.html
Alternatively you can build SMARTPOP from the source code following these instructions:

To build SMARTPOP from source on a UNIX machine:

SMARTPOP is then ready to launch via the command line:
./smartpop

For building the parallel version, go to the src-parallel directory:

We have tested SMARTPOP on the main operating systems available today. If you encounter any trouble, please contact us.

4 SMARTPOP features

SMARTPOP must be called from the command line (./smartpop), either from the directory where it is installed or after adding that directory to your PATH. Each call launches one set of simulations with a single set of parameters.

4.1 Input

A set of simulations relies on several parameters. All of them have default values except the population size and the random seed. The default values are listed in Appendix A (Default parameters). You can set these parameters either with flags on the command line or via an input file.

4.1.1 Via the command line

You can define all the parameters of a set of simulations using flags on the command line. Any parameter that is not set by a flag receives its default value.
Note: you must always define the population size.
The order of the flags does not matter. If the arguments are malformed, the program will, in most cases, raise an error and not start.
To run a set of 1000 simulations with population size 200 evolving for 500 generations, enter:
./smartpop -p 200 -t 500 -nsimu 1000
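
Because one call runs one set of parameters, exploring several values of a parameter means issuing several calls. The following Python sketch builds one command per population size using only the flags -p, -t, -nsimu and -o from Table 1. The helper name smartpop_command and the output names (run_p100, ...) are hypothetical and chosen purely for illustration.

```python
import shlex


def smartpop_command(population_size, generations, n_simulations, output_name,
                     executable="./smartpop"):
    """Build the argument list for one SMARTPOP call (flags from Table 1)."""
    return [
        executable,
        "-p", str(population_size),    # population size (always required)
        "-t", str(generations),        # generations between two sampling events
        "-nsimu", str(n_simulations),  # number of simulations
        "-o", output_name,             # name for the output files
    ]


# One call per population size; the output names are hypothetical.
commands = [
    smartpop_command(size, 500, 1000, f"run_p{size}")
    for size in (100, 200, 400)
]

for command in commands:
    print(shlex.join(command))
```

Each list can be passed to subprocess.run(command, check=True) to launch the corresponding set of simulations.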





Flag          Argument      Meaning
-v                          Verbose
-i            string        Input file with parameters
-o            string        Name for the output files
-s            integer       Random seed
-p            integer       Population size
-sample       integer       Sample size
-nsimu        integer       Number of simulations
-t            integer       Number of generations between two sampling events
-nstep        integer       Number of sampling events through time for one simulation run
-mat          integer       Mating system
-noSib                      Prevent mating between half siblings (sharing at least one parent)
-noSib2                     Prevent mating between full siblings (sharing two parents)
-mu           double (x4)   Mutation rates (simple rates)
-muFull       double (x8)   Mutation rates (Kimura's two-parameter model)
-mtdiv                      Output diversity for mitochondrial DNA
-ydiv                       Output diversity for Y chromosome
-xdiv                       Output diversity for X chromosome
-adiv                       Output diversity for autosomes
-nbLociA      integer       Number of unlinked autosomal loci
-sizeA        integer       Size, in number of sites, per autosomal locus
-nbLociX      integer       Number of unlinked loci on the X chromosome
-sizeX        integer       Size, in number of sites, per locus on the X chromosome
-sizeY        integer       Size, in number of sites, of the Y chromosome
-sizeMt       integer       Size, in number of sites, of the mitochondrial sequence
-burnin       double        Set the diversity threshold parameter for the burn-in phase
-burninType   integer       Set the type of DNA to be tested in the burn-in phase
-demog        double (x3)   Set the demographic function
-fasta                      Save each simulation in a fasta format file at the end of the run
-arl                        Save each simulation in an Arlequin format file at the end of the run
-save         string        Save all the simulations in a SMARTPOP format (.sim) with a root name given by the argument
-load         string        Load all the simulations in a SMARTPOP format (.sim) from the root name given by the argument
-header                     Output the header at the beginning of diversity files

Table 1: Flags for command line call of SMARTPOP

4.1.2 Via input files

Instead of defining the parameters on the command line, you can use an input file. This is useful, for example, when you change many default values.
The file must follow a fixed format. Each line gives the value of one parameter. The order of the lines does not matter. Any parameter that is not defined in the input file takes its default value, as listed in Appendix A (Default parameters).





Keyword                  Argument                          Meaning
verbose                  boolean (0 or 1)                  Verbose
fileOutput               string                            Name of the output files
seed                     integer                           Random seed
populationSize           integer                           Population size
sampleSize               integer                           Sample size
nSimu                    integer                           Number of simulations
step                     integer                           Number of generations between two sampling events
nstep                    integer                           Number of sampling events through time for one simulation run
matingSystem             integer (0 to 4)                  Mating system
inbreeding               integer (0 to 2)                  0 allows sibling matings; 1 prevents half sibling matings; 2 prevents full sibling matings
muMtDnaTransition        double                            Mutation rate (/site/generation) for mitochondrial transitions
muMtDnaTransversion      double                            Mutation rate (/site/generation) for mitochondrial transversions
muXTransition            double                            Mutation rate (/site/generation) for X chromosome transitions
muXTransversion          double                            Mutation rate (/site/generation) for X chromosome transversions
muYTransition            double                            Mutation rate (/site/generation) for Y chromosome transitions
muYTransversion          double                            Mutation rate (/site/generation) for Y chromosome transversions
muAutosomeTransition     double                            Mutation rate (/site/generation) for autosomal transitions
muAutosomeTransversion   double                            Mutation rate (/site/generation) for autosomal transversions
diversityToOutput        integer (0 to 4)                  0 = output diversity for all kinds of DNA simulated
                                                           1 = output mitochondrial DNA
                                                           2 = output X chromosome
                                                           3 = output Y chromosome
                                                           4 = output autosome
nbLociA                  integer                           Number of unlinked autosomal loci
sizeA                    integer                           Size, in number of sites, per autosomal locus
nbLociX                  integer                           Number of unlinked loci on the X chromosome
sizeX                    integer                           Size, in number of sites, per locus on the X chromosome
sizeY                    integer                           Size, in number of sites, of the Y chromosome
sizeMt                   integer                           Size, in number of sites, of the mitochondrial sequence
burninTheta              double                            Set the diversity threshold parameter for the burn-in phase
burninType               integer                           Set the type of DNA to be tested in the burn-in phase
demog                    double double double              Set the demographic function
mito                     integer                           Mitochondrial simulation only
fastaOutput              boolean (0 or 1)                  Save each simulation in a fasta format file at the end of the run
arlequinOutput           boolean (0 or 1)                  Save each simulation in an Arlequin format file at the end of the run
save                     boolean (0 or 1) (+ string)       If 1, save all the simulations in a SMARTPOP format (.sim) in the directory given by the argument
                                                           If 0, no saving
load                     boolean (0 or 1) (+ string)       If 1, load all the simulations in a SMARTPOP format (.sim) from the directory given by the argument
                                                           If 0, no loading
headerOutput             boolean (0 or 1)                  Output the header at the beginning of diversity files

Table 2: Parameter file definitions

By default, each run of SMARTPOP creates a parameter file called SMARTPOP_parameters.txt. This file has exactly the same format as the input parameter file, so running ./smartpop -i SMARTPOP_parameters.txt reruns exactly the same set of simulations. This is possible because the random seed is one of the recorded parameters.
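
As an illustration, the Python sketch below generates the text of a small parameter file from keywords listed in Table 2. It assumes that each line holds a keyword followed by a single space and its value; the manual does not spell out the separator, so compare the result with a SMARTPOP_parameters.txt file written by SMARTPOP itself before relying on it. The function name format_parameter_file and the chosen values are illustrative only.

```python
def format_parameter_file(parameters):
    """Return the text of a parameter file, one 'keyword value' line per entry.

    Assumption: keyword and value are separated by a single space.
    """
    return "".join(f"{keyword} {value}\n" for keyword, value in parameters.items())


# Keywords come from Table 2; the values are arbitrary examples.
parameters = {
    "populationSize": 200,
    "step": 500,
    "nSimu": 1000,
    "seed": 2014,
    "verbose": 1,
}

text = format_parameter_file(parameters)
print(text, end="")
```

Writing this text to a file and passing the file name to the -i flag would then replace the equivalent list of command line flags.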

4.1.3 Windows executable

The Windows executable only handles input files. SMARTPOP reads the file SMARTPOP_parameters.txt, which must be in the same directory as the executable. For each new launch of SMARTPOP, set the parameters by editing this file. If you want the seed to be picked randomly, remember to delete the line with the seed keyword from the parameter file.

4.2 Simulation parameters

4.2.1 Verbose

We recommend turning the verbose option on when you start using SMARTPOP. SMARTPOP then prints relevant information on the command line while the simulations are running. This is a good way to check that the parameters are set correctly and to follow the progress of the program.

4.2.2 Random seed

Simulations rely heavily on random processes, which are driven by sequences of pseudo-random numbers. Each time the program starts, it takes a random seed that uniquely determines this sequence. By default, the seed is itself chosen at random, but you can set it explicitly to repeat an earlier run.
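
The principle can be illustrated with a generic Python sketch. It uses Python's own random module, not SMARTPOP's generator, and the function name draws is hypothetical: two generators started from the same seed produce the same sequence, which is why recording the seed is enough to repeat a run.

```python
import random


def draws(seed, n=5):
    """Return n pseudo-random integers from a generator started with `seed`."""
    rng = random.Random(seed)  # independent generator, fully determined by the seed
    return [rng.randint(0, 99) for _ in range(n)]


first = draws(2014)
second = draws(2014)  # same seed, so the same sequence is produced

print(first == second)
```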

4.2.3 Population size

The population size set in SMARTPOP is the census population size at the beginning of the simulation. If the demography is set to be non-constant, the population size will change through time.

4.2.4 Sample size

Instead of analyzing the entire population, you can sample a given number of individuals. This matches a “real life” situation where you do not have access to the DNA of the whole population. If you define a sample size, all the outputs (diversity, but also fasta and Arlequin files) are generated from a random sample of that size drawn from the population.

4.2.5 Number of simulations

You can define the number of simulations that will be run with this set of parameters. If you are loading a set of simulations from a directory, this number must be smaller than or equal to the number of saved simulations.

4.2.6 Number of generations to run

During each simulation, the population evolves for t generations between two sampling events. By default, there is a single sampling event at the end of those t generations, at which output files are produced and diversity is measured. If you define multiple sampling events through time (N_Sampling), t is the number of generations between two consecutive sampling events, and the total number of generations run is:

G_Total = N_Sampling × t
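
As a worked example of this relation, the short Python sketch below lists the generations at which sampling takes place, assuming that the first event occurs after t generations and the last one at the end of the run. The function name sampling_generations is illustrative.

```python
def sampling_generations(t, n_sampling=1):
    """Generations at which sampling occurs: t, 2t, ..., n_sampling * t."""
    return [t * k for k in range(1, n_sampling + 1)]


events = sampling_generations(t=500, n_sampling=4)
total_generations = events[-1]  # equals n_sampling * t

print(events, total_generations)
```

With t = 500 and four sampling events, the simulation runs for 4 × 500 = 2000 generations in total.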
4.2.7 Mating system and number of offspring

Five mating systems are available, each designated by a number:

  1. Monogamy
    Males and females are paired randomly to mate. No individual can be paired with more than one mate. The number of offspring per couple follows a Poisson distribution.
  2. Polygamy
    Males and females are paired randomly to mate. The number of offspring per female follows a Poisson distribution.
  3. Polygyny
    Males and females are paired randomly to mate. A female can only mate with one male. The number of offspring per female follows a Poisson distribution.
  4. Polyandry
    Males and females are paired randomly to mate. A male can only mate with one female. The number of offspring per male follows a Poisson distribution.
  5. Random mating
    Males and females are paired randomly to mate. No distribution is imposed on the number of offspring.
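
To make the monogamy rule concrete, here is a minimal Python sketch. It is not SMARTPOP's implementation: the function names, the individual labels and the mean number of offspring (2.0) are assumptions made for illustration. Individuals are paired at random, each appears in at most one couple, and the number of offspring per couple is drawn from a Poisson distribution.

```python
import math
import random


def poisson(rng, mean):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def monogamous_families(males, females, mean_offspring, rng):
    """Pair males and females at random, each individual in at most one couple.

    Returns a list of (male, female, number_of_offspring) tuples.
    """
    males, females = list(males), list(females)
    rng.shuffle(males)
    rng.shuffle(females)
    # zip stops at the shorter list, so surplus individuals stay unpaired
    return [(m, f, poisson(rng, mean_offspring)) for m, f in zip(males, females)]


rng = random.Random(1)
families = monogamous_families(["m1", "m2", "m3"], ["f1", "f2", "f3", "f4"], 2.0, rng)
for male, female, n_offspring in families:
    print(male, female, n_offspring)
```

The other systems relax the one-mate constraint for one or both sexes and, following the descriptions above, draw the number of offspring per female (polygamy, polygyny) or per male (polyandry) rather than per couple.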

4.2.8 Sibling matings

For any mating system, you can forbid mating between half siblings (flag -noSib) or between full siblings (flag -noSib2).

4.2.9 Demography parameters

The demography is defined by a Markovian function:
popsize(t+1) = a + b × popsize(t) + c × popsize(t)^2
With its three parameters a, b and c, this definition covers a wide range of demographic scenarios. The table below gives the simplest examples:

Scenario                                                  a     b     c
A constant population size                                0     1     0
A constant growth of x individuals per generation         x     1     0
A constant decrease of x individuals per generation       -x    1     0
An exponential growth of rate x                           0     x     0
An exponential decrease of rate x                         0     1/x   0
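
The scenarios in the table can be checked with a few lines of Python. This is a sketch of the recurrence only: the function names are illustrative, and how SMARTPOP rounds non-integer population sizes is not covered here.

```python
def next_popsize(popsize, a, b, c):
    """One step of popsize(t+1) = a + b * popsize(t) + c * popsize(t)^2."""
    return a + b * popsize + c * popsize ** 2


def trajectory(initial_size, a, b, c, generations):
    """Population sizes from generation 0 to `generations`."""
    sizes = [initial_size]
    for _ in range(generations):
        sizes.append(next_popsize(sizes[-1], a, b, c))
    return sizes


constant = trajectory(200, a=0, b=1, c=0, generations=3)
linear_growth = trajectory(200, a=10, b=1, c=0, generations=3)
exponential_growth = trajectory(100, a=0, b=2, c=0, generations=3)

print(constant, linear_growth, exponential_growth)
```

Here a = 10 adds ten individuals per generation, while b = 2 doubles the population every generation.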
4.2.10 Mutation rates

Each type of DNA (mitochondrial, X, Y and autosome) has two mutation rates: a transition rate and a transversion rate. These rates must be given in mutations/site/generation. If you know only the overall mutation rate and not the ratio of transitions to transversions, you can set the transition and transversion rates each equal to half the total mutation rate.
The rationale behind having eight mutation rates is that the rates measured for mtDNA, Y, X and autosomes can differ by more than an order of magnitude. Similarly, transition rates are usually an order of magnitude higher than transversion rates.
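
The split of a total rate into its two components can be sketched in Python as follows. The function name split_mutation_rate is hypothetical, and the ratio is assumed to be the transition rate divided by the transversion rate; without a ratio, each component is set to half the total, as described above.

```python
def split_mutation_rate(total_rate, ts_tv_ratio=None):
    """Return (transition_rate, transversion_rate) summing to total_rate.

    ts_tv_ratio is assumed to be transition rate / transversion rate.
    When it is unknown (None), each rate is half of the total.
    """
    if ts_tv_ratio is None:
        return total_rate / 2, total_rate / 2
    transversion = total_rate / (1 + ts_tv_ratio)
    return total_rate - transversion, transversion


# Unknown ratio: equal split of the total rate.
equal_split = split_mutation_rate(1e-6)

# Transitions ten times more frequent than transversions.
transition, transversion = split_mutation_rate(1.1e-5, ts_tv_ratio=10)

print(equal_split, transition, transversion)
```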

4.2.11 Burn-in phase

By default, simulations start with the entire population having the same DNA. Alternatively, a burn-in phase runs an accelerated evolutionary process that forces the population to reach a given diversity θ_burn-in, from which your simulation then starts. During this phase, a higher mutation rate is applied to the population, which quickly increases its diversity. The accelerated evolution stops when the threshold is reached or when it has run for more than 100 generations.
By default, θ_burn-in is computed on the mitochondrial DNA. This can be changed with the flag -burninType. This parameter must be set to:
