Algorithms Implemented in TAGster | National Institute of Environmental Health Sciences
TAGster: Efficient Selection of LD Tag SNP in Single or Multiple Populations
Consider a set $S$ that contains $M$ bi-allelic SNP markers $a_1, a_2, \ldots, a_M$ in $N$ populations, where $S_i$ contains the $M_i$ SNP markers $s_{i1}, s_{i2}, \ldots, s_{iM_i}$ in population $i$. First, we estimate the pairwise LD measure $r^2$ for each SNP pair within each population. Two markers $s_{im}$ and $s_{in}$ are said to be in strong LD if $r^2(s_{im}, s_{in})$ is greater than or equal to a pre-specified threshold value $r_0^2$. Each is then considered a tag SNP for the other, in that $s_{im}$ can be used as a surrogate for $s_{in}$, or vice versa.
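As an illustration of the strong-LD test, the sketch below computes $r^2$ for one SNP pair and compares it with a threshold. It assumes phased haplotypes coded as 0/1 alleles, which the page does not specify as the estimation method; the function names and the default threshold of 0.8 are likewise assumptions made for this example.

```python
def r_squared(hap_a, hap_b):
    """r^2 between two bi-allelic SNPs, given phased haplotypes coded 0/1.

    Assumption: phased haplotype input. The source only says that pairwise
    r^2 is estimated within each population, not how.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and equally long")
    p_a = sum(hap_a) / n                       # frequency of allele 1 at SNP a
    p_b = sum(hap_b) / n                       # frequency of allele 1 at SNP b
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:                             # a monomorphic SNP carries no LD signal
        return 0.0
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    return d * d / denom


def in_strong_ld(hap_a, hap_b, threshold=0.8):
    """True when the pair meets the threshold (0.8 is an assumed default)."""
    return r_squared(hap_a, hap_b) >= threshold
```

For example, `in_strong_ld([0, 0, 1, 1], [0, 0, 1, 1])` holds because the two SNPs are perfectly correlated, while `[0, 0, 1, 1]` and `[0, 1, 0, 1]` are independent and fail the test.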
Our aim is to find a tag SNP set, denoted by $T$, such that $\forall s_{im} \in S_i$, $i = 1, \ldots, N$, $\exists \alpha_j \in T$ that satisfies $r^2(\alpha_j, s_{im}) \ge r_0^2$, where $r_0^2$ is the pre-specified threshold. In our presentation, we introduce intermediate SNP sets $P_i$ and $Q_i$, $i = 1, \ldots, N$, where $P_i$ is called the candidate set, which contains all the SNPs in population $i$ that are eligible to be chosen as a tag SNP, and $Q_i$ contains the SNPs in population $i$ that are already tagged by at least one of the tag SNPs in $T$, i.e. $\forall s_{im} \in Q_i$, $i = 1, \ldots, N$, $\exists \alpha_j \in T$ that satisfies $r^2(\alpha_j, s_{im}) \ge r_0^2$. We implemented several algorithms in TAGster to select the tag SNP set $T$.
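The coverage condition on the tag SNP set can be checked directly. The following is a minimal sketch, assuming the pairwise $r^2$ values of each population are stored in a dictionary keyed by unordered SNP pairs; `pair_r2`, `is_tag_set`, and this data layout are hypothetical names and choices made for illustration, not part of TAGster.

```python
def pair_r2(ld, a, b):
    """Look up r^2 for an unordered SNP pair; a SNP tags itself (r^2 = 1).

    `ld` maps frozenset({snp_a, snp_b}) -> r^2 (assumed layout); pairs that
    are absent are treated as r^2 = 0.
    """
    if a == b:
        return 1.0
    return ld.get(frozenset((a, b)), 0.0)


def is_tag_set(tags, snps_by_pop, ld_by_pop, threshold=0.8):
    """True if every SNP in every population is tagged by some SNP in `tags`."""
    for pop, snps in snps_by_pop.items():
        for s in snps:
            if not any(pair_r2(ld_by_pop[pop], a, s) >= threshold for a in tags):
                return False
    return True


# Toy data: three SNPs in pop1, two in pop2 (values are made up).
snps_by_pop = {"pop1": {"a1", "a2", "a3"}, "pop2": {"a1", "a2"}}
ld_by_pop = {
    "pop1": {frozenset(("a1", "a2")): 0.9, frozenset(("a2", "a3")): 0.85},
    "pop2": {frozenset(("a1", "a2")): 0.3},
}
covers_all = is_tag_set({"a1", "a2"}, snps_by_pop, ld_by_pop)   # True
misses_pop2 = is_tag_set({"a2"}, snps_by_pop, ld_by_pop)        # False
```

In the toy data, `a2` alone covers pop1 but not pop2, where its $r^2$ with `a1` is only 0.3, so a second tag SNP is needed.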
Algorithm 1: A greedy algorithm for single or multiple populations
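1. Set $T = \emptyset$, $P_i = S_i$ and $Q_i = \emptyset$, for every $i = 1, \ldots, N$.
2. For each SNP $\alpha_j$ in [set not preserved], calculate [formula not preserved], with one case if $\alpha_j$ [condition not preserved] and another if $\alpha_j$ [condition not preserved].
3. Find the SNP $\alpha_{max}$ that has the highest [quantity not preserved], and add $\alpha_{max}$ to $T$. If $\alpha_{max} \in P_i$, add any SNP $s_{im}$ in $S_i$ with $r^2(\alpha_{max}, s_{im}) \ge r_0^2$ to $Q_i$, and then exclude $\alpha_{max}$ from $P_i$.
4. Repeat Steps 2-3 until $Q_i = S_i$ for every $i = 1, \ldots, N$.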
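The greedy loop of Algorithm 1 can be sketched as follows. The scoring rule is an assumption: the page's formula for the quantity maximized in Step 3 is not available, so this example scores a candidate by the number of not-yet-tagged SNPs it would tag, summed over the populations in which it is still a candidate. The function names and the dictionary-based LD layout are also hypothetical.

```python
def pair_r2(ld, a, b):
    """r^2 for an unordered SNP pair; `ld` maps frozenset({a, b}) -> r^2."""
    if a == b:
        return 1.0
    return ld.get(frozenset((a, b)), 0.0)


def greedy_tag_snps(snps_by_pop, ld_by_pop, threshold=0.8):
    """Greedy tag SNP selection over one or more populations (a sketch)."""
    all_snps = {i: set(s) for i, s in snps_by_pop.items()}
    candidates = {i: set(s) for i, s in all_snps.items()}   # candidate sets
    tagged = {i: set() for i in all_snps}                   # tagged sets
    tags = []                                               # tag SNP set T

    def newly_tagged(i, a):
        # Untagged SNPs of population i that `a` would tag.
        return {s for s in all_snps[i] - tagged[i]
                if pair_r2(ld_by_pop[i], a, s) >= threshold}

    def score(a):
        # ASSUMED score: untagged SNPs covered, summed over populations.
        return sum(len(newly_tagged(i, a))
                   for i in candidates if a in candidates[i])

    while any(tagged[i] != all_snps[i] for i in all_snps):
        pool = sorted(set().union(*candidates.values()))   # sorted: stable ties
        best = max(pool, key=score)
        tags.append(best)
        for i in candidates:
            if best in candidates[i]:
                tagged[i] |= newly_tagged(i, best)
                candidates[i].discard(best)
    return tags


snps_by_pop = {"pop1": {"a1", "a2", "a3"}, "pop2": {"a1", "a2"}}
ld_by_pop = {
    "pop1": {frozenset(("a1", "a2")): 0.9, frozenset(("a2", "a3")): 0.85},
    "pop2": {frozenset(("a1", "a2")): 0.3},
}
selected = greedy_tag_snps(snps_by_pop, ld_by_pop)
```

Each pass terminates with progress because any untagged SNP is still a candidate and tags at least itself, so the best score is always at least 1. On the toy data the loop picks `a2` first (it covers all of pop1) and then `a1` to cover the remaining SNP in pop2.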
Algorithm 2: An optimal solution for single population tag SNP
An exhaustive search is performed within each population to find the minimal number of population-specific tag SNPs $T_i$ for $i = 1, \ldots, N$.
1. Set $T_i = \emptyset$ and [expression not preserved], for $i = 1, \ldots, N$.
2. Within population $i$, partition the SNPs into disjoint precincts $P_{ij}$, $j = 1, 2, \ldots$, so that $r^2(s_{im}, s_{in}) < r_0^2$ for any two SNPs $s_{im}$ and $s_{in}$ that belong to different precincts.
3. Within a precinct $P_{ij}$:
   a. For any two SNPs $s_{im}$ and $s_{in}$ in precinct $P_{ij}$, if [condition not preserved], we exclude the one with the smaller [quantity not preserved] from precinct $P_{ij}$.
   b. Conduct an exhaustive search to find a set with the minimum number of tag SNPs for the SNPs in precinct $P_{ij}$, and add these tag SNPs to $T_i$.
4. Repeat step 3 for each precinct.
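The precinct partition and the per-precinct exhaustive search might look like the sketch below for a single population. Two assumptions are made: precincts are taken to be the connected components of the graph that links SNP pairs with $r^2$ at or above the threshold (the finest partition satisfying step 2), and the pruning rule in step 3 is skipped because its condition is not available. All function names are hypothetical.

```python
from itertools import combinations


def pair_r2(ld, a, b):
    """r^2 for an unordered SNP pair; `ld` maps frozenset({a, b}) -> r^2."""
    if a == b:
        return 1.0
    return ld.get(frozenset((a, b)), 0.0)


def precincts(snps, ld, threshold=0.8):
    """Split SNPs so that no pair from different precincts reaches the threshold."""
    remaining = set(snps)
    parts = []
    while remaining:
        seed = remaining.pop()
        component, stack = {seed}, [seed]
        while stack:                            # depth-first growth of one precinct
            a = stack.pop()
            linked = {b for b in remaining if pair_r2(ld, a, b) >= threshold}
            remaining -= linked
            component |= linked
            stack.extend(linked)
        parts.append(component)
    return parts


def minimum_tags(precinct, ld, threshold=0.8):
    """Smallest subset of the precinct that tags every SNP in it (brute force)."""
    members = sorted(precinct)
    for k in range(1, len(members) + 1):        # try smaller sets first
        for combo in combinations(members, k):
            if all(any(pair_r2(ld, a, s) >= threshold for a in combo)
                   for s in members):
                return set(combo)
    return set()


def optimal_tags(snps, ld, threshold=0.8):
    """Union of the minimum tag sets of all precincts."""
    tags = set()
    for precinct in precincts(snps, ld, threshold):
        tags |= minimum_tags(precinct, ld, threshold)
    return tags


ld = {frozenset(("a1", "a2")): 0.9, frozenset(("a2", "a3")): 0.85,
      frozenset(("a4", "a5")): 0.95}
chosen = optimal_tags({"a1", "a2", "a3", "a4", "a5"}, ld)
```

Splitting into precincts first keeps the brute-force search tractable, since its cost grows exponentially with precinct size rather than with the total number of SNPs.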
Algorithm 3: Two-stage solution for multi-populations
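1. Conduct Algorithm 2 within each population to select a set of population-specific tag SNPs $T_i$ for $i = 1, \ldots, N$.
2. Set $T = \emptyset$ and $P_i = S_i$ for $i = 1, \ldots, N$.
3. For each SNP $t_{ij}$ in $T_i$, find any SNP $s_{im}$ ($s_{im} \in P_i$ and $s_{im}$ [condition not preserved]) that satisfies $r^2(t_{ij}, s_{im}) \ge r_0^2$, then add these SNPs as well as $t_{ij}$ into LD bin $B_{ij}$ and exclude them from $P_i$.
4. Within each LD bin $B_{ij}$, set $C_{ij} = \emptyset$. Find any SNP $s_{im}$ in $B_{ij}$ that satisfies $r^2(s_{im}, s_{in}) \ge r_0^2$ for any SNP $s_{in}$ in $B_{ij}$, and then add $s_{im}$ to $C_{ij}$.
5. Set $P$ = [expression not preserved]. For each SNP $\tau_k$ in $P$, $k = 1, \ldots, |P|$, construct a one-dimensional array [name not preserved] with [number not preserved] elements, where [definition not preserved].
6. Cluster the SNPs in $P$ so that any two SNPs $\tau_k$ and $\tau_l$ in a cluster satisfy [condition not preserved].
7. Set $\Psi = \emptyset$. Find one SNP $\tau_k$ in each cluster with maximum [quantity not preserved] and add it to $\Psi$.
8. Cluster the SNPs in $\Psi$ so that any two SNPs $\tau_k$ and $\tau_l$ in a cluster satisfy [condition not preserved].
9. For each cluster, set the cluster's LD bin set to $\emptyset$, record in it the LD bins in each population that can be tagged by any SNP in the cluster, and then conduct an exhaustive search to find a minimum set of tag SNPs in the cluster that can tag all LD bins in that set. Add this set of SNPs to $T$.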
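Steps 3 and 4 of Algorithm 3 are concrete enough to sketch for one population; steps 5 to 9 depend on formulas that are not available and are not attempted here. The sketch assumes that "for any SNP in the bin" in step 4 means "for every SNP in the bin", and it uses hypothetical function names and a dictionary-based LD layout.

```python
def pair_r2(ld, a, b):
    """r^2 for an unordered SNP pair; `ld` maps frozenset({a, b}) -> r^2."""
    if a == b:
        return 1.0
    return ld.get(frozenset((a, b)), 0.0)


def build_ld_bins(pop_tags, snps, ld, threshold=0.8):
    """Step 3: one LD bin per population-specific tag SNP.

    Each bin holds the tag SNP plus the still-unassigned SNPs it tags;
    binned SNPs are removed from the pool so the bins stay disjoint.
    """
    pool = set(snps)
    bins = {}
    for t in pop_tags:
        members = {s for s in pool if pair_r2(ld, t, s) >= threshold}
        members.add(t)
        bins[t] = members
        pool -= members
    return bins


def bin_wide_tags(members, ld, threshold=0.8):
    """Step 4: SNPs in a bin that reach the threshold with every bin member.

    ASSUMPTION: "any" in the source is read as "every".
    """
    return {s for s in members
            if all(pair_r2(ld, s, other) >= threshold for other in members)}


ld = {frozenset(("a1", "a2")): 0.9, frozenset(("a2", "a3")): 0.85,
      frozenset(("a4", "a5")): 0.95}
bins = build_ld_bins(["a2", "a4"], {"a1", "a2", "a3", "a4", "a5"}, ld)
alternatives = {t: bin_wide_tags(m, ld) for t, m in bins.items()}
```

In the toy data the bin of `a2` has only `a2` itself as a bin-wide tag, whereas either `a4` or `a5` can represent the second bin. Interchangeable tags of this kind are what make it possible for the later steps to look for SNPs that serve several populations at once.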
Last Reviewed: February 18, 2026