Weizmann Institute Press Release (Heb)
PLoS Press Release
Q & A

Lineage analysis
What is the “cell lineage tree” of an organism?
What is the importance of lineage analysis?
How did people perform lineage analysis until now?
What is known about the cell lineage tree of humans and mice?
Do all humans have exactly the same cell lineage tree?
The method
Can you describe your method in general terms?
Don’t all the cells of an organism have the same DNA sequence?
Aren’t all mutations harmful?
So do you mean that the cells in my body are intrinsically “labeled”?
What are microsatellites, and why do you use them?
Is this method similar to phylogenetic analysis of species?
To which organisms can this method be applied?
In what aspects is your method better than existing methods?
Can the method be applied in a high-throughput manner?
Technical details
Why do you use mutant organisms? Doesn’t this cast doubt on your findings?
What are the current limitations of the method?
Can lineage analysis also be performed on populations of cells instead of on single cells?
Future
Can we expect to see this method applied in the near future?
What are your short and long term goals?
Are you interested in collaborations? What is your collaboration policy?



Lineage analysis
What is the “cell lineage tree” of an organism?
All living organisms are made of cells. The number of cells in multicellular organisms can vary enormously: for example, the worm C. elegans has about 1000 cells, whereas the human body has about 100 trillion cells. However, no matter how many cells an organism has, they all originated from a single cell, namely, the fertilized egg. During the development of an organism, each cell, in its turn, can have one of three outcomes: (i) it can divide into (and be replaced by) precisely two new ("daughter") cells; (ii) it can continue living without dividing; or (iii) it can die and stop being part of the living organism. Such a development process can be represented by a mathematical structure called a labeled rooted binary tree (see Figure). We note that for each organism, at any point in time, there is exactly one binary tree that correctly represents its development up to that point in time. We call this tree the cell lineage tree of the organism. This tree is similar to a genealogical tree, which describes the descent of a family, or to the “tree of life”, which depicts how species on Earth evolved.
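To make the tree structure concrete, here is a minimal Python sketch of a labeled rooted binary tree with the three outcomes described above. It is an illustration only: the class name `Cell`, its fields, and the labels are assumptions chosen for this example and are not part of the method itself.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass
class Cell:
    """One node of a cell lineage tree (illustrative names only)."""
    label: str
    alive: bool = True                                # False once the cell has died
    daughters: Optional[Tuple["Cell", "Cell"]] = None  # None until the cell divides

    def divide(self, left_label: str, right_label: str) -> Tuple["Cell", "Cell"]:
        """Outcome (i): the cell is replaced by precisely two daughter cells."""
        if self.daughters is not None or not self.alive:
            raise ValueError("only a living, undivided cell can divide")
        self.daughters = (Cell(left_label), Cell(right_label))
        return self.daughters

    def die(self) -> None:
        """Outcome (iii): the cell dies and leaves the living organism."""
        if self.daughters is not None:
            raise ValueError("a cell that has divided no longer exists as a cell")
        self.alive = False


def leaves(cell: Cell, depth: int = 0) -> Iterator[Tuple[Cell, int]]:
    """Yield (cell, depth) for every undivided cell.

    depth is the number of cell divisions separating the cell from the root.
    """
    if cell.daughters is None:
        yield cell, depth
    else:
        for daughter in cell.daughters:
            yield from leaves(daughter, depth + 1)


# A tiny hypothetical organism: the zygote divides, one daughter divides
# again, and one of the granddaughters dies.
zygote = Cell("zygote")
a, b = zygote.divide("A", "B")
a1, a2 = a.divide("A1", "A2")
a2.die()

living = [(cell.label, depth) for cell, depth in leaves(zygote) if cell.alive]
```

In this sketch the root is the fertilized egg, internal nodes are cells that have divided, and the leaves are cells that are either still alive (outcome ii) or dead (outcome iii). The depth of a leaf is the number of divisions that produced it from the zygote.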

What is the importance of lineage analysis?
The lineage relations among the cells of an organism are important for most biological research fields such as developmental biology, immunology, stem-cell research, brain research, and cancer research. Uncovering these lineage relations may help to resolve many open fundamental questions, regarding, for example, the origin of various cell types, stem cell numbers and growth dynamics, the nature of differentiation barriers, and the development of tumors and metastases.
How did people perform lineage analysis until now?
For small, transparent organisms, lineage analysis can be performed simply by observing the organism through a microscope and tracing its cells through development. The entire cell lineage tree of the worm C. elegans (which has about 1000 cells) was reconstructed in this fashion. However, this method is not applicable to more complex organisms, such as mice growing in their mother's uterus. Most lineage studies have instead been performed using a “clonal assay”. In this type of assay, the cell of interest is marked with a heritable marker (such as a fluorescent dye or a retrovirus) that is transmitted to all of the cell’s descendants, allowing the cell clone to be identified. Clonal assays have been useful in obtaining lineage information in various biological systems, but they are quite limited in their potential, because they can only tell whether a particular cell is a descendant of the original marked cell. In addition, many clonal assays use invasive techniques to mark the original cell, and this may alter the normal biological function of the clone.
What is known about the cell lineage tree of humans and mice?
Not much! It is quite surprising how little we know about our own cell lineage tree. For example, if we take a human cell and ask a seemingly simple question, “How many cell divisions were needed to create this cell from the zygote?”, we would probably be left without an answer, because for most cell types we simply do not have that knowledge, and different estimates vary greatly.
Do all humans have exactly the same cell lineage tree?
Probably not. For simple organisms, such as the tiny worm C. elegans, which has only about a thousand cells, the answer is yes. All C. elegans worms develop in the same fashion, such that the cell lineage trees of different individual worms are identical. Cell lineage trees of larger organisms have not been reconstructed, but there are indications that the development of organisms such as the fruit fly, D. melanogaster, might not be completely deterministic, and therefore that the cell lineage trees of different individuals differ. Indeed, the question of how much of human development is constant across individuals and how much is variable is still unresolved.
The method
Can you describe your method in general terms?
Our method makes it possible to reconstruct parts of the cell lineage tree of an organism with high resolution and high precision. If you were to extract a set of cells (say, one hundred) from a mouse, you might want to know the lineage relations between these cells. These relations can be represented by a binary tree, which is part of the enormous cell lineage tree of the organism. Our method (which is non-invasive) exploits intrinsic differences in the DNA sequences of cells in order to perform lineage analysis. It is commonly said that all of the cells in the human body (as an example of a multicellular organism) have the same DNA sequence. This is not exactly true: rare mutation events cause cells to acquire slightly different versions of the original DNA sequence. These differences, which account for a minuscule fraction of the DNA sequence and are found mostly in non-coding regions (and hence do not affect the functionality of the cells), essentially “record” the specific path each cell took during development. Lineage analysis is performed in a retrospective manner, similarly to phylogenetic analysis, in which lineage relationships between species are inferred from their DNA sequences. Unlike phylogenetic analysis, which has a speculative component, our method can be verified in controlled experiments. Also, our analysis is on a completely different scale: we use mutations that occur at a single cell division, not over millions of years.
Don’t all the cells of an organism have the same DNA sequence?

To a first approximation, yes, but actually each cell has a slightly different DNA sequence. Before a cell divides, its DNA is copied by a high-fidelity molecular machine, and each daughter cell gets one copy of the DNA sequence. Although extremely precise, this machine makes rare mistakes called mutations. For example, it might put a ‘T’ instead of a ‘G’. We calculated that in each cell division each daughter cell acquires about 50 mutations across its genome (we performed the calculation under conservative assumptions, and we suspect the actual number is much higher).

Aren’t all mutations harmful?
Not at all. Only a small fraction of our DNA, such as genes coding for proteins, is “meaningful”. Mutations occurring in such areas might be harmful, and might cause diseases such as cancer. One should keep in mind that mutations can also be beneficial, and the theory of evolution is based on this notion. Yet most mutations occur in the non-meaningful parts of the DNA sequence (sometimes referred to as “junk” DNA), and these mutations are neutral and go unnoticed. In our method we specifically attempt to look only at mutations occurring in non-meaningful parts of the sequence, because these are random events, as opposed to mutations in genes, which may be selected for or against and hence are not truly random.
So do you mean that the cells in my body are intrinsically “labeled”?
Yes! These random mutations which take place in each and every one of us contain very useful information which can be used to determine the lineage relation between cells (and other properties of the cell lineage tree). As opposed to previous methods of lineage analysis, our method does not require any sort of invasive marking.
What are microsatellites, and why do you use them?
Microsatellites are DNA "stutter" sequences, such as ACACACACACACACACACACACAC (AC12) or AAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAG (AAG11). In these sequences a small repeat unit (such as 'AC') appears several times in tandem. The genomes of most organisms have microsatellites, and there are over a million of them in humans and in mice. For our method, microsatellites are very useful for several reasons, mainly because they have fast mutation rates. Although mutations occur in all parts of the genome, such events are relatively rare. Since it is currently possible to analyze only small fractions of a genome, we need to focus on genomic regions that mutate faster. Microsatellites have a special mutational process called slippage, in which whole repeat units are added or deleted (for example, AC12 changes to AC13). Slippage mutations occur at rates about 100 to 1000 times higher than those of insertions, deletions, or substitutions.
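The repeat notation and the slippage process can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of the laboratory procedure: the function names are hypothetical, and the slippage probability `p_slip` is an arbitrary illustrative value, not a measured rate.

```python
import random


def expand(unit: str, count: int) -> str:
    """Write out a microsatellite, e.g. expand('AC', 3) gives 'ACACAC'."""
    return unit * count


def repeat_count(sequence: str, unit: str) -> int:
    """Return the number of tandem copies of `unit` in a pure repeat."""
    n, remainder = divmod(len(sequence), len(unit))
    if remainder or unit * n != sequence:
        raise ValueError("sequence is not a pure tandem repeat of the unit")
    return n


def slip(count: int, rng: random.Random, p_slip: float = 0.01) -> int:
    """Simulate one cell division at one microsatellite locus.

    With probability p_slip (an assumed, illustrative value) one whole
    repeat unit is added or deleted; otherwise the count is inherited as is.
    """
    if rng.random() < p_slip:
        return max(1, count + rng.choice((-1, 1)))
    return count


# Follow a single AC12 locus through 20 hypothetical cell divisions.
rng = random.Random(0)
count = repeat_count(expand("AC", 12), "AC")
history = [count]
for _ in range(20):
    count = slip(count, rng)
    history.append(count)
```

Because slippage changes the length of the locus by whole repeat units, a cell's state at each locus can be summarized by a single integer (the repeat count), which is what makes microsatellites convenient to compare between cells.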
Is this method similar to phylogenetic analysis of species?
Yes, it is quite similar. Modern phylogenetic analysis of species compares the DNA sequences of species in order to obtain their evolutionary relations. Species that are evolutionarily closer tend to have more similar sequences, because they share a long evolutionary path along which they accumulated the same mutations, and they diverged only quite recently. For example, humans and chimps share a common evolutionary path from the beginning of life and have evolved separately since splitting from a common ancestor "only" about 6 million years ago, hence their DNA is about 99% similar. In our method we compare DNA sequences of cells within an organism in order to obtain their lineage relations. Analogously, cells that are closer in lineage tend to have more similar sequences, because they share a longer developmental path along which they accumulated the same (somatic) mutations. Also, similar mathematical methods (called phylogenetic algorithms) can be used.
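The idea that more similar sequences indicate closer lineage can be illustrated with a small Python sketch. It uses UPGMA (average-linkage clustering), a textbook phylogenetic algorithm chosen here only as a stand-in; the text above does not say which algorithm is actually used. The distance measure, the cell names, and the repeat counts are all made up for illustration.

```python
from itertools import combinations


def distance(a, b):
    """Sum of absolute differences in repeat counts across the same loci."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average_distance(members_1, members_2, profiles):
    """Mean pairwise distance between the cells of two clusters."""
    total = sum(distance(profiles[m], profiles[n])
                for m in members_1 for n in members_2)
    return total / (len(members_1) * len(members_2))


def upgma(profiles):
    """Build a binary tree (nested tuples) by repeatedly joining the two
    closest clusters. `profiles` maps a cell name to its repeat counts."""
    # Each key is a (sub)tree; each value lists the cells that tree contains.
    clusters = {name: [name] for name in profiles}
    while len(clusters) > 1:
        t1, t2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(
                clusters[pair[0]], clusters[pair[1]], profiles),
        )
        members = clusters.pop(t1) + clusters.pop(t2)
        clusters[(t1, t2)] = members
    return next(iter(clusters))


# Hypothetical repeat counts at three microsatellite loci in four cells.
cells = {
    "c1": [12, 11, 15],
    "c2": [12, 11, 16],
    "c3": [14, 9, 15],
    "c4": [14, 9, 14],
}
tree = upgma(cells)
```

For these made-up profiles, c1 and c2 differ at only one locus, as do c3 and c4, so the sketch groups them as sister cells before joining the two pairs at the root.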
To which organisms can this method be applied?
In principle the method can be applied to all organisms, because all organisms have DNA that is subject to somatic mutations. Importantly, our method is applicable to humans, since it is non-invasive. If the implementation of the method is based on microsatellites, the organism must have a sufficient number of microsatellites. Most model organisms, including Drosophila and mouse, as well as human, have a large number of microsatellites, making this a good choice of implementation.
In what aspects is your method better than existing methods?
Our method has two main advantages over existing methods. First, to our knowledge, this method makes it possible for the first time to perform lineage analysis at the level of single cells, at high resolution (potentially at the resolution of a single cell division) and with high precision. Current methods offer much more limited information. Second, our method is applicable to humans, since it is non-invasive. Experimentation on humans is usually not possible for ethical reasons, yet understanding humans is the primary goal of most research, and model organisms, such as mice, are used only as a second-best option. Studies performed on model organisms, however persuasive and conclusive they may be, will always leave the big question: "But is it the same in humans?" We offer a way to reach definitive answers to such questions.
Can the method be applied in a high-throughput manner?
Certainly, and we have already made a first step in this direction. By high-throughput we mean analysis of a large number of microsatellites from a large number of samples. Our goal is to fully automate the procedure from DNA samples to a reconstructed tree, and we have created a prototype of a machine implementing this procedure.
Technical details
Why do you use mutant organisms? Doesn’t this cast doubt on your findings?

In regular, “wild type” organisms, DNA mutations accumulate at a slow rate, and therefore a large portion of the genome must be scanned in order to find a sufficient number of mutations to enable reconstruction of the cell lineage tree. Scanning a large portion of the genome, although technically possible, is very expensive, and therefore not feasible at the present time. In order to circumvent this difficulty, we perform experiments on mutant organisms that have a defect in a DNA mutation-repairing gene. These organisms develop normally and look just like their wild type relatives, but they accumulate DNA mutations at a very high rate, more than 100 times the wild type rate. In these organisms, scanning just a minute fraction of the genome is sufficient for reconstructing the cell lineage tree.
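The trade-off between mutation rate and the amount of genome that must be scanned can be illustrated with simple arithmetic. The following Python sketch is a back-of-the-envelope model, not a calculation from our experiments: the per-division rates, the number of divisions, and the target number of mutated loci are all hypothetical values, and the model assumes that loci mutate independently.

```python
import math


def p_mutated(rate: float, divisions: int) -> float:
    """Probability that a locus mutated at least once along a lineage of
    `divisions` cell divisions, given a per-division mutation rate."""
    return 1.0 - (1.0 - rate) ** divisions


def loci_needed(target: int, rate: float, divisions: int) -> int:
    """Number of loci to scan so that `target` mutated loci are expected."""
    return math.ceil(target / p_mutated(rate, divisions))


# Hypothetical numbers: the mutant rate is 100 times the wild type rate.
WILD_TYPE_RATE = 1e-5   # assumed per-locus, per-division rate
MUTANT_RATE = 1e-3      # assumed, 100 times higher
DIVISIONS = 30          # assumed depth of the sampled cells
TARGET = 50             # assumed number of mutated loci wanted

wild = loci_needed(TARGET, WILD_TYPE_RATE, DIVISIONS)
mutant = loci_needed(TARGET, MUTANT_RATE, DIVISIONS)
ratio = wild / mutant
```

Under these assumptions, raising the mutation rate 100-fold reduces the number of loci that must be scanned by nearly the same factor, which is why a minute fraction of the genome can suffice in the mutant organisms.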

The use of mutant organisms does not necessarily pose a problem, because although they are not identical to wild type organisms in every respect (for example, due to excess accumulation of mutations, most mutant organisms develop cancer at a relatively young age), these animals are generally very similar to their wild type relatives and their development is indistinguishable from wild type development.

With the rapid advancement of DNA sequencing technologies, it may become feasible in the near future to sequence large portions of the genome, such that it will be possible to use wild type organisms for cell lineage tree reconstruction.

What are the current limitations of the method?
Lineage reconstruction from genomic variability is a retrospective analysis, and it suffers from the limitations of this type of analysis. Specifically, it cannot uncover the existence of specific cells that have died (although the existence of populations of such cells can be deduced), and while the existence of cells that have divided can be deduced, their phenotype and position cannot be determined. Another limitation is that, at the present time, it is not financially feasible to reconstruct cell lineage trees of wild type organisms, and therefore reconstruction is performed on mutant organisms. This limitation, however, may be overcome in the future with advances in DNA sequencing technologies. A further limitation concerns the DNA samples that can be processed. Because the physical manipulation of single cells and the amplification of single-copy genomes are technically challenging, we currently perform reconstruction from samples of cell clones rather than from single cells. Cell clones, in which all cells are descendants of a single cell, harbor the genetic identity of their founder cells, and therefore their use is analogous to the use of the founder cells themselves and does not introduce bias. Cell clones, however, are not easily obtainable from all tissues and may be impossible to obtain from tissues that lack self-renewal capacity. We intend to work with single cells in the future, and we are currently developing the laboratory setup and protocols that will enable us to do so.
Can lineage analysis also be performed on populations of cells instead of on single cells?
It is possible to perform lineage analysis on certain populations of cells instead of single cells. If discrete cell clones are used, in which all cells are descendants of a single cell, the analysis is straightforward and analogous to analysis of the founder cells of the clones. However, if heterogeneous populations of cells, which are not pure clones, are used for lineage tree reconstruction, the resulting tree might be ambiguous and have little biological meaning. The reason is that, in contrast to lineage relations between single cells, lineage relations between heterogeneous cell populations are very complex, and a simple binary tree cannot portray this complexity. Performing lineage analysis of heterogeneous cell populations may still be worthwhile, even if no tree is drawn, because other, simpler lineage information may be obtained. For example, the average depth (number of cell divisions) can be compared between two heterogeneous cell samples.
Future
Can we expect to see this method applied in the near future?
Certainly. The method is already applicable. We have begun collaborating with research groups from around Israel, and we are very interested in collaborating with other groups from Israel and around the world.
What are your short and long term goals?
In the short term, we plan to initiate small-scale projects in order to gain a preliminary understanding of partial lineage trees associated with different organs or systems, by analyzing cell samples containing only dozens or hundreds of cells. In addition, we plan to analyze the development of cancer, which may provide immediate benefits. Cancer analysis may not require the perfection of single-cell methods, since clonal tissue samples may be obtainable from solid tumors. In the longer term, with the improvement of DNA sequencing technologies, we wish to inspire the initiation of a “Human Cell Lineage Project,” whose aim would be to reconstruct an entire human cell lineage tree. A precursor project, which may face fewer hurdles, would be a “Mouse Cell Lineage Project.” Both projects would require multidisciplinary teams, with members familiar with different organs or biological subsystems, but either project would benefit from the teams working on the same individual organism, since accumulated mutation information regarding the same individual could greatly improve the precision of the overall tree reconstruction process. Still, as in the Human Genome Project, diversity would be needed to separate incidental from essential properties of the organism's cell lineage tree.
Are you interested in collaborations? What is your collaboration policy?

We are very interested in collaborating with research groups wishing to apply this method in their research. Various lineage analysis questions interest us, but we are also very open and eager to pursue other directions. Currently the method is feasible for analysis of animals or tissues displaying microsatellite instability. We have established at the Weizmann Institute two strains of mutant mice (Mlh1-/- obtained from R.M. Liskay of OHSU and Msh2-/- obtained from W. Edelmann of AECOM), which display such instability, and we have already begun experiments with them. We will also be happy to perform analysis on human tissue with genomic instability (either local, such as from a cancer, or global, such as in mismatch repair deficient humans).

The main part of the collaboration works as follows: you give us DNA from the cells you wish to analyze (each DNA sample should be from a single cell or from a clone of a single cell; see current limitations). We currently suggest at least 1 µg of DNA from each sample. We then perform the lineage analysis and return to you a tree depicting the lineage relations between your cells, along with additional information explaining the results.

If you are interested in exploring the possibility of collaborating with our group, please contact Ehud Shapiro at Ehud.Shapiro@weizmann.ac.il.