Lisbon-K Chromosome Dataset
Revision as of 15:27, 29 July 2008
Under Construction...
Background & Framework
Some extracts of our paper [1]:
"The study of chromosome morphology and its relation to some genetic diseases is the main goal of cytogenetics. Normal human cells have 23 classes of large linear nuclear chromosomes, for a total of 46 chromosomes per cell. This set of chromosomes contains approximately 30,000 genes (the genotype) and large tracts of non-coding sequences. The examination of genetic material can therefore involve the examination of specific chromosomal regions using DNA probes, e.g. FISH (fluorescent in situ hybridization), called molecular cytogenetics; comparative genomic hybridization (CGH); and the morphological and textural analysis of entire chromosomes, called conventional cytogenetics, which is the focus of our work. These cytogenetic studies are very important for detecting acquired chromosomal abnormalities, such as translocations, duplications, inversions, deletions, monosomies or trisomies, that occur, for example, in cancerous leukemia cells. They are the ideal way to characterize the different types of leukemia, which is crucial for choosing the right treatment and follow-up for the patient, among various other applications.
The pairing of chromosomes is one of the main steps in conventional cytogenetic analysis, and a correctly ordered karyogram is important for the diagnosis of genetic diseases based on the patient's karyotype.
The karyogram is an image representation of human chromosomes from a metaphase spread, stained with the widely used Giemsa stain (G-banding), in which the chromosomes are paired in 22 classes of homologous elements and two sex-determining chromosomes (XX for females or XY for males), arranged in order of decreasing size. A karyotype is the set of characteristics extracted from the karyogram that may be used to detect chromosomal abnormalities. Metaphase is the step of the cell division process in which the chromosomes are in their most condensed state. In this phase the chromosomes appear well defined, allowing better visualization and abnormality recognition than in any other stage of the cell-division cycle.
Usually, the pairing and karyotyping procedure is done manually by visual inspection and is therefore time consuming and technically demanding. After the G-banding procedure, all chromosomes show a distinct transverse banding pattern that is characteristic of each class (see the ideogram figure). This banding profile is the most important feature for chromosome classification. The clinical staff is trained to pair and interpret the karyogram according to the International System for Cytogenetic Nomenclature (ISCN), which provides standard diagrams (ideograms) of the band profiles of all the chromosomes of a normal human. The ideogram figure shows the chromosomes of class 1 in various states of condensation. Other features, related to chromosome dimensions and shape, are also used to increase the discriminative power of manual or automatic classifiers.
Automatic pairing and classification is needed but it is a very difficult task. It has been an active field of research in the last two decades and still is an open problem today, namely, concerning the specific task of chromosomes pairing.
For instance, the most widely used commercial packages for cytogenetic analysis, including hardware (microscope) and software, are the Metasystems and Cytovision systems. These systems, containing state of the art algorithms for automatic detection of metaphase plates and implementation of the FISH technique, are however, still very ineffective with respect to chromosome classification and/or pairing. The same is true for the Leica package used by the Institute of Molecular Medicine of Lisbon (IMM) where the data used in this work was acquired..." [1]
Here a pairing algorithm for karyotyping is proposed for use in leukemia diagnosis. For this purpose bone marrow cells are used. These chromosome images are of much lower quality than the ones used in traditional genetic analysis with data sets such as Edinburgh [C1, C6], Copenhagen [C1, C2, C5, C7, C12] and Philadelphia [C1, C5], namely concerning the centromere, the band profile description/discrimination and the level of chromosome condensation. A new data set, with chromosomes not available in the traditional data sets, is therefore presented and used to evaluate the pairing algorithms. This issue is discussed further in the Lisbon-K1 Chromosome Dataset section below.
The images were acquired with a Leica™ DM 2500 optical microscope. Some image pre-processing (mainly noise reduction) and the chromosome segmentation were performed with the Leica™ CW 4000 Karyo software used by the clinical staff. The pairing ground truth was obtained manually by the technical staff of the Institute of Molecular Medicine of Lisbon and used to assess the accuracy of the proposed pairing algorithms.
The chromosomes are manually segmented and correctly oriented by the clinical staff on a computer-assisted basis. The technicians also apply some intensity pre-processing, such as noise removal and sharpening, to subjectively improve image quality. It is important to stress that this intensity pre-processing is not desirable in the scope of this work because it makes it difficult to define the right acquisition model on a statistical or deterministic basis.
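The assessment of a pairing result against the manually obtained ground truth can be illustrated with a minimal Python sketch. The exact accuracy measure used in the paper is not stated on this page, so the definition below (fraction of ground-truth pairs that are recovered) is an assumption, and the function name `pairing_accuracy` and the toy data are hypothetical.

```python
def pairing_accuracy(predicted_pairs, true_pairs):
    """Fraction of ground-truth pairs recovered by the algorithm (assumed metric)."""
    # A pair is unordered, so (a, b) and (b, a) count as the same pair.
    truth = {frozenset(pair) for pair in true_pairs}
    found = {frozenset(pair) for pair in predicted_pairs}
    if not truth:
        raise ValueError("ground truth must contain at least one pair")
    return len(truth & found) / len(truth)


# Toy example: six chromosome indices forming three homologous pairs.
true_pairs = [(0, 1), (2, 3), (4, 5)]
predicted_pairs = [(1, 0), (2, 4), (3, 5)]  # only the first pair is correct

accuracy = pairing_accuracy(predicted_pairs, true_pairs)
print(f"pairing accuracy: {accuracy:.2f}")  # pairing accuracy: 0.33
```

Representing each pair as a `frozenset` makes the comparison independent of the order in which the two homologous chromosomes are listed.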
The low quality of the chromosome images used in the leukemia diagnostic process, compared with other types of chromosome images, is due to the fact that these images are based on bone marrow cells, usually acquired from patients suffering from leukemia.
For instance, the images from the Edinburgh and Copenhagen data sets are based on routinely acquired peripheral blood cells (constitutional cytogenetics), while in the Philadelphia data set the images are based on cells extracted from chorionic villus (pre-natal cytogenetics). In both constitutional and pre-natal cytogenetics the observed cells are all equal, meaning that the same karyotype is always observed regardless of which cell is analyzed, which makes it possible to choose the metaphases with the best image quality. In tumoral cytogenetics (leukemia in this case), by contrast, a mixture of normal and cancerous cells is observed, with significant differences not only between normal and tumoral cells but also among the tumoral cells, which are the key cells for the diagnosis. In addition, while in pre-natal and constitutional cytogenetics it is possible to control the cell division cycle in order to obtain chromosomes with the best possible morphology, in tumoral cytogenetics this is not possible because the behavior of cancerous cells is much more difficult to predict. Two karyograms of different quality are shown in the figures of [1] for comparison purposes.
A new data set of this type of bone marrow cell chromosomes, ordered and annotated by the technicians, is used and described in the Lisbon-K1 Chromosome Dataset section below. (MOVE THE FOLLOWING SENTENCE AND THE ENTIRE PREVIOUS PARAGRAPH INTO THAT LISBON DATASET SECTION) This data set of karyograms is a very important research tool because a ground truth is finally available to test classification and pairing algorithms for this type of cell. The images, relevant software, and related information are available at the website http://mediawiki.isr.ist.utl.pt/wiki/Lisbon-K_Chromosome_Dataset.
INSERT THE IDEOGRAM IMAGE WITH A REFERENCE TO THE ISCN.
- References: (INCLUDE LINKS TO THE PDFS OF THE ARTICLES BELOW EACH REFERENCE... FIND OUT HOW TO DO THIS...)
- [1] Artem Khmelinskii, Rodrigo Ventura and João Sanches, Automatic Chromosome Pairing for Karyotyping Purposes Using Mutual Information, JOURNAL NAME, YEAR, PAGES, ETC.
- [2] Artem Khmelinskii, Rodrigo Ventura and João Sanches, Automatic Chromosome Pairing Using Mutual Information, Proceedings of the IEEE EMBC’08 - 30th Annual International Conference of the IEEE EMBS, August 20-24, Vancouver, Canada, 2008 (PAGES MISSING!!!)
- [3] Artem Khmelinskii, Rodrigo Ventura and João Sanches, Chromosome Pairing for Karyotyping Purposes Using Mutual Information, Proceedings of the 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, May 14-17, Paris, France, 2008, pages 484-487
Lisbon-K1 Chromosome Dataset
Description:
- 200 ordered and chromosome class-numbered karyograms:
  - 100 "Good/Medium"
    - INSERT NUMBER Female
    - INSERT NUMBER Male
  - 100 "Bad"
    - INSERT NUMBER Female
    - INSERT NUMBER Male
- Origin: bone marrow cells collected from patients with leukemia
- All the karyograms were selected according to the following criteria:
  - No structural abnormalities (such as translocations, deletions, inversions, etc.)
  - No numerical abnormalities (such as monosomies or trisomies)
  - No segmentation artifacts
  - No artifacts related to chromosome overlapping in the metaphase plate
  - All the chromosomes are correctly oriented
  - Karyograms with strongly bent chromosomes (more than 50°) were excluded
  - No chromosome straightening by the Leica software was applied
- Total number of chromosomes: (100*46)*2=9200
- Image format: TIFF, 768 x 512 pixels
  - INSERT NUMBER MB
- Average chromosome size after segmenting the karyogram: 80 x 40 pixels
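The figures in the description above can be cross-checked with a few lines of Python. This is only an arithmetic sketch: it assumes that the image and chromosome sizes are given in pixels, and all variable names are illustrative.

```python
# Figures taken from the dataset description.
CHROMOSOMES_PER_KARYOGRAM = 46
KARYOGRAMS_PER_GROUP = {"Good/Medium": 100, "Bad": 100}
IMAGE_SIZE = (768, 512)             # assumed to be in pixels
AVERAGE_CHROMOSOME_SIZE = (80, 40)  # assumed to be in pixels

# 22 autosome pairs plus 2 sex chromosomes give the 46 chromosomes of a normal cell.
assert 22 * 2 + 2 == CHROMOSOMES_PER_KARYOGRAM

total_karyograms = sum(KARYOGRAMS_PER_GROUP.values())
total_chromosomes = total_karyograms * CHROMOSOMES_PER_KARYOGRAM
print(total_karyograms, total_chromosomes)  # 200 9200

# Rough share of a karyogram image covered by 46 average-sized chromosome boxes.
image_area = IMAGE_SIZE[0] * IMAGE_SIZE[1]
chromosome_area = AVERAGE_CHROMOSOME_SIZE[0] * AVERAGE_CHROMOSOME_SIZE[1]
covered_fraction = CHROMOSOMES_PER_KARYOGRAM * chromosome_area / image_area
print(f"{covered_fraction:.2f}")  # 0.37
```

Under these assumptions the 46 average-sized bounding boxes cover a bit more than a third of the image, which is consistent with chromosomes laid out in separate rows with space between them.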
Lisbon-K2 Chromosome Dataset (Under Construction...)
Description:
In the future, another dataset will be built with more "real" and interesting data, i.e., karyograms extracted from cancerous cells of leukemia patients, with all sorts of numerical and structural chromosomal abnormalities.
Software
Description:
A simple algorithm to segment the karyogram, written in MATLAB, will be included in the package.
- Input: Karyogram image
- Output: Cell array with the 46 chromosomes, correctly ordered
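The input/output behavior described above can be illustrated with a short Python sketch. This is not the MATLAB routine distributed with the dataset. It assumes a binarized karyogram (background pixels equal to 0), chromosomes that do not touch each other, and karyogram rows that do not overlap vertically, so that reading order corresponds to class order. The function name `segment_karyogram` and the toy image are hypothetical.

```python
from collections import deque


def segment_karyogram(image, background=0):
    """Return bounding boxes (top, left, bottom, right) in reading order."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if image[r][c] == background or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected foreground pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and image[ny][nx] != background):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))

    # Group boxes into karyogram rows: a box opens a new row when it starts
    # below the bottom of the current row. Each row is then ordered left to right.
    boxes.sort()
    rows, row_bottom = [], -1
    for box in boxes:
        if not rows or box[0] > row_bottom:
            rows.append([box])
            row_bottom = box[2]
        else:
            rows[-1].append(box)
            row_bottom = max(row_bottom, box[2])
    return [box for row in rows for box in sorted(row, key=lambda b: b[1])]


# Toy binary "karyogram" with two rows of two objects each.
image = [
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
boxes = segment_karyogram(image)
# Crop each object; the resulting list plays the role of the cell array.
chromosomes = [[row[l:r + 1] for row in image[t:b + 1]] for t, l, b, r in boxes]
print(boxes)
```

For a real karyogram the list would be expected to hold 46 crops. A grayscale TIFF image would first have to be thresholded, a step that this sketch leaves out.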
Dataset Request & Citing
To keep track of research interest in this area, we ask researchers interested in this dataset to send us an e-mail with a brief description of their work (one or two paragraphs would be more than enough) and the institute or research center they are affiliated with. A temporary download link will be sent within a few hours of receiving the e-mail.
To reference the dataset in any publication describing research performed using it, or using sets derived from the original dataset made available here, please cite the following paper, in which the dataset was first presented and made public:
- Artem Khmelinskii, Rodrigo Ventura and João Sanches, Automatic Chromosome Pairing for Karyotyping Purposes Using Mutual Information, JOURNAL NAME, YEAR, PAGES, ETC.
Thank you and good work!
Contact
For dataset requests, questions, comments and suggestions about the data and the website, or to report bugs or typos, please contact:
Artem Khmelinskii
e-mail: artkhmelinskii (##) isr.ist.utl.pt