Lisbon-K Chromosome Dataset
Under Construction...
Background & Framework
Extract from the Introduction of our paper [1]:
"The study of chromosome morphology and its relation to some genetic diseases is the main goal of cytogenetics. Normal human cells have 23 classes of large linear nuclear chromosomes, for a total of 46 chromosomes per cell. This set of chromosomes contains approximately 30,000 genes (genotype) and large tracts of non-coding sequences. Therefore, the examination of genetic material can involve the examination of specific chromosomal regions using DNA probes, e.g. FISH (fluorescent in situ hybridization), called molecular cytogenetics; comparative genomic hybridization (CGH); and the morphological and textural analysis of entire chromosomes, called conventional cytogenetics, which is the focus of our work. These cytogenetic studies are very important for detecting acquired chromosomal abnormalities, such as translocations, duplications, inversions, deletions, monosomies or trisomies, that occur, for example, in cancerous leukemia cells. They are the ideal way to characterize the different types of leukemia and are crucial for choosing the right treatment and follow-up for the patient, among various other applications.
The pairing of chromosomes is one of the main steps in conventional cytogenetic analysis, and it is important to obtain a correctly ordered karyogram for the diagnosis of genetic diseases based on the patient's karyotype.
The karyogram is an image representation of stained human chromosomes with the widely used Giemsa Stain metaphase spread (G-banding), where the chromosomes are paired in 22 classes of homologous elements and two sex-determinative chromosomes (XX for the female or XY for the male), arranged in order of decreasing size. A karyotype is the set of characteristics extracted from the karyogram that may be used to detect chromosomal abnormalities. The metaphase is the step of the cellular division process where the chromosomes are in their most condensed state. In this phase the chromosomes appear well defined, allowing better visualization and abnormality recognition than in any other stage of the cell-division cycle.
Usually, the pairing and karyotyping procedure is done manually by visual inspection and is therefore time-consuming and technically demanding. After the G-banding procedure, all chromosomes gain a distinct transverse banding pattern characteristic of each class (see the ideogram figure). This banding profile is the most important feature for chromosome classification. Based on an international system for cytogenetic nomenclature (ISCN) that provides standard diagrams/ideograms of band profiles for all the chromosomes of a normal human, the clinical staff is trained to pair and interpret the karyogram according to that information. The ideogram figure shows an ideogram for the chromosomes of class 1 in various states of condensation. Other features, related to the chromosome dimensions and shape, are also used to increase the discriminative power of the manual or automatic classifiers.
Automatic pairing and classification is needed, but it is a very difficult task. It has been an active field of research in the last two decades and is still an open problem today, particularly for the specific task of chromosome pairing.
For instance, the most widely used commercial packages for cytogenetic analysis, including hardware (microscope) and software, are the Metasystems and Cytovision systems. These systems contain state-of-the-art algorithms for automatic detection of metaphase plates and implementation of the FISH technique, but they are still very ineffective with respect to chromosome classification and/or pairing. The same is true for the Leica package used by the Institute of Molecular Medicine of Lisbon (IMM), where the data used in this work was acquired..." [1]
[To do: insert the ideogram image, with a reference to the ISCN.]
- References: (to do: include links to the PDFs of the articles below each reference; find out how this is done)
- [1] Artem Khmelinskii, Rodrigo Ventura and João Sanches, Automatic Chromosome Pairing for Karyotyping Purposes Using Mutual Information (journal name, year, pages, etc. to be added).
- [2] Artem Khmelinskii, Rodrigo Ventura and João Sanches, Automatic Chromosome Pairing Using Mutual Information, Proceedings of the IEEE EMBC’08 - 30th Annual International Conference of the IEEE EMBS, August 20-24, Vancouver, Canada, 2008 (page numbers missing).
- [3] Artem Khmelinskii, Rodrigo Ventura and João Sanches, Chromosome Pairing for Karyotyping Purposes Using Mutual Information, Proceedings of the 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, May 14-17, Paris, France, 2008, pp. 484-487.
Lisbon-K1 Chromosome Dataset
Description:
- 200 ordered karyograms with numbered chromosome classes:
  - 100 "Good/Medium"
    - (number to be added) Female
    - (number to be added) Male
  - 100 "Bad"
    - (number to be added) Female
    - (number to be added) Male
- Origin: bone marrow cells collected from patients with leukemia
- All karyograms were selected according to the following criteria:
  - No structural abnormalities (such as translocations, deletions, inversions, etc.)
  - No numerical abnormalities (such as monosomies or trisomies)
  - No segmentation artifacts
  - No artifacts related to chromosome overlapping in the metaphase plate
  - All chromosomes are correctly oriented
  - Karyograms with strongly bent chromosomes (more than 50°) were excluded
  - No chromosome straightening by the Leica software was applied
- Total number of chromosomes: (100 × 46) × 2 = 9200
- 768 × 512 TIFF format images
- Total size: (number to be added) MB
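As a rough sanity check on the counts in the description, the Python sketch below builds the class labels expected for a normal karyogram (two chromosomes in each of the 22 autosome classes plus XX or XY) and multiplies by the 200 karyograms. The label strings, the "female"/"male" values, and the function names are assumptions made for illustration only; the dataset's actual file layout and annotation format are not described on this page.

```python
from collections import Counter

# Classes 1..22 are the autosomes; the sex chromosomes are labeled "X" and "Y".
# These label strings are illustrative assumptions, not the dataset's format.
AUTOSOME_CLASSES = range(1, 23)


def expected_labels(sex):
    """Return the 46 class labels of a normal karyogram for the given sex."""
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    # Two homologous chromosomes per autosome class.
    labels = [str(c) for c in AUTOSOME_CLASSES for _ in range(2)]
    # Sex chromosomes: XX for the female, XY for the male.
    labels += ["X", "X"] if sex == "female" else ["X", "Y"]
    return labels


def is_normal_karyogram(labels, sex):
    """Check that a list of labels has no missing or extra chromosomes."""
    return Counter(labels) == Counter(expected_labels(sex))


def expected_total(n_karyograms=200):
    """Total chromosome count for n normal karyograms (46 each)."""
    return n_karyograms * len(expected_labels("female"))


if __name__ == "__main__":
    print(len(expected_labels("male")))  # chromosomes per karyogram
    print(expected_total())              # chromosomes in the whole dataset
```

A check like `is_normal_karyogram` only verifies the numerical criterion (no monosomies or trisomies); the structural and image-quality criteria listed in the description cannot be verified from labels alone.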
Lisbon-K2 Chromosome Dataset (Under Construction...)
Description:
In the future, another dataset will be built with more "real" and interesting data, i.e., karyograms extracted from cancerous cells of leukemia patients, with all sorts of numerical and structural chromosomal abnormalities.
Dataset Request & Citing
To keep track of research interest in this area, we ask researchers interested in this dataset to send us an e-mail with a brief description of their work (one or two paragraphs is more than enough) and the institute or research center they are affiliated with. A temporary download link will be sent within a few hours of receiving the e-mail.
To reference the dataset in any publication describing research performed using the dataset, or sets derived from the original dataset made available here, please cite the following paper, in which the dataset was first presented and made public:
- Artem Khmelinskii, Rodrigo Ventura and João Sanches, Automatic Chromosome Pairing for Karyotyping Purposes Using Mutual Information (journal name, year, pages, etc. to be added).
Thank you and good work!
Contact
For dataset requests, questions, comments and suggestions about the data or the website, or to report bugs or typos, please contact:
Artem Khmelinskii
e-mail: artkhmelinskii (##) isr.ist.utl.pt