1. Introduction
According to the World Health Organisation (WHO), cancer is a leading cause of death worldwide, responsible for approximately 10 million deaths in 2020, or almost one in every six deaths. The most prevalent cancers include breast, lung, colon, rectal, and prostate cancers. Several methods are employed in cancer detection, with histopathological imaging being a significant component. However, histopathological imaging presents challenges due to the substantial volume of data and the complexity of interpretation. Pathologists devote considerable time to analysing these images and providing diagnostic evaluations. Even when presented with identical visual data, pathologists can arrive at differing conclusions, a variability influenced by factors such as fatigue and lapses in concentration.
Advances in technology now make it possible to digitise histological slides using different types of cameras and microscopes that clearly visualise and define the structure of the tissue. Scan quality varies with the scanner and the raw materials used, which introduces differences between slides. Before digitisation, the tissue is stained with haematoxylin and eosin (H&E). Digitised slides make it possible to employ computer-aided diagnosis (CAD) systems and artificial intelligence, which can help the pathologist improve the accuracy of diagnosis and prognosis and also save time.
Early detection and classification of any cancer are crucial to improving the survival rate, but they require a substantial number of pathologists and considerable time. Therefore, there is a consistent need for CAD systems that ease the pathologist's workload and support the early treatment of cancer before metastasis occurs, allowing pathologists to focus on more pressing and intricate tasks. Microscopic and histopathological slides can be used to detect and classify cancer into multiple categories at an early stage. They provide in-depth information about the disease and its impact on cells[1], thanks to a preparation process that preserves the architecture of the underlying tissue. The diagnosis of a histopathological image remains the gold standard for various diseases, including almost all types of cancer. Most previous studies on cancer detection focused on artificial intelligence and deep learning algorithms based on mammography, ultrasound, and magnetic resonance imaging (MRI) but rarely discussed histopathology images as datasets. The remainder of this paper is structured as follows: Sections 2 and 3 describe histopathological imaging, its utility in detecting cancers compared to other modalities, and the histological slide acquisition process. Section 4 covers stain normalisation. Section 5 highlights the types of segmentation used in histopathological images. Section 6 presents approaches to cancer diagnosis and prognosis. Section 7 deals with the performance measures frequently used to evaluate the segmentation and classification algorithms listed in this document. Section 8 reviews the literature based on histopathological imaging for diagnosis and prognosis. Finally, the closing section addresses research directions and provides a conclusion for this paper.
2. Medical imaging modalities
As far as cancer segmentation and classification using artificial intelligence is concerned, there are four medical imaging modalities that can be split into two categories: coloured images and greyscale images. In this work, we shed light on histopathological imaging as a modality. Most of the previous works mainly use mammography images due to their availability and widespread use. Mammography imaging technology is used predominantly in scenarios that require binary classification, often distinguishing findings as benign or malignant[1]. However, cancer segmentation and classification encompass a broader spectrum of complexities and variations. Histopathological imaging (HP), on the other hand, presents a rich dataset that excels in multi-class classification tasks[1]. While HP imaging is less frequently employed, it shines when dealing with scenarios that require the categorisation of cancers into multiple classes, providing a more comprehensive understanding of the disease's intricate nature at an early stage.
2.1 Mammogram
A mammogram (MG), a prevalent diagnostic tool in breast imaging, provides radiologists with a crucial method to thoroughly examine breast tissue for any irregularities or abnormalities. In the presence of cancer, it usually appears as a detectable mass, as illustrated in Figure 1.

Fig. 1 Example of mammogram breast cancer screening
2.2 Magnetic resonance imaging
Magnetic resonance imaging (MRI) is a diagnostic technique that uses magnetic fields and radio waves to generate detailed visual representations of the body's internal structures, including critical areas such as the breasts, liver, lungs, and bones. MRI provides a high level of insight into the presence of cancer within soft tissues, surpassing the clarity achievable through some established methods such as mammograms and ultrasounds. The high-resolution imaging offered by MRI aids not only in detecting cancer but also in accurately assessing its extent. Figure 2 provides an illustrative example.

Fig. 2 Example of magnetic resonance breast cancer screening
2.3 Ultrasound
Ultrasound images, often known as sonograms, offer a distinct perspective on cancer detection. Unlike mammography, which relies on X-rays, ultrasound does not use ionising radiation. The probe emits high-frequency sound waves, which travel into the body, bounce off organs and tissues, and return as echoes. These echoes are captured and translated into detailed images, providing a safe method of visualising internal structures, as shown in Figure 3.

Fig. 3 Example of ultrasound in the context of cancer diagnosis
2.4 Histopathological images
Histopathological images offer high-resolution microscopic views of cellular structures within tissues, revealing intricate details. In this modality, cancer diagnoses are made at the cellular level, well before tumour formation, making it a proactive and efficient approach. These images are indispensable for researchers and medical professionals, allowing for an accurate diagnosis of disease, particularly for various types of cancer, as illustrated in Figure 4.

Fig. 4 Example of breast cancer histopathology image
3. Image acquisition of histological slides
During the acquisition of histopathological slides, the pathologist takes samples from abnormal areas of the organ and mounts them on glass slides for microscopic examination[1]. The samples are stained with haematoxylin and eosin and then examined under a microscope. The stained slides are then digitised to obtain whole slide images (WSI). With WSI, the pathologist can extract suspicious areas, known as regions of interest (ROI), at various magnifications to classify and diagnose different subtypes of benign and malignant cancers.
4. Stain normalisation
In WSI digitisation, pathologists often encounter colour variations among digitised slides. These variations arise from differences in the colour response of slide scanners, raw materials, and manufacturing techniques. To address this problem, it is essential to apply stain normalisation as a pre-processing step before any analysis of histopathological images. Several studies have focused on techniques to standardise the colours of digitised images and improve the quality of analysis. For example, Reinhard et al.[2] proposed an approach that matches the colour histogram statistics of the source image to those of the destination image. These techniques operate under the assumption that each digitised image contains a certain proportion of stained tissue.
Figure 5 represents the difference between normalised and unnormalised samples of four types of breast cancer histological images.

Fig. 5 Differences in normalised and unnormalised samples of four types of breast cancer histological images
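The statistics-matching idea behind this family of methods can be illustrated with a minimal sketch. It is a simplification and not the implementation of Reinhard et al.[2]: it matches only the per-channel mean and standard deviation, and it assumes the images have already been converted to a suitable decorrelated colour space (the conversion is omitted). The function name `match_channel_statistics` and the random test images are hypothetical.

```python
import numpy as np

def match_channel_statistics(source, target, eps=1e-8):
    """Shift and scale each channel of `source` so that its mean and standard
    deviation match those of `target` (both H x W x 3 float arrays).

    Assumption: both arrays are already in the colour space in which the
    matching should happen; no colour-space conversion is done here.
    """
    source = np.asarray(source, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    src_mean = source.mean(axis=(0, 1))
    src_std = source.std(axis=(0, 1))
    tgt_mean = target.mean(axis=(0, 1))
    tgt_std = target.std(axis=(0, 1))
    # Standardise the source channels, then rescale to the target statistics.
    return (source - src_mean) / (src_std + eps) * tgt_std + tgt_mean

# Synthetic stand-ins for a source slide and a reference (destination) slide.
rng = np.random.default_rng(0)
source_img = rng.uniform(0.2, 0.6, size=(32, 32, 3))
target_img = rng.uniform(0.4, 0.9, size=(32, 32, 3))
normalised = match_channel_statistics(source_img, target_img)
```

After the call, each channel of `normalised` has (up to numerical precision) the same mean and standard deviation as the corresponding channel of `target_img`, while the spatial structure of `source_img` is preserved.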
5. Pathology image segmentation
In this context, the objective of the segmentation task is to assign a class label to each patch or pixel within the image in order to extract objects such as cells. Histopathological image segmentation can be divided into two primary categories: tissue-level segmentation and nuclei-level segmentation. Due to the diverse patterns found in whole slide images (WSI), achieving precise segmentation of nuclei and tissues is a formidable challenge. First, nuclei and tissues vary in size and shape, which requires a segmentation model with robust generalisation capabilities. Second, nuclei and cells often gather in small clusters, and the resulting overlap or contact can lead to under-segmentation. Lastly, in certain malignant conditions, such as moderately and poorly differentiated adenocarcinomas, tissue structures may be significantly distorted, which further complicates an already complex task. Numerous studies have presented deep learning approaches to overcome these challenges in nuclei and tissue segmentation, with the ultimate goal of extracting features from WSI while achieving the highest segmentation performance.
5.1. Tissue-level segmentation
A whole slide image (WSI) covers a tissue area of approximately 15 mm × 15 mm, resulting in images of several gigapixels. Handling such large images poses computational challenges, so it is common practice to first identify regions of interest on the slide and then conduct a more in-depth image analysis. Much of a slide is empty and contains no tissue. Most WSI scanners can identify these empty areas during the processing phase and omit them, thereby reducing scanning time. A method for locating tissue is introduced in[3]. In most published studies on tissue segmentation, this is achieved through supervised pixel-wise classification of small rectangular image regions using colour and texture features[4][5]. However, an unsupervised method has also been proposed[6]. Several studies have employed deep learning methods for the segmentation of various tissue types in WSI[7, 8, 9], and U-Net-based networks are among the most common choices. For example, Saltz et al.[10] used the U-Net network to create lymphocyte maps in H&E images across 13 tumour types from the TCGA dataset. Raza et al.[7] achieved minimal information loss using a dilated network for gland instance segmentation in colon histology images. Lu et al.[11] introduced BrcaSeg, a deep learning-based automatic segmentation technique for WSI processing that uses breast cancer WSI tissues from TCGA and employs a U-Net structure. Wen et al.[12] used a Gabor-based module for tissue segmentation to extract texture information at varying scales and directions. Rijthoven et al.[13] proposed a semantic segmentation model, HookNet, that incorporates contextual information in WSIs using a CNN. Figure 6 presents the general segmentation steps at the tissue level.

Fig. 6 Segmentation steps at the tissue level
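To make the idea of discarding empty slide areas before analysis concrete, the following sketch applies a simple brightness heuristic to a low-resolution RGB thumbnail. It is only an illustration and not one of the methods cited in[3] to[6]: the function names, the threshold of 220, and the toy thumbnail are all assumptions.

```python
import numpy as np

def tissue_mask(rgb, white_threshold=220):
    """Return a boolean mask that is True where at least one colour channel
    is darker than the near-white slide background (8-bit RGB assumed)."""
    rgb = np.asarray(rgb)
    return rgb.min(axis=-1) < white_threshold

def tissue_fraction_per_tile(mask, tile):
    """Fraction of tissue pixels in each non-overlapping tile x tile block.
    Rows and columns that do not fill a complete tile are dropped."""
    h, w = mask.shape
    rows, cols = h // tile, w // tile
    blocks = mask[: rows * tile, : cols * tile].reshape(rows, tile, cols, tile)
    return blocks.mean(axis=(1, 3))

# Toy thumbnail: white background with one stained 4 x 4 block in the corner.
thumbnail = np.full((8, 8, 3), 255, dtype=np.uint8)
thumbnail[:4, :4] = (200, 120, 180)

mask = tissue_mask(thumbnail)
fractions = tissue_fraction_per_tile(mask, tile=4)
keep = fractions > 0.5  # tiles worth passing on to detailed analysis
```

In this toy case only the top-left tile is kept. In practice the threshold would need tuning per scanner and stain, which is one reason learned tissue detectors are preferred.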
5.2. Nuclei-level segmentation
Nuclei-level segmentation, or cellular object segmentation, aids in exploring nuclei features from histopathological imaging and focuses on morphological appearance. Deep learning methods for this type of segmentation can be categorised into two approaches: pixel-wise classification[14] and fully convolutional network (FCN)-based methods[15]. Pixel-wise classification methods transform the segmentation task into a classification task, where the label of each pixel is predicted from the raw pixel values within a square window centred on it[16]. For example, Cireşan et al.[14] used a deep neural network (DNN) as a pixel classifier, with the image intensities in a square window centred on the pixel as input. Xing et al.[17] introduced a CNN model that learnt to generate a probability map for each image, in which each pixel is assigned a probability of belonging to a nucleus; an iterative region merging algorithm was then used to complete the segmentation. In addition, Nesma et al.[18] adopted an optimised pixel-based classification model in conjunction with a region-growing strategy, successfully obtaining nuclear and cytoplasmic segmentation results. In FCN-based methods, the whole HP image is used as input instead of extracted patches. This technique can be more efficient and accurate for nuclei segmentation[19].
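The pixel-wise formulation described above can be sketched as follows: every pixel gets its own square window, and a classifier would then predict that pixel's label from the window. The sketch covers only the window extraction; the helper name `pixel_windows`, the reflect padding at the image border, and the toy image are assumptions, and no classifier from the cited works is reproduced.

```python
import numpy as np

def pixel_windows(image, half_width):
    """Yield ((row, col), window) for every pixel of a 2-D image, where
    `window` is the square of side 2 * half_width + 1 centred on that pixel.
    Border pixels are handled by reflect padding (an assumption)."""
    padded = np.pad(image, half_width, mode="reflect")
    size = 2 * half_width + 1
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            yield (r, c), padded[r : r + size, c : c + size]

# Toy greyscale image; each window would be the input of a pixel classifier.
image = np.arange(25, dtype=np.float64).reshape(5, 5)
windows = dict(pixel_windows(image, half_width=1))
```

Because one window is produced per pixel, neighbouring windows overlap heavily and the same computation is repeated many times, which is the inefficiency that FCN-based methods avoid by processing the whole image in one pass.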
U-net is one of the nuclei segmentation architectures used in addition to FCN. U-Net incorporates skip connections between the down-sampling and up-sampling paths, which help stabilise gradient updates during deep model training. Amirreza et al.[20] presented a U-Net-based model with two sequential stages for segmenting touching cells. In the first stage, U-Net is used to separate cell nuclei from the background, and in the second stage, a regression U-Net is applied to create a distance map for each nucleus, facilitating the final step of cell segmentation. Yang et al.[21] implemented a hybrid network that combines U-Net and regional proposals to separate touching nuclei that are difficult to segment individually. Zhao et al.[22] introduced an architecture, known as Triple U-Net, based on U-Net for nuclei segmentation, eliminating the need for colour normalisation. Schmitz et al.[23] proposed a family of deep fusion for nuclei segmentation, intending to identify and segment nuclei in histopathological images and improve the segmentation task's performance. Figure 7 illustrates the steps of segmentation at the nuclei level.

Fig. 7 Segmentation steps at the nuclei level
6. Cancer diagnosis and prognosis
WSIs are typically very large, approximately 100,000 × 100,000 pixels, which poses a significant challenge for deep learning-based classification and prediction techniques. Feeding an entire WSI to a CNN for training is impractical[24, 25]. Therefore, histopathological imaging-based studies follow two distinct approaches: patch-based and WSI-based.
6.1. Patch-level methods
In a patch-based approach, the pathologist selects a representative region of interest (ROI) from the WSI, and the selected region is decomposed into smaller patches to train the model. Zhu et al.[26] proposed a CNN approach named DeepConvSurv that used patches extracted from WSI. The algorithm achieved better results than the standard Cox proportional hazards model. Vesal et al.[27] applied patch-wise and WSI-wise methods to two CNN architectures, ResNet50 and InceptionV3; the patch-wise technique gave better validation results on both architectures than the WSI-wise technique. Hou et al.[28] combined the patch-wise technique with expectation-maximisation (EM). A typical deep learning method requires a substantial dataset and a long time to train a robust model. Since histopathological datasets are scarce and expensive to create, the majority of patch-based methods employ transfer learning (TL), which transfers and refines the knowledge of a model pre-trained on a large dataset. An accurate model can thus be built with fewer images and less training time.
Xu et al.[29] achieved region-level classification results using CNN activation features. First, each selected region from the WSI was segmented into a group of patches. They also proposed a pre-trained CNN architecture based on transfer learning techniques with the ImageNet dataset and, for the final classification, adopted an SVM classifier. Similarly, Källén et al.[30] divided a WSI into multiple patches to extract the characteristics of each one using a pre-trained OverFeat network and used a Random Forest (RF) to classify the subtypes in prostatic adenocarcinoma. Mercan et al.[31] used a pre-trained VGG-16 network on pre-selected patches to extract features and use them to classify the WSI through average pooling.
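The scale of the patch decomposition can be illustrated with a short sketch that enumerates the top-left corners of patches on a regular grid. The patch size of 224 pixels and the function name `patch_grid` are assumptions chosen for illustration; the cited studies use their own patch sizes and sampling strategies.

```python
def patch_grid(width, height, patch, stride=None):
    """Return the (x, y) top-left corners of all patches of side `patch`
    that fit entirely inside a width x height image.
    With stride == patch (the default) the patches do not overlap."""
    stride = stride or patch
    return [
        (x, y)
        for y in range(0, height - patch + 1, stride)
        for x in range(0, width - patch + 1, stride)
    ]

# A 100,000 x 100,000 pixel slide cut into non-overlapping 224 x 224 patches.
corners = patch_grid(100_000, 100_000, patch=224)
patches_per_side = 100_000 // 224  # 446
```

Under these assumptions a single slide yields 446 × 446 = 198,916 patches, which shows why methods select ROIs or sample patches rather than processing every location.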
6.2. WSI-level methods
The patch-based prediction approaches mentioned above still have some shortcomings. Most of these methods assume that every patch selected from a WSI shares the diagnosis of the slide, although patch labels are not necessarily the same as the whole-slide label because of heterogeneous tissue patterns[32]. Patch-based approaches also require a large number of patch annotations, which is very demanding for pathologists[33]. To bypass these challenges, several studies rely only on annotation at the WSI level[32, 34, 35]. Multi-instance learning (MIL) is one of the most effective tools for this setting. Shao et al.[34] added a ranking-based regularisation to the Cox model to account for the ordinal nature of survival and then aggregated instance predictions into whole-slide predictions using average pooling. Similarly, Yao et al.[36] proposed an attention-guided deep multiple instance learning network (DeepAttnMISL) for WSI survival prediction, which aggregates instance-level information adaptively. Moreover, Chikontwe et al.[37] introduced a novel MIL architecture for histopathology slide classification. It supports both bag-based and instance-based learning and uses a centre loss to minimise intra-class distances in the embedding space. Furthermore, Wang et al.[32] extracted spatial contextual characteristics from each patch individually and then computed a global, holistic region descriptor from the characteristics of several instances for WSI-based classification.
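The difference between average pooling and attention-based aggregation in MIL can be sketched in a few lines. This is a deliberately simplified linear attention and not the architecture of DeepAttnMISL or of any other cited work; the function names, the scoring vector, and the toy embeddings are hypothetical.

```python
import numpy as np

def mean_pool(instance_embeddings):
    """Bag (slide) embedding as the plain average of all instance (patch)
    embeddings; every patch contributes equally."""
    return instance_embeddings.mean(axis=0)

def attention_pool(instance_embeddings, scoring_vector):
    """Bag embedding as a softmax-weighted average of instance embeddings.
    `scoring_vector` stands in for learned attention parameters."""
    scores = instance_embeddings @ scoring_vector
    scores = scores - scores.max()  # numerical stability of the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ instance_embeddings, weights

# Three patch embeddings (rows) from one slide, two features each.
embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

slide_mean = mean_pool(embeddings)
slide_attn, attn_weights = attention_pool(embeddings, np.array([2.0, 0.0]))
uniform_attn, uniform_weights = attention_pool(embeddings, np.zeros(2))
```

With a zero scoring vector all weights are equal and attention pooling reduces to average pooling; with a non-zero vector, patches that score higher dominate the slide-level representation, which is how an attention-based model can focus on diagnostically relevant regions using only slide-level labels.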
7. Evaluation metrics
Every model produced by the image pre-processing, training, and validation steps should be evaluated on its performance. Several metrics are calculated for this purpose using test images. Each metric is defined in terms of the confusion matrix presented in Table 1, whose rows correspond to predicted class labels and whose columns correspond to actual classes. A prediction is a true positive or true negative when the classification is correct and a false positive or false negative when it is incorrect. The most commonly used evaluation measures for cancer classification and segmentation are accuracy, sensitivity, specificity, precision, F-measure, the area under the ROC curve (AUC or AUROC), and Intersection-over-Union (IoU). These metrics are defined in Table 2.
Table 1 Confusion matrix
| | Actual Positive (1) | Actual Negative (0) |
|---|---|---|
| Predicted Positive (1) | True Positives (TP) | False Positives (FP) |
| Predicted Negative (0) | False Negatives (FN) | True Negatives (TN) |
Table 2 Performance metrics
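The standard confusion-matrix definitions of the count-based measures listed above can be written out as a short sketch. The function name `classification_metrics` and the example counts are illustrative assumptions, and the sketch assumes every denominator is non-zero. AUC is omitted because it requires ranked prediction scores rather than the four counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from the four confusion-matrix counts.
    Assumes all denominators are non-zero."""
    sensitivity = tp / (tp + fn)  # also called recall or true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        # Harmonic mean of precision and sensitivity (F1 score).
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        # Intersection-over-Union (Jaccard index) of the positive class.
        "iou": tp / (tp + fp + fn),
    }

# Hypothetical counts: 40 TP, 10 FP, 5 FN, 45 TN out of 100 test samples.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
```

For a binary segmentation mask, the F1 score computed this way over pixels coincides with the Dice coefficient, 2TP / (2TP + FP + FN), which is why the two are often reported interchangeably in the segmentation studies reviewed below.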
8. Literature review
The field of detection and classification of different cancers through histopathological imaging has been the subject of several published studies.
For example, Kwok et al.[38] proposed an approach based on Inception-ResNet-v2 to classify early-stage histopathological breast cancer images from the ICIAR2018 Grand Challenge on Breast Cancer Histology Images dataset into four subtypes: normal tissue, benign lesion, in situ carcinoma, and invasive carcinoma. The method achieved an accuracy of 87%. The article did not mention any limitations of the proposed technique. Alom et al.[39] proposed a recurrent residual U-Net (R2U-Net) for nuclei segmentation in medical images. Their experiments used a publicly available dataset from the 2018 Data Science Bowl Grand Challenge, on which the model segmented nuclei images with an accuracy of 92.15%. Want et al.[40] classified liver microscopic images into three subtypes, normal, granuloma-fibrosis1, and granuloma-fibrosis2, using convolutional neural networks (CNN) and two machine learning techniques: Support Vector Machine (SVM) and Random Forest (RF). On a limited mouse liver dataset, the proposed CNN-based classifiers achieved an accuracy of 82.78% in distinguishing between the three image types. The dataset contains only 30 mouse livers, which can limit the generalisability of the results. Vuola et al.[41] compared two widely used segmentation frameworks, U-Net and Mask-RCNN, for nuclei segmentation and combined them into a single model that harnesses the strengths of both, resulting in enhanced performance. The model was trained on the Kaggle 2018 Data Science Bowl dataset and validated using four-fold cross-validation. It achieved an overall IoU result of 0.7, with values of 52.30%, 65.90%, and 72.50% reported among the mAP, Dice, precision, and recall metrics.
Pan et al.[42] proposed a novel approach for semantic nuclei segmentation, an extension of U-Net named AS-Unet. This approach is designed to handle both small and large cells by extracting multi-scale features. The model was trained on two datasets: a multi-organ H&E-stained pathological image dataset (MOD) and a breast cancer image dataset (BNS). On the MOD dataset, the approach achieved an accuracy, F1 score, and IoU of 92.82%, 87.35%, and 77.72%, respectively. On the BNS dataset, it achieved 96.86%, 86.97%, and 77.31% for accuracy, F1 score, and IoU, respectively. This method demonstrated the best performance on both nuclei segmentation datasets. Zeng et al.[43] developed a novel architecture, RIC-Unet, based on U-Net to achieve more accurate nuclei segmentation using The Cancer Genome Atlas (TCGA) dataset. This method achieved Dice, F1 score, and aggregated Jaccard index values of 80.08%, 82.78%, and 56.35%, respectively. Through contour prediction, the model addresses the separation of overlapping and touching cells, which is a common challenge in nuclei segmentation. Despite efforts to mitigate overfitting in the model's development, a deeper network remains essential for learning more intricate features. Mahbod et al.[44] developed a two-stage U-Net algorithm to segment nuclei in H&E-stained tissue images. The architecture comprised a convolutional neural network (CNN) and a U-Net, both using distinct activation functions and a weighted loss function to detect and localise the nuclei in the images. The two-stage U-Net achieved an average intersection over union (IoU) of 56.87% in segmenting nuclei in H&E-stained tissue images. The approach compares favourably with several state-of-the-art algorithms applied to the same dataset.
Toğaçar et al.[45] developed a novel deep learning model based on a convolutional neural network (CNN) with a residual architecture to improve the effectiveness of breast cancer classification. This model, named BreastNet, outperformed the AlexNet, VGG-16, and VGG-19 architectures, as well as traditional machine learning methods, achieving an accuracy of 98.80% on the BreaKHis dataset. The article did not mention any limitations of the proposed technique. Celik et al.[46] used two pre-trained CNN models, ResNet-50 and DenseNet-161, to extract features for distinguishing between breast cancer subtypes in the BreaKHis breast cancer dataset at various magnification levels. The DenseNet-161 model achieved an accuracy and F-score of 91.57% and 92.38%, respectively, while ResNet-50 achieved 90.96% and 94.11%. Gour et al.[47] developed an automated approach based on ResHist, a convolutional neural network, to classify histopathological images of breast cancer into two major subtypes: benign and malignant. They designed a data augmentation technique based on stain normalisation and generated patches from histopathological images of the BreaKHis dataset. The proposed approach outperformed some existing approaches, including AlexNet, VGG16, VGG19, GoogleNet, Inception-v3, and ResNet50. It achieved an accuracy of 84.34% and an F1 score of 90.49% in the classification of histopathological images, and a precision of 92.52% and an F1 score of 93.45% when using patches and data augmentation. As future work, the authors intend to validate the ResHist approach on a more extensive breast cancer dataset and to explore its use for diagnosing other types of cancer, such as lung, colon, and prostate cancer.
Murtaza et al.[48] proposed a classification model known as the Biopsy Microscopic Image Cancer Network (BMIC_Net) to differentiate between eight distinct subtypes of breast cancer using the BreaKHis dataset. They employed feature reduction techniques to extract the most discriminative feature subset. The proposed model outperformed existing standard models, achieving an accuracy of 95.48% for the first-level classifier (the BC1 classifier, which categorises images into two classes, benign and malignant). It achieved an accuracy of 94.62% for the second-level classifiers, where the B2 classifier predicts four subtypes of benign BC (adenosis (A), fibroadenoma (F), tubular adenoma (TA), and phyllodes tumour (PT)) and the M2 classifier assigns images to four subtypes of malignant BC (ductal carcinoma (DC), lobular carcinoma (LC), mucinous carcinoma (MC), and papillary carcinoma (PC)). At each classification level, six traditional ML algorithms were compared: k-nearest neighbours (kNN) with k=1, SVM, naive Bayes (NB), decision tree (DT), linear discriminant analysis (LDA), and logistic regression (LR). The kNN algorithm performed best among them. Furthermore, the hierarchical classification model proved superior to the one-level classification model. As future work, the authors envisage a classification model that categorises BC using any modality type. Hameed et al.[49] used four distinct models based on pre-trained VGG16 and VGG19 architectures: fully trained VGG16, fine-tuned VGG16, fully trained VGG19, and fine-tuned VGG19. These models were trained using 5-fold cross-validation on a private dataset of whole slide images (WSI) to classify breast cancer histopathology images into two classes: non-carcinoma and carcinoma.
The fine-tuned VGG16 and fine-tuned VGG19 models outperformed the other methods, achieving an accuracy, sensitivity, and F1 score of 95.29%, 97.73%, and 95.29%, respectively. The authors compared their work with state-of-the-art studies using the BreaKHis dataset and trained the model on a smaller dataset containing only two-class images, which is a limitation with respect to multi-class classification. Mewada et al.[50] used two datasets, the 2015 BreaKHis dataset and the Breast Cancer Classification Challenge dataset. They combined spectral characteristics obtained through a multi-resolution wavelet transform with spatial characteristics extracted using a novel CNN architecture to classify histopathological images of breast cancer. This integrated approach resulted in an improved average classification precision of 97.58% and 97.45% for the two datasets and an AUC ranging from 99.20% to 99.49% across magnification factors in the BreaKHis dataset, surpassing AlexNet and VGG16. The CNN architecture used in this work cannot handle high-resolution images, which is a limitation of this study. Kong et al.[51] introduced a novel approach to segmenting nuclei in histopathological images using two-stage stacked U-Nets with attention mechanisms. The method is based on deep learning and combines convolutional neural networks (CNN) and U-Net for segmentation. The two-stage stacked U-Net architecture (SUNets) allowed accurate segmentation of nuclei in images with high variability in shape, size, and staining intensity. The method was evaluated on several publicly available datasets, including The Cancer Genome Atlas (TCGA) and Triple Negative Breast Cancer (TNBC).
The results showed that the proposed method outperformed other state-of-the-art methods on various evaluation metrics, with mean aggregated Jaccard index (AJI) values of 59.65% and 62.10% and F1 scores of 82.47% and 80.60% on TCGA and TNBC, respectively. Chen et al.[52] proposed a method for segmenting nuclei in medical images using boundary-assisted region proposal networks (BA-RPN) that outperformed several existing methods on three public nucleus segmentation datasets. Specifically, the method achieved Dice similarity coefficient (DSC) values of 86.7% on the MoNuSeg dataset, 85.5% on the TNBC dataset, and 89.8% on the PanNuke dataset. Ohata et al.[53] used automatic techniques to categorise eight subtypes of colorectal cancer. Convolutional neural network (CNN) structures were used to extract image features, which were then fed into Naive Bayes, Multilayer Perceptron, k-Nearest Neighbours, Random Forest, and Support Vector Machine (SVM) classifiers. Among these combinations of CNN feature extraction and machine learning classification, the best performance was achieved using DenseNet169 with the SVM (Radial Basis Function) classifier, yielding a precision of 92.08% and an F-score of 92.12%. The dataset came from the University Medical Centre Mannheim and may not contain all variations. For future research, the authors aim to investigate combinations of various CNNs and traditional feature extraction methods to improve these metrics. Vahadane et al.[54] introduced a deep learning approach called dual encoder Attention U-net (DEAU) for the nuclei segmentation task. They incorporated the convolutional blur attention (CBA) network to mitigate noise generation and PyramidBlur Pooling (PBP) to handle various information scales.
The model was trained and evaluated using two datasets, the 2018 Data Science Bowl Challenge dataset (DSB) and the multi-organ nucleus segmentation dataset (MoNuSeg). It achieved an overall IoU, F1, recall, and precision of 84.29%, 92.82%, 89.89%, and 95.96%, respectively, on the DSB dataset, and 79.85%, 82.47%, 81.25%, and 84.29% on the MoNuSeg dataset. Tran et al.[55] proposed a new model based on the U-Net architecture, named TMD-Unet, to address U-Net's inability to fully exploit the output features of the convolutional units in each node. The model was trained and validated on seven datasets covering colonoscopy, electron microscopy (EM), dermoscopy, computed tomography (CT), and magnetic resonance imaging (MRI). It achieved Dice scores of 92.49% for nuclei segmentation, 96.43% for liver segmentation, 95.51% for spleen segmentation, 92.65% for polyp segmentation, 94.11% for EM segmentation, 91.81% for left atrium segmentation, and 87.27% for skin lesion segmentation. He et al.[56] addressed the challenge of segmenting overlapping nuclei by introducing a hybrid attention nested UNet, called HanNet. This approach uses discriminative features to segment the boundaries of diverse and small nuclei and was evaluated on a publicly available multi-organ nuclear segmentation dataset. The model achieved an F1 score and Dice coefficient of 88.75% and 80.21%, respectively, on the first test set (eight images) and 88.02% and 81.85%, respectively, on the second test set (six images). Dabass et al.[57] proposed a deep learning-based approach for automated gland segmentation in colon histopathology images using an attention-guided deep atrous-residual U-Net architecture.
The proposed approach achieved a mean intersection over union (IoU) of 84.8% and a mean Dice similarity coefficient (DSC) of 91.5%, indicating accurate gland segmentation. The approach was trained and validated on the publicly available Gland Segmentation (GlaS) dataset. Lal et al.[58] proposed a robust deep learning architecture for nuclei segmentation from liver cancer histopathology images (KMC liver dataset); the model achieved an F1 score of 81.36%. Le et al.[59] focused on nucleus segmentation within cell microscope images from the Data Science Bowl 2018 dataset and proposed a novel architecture known as Double ResPath Unet (DR-Unet). The authors identified limitations in previous models, particularly ResUnet++, which exhibited a semantic gap between features directly connecting the encoder and decoder, thus impeding information extraction across various regions. DR-Unet uses double ResPath (DR) to enhance the capture of contextual information through Progressive Atrous Spatial Pyramidal Pooling (PASPP). The experimental results indicate that DR-Unet surpasses ResUnet, DoubleUnet, and other benchmark models in nuclei segmentation. Ali et al.[60] presented MSAL-Net, a new deep learning architecture for the accurate segmentation of nuclei in histopathological images. MSAL-Net integrates feature-level and spatial-level attention mechanisms to capture local and global context information. The proposed method was evaluated on two publicly available datasets, TNBC and CRC. The method achieved a DSC of 91% and a mean IoU of 84.4% on the TNBC dataset, as well as a DSC of 87.7% and a mean IoU of 81.7% on the CRC dataset. Ilyas et al.[61] presented TSFD-Net, a deep learning approach for nuclei segmentation and classification in histological images. 
The model was evaluated on several publicly available datasets, achieving high segmentation accuracy with an average Dice coefficient of 86.6% and a classification accuracy of 93.8%. TSFD-Net also showed improved generalisation to unseen tissue types and pathological conditions. Dabass et al.[62] developed a multitasking U-net model with hybrid convolutional learning and attention modules for cancer classification and gland segmentation in colon histopathological images. It achieved an F1 score of 93.3% and an object-dice index of 93.5% for the gland detection and segmentation task. Kiran et al.[63] proposed a Unet-based architecture called DenseRes-Unet. They used residual connections with Atrous blocks instead of a conventional skip connection to reduce the semantic gap between the encoder and decoder paths. The proposed approach was trained and evaluated on the MoNuSeg dataset, achieving an accuracy of 89.77%, an F1-score of 90.36%, and an Aggregated Jaccard Index (AJI) of 78.61%. Tran et al.[64] proposed Trans2Unet, a new two-branch architecture for nuclei segmentation in histopathological image analysis. As the main challenge, they highlight the existence of overlapping areas, which makes separating independent nuclei more complicated. Trans2Unet combines the Unet and TransUnet networks, with the Unet branch enabling the network to combine features from different spatial regions of the input image and localise regions of interest more precisely. The second branch, TransUnet, uses a Vision Transformer (ViT) to enhance image details by recovering localised spatial information. They also propose infusing TransUnet with a computationally efficient module, the "Waterfall" Atrous Spatial Pooling with Skip Connection (WASP-KC), to boost Trans2Unet efficiency and performance. Experimental results on the 2018 Data Science Bowl benchmark demonstrate the effectiveness and performance of the proposed architecture compared to previous segmentation methods.
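The overlap metrics quoted throughout this review (IoU, Dice, precision, recall, and F1) can all be derived from pixel-level counts of true positives, false positives, and false negatives. The following is a minimal pure-Python sketch, assuming that the prediction and the ground truth are binary masks of equal size given as nested lists. The function name `overlap_metrics` and the toy masks are illustrative and are not taken from any of the cited studies. The reviewed papers may compute these scores per object or average them per image, so values obtained this way are not directly comparable with the reported ones.

```python
def overlap_metrics(pred, truth):
    """Pixel-wise overlap metrics for two binary masks (nested lists of 0/1)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # predicted nucleus, really a nucleus
            elif p and not t:
                fp += 1          # predicted nucleus, actually background
            elif t and not p:
                fn += 1          # missed nucleus pixel

    def ratio(num, den):
        # Return 0.0 for an empty denominator instead of raising.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "iou": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy example (hypothetical masks, not data from any dataset).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
scores = overlap_metrics(pred, truth)
print(scores)
```

For a single pair of binary masks, the pixel-wise Dice and F1 scores coincide, and Dice = 2·IoU / (1 + IoU). This identity does not generally hold for scores that have been averaged over several images, which is one reason why a reported mean Dice and mean IoU need not satisfy it exactly.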
Table 3 summarises the techniques to detect and classify cancer in the studies mentioned in this work.
Table 3 Review of the literature on cancer histology image classification
| Authors | Pub. Year | Dataset | Architecture Proposed | Type of Cancer | Accuracy (%) | Specificity (%) | F-score (%) | Sensitivity (%) | Dice (%) | IoU (%) | Precision (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Kwok et al.[38] | 2018 | ICIAR2018 Grand Challenge on Breast Cancer Histology Images dataset | Inception-ResNet-v2 | Breast Cancer Subtypes | 87.00 | -- | -- | -- | -- | -- | -- |
| Alom et al.[39] | 2018 | Data Science Bowl Grand Challenge 2018 dataset | Recurrent Residual U-Net (R2U-Net) | Nuclei Segmentation | 92.15 | -- | -- | -- | -- | -- | -- |
| Wang et al.[40] | 2019 | Mice liver microscopic images dataset | Convolutional Neural Networks (CNNs) | Liver Cancer | 82.78 | -- | -- | -- | -- | -- | -- |
| Vuola et al.[41] | 2019 | Kaggle 2018 Data Science Bowl dataset | Combined U-Net and Mask-RCNN | Nuclei Segmentation | -- | -- | -- | 72.50 | 52.30 | -- | 65.90 |
| Pan et al.[42] | 2019 | MOD datasets | AS-Unet | Nuclei Segmentation | 92.82 | -- | 87.35 | -- | -- | 77.72 | -- |
| Pan et al.[42] | 2019 | BNS datasets | AS-Unet | Nuclei Segmentation | 96.86 | -- | 86.97 | -- | -- | 77.31 | -- |
| Zeng et al.[43] | 2019 | The Cancer Genome Atlas (TCGA) dataset | RIC-Unet | -- | -- | -- | 82.78 | -- | 88.75 | -- | -- |
| Mahbod et al.[44] | 2019 | H&E-stained tissue images | Two-stage U-Net Algorithm | -- | -- | -- | -- | -- | -- | 56.87 | -- |
| Togaçar et al.[45] | 2020 | BreaKHis dataset | BreastNet (CNN) | Breast Cancer | 98.80 | -- | -- | -- | -- | -- | -- |
| Celik et al.[46] | 2020 | BreaKHis dataset | ResNet-50 | Breast Cancer Subtypes | 91.57 | -- | 92.38 | -- | -- | -- | -- |
| Celik et al.[46] | 2020 | BreaKHis dataset | DenseNet-161 | Breast Cancer Subtypes | 90.96 | -- | 94.11 | -- | -- | -- | -- |
| Gour et al.[47] | 2020 | BreaKHis dataset | ResHist (CNN) | Breast Cancer Subtypes | 84.34 | -- | 90.49 | -- | -- | -- | -- |
| Murtaza et al.[48] | 2020 | BreaKHis dataset | BMIC_Net first level classifier | Breast Cancer Subtypes | 95.48 | -- | -- | -- | -- | -- | -- |
| Murtaza et al.[48] | 2020 | BreaKHis dataset | BMIC_Net second level classifier | Breast Cancer Subtypes | 94.62 | -- | -- | -- | -- | -- | -- |
| Hameed et al.[49] | 2020 | Private Dataset | Fine-tuned VGG16 and Fine-tuned VGG19 | Breast Cancer Histopathology | 95.29 | -- | 95.29 | 97.73 | -- | -- | -- |
| Mewada et al.[50] | 2020 | BreaKHis Dataset | Multi-Resolution Wavelet Transform and CNN | Breast Cancer | 97.58 | -- | -- | -- | -- | -- | -- |
| Mewada et al.[50] | 2020 | Breast Cancer Classification Challenge 2015 dataset | Multi-Resolution Wavelet Transform and CNN | Breast Cancer | 97.58 | -- | -- | -- | -- | -- | -- |
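| Kong et al.[51] | 2020 | TCGA dataset | Two-stage Stacked U-nets with Attention Mechanisms (SUNets) | Nuclei Segmentation | -- | -- | 82.47 | -- | -- | -- | -- |
| Kong et al.[51] | 2020 | TNBC dataset | Two-stage Stacked U-nets with Attention Mechanisms (SUNets) | Nuclei Segmentation | -- | -- | 80.60 | -- | -- | -- | -- |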
| Chen et al.[52] | 2020 | MoNuSeg dataset | Boundary-Assisted Region Proposal Networks (BA-RPN) | Nuclei Segmentation | -- | -- | -- | -- | 86.70 | -- | -- |
| Chen et al.[52] | 2020 | TNBC dataset | Boundary-Assisted Region Proposal Networks (BA-RPN) | Nuclei Segmentation | -- | -- | -- | -- | 85.50 | -- | -- |
| Chen et al.[52] | 2020 | PanNuke dataset | Boundary-Assisted Region Proposal Networks (BA-RPN) | Nuclei Segmentation | -- | -- | -- | -- | 89.80 | -- | -- |
| Ohata et al.[53] | 2021 | University Medical Centre Mannheim dataset | DenseNet169 and SVM | Colorectal Cancer Subtypes | 92.08 | -- | 92.12 | -- | -- | -- | -- |
| Vahadane et al.[54] | 2021 | 2018 Data Science Bowl challenge dataset (DSB) | DEAU with CBA and PBP | Nuclei Segmentation | -- | -- | 92.82 | 89.89 | -- | 84.29 | 95.96 |
| Vahadane et al.[54] | 2021 | Multi-organ nucleus segmentation dataset (MoNuSeg) | DEAU with CBA and PBP | Nuclei Segmentation | -- | -- | 82.47 | 81.25 | -- | 79.85 | 84.29 |
| Tran et al.[55] | 2021 | HP images datasets | TMD-Unet | -- | -- | -- | -- | -- | 92.49 | -- | -- |
| Lal et al.[58] | 2021 | KMC liver dataset | Robust Deep Learning Architecture | Nuclei Segmentation | -- | -- | 81.36 | -- | -- | -- | -- |
| He et al.[56] | 2021 | MoNuSeg dataset | Hybrid-attention Nested UNet (HanNet) | Nuclei Segmentation | -- | -- | 88.75 | -- | 80.21 | -- | -- |
| Dabass et al.[57] | 2021 | GlaS dataset | Deep Atrous-Residual U-Net Architecture | Gland Segmentation | -- | -- | -- | -- | 91.60 | 84.8 | -- |
| Ali et al.[60] | 2022 | TNBC dataset | MSAL-Net | -- | -- | -- | -- | -- | 91.00 | 84.40 | -- |
| Ali et al.[60] | 2022 | CRC dataset | MSAL-Net | -- | -- | -- | -- | -- | 87.70 | 81.07 | -- |
| Ilyas et al.[61] | 2022 | Multiple Datasets | TSFD-Net | Nuclei Segmentation and Classification | -- | -- | -- | -- | 86.60 | -- | -- |
| Dabass et al.[62] | 2022 | GlaS dataset | Multi-tasking U-Net Model | Cancer Classification and Gland Segmentation | -- | -- | 93.3 | -- | 93.5 | -- | -- |
| Kiran et al.[63] | 2022 | MoNuSeg dataset | DenseRes-Unet | Nuclei Segmentation | 89.77 | -- | 90.36 | -- | -- | 78.61 (AJI) | -- |
9. Datasets
This section provides an in-depth analysis of public datasets used in various studies for cancer classification.
The Cancer Genome Atlas (TCGA) is a public dataset used to detect and classify various types of cancer from pathology and radiology images. It was launched in 2006 by the National Cancer Institute (NCI) and the National Human Genome Research Institute of the United States. This dataset comprises more than 30,000 images in two formats, SVS and DICOM, covering 30 types of cancer from 11,000 patients[65].
BreaKHis is a publicly available dataset used for the detection and classification of breast cancer histological images. It was created in 2014 by the P&D Laboratory and the Department of Pathological Anatomy and Cytopathology in Brazil. The latest version of the dataset consists of 7,909 histopathological biopsy images in PNG format, each with dimensions of 700 × 460 pixels and 3 RGB channels. These images are available at magnifications of 40×, 100×, 200×, and 400×. Of the total images, 2,480 are benign (normal) cases, while 5,429 are malignant (abnormal) cases[66]. The dataset is derived from 82 patients, and the abnormal class is further categorised into four subcategories based on size and appearance.
The University Medical Centre Mannheim dataset is a public dataset comprising 5,000 images in TIFF format, each with a size of 150 × 150 pixels. It is mainly used for the detection and classification of eight types of colorectal cancer. The dataset was generated by the Institute of Pathology at the University Medical Centre Mannheim, Heidelberg University, Germany, in 2016. The images were digitised using an Aperio ScanScope (Aperio/Leica Biosystems) at a magnification of 20×.
ImageNet is a publicly available database containing more than 14 million images, with an average image resolution of 469 × 387 pixels. It is generally used for classification and detection tasks in various fields. The dataset is organised according to the WordNet hierarchy, with the most recent version updated in 2021.
The mice liver microscopic image dataset is a small private dataset used to detect and classify liver cirrhosis. It contains 30 images, with 10 showing normal cases and 20 showing abnormal cases, the latter divided into two classes: granuloma-fibrosis1 and granuloma-fibrosis2. These images have a resolution of 1536 × 2048 pixels.
The 2018 Data Science Bowl competition dataset contains 670 segmented nucleus images with a size of 256 × 256 pixels. Images were acquired under various conditions and vary in cell type, magnification, and imaging modality (bright field versus fluorescence). The dataset was designed for the 2018 Data Science Bowl competition[67]. Each image is identified by an image ID. The dataset folder contains two subfolders: an 'images' folder for the image files and a 'masks' folder for the masks corresponding to the image IDs from the 'images' folder.
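The folder layout described for this dataset lends itself to a simple pairing step before training. The sketch below is a minimal illustration, assuming that each mask file shares its file stem (the image ID) with the corresponding image file; this naming convention, the helper name `pair_images_with_masks`, and the example path are assumptions made for illustration and are not specified by the dataset description in this review.

```python
from pathlib import Path


def pair_images_with_masks(dataset_root):
    """Map each image ID to its (image_path, mask_path) pair.

    Assumed layout: <root>/images/<id>.<ext> and <root>/masks/<id>.<ext>.
    Matching by file stem is an assumption, not a documented convention.
    """
    root = Path(dataset_root)
    masks = {p.stem: p for p in (root / "masks").iterdir() if p.is_file()}
    pairs = {}
    for image_path in sorted((root / "images").iterdir()):
        if not image_path.is_file():
            continue
        mask_path = masks.get(image_path.stem)
        if mask_path is not None:  # images without a mask are skipped
            pairs[image_path.stem] = (image_path, mask_path)
    return pairs


# Hypothetical usage (the path is a placeholder):
# pairs = pair_images_with_masks("data/dsb2018")
# for image_id, (image_path, mask_path) in pairs.items():
#     print(image_id, image_path.name, mask_path.name)
```

Skipping unmatched images silently is a design choice of this sketch; a stricter pipeline might raise an error instead so that missing annotations are noticed early.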
The breast cancer histopathology image dataset (BNS)[68] contains 33 H&E-stained histopathology breast cancer images, each with a size of 512 × 512 pixels, along with their associated ground truth. In the ground truth image or mask, each pixel value above 0 is considered as the label for the corresponding nucleus. These images were collected during the diagnosis of breast cancer in 7 patients, selected at random from an unpublished study on triple negative breast cancer (TNBC). The 512 × 512 samples are randomly cropped from the whole slide images, and 3 to 7 samples are chosen from each slide to ensure the diversity of the dataset. Each nucleus in the mask has been fully annotated using the ITK-SNAP software.
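The mask convention described for BNS (0 for background, any value above 0 marking a nucleus) can be turned into a binary segmentation target and a nucleus count in a few lines. The following pure-Python sketch assumes the mask is available as a nested list of non-negative integers and that every nucleus carries its own distinct positive label; if a mask stores all nuclei under a single value, the count returned here would be 1. The function name `summarise_label_mask` and the toy mask are hypothetical.

```python
def summarise_label_mask(mask):
    """Return (binary_mask, nucleus_count) for a label mask.

    `mask` is a nested list of non-negative integers: 0 is background and
    each distinct positive value is treated as the label of one nucleus.
    """
    binary = [[1 if value > 0 else 0 for value in row] for row in mask]
    labels = {value for row in mask for value in row if value > 0}
    return binary, len(labels)


# Toy label mask with three nuclei (labels 1, 2, and 3).
toy_mask = [[0, 1, 1, 0],
            [0, 0, 0, 2],
            [3, 0, 0, 2]]
binary_mask, nucleus_count = summarise_label_mask(toy_mask)
print(binary_mask)
print(nucleus_count)
```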
The PanNuke dataset[69][70] consists of 7,901 H&E-stained image patches derived from more than 20,000 whole slide images (WSI) representing 19 different tissue types. Within this dataset, nuclei are classified into five clinically significant groups: neoplastic, inflammatory, connective, dead, and epithelial cells. Each nucleus is labelled with an instance segmentation mask. The dataset provides annotations at the patch level, with each patch measuring 256 × 256 pixels at 40× resolution. Patches originally scanned at 20× were resized to 40× for consistency.
The multi-organ nucleus segmentation dataset (MoNuSeg)[71] comprises 30 H&E-stained histopathology images with a size of 1000 × 1000 pixels. These images, acquired at a magnification of 40× from various hospitals, offer a diverse range of nuclear appearances. MoNuSeg is a widely used dataset, featuring images of seven distinct organs (breast, stomach, liver, prostate, kidney, colon, and bladder) sourced from 30 different patients. Within this dataset, the images of the liver, breast, prostate, and kidney are designated for training and validation, while the images of the stomach, colon, and bladder are allocated for testing. In total, the dataset comprises 12 training images, 4 validation images, and 14 test images.
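The organ-based partition described in this paragraph can be expressed as a small lookup. The sketch below is only an illustration of that description: it assumes each image comes with an organ name as metadata, and the image IDs, the record format, and the function name `split_by_organ` are hypothetical. It does not reproduce the official split files of the dataset.

```python
# Organ groups as described in the text of this section.
TRAIN_VAL_ORGANS = {"liver", "breast", "prostate", "kidney"}
TEST_ORGANS = {"stomach", "colon", "bladder"}


def split_by_organ(records):
    """Partition (image_id, organ) records into train/validation and test IDs."""
    split = {"train_val": [], "test": []}
    for image_id, organ in records:
        organ_key = organ.strip().lower()
        if organ_key in TRAIN_VAL_ORGANS:
            split["train_val"].append(image_id)
        elif organ_key in TEST_ORGANS:
            split["test"].append(image_id)
        else:
            raise ValueError(f"Unexpected organ for {image_id}: {organ!r}")
    return split


# Hypothetical records; real image IDs and organ labels come from the dataset.
records = [("img01", "Liver"), ("img02", "Colon"), ("img03", "kidney")]
split = split_by_organ(records)
print(split)
```

Splitting by organ rather than at random means the test images come from tissue types the model has never seen during training, which probes generalisation rather than memorisation of organ-specific appearance.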
The multi-organ H&E-stained pathological image dataset (MOD) contains 30 images of seven distinct organs: breast, liver, kidney, prostate, bladder, colon, and stomach. These images are high-resolution, each measuring 1000 × 1000 pixels, and collectively contain approximately 21,000 nuclei annotated by expert pathologists.
The Triple Negative Breast Cancer (TNBC) dataset[71] contains 50 high-resolution images captured at a magnification of 40× from 11 patients. These images show the diversity of tissue types within the breast, each measuring 512 × 512 pixels. The dataset encompasses a total of 4,022 manually annotated nuclei; each image contains at least 5 nuclei, with an average of 80 nuclei per image. The dataset contains images with diverse cell densities, ranging from sparse nuclei in adipose tissue to densely packed regions in invasive breast carcinoma.
Table 4 provides a detailed description of the public and private datasets used for cancer classification using histopathological images, including the Cancer Genome Atlas (TCGA), BreaKHis, ImageNet, and other datasets.
Table 4 Review of the datasets used
Table 5 reviews the datasets used by the studies cited in this article.
Table 5 Description and details of the datasets used
| Dataset Name | Number of Images | Normal | Abnormal | Availability | Resolution | Image format | Use for task |
|---|---|---|---|---|---|---|---|
| The Cancer Genome Atlas (TCGA) Pathology | More than 30,000 | -- | -- | Public | -- | SVS and DICOM | Detection and classification of different types of cancer from pathology and radiology images |
| BreaKHis | 7,909 | 2,480 | 5,429 | Public | 700 × 460 pixels, 3-channel RGB, 8-bit depth in each channel | PNG | Detection and classification of breast cancer histology images |
| Dataset of the University Medical Centre Mannheim[72] | 5,000 | -- | -- | Public | 150 × 150 pixels | TIFF | Detection and classification of eight types of colorectal cancer |
| ImageNet | 14 million | -- | -- | Public | Average image resolution of 469 × 387 pixels | -- | Detection, classification, segmentation, and object categorisation in different fields |
| Private dataset of whole slide images (WSI) | -- | -- | -- | Private | -- | -- | Detection and classification of different cancer histology images |
| Mice liver microscopic images | 30 | 10 | 10 (granulomafibrosis1) 10 (granulomafibrosis2) | Private | 1536 x 2048 pixels | -- | Detection and classification of liver cirrhosis |
| Dataset of the 2018 Data Science Bowl[67] | 670 | -- | -- | Public | 256 x 256 pixels | JPEG | Nuclei segmentation |
| Breast cancer histopathology image dataset (BNS) | 33 | -- | -- | Public | 512x512 pixels | -- | Nuclei segmentation of breast cancer |
| PanNuke dataset[69] | 7,901 | -- | -- | Public | 256 × 256 pixels | -- | Instance segmentation and classification |
| Multi-organ nucleus segmentation dataset (MoNuSeg) | 30 | -- | -- | Public | 1000 × 1000 pixels | -- | Nuclei segmentation |
| Multi-organ H&E-stained pathological image dataset (MOD) | 30 | -- | -- | Public | 1000 × 1000 pixels | -- | Nuclei classification and segmentation |
| TNBC | 50 | -- | -- | Public | 512 × 512 pixels | -- | Breast cancer classification and segmentation |
10. Discussion and future research directions
In this study, existing works related to the detection and diagnosis of various types of cancer are reviewed, with a particular focus on the histological image modality and computer-aided diagnosis systems. The aim is to compile a list of existing methods and datasets commonly used for this purpose. Cancer is one of the deadliest diseases that humanity has faced, and there is an urgent need to improve its treatment and early diagnosis. Artificial intelligence (AI) plays a crucial role in achieving this objective by reducing pathologists' workload and increasing diagnostic accuracy.
Most of the reviewed studies are based on deep learning methods, specifically convolutional neural networks (CNNs), sometimes combined with classical machine learning algorithms such as SVM, KNN, and random forest. In addition, some studies use transfer learning with pretrained networks such as AlexNet, VGG-16, and VGG-19. Deep learning models yield superior results compared to classical machine learning algorithms in most cancer classification studies. There are also studies involving segmentation tasks that extensively use deep learning methods, particularly the U-Net network and CNNs. Architectures such as Recurrent Residual U-Net and TMD-Unet have demonstrated strong results in the analysis of histological images. The most commonly used datasets in this field are The Cancer Genome Atlas and ImageNet.
Our literature review revealed that the discussed studies have concentrated on classification and segmentation for a limited set of cancers. However, histopathological image analysis can be applied to many cancer types, since they share the same imaging modality. Therefore, it would be beneficial to explore other types of cancer. One challenge in histopathological analysis is acquiring diverse datasets to avoid overfitting and create well-trained models. Unfortunately, datasets in this field are not widely available. Another crucial challenge is the time and cost of acquiring biopsies, scanning, and digitalisation. This process is more time-consuming and expensive than other medical imaging modalities. However, it is more advantageous because the diagnosis is performed at an earlier stage, before the cancer becomes metastatic, making treatment easier and more efficient. Many studies use the same dataset for both training and validation, which can lead to less reliable results and less stable models. Instead, they could benefit from using diverse datasets to test their models. Furthermore, these datasets vary in terms of staining techniques, microscopes, and other factors.
In what follows, research directions related to the detection, classification, and segmentation of histopathological cancer images are outlined for other researchers in the field. Considerable effort is still required to improve the performance of the techniques used for early detection and segmentation tasks. The main open issues and directions for future work are discussed below.
Variation in nucleus/tissue sizes and shapes: Generally, the tissue in a slide image is heterogeneous, resulting in considerable variation in the shapes and sizes of nuclei. Several existing studies have proposed solutions to this issue using deep learning algorithms, such as architectures based on the U-Net network. However, there is still a need to improve quality and accuracy by creating methods that adapt to each nucleus size during the segmentation task.
Clusters of nucleus/tissue: Microscopic and cellular samples are characterised by an uneven distribution of cells and nuclei, which results in some nuclei forming clusters. For the segmentation task, it is necessary to separate touching nuclei with clear boundaries between them to avoid under-segmentation.
Treatment of damaged nucleus/tissue: In the case of moderately and poorly differentiated adenocarcinomas, the cells and tissue structure are generally highly damaged, which makes it considerably more difficult to distinguish them and obtain good nuclei segmentation.
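The clustering issue can be illustrated with a small sketch: when two nuclei touch, plain connected-component labelling of a binary mask merges them into a single object, which is the under-segmentation referred to in the list of issues. The code below is a pure-Python illustration using 4-connectivity on toy masks. It is not a method from any of the reviewed studies, and the helper name `count_components` is hypothetical.

```python
from collections import deque


def count_components(mask):
    """Count 4-connected foreground regions in a binary mask (nested lists)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # New region found: flood-fill it with a breadth-first search.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Two nuclei separated by a background column.
separated = [[1, 1, 0, 1, 1],
             [1, 1, 0, 1, 1]]
# The same two nuclei joined by a one-pixel bridge.
touching = [[1, 1, 1, 1, 1],
            [1, 1, 0, 1, 1]]
n_separated = count_components(separated)
n_touching = count_components(touching)
print(n_separated, n_touching)
```

A single bridging pixel is enough to halve the object count in this toy case, which suggests why instance-level metrics such as AJI penalise merged nuclei more heavily than pixel-level scores do.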
11. Conclusion
This study reviews previous research on the early detection of cancer using artificial intelligence techniques, focusing on the microscopic level. In other words, histopathological imaging is addressed, as it is believed to provide more detailed information about the extracted tissue. On the one hand, histological imaging is one of the medical imaging modalities that produce high-resolution colour images for cancer detection and classification. On the other hand, working with this modality is not an easy task due to the size, limited availability, and cost of the data. That is why most of the aforementioned research utilised transfer learning. This article serves as a review of technologies utilised for early cancer detection and classification using this type of dataset. First, histological images as a medical imaging modality and the process of acquiring slides, from scanning to digitalisation, with colour normalisation are introduced. The various segmentation types and techniques, including approaches like the U-Net network and the Fully Convolutional Network, are then described. Furthermore, some diagnostic and prognostic approaches are introduced. In that section, most studies are based on transfer learning approaches focusing on patches or whole-slide images (WSI), with examples such as ResNest-50, VGG-16, and VGG-19. Additionally, some datasets used to train models with these approaches are introduced. Finally, some challenges and research directions in this field are outlined.
