Concerning microalgae strains and culture conditions, two strains of the marine dinoflagellate Protoceratium reticulatum (Claparède & Lachmann) Bütschli 1885 (IO116-01 and IO116-02) were used in both the training and validation stages of the counting experiments. The cultures were obtained from the algae culture collection of the University of Lisbon (ALISU) and were grown under controlled laboratory conditions (Fitoclima 600PL, Aralab, Portugal) in L1 medium (Andersen et al 2005), at a salinity of 33 and 19 ± 1 °C, under a 12:12 h light:dark cycle at 100–110 μmol photons m⁻² s⁻¹.
For sample preparation for cell counts, 3 ml culture samples were fixed with approximately 0.15 ml of Lugol’s solution (Karlson et al 2010). Immediately before filling the counting chamber, samples were homogenized by gently rotating the flask 25 times. A sub-sample was then used to fill the chamber of a Palmer–Maloney counting slide (100 µl) (LeGresley and McDermott 2010); no dilution steps were used. The sample was allowed to settle for 5 min, and the chamber was placed under a stereo microscope at 10× magnification (Zeiss Stemi 305, Germany). The images were acquired through the eyepiece with a cell phone (Samsung M21, South Korea) equipped with a 48.0 MP camera (Samsung S5KGM1, f/2.0, 26 mm (wide), 1/2.0″, 0.8 µm). Each image covered the whole area of the counting chamber, equivalent to a 100 µl culture sample.
For the training process, a total of six images of 2250 × 4000 pixels at 24-bit depth were acquired from cultures in different phases of the growth curve, so as to cover a variety of particle properties (e.g. a range of cell sizes, cell debris, and thecal plates).
The pre-processing of the images was performed in the Matlab R2021a environment and consisted of four steps: first, a modified homomorphic filter with a sigma of 11 was applied to compensate for irregular illumination of the background; second, all images were histogram-matched to one reference image, chosen for its ideal radiometric range; third, a binary mask was produced for each image with a global threshold, followed by morphological operations to consolidate the area of interest and eliminate surrounding structures included in the field of view (FOV); finally, this mask was applied to the processed image (Fig. 1).
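As an illustration, a minimal Python sketch of an equivalent pre-processing pipeline is given below (the original implementation was in Matlab); scikit-image is assumed, sigma = 11 follows the text, and the file names, morphological structuring element and minimum-area parameter are purely illustrative.

    # Python approximation of the Matlab pre-processing pipeline described above.
    import numpy as np
    from skimage import io, exposure, filters, morphology, measure

    def homomorphic_correction(img, sigma=11):
        """Remove low-frequency illumination by subtracting a blurred log-image."""
        log_img = np.log1p(img.astype(np.float64))
        background = filters.gaussian(log_img, sigma=sigma)
        corrected = np.expm1(log_img - background + background.mean())
        return exposure.rescale_intensity(corrected, out_range=(0, 1))

    def chamber_mask(img, min_area=50_000):
        """Global threshold + morphology, keeping only the largest connected
        component (the counting-chamber area) and discarding surrounding structures."""
        binary = img > filters.threshold_otsu(img)
        binary = morphology.binary_closing(binary, morphology.disk(15))
        binary = morphology.remove_small_objects(binary, min_size=min_area)
        labels = measure.label(binary)
        if labels.max() == 0:
            return binary
        largest = np.argmax(np.bincount(labels.ravel())[1:]) + 1
        return labels == largest

    # Illustrative file names; 'reference.png' stands for the image chosen
    # for its ideal radiometric range.
    reference = homomorphic_correction(io.imread("reference.png", as_gray=True))
    image = homomorphic_correction(io.imread("sample.png", as_gray=True))
    matched = exposure.match_histograms(image, reference)
    mask = chamber_mask(matched)
    result = matched * mask  # masked, illumination-corrected image ready for tiling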
The algorithm used is one of the latest developments in one-stage detection algorithms based on convolutional neural networks (CNNs), the fifth version since the introduction of the You Only Look Once (YOLO) concept (Redmon et al 2016). YOLOv5 was made publicly available in a GitHub repository in 2020 (GitHub Ultralytics n.d.). The algorithm was retrained for the task using a transfer learning technique: since many basic features are common to all detection problems (edges, contrasts, forms, etc.), an already heavily trained network can be adapted to a new problem. The new discriminators define the last layers of the CNN, tuning the detector to the specifics of the problem at hand. After downloading and installing YOLOv5, the algorithm was trained on our data set as described below. The set of acquired images was segmented into tiles of 800 × 800 pixels to accelerate the training procedure, as our objective was to use the most complete YOLOv5 model, model x, which uses a CNN with 476 layers.
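A sketch of the tiling step and of a typical YOLOv5 transfer-learning call is given below; the 800 × 800 tile size and the yolov5x.pt starting weights follow the description above, while the folder names, batch size and number of epochs are illustrative assumptions.

    # Split each training image into 800 x 800 tiles and launch YOLOv5
    # transfer learning from the pretrained 'x' model.
    from pathlib import Path
    from PIL import Image

    TILE = 800  # tile side in pixels

    def tile_image(path, out_dir):
        """Cut an image into non-overlapping TILE x TILE tiles (edge tiles may be smaller)."""
        img = Image.open(path)
        out_dir.mkdir(parents=True, exist_ok=True)
        w, h = img.size
        for top in range(0, h, TILE):
            for left in range(0, w, TILE):
                box = (left, top, min(left + TILE, w), min(top + TILE, h))
                img.crop(box).save(out_dir / f"{path.stem}_{top}_{left}.png")

    for image_path in Path("raw_images").glob("*.png"):   # illustrative folder name
        tile_image(image_path, Path("dataset/images/train"))

    # Transfer learning with the YOLOv5 repository (github.com/ultralytics/yolov5),
    # starting from the pretrained yolov5x weights; epochs and batch size are illustrative:
    #   python train.py --img 800 --batch 8 --epochs 300 --data data.yaml --weights yolov5x.pt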
Training requires a set of images with all objects of interest identified by bounding boxes, together with the corresponding list of labels; the data set annotated in this way, called the ground truth, is then split into training and validation subsets, over which the algorithm converges to the best achievable precision (the fraction of detections that are true positives) and recall (the fraction of true objects that are detected).
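Expressed in terms of detection counts, the two metrics reduce to simple ratios, as in the following illustration (the numbers are arbitrary):

    # Precision and recall from detection counts (illustration only).
    def precision(tp, fp):
        return tp / (tp + fp)   # fraction of detections that are correct

    def recall(tp, fn):
        return tp / (tp + fn)   # fraction of annotated objects that were found

    # e.g. 90 true positives, 5 false positives, 10 missed cells:
    print(precision(90, 5), recall(90, 10))   # 0.947..., 0.9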
A metric usually considered in object detection applications is the mean average precision (mAP), which quantifies the stability and consistency of the model within a confidence threshold related to the intersection-over-union (IoU) between the anchor boxes estimated from the training data and the bounding boxes predicted by the model on the annotated data. With a threshold of X, a predicted box is assigned to an object of interest if its IoU exceeds X and is considered background otherwise. Non-maximum suppression avoids most duplicate detections by keeping only the box with the maximum probability within each set of overlapping boxes.
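For two axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates, the IoU used in this matching can be computed as in the generic sketch below (not the YOLOv5 internals):

    def iou(box_a, box_b):
        """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    # With a threshold of 0.5, this pair counts as a match (IoU ≈ 0.61):
    print(iou((10, 10, 110, 110), (30, 20, 120, 120)))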
The annotation of a subset of images for training and validation purposes can be done with online tools offering a user-friendly graphical interface, such as Makesense.AI (n.d.), used in the present work. The images to be annotated (usually 30% of the available images, further split into 20% for training and 10% for validation) are uploaded to the site, and the graphic tools available in the interface (zoom, correction, delete and pan) are used to draw boxes around all the objects of interest in each image. At the end, a text file is exported for each image in a user-defined format, containing all the annotations (image coordinates of the boxes) made in that image. The images and corresponding text files are then distributed between the training and validation folders, because YOLOv5 requires a fixed directory tree with names it recognizes, so that image and label files can be located during the training stage (an example layout is sketched below). Once trained, the algorithm was applied to a test data set of 43 images of P. reticulatum cultures acquired as described above. Results were assessed by manually verifying the false negatives and false positives in each image, and the performance of the model was evaluated based on precision and recall.
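For reference, YOLOv5 expects a directory layout and label format along the following lines; the folder and class names are illustrative, and each label line holds the class index followed by the box centre, width and height normalised to the tile dimensions.

    dataset/
        images/
            train/    tile_0001.png ...
            val/      tile_0040.png ...
        labels/
            train/    tile_0001.txt ...
            val/      tile_0040.txt ...

    data.yaml (points YOLOv5 at this tree):
        train: dataset/images/train
        val: dataset/images/val
        nc: 1
        names: ['P_reticulatum']

    Example label line (class x_center y_center width height):
        0 0.4812 0.5525 0.0625 0.0700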