Deep learning to recognize and count green leafhoppers

Vineyards are a crop of great economic importance in Portugal, where over 224 kha of vines may be affected by ongoing global change as new pests arrive in greater numbers at more northern latitudes. Integrated pest management requires early recognition and assessment of pests to enable a proportionate control response. Using images of yellow sticky traps that caught green leafhoppers in infested vineyards, we applied deep learning methods to count the insects present with high accuracy and established a procedure to assess any number of traps in a short time. The method runs on ordinary laptop computers and could support more extensive and more frequent surveillance, since counting the hundreds of insects on each trap is reduced from manual labor to seconds of computation.


Background
In integrated pest management projects on economically important crops, the correct identification of species is critical to allow rapid decision-making based on economic thresholds. In recent years, several computer programs have been developed to identify different groups of insects automatically and quickly, with some progress, although the methodologies are still imprecise and/or difficult to apply accurately (e.g., Cho et al. 2007; Wang et al. 2012), and none has yet been established for green leafhoppers.
Several species of leafhoppers (Typhlocybinae) have been recorded throughout the world as pests of economically important crops in agricultural fields, orchards, and greenhouses (Jacas et al. 1997; Torres et al. 1998; Delrio et al. 2001; Mazzoni et al. 2001; Alma 2002; Coutinho et al. 2015). Leafhopper feeding on leaves, whether through direct damage (sap loss, blockage of vascular tissue, and the characteristic hopperburn symptoms) or through their capacity to act as vectors of viruses and phytoplasmas (Ossiannilsson 1978; Raupach et al. 2002), can lead to considerable losses in production and yield as well as significant control costs (Pollini & Bariselli 1995; Backus et al. 2005).
Rapid monitoring of the main vineyard pests, namely Jacobiasca lybica (Bergevin & Zanon) (Cicadellidae), known as the cotton jassid, and the cryptic Empoasca species, would be valuable, but the need for human intervention to count captures is a bottleneck: a single trap can easily hold many hundreds of individuals, and multiple traps are needed for a robust and representative assessment. This is where semiautomatic detection offers an advantage.
The detection of objects using deep learning techniques has been the subject of many developments in the last two decades, mainly due to its broad range of applications.

Open Access
Bulletin of the National Research Centre

What motivated our interest in deep learning algorithms was the limited success of conventional image processing techniques and the possibility of data augmentation inherent in deep learning. Data augmentation introduces small randomized changes to the training data in the form of radiometric and geometric alterations: scaling, translation, rotation, and shearing. This matters because, however many training images we use, we cannot guarantee that the training set covers all possible positions, sizes, and perspective views of the trapped insects. The main objective of this work was to develop an easy and fast method for accurately counting Jacobiasca lybica on sticky traps, whether recently captured or glued to the trap for several days, by which time some morphological characteristics (wings, body contours, and eyes) have already been lost.
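The data augmentation described above can be illustrated with a minimal NumPy sketch. It draws a random affine transform combining scaling, rotation, shearing and translation, resamples a tile by nearest-neighbour lookup, and applies a random brightness/contrast change. This is not YOLOv5's own augmentation pipeline, which the network applies internally during training; the function names and parameter ranges (max_scale, max_rot_deg, max_shear_deg, max_shift, max_gain, max_bias) are assumptions chosen for illustration, and tiles are assumed to be square float arrays scaled to [0, 1].

```python
import numpy as np


def random_affine(rng, size, max_scale=0.2, max_rot_deg=15.0,
                  max_shear_deg=5.0, max_shift=0.1):
    """Random 2x3 matrix mapping an output pixel (x, y, 1) to input (x, y).

    Parameter ranges are illustrative assumptions, not values from the study.
    """
    s = 1.0 + rng.uniform(-max_scale, max_scale)                         # scaling
    a = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))               # rotation
    k = np.tan(np.deg2rad(rng.uniform(-max_shear_deg, max_shear_deg)))   # shear
    shift = rng.uniform(-max_shift, max_shift, size=2) * size            # translation
    rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    shear = np.array([[1.0, k], [0.0, 1.0]])
    A = s * rot @ shear
    centre = np.array([size / 2.0, size / 2.0])
    t = centre - A @ centre + shift  # transform about the tile centre, then shift
    return np.hstack([A, t[:, None]])


def warp_nearest(img, M, fill):
    """Resample img (H, W, bands) through M using nearest-neighbour lookup."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = M @ coords                       # source (x, y) for every output pixel
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full((h * w,) + img.shape[2:], fill, dtype=img.dtype)
    out[valid] = img[sy[valid], sx[valid]]  # pixels mapped from outside keep `fill`
    return out.reshape(img.shape)


def radiometric_jitter(img, rng, max_gain=0.2, max_bias=0.05):
    """Random brightness/contrast change for a float image scaled to [0, 1]."""
    gain = 1.0 + rng.uniform(-max_gain, max_gain)
    bias = rng.uniform(-max_bias, max_bias)
    return np.clip(img * gain + bias, 0.0, 1.0)


def augment(img, rng):
    """One augmented copy of a square (H, H, 3) float tile.

    Also returns the output-to-input matrix M; its inverse would map the
    annotated box corners so that labels follow the image.
    """
    M = random_affine(rng, size=img.shape[0])
    fill = img.reshape(-1, img.shape[-1]).mean(axis=0)  # per-band mean as background
    return radiometric_jitter(warp_nearest(img, M, fill), rng), M
```

Each call to augment with the same generator yields a different variant of the tile, which is how a small annotated set can cover more positions, sizes and orientations than it literally contains.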

Methods
The chromotropic sticky traps (20 × 22.5 cm) from Biosani (Palmela, Portugal) were placed in vineyards in the Alentejo region, collected weekly during August, and stored in the laboratory until sorted. The traps were photographed with a Canon EOS 4000D as three-band, 16-bit color images in a proprietary RAW format (CR2) and converted to uncompressed TIF with freely available software (IrfanView n. d.). Four preprocessing steps, implemented in MATLAB, were needed to obtain a homogeneous data set meeting radiometric and geometric quality criteria.
The first step consisted of locating and extracting the area of interest in all 168 images, normalizing their dimensions, and tiling a subset into 81 subimages of 1024 × 1024 × 3 pixels to keep processing times reasonable on a laptop. Where a black-and-white label identifying the vineyard field was present in the upper right corner, as on the front side of every trap, it was replaced by the local mean background value in each band (red, green, and blue).
Secondly, some traps had small damaged areas that appeared as saturated pixels (bright spots). To remove this source of confusion, the image was processed by regions: pixels with intensity higher than the regional mean plus two standard deviations were located and replaced by a linear combination of bands, using a technique similar to that used to correct saturated pixels caused by random noise (Sabins 1987).
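The tiling and the bright-spot search of the first two steps can be sketched in NumPy as follows. The 256-pixel region size and the policy of discarding partial tiles at the image edges are assumptions, and the sketch only flags the suspect pixels; the band-combination replacement used by the authors (after Sabins 1987) is not reproduced.

```python
import numpy as np


def tile_image(img, tile=1024):
    """Split an (H, W, 3) array into non-overlapping tile x tile blocks.

    Remnants at the right/bottom edges smaller than a full tile are dropped
    (an assumption made for this sketch).
    """
    h, w = img.shape[:2]
    return [img[y:y + tile, x:x + tile]
            for y in range(0, h - tile + 1, tile)
            for x in range(0, w - tile + 1, tile)]


def flag_bright_pixels(img, region=256, k=2.0):
    """Boolean mask, per band, of pixels brighter than mean + k*std of their region."""
    mask = np.zeros(img.shape, dtype=bool)
    h, w = img.shape[:2]
    for y in range(0, h, region):
        for x in range(0, w, region):
            block = img[y:y + region, x:x + region].astype(float)
            mu = block.mean(axis=(0, 1))  # per-band regional mean
            sd = block.std(axis=(0, 1))   # per-band regional standard deviation
            mask[y:y + region, x:x + region] = block > mu + k * sd
    return mask
```

With k=2.0 the mask corresponds to the "mean plus two standard deviations" criterion stated above; the flagged pixels would then be passed to whatever replacement rule is adopted.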
The third step compensated for nonuniform illumination by applying a modified homomorphic filter (Mathworks 2013), eliminating the vertical gradient that the image acquisition setup produced in all images.
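A homomorphic filter treats the image as illumination × reflectance: taking logarithms turns the product into a sum, so slowly varying illumination, such as a vertical gradient, can be attenuated as a low-frequency component while detail is kept or boosted. The sketch below is not the modified MathWorks filter used by the authors; it estimates the low-frequency component with a separable Gaussian blur in the spatial domain (NumPy only), and the values of sigma, gamma_low and gamma_high are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_blur(channel, sigma):
    """Separable Gaussian blur of a 2-D array with reflected borders."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(channel, radius, mode="reflect")

    def conv(v):
        return np.convolve(v, kernel, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)  # blur along x
    return np.apply_along_axis(conv, 0, rows)    # blur along y


def homomorphic_flatten(img, sigma=15.0, gamma_low=0.5, gamma_high=1.5, eps=1e-6):
    """Attenuate slowly varying illumination in a float image (H, W) or (H, W, bands).

    In the log domain the Gaussian blur estimates the low-frequency
    (illumination) part, which is scaled by gamma_low; the remaining
    detail is scaled by gamma_high.
    """
    img = np.asarray(img, dtype=float)
    bands = img[..., None] if img.ndim == 2 else img
    out = np.empty_like(bands)
    for b in range(bands.shape[-1]):
        log_band = np.log(bands[..., b] + eps)
        low = gaussian_blur(log_band, sigma)
        high = log_band - low
        out[..., b] = np.exp(gamma_low * low + gamma_high * high) - eps
    return out[..., 0] if img.ndim == 2 else out
```

Keeping gamma_low below 1 compresses the illumination gradient, while gamma_high above 1 slightly sharpens the insect outlines; a frequency-domain implementation with zero padding is the more common textbook form.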
The last preprocessing step was histogram matching between all images and a reference image, chosen for its good contrast and radiometric range. At the end of this process, the whole data set had the same radiometric characteristics, and the main causes of misperception had been eliminated.
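Histogram matching maps each intensity level of an image to the level that occupies the same quantile (cumulative frequency) in the reference image. A minimal band-by-band NumPy sketch follows; MATLAB offers this operation as imhistmatch in the Image Processing Toolbox, but the code below is an independent illustration with hypothetical function names, not the authors' implementation.

```python
import numpy as np


def match_band(source, reference):
    """Remap `source` values so that their CDF follows that of `reference`."""
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference value at the same quantile
    matched = np.interp(src_cdf, ref_cdf, ref_vals)
    return matched[src_idx].reshape(source.shape)


def match_histograms(source, reference):
    """Band-by-band histogram matching for (H, W, bands) images."""
    return np.stack([match_band(source[..., b], reference[..., b])
                     for b in range(source.shape[-1])], axis=-1)
```

Applying match_histograms to every tile with the same reference gives all tiles comparable per-band intensity distributions.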
To detect and count the green leafhoppers in the images, we used one of the latest one-stage detection algorithms based on convolutional neural networks (CNNs): the fifth version of You Only Look Once (YOLO), a concept introduced in 2016 (Redmon et al. 2016). YOLOv5 was released in 2020 by a different developer and made publicly available in a GitHub repository (GitHub n. d.). To be used on a new data set, it must first go through a training stage, which consists of annotating a considerable number of images, identifying every occurrence of the objects of interest belonging to each class to be detected.
Online tools such as Makesense.AI (n. d.) allow users to upload a data set, annotate it according to their goals, and download a set of output files with the annotations in a user-selected format. YOLOv5 has another advantage over conventional algorithms: the possibility of transfer learning. Since many basic features (edges, contrasts, shapes, etc.) are common to all detection problems, an already heavily trained network can be used as the starting point for a new problem. The new discriminators define the last layers of the CNN, tuning the detector to the details of the specific problem, while the basic features captured by the first layers were robustly trained on large data sets such as Common Objects in Context (COCO), with 80 classes and more than 200,000 annotated images.
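YOLOv5 reads annotations in the YOLO text format, which Makesense.AI can export: one .txt file per image and one line per object, giving the class index followed by the box centre, width and height, all normalised by the image dimensions. The helper functions below, whose names are hypothetical, convert a pixel bounding box to such a line and back.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Pixel box corners -> 'class x_centre y_centre width height' (normalised)."""
    xc = (x_min + x_max) / 2.0 / img_w
    yc = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Inverse of to_yolo_line; extra trailing fields (e.g. a confidence) are ignored."""
    cls, xc, yc, w, h = line.split()[:5]
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(cls), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2
```

With a single class of interest, as in this study, every line starts with class index 0.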

Results
We used the YOLOv5x model with the default hyperparameters and annotated 24 images from the subset of 81 tiles with one class of objects of interest. Sixteen images were used for training and eight for validation, following the recommended 30% of the data, split 20% and 10% between training and validation. Training was done once and took 23.2 h (448 iterations on a laptop equipped with an Intel Core i7-10750H processor, 16 GB SDRAM, and an NVIDIA GeForce RTX 2060), but the resulting weights can be used in the future to detect the same objects of interest in any similar image (Fig. 1), with a processing time of about 2.5 s per image.
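The split of the 81 tiles can be expressed as a short Python sketch. Random selection with a fixed seed is an assumption, since the text does not state how the 24 annotated tiles were chosen; split_tiles is a hypothetical name.

```python
import random


def split_tiles(tile_ids, n_train=16, n_val=8, seed=0):
    """Randomly assign tiles to training, validation and held-out test sets."""
    ids = list(tile_ids)
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


train, val, test = split_tiles(range(81))
for name, part in (("train", train), ("val", val), ("test", test)):
    print(f"{name}: {len(part)} tiles ({len(part) / 81:.1%})")
```

Run as is, it prints 16 tiles (19.8%), 8 tiles (9.9%) and 57 tiles (70.4%), matching the nominal 20%/10% split and the 57 tiles left for evaluation. For scale, 23.2 h over 448 iterations corresponds to roughly 3.1 min per iteration.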
Some parameters are available to fine-tune the inference results if necessary, the most useful being a confidence threshold on the probability the network assigns to each detection of being a true positive. These values can be displayed on the classified image to assess whether the confidence threshold should be changed and in which direction: a value closer to one limits the count to objects detected with higher confidence (Fig. 2), while a lower value also counts objects detected with a lower probability of being true positives (Fig. 3). A human expert evaluated the results on the remaining 57 tiles after detection with a confidence threshold of 0.25, obtaining an overall precision of 0.9962 and a recall of 0.9598. Considering that the algorithm eliminates the subsampling normally applied when counting is done by humans, detecting almost 96% of the true positives is an acceptable result, especially as the counting time was reduced to a few seconds.
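The confidence threshold and the reported metrics can be made concrete with a few lines of Python. Precision is TP/(TP + FP) and recall is TP/(TP + FN); the F1 score, their harmonic mean, is not reported in the text and is computed here only as an illustration. Function names are hypothetical.

```python
def count_above(confidences, threshold=0.25):
    """Number of detections whose confidence is at or above the threshold."""
    return sum(1 for c in confidences if c >= threshold)


def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def f1_from(precision, recall):
    """F1 score from already computed precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

Raising the threshold in count_above can only lower the count, trading recall for precision. For the reported precision (0.9962) and recall (0.9598), f1_from returns approximately 0.978.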

Discussion
The methodology described requires an image processing tool able to preprocess the images, with programs usable by anyone with basic computing skills. MATLAB (Matlab n.d.) was used in this work, but similar freely available options, such as ImageJ (Image J n.d.), could perform the operations described to improve image contrast and equalize the radiometric range. The deep learning tool is open-source software and can be installed and run on an average personal laptop; although the training stage takes hours of processing and its preparation requires an expert to annotate a subset of images containing the objects of interest with an online tool, it needs to be done only once for each species. Inference requires a single command line and takes around 2.5 s to produce two outputs: an image with all occurrences marked and a numeric output with the total number of objects detected for each class. Parametrizing the inference is the most demanding step in terms of human intervention, because several sets of parameters must be compared by inspecting the test images for false positives and false negatives.
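The per-class totals can also be recovered from YOLO-format label files. With the YOLOv5 repository, a command along the lines of python detect.py --weights best.pt --source tiles --conf-thres 0.25 --save-txt writes one such file per image (adding --save-conf appends each detection's confidence); the weight file and folder names here are hypothetical. The Python sketch below, with hypothetical function names, tallies the detections in those files.

```python
from collections import Counter
from pathlib import Path


def count_classes(label_text):
    """Count detections per class id in the text of one YOLO-format label file."""
    counts = Counter()
    for line in label_text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            counts[int(fields[0])] += 1
    return counts


def count_per_image(label_dir):
    """Map each label file name (without .txt) to its per-class counts."""
    return {p.stem: count_classes(p.read_text())
            for p in sorted(Path(label_dir).glob("*.txt"))}
```

Summing the Counter objects returned by count_per_image gives the total per class across all processed tiles of a trap.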

Conclusions
We found an accurate way to quantify green leafhoppers, specifically Jacobiasca lybica, on chromotropic traps: after an initial training stage and a small number of hours for image acquisition and preprocessing, each trap can be counted in 2.5 s regardless of the number of insects captured, free of subjectivity and much less error-prone than a human operator. This is significant compared with the human labor required for the number of traps needed to monitor several vineyards every week during the critical season. The overall procedure could eventually become semiautomatic; annotation for training remains the step where human intervention cannot easily be replaced, but it is done once for each kind of insect and can be reused in subsequent seasons. Integrated pest management projects could adopt this approach as a new standard, shortening the response time to control possible damage. The same procedure can be extended to other pests, becoming a valuable asset for high-value crops that could be monitored more often and more extensively, increasing the likelihood of early detection and allowing a rapid and proportionate response.