Tomatoes (Annotated)

Project with 66 annotated tomatoes (424 images)

Abstract

This article focuses on the benefits of using a deep learning model for tomato segmentation. We describe all stages of data collection for subsequent neural network training. Testing showed that, with AI, the speed of tomato segmentation increased with no loss in precision. This post may be useful to specialists in agriculture.

Download

Direct download: tar archive (1.02 GB)

Keywords

  • AI
  • Deep Learning
  • Computer vision
  • Labeling
  • Automated
  • Tomato
  • Phenotype
  • Phenotyping
  • Segmentation

Statistics

The project contains 66 datasets with 424 images and a total of 2,560 annotated objects.

Introduction

Tomatoes are among the most widely consumed fruits in the world, so it is no wonder that tomato cultivation is so popular among farmers. Demand for high-quality tomatoes is only growing.

The vast majority of commercially available tomatoes are hybrids created by crossing two cultivars. These tomatoes take the best qualities from their parents: they are disease resistant and have good taste and growth characteristics. One of the main tasks of the producer is to improve the quality and safety of the grown products. Phenotyping methods are widely used for the selection and development of new cultivars with desired properties. In tomato breeding, fruit phenotypes are important agronomic traits used as a reference.

Tomato annotation is a crucial part of the research process, as it allows scientists to gain insights into the tomato’s genetic makeup, nutritional value, and other properties. Our goal is to train a deep learning system on segmented images to automatically determine the external and internal structure of a tomato, and to automate the collection of information about tomato size and color. In the long term, this should simplify and speed up the phenotyping process. Traditional measurement methods based on manual labeling limit the effective collection of data on tomato fruit morphology: the risk of error is higher, and manual labeling is more difficult and time-consuming than computer vision techniques.

The basis for this post was the original research article "Quantitative Extraction and Evaluation of Tomato Fruit Phenotypes Based on Image Recognition", which is available as open access.

Research methods

First of all, we needed to obtain tomatoes of different cultivars. This is important so that the AI learns to label completely different tomato structures and does so correctly. The size of the tomato, the number of locules, the thickness of the pericarp, and many other parameters differ from cultivar to cultivar. Perhaps in the future we can train an AI to distinguish between tomato cultivars and tag them accordingly.

Before we started photographing tomatoes, we generated a 6.9 cm x 6.9 cm QR code that encodes its own dimensions. Later on, it helps to calculate the area of an object in the photos and to straighten the photos themselves.
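
As a rough illustration of the first use (not the platform's actual implementation), the sketch below derives a cm-per-pixel scale from the QR code's known 6.9 cm side and converts a segmentation mask's pixel count into cm². It assumes OpenCV and NumPy; the helper names `estimate_cm_per_px` and `mask_area_cm2` are hypothetical.

```python
import cv2
import numpy as np

QR_SIDE_CM = 6.9  # known physical side length of the printed QR code

def estimate_cm_per_px(image_bgr):
    """Detect the QR code and derive a cm-per-pixel scale from its side length."""
    found, points = cv2.QRCodeDetector().detect(image_bgr)
    if not found:
        raise ValueError("QR code not found in the image")
    corners = points.reshape(4, 2)
    # Average the four side lengths (in pixels) of the detected quadrilateral.
    side_px = np.mean([np.linalg.norm(corners[i] - corners[(i + 1) % 4])
                       for i in range(4)])
    return QR_SIDE_CM / side_px

def mask_area_cm2(binary_mask, cm_per_px):
    """Convert the pixel count of a boolean segmentation mask into cm^2."""
    return int(binary_mask.sum()) * cm_per_px ** 2

# Usage sketch (file and mask names are hypothetical):
# scale = estimate_cm_per_px(cv2.imread("tomato_01_top.jpg"))
# area = mask_area_cm2(tomato_mask, scale)
```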

During the entire photographing process, the camera was mounted on a tripod in a fixed position. The light was positioned so that the object cast almost no shadow. This is useful because, when labeling a tomato with the smart brush tool, the selection of the object is more accurate. Additionally, we placed a pad under the tomato so that it lay flat.

Before shooting, all the tomatoes were washed. Also, each part of the tomato was wiped with a dry napkin in order to avoid glare in photographs. Glare significantly complicates the labeling process.

Photo No.1: The tomato must be positioned so that its navel / sepals face up.

Photo No.2: The tomato must be turned upside down.

Next, we made a vertical cut in the middle of the tomato.

The tomato must stay in the same position.

Photo No.3: The left side of a vertical slice of a tomato.

Photo No.4: The right side of a vertical slice of a tomato.

Then we connected the left and right parts of the tomato together and made a horizontal cut.

Photo No.5: Connected left and right parts of a tomato with a navel.

Photo No.6: Connected left and right parts of a tomato with the top (point).

It is important to follow the same sequence when cutting and photographing tomatoes. This will make it easier to organize photos.

From one tomato we got 6 photos; tomatoes that still had sepals gave 7 photos.

Data organization on the platform

To systematize all the tomatoes, a project was created with many datasets. In each dataset, the photos of a tomato are presented in the same sequence. All photos are large in size.

Each dataset contains all photos of one tomato of a particular cultivar.
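
For readers who want to reproduce this organization programmatically, here is a rough sketch using the Supervisely Python SDK: one dataset per tomato, with images uploaded in shooting order. This is an assumption about how such a script could look, not the exact workflow we used; the workspace id and local folder layout are hypothetical.

```python
import os
import supervisely as sly

# Assumptions: SERVER_ADDRESS / API_TOKEN are set in the environment, and the
# local folders look like  tomatoes/<cultivar>_<id>/01_top.jpg, 02_bottom.jpg, ...
api = sly.Api.from_env()
WORKSPACE_ID = 123  # hypothetical workspace id

project = api.project.create(
    WORKSPACE_ID,
    "Tomatoes (Annotated)",
    type=sly.ProjectType.IMAGES,
    change_name_if_conflict=True,
)

root = "tomatoes"
for tomato_dir in sorted(os.listdir(root)):
    dataset = api.dataset.create(project.id, tomato_dir)  # one dataset per tomato
    for image_name in sorted(os.listdir(os.path.join(root, tomato_dir))):
        api.image.upload_path(dataset.id, image_name,
                              os.path.join(root, tomato_dir, image_name))
```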

To designate the fruit cultivar, we used a tag system. We also used tags to distribute images by angle; a minimal sketch of this mapping follows the list. That way, we got:

  • Photo No.1: top
  • Photo No.2: bottom
  • Photo No.3: vertical cut left
  • Photo No.4: vertical cut right
  • Photo No.5: horizontal cut top
  • Photo No.6: horizontal cut bottom
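
As a plain-Python illustration of this convention (an assumption about how one might script it, not the platform's own tagging mechanism), the position of a photo within a tomato's dataset determines its angle tag:

```python
# Map a photo's position in the shooting sequence (1-based) to its angle tag.
ANGLE_TAGS = {
    1: "top",
    2: "bottom",
    3: "vertical cut left",
    4: "vertical cut right",
    5: "horizontal cut top",
    6: "horizontal cut bottom",
}

def angle_tag(position: int) -> str:
    """Return the angle tag for the n-th photo of a tomato (1..6)."""
    return ANGLE_TAGS[position]
```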

                                                                              

If you need to calculate the area of the tomatoes and are not sure that your camera angle was straight or that your shots were stable, use the “Perspective transform using QR-code” app to straighten the photos before starting the labeling process. In our case, this was not necessary.
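
For context, the idea behind such a correction can be sketched with OpenCV alone (a simplified stand-in for the app, not its actual implementation): the four detected QR-code corners are mapped onto a perfect square, and the resulting homography is applied to the whole photo.

```python
import cv2
import numpy as np

def straighten_with_qr(image_bgr, qr_side_px=300):
    """Warp the photo so the detected QR code becomes an axis-aligned square."""
    found, points = cv2.QRCodeDetector().detect(image_bgr)
    if not found:
        raise ValueError("QR code not found")
    # Assumes the detector returns corners in TL, TR, BR, BL order.
    src = points.reshape(4, 2).astype(np.float32)
    offset = src[0]  # keep the QR code roughly where it was in the frame
    dst = np.float32([[0, 0], [qr_side_px, 0],
                      [qr_side_px, qr_side_px], [0, qr_side_px]]) + offset
    homography = cv2.getPerspectiveTransform(src, dst)
    h, w = image_bgr.shape[:2]
    return cv2.warpPerspective(image_bgr, homography, (w, h))
```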

After we made sure that the photos were uploaded in the correct order, we started labeling the objects.

                                                                              

Labeling process

First, we needed to create the classes with which we would label the parts of the tomato. It was also important to choose the right tool for marking each object: the right tool helps complete the labeling faster and more accurately. In addition, we made sure the classes had distinct colors so that the parts of the tomato did not merge with each other.
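
A rough sketch of how such classes might be defined with the Supervisely Python SDK is shown below. The class names follow this article, but the geometry types and RGB colors are illustrative assumptions, not the project's actual meta.

```python
import supervisely as sly

# Illustrative class list: large parts get bitmap masks (smart-tool output),
# detailed parts get polygons; colors are deliberately far apart.
obj_classes = [
    sly.ObjClass("tomato",    sly.Bitmap,  color=[230, 25, 75]),
    sly.ObjClass("pericarp",  sly.Bitmap,  color=[245, 130, 48]),
    sly.ObjClass("navel",     sly.Bitmap,  color=[255, 225, 25]),
    sly.ObjClass("sepal",     sly.Bitmap,  color=[60, 180, 75]),
    sly.ObjClass("placenta",  sly.Polygon, color=[0, 130, 200]),
    sly.ObjClass("septum",    sly.Polygon, color=[145, 30, 180]),
    sly.ObjClass("core",      sly.Polygon, color=[70, 240, 240]),
    sly.ObjClass("locule",    sly.Polygon, color=[240, 50, 230]),
    sly.ObjClass("columella", sly.Polygon, color=[128, 128, 128]),
]
meta = sly.ProjectMeta(obj_classes=sly.ObjClassCollection(obj_classes))
```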

Layering

It is important to note that labeling is done by stacking layers, where the top layers cut out the bottom ones.

How it works:

To begin with, we started the labeling with the parts that take the most time. Importantly, when labeling them we paid close attention only to the borders that are not covered by the parts labeled on top of them. Thus, the placenta and septum were labeled roughly with circles, while the locules, which form the topmost layer, were labeled accurately along their borders, thereby cutting off the excess selection.
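
To make the layering idea concrete, here is a small NumPy sketch (a simplified model of what the platform's rendering and the “Rasterize objects” app do, not their actual code): masks are stacked bottom-to-top, and each pixel keeps only the topmost class.

```python
import numpy as np

def flatten_layers(masks_bottom_to_top, height, width):
    """Combine boolean masks so that upper layers cut out lower ones.

    masks_bottom_to_top: list of (class_id, bool mask of shape (height, width)),
    ordered from the bottom layer (e.g. whole tomato) to the top (e.g. locules).
    Returns an integer label map where 0 means background.
    """
    label_map = np.zeros((height, width), dtype=np.int32)
    for class_id, mask in masks_bottom_to_top:
        label_map[mask] = class_id  # later (upper) layers overwrite earlier ones
    return label_map

# Example: a rough placenta circle is overwritten wherever the precise locule
# polygons are painted on top of it (H, W and masks are hypothetical):
# label_map = flatten_layers([(1, tomato_mask), (2, pericarp_mask),
#                             (3, placenta_mask), (4, locule_mask)], H, W)
```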

Large parts of the tomato, such as the whole tomato and the pericarp, were annotated with the smart tool, as it separates objects from the background quite accurately. More detailed objects were annotated with the polygon tool.

You can find more about the structure of tomatoes here.

Step 1: With the smart tool, we selected the whole tomato in all the photos.

Step 2: Then we copied the whole-tomato layer in the slice photos and changed its class to pericarp (click on the layer and press Ctrl+C/Ctrl+V). On the pericarp layer, the outer parts of the fruit were removed.

Step 3: We labeled the navel with the smart tool. If the tomato also had a sepal, it was labeled with the smart tool as well.

Step 4: We selected the placenta with the polygon tool.

Step 5: We selected the septum with the polygon tool.

Step 6: In the photos of the vertical cuts of the tomato, we labeled the core.

Step 7: We selected the locules with the polygon tool. It's important to select the locules carefully because this is our topmost layer.

Step 8: We labeled the columella only in those photos where it was visible.

The most important thing is that at maximum opacity, the labeling should match the parts of the tomatoes.

Post-Processing and Analysis

After all the tomatoes were labeled, we used the “Rasterize objects” app to get rid of the overlapping, so that the upper layers cut out the lower ones. You can learn more about how this app works HERE.

Conclusion

Thus, this article shows that the Supervisely team has made progress in automating the collection of data on tomato phenotypes. Deep learning methods can effectively assist human labor, saving time and budget. Taken together, our results are consistent with the conclusions of the article "Quantitative Extraction and Evaluation of Tomato Fruit Phenotypes Based on Image Recognition": measurement accuracy was comparable to manual labeling, while the time spent on labeling was significantly reduced.

Tomato annotation also helps to create accurate models for predicting tomato growth and yield, which can help farmers increase their crop yield and improve the quality of their tomatoes. The consistent, quantitative phenotype data obtained in this way can also support genomic, metabolomic, and transcriptomic studies of the tomato. It is worth noting that this technique can be extended to the phenotyping of other fruit crops.

License

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

If you use this dataset, cite us:

@misc{ supervisely-tomatoes-annotated,
    title = { Tomatoes Annotated },
    type = { Open Source Dataset },
    author = { Supervisely },
    howpublished = { \url{ https://ecosystem.supervisely.com/projects/supervisely-tomatoes-research } },
    url = { https://ecosystem.supervisely.com/projects/supervisely-tomatoes-research },
    journal = { Supervisely Ecosystem },
    publisher = { Supervisely },
    year = { 2023 },
    month = { mar },
}