New computer vision model

We’ve released a new computer vision model for iNaturalist. This is our first model update since April 2022. The iNaturalist website, mobile apps, and API are all now using this new model. Here’s what’s new and different with this change:

  • It includes 60,000 taxa (up from 55,000)
  • It was trained using a different approach than our previous models, which made it much faster to train

To see if a particular species is included in this model, you can look at the “About” section of its taxon page.

It’s bigger

Our previous model included 55,000 taxa and 27 million training photos. The new model was trained on over 60,000 taxa and almost 30 million training photos.

It was trained using a transfer learning strategy

During previous training runs, our strategy was to train the entire model on the dataset. This means that all of the model weights were candidates for being updated, in order to learn the most efficient and useful visual features for making suggestions for the taxa in that dataset. When training this model, we froze most of the model weights (thereby freezing the visual feature extraction) and only trained the very last layer of the model, the layer that makes the taxa suggestions. This is a machine learning strategy known as transfer learning.

One way to think about this is to imagine that someone was asked to learn all about different kinds of cars. Later, that person was asked to differentiate between two different kinds of pickup trucks, but only using distinguishing characteristics they learned from their study of cars (for example, color, size, visual shape, branding, engine size, etc.), without learning anything new about pickup trucks (for example, bed capacity, towing limits, etc.). Chances are, that person could distinguish between most kinds of trucks without needing to learn anything new specifically about pickup trucks. They may not perform as well as someone who learned about trucks from the beginning, but they have strong foundational knowledge to draw upon for the task.

Our new model was trained using a transfer learning strategy. We used the internal weights and visual features from our previous model which was trained on 55,000 taxa. The advantage of this approach is that we didn’t need to learn all of those internal model weights and visual features again, so training was quite a bit faster. It’s only been four months since our last model was released, which is the shortest time between model releases so far.
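For readers who like to see the idea in code, here is a minimal sketch of the "freeze the feature extractor, train only the last layer" strategy described above. It is a toy illustration under stated assumptions, not our actual training code: the backbone weights are random placeholders standing in for a previously trained model, the "photos" are made-up 2-D points, and every name and size in it (`backbone`, `new_head`, `train_head`, `NUM_TAXA`, and so on) is hypothetical.

```python
import copy
import math
import random

random.seed(0)  # make the toy example reproducible

NUM_INPUTS, NUM_FEATURES, NUM_TAXA = 2, 6, 3  # toy sizes, not the real model's

# Stand-in for the previous model's feature extractor. In real transfer learning
# these weights come from an earlier training run; here they are random placeholders.
backbone = [[random.gauss(0.0, 1.0) for _ in range(NUM_INPUTS)]
            for _ in range(NUM_FEATURES)]

def extract_features(x):
    """Frozen forward pass: reads the backbone weights but never changes them."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in backbone]

def new_head(num_taxa):
    """Fresh final layer with one output per taxon; the only trainable part."""
    return {"W": [[0.0] * NUM_FEATURES for _ in range(num_taxa)],
            "b": [0.0] * num_taxa}

def predict(head, feats):
    """Softmax over the head's scores for one feature vector."""
    scores = [sum(w * f for w, f in zip(row, feats)) + b
              for row, b in zip(head["W"], head["b"])]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_loss(head, examples):
    """Average cross-entropy of the head on (features, label) pairs."""
    return -sum(math.log(predict(head, f)[y]) for f, y in examples) / len(examples)

def train_head(head, examples, steps=300, lr=0.1):
    """Full-batch gradient descent that updates only the head's weights."""
    n = len(examples)
    for _ in range(steps):
        grad_w = [[0.0] * NUM_FEATURES for _ in head["W"]]
        grad_b = [0.0] * len(head["b"])
        for feats, label in examples:
            probs = predict(head, feats)
            for k, p in enumerate(probs):
                err = p - (1.0 if k == label else 0.0)
                grad_b[k] += err / n
                for j, f in enumerate(feats):
                    grad_w[k][j] += err * f / n
        for k in range(len(head["b"])):
            head["b"][k] -= lr * grad_b[k]
            for j in range(NUM_FEATURES):
                head["W"][k][j] -= lr * grad_w[k][j]
    return head

# Toy "photos": 2-D points with a taxon label.
data = [((1.0, 0.0), 0), ((0.9, 0.1), 0),
        ((0.0, 1.0), 1), ((0.1, 0.9), 1),
        ((-1.0, -1.0), 2), ((-0.9, -1.1), 2)]

backbone_before = copy.deepcopy(backbone)
# Because the backbone is frozen, features can be computed once and reused.
examples = [(extract_features(x), y) for x, y in data]

head = new_head(NUM_TAXA)
loss_before = mean_loss(head, examples)
train_head(head, examples)
loss_after = mean_loss(head, examples)

print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
print("backbone unchanged:", backbone == backbone_before)
```

The speedup in this sketch comes from the same place as in the real thing: since the feature extractor never changes, only the small final layer needs gradient updates, and adding taxa only means creating a final layer with more outputs.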

As with the pickup truck analogy, it could be that this model trained with the transfer learning approach is slightly less accurate overall than if we had trained the entire model again. However, in our testing this new model appears to achieve nearly the same accuracy as the previous model while containing more taxa. Our plan going forward will be to spend the time fully training a model about once a year to maximize accuracy with new photos and taxa, and to use the faster transfer learning approach in between full training runs so we can release models more frequently than we have in the past.

Future work

First, we are still working on new approaches to improve suggestions by combining visual similarity and geographic nearness. We still can’t share anything concrete, but we are getting closer.

Second, we’re still working to compress these newer models for on-device use. The in-camera suggestions in Seek continue to use the older model from March 2020.

We couldn't do it without you

Thank you to everyone in the iNaturalist community who makes this work possible! Sometimes the computer vision suggestions feel like magic, but it’s truly not possible without people. None of this would work without the millions of people who have shared their observations and the knowledgeable experts who have added identifications.

In addition to adding observations and identifications, here are other ways you can help:

  • Share your machine learning knowledge: iNaturalist’s computer vision features wouldn’t be possible without learning from many colleagues in the machine learning community. If you have machine learning expertise, these are two great ways to help:
      • Participate in the annual iNaturalist challenges: Our collaborators Grant Van Horn and Oisin Mac Aodha continue to run machine learning challenges with iNaturalist data as part of the annual Computer Vision and Pattern Recognition (CVPR) conference. By participating you can help us all learn new techniques for improving these models.
      • Start building your own model with the iNaturalist data now: If you can’t wait for the next CVPR conference, thanks to the Amazon Open Data Program you can start downloading iNaturalist data to train your own models now. Please share with us what you’ve learned by contributing to iNaturalist on GitHub.
  • Donate to iNaturalist: For the rest of us, you can help by donating! Your donations help offset the substantial staff and infrastructure costs associated with training, evaluating, and deploying model updates. Thank you for your support!
Posted on August 19, 2022, 12:43 AM by alexshepard

Comments

Excellent! Thank you for making these continuous improvements to the process, and thank you for sharing this concise update. Well done!

Posted by tsn over 1 year ago

Wow! Thanks for sharing & thank you staff!

Posted by gatorhawk over 1 year ago

Awesome! Thanks for the explanation of how this works and even more thanks for making it work in the first place :)

Posted by earnoodles over 1 year ago

Very nice work! I'm excited to see more updates. I love this use of this technology and love to see how it's evolving.

Posted by roachiecanada over 1 year ago

Impressive, very nice. Glad to see continued updates to the infrastructure.

Posted by kemper over 1 year ago

In which countries or continents will this new model perform better than the old model? Can you say whether the 5,000 newly added species are mainly from Asia, or whether the improvement is mostly about insects, worms, or centipedes?

Posted by optilete over 1 year ago

Congratulations!

Posted by prokhozhyj over 1 year ago

I want to highlight this discussion in the iNat-Forum: https://forum.inaturalist.org/t/possible-increase-in-cv-errors-around-organism-range-location/34411

I for my part noticed (without being aware of a new model released) less accurate suggestions lately, with especially the 'seen nearby' species disappeared - maybe the new model is putting regions with fewer observations at a disadvantage, and favors especially North American species?

Posted by carnifex over 1 year ago

Good!

A question: is it possible, for the future, to use the shared computing power of users (those who voluntarily make themselves available) to train the entire model on the dataset, as is done, I believe, in astronomy, to manage large masses of data?

As reference: https://boinc.berkeley.edu/

Posted by valentino_traversa over 1 year ago

Nice and important improvement! Congratulations!

Posted by valeriosbordoni over 1 year ago

Please - can we have a blog post, or just a link to explore - for the 5K new species?

How many for Africa? How many plants?
(With those gifted graphics you have brought us before?)

Posted by dianastuder over 1 year ago

Excellent!

Posted by wildlife13 over 1 year ago

That's great news! I would also like to know a bit more about those new additions, if this is possible.

Posted by ajott over 1 year ago

This is fast!

Posted by sedgequeen over 1 year ago

This is great! iNaturalist is truly the perfect example of how technology and nature can come together to create something amazing! I would love to see a list of what species have been given the honor of being added! I hope they were a lot of hexapods..! But I don't wish to pressure anyone :)

Posted by timo27 over 1 year ago

I know at least one new CV-species 😄

Posted by carnifex over 1 year ago

Thank you for the update!

Posted by silaseckhardt over 1 year ago

Have the "Included" labels on species "About" pages been updated?

Posted by dan_johnson over 1 year ago

Thanks so much! Been waiting for these small updates that help a ton!

Posted by yayemaster over 1 year ago

https://www.inaturalist.org/taxa/260419-Panopeus-herbstii meets the 100 observation mark but isn’t included.

Posted by yayemaster over 1 year ago

HELP!! I have been searching for "how to's" on this site - and while the drop down menu says "video tutorials" and other invitations, when you open it up it says "this page does not exist". I am so grateful for this tool but after trying to use it for a few years, I still am not proficient and would sincerely appreciate being able to learn from somebody who is proficient. THANKS again for this amazing tool and all your work.

Posted by kimnoreen over 1 year ago

Could we please get the new species list? It might help people see any misidentifications that they may have had.

Posted by yayemaster over 1 year ago

Yes the new species list would be fascinating. :)

Posted by wildlife13 over 1 year ago

https://www.inaturalist.org/taxa/260419-Panopeus-herbstii meets the 100 observation mark but isn’t included.

In April, when the export for this model was created and training was started, there were only 96 verifiable observations and only 28 research grade observations of this taxon, so it's possible that it was under one of the taxon cutoffs.

Posted by alexshepard over 1 year ago

I for my part noticed (without being aware of a new model released) less accurate suggestions lately, with especially the 'seen nearby' species disappeared - maybe the new model is putting regions with fewer observations at a disadvantage, and favors especially North American species?

@carnifex - the new model was released just a few hours before this blog post was published, so none of the observations mentioned in that forum post would have been affected by the new model.

Posted by alexshepard over 1 year ago

A question: is it possible, for the future, to use the shared computing power of users (those who voluntarily make themselves available) to train the entire model on the dataset, as is done, I believe, in astronomy, to manage large masses of data?

@valentino_traversa - unfortunately, computer vision training is not (to my knowledge) modular and granular the way that many scientific computing jobs are.

Posted by alexshepard over 1 year ago

Have the "Included" labels on species "About" pages been updated?

@dan_johnson - yep they are automatically updated when the model goes live.

Posted by alexshepard over 1 year ago

Whoop whoop!

Posted by muir over 1 year ago

Awesome stuff y'all! Thanks for sharing!

Posted by tristonli over 1 year ago

Great stuff, folks!

Posted by radrat over 1 year ago

Great stuff, and seems like a smart strategy!

Posted by deboas over 1 year ago

Thanks once again for everyone's hard work!

Posted by susanhewitt over 1 year ago

Magnificent! Wonderful! Amazing! :) Always love to hear about these updates.

Posted by sambiology over 1 year ago

Nice! Is there a way to see all the new species included in this model?

Posted by torgos216 over 1 year ago

I'm happy to see the new ants added, hopefully even more will come with the next round as IDs continue going through.

Posted by arman_ over 1 year ago

Geographical inclusion is necessary. Many times the computer suggestions are faulty because the species is not found in that region. I think that should be the way to go.
Anyway, great progress so far. Congratulations.

Posted by satishnikam over 1 year ago

Thanks, and the plain-language explanation of transfer learning is appreciated!

Posted by janetwright over 1 year ago

@alexshepard @valentino_traversa
I would be happy to contribute some processing power to distributed deep learning as well.

It seems like someone has worked on that topic. Here is a paper from 2021 I found: https://arxiv.org/pdf/2103.08894.pdf

Posted by hedaja over 1 year ago

Are males and females of dimorphic taxa learned separately?

Posted by trichopria over 1 year ago

@trichopria, no they are not, and neither are egg, larva, pupa, and adult of insects that have complete metamorphosis.

Posted by susanhewitt over 1 year ago

Nor are flowers, seeds, leaves, trunks, tubers, etc. - but it is an interesting idea for the future!

Posted by deboas over 1 year ago

That's good to hear. The computer learning is one of the factors that keeps me motivated to put in so many hours photographing and editing photos to provide the best photos I possibly can for the learning models. I am always curious if anyone else puts in as many hours as I do every day to get the best possible taxon photos.

Posted by royaltyler over 1 year ago

Although I suspect most of us who post thousands of photos do sometimes crop our pictures to make identification easier, sounds like you, @royaltyler , do a much better and more consistent job of making the photos as good as possible. That's great!

Posted by sedgequeen over 1 year ago

Super good news! Transfer learning FTW!

Posted by dgilperez over 1 year ago
