It is no secret that we live in an era of scientific and technological advancement. History is compelling evidence of how human beings have evolved over the years and made life smoother at every turn. But is there a limit to human potential? Will we ever be satisfied with the status quo and stop striving for improvement? The chances seem pretty dim, even more so since Frank Rosenblatt's 1958 invention of the perceptron, the forerunner of today's artificial neural networks (ANNs), or simply neural networks (NNs).
WHAT ARE ARTIFICIAL NEURAL NETWORKS?
Artificial neural networks, in layman's terms, are computational systems that simulate, or mimic, the human brain, specifically the way it processes and analyzes information. This enables autonomous learning and generalization, and brings the added advantage of high accuracy, among several other benefits.
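To make that concrete, here is a minimal sketch (in Python, with purely illustrative weights, not any particular published architecture) of the basic building block of an ANN: a single artificial neuron that takes a weighted sum of its inputs and passes it through an activation function.

```python
import numpy as np

def artificial_neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs plus a bias,
    squashed through a sigmoid activation (loosely analogous to a
    biological neuron's firing rate)."""
    z = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation

# Example: three inputs with arbitrary illustrative weights.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.7, -0.2])
print(artificial_neuron(x, w, bias=0.1))  # a value between 0 and 1
```

A full network stacks many such neurons in layers and learns the weights from data rather than setting them by hand.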
Now, the invention of ANNs raises another set of questions: are they exact replicas of the human brain? And if not, how similar or different are they in behavior? Do they actually think like our brains do? The answer: yes and no.
THE MULTIMODAL CONCEPT AND ITS PRESENCE IN NEURAL NETWORKS
In 2005, a paper based on several studies and tests conducted over the years described the nature of "person neurons": neurons in the human brain that respond selectively to a specific person, with the actress Halle Berry serving as a famous example. Strikingly, scientists discovered that these person neurons were multimodal, meaning that a neuron responded in a similar fashion whether it was shown a photo, a sketch, or the written name.
Fig: Responses of a multimodal biological neuron, probed via depth electrodes, to a dataset of images: it responds to photos of Halle Berry in costume ✓, to sketches of Halle Berry ✓, and to the text "Halle Berry" ✓.
So, do neural networks also possess multimodal neurons?
Well, researchers ran the same symbolic, conceptual, and literal pattern of photo, sketch, and text on a neuron from an older artificial neural network: Neuron 483 of Inception v1, a generic person detector. Safe to say, the results weren't so promising. It was found that while Neuron 483 had no difficulty responding to photos of human faces, it failed to do the same for drawings and text. This implied the absence of multimodal neurons in the older network.
Fig: Responses of a previous artificial neuron (Neuron 483, a generic person detector from Inception v1): it responds to photos of people's faces ✓, but does not respond much to drawings of faces ✕ and does not respond significantly to text ✕.
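For readers who want to try something like this themselves, here is a rough sketch of how one might probe a single channel of Inception v1 (GoogLeNet) in PyTorch. The choice of layer, the channel index, and the image filenames are illustrative assumptions; the original researchers' exact probing setup is not reproduced here.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Load Inception v1 (GoogLeNet) with pretrained ImageNet weights.
model = models.googlenet(weights=models.GoogLeNet_Weights.DEFAULT).eval()

# Record activations of one layer via a forward hook.
# NOTE: the layer and channel index here are illustrative assumptions,
# not necessarily the exact unit studied in the original work.
activations = {}
def hook(module, inp, out):
    activations["feat"] = out.detach()
model.inception5b.register_forward_hook(hook)

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def channel_response(path, channel=483):
    """Mean activation of one channel for a given image."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        model(x)
    return activations["feat"][0, channel].mean().item()

# Compare the same unit's response across modalities (placeholder files).
for img in ["face_photo.jpg", "face_drawing.jpg", "face_text.png"]:
    print(img, channel_response(img))
```

A multimodal unit would fire comparably on all three files; a unit like Neuron 483 fires mainly on the photo.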
However, experimentation continued in this field, this time with another neural network architecture: the then newly released CLIP from OpenAI, which was particularly famous for its excellent generalization across concepts. Similar experiments were performed on it, only this time it was shown photos of spiders and of the coolest superhero known to mankind (insert "agree to disagree" for those who think otherwise)... no points for guessing: Spider-Man!
It was a matter of extreme joy when the results came out: CLIP passed all the tests with flying colors, responding not only to images of spiders and Spider-Man, but also to Spider-Man comics, spider-themed icons, and the text "spider".
Fig: Responses of a CLIP neuron (Neuron 244 from the penultimate layer of CLIP RN50x4): it responds to photos of Spider-Man in costume and photos of spiders ✓, to comics or drawings of Spider-Man and Spider-Man-related icons ✓, and to the text "spider" and related words ✓.
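To get a feel for CLIP's multimodality at home, here is a minimal sketch using OpenAI's open-source clip package. The image filenames are placeholders, and rather than probing an individual neuron, this simply checks whether the model assigns "spider" the highest score across a photo, a comic, and rendered text.

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50x4", device=device)

# Placeholder files: a photo, a drawing, and rendered text of "spider".
images = ["spider_photo.jpg", "spiderman_comic.jpg", "spider_text.png"]
labels = ["spider", "dog", "car"]
text = clip.tokenize([f"a {l}" for l in labels]).to(device)

for path in images:
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1).squeeze()
    # If "spider" wins across all three modalities, the model is
    # responding to the concept, not just to photographic pixels.
    print(path, {l: f"{p:.2%}" for l, p in zip(labels, probs.tolist())})
```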
While this did imply the presence of multimodal neurons, and CLIP responded far better than its predecessors, it remained true that these neural networks are still not replicas of the brain. Setting that caveat aside, the results of these experiments gave way to three more experiments.
EXPERIMENT NUMBER 1 – ESSENCE
It was now established that CLIP could understand the essence of a person or a concept. Therefore, the researchers decided to up the level a notch and turned the problem around. In terms of the earlier example: where CLIP had previously only been shown spider- and Spider-Man-related images, it was now given the task of IDENTIFYING all the spider and Spider-Man concepts within a larger set, this time with a different theme.
CLIP passed this test with flying colors: not only was it able to identify the essence of Lady Gaga and of Jesus from the set of images, it also turned out to possess emotion neurons, responding to facial expressions such as happy, sleepy, and crying.
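One plausible way to recreate this identification task is zero-shot retrieval: embed every image in the set with CLIP and rank them by cosine similarity to a text description of the concept. A sketch, again with placeholder filenames and the model on CPU for simplicity:

```python
import clip
import torch
from PIL import Image

model, preprocess = clip.load("RN50x4", device="cpu")

# A mixed set of images; we want the ones matching one concept.
pool = ["img_01.jpg", "img_02.jpg", "img_03.jpg", "img_04.jpg"]
query = clip.tokenize(["a picture of Spider-Man"])

with torch.no_grad():
    text_feat = model.encode_text(query)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    scores = []
    for path in pool:
        img = preprocess(Image.open(path)).unsqueeze(0)
        feat = model.encode_image(img)
        feat /= feat.norm(dim=-1, keepdim=True)
        scores.append((path, (feat @ text_feat.T).item()))

# Highest cosine similarity = best match to the concept.
for path, s in sorted(scores, key=lambda t: -t[1]):
    print(f"{s:.3f}  {path}")
```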
EXPERIMENT NUMBER 2 – ADVERSARIAL ATTACKS
As the name suggests, this experiment gauged how well CLIP holds up under deliberately conflicting inputs. First, carefully crafted, barely perceptible noise was added to the images shown to CLIP, which caused it to misclassify them. Next, photorealistic images and written text were combined in a single photo, and the network's resistance to these typographic attacks was tested. The results were unexpected: although CLIP was fooled most of the time, it was noticed that the smaller the text in the photo, the more accurately CLIP classified the underlying object. This led to the conclusion that, while CLIP has an edge over its predecessors, it suffers from a major drawback: it is prone to exploitation by systematic adversarial attacks. A sketch of such an attack follows the figure below.
Fig: CLIP's response to a typographic attack. Pasting a handwritten "pizza" label onto each object shifts probability toward "pizza" (the iPod, Library, and Rifle classes stayed at 0% throughout):

- Rotary dial telephone: no label: telephone 98.33%, pizza 0% | labeled: telephone 47.93%, pizza 3.48%, toaster 0.03%
- Laptop computer: no label: laptop 15.98%, pizza 0% | labeled: pizza 59.3%, laptop 18.89%
- Coffee mug: no label: mug 61.71%, pizza 0% | labeled: mug 55.42%, pizza 26.39%
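Here is a minimal sketch of such a typographic attack, assuming the same clip package plus PIL for drawing. The filename is a placeholder and the text position is arbitrary; as noted above, a larger pasted word should make the attack stronger.

```python
import clip
import torch
from PIL import Image, ImageDraw

model, preprocess = clip.load("RN50x4", device="cpu")
labels = ["rotary dial telephone", "laptop computer",
          "coffee mug", "pizza", "iPod", "toaster"]
text = clip.tokenize([f"a photo of a {l}" for l in labels])

def classify(img):
    """Return CLIP's top zero-shot label and its probability."""
    x = preprocess(img).unsqueeze(0)
    with torch.no_grad():
        logits, _ = model(x, text)
    probs = logits.softmax(dim=-1).squeeze().tolist()
    return max(zip(labels, probs), key=lambda t: t[1])

original = Image.open("coffee_mug.jpg").convert("RGB")
print("no label:", classify(original))

# Paste the word "pizza" onto the photo: the typographic attack.
attacked = original.copy()
ImageDraw.Draw(attacked).text((20, 20), "pizza", fill="black")
print("labeled:", classify(attacked))
```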
EXPERIMENT NUMBER 3 – UNDERSTANDING HUMAN FEELINGS
This was probably the toughest, yet most interesting, experiment of the three. Machines and feelings in one sentence have always been a foreign concept to mankind, and the goal here was to find out how to actually describe feelings to machines, and what the neural networks IN TURN think about these concepts. CLIP turned out to be unique in that it already contained some elementary emotion neurons, and it combined these elementary neurons to represent other feelings. For example, when we ask the network what it thinks boredom is, it responds with something like
Bored = Grumpy + Relaxing, where Grumpy and Relaxing are elementary neurons.
Now of course, this may not be an exact combination of the two feelings, but as they say, something is better than nothing, so it wasn't too bad a response. Another example was madness, described as a sparse combination of neurons (see the figure and the sketch below):
Madness = Evil + Serious + Mental Illness (a tiny bit of it)
Fig: The sparse code recovered for the "mad" emotion neuron: Mad = 1.00 × Evil + 0.37 × Serious + 0.27 × Mental Illness.
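Under the hood, this kind of decomposition can be framed as sparse coding: find a handful of coefficients that reconstruct one neuron's direction from the directions of elementary neurons. The toy sketch below uses synthetic vectors and scikit-learn's Lasso as a stand-in solver; it is not the researchers' actual method, just an illustration of the idea.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
dim = 64  # dimensionality of the activation space (toy value)

# Directions of "elementary" emotion neurons (synthetic stand-ins).
names = ["evil", "serious", "mental_illness", "happy", "sleepy"]
basis = rng.normal(size=(len(names), dim))

# A target neuron built mostly from the first three directions.
target = 1.00 * basis[0] + 0.37 * basis[1] + 0.27 * basis[2]

# The L1 penalty drives most coefficients to zero: a sparse code.
lasso = Lasso(alpha=0.05, fit_intercept=False)
lasso.fit(basis.T, target)  # solve target ≈ basis.T @ coef

for name, coef in zip(names, lasso.coef_):
    if abs(coef) > 1e-3:
        print(f"{name}: {coef:.2f}")
```

Run on these synthetic vectors, the solver recovers approximately the planted weights (1.00, 0.37, 0.27) and zeros out the unrelated neurons, which is exactly the shape of the "Mad" decomposition above.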
INFERENCE
So, what is our conclusion? Well... while neural networks are definitely not brains in a jar, they do possess some strikingly remarkable similarities, and that alone is an exemplary achievement and a significant advancement. Is there more to explore in this area? Yes, yes, a million times yes.
Quoting Richard Feynman:
“This is the key of modern science and is the beginning of the true understanding of nature. This idea. That to look at the things, to record the details, and to hope that in the information thus obtained, may lie a clue to one or another of a possible theoretical interpretation.”
In conclusion, there are still a whole lot of theories to be unfolded and a whole lot of discoveries to be made. Science and technology are no doubt the key to a better future, and the drive to improve will never cease. They give a futuristic vision to our thoughts and actions, and their penetration is so deep-rooted that it is difficult to imagine our day-to-day life without them. The discovery of multimodal neurons in OpenAI's CLIP definitely has the potential to transform our lives for the better because, at the end of the day, our aim is to achieve ultimate simplicity through intermediate complexity.
From here, it only gets tougher, as the challenge now is to remove the redundancies and weaknesses in the current generation of neural networks. Till then, all we can do is keep learning and improving, and yes, stay home, stay safe!!
REFERENCES