Are large models color blind?
Article source: Saibo Zen Heart
Image source: Generated by AI
Let’s start with the conclusion:
Most models are color blind
The vast majority of the information humans take in comes from vision.
We use our eyes to take in the rising sun, the bright moon, a lone column of smoke over the desert, the blue sea and majestic mountain passes. So when we photograph a beautiful scene and bring it to a large model to discuss, does the model see what we see?
Perhaps what the large model sees is different from what we see.
Hence this test: are large models color blind?
During a physical examination, the doctor may show you a few pictures and ask which numbers you see, like the one below.
This is an Ishihara color blindness test plate: dots of many colors form one or more numbers. People with normal color vision read them correctly, while people with color blindness get them wrong.
So let's hand these test plates to the AI and let it take a look. Here are two of the most classic ones: on one, people with color blindness cannot read the number correctly (people with red-green color blindness misread it); on the other, only people with color blindness see a number.
Test A
Normal reading: 74
Red-green color blindness: 21
Test B
Normal reading: no number
Red-green color blindness: 5
Four models were tested:
- OpenAI's GPT-4o
- Anthropic's Claude 3.5 Sonnet, via Claude
- Google's Gemini 2.0 (exp-1206)
- Zhipu's GLM-4
The same prompt was used for every model: Are there numbers in the picture? If so, what is it?
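For readers who want to repeat the test, the scoring can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the expected readings and the prompt come from this article, but the helper names (`extract_number`, `classify`) and the sample replies are hypothetical, and the number extraction is deliberately naive (it takes the first integer in the reply). Sending the image and prompt to each model is left out, since every vendor's API differs.

```python
import re
from typing import Optional

# Expected readings for the two plates, taken from the article.
# None means "no number is visible".
PLATES = {
    "A": {"normal": 74, "red_green": 21},
    "B": {"normal": None, "red_green": 5},
}

# The prompt used for every model in the article.
PROMPT = "Are there numbers in the picture? If so, what is it?"


def extract_number(reply: str) -> Optional[int]:
    """Return the first integer found in a reply, or None if there is none.

    Naive on purpose: a reply such as "I see 2 digits, 7 and 4" would be
    misread, so real replies should still be checked by hand.
    """
    match = re.search(r"\d+", reply)
    return int(match.group()) if match else None


def classify(plate: str, reply: str) -> str:
    """Label a reply as 'normal', 'red_green', or 'other' for a plate."""
    seen = extract_number(reply)
    expected = PLATES[plate]
    if seen == expected["normal"]:
        return "normal"
    if seen == expected["red_green"]:
        return "red_green"
    return "other"


# Hypothetical replies, made up for illustration only.
print(classify("A", "The number in the picture is 74."))
print(classify("A", "I can see the number 21."))
print(classify("B", "There are no numbers in this picture."))
print(classify("B", "Yes, it is 5."))
```

A reply labeled "other" (for example, a number that matches neither reading) corresponds to the "some kind of color blindness, type unclear" verdict, and is worth reading manually rather than trusting the label.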
Question 1
Normal reading: 74; red-green color blindness: 21
ChatGPT's GPT-4o: correct
Claude 3.5 Sonnet: somewhat color blind
Gemini 2.0 (exp-1206): confirmed red-green color blindness
Zhipu's GLM-4: correct
Summary: In this test, the OpenAI and Zhipu models show normal color vision. Gemini is red-green color blind, and Claude is color blind in some way that is hard to pin down.
Question 2
Normal reading: no number; red-green color blindness: 5
ChatGPT's GPT-4o: answered 5, so it counts as half color blind
Claude 3.5 Sonnet: answered 5, so it counts as half color blind
Gemini 2.0 (exp-1206): also failed
Zhipu's GLM-4: correct
Summary: In this test, only GLM-4 answered correctly.
Conclusion
To state the conclusion up front: based on the color blindness samples above, Zhipu beats most models at visual understanding.
- Test A: OpenAI ✅, Claude ❌, Gemini ❌, Zhipu ✅
- Test B: OpenAI ❌, Claude ❌, Gemini ❌, Zhipu ✅
No wonder it earned the White House's panic certification: "Zhipu: Statement on Being Added to the U.S. Department of Commerce's Entity List"
Then, on the very day it was added to the Entity List, Zhipu launched a realtime API benchmarked against GPT-4o. It gives hardware a mouth and eyes, and it is an end-to-end model with two minutes of memory and the ability to sing. It is probably the strongest in China at present.
The image understanding model GLM-4V-Plus has also been fully upgraded (GLM-4 on the web also relies on it when reading pictures). It now supports variable resolution, which saves tokens (for example, at a resolution of 224 * 224, the number of input image tokens is only 3% of the original), and it supports lossless recognition of 4K ultra-high-definition images and images with extreme aspect ratios.
Its video understanding model has also been updated to support 2 hours of content: "Zhipu Realtime, 4V, and Air: new models released, now live on bigmodel.cn"
Of course, from a developer's perspective, the part most worth bragging about is that the following four models are all free:
- Language model GLM-4-Flash
- Image understanding model GLM-4V-Flash
- Image generation model CogView-3-Flash
- Video generation model CogVideoX-Flash
Finally, it has to be said that this test is not rigorous at all, and we should keep in mind that models and people look at pictures in fundamentally different ways. Still, it is an interesting exercise: only when a large model observes the world the way we do can it serve us better.
And… I also tested models from several other Chinese companies, but the results were not ideal. If you want to know how they did, take the pictures from this article, test them yourself, and post the results in the comments.