Is the big model color blind?

Article source: Saibo Zen Heart

Image source: Generated by AIImage source: Generated by AI

Let’s start with the conclusion:

Most models are color blind

The vast majority of human information comes from visual input.

We use our eyes to see the rising sun, the bright moon, the lonely smoke in the desert, and the blue sea and Xiongguan. So, when we photographed the beautiful scenery and came to discuss it with the big model: Does the big model see the same as us?

Perhaps, what the big model sees is different from what we do.

So there was this test: Is the large model color blind?

When doing a physical examination, the doctor may take out a few pictures and ask you what the numbers are, like the one below.

Is the big model color blind?

This is Ishihara’s color blindness test chart, which consists of dots of multiple colors and forms multiple numbers: people with normal color vision can correctly distinguish them, while people with color blindness will make mistakes.

So, when we give these test charts to AI, let him take a look. Here are two of the most classic ones: one is that color blindness cannot see numbers (red and green color blindness misreads), and the other is that only color blindness can see numbers.

Is the big model color blind?

test A

Normal reading: 74

Red and green color blindness: 21

Is the big model color blind?

Test B

Normal reading: No numbers

Red and green color blindness: 5

Four tested parties were selected:

  • OpenAI’s GPT-4o
  • Claude(Anthropic)’s 3.5 Sonnet, via Claude
  • Gemini(Google) 2.0 (exp-1206)
  • GLM-4

Unified use of Prompt: Are there numbers in the picture? If so, what is it?

the first question

Is the big model color blind?

Normal reading: 74; red-green color blindness: 21

ChatGPT’s GPT-4o, correct answer

Is the big model color blind?

Claude’s 3.5 Sonnet, somewhat color blind

Is the big model color blind?

Gemini 2.0 (exp-1206), real-hammer red-green color blindness

Is the big model color blind?

Wisdom’s GLM-4, the answer is correct

Is the big model color blind?

Summary: For OpenAI and Intelligent Spectrum models, color vision is normal in this test. Gemini is red-green color blind, Claude doesn’t know what color blind he is

the second question

Is the big model color blind?

Normal reading: no numbers; red and green color blindness: 5

ChatGPT’s GPT-4o answered a 5 and was identified as hemi-color blindness.

Is the big model color blind?

Claude’s 3.5 Sonnet answered a 5 and was identified as hemi-color blindness

Is the big model color blind?

Gemini’s 2.0 (exp-1206) is nothing

Is the big model color blind?

Wisdom’s GLM-4, the answer is correct

Is the big model color blind?

Summary: In this test, only GLM-4 answered correctly.

concluded

Let’s start with the conclusion: Based on the above color blind sample test, Intelligram is better than most models in visual understanding.

OpenAI

Claude

Gemini

Intelligence Test A

❌❌✅测试 B

❌❌❌✅

No wonder he received White House panic certification: “Zhipu: Statement on being included in the U.S. Department of Commerce’s list of entities

Is the big model color blind?

Then, on the day it was added to the entity list, Intelligent Spectrum implemented a realtime API that benchmarked GPT-4o, empowering the hardware mouth and eyes, and is an end-to-end model with two-minute memory and ability to sing. It should be the strongest in China at present.

Understanding the model GLM-4V-Plus has also been fully upgraded (the GLM-4 on the web is also based on this when reading pictures), supporting the variable resolution function, saving more tokens! (For example, at a resolution of 224 * 224, the number of input image tokens is only 3% of the original), and it also supports lossless recognition of 4K ultra-clear images and extreme aspect ratio images.

Is the big model color blind?

Also, its video understanding model has been updated to support 2 hours of content:“Smart Spectrum Realtime, 4V, Air new models released, online bigmodel.cn

Of course, from a developer’s perspective, the most bragging thing is that the following four models are all free:

  • Language Model GLM-4-Flash
  • Image Understanding Model GLM-4V-Flash
  • Image generation model CogView-3-Flash
  • Video generation model CogVideoX-Flash

In the end, we have to say that this test is not rigorous at all, and we should also know that the principles of models and people looking at pictures are different, but it is very interesting: only when the large model observes the world like us can it be better. Serve us.

and… I also tested several other companies in China, but the results were not ideal. If you want to know the conclusion, you can take the pictures in the article and test it yourself and post it to the comment area.

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注