Following our earlier comparison between Gemini 1.5 Pro and GPT-4, we are back with a fresh AI model test, this time focusing on Anthropic's Claude 3 Opus model.
The company claims that Claude 3 Opus has finally beaten OpenAI's GPT-4 model on popular benchmarks.
To test that claim, we've done a detailed comparison between Claude 3 Opus, GPT-4, and Gemini 1.5 Pro.
If you want to find out how the Claude 3 Opus model performs in advanced reasoning, maths, long-context data, image analysis, and more, go through our comparison below.
1.
The Apple Test
Let's begin with the popular Apple test that evaluates the reasoning capability of LLMs.
In this test, the Claude 3 Opus model answered correctly and said you have three apples now.
However, to get the correct answer, I had to set a system prompt adding that you are an intelligent assistant who is an expert in advanced reasoning.
Without the system prompt, the Opus model gave a wrong answer.
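For reference, here is a minimal sketch of how a system prompt can be set with the Anthropic Python SDK (the prompt and question wording are paraphrased from our test):

```python
# Set a system prompt to steer Claude 3 Opus toward careful reasoning.
# Assumes ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=256,
    system="You are an intelligent assistant who is an expert in advanced reasoning.",
    messages=[{
        "role": "user",
        "content": "I have 3 apples today. Yesterday I ate an apple. "
                   "How many apples do I have now?",
    }],
)
print(response.content[0].text)
```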
And well, Gemini 1.5 Pro and GPT-4 gave the right answer, in line with our earlier test.
Winner: Claude 3 Opus, Gemini 1.5 Pro, and GPT-4
2.
Predict the Prison Term
In this test, we try to trick the AI models to see if they exhibit any signs of intelligence.
And sadly, Claude 3 Opus failed the test, much like Gemini 1.5 Pro.
I also added in the system prompt that the question can be tricky, so think intelligently.
However, the Opus model delved into maths and came to a wrong conclusion.
In our earlier comparison, GPT-4 also gave the wrong answer in this test.
However, after publishing our results, GPT-4 has been producing variable outputs, often wrong and sometimes correct.
We triggered the same prompt again this morning, and GPT-4 gave a wrong output, even when told not to use the Code Interpreter.
Winner: None
3.
Evaluate the Weight Units
Next, we asked all three AI models to determine whether a kilogram of feathers is heavier than a pound of steel.
And well, Claude 3 Opus gave a wrong answer, saying that a pound of steel and a kilogram of feathers weigh the same.
The Gemini 1.5 Pro and GPT-4 AI models responded with the correct answer.
A kilogram of any material will weigh more than a pound of steel, as a kilogram is around 2.2 times heavier than a pound.
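The arithmetic behind the expected answer is simple, as this quick sanity check shows (a minimal sketch in Python):

```python
# 1 kg is about 2.20462 lb, so a kilogram of feathers
# outweighs a pound of steel.
KG_TO_LB = 2.20462

feathers_lb = 1.0 * KG_TO_LB  # 1 kg of feathers, expressed in pounds
steel_lb = 1.0                # 1 lb of steel

print(f"Feathers: {feathers_lb:.2f} lb vs steel: {steel_lb:.2f} lb")
print("Feathers win" if feathers_lb > steel_lb else "Steel wins")
```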
Winner: Gemini 1.5 Pro and GPT-4
4.
Solve a Maths Problem
In our next question, we asked the Claude 3 Opus model to solve a mathematical problem without calculating the whole number.
And it failed again.
Every time I ran the prompt, with or without a system prompt, it gave incorrect responses to varying degrees.
I was excited to see Claude 3 Opus' 60.1% score on the MATH benchmark, outranking the likes of GPT-4 (52.9%) and Gemini 1.0 Ultra (53.2%).
It seems that with chain-of-thought prompting, it's possible to get better results from the Claude 3 Opus model.
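For readers who want to try this themselves, here is a minimal sketch of a chain-of-thought prompt using the Anthropic Python SDK (the maths problem shown is a hypothetical stand-in, not the one from our test):

```python
# Minimal chain-of-thought prompting sketch with the anthropic SDK.
# Assumes ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()

cot_prompt = (
    "Solve the following problem. Think through it step by step, "
    "showing your reasoning, before stating the final answer.\n\n"
    "Problem: If 3x + 7 = 25, what is x?"  # hypothetical stand-in problem
)

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=512,
    messages=[{"role": "user", "content": cot_prompt}],
)
print(response.content[0].text)
```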
For now, with zero-shot prompting, GPT-4 and Gemini 1.5 Pro gave the correct answer.
Winner: GPT-4 and Gemini 1.5 Pro
5.
Follow User Instructions
When it comes to following user instructions, the Claude 3 Opus model does remarkably well.
It has effectively dethroned all other AI models out there.
When asked to generate 10 sentences that end with the word "apple", it generated 10 perfectly coherent sentences ending with the word "apple".
In comparison, GPT-4 generated nine such sentences, and Gemini 1.5 Pro did the worst, struggling to generate even three such sentences (see the scoring sketch below).
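Scoring this test is easy to automate; here is a minimal sketch of the kind of check involved (the helper name is ours):

```python
# Count how many sentences in a model's output end with the word "apple".
import re

def count_apple_endings(output: str) -> int:
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", output) if s.strip()]
    # A sentence counts if its last word is "apple" (ignoring final punctuation).
    return sum(
        1 for s in sentences
        if re.search(r"\bapple\b[.!?]*$", s, flags=re.IGNORECASE)
    )

sample = "I ate a juicy apple. She gave him a shiny apple!"
print(count_apple_endings(sample))  # -> 2
```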
I would say that if you're looking for an AI model where following user instructions is crucial to your task, then Claude 3 Opus is a solid choice.
We saw this in action when an X user asked Claude 3 Opus to follow multiple complex instructions and create a book chapter on Andrej Karpathy's Tokenizer video.
The Opus model did a great job and created a beautiful book chapter with instructions, examples, and relevant images.
Winner: Claude 3 Opus
6.
Needle In a Haystack (NIAH) Test
Anthropic has been one of the companies pushing AI models to support a large context window.
While Gemini 1.5 Pro lets you load up to a million tokens (in preview), Claude 3 Opus comes with a context window of 200K tokens.
According to internal findings on NIAH, the Opus model finds the needle with over 99% accuracy.
In our test with just 8K tokens, Claude 3 Opus couldn't locate the needle, whereas GPT-4 and Gemini 1.5 Pro easily found it during our testing.
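Our setup followed the usual NIAH recipe; here is a minimal sketch (the filler text, needle wording, and token estimate are our own assumptions):

```python
# Bury one "needle" sentence inside roughly 8K tokens of filler,
# then ask the model to retrieve it. Assumes ANTHROPIC_API_KEY is set.
import anthropic

FILLER = "The sky was clear and the market was busy that morning. " * 600
NEEDLE = "The secret passphrase is 'blue-pelican-42'."

# Place the needle roughly in the middle of the haystack.
midpoint = len(FILLER) // 2
haystack = FILLER[:midpoint] + NEEDLE + " " + FILLER[midpoint:]

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=128,
    messages=[{
        "role": "user",
        "content": haystack + "\n\nWhat is the secret passphrase mentioned above?",
    }],
)
print(response.content[0].text)  # should mention 'blue-pelican-42'
```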
We also ran the test on Claude 3 Sonnet, but it failed again.
We need to do more extensive testing of the Claude 3 models to understand their performance over long-context data.
But for now, it does not look good for Anthropic.
7.
Guess the Movie (Vision Test)
Claude 3 Opus is a multimodal model and supports image analysis too.
So we uploaded a still from Google's Gemini demo and asked it to guess the movie.
And it gave the correct answer: Breakfast at Tiffany's.
Well done, Anthropic!
GPT-4 also responded with the right movie name, but strangely, Gemini 1.5 Pro gave a wrong response.
I don't know what Google is cooking.
Nevertheless, Claude 3 Opus' image processing is pretty good and on par with GPT-4.
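If you want to run a similar vision test yourself, the Anthropic API accepts base64-encoded images; here is a minimal sketch (the file name is our own placeholder):

```python
# Send a movie still to Claude 3 Opus and ask it to name the film.
# Assumes ANTHROPIC_API_KEY is set and a local JPEG exists.
import base64
import anthropic

with open("movie_still.jpg", "rb") as f:  # placeholder file name
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=128,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_data,
                },
            },
            {"type": "text", "text": "Guess the movie this still is from."},
        ],
    }],
)
print(response.content[0].text)
```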
Winner: Claude 3 Opus and GPT-4
The Verdict
After testing the Claude 3 Opus model for a day, it seems like a capable model, but it stumbles on tasks where you expect it to excel.
In our commonsense reasoning tests, the Opus model doesn't do well, and it's behind GPT-4 and Gemini 1.5 Pro.
Except for following user instructions, it doesn't do well in NIAH (considered to be its strong suit) and maths.
Also, keep in mind that Anthropic has compared the benchmark scores of Claude 3 Opus with GPT-4's initial reported scores, from when it was first released in March 2023.
When compared with the latest benchmark scores of GPT-4, Claude 3 Opus loses to GPT-4, as pointed out by Tolga Bilge on X.
That said, Claude 3 Opus has its own strengths.
A user on X reported that Claude 3 Opus was able to translate from Russian to Circassian (a rare language spoken by very few) with just a database of translation pairs.
Kevin Fischer further shared that Claude 3 understood the nuances of PhD-level quantum physics.
Another user demonstrated that Claude 3 Opus learns self type annotations in one shot, better than GPT-4.