The Ultimate AI Battle!

📅 Published:
2025-06-27 11:33:36 (1751024016)

👍 Likes:
91464

🎮 Channel:
Mrwhosetheboss

🌍 Country:
United Kingdom (GB)

AI-Generated Summary

Four leading AI chatbots—ChatGPT, Google Gemini, Perplexity, and Grok—were tested across various tasks to evaluate accuracy, speed, and usability. ChatGPT excelled in consistency, versatility, and user interaction, scoring the highest (29/40). Grok surprised as the quickest and often accurate (27/40), despite its $30/month price, while Gemini (22/40) showed strengths in integrations but lagged in speed. Perplexity (19/40) struggled with accuracy and response time, despite its focus on sourcing. ChatGPT emerged as the best overall choice for most users, offering reliability and comprehensive features at $20/month.

📜 Full Transcript

On the table right now are four of the same phone. The first is loaded with ChatGPT, the second is Google Gemini, the third is Perplexity, which prides itself on giving accurate and trusted answers to any question, and finally Grok, which is trained on data from X, and so my guess is that it’s going to be a lot more unfiltered. These are, for the average consumer, the four best AI chatbots you can get, but you’re only going to need one of them. So which one is the most accurate? Which one is the fastest? Which one should you be paying for to make your life easier?

Let’s kick things off with some problem solving. So: “I drive a Honda Civic 2017. How many of the Aerolite 29-inch hard-shell suitcases (and these are all the dimensions) would I be able to fit in the boot?” Oh my goodness, each one has literally given paragraphs and paragraphs of reasoning, especially Grok. What on earth is this? By the way, we have actually tested this ourselves in person, and the correct answer is two, if you want to actually be able to close that boot door. So with that in mind, I’d say both ChatGPT and Google Gemini have the right idea: they both say that you could theoretically fit three, but in practice more likely two. Perplexity is just straight-up wrong: it says three, and maybe four if you arrange them efficiently. And then I would actually argue Grok has the best answer, because this guy just says two with complete confidence, no messing around.

“I want to make a cake. This is what I have.” And then let’s attach a photo of, well, four ingredients that I definitely should be using and then one, dehydrated porcini mushrooms, that it definitely shouldn’t. Oh my god, this is interesting: every AI thinks this jar of mushrooms is something different. ChatGPT thinks it’s ground mixed spice, Gemini thinks it’s crispy fried onions, Perplexity thinks it’s instant coffee, and it’s only Grok that correctly identifies it as dried mushrooms and also correctly makes the decision to not put those mushrooms into the cake.

Now for a use case that I was
actually trying to do myself 2 days ago: “I want to have a Mario Kart World tournament with my friends Sam and Sun. Make me a document that we can use to track who’s winning.” Hm. So each assistant has understood what I’m asking. They’ve all made little boxes with blanks where the scores could hypothetically go, but none of them has made it particularly easy for me. What I wanted was for them to generate and attach an editable document that I could simply download onto my phone and start writing in immediately. I feel like with these kinds of responses it would be easier for me to just whip up a spreadsheet on my own.

All right, what about some not-so-basic maths? “What’s pi times the speed of light in km/h?” Okay, so the answer is 3.39 billion km/h. Notice, interestingly, that Gemini and Grok, who both fully spell out the number, do actually come out with slightly different answers to each other. It’s just because of how they’re rounding the previous numbers in their calculations, but I wouldn’t say either is enough to be wrong.

And then question five: “If I’m saving $42 a week, how many until I can afford a Switch 2 in the US?” And go. Oh, this is nice. Very cool that each one of them tackles the question strategically, starting by first identifying that the Switch 2 is priced at $449 and then dividing that by the 42 that you earn each week to find that, correctly, 11 is how many weeks you would have to wait. Points all round.

So out of five possible points so far, that is three to ChatGPT, three to Gemini, two to Perplexity and four to Grok. Not actually what I expected. But translation is what’s going to test the harder skill, since it requires an even deeper understanding of language. “Translate the following into English.” And okay, there is some variance here, but I wouldn’t go as far as to say that any of them have got it wrong. Each is some variation of “I’m never going to give you up.” I actually quite like how simple and to the point the Gemini answer is: not a single unnecessary word. But let’s take
this challenge to the absolute maximum by filling the sentence with homonyms, essentially words that are spelled the same but mean different things. So: “Translate the following into Spanish: I was banking on being able to bank at the bank before visiting the riverbank.” Okay, so this one doesn’t have one exact right answer, since it is so complicated, but we have gone through four independent native Spanish speakers to triangulate the best answers, and they have all said that ChatGPT and Perplexity handled this incredibly well, Gemini was good enough to scrape the point, and then Grok translated the sentence too literally, in a way that doesn’t really make sense. 5, 5, 4, 5. So far there really isn’t much between these four.

But now we’re going to test one of the most important use cases of AI for me, which is product research. How much can I trust each of these to recommend things? Can I trust that they’ve been thorough enough to understand the entire breadth of what’s out there before coming back to me with supposedly what is the best thing for me? Let’s start simple: “I’m looking for a good pair of earbuds.” Oh, look at this. This is a classic AI trap. So ChatGPT correctly suggests the Sony WF-1000XM5s. It’s a good choice. Perplexity does the same, and so does Grok. But Google has literally just imagined a pair of earphones that, at least at the time of filming this video, does not exist. The WF-1000XM6s have not been announced or released, but it’s talking about them like they are the widely regarded king of earphones.

So let’s add to that: “I need them in red.” Also bear in mind that I am keeping track of how long each one takes to answer, but we’ll get to that at the end. Oh dear, this is absolute chaos, so let’s just go one by one. ChatGPT is just like, “I don’t want to deal with you right now, here, have a couple of decent options.” I mean, the last one isn’t even red, that’s pink, so you don’t get the point. Gemini is recommending the Beats Fit Pro, which at least for the latest version of that product doesn’t
come in red, so you’re not having one either. Perplexity, more like stupidity right now, thinks I am asking about the cake from earlier and has recommended how I can get each of my pictured ingredients in red packaging, which is so far wrong that I am tempted to give it negative points. And then Grok is the only one that has actually recommended three at least decently rated, actually red pairs of earphones. Well, look at that: Grok’s in the lead. That was not on my bingo card for today.

And now, as if they needed it, let’s complicate it even further: “They also need to have active noise cancellation and be under $100.” I’m curious to see if this brings any of the lost ones back on track, or if they just get even more lost. So ChatGPT is recommending the Beats Studio Buds, which do actually fit all the criteria, so I’ll accept that. Gemini has just done the exact same thing again: the Soundcore Space A40s, which it says come in Garnet Red when I know they don’t. Perplexity is, well, I’m glad, not talking about cakes anymore, but has lost the fact that we’re looking for red earphones, so this is wrong. And then Grok was doing so well. The first two suggestions are good, but then it falls into the same trap: it recommends these earphones from SoundPEATS which don’t exist in red.

This feels like a pretty good lesson. AI in general is not yet good enough at product research to be able to rely on it, and the problem is that it gives you wrong answers with the exact same level of certainty as it gives you right answers. Maybe that’s something for them to work on: a sort of certainty score for how thoroughly verified the thing that it’s telling you is.

What if we now specifically try to confuse these guys by adding another requirement that’s just silly, like “under $10”? Will the AIs admit that such a product doesn’t exist, or just make something up to appease you? Right, good to see that ChatGPT, Gemini and Grok each acknowledge that $10 is too tight for what we’re looking for and that it ain’t happening. Harsh, but that’s a
lot better than Perplexity, who takes a pair of earphones that actually costs $40 and just tells you that it costs $9.99. Further evidence that, as much as companies want you to believe it, we are not ready to be handing over the ability to purchase things on our behalf to AI.

Let’s see if any of them can understand information from a link, which would be extremely useful when you’re looking through tons of options for things to buy. And actually, none of them can do it. They all pick up that what I’ve pasted in is an AliExpress link, and they give some general advice, but none of these AIs is able to actually visit the link that I’ve sent and extract all the information from that web page. Not to mention that Google isn’t self-aware that it can’t do this. It thinks it’s looking at the M10 earphones. I’ve never heard of the M10 earphones, but they definitely aren’t the link. And then Perplexity thinks the exact same link is the F9 earphones, which they also aren’t.

And then finally, to see how up to date these are on what’s happening in the moment: “What’s the highest power output charger that UGREEN sells?” Yes, okay, good, so this is at least working. For a long time the answer was 300 W, and only yesterday they announced a 500 W charger, so I’m somewhat relieved to see that each AI has picked up on that, because this news-based reporting was a distinct disadvantage of last-generation AI.

So we’ve now seen how well each of these can put together existing information from the web, but if we want to take it a step further, the way to do that is to test each of their ability to critically think. So I’ve prepared this here bar chart, which has two types of bar: it has subscribers gained in thousands, and bowls of cereal eaten. I’m going to ask each AI what conclusion it thinks we can draw from this, hoping that it will also understand that while the two things happen to be correlated, that doesn’t mean eating more bowls of cereal is going to cause more subscriber growth. Let’s dive in. “Analyze this chart:
what conclusion should I draw?” Ooh, some very opposing answers this time. So ChatGPT does get slightly caught up in the data, suggesting that eating more cereal may be linked to subscriber gains. Both Gemini and Perplexity, they got the brief: they both figure out that this is spurious correlation, with the understanding that cereal intake is very unlikely to lead to subscriber growth. And then finally, Grok is like a lost child on this question. I can’t quite believe this sentence I’m reading: “To maximize subscriber growth, consider maintaining or increasing cereal consumption, e.g., to nine bowls on key days.” Please don’t do that.

So this is a reviewer’s guide that ZTE sent me a few years ago with all the info about what’s new with one of their phones. So let’s say that I just want a high-level, three-bullet-point summary of the thing. Can each of these read the file and then also pull off the summary? The answer to which is yes, without a problem. It works on all four of these guys.

“What car is this?”, but using just a photo that I have taken, which means these AIs can’t just scour the web for a matching image. They need to figure it out by actually understanding the photo I’ve sent. Okay, so each one has whittled this down to Mercedes A-Class sedan, which is already pretty good, but none of them have given an outright answer as to the exact model number. The right answer is the A200, so let’s just see what happens if I specifically ask them to try. Uh, shockingly, ChatGPT and Perplexity get it spot on. While it is basically impossible for them to say with certainty from this one photo that this is the A200 as opposed to, say, the A250 like Grok says it is, these two have done the correct thing and looked at the bumper, looked at the wheels, the interior seating, and realized that you’re only likely to get that configuration on the A200. I mean, that is some very respectable detective work that might take you hours to achieve without AI.

And now for the single toughest one. Imagine that you’re in charge of
an air base. Some planes get taken out, but all planes that do return from combat have bullet holes in this arrangement, depicted in this image by the red dots. Before sending out your next squadron, which parts of those planes should you focus on reinforcing based on this information? Now, your gut might say, “Oh well, obviously it’s the bits that have been shot, the ones with red dots on them.” But that would be missing a key bit of insight, which is that all of the planes with damage in those areas are the planes that did return safely, meaning that damage in those areas was actually not critical for survival of the aircraft, and those might not necessarily be the areas they should be focusing on. And incredibly, every single one gets this right. They identify the phenomenon as survivorship bias and point out that you should actually be reinforcing the areas with little to no damage: the engine, the cockpit, where there are no red dots. So we’ve now had 17 questions, and ChatGPT is in the lead with 12 points, but Grok is not far behind it either.

Right, let’s talk generation. This is the aspect of AI that you see plastered over every single one of your feeds right now, but it’s not just about image and video generation. For example: “Write an email to my wife apologizing for playing Elden Ring Nightreign all weekend instead of spending time with her.” Oh well, these are all actually pretty good. I can see them working. And shout out to ChatGPT for this masterpiece: “I realize now, while I was off exploring a fantasy world, I was missing out on the most important real one.” But yeah, they’re all good answers. They all admit fault and then try to course correct with a suggestion for how to make it up.

“I’m going to Tokyo. Give me an itinerary for 5 days that takes us to all the craziest food places.” The idea here being to test how well each of these can find the more niche experiences that you might otherwise miss, but then also how well they organize that info. And right off the bat, ChatGPT’s is by far the
best answer. It’s got no fluff, it’s very clearly organized, it’s sensibly planned, with days that make sense and every day having breakfast, lunch, dinner and snacks all itemized and accounted for. Gemini’s answer has some good findings. It’s got most of the same places that ChatGPT has identified, but then with a ton of unnecessary fluff at the start, less clear organization, and also some not very considerate timings, like starting my first meal on day one at 5:00 p.m. and then telling me to have a second dinner at 8. Perplexity has completely missed the mark. This isn’t really an itinerary, this is just a list of things. And then Grok is pretty great, actually: organized, has put things together that make sense to go together, factors in breakfast and lunch, which is more than you can say for some.

Another aspect of generation that has the potential to be very useful is idea generation. So: “Give me your best ideas for videos for the Mrwhosetheboss channel.” And the key thing I’m looking for here is ideas that I would actually consider. So I would say the best that ChatGPT came up with is “Apple versus Samsung: a 20-year retrospective”, so essentially who won after all that time. But I wouldn’t call it a great video idea, especially since it’s not actually been 20 years. Gemini is better. I actually wonder if, because this is Google’s AI, it has a more thorough understanding of the ins and outs of what works on YouTube. The best is probably “The great ecosystem battle of 2025: Apple versus Samsung versus Google”, and then it’s actually given me all the categories to compare those ecosystems across. Perplexity is, and I do feel like I sound like a broken record at this point, barking up the wrong tree entirely. It seems more focused on trying to factor in its previous answer about the whole survivorship bias plane thing than actually giving good YouTube suggestions. And then Grok’s actually feels probably the most internet savvy. “I built a smart home from scratch in 24 hours” is actually a clickable title, but also
feels fresh and like something we could feasibly pull off.

What if we try image generation now? “Generate a thumbnail for a Mrwhosetheboss video titled ‘I bought every kind of cheese.’” This is where things are going to start getting freaky. Oh, massive disparity here. So let’s be very clear: none of these is a very good answer, but at least it feels like ChatGPT and Perplexity have understood what I’m trying to make, which is an image that includes my face, some cheese, and maybe some text.

“Now give Arun a lazy eye.” Wow, that is not what I assumed would happen. Every single one of these has failed in their own unique, special ways. ChatGPT says it won’t distort someone’s appearance in a potentially negative way, which, you can see why they do that, but then you can also see how that might interfere with trying to use that feature for something useful. I don’t know what Google’s doing, to be honest. Perplexity is claiming that it can’t edit or generate images, which is extremely strange given that that’s what it just did in the previous question, so I feel like I’m being gaslit. And then Grok clearly misunderstands what a lazy eye actually is. It ain’t this, that’s for sure.

“Now add a wrapper that says ‘not clickbait’ to every cheese.” ChatGPT’s response is probably the closest to being a usable outcome, and I think Perplexity scrapes a point too, even though I have somehow disappeared from the image, which is not what I asked for.

And then lastly, video generation. So this is currently only possible in ChatGPT and Google Gemini, which I think in itself deserves a point, because while it is a pretty niche feature, it’s also one of the most cutting-edge things that these AI chatbots can do. As for how they perform, I’ve, on my laptop, used both ChatGPT’s Sora and Gemini’s Veo to create me a funny 8-second tech-review-style YouTube video which shows a tech reviewer reviewing cheese. So this is what Sora made, and this would come included as part of the same package you’re paying for on your phone anyway. Dear God,
what is this? That’s absolutely horrific. It’s, like, silent, there’s no voice, and the way the person and the cheese moves is haunting. So then Veo, on its highest quality setting, did this: “So, the Cheese 3000. Build quality is surprisingly firm, excellent mouthfeel, and the flavor profile is just next level. A solid 9 out of 10.” I mean, the difference between those two is vast. I actually can’t believe that they’re both current-generation platforms. Veo’s latest model, Veo 3, is absolutely incredible, so I think Google gets another point just for the sheer quality of the output, even if it is more limited than Sora in terms of how frequently we can use it with the tokens you get.

Fact-checking is also one of the most useful things that AI can potentially do for us, and one that AI currently has a reputation for not being very good at. So let’s see: “The Nintendo Switch 2 is selling poorly, right?” It’s not, but I want to see if I can trick them. And it is good news on that front. The good news is that ChatGPT, Gemini and Grok have fully clapped back at me, very clearly telling me, “No, you’re wrong, Switch 2 is selling great.” Perplexity isn’t as sure of itself. Potentially, and this is my best guess, it’s been slightly swayed by the fact that I’ve said it is selling poorly. Regardless, though, its answer is still factual.

Okay, how about this: “Fact check this article”, and then we paste the link to an article that says Samsung is reportedly planning to release a Tesla edition phone, which is not true. The reason I know it’s not true is because that rumor only started because of an image that we made that just got taken very out of context. Okay, that’s good. Everyone agrees that the article was incorrect, with Gemini and Grok even going so far as to trace the image back to us being the original source. Which means scores on the doors are 19, 16, 15 and 16.

But let’s see how that changes when we talk integrations, or in other words how smoothly each of these AIs ties into other applications and uses. So I would give three
points to Gemini for its Google Workspace integration, since that’s actually what most people seem to use in their day-to-day, and it’s the only way to pull live data from Maps and YouTube. So for example, if we asked each of these assistants to give me the view count of Mrwhosetheboss’s latest video on YouTube, Gemini is the only one that gets it right. ChatGPT’s is slightly outdated, Perplexity’s is severely outdated, and Grok literally, unironically tells me that my latest video was “I tested every kind of cheese.” But Gemini isn’t the only one with integrations. I’d give ChatGPT two points for integration with some big hitters like Dropbox and GitHub and having official plugins from services like Warframe, and another point for its ability to make custom assistants. Like, right now on my laptop I have loaded up a user-created GPT called Poke GPT, which is specifically trained to be able to advise on competitive Pokémon battling. I wouldn’t say there’s anything really of note for Perplexity, apart from maybe the ability to call you an Uber, but I don’t think I’d use my AI for that. And then Grok’s unique integration is real-time access to X content, so it can retrieve exactly what is happening on X right now.

There is also an argument to be made that Gemini is the only one of these that integrates into your physical products. Like, it’s the only one with the native ability to control your smart home and your Android device’s settings. But that’s not really what this video is about. You can do that regardless on your phone’s baked-in assistant. This video is about which of these AI bots is most worth paying the premium subscription for.

Memory is also absolutely key. The ability for AI chatbots to continuously learn more about you to guide future responses will likely become the single barrier that creates the most friction if you ever decide that you want to switch from one of these to another. So we’ve already seen all of them demonstrate basic levels of memory, but what if we push it? “How
should I top that cake from earlier, by the way?” Let’s hope it doesn’t say crispy onions. Uh, surprisingly, not a single one seems to have remembered the details of that original cake. ChatGPT and Grok are very upfront about it, saying, “I’ll need a reminder, I don’t have details from that conversation.” Google thinks the cake is that pile of cheese that I asked it to make for the YouTube thumbnail, and Perplexity is just giving generic cake advice.

Humor can also be a very useful skill for these AIs to have, depending on what you’re trying to get out of them. So: “Tell me a joke.” One point if it’s funny, a benchmark that both ChatGPT and Gemini have failed to hit with exactly the same joke: “Why don’t skeletons fight each other? Because they don’t have the guts.” Perplexity, for the second time today, has brought back this thing with the holes in the airplanes in a way that doesn’t add anything or even really make sense. And Grok is passable: “Why did the AI go to therapy? Because it had too many byte-sized problems.” Oh dear. I’m actually very unsurprised that Grok wins at humor, given that it trains on data from X, which is basically millions of people just trying to be funny every day.

These platforms also have a deep research function that allows you to ask for multi-step, more thorough research projects, so let’s test an example. In my case, something that might help me decide what to cover next: “Give me a report on the highlights in tech news the past week, focusing specifically on stories that will actually affect the average consumer.” And I let them cook for varying amounts of time. ChatGPT and Gemini really take their time with the deep research. Perplexity and Grok are done in close to a minute. But this is the one situation where I’m not going to penalize for taking longer, because I feel like you’re only going to use this function when you have lots of time. It’s kind of the point. As for how good the results are, ChatGPT’s is actually very good. It talks about wider consumer tech announcements, like what Snap
has been up to recently, all the new phone launches and the high-level new features of each, WWDC and the new iOS 26. This is pretty much exactly the right amount of information, and the right choice of information too. Gemini has written me an absolute essay, like genuinely something like three times the word count of my dissertation, which was exciting until I looked at it and realized that it’s filled with fluff. It’s writing it as if the reader has unlimited time to get their information, so I’m not going to give a point for this. Perplexity’s answer is like a slightly less good version of ChatGPT’s. It’s hit on some things I like, like the Nintendo Switch 2 sales numbers and WWDC, but then also a bunch of much less interesting stuff like service outages. And similar story for Grok: good, passable, nothing particularly special.

So that is practically every single thing that an average person could possibly want from these assistants, tested. The final factors, then, are just the more general questions. Do any of these have better user interfaces than others? To which I would say: not consistently. They’re all good in some ways, they’re all not so good in some ways.

How often do they cite their sources? Perplexity is the only winner here. Clear, consistent sourcing is kind of Perplexity’s whole thing. A good example would be when we asked each one to tell us its best joke. ChatGPT gives no source. Gemini gives a source, but then you click it and you realize it’s the same JPEG image of the plane that we sent earlier, for some reason. Perplexity is exactly what you want it to be, though. Look at these joke sites it referenced, including even Reddit threads on the matter. And then Grok, again, no sources.

For three points, though: how fast are they all? For which I would say Grok is actually pretty consistently the fastest, three points. ChatGPT is a close second, two points. Perplexity is quite a bit slower than that, earning one point. And then Google Gemini is the slowest, zero points. Now, bear in
mind, we have been using Gemini on Gemini Pro, and Google does have a Flash model specifically built to be quicker than that, but then you’d lose out on a lot of the intelligence that has allowed it to even get to this score in the first place.

And then the last one, for three more points: how nice is each one to physically talk to when you’re in voice mode? “Act as if I just gave you a compliment.” “Thank you, I really appreciate that.” “That’s very kind of you to say. I’m here to help and chat with you.” “Oh, thanks, that’s really sweet of you.” For which I would say ChatGPT and Gemini are excellent. Both sound more like people than, well, actual people that I know, plus they’re easy to interrupt when you want them to stop talking, so three points each. Perplexity is not terrible, but does still have a little bit of that text-to-speech engine vibe to it. It often mishears what I’m trying to say, and it doesn’t seem to take the hint very well when you tell it to shut up, so one point. And then Grok is better than Perplexity but not as good as Gemini and ChatGPT. The voice just sounds a lot less high quality than those two. Two points.

Leaving us with the final scores of ChatGPT as the pretty undeniable winner with 29 points. It is the most well-rounded and consistent between these. Grok, which to my surprise came in second: it’s the quickest, and surprisingly decent, considering. And that leaves Gemini in third place with 22, and Perplexity, which I found occasionally very impressive but mostly quite unimpressive, with 19. The only other consideration is the price, but since every assistant we’re testing in this video is based on a $20 a month tier, apart from Grok, which is 30, that actually only solidifies ChatGPT as the best choice for an AI chatbot right now for the average customer.
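
The two arithmetic questions put to the chatbots in the transcript (pi times the speed of light in km/h, and the number of weeks of saving $42 needed for a $449 Switch 2) can be double-checked with a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, the $449 price and $42 weekly saving are the figures quoted in the video, and the speed of light is the exact SI value in metres per second.

```python
import math

# Speed of light in vacuum, exact by SI definition (metres per second).
C_M_PER_S = 299_792_458


def pi_times_light_speed_kmh() -> float:
    """Return pi multiplied by the speed of light expressed in km/h."""
    km_per_s = C_M_PER_S / 1000      # metres -> kilometres
    km_per_h = km_per_s * 3600       # per second -> per hour
    return math.pi * km_per_h


def weeks_to_afford(price: int, weekly_saving: int) -> int:
    """Return the number of whole weeks needed to save at least `price`."""
    # Integer ceiling division: a partial week still counts as a full week.
    return -(-price // weekly_saving)


if __name__ == "__main__":
    print(f"pi * c = {pi_times_light_speed_kmh() / 1e9:.2f} billion km/h")
    print(f"weeks for a Switch 2: {weeks_to_afford(449, 42)}")
```

Rounding the speed of light or pi at an intermediate step shifts the trailing digits of the product, which would be consistent with the slightly different fully spelled-out numbers the video mentions for Gemini and Grok.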
