How do we have confidence that something is true? That is a critical question in all things, including the domain world.
Last week in the NamePros Blog, I shared my experience with a new AI-powered appraisal tool at OceanfrontDomains.com. That tool does an incredible job of analyzing the structure of a name, brainstorming potential uses, and presenting the case for the value of the domain name. However, it has become increasingly clear that many of the comparator sales did not exist. I’ve made an update to that article to better reflect the frequency of errors. There is still much to like in the tool, but the level of comparator sales hallucinations in the current release is unsettling.
This article uses 'lie' simply in the sense of "something that is not true," with no implication that there was an intent to deceive, the so-called non-deceptionist viewpoint.
AI Can Lie Convincingly
One problem is that AI can lie really convincingly.
With OceanfrontDomains, you can add conditions to the name in the prompt. If you tell it not to include comparator sales, it won't. So I tried this prompt:

duet.cc only include reliable comparator sales with NameBio references

At first glance, it did follow my directive. The comparator sales were all in the same extension, of similar length, and, most importantly, all gave a NameBio price and date.
Just to be sure, I went to NameBio to check the data. The first comparator sale suggested was love in the .cc extension, selling for $13,500 in 2017. But there is no record of that sale on NameBio! Second on the list, it told me hero sold for $5000, also in 2017. When I checked on NameBio, there is no record of that name ever selling in the .cc extension. Next it suggested that star sold for $6000 in 2018, but it didn't, at least according to NameBio. The next comparator was eco, not a terribly apt comparator for my original query word duet, but it doesn't matter, since there is no sale listing on NameBio. The last comparator name, bet, does have a sale listed on NameBio! But OceanfrontDomains tells me it sold for $6000 with a 2018 NameBio listing, whereas it really has a NameBio-listed sale at $2500 in 2011. So four of the comparator sales were not listed at all, and the fifth gave data inconsistent with NameBio. I found similar results for a few other names that I checked.
Update: In continued testing over the past two days, it appears that OceanfrontDomains appraisals no longer list any comparator sales as NameBio references. This observation is based on a relatively small number of names rechecked, but it appears that none now claim NameBio as the source.
But it is so convincing. The sales prices the tool suggested are believable, and it gives a specific NameBio reference for each. Surely no one would lie about that, since it is so easily checked. But it did lie. Over and over. The danger in AI tools, not just in domains but in everything, is that they can lie so convincingly.
By the way, each run produces a somewhat different result, which is to be expected with AI models, so your results may not be identical to the ones reported above.
AI Lies A Lot – Across All Models
I sought more information on hallucinations, one term for information made up by AI tools. There are many articles that qualitatively state that AI frequently hallucinates, but I wanted to find an actual recent study that looked at multiple AI models.
I recommend that anyone interested in this topic read AI Hallucination: Comparison of the Most Popular LLMs (’25) by Cem Dilmegani at AIMultiple.com. That article defines hallucinations:

Hallucinations happen when an LLM produces information that seems real but is either completely made up or factually inaccurate.
The study investigated AI hallucination rates across 13 different LLMs, including versions of GPT, Claude, Grok, Llama, DeepSeek, Gemini and others. It found that the hallucination rates ranged from 15% to 57%, with GPT 4.5 and Grok 3 among the best, and Gemini and GPT 4o the worst.
Even the ‘best’ LLMs had a disconcerting frequency of hallucinations. Note this was a small research study, limited to 60 questions with each LLM, and using one type of information resource.
The article also discusses the risks associated with hallucinations, how they come about, and steps for mitigation.
Better Accuracy Through RAG Tools
In researching this topic I learned a new term: Retrieval-Augmented Generation (RAG). In a different article, Best RAG tools: Embedding Models, Libraries and Frameworks, Cem Dilmegani writes:

Retrieval-Augmented Generation (RAG) is an AI method that improves large language model (LLM) responses by using external information sources. RAG provides current, reliable facts and lets users trace their origins, boosting transparency and trust in AI.

In the domain context, a RAG system might provide verifiable domain sales data.
Agentic Systems and Hallucinations
The high rate of hallucinations is of particular concern in the move toward autonomous or semi-autonomous agentic systems. Let’s say you have an agentic system ‘operate’ a retail business, making decisions on product lines, pricing, inventory, supply chains, marketing and more.
If part of the system has serious hallucinations, such as making up sales data for particular merchandise lines, disastrous results are possible. The move to agentic systems needs to be slow, with attention to robust systems to minimize and mitigate hallucinations in data.
The NamePros Blog covered agentic systems in the article Agent, Agentic and More: Domain Name Investment Opportunities.
For domain investors, these concerns can also lead to opportunity, though. Will there be demand for domain names suited to verification and accuracy in agentic systems or in AI more generally?
As noted in AI Hallucination: Comparison of the Most Popular LLMs (’25), hallucinations are a particular concern when AI is applied in critical systems such as healthcare, legal, and financial sectors, among others.
What Can We Learn From Science
Most of my career before domains was in science, and I think there are lessons in validation and trust in results from science that could be applied in domains.
Research Details
In science the details of the experiment or research study must be included in the paper. That makes sure that we are clear on exactly what was found, but it also allows another group to replicate the experiment. Sometimes in domain names we are told in a vague way that an experiment supports some result, or shown results without details of the study, or even without the actual numbers.
Statistical Significance
It is easy to be fooled by a result that is really nothing more than noise. I might tell you that my sell-through rate doubled when I did something. But without knowing whether that was going from 1 sale per year to 2, or from 500 to 1000, or how irregular my sales normally are, the statement means nothing.
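The difference between those two "doublings" can be made concrete. As a rough sketch, assuming yearly sale counts follow a Poisson distribution (an illustrative assumption, not a claim about any real portfolio), we can ask how often each doubling would occur by luck alone:

```python
import math

def poisson_prob_at_least(k, lam):
    """P(X >= k) when X is Poisson with mean lam, computed in log space
    so large counts do not overflow floating point."""
    log_terms = (-lam + i * math.log(lam) - math.lgamma(i + 1) for i in range(k))
    cdf = sum(math.exp(t) for t in log_terms)
    return max(0.0, 1.0 - cdf)

# True rate 1 sale/year: seeing 2 or more sales happens about 1 year in 4
# purely by chance, so "my sales doubled" means nothing here.
p_small = poisson_prob_at_least(2, 1.0)

# True rate 500 sales/year: 1000 or more essentially never happens by
# chance, so that doubling reflects a real change.
p_large = poisson_prob_at_least(1000, 500.0)

print(f"1 -> 2 sales:      p = {p_small:.3f}")
print(f"500 -> 1000 sales: p = {p_large:.2e}")
```

The point is not the particular model, but that the same headline claim can be pure noise at one scale and overwhelming evidence at another.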
Peer Review
The heart of scientific validation is peer review. That simply means that prior to a result being published, several researchers who are expert in the field, but without affiliation to the authors, have carefully reviewed the results. While peer review can, and does, make mistakes, it is a critical component of validation.
Replication
Anything important in science will be replicated by multiple independent groups. There is competition in almost any niche, and that is good. Results that do not stand the test of being replicated by others will no longer have status.
Discussion
Almost all papers have a section called Discussion. That is the place where the significance and implications of the work are laid out, along with a balanced look at how the research relates to other results, limitations in the research done, and ideas for next steps. I think we could benefit from full discussion commentary on domain experiments.
Community Review
Following peer review and acceptance, the study is published, and becomes part of the scientific record. Journals properly guard their reputation, making sure to publish only deserving contributions to knowledge. Yes, sometimes things slip through, but most of the time, quality things get published. Contrast this to some ‘theory’ widely shared on social media, possibly starting from a noise coincidence or faulty assumption.
Share Views
Most scientific results get discussed at scholarly meetings, both formally in paper presentations and informally at the event. It would be wonderful if the naming conferences moved to have a component specifically for discussion of research that was at a scholarly level.
NamePros Role
While it is not a true scholarly mechanism in the academic sense, the NamePros community plays an important role in pressing for details, evaluating the significance of claims, sharing results, and discussion. We each play our part in making sure that happens.
Insist Multiple AI Sources Agree
A key part of the science validation process outlined above is that multiple routes support a finding. That is everything from details allowing replication of research studies, to multiple peer reviews supporting publication, to the broader community discussion processes.
I am surprised, given the high hallucination rates, that we do not insist that any AI result be supported by multiple independent paths. For example, if we had two different and independent LLMs each suggesting the same comparator sales, that would give us more confidence. It would seem easy to do this: have an agent that consults two different AI environments and requires consistency before including a result.
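As a minimal sketch of that consistency requirement: the two comparator lists below stand in for parsed responses from two different LLM providers, and the `consensus_comps` helper (a hypothetical name, not any real API) keeps only sales both models agree on:

```python
# Hypothetical sketch: accept a comparator sale only when two independently
# queried models report the same domain at a similar price.

def consensus_comps(comps_a, comps_b, price_tolerance=0.10):
    """Keep comps named by both models, with prices within the tolerance."""
    by_domain_b = {c["domain"]: c for c in comps_b}
    agreed = []
    for comp in comps_a:
        match = by_domain_b.get(comp["domain"])
        if match is None:
            continue  # only one model mentioned it: drop it
        high = max(comp["price"], match["price"])
        if abs(comp["price"] - match["price"]) <= price_tolerance * high:
            agreed.append(comp)
    return agreed

# Stand-in responses from two independent models:
model_a = [{"domain": "love.cc", "price": 13500}, {"domain": "bet.cc", "price": 2500}]
model_b = [{"domain": "star.cc", "price": 6000}, {"domain": "bet.cc", "price": 2600}]

print(consensus_comps(model_a, model_b))  # only bet.cc survives both checks
```

A hallucinated comp is unlikely to be invented identically by two independent models, which is what makes this simple intersection useful.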
For that matter, would it not be trivial for a different AI tool to check the verifiable data? For example, if sales at a venue are listed, check that the data is correct. Could not an AI agent perform the check I did on comparator sales?
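The check I performed by hand is mechanical, which is why it seems so automatable. A sketch, assuming trusted sale records are available to compare against (in practice pulled from a source like NameBio; here a small in-memory table stands in):

```python
# Sketch of automated verification of claimed comparator sales against
# a trusted record set. The table below is illustrative, not real data.

VERIFIED_SALES = {
    "bet.cc": {"price": 2500, "year": 2011},
}

def verify_comp(claim, records=VERIFIED_SALES):
    """Return a verdict string for one claimed comparator sale."""
    rec = records.get(claim["domain"])
    if rec is None:
        return "no sale on record"
    if (rec["price"], rec["year"]) != (claim["price"], claim["year"]):
        return f"mismatch: recorded ${rec['price']} in {rec['year']}"
    return "verified"

claims = [
    {"domain": "love.cc", "price": 13500, "year": 2017},
    {"domain": "bet.cc", "price": 6000, "year": 2018},
]
for claim in claims:
    print(claim["domain"], "->", verify_comp(claim))
```

An agent wired this way would have flagged four of the five comps in my test as having no sale on record, and the fifth as a price/date mismatch.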
Check Anything That Matters
Until that becomes commonplace, when you are using any result, whether AI generated or not, always check independently anything that matters to you.
I am increasingly worried that as a society we are trusting results generated by AI far more than is warranted.
I welcome comments in the discussion below on any aspects of this topic.
Updates:
1. May 5, 2025: I added one line in the introduction to make clear that the term 'lie' is used simply for something not true, with no implication about intent to deceive.
2. May 7, 2025: I have not tested extensively, but there appears to have been a recent change at OceanfrontDomains so that comparator sales are no longer listed as NameBio references. The relevant section has been updated to reflect this.
Special thanks to Cem Dilmegani who wrote a number of articles related to this topic. Two were cited in the article, and you can browse all his recent articles at this link.