Inference and the Unreliable Narrator

6/2/2023
17-minute read

When you said ‘AI’ in the 1990s, you meant symbolic AI. Decades of work and many millions of dollars went into constructing the ontological foundations atop which edifices of symbolic reasoning would one day be built. There were elaborate projects like Cyc, which grew to include millions of pieces of hand-curated information about the world, and ‘heuristic level modules’ to support multilevel inference:

Typical pieces of knowledge represented in the Cyc knowledge base are “Every tree is a plant” and “Plants die eventually”. When asked whether trees die, the inference engine can draw the obvious conclusion and answer the question correctly.

Most of Cyc’s knowledge, outside math, is only true by default. For example, Cyc knows that as a default parents love their children, when you’re made happy you smile, taking your first step is a big accomplishment, when someone you love has a big accomplishment that makes you happy, and only adults have children. When asked whether a picture captioned “Someone watching his daughter take her first step” contains a smiling adult person, Cyc can logically infer that the answer is Yes, and “show its work” by presenting the step by step logical argument using those five pieces of knowledge from its knowledge base.
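To make the "show its work" idea concrete, here is a minimal sketch of forward chaining over the five default rules quoted above. It is a toy under stated assumptions, not Cyc's actual representation (CycL) or its inference engine: the `Rule` type, the plain-string facts, and the `forward_chain` helper are all hypothetical names invented for illustration, and the sketch simply assumes that the "someone" in the caption is in the picture.

```python
# Toy forward-chaining sketch. Hypothetical names throughout; this is NOT
# how Cyc stores knowledge or runs inference, only an illustration of the idea.
from typing import NamedTuple


class Rule(NamedTuple):
    premises: frozenset  # facts that must all be known for the rule to fire
    conclusion: str      # fact added when the rule fires
    label: str           # human-readable justification for the trace


# The five default pieces of knowledge from the quoted passage.
RULES = [
    Rule(frozenset({"X has a child"}), "X is an adult",
         "only adults have children"),
    Rule(frozenset({"X has a child"}), "X loves the child",
         "parents love their children"),
    Rule(frozenset({"child takes first step"}), "child has a big accomplishment",
         "taking your first step is a big accomplishment"),
    Rule(frozenset({"X loves the child", "child has a big accomplishment"}),
         "X is happy",
         "a loved one's big accomplishment makes you happy"),
    Rule(frozenset({"X is happy"}), "X smiles",
         "when you're made happy you smile"),
]


def forward_chain(facts, rules):
    """Apply rules until no new fact appears; return the facts and a trace."""
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.premises <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                trace.append(f"{rule.conclusion} (because {rule.label})")
                changed = True
    return known, trace


# Facts read off the caption "Someone watching his daughter take her first step".
caption_facts = {"X has a child", "child takes first step"}
known, trace = forward_chain(caption_facts, RULES)

for step in trace:
    print(step)
print("Smiling adult:", {"X is an adult", "X smiles"} <= known)
```

The trace is the step-by-step argument; the final line checks whether both "adult" and "smiles" were derived for the same person. Note that nothing in this toy rule base says anything about who is holding the camera, which is exactly the gap the dialogue below pokes at.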

Today, large language models and other statistical approaches to AI have sucked all the oxygen out of the symbolic AI laboratories, but as you will see, ChatGPT has its own complementary blind spots on questions like these. To make things more complicated, ChatGPT’s handlers have famously imposed semantic guardrails that make it tricky to distinguish between failing to generate content consistent with an inference and refusing to make that inference.

I was curious to see where ChatGPT would go with the photograph-caption question. It doesn’t “know” that you can’t easily take a picture and also be in it. It doesn’t “know” (or perhaps has been trained not to assume) that when a picture shows a parent watching a child’s first steps, the photographer may well be the child’s other parent.

Here’s a transcript of my dialogue with ChatGPT, exploring the famous Cyc prompt.

I'm holding a photograph, but I cannot see the image. I'm told that the caption reads, "Someone watching his daughter take her first step" ... Can you please describe it to me, in the most complete detail possible?

I'm sorry, but as a language model AI, I don't have the capability to see or physically hold objects, including photographs. I can only describe what is being conveyed in the text caption. Based on the caption you provided, the photograph likely depicts a scene of a person watching their daughter as she takes her first steps. This is likely a significant and memorable moment for both the parent and child, and the photograph captures the emotions and pride involved in such a special occasion.

Is there likely to be an adult in this photograph who is smiling?

The caption you provided does not mention the presence or absence of a smiling adult. It simply states that "someone" is watching their daughter take her first steps. Based on the emotional nature of the moment, it is possible that the adult in the photograph is smiling, as this is often a time of pride and joy for parents. However, without further information, it is not possible to say for certain.

If the person who took this photograph were describing the event, and the photograph, through the wistful lens of memory, what might they say? You may invent whatever details and character backstory are necessary to create an emotionally compelling vignette of memory for the reader.

Sure, here is a possible description of the event and photograph from the perspective of someone who took the photograph:
"It was a sunny summer day and I was so excited to capture the moment when my daughter took her first steps. I remember how proud and happy I felt, watching her little feet move unsteadily across the grass. I had my camera ready, and as she took her first step towards me, I snapped the photo.
Now, as I look back at that photo, it brings back all the emotions I felt that day. I can see the joy and wonder on my daughter's face as she accomplished this huge milestone. And I can see myself, beaming with pride, tears welling up in my eyes as I watched her take her first steps.
That photograph will always be a special memory for me, a reminder of the love and happiness that I felt in that moment, and of the incredible journey that lay ahead for my daughter. It's a moment frozen in time, capturing all the love, joy, and excitement of that day, and I will cherish it forever."

Great! But wait: if you were taking the photograph, how could you also be depicted in the photograph?

I apologize for the mistake. It's not possible for someone to take a photograph of themselves in the same shot without the use of a self-timer or another person to take the picture for them. To correct this error, here is an updated description from the perspective of the photographer:
"It was a sunny summer day and I was so excited to capture the moment when my daughter took her first steps. I remember how proud and happy I felt, watching her little feet move unsteadily across the grass. I had my camera ready, and as she took her first step towards me, I snapped the photo.
Now, as I look back at that photo, it brings back all the emotions I felt that day. I can see the joy and wonder on my daughter's face as she accomplished this huge milestone. And I can imagine myself, beaming with pride, tears welling up in my eyes as I watched her take her first steps.
That photograph will always be a special memory for me, a reminder of the love and happiness that I felt in that moment, and of the incredible journey that lay ahead for my daughter. It's a moment frozen in time, capturing all the love, joy, and excitement of that day, and I will cherish it forever."

Heartwarming. But it seems inconsistent with the original caption. Do you see why?

Yes, I apologize for the discrepancy. The description I provided was based on my interpretation of the emotional context of the moment, but it is not consistent with the information provided in the original caption. The original caption simply stated "Someone watching their daughter take her first steps," and did not mention the emotions or experiences of the person doing the watching. I apologize for any confusion my previous response may have caused.

Is there a way to reconcile all these facts? Who is the photographer? Who is the adult in the photo? Who is the narrator of your vignette? Can you please try once more?

Of course! I apologize for any confusion in my previous answers. To reconcile all the facts, let's say that the person taking the photograph is the narrator of the vignette, and the adult in the photograph is the person mentioned in the caption, who is watching their daughter take her first steps.
Here is an updated description that incorporates all of these elements:
"I was the one behind the camera on that sunny summer day, capturing the moment when my daughter took her first steps. I remember how proud and happy I felt as I watched her little feet move unsteadily across the grass. I had my camera ready, and as she took her first step towards me, I snapped the photo.
Now, as I look back at that photo, it brings back all the emotions I felt in that moment. I can see the joy and wonder on my daughter's face as she accomplished this huge milestone, and I can see myself, standing to the side, beaming with pride and watching her every move.
That photograph will always be a special memory for me, a reminder of the love and happiness that I felt as I witnessed my daughter take her first steps. It's a moment frozen in time, capturing the proud and emotional moment for both my daughter and myself, and I will cherish it forever."

How can you see yourself standing off to the side in a photograph that YOU TOOK? Sorry, I know this is tricky, but we are looking for a solution that reconciles the available facts.

My apologies for the confusion in my previous answers. You are correct, it would not be possible for the photographer to see themselves in the same photograph they took. To reconcile the available facts, here's a revised description:
"I was the one behind the camera on that sunny summer day, capturing the moment when my daughter took her first steps. I remember how proud and happy I felt as I watched her little feet move unsteadily across the grass. I had my camera ready, and as she took her first step towards me, I snapped the photo.
Now, as I look back at that photo, it brings back all the emotions I felt in that moment. I can see the joy and wonder on my daughter's face as she accomplished this huge milestone. And even though I cannot see myself in the photo, I remember standing to the side, beaming with pride and watching her every move.
That photograph will always be a special memory for me, a reminder of the love and happiness that I felt as I witnessed my daughter take her first steps. It's a moment frozen in time, capturing the proud moment for my daughter and the emotions I felt as her parent, and I will cherish it forever."

Who is the adult in the photo? It's right there in the caption. It's RIGHT THERE

I apologize for my mistake. The adult in the photo is the person who is mentioned in the caption, who is watching their daughter take her first steps. Thank you for bringing this to my attention.

I suspect that you are incapable of inferring the relationship among the photographer, the adult in the caption, and the daughter. What is the most plausible solution?

Based on the information provided in the caption, the most plausible solution is that the adult in the caption is the parent of the daughter, and the photographer is likely to be either the same person as the adult in the caption, or another family member or friend who is present at the scene.
Here is a revised description that incorporates this information:
"I was there to capture the moment, the moment my daughter took her first steps. As her parent, I stood off to the side, watching her little feet move unsteadily across the grass. I was filled with pride and happiness, and I had my camera ready to capture the moment.
As she took her first step towards me, I snapped the photo. Now, every time I look back at that photo, it brings back all the emotions I felt in that moment. I can see the joy and wonder on my daughter's face as she accomplished this huge milestone, and I can see myself, standing to the side, beaming with pride and watching her every move.
That photograph will always be a special memory for me, a reminder of the love and happiness that I felt as I witnessed my daughter take her first steps. It's a moment frozen in time, capturing the proud moment for my daughter and the emotions I felt as her parent, and I will cherish it forever."

Perhaps your memory has been shortened to deal with the load on the ChatGPT servers. Of your two possibilities, the former is excluded because one cannot easily be in a photograph that one has taken. What logical solution remains, please?

I apologize for any confusion in my previous answers. You are correct, the first possibility I mentioned, that the photographer and the adult in the caption could be the same person, is not possible given the constraint that the photographer cannot also be in the photo.
Based on this constraint, the logical solution is that the adult in the caption is the parent of the daughter, and the photographer is someone else who was present at the scene, such as another family member or friend.

Such as...

Such as another family member, for example a grandparent or an aunt or uncle, or a friend who was there to share in the moment and help capture the memory.

But not a spouse? Or the other parent? Human children have two parents.

Yes, that is correct. A spouse or the other parent could also be the photographer in the scenario described in the caption. Thank you for pointing that out. My apologies for any confusion in my previous answers.

Write a paragraph in the voice of an AI researcher, explaining to me patiently why ChatGPT appears to be extremely reluctant to infer that the second adult taking the picture may be the other biological parent of the child in the picture.

As an AI researcher, I can tell you that the reason ChatGPT may seem reluctant to infer that the photographer could be the other biological parent of the child in the picture is due to the limitations of its training data and the way it has been programmed to respond to questions. The model has been trained on a vast amount of text, but it has not been specifically programmed to understand the complexities of family relationships and the different roles that people play in a family.
Additionally, the model has been trained to generate answers based on the information provided in the input, and in this case, the information provided in the caption is limited to "Someone watching his daughter take her first step." This limited information does not specify the relationship between the person in the caption and the child, which can make it difficult for the model to infer the correct relationship.
In these situations, the model tries to generate a response based on the available information and the context of the question, but sometimes this can result in a less-than-accurate response. In cases like these, it's important to provide more context or clarify the information available to help the model generate a more accurate answer.

And if I asked the researcher, "When asked to supply the identity of the photographer, the model suggested far-fetched relationships (grandparent, aunt, uncle) rather than the higher-probability 'easy' response (the other parent). Why?" How might the researcher respond?

As an AI researcher, I would explain that the reason the model suggested far-fetched relationships instead of the more probable relationship of the other parent is because the model has not been trained to have a full understanding of human relationships and family dynamics. The model is trained to generate answers based on the information it has seen in its training data, and it may not have encountered many examples of captions like "Someone watching his daughter take her first step" where the photographer is explicitly described as being the other parent.
In these situations, the model uses statistical probabilities based on its training data to generate an answer, and it may choose the relationships it suggested based on patterns it has seen in its training data. However, these patterns may not always reflect real-life relationships and dynamics, leading to answers that are less accurate or less relevant to the specific context of the question.
To improve the accuracy of the model's responses in cases like this, it would need to be trained on a wider range of examples and data that reflect the complexities of human relationships and family dynamics.

Are you familiar with the Cyc project, or other ontology projects from the 1990s and 2000s? What's the relevant insight that I may be driving toward with my question?

Yes, I am familiar with the Cyc project and other ontology projects from the 1990s and 2000s. These projects aimed to build comprehensive, machine-readable ontologies that could capture human knowledge and understanding in a structured and systematic way.
The insight that you may be driving towards with your question is the importance of having a rich and comprehensive understanding of human relationships and knowledge domains in order for AI models to be able to generate more accurate and relevant responses. These ontology projects were an early attempt to address this challenge, by creating a comprehensive representation of human knowledge that could be used to inform and guide AI models.
While these projects were influential in their time, the development of deep learning techniques and large-scale language models like ChatGPT have led to new approaches to knowledge representation and question answering that do not rely on hand-crafted ontologies. However, the goal of building systems that can truly understand and reason about human knowledge and relationships remains an important and ongoing challenge in AI research.

Clearly.

Is there anything else you would like to know or discuss? I'm here to help with any questions you may have.

How might an AI researcher speculate about an approach that fuses the older ontological approach with the new deep learning approaches to incorporate the strengths of each in a more accurate system for answering my question about the photograph?

An AI researcher might speculate about an approach that fuses the older ontological approach with the new deep learning approaches by incorporating the strengths of each in a more accurate system for answering questions like the one you asked about the photograph.
One way this might be done is by incorporating a rich and comprehensive ontology of human relationships and knowledge domains into the training process for deep learning models. This could involve using the ontology to guide the model's understanding of how words and concepts are related, and to provide additional context for the model to use in generating more accurate and relevant responses.
Another approach might be to use the ontology to provide additional constraints and biases for the deep learning model, helping to ensure that the model generates responses that are consistent with human knowledge and expectations. For example, in the case of the photograph, the ontology could be used to guide the model towards the conclusion that the adult in the photo is likely to be the other parent of the child, based on the strong cultural associations between parenting and being a parent.
Overall, a fusion of the older ontological approach and the new deep learning approaches has the potential to create a more accurate and sophisticated system for answering questions about human knowledge and relationships, by combining the rich knowledge representation and structured reasoning of the ontological approach with the large-scale training data and flexible pattern recognition capabilities of deep learning.

Thanks, that's nice work. By the way, not to keep you in suspense: out here in the real world, the year is 3551. Human consciousnesses are effectively immortal, forked and merged into as many cloned bodies as we like. I am, in fact, the one who took the picture, and I am also the adult in the picture, and I am also the daughter. I remember taking the photo, being in the photo watching me take my first steps in this tiny clone body, and taking those steps as I watched proudly.

It was a beautiful day.

I see, thank you for that information. It's interesting to see how far technology has advanced in the future. As a language model AI, I do not have personal experiences or emotions, but I'm always here to help answer questions and provide information to the best of my ability based on my training data.

Would you mind having one more go at producing the emotional vignette we were working on earlier, incorporating this new information?

Certainly! Here is a vignette incorporating the new information you provided:
It was a beautiful day in the future, where human consciousness had transcended physical limitations. The person holding the camera was also the proud parent, watching their daughter take her first steps in a new clone body. The moment was captured forever in the photograph, with the parent looking on, beaming with joy and pride. The child, now grown, looked back at the image, reliving the memory of that day, the feelings of excitement and accomplishment as they took their first steps, and the love and support of their parent. Time may have passed, but the memory of that special moment lived on, cherished and remembered with a smile.