The decision to allow ChatGPT to access the internet in real time marked a significant shift, with considerable implications for OpenAI’s operations. This capability introduced complex challenges around data quality, safety, ethics, and model behaviour. The decision likely involved extensive deliberation and preparation, given the increased need for human resources and infrastructure to keep the model’s responses accurate, safe, and aligned with ethical standards.
Q1: When was ChatGPT last trained?
The model’s training data has a knowledge cutoff of October 2023; real-time browsing lets it retrieve information published after that date.
Q2: Are there risks with real-time internet access, like feeding off its own output and inheriting internet-based biases?
Yes. Enabling browsing exposes ChatGPT to potential output contamination, where it may encounter and reinforce its own or other AI-generated content. Furthermore, the demographics and biases of internet content can skew the model’s responses, creating risks of racist, misogynistic, or otherwise biased outputs. OpenAI mitigates these risks through curated datasets, content filtering, and ethical review, though the challenges persist.
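To make the contamination risk concrete, here is a minimal sketch of how retrieved web snippets might be pre-filtered before reaching the model. This is not OpenAI’s actual pipeline; the function names and the marker list are illustrative assumptions, and a production system would use trained classifiers rather than phrase matching.

```python
# Phrases that crudely suggest AI-generated text (illustrative only).
AI_MARKERS = [
    "as an ai language model",
    "i cannot browse the internet",
    "regenerate response",
]

def looks_ai_generated(snippet: str) -> bool:
    """Crude heuristic: flag text containing phrases typical of AI output."""
    lowered = snippet.lower()
    return any(marker in lowered for marker in AI_MARKERS)

def filter_snippets(snippets: list[str]) -> list[str]:
    """Keep only snippets that pass the heuristic filter."""
    return [s for s in snippets if not looks_ai_generated(s)]

snippets = [
    "Stock markets rose on Tuesday after the jobs report.",
    "As an AI language model, I cannot provide financial advice.",
]
clean = filter_snippets(snippets)
```

A real deployment would layer several such filters (provenance checks, bias classifiers, deduplication) rather than rely on any single heuristic.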
Q3: Who performs the necessary pre- and post-processing, ethical review, and proactive filtering?
This work relies on interdisciplinary teams:
• Machine learning engineers and data scientists handle data curation and filtering.
• Ethics and safety teams, including ethicists and psychologists, ensure the model meets ethical standards.
• Content moderation teams, using a mix of automated tools and human reviewers, detect and manage harmful or biased content. Together, these teams mitigate risks and keep the model safe.
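The mix of automated tools and human reviewers described above can be sketched as a simple routing rule: high-confidence classifier scores trigger automatic action, while uncertain cases are queued for human moderators. The thresholds and function name below are hypothetical, not drawn from any documented OpenAI system.

```python
# Hypothetical thresholds for an automated toxicity classifier (score in 0.0-1.0).
AUTO_BLOCK_THRESHOLD = 0.9
HUMAN_REVIEW_THRESHOLD = 0.5

def route_content(toxicity_score: float) -> str:
    """Route a piece of content based on an automated classifier's score."""
    if toxicity_score >= AUTO_BLOCK_THRESHOLD:
        return "blocked"        # high confidence: remove automatically
    if toxicity_score >= HUMAN_REVIEW_THRESHOLD:
        return "human_review"   # uncertain: queue for a human moderator
    return "allowed"            # low risk: pass through
```

This split is what makes the automation scale: humans only see the ambiguous middle band, not every item.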
Q4: Does real-time access increase the need for continuous human oversight?
Yes. Real-time capabilities demand more labour-intensive processes, as teams must continuously monitor incoming data to prevent inappropriate content from reaching users. Human feedback also remains essential through Reinforcement Learning from Human Feedback (RLHF), in which human reviewers evaluate and refine the model’s responses, adding ongoing oversight alongside automation.
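The reviewer evaluations at the heart of RLHF are typically collected as pairwise comparisons: reviewers pick the better of two responses, and the votes become (chosen, rejected) pairs used to train a reward model. The sketch below shows that data-collection step only, with invented field names; it is a simplified illustration, not OpenAI’s implementation.

```python
from collections import Counter

def aggregate_votes(votes: list[str]) -> str:
    """Majority vote over reviewer choices ('a' or 'b') for one comparison."""
    counts = Counter(votes)
    return "a" if counts["a"] >= counts["b"] else "b"

def build_preference_pairs(comparisons: list[dict]) -> list[dict]:
    """Turn reviewer votes into (chosen, rejected) pairs for reward-model training."""
    pairs = []
    for item in comparisons:
        winner = aggregate_votes(item["votes"])
        chosen = item["response_a"] if winner == "a" else item["response_b"]
        rejected = item["response_b"] if winner == "a" else item["response_a"]
        pairs.append({
            "prompt": item["prompt"],
            "chosen": chosen,
            "rejected": rejected,
        })
    return pairs
```

The resulting pairs feed a reward model, which in turn guides policy optimisation; the human labour cost lies in collecting enough high-quality votes at this first stage.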
Q5: Is this work done in-house, or contracted out?
While core functions—such as research, ethical oversight, and policy work—are handled in-house, OpenAI contracts with third parties for data annotation, content moderation, and feedback collection. These vendors help scale labour-intensive tasks, providing essential support in managing large volumes of data and complex feedback.
Q6: Has OpenAI’s workforce grown significantly to meet these demands?
Yes, OpenAI’s workforce has expanded, likely numbering between 300 and 500 employees, covering research, ethics, policy, infrastructure, and support. Contracted workers, primarily for data annotation and moderation, may number in the hundreds or thousands, though these figures fluctuate based on project needs. OpenAI also relies on its Microsoft Azure partnership to support the substantial infrastructure required for training and deployment.
In short, the decision to enable real-time browsing has driven OpenAI to expand both its in-house and contracted workforce to meet the added challenges, balancing expertise, ethical considerations, and practical labour for safe, effective AI operations.