Controversies around generative AI and their implications for education

The following information comes from chapter 2, pp. 14-17 of:

UNESCO. (2023). Guidance for generative AI in education and research. https://www.unesco.org/en/articles/guidance-generative-ai-education-and-research

This UNESCO report is published under a Creative Commons License (CC BY-NC-SA).

The original text has been slightly edited.

Worsening digital poverty

GenAI relies upon huge amounts of data and massive computing power, in addition to iterative innovations in AI architectures and training methods, all of which are mostly available only to the largest international technology companies and a few economies (primarily the United States, the People’s Republic of China and, to a lesser extent, Europe). This means that the ability to create and control GenAI is out of reach of most companies and most countries, especially those in the Global South.

Implications

Learners should take a critical view of the value orientations, cultural standards and social customs embedded in GenAI training models.

Outpacing national regulatory adaptation

Dominant GenAI providers have also been criticized for not allowing their systems to be subject to rigorous independent academic review (Dwivedi et al., 2023). The foundational technologies of a company’s GenAI tend to be protected as corporate intellectual property. Meanwhile, many of the companies that are starting to use GenAI are finding it increasingly challenging to maintain the security of their systems (Lin, 2023). Moreover, despite calls for regulation from the AI industry itself, the drafting of legislation on the creation and use of all AI, including GenAI, often lags behind the rapid pace of development. This partly explains the challenges experienced by national and local agencies in understanding and governing the legal and ethical issues involved.

 

Implications

Learners should be aware that appropriate regulations to protect the ownership of domestic institutions and individuals and the rights of domestic users of GenAI may be lacking, and that responses to the legislative issues triggered by GenAI are still being developed.

Use of content without consent

GenAI models are built from large amounts of data (e.g. text, sounds, code and images), often scraped from the internet and usually without the owners’ permission. Many image GenAI systems and some code GenAI systems have consequently been accused of violating intellectual property rights. At the time of writing, several international legal cases relating to this issue are ongoing. Furthermore, some have pointed out that GPTs may contravene laws such as the European Union’s General Data Protection Regulation (GDPR, 2016), especially people’s right to be forgotten, as it is currently impossible to remove someone’s data (or the results of that data) from a GPT model once it has been trained.

 

Implications
  • Learners need to know the rights of data owners and should check whether the GenAI tools they are using contravene any existing regulations.
  • Learners should also be aware that images or code created with GenAI might violate someone else’s intellectual property rights, and that images, sounds or code that they create and share on the internet might be exploited by other GenAI systems.
Unexplainable models used to generate outputs

It has long been recognized that artificial neural networks (ANNs) are usually ‘black boxes’; that is, their inner workings are not open to inspection. As a result, ANNs are not ‘transparent’ or ‘explainable’, and it is not possible to ascertain how their outputs were determined. GenAI’s lack of transparency and explainability is increasingly problematic as GenAI becomes ever more complex, often producing unexpected or undesired results. In addition, GenAI models inherit and perpetuate biases present in their training data which, given the non-transparent nature of the models, are hard to detect and address. Finally, this opacity is also a key cause of trust issues around GenAI (Nazaretsky et al., 2022a). If users do not understand how a GenAI system arrived at a specific output, they are less likely to be willing to adopt or use it (Nazaretsky et al., 2022b).
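
To make the ‘black box’ point concrete, the toy model below is an illustrative sketch only (it is not taken from the UNESCO report, and its weights are arbitrary made-up numbers): even with full access to every parameter, nothing in the numbers explains why a given input produces a given output.

```python
import numpy as np

# A deliberately tiny neural network with arbitrary random weights.
# Every parameter can be printed and inspected, yet the numbers carry
# no human-readable explanation of why a particular input yields a
# particular output: the 'black box' problem in miniature.
rng = np.random.default_rng(seed=0)
W1 = rng.normal(size=(4, 8))   # input-to-hidden weights
W2 = rng.normal(size=8)        # hidden-to-output weights

def predict(x):
    hidden = np.tanh(x @ W1)   # internal activations: just numbers
    return float(hidden @ W2)  # an output with no attached reasoning

print(predict(np.array([1.0, 0.0, 2.0, -1.0])))
print(W1)  # fully visible, yet not interpretable by a human reader
```

Real GenAI models have billions of such parameters rather than a few dozen, which makes the inspection problem far harder, not easier.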

Implications

Learners should be aware that GenAI systems operate as black boxes and that it is consequently difficult, if not impossible, to know why particular content has been created. The lack of explanation of how outputs are generated tends to lock users into the logic defined by the parameters designed into the GenAI systems. These parameters may reflect specific cultural or commercial values and norms that implicitly bias the content produced.

AI-generated content polluting the internet

Because GPT training data is typically drawn from the internet, which all too frequently includes discriminatory and other unacceptable language, developers have had to implement what they call ‘guardrails’ to prevent GPT output from being offensive and/or unethical. However, due to the absence of strict regulations and effective monitoring mechanisms, biased materials generated by GenAI are increasingly spreading throughout the internet, polluting one of the main sources of content or knowledge for most learners across the world. This is especially important because material generated by GenAI can appear quite accurate and convincing even when it contains errors and biased ideas. This poses a high risk for young learners who do not have solid prior knowledge of the topic in question. It also poses a recursive risk for future GPT models, which will be trained on internet text that earlier GPT models themselves created and which therefore includes their biases and errors.
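
The recursive risk can be illustrated with a very small simulation, offered here only as a hypothetical sketch (the ‘corpus’ and its proportions are invented for illustration): if each new ‘generation’ of a model is trained only on text sampled from the previous generation’s output, rare content tends to disappear while whatever was already dominant, including its errors, persists.

```python
import random
from collections import Counter

random.seed(1)

# A toy 'internet' of 100 snippets: 90 copies of a common claim and
# 10 copies of a rarer, more specialised piece of content.
corpus = ["common claim"] * 90 + ["rare detail"] * 10

# Each round, the next model's training data is simply sampled from
# the previous round's output, a crude stand-in for retraining on
# AI-generated text scraped back off the internet.
for generation in range(1, 6):
    corpus = random.choices(corpus, k=100)
    print(generation, Counter(corpus))

# After a few rounds the rare detail typically vanishes entirely,
# while the dominant claim (and any errors it carries) remains.
```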

 

Implications

  • Learners need to be aware that GenAI systems are capable of outputting offensive and unethical materials.
  • Learners also need to know about the long-term issues that will potentially arise for the reliability of knowledge when future GPT models are based on text that previous GPT models have generated.
Lack of understanding of the real world

Text GPTs are sometimes pejoratively referred to as ‘stochastic parrots’ because, as noted earlier, while they can produce text that appears convincing, that text often contains errors and can include harmful statements (Bender et al., 2021). This occurs because GPTs only repeat language patterns found in their training data (usually text drawn from the internet), choosing among likely continuations at random (hence ‘stochastic’) and without understanding their meaning – just as a parrot can mimic sounds without actually comprehending what it is saying.
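
As a rough illustration of what ‘repeating language patterns’ means, the sketch below builds a toy bigram model from a few invented sentences (real GPTs use large neural networks trained on vast corpora, but the underlying principle of continuing text with statistically likely next words is the same). The generated word sequence can look fluent even though no meaning is involved.

```python
import random
from collections import defaultdict

random.seed(0)

# Toy 'stochastic parrot': learn which word follows which in a tiny
# invented training text, then generate new text by repeatedly
# sampling a plausible next word. No understanding is involved.
training_text = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

next_words = defaultdict(list)
for current, following in zip(training_text, training_text[1:]):
    next_words[current].append(following)

def generate(start="the", length=10):
    word, output = start, [start]
    for _ in range(length):
        candidates = next_words.get(word)
        if not candidates:
            break
        word = random.choice(candidates)  # frequent pairs win more often
        output.append(word)
    return " ".join(output)

# Prints a fluent-looking but meaningless word sequence built purely
# from the patterns in the training text.
print(generate())
```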

The disconnect between GenAI models ‘appearing’ to understand the text that they use and generate, and the ‘reality’ that they understand neither the language nor the real world, can lead teachers and students to place a level of trust in the output that it does not warrant. This poses serious risks for future education. Indeed, GenAI is not informed by observations of the real world or other key aspects of the scientific method, nor is it aligned with human or social values. For these reasons, it cannot generate genuinely novel content about the real world, objects and their relations, people and social relations, human-object relations, or human-technology relations. Whether the apparently novel content generated by GenAI models can be recognized as scientific knowledge is contested.

As already noted, GPTs can frequently produce inaccurate or unreliable text. In fact, it is well known that GPTs sometimes make up things that do not exist in real life. Some call this ‘hallucination’, although others criticize the use of such an anthropomorphic and therefore misleading term. This is acknowledged by the companies producing GenAI. The bottom of the ChatGPT public interface, for instance, states: ‘ChatGPT may produce inaccurate information about people, places, or facts’.

 

Implications for education and research

  • The output of a text GenAI can look impressively human-like, as if it understood the text that it generated. However, GenAI does not understand anything. Instead, these tools string words together in ways that are common on the internet. The text that is generated can also be incorrect.
  • Learners need to be aware that a GPT does not understand the text that it generates; that it can, and often does, generate incorrect statements; and that they therefore need to take a critical approach to everything it produces.
Reducing the diversity of opinions and further marginalizing already marginalized voices

ChatGPT and similar tools tend to output only standard answers that reflect the values of the owners/creators of the data used to train the models. Indeed, if a sequence of words appears frequently in the training data – as is the case with common and uncontroversial topics and mainstream or dominant beliefs – it is likely to be repeated by the GPT in its output.

This risks constraining and undermining the development of plural opinions and plural expressions of ideas. Data-poor populations, including marginalized communities in the Global North, have a minimal or limited digital presence online. Their voices are consequently not heard, their concerns are not represented in the data used to train GPTs, and so they rarely appear in the outputs. For these reasons, given a pre-training methodology based on data from internet web pages and social media conversations, GPT models can further marginalize already disadvantaged people.
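
A small numerical sketch (illustrative only; the figures are invented) shows why frequency in the training data matters so much: if one viewpoint appears 95 times for every 5 appearances of another, sampling in proportion to frequency reproduces the dominant viewpoint almost all of the time.

```python
import random
from collections import Counter

random.seed(0)

# Invented training data in which a dominant viewpoint outnumbers a
# minority viewpoint 95 to 5.
training_opinions = ["dominant view"] * 95 + ["minority view"] * 5

# Sampling in proportion to frequency, roughly what a language model
# does when it favours common word sequences, almost always returns
# the dominant view; the minority view rarely appears in the output.
samples = Counter(random.choices(training_opinions, k=1000))
print(samples)  # approximately 950 'dominant view' to 50 'minority view'
```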

Implications for education and research
  • While the developers and providers of GenAI models have the primary responsibility for continuously addressing biases in the datasets and outputs of these models, learners need to know that the output of text GenAI represents only the most common or dominant view of the world at the time when its training data was produced and that some of it is problematic or biased (e.g. stereotypical gender roles).
  • Learners should never accept the information provided by the GenAI at face value and should always critically assess it.
  • Learners must also be aware of how minority voices can be left out, because minority voices are by definition less common in the training data.
Generating deeper deepfakes

In addition to the controversies common to all GenAI, GAN-based GenAI can be used to alter or manipulate existing images or videos to generate fake ones that are difficult to distinguish from real ones. GenAI is making it increasingly easy to create these ‘deepfakes’ and so-called ‘fake news’. In other words, GenAI is making it easier for certain actors to commit unethical, immoral and criminal acts, such as spreading disinformation, promoting hate speech and incorporating the faces of people, without their knowledge or consent, into entirely fake and sometimes compromising films.

 

Implications for education and research

While it is the obligation of GenAI providers to protect the copyright and portrait rights of users, learners also need to be aware that any images they share on the internet may be incorporated into GenAI training data and might be manipulated and used in unethical ways.

References

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? FAccT ‘21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. Association for Computing Machinery. https://doi.org/10.1145/3442188.3445922

Dwivedi, Y. K., […] & Wright, R. (2023). “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. International Journal of Information Management, 71, 102642. https://doi.org/10.1016/j.ijinfomgt.2023.102642

Lin, B. (2023). AI is generating security risks faster than companies can keep up. The Wall Street Journal. https://www.wsj.com/articles/ai-is-generating-security-risks-faster-than-companies-can-keep-up-a2bdedd4

Nazaretsky, T., Ariely, M., Cukurova, M., & Alexandron, G. (2022). Teachers’ trust in AI-powered educational technology and a professional development program to improve it. British Journal of Educational Technology, 53(4), 914-931. https://doi.org/10.1111/bjet.13232