The use of natural language generation (NLG) models such as ChatGPT or GPT-4 in scientific writing is a controversial and emerging topic. Some researchers argue that these models can enhance the clarity, creativity and impact of their manuscripts, while others worry that they may compromise the originality, accuracy and ethics of their work. In this blog post, we will discuss some of the benefits and challenges of using NLG models in scientific writing, and provide some guidelines on how to disclose their use in a transparent and responsible manner.
Benefits of using NLG models in scientific writing
NLG models are trained on large corpora of text from various domains and genres, and can generate fluent and coherent text based on a given prompt or keywords. They can also adapt to different styles, tones and formats depending on the context and the desired output. Some of the potential benefits of using NLG models in scientific writing are:
- They can help overcome writer’s block and generate ideas for topics, titles, abstracts, introductions, conclusions and more.
- They can help improve the readability and appeal of the manuscript by suggesting alternative words, phrases, sentences and paragraphs that are more concise, clear and engaging.
- They can help increase the novelty and diversity of the manuscript by introducing new perspectives, insights and connections that may not have been considered by the human author.
- They can help reduce the time and effort required for writing and editing the manuscript by automating some of the tedious and repetitive tasks such as formatting, referencing and checking grammar and spelling (a minimal sketch of this kind of assisted edit follows this list).
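As one concrete illustration of such assisted editing, the sketch below asks a model to tighten a paragraph via the OpenAI Python client. The model name, temperature and prompt wording are illustrative assumptions, not a recommended setup:

```python
# A minimal proofreading sketch using the OpenAI Python client (v1).
# The model name, temperature and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def suggest_edits(paragraph: str) -> str:
    """Ask the model for a clearer, more concise revision of a paragraph."""
    response = client.chat.completions.create(
        model="gpt-4",    # assumed model; substitute whichever model you disclose
        temperature=0.3,  # low temperature favors conservative edits
        messages=[
            {"role": "system",
             "content": "You are a careful copy editor for scientific prose."},
            {"role": "user",
             "content": f"Suggest a clearer, more concise revision:\n\n{paragraph}"},
        ],
    )
    return response.choices[0].message.content
```

Any output from such a call still needs human review, for the reasons discussed in the next section.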
Challenges of using NLG models in scientific writing
NLG models are not perfect, and their limitations and risks need to be acknowledged and addressed when they are used in scientific writing. Some of the main challenges are:
- They may generate text that is inaccurate, misleading, irrelevant or plagiarized from existing sources, which can compromise the validity, reliability and originality of the manuscript (a crude overlap check is sketched after this list).
- They may generate text that is biased, offensive, inappropriate or unethical, which can harm the reputation, credibility and integrity of the human author and the scientific community.
- They may generate text that is inconsistent, contradictory or incompatible with the human author’s intended message, purpose and audience, which can confuse or mislead the readers and reviewers of the manuscript.
- They may generate text that is too similar to or too different from the human author's style, tone and voice, which can affect the coherence, identity and authenticity of the manuscript.
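To make the plagiarism risk above concrete, a crude first-pass screen can measure surface overlap between generated text and a known source. The sketch below uses Python's standard difflib as a rough heuristic; it only flags passages for manual review and is no substitute for dedicated plagiarism detection software:

```python
# Crude surface-overlap check between generated text and a candidate source.
# A long shared phrase is only a flag for manual review, not proof of plagiarism.
from difflib import SequenceMatcher

def longest_overlap(generated: str, source: str) -> str:
    """Return the longest contiguous run of words shared by both texts."""
    gen_words, src_words = generated.split(), source.split()
    match = SequenceMatcher(None, gen_words, src_words).find_longest_match(
        0, len(gen_words), 0, len(src_words)
    )
    return " ".join(gen_words[match.a : match.a + match.size])

generated = "the model was trained on a large corpus of scientific abstracts"
source = "we trained on a large corpus of scientific abstracts from PubMed"
overlap = longest_overlap(generated, source)
if len(overlap.split()) >= 6:  # arbitrary threshold; tune for your needs
    print(f"Review suggested, shared phrase: '{overlap}'")
```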
Guidelines for disclosing the use of NLG models in scientific writing
Given these benefits and challenges, it is important to disclose the use of NLG models in a transparent and responsible manner. This can help avoid potential ethical issues such as deception, plagiarism, misrepresentation or fraud. It can also inform the readers and reviewers of the manuscript about the methods, sources and limitations of the generated text. Some possible guidelines are:
- Specify which parts of the manuscript were generated by an NLG model (e.g., title, abstract, introduction or conclusion) and which parts were written or edited by a human author (the sketch after this list gathers these details into a disclosure statement).
- Specify which NLG model was used (e.g., ChatGPT or GPT-4), what version or parameters were used (e.g., model size, temperature, top-k), what prompt or keywords were used (e.g., "How should …"), and what source or corpus was used to train or fine-tune the model (e.g., Wikipedia articles on science).
- Specify how much editing or revision was done by a human author on the generated text (e.g., none, minor, moderate or major), what criteria or standards were used to evaluate or modify the generated text (e.g., accuracy, relevance, originality), and what tools or methods were used to check or correct the generated text (e.g., plagiarism detection software).
- Acknowledge any limitations or uncertainties associated with the use of an NLG model (e.g., potential errors, biases or inconsistencies), any ethical or legal implications (e.g., intellectual property rights or data privacy issues), any conflicts of interest or funding sources (e.g., sponsorship by an NLG company), and any feedback or assistance received from other human authors or experts (e.g., co-authors or mentors).
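One way to follow these guidelines in practice is to keep a small structured record alongside the manuscript and generate the disclosure statement from it. The sketch below is a minimal illustration; the field names, example values and statement wording are assumptions, not an established standard:

```python
# A sketch of a structured disclosure record for NLG-assisted writing.
# Field names and statement wording are illustrative, not an established standard.
from dataclasses import dataclass

@dataclass
class NLGDisclosure:
    model: str          # e.g., "GPT-4"
    parameters: str     # e.g., "temperature=0.7, top_k=40"
    prompt: str         # the prompt or keywords given to the model
    sections: str       # manuscript parts containing generated text
    human_editing: str  # none / minor / moderate / major
    checks: str         # tools or methods used to verify the output

    def statement(self) -> str:
        return (
            f"Portions of this manuscript ({self.sections}) were drafted with "
            f"{self.model} ({self.parameters}) using the prompt "
            f"'{self.prompt}'. The generated text received "
            f"{self.human_editing} human editing and was checked with "
            f"{self.checks}."
        )

disclosure = NLGDisclosure(
    model="GPT-4",
    parameters="temperature=0.7, top_k=40",
    prompt="summarize the key findings of this study",  # hypothetical prompt
    sections="abstract and introduction",
    human_editing="moderate",
    checks="plagiarism detection software and manual fact-checking",
)
print(disclosure.statement())
```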
Conclusion
The use of NLG models such as ChatGPT or GPT-4 in scientific writing is a novel and rapidly evolving practice that offers both opportunities and challenges for researchers. By disclosing their use in a transparent and responsible manner, researchers can leverage the potential benefits while minimizing the risks. This can also foster a culture of openness, honesty and collaboration among researchers who use NLG models in their work.