Artificial Intelligence (AI) | CCC's Velocity of Content Blog and Podcast Series
https://www.copyright.com/blog/topic/artificial-intelligence/

5 AI-Related Topics Every Information Professional Should Think About in 2024
https://www.copyright.com/blog/5-ai-related-topics-every-information-professional-should-think-about-in-2024/
Tue, 19 Dec 2023 08:13:58 +0000

Learn how information professionals can approach the changing environment caused by the rapid-fire advancements in AI technology to raise the profile of the information center and provide value.

The post 5 AI-Related Topics Every Information Professional Should Think About in 2024  appeared first on Copyright Clearance Center.

Tools like ChatGPT have changed our expectations for how search should work. Regardless of the current risks and realities of the technology and the rights questions around it, more and more users now expect to be able to ask a question in natural language and have an AI-powered system compose a thorough, correct response almost instantly. If expectations are changing this rapidly among adults now, imagine the young researcher 10 or 15 years from now who has been steeped in that reality for most of their life.

So, how can information professionals approach the changing environment caused by the rapid-fire advancements in AI technology to raise the profile of the information center and provide value? Consider the following five areas where you can play a role: 

1. Copyright questions about using content in AI

Information managers are often seen as copyright experts. Although you are likely not a copyright lawyer, you have a deep well of knowledge around copyright that can be employed to help your company reduce risk. I believe that is going to continue as we see more and more AI use cases.

Consider the risk when departments within the organization begin developing their own AI projects, or licensing tools that leverage AI, without knowing to ask questions like "What rights do I have to use copyrighted content to train this algorithm?", "How do we accurately attribute OA content used to train an algorithm, as well as in its output?", or "How is my own intellectual property protected in this tool?" Before these projects come to fruition, it's important that you're known within your organization as a leader and resource for accurate copyright information, so you can have a seat at the table and help reduce the risk your company could incur.

Getting Started: CCC’s Intersection of AI and Copyright page serves as a resource for information on the responsible development and use of AI technologies with copyright-protected content. 

2. Licensing content for AI

Information managers are often already in charge of licensing subscriptions to scientific literature and databases for use across the organization. They are therefore uniquely positioned to determine how best to license externally created copyrighted materials from publishers for AI projects as well. When the information center is the central hub of content licensing, it can evaluate needs across the organization and license efficiently, removing the licensing burden from siloed groups that are each thinking only about their individual AI project.

3. Company guidelines and strategic directives around technologies using AI

More and more, we are hearing from information managers that senior leadership within organizations is putting forth goals and setting expectations that technologies adopted by the organization leverage AI to improve efficiency and outcomes. This is perfectly reasonable given the current landscape, in which everyone is experimenting with ChatGPT and most technology vendors, for better or worse, are adding some kind of LLM or generative AI to their tools to improve workflows. These directives, however, may not account for the reality and limitations of where the technology actually stands, nor for the risks of using AI.

While AI holds great promise for R&D if used responsibly, AI systems also have the potential to generate bad science, draw false or misleading conclusions, promote misinformation, and lead to harmful results. And by now we have all heard stories about hallucinations from Large Language Models (hallucinations being a fancy word for when the AI makes up facts). Many of these problems stem from the fundamental nature of generative AI as a text-predicting tool, not a system with real knowledge. The quality, accuracy, and bias of the training data all affect the output (or more simply: garbage in, garbage out). Equally, applying an LLM to a domain for which it lacks training can yield hallucinations. Techniques such as retrieval-augmented generation, or RAG, are being explored to address these issues. This also means that a large amount of human validation of the results is required to use AI in the life sciences, which reduces the efficiency promised by AI. So, we're seeing healthy skepticism and caution alongside the optimism for an AI-powered future.
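To make the retrieval-augmented generation idea concrete, here is a deliberately minimal sketch of the pattern: instead of relying on the model's parametric memory, the system first retrieves relevant passages and then instructs the model to answer only from them. The corpus, the toy word-overlap scorer, and the prompt template below are all invented for illustration; production systems use embedding-based retrieval and pass the prompt to an actual LLM.

```python
# Minimal sketch of retrieval-augmented generation (RAG).

def score(query: str, doc: str) -> float:
    """Crude word-overlap score standing in for vector similarity."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Return the k documents most similar to the query."""
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str, passages: list) -> str:
    """Ground the model in retrieved text instead of its parametric memory."""
    context = "\n".join(f"- {p}" for p in passages)
    return ("Answer using ONLY the sources below; "
            "say 'not found' if they do not contain the answer.\n"
            f"Sources:\n{context}\n\nQuestion: {query}")

corpus = [
    "Compound X showed 40% inhibition in the phase II trial.",
    "The conference will be held in Boston next spring.",
]
question = "What inhibition did compound X show?"
passages = retrieve(question, corpus, k=1)
prompt = build_prompt(question, passages)  # would be sent to the LLM
```

Because the answer is tied to retrieved sources, a human validator can check the model's claim against the cited passage rather than the model's memory.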

In addition, we expect more and more organizations will adopt company guidelines on the use of AI, including what data you can (and more importantly cannot) use in projects, what you can do with the output, and the types of tools you can use.  Many organizations are starting to develop cross-functional AI groups, where legal, IT, and other stakeholders evaluate proposed use cases before green-lighting their use internally. 

Luckily, information managers are experts in information, certainly from a licensing and management side, but also in terms of searching, synthesizing, and validating results, and I have spoken to several information managers who have demanded a seat at the table. We are in a key position to help evaluate the output of these tools to make sure they are delivering on the promise of AI. This is an opportunity to raise the profile of the information center by helping different functional areas assess what they need from a tool, determine how to use data and content in a way that is copyright compliant and follows internal company guidelines, and, most importantly, validate the output.

4. Budget ambiguity

We expect some budgeting issues around the use of content in AI. We're seeing direct licenses being negotiated by individuals working on single projects, or by groups with a targeted need, without consulting the information center. That means multiple groups may be negotiating with the same publishers without knowing it, and without other groups being able to take advantage of those deals. Partly this comes down to who holds the budget. As mentioned previously, if the information center holds the budget, it can scan across the organization, see the full need for a particular publisher's content, and negotiate accordingly. But information center budgets would need to increase to accommodate this.

I think this is where a tight partnership with data science and some executive sponsorship really come into play. Simplistically speaking, this works best when the information center has the authority to manage the licensed content and the licensing budget for AI. Alternatively, AI project negotiations can succeed when there are tight partnerships with data science and other functional areas, specifically ones in which those teams bring the information center into negotiation processes and support any necessary monetary investments with bill-back processes.

Mary Ellen Bates, a highly respected thought leader and consultant in the information management industry, recently conducted a research project for CCC analyzing how information professionals can partner with data professionals to provide intelligence to their clients in an increasingly complex and interconnected information environment. We believe that partnering with data science, with both teams leveraging their unique strengths, is a strategic and valuable path forward.

Check out our three-part series with Mary Ellen here:  

5. Stay current

There are a lot of responsibilities for information managers in this fast-changing AI landscape – be the AI-copyright expert, license content in new and rapidly changing ways, join cross-functional teams, and advocate for appropriate budget by partnering with stakeholders. You need to stay on top of rapid advancements in AI technologies so you can effectively evaluate vendors, the type of AI used, the use cases, and the potential risks. You essentially have to learn a whole new way of thinking and working, which has enormous possibilities. At a recent Pistoia Alliance conference in Boston, we heard the refrain several times from life science leaders that "your job isn't going to be replaced by AI, but you will be replaced by someone willing to use AI." While fear may be a partial motivator, I've talked to many information professionals who are taking on these tasks because they know it will help them stay relevant and help their company gain efficiency.

But you still have a day job! One of the most important things to recognize, and advocate for, is that you need the support and bandwidth to focus on how the information center can support the organization in its strategic AI goals. This is likely a tall order, especially for solo librarians and information centers already working at capacity. It will require selling key stakeholders and leadership internally on the potential benefits of this new, forward-thinking evolution of library services.

Thoughts on the U.S. Executive Order on Artificial Intelligence
https://www.copyright.com/blog/thoughts-on-the-u-s-executive-order-on-artificial-intelligence/
Wed, 13 Dec 2023 08:24:06 +0000

According to the Biden Administration, "the Executive Order establishes new standards for Artificial Intelligence (AI) safety and security, protects Americans' privacy, advances equity and civil rights, stands up for consumers and workers, promotes innovation and competition, advances American leadership around the world, and more."

The post Thoughts on the U.S. Executive Order on Artificial Intelligence appeared first on Copyright Clearance Center.

This piece originally appeared in The Scholarly Kitchen on 4 December.

On October 30, the Biden Administration issued an Executive Order on “Safe, Secure, and Trustworthy Artificial Intelligence.” According to the Administration, “[t]he Executive Order establishes new standards for Artificial Intelligence (AI) safety and security, protects Americans’ privacy, advances equity and civil rights, stands up for consumers and workers, promotes innovation and competition, advances American leadership around the world, and more.”

I share my thoughts on the Executive Order below:

There has been significant governmental activity around AI, driven especially by the G7 Hiroshima process. In reading the Executive Order (EO), I was most interested in learning the Biden Administration’s approach on three topics: (1) copyright, (2) AI accountability, and (3) AI use in education.

The Executive Order kicked the can on copyright. The US Copyright Office (part of the Legislative Branch) is currently in the middle of a massive AI study process, and the Executive Order directs the head of the US Patent and Trademark Office (US PTO, part of the Executive Branch) to meet with the head of the USCO within six months of the Copyright Office’s issuance of any final report (traffic is bad in DC). At such time, the US PTO is directed to “issue recommendations to the President on potential executive actions relating to copyright and AI.” On the positive side, at least the EO acknowledged that copyright is relevant.

On accountability, as I noted previously in The Scholarly Kitchen, to reach its full potential AI needs to be trained on high quality materials and that information needs to be tracked and disclosed. While the EO could have said more on this topic, I was pleased to note that it includes language such as a mandate to the Secretary of Health and Human Services to create a task force whose remit includes ensuring “development, maintenance, and availability of documentation to help users determine appropriate and safe uses of AI in local settings in the health and human services sector.”

Finally, on education, I was happy to see the following:

To help ensure the responsible development and deployment of AI in the education sector, the Secretary of Education shall, within 365 days of the date of this order, develop resources, policies, and guidance regarding AI. These resources shall address safe, responsible, and nondiscriminatory uses of AI in education, including the impact AI systems have on vulnerable and underserved communities, and shall be developed in consultation with stakeholders as appropriate. They shall also include the development of an “AI toolkit” for education leaders implementing recommendations from the Department of Education’s AI and the Future of Teaching and Learning report, including appropriate human review of AI decisions, designing AI systems to enhance trust and safety and align with privacy-related laws and regulations in the educational context, and developing education-specific guardrails.

Students are not “one size fits all.” Students in my local school district speak 151 home languages other than English. Within each language group, including native English speakers, we have children from some of the wealthiest zip codes in America as well as a student homelessness rate of greater than 10%. In districts such as mine, which is diverse in terms of nearly every measure — including gender, racial, religious, and national origin — personalized and adaptive educational tools are needed. CCC’s work with schools and ed tech providers who license high quality content for AI-based rights is promising and we have experience with how districts especially would benefit from more federal support. Let’s hope it is forthcoming.

AAP Responds To "Flawed" GenAI Arguments
https://www.copyright.com/blog/aap-responds-to-flawed-genai-arguments/
Fri, 08 Dec 2023 14:09:19 +0000

AAP responded to tech industry assertions that respect for copyright is an obstacle to their innovation by labeling them "nonsense."

The post AAP Responds To “Flawed” GenAI Arguments appeared first on Copyright Clearance Center.


The Association of American Publishers (AAP) has released its Reply Comments to the U.S. Copyright Office concerning a Notice of Inquiry regarding copyright law and artificial intelligence.

According to Andrew Albanese, Publishers Weekly executive editor, "the AAP told the Copyright Office that the tech industry needs to stop telling copyright owners to back off on their claims that the use of their works to create training datasets for Gen AI systems without permission is infringement."


Albanese tells me that the AAP responded to tech industry assertions that respect for copyright is an obstacle to innovation by labeling them "nonsense."

“It would be a grave error to repeat the past policy mistakes that allowed technology companies to achieve such an unhealthy, monopoly-like market dominance to the point that governments have struggled to curb their power,” the AAP statement also said.

The United States Copyright Office Notice of Inquiry on AI: A Quick Take
https://www.copyright.com/blog/the-united-states-copyright-office-notice-of-inquiry-on-ai-a-quick-take/
Wed, 06 Dec 2023 14:20:06 +0000

Roy Kaufman evaluates responses to the United States Copyright Office notice of inquiry entitled "Artificial Intelligence and Copyright."

The post The United States Copyright Office Notice of Inquiry on AI: A Quick Take appeared first on Copyright Clearance Center.

This post originally appeared in the Scholarly Kitchen on 11/28/23

Monday, October 30 was the final date for interested parties to submit comments in response to a comprehensive "Notice of inquiry and request for comments" issued by the United States Copyright Office, entitled "Artificial Intelligence and Copyright." With 34 questions covering both copyright and technology, some parties' responses exceeded 100 pages. More than 9,000 responses have been filed. On the assumption that readers might be interested in this topic but less interested in reviewing all the responses, I have pasted below a selection of questions and answers from Copyright Clearance Center's (CCC's) own response.

Does the increasing use or distribution of AI-generated material raise any unique issues for your sector or industry as compared to other copyright stakeholders?

AI-generated materials may both advance text publishing and hinder it. In sectors such as science, news, and book publishing, poor quality AI materials can generate bad science, promote misinformation, and lead to harmful results. This is not to say that such is the inevitable result of all AI; merely that it is a meaningful risk with respect to certain AI applications. AI can advance text publishing by providing tools for writing, checking, validating, and improving text-based works. It is also useful for primary research that may result in the creation of new content.

How or where do developers of AI models acquire the materials or datasets that their models are trained on? To what extent is training material first collected by third-party entities (such as academic researchers or private companies)?

In the text sector, developers of AI models — when acting lawfully — acquire materials and data sets from publishers, other rightsholders, websites that allow crawling, intermediaries, and aggregators (such as CCC). Significant amounts of content are available through licenses, including open licenses such as CC BY and CC BY-NC. Significant amounts of content are also available through the public domain. When acting unlawfully, AI developers receive materials from pirate sites, through downloading in violation of express terms and flags, and from so-called “shadow libraries,” among other things.

To what extent are copyrighted works licensed from copyright owners for use as training materials? To your knowledge, what licensing models are currently being offered and used?

Copyrighted materials are licensed for AI use directly by rightsholders and collectively through rights aggregators such as CCC. CCC’s collective licenses are non-exclusive, global, and fully voluntary. Our current AI-related offerings are focused on the corporate, research, academic and education markets.

Additionally, in science publishing, under “open access” business models, copyright owners employ open licensing which sometimes allows licensed reuse for AI under the terms of such licenses. According to this report, open models accounted for 31% of articles, reviews and conference papers in 2021.

Are some or all training materials retained by developers of AI models after training is complete, and for what purpose(s)? Please describe any relevant storage and retention practices.

Humans communicate in natural language by placing words in sequences; the rules for the sequencing and the specific form of each word are dictated by the specific language (e.g., English). An essential part of the architecture of any software system (and therefore any AI system) that processes text is how to represent that text so that the functions of the system can be performed most efficiently.

Almost all large language models are based on the "transformer architecture," which relies on the "attention mechanism." That mechanism allows the AI technology to view entire sentences, and even paragraphs, at once rather than as a mere sequence of characters, which lets the software capture the various contexts within which a word can occur.

Therefore, a key step in the processing of a textual input in language models is the splitting of the user input into special "words" that the AI system can understand. Those special words are called "tokens," and the component responsible for producing them is called a "tokenizer." There are many types of tokenizers. For example, OpenAI and Azure OpenAI use a subword tokenization method called "Byte-Pair Encoding" (BPE) for their Generative Pretrained Transformer (GPT)-based models. BPE repeatedly merges the most frequently occurring pair of characters or bytes into a single token until a certain number of tokens, or vocabulary size, is reached. The larger the vocabulary size, the more diverse and expressive the texts that the model can generate.
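As a rough illustration of that merge process (not OpenAI's actual implementation, which operates on bytes and large trained vocabularies), a toy BPE learner can be sketched as follows; the sample words are invented for the example:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Toy BPE learner: repeatedly fuse the most frequent adjacent
    pair of symbols into a single token."""
    vocab = Counter(tuple(w) for w in words)  # words as symbol tuples
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])  # fuse the pair
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = Counter(new_vocab)
    return merges, vocab

merges, vocab = bpe_merges(["low", "low", "lower", "lowest"], num_merges=2)
# learns ('l','o') then ('lo','w'), so "low" becomes a single token
```

Frequent character sequences thus collapse into single tokens, which is why common words cost one token while rare words are split into several.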

Once the AI system has mapped the input text into tokens, it encodes the tokens as numbers and converts the sequences it processed (even up to multiple paragraphs) into vectors of numbers that we call "word embeddings." These are vector-space representations of the tokens that preserve the original natural-language representation given as text. It is important to understand the role of word embeddings when it comes to copyright, because the embeddings are the representations (or encodings) of entire sentences, paragraphs, and even documents, in a high-dimensional vector space. It is through the embeddings that the AI system captures and stores the meaning of, and the relationships among, the words from the natural language.
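A toy illustration of that geometric intuition: the invented three-dimensional vectors below stand in for learned embeddings (real models use hundreds or thousands of dimensions trained on data), and cosine similarity measures how close two tokens sit in the space.

```python
import math

# Invented 3-dimensional vectors standing in for learned embeddings.
emb = {
    "journal": [0.9, 0.1, 0.0],
    "article": [0.8, 0.2, 0.1],
    "banana":  [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Semantically related tokens end up close together in the space.
related = cosine(emb["journal"], emb["article"])   # high
unrelated = cosine(emb["journal"], emb["banana"])  # low
```

It is this geometry, learned from the training text, that lets the system capture relationships between words and, by extension, between sentences and documents.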

Embeddings are used in practically every task that a generative AI system performs (e.g., text generation, text summarization, text classification, text translation, image generation, code generation, and so on).

Word embeddings are usually stored in vector databases, but a detailed description of all the approaches to storage is beyond the scope of this response, since there is a wide variety of vendors, processes, and practices in use.
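Whatever the vendor, the core operation a vector database provides can be sketched as a brute-force similarity search; real products add approximate-nearest-neighbor indexes, persistence, and metadata filtering, and the class name, document ids, and vectors below are invented for illustration:

```python
import math

class VectorStore:
    """Minimal in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self.items.append((doc_id, vector))

    def query(self, vector, k=1):
        """Return ids of the k stored vectors most similar to `vector`."""
        def cos(u, v):
            dot = sum(a * b for a, b in zip(u, v))
            nu = math.sqrt(sum(a * a for a in u))
            nv = math.sqrt(sum(b * b for b in v))
            return dot / (nu * nv)
        ranked = sorted(self.items, key=lambda it: cos(it[1], vector),
                        reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

store = VectorStore()
store.add("methods-section", [0.9, 0.1])
store.add("press-release", [0.1, 0.9])
hits = store.query([0.8, 0.2], k=1)  # nearest stored document
```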

Under what circumstances would the unauthorized use of copyrighted works to train AI models constitute fair use? Please discuss any case law you believe relevant to this question.

U.S. law has no specific rules governing the use of copyrighted materials to train AI. Rather, such uses fall under the general copyright regime. Under U.S. law, copying copyrighted content to train AI can state a cause of action for infringement [citing Thomson Reuters Enters. Ctr. GmbH v. ROSS Intelligence Inc., 529 F.Supp.3d 303 (D. Del. 2021) (downloading and copying of Westlaw database for the purpose of training AI)]. Thus, such activities require a license to be non-infringing unless they fall under the fair use exception.

The application of fair use to an infringement is fact dependent. Copying for purposes of training an AI will usually entail copying the complete work. Whether the copying is for commercial or non-commercial research purposes will be considered. The courts will also look very closely at market harm under the fourth factor. As stated by the Supreme Court in Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 590 (1994) “[the fourth factor] requires courts to consider not only the extent of market harm caused by the particular actions of the alleged infringer, but also ‘whether unrestricted and widespread conduct of the sort engaged in by the defendant … would result in a substantially adverse impact on the potential market’ for the original.” And, as reinforced by the recent Supreme Court decision in Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, 598 U.S. (2023), the impact of the infringing use on licensing is one of the key factors in determining market harm.

Relevant instructional cases include the cases mentioned above as well as Fox News Network, LLC v. TVEyes, Inc., 883 F.3d 169 (2d Cir. 2018), where the Second Circuit Court of Appeals rejected a fair use defense in a case of allegedly transformative compiling of recorded broadcasts into text searchable databases that allowed search and viewing of short excerpts. By contrast, the Second Circuit had previously considered the text mining of scanned books for non-commercial social science research in Authors Guild v. Google, Inc. 721 F.3d 132 (2d Cir. 2015), and held that copies made and used for a specific purpose involving snippets would likely fall under fair use.

There are currently multiple pending cases in the U.S. relating to use of copyrighted content for the development of AI systems. Congress has expressed interest in the issue by including language in the SAFE Innovation Framework that the Framework will “support our creators by addressing copyright concerns, protect intellectual property, and address liability.”

Should copyright owners have to affirmatively consent (opt in) to the use of their works for training materials, or should they be provided with the means to object (opt out)?

Copyright is, and should remain, an opt in regime. Placing the burden of asserting rights on the copyright holders is inequitable, burdensome, and largely impractical. Only those making copies know what they are copying in the first instance and thus the copyright owners are not in a position to opt out.

9.2. If an ‘‘opt out’’ approach was adopted, how would that process work for a copyright owner who objected to the use of their works for training? Are there technical tools that might facilitate this process, such as a technical flag or metadata indicating that an automated service should not collect and store a work for AI training uses?

There is good reason that copyright is an "opt in" regime. Some AI developers have gathered content by routinely ignoring flags, copyright notices and metadata. Thus, while there are protocols and flags that can be used and are used by rightsholders and honored by ethical AI developers, they are no substitute for placing the responsibility for compliance on the user. Moreover, requiring flags and metadata assumes that the content resides on a server or website under the control of the rightsholder. This is not always true. For example, in the recent case of Am. Soc'y for Testing & Materials v. Public.Resource.Org, Inc., 82 F.4th 1262 (D.C. Cir. 2023), the Court of Appeals for the District of Columbia Circuit ruled that the non-commercial posting of technical standards incorporated by reference into law is fair use. It would be problematic to assume that the entity posting the standards over the objection of copyright owners would take steps to reserve the copyright owner's AI rights.

Finally, for smaller creators, any obligation to adopt technical protection measures or flags is unfair and unduly burdensome.

Technical flags and metadata are useful for AI developers who act ethically, and they have another great value: where they are ignored by AI developers, they can provide evidence of willfulness.

What legal, technical, or practical obstacles are there to establishing or using such a process? Given the volume of works used in training, is it feasible to get consent in advance from copyright owners?

It is feasible to acquire advance consent of copyright owners. It is not feasible to place the burden on rightsholders to police their rights without knowing who is using their works without authorization and how the works are being used.

The burden of implementing technical measures, flags, and metadata may, depending on the sector, be involved, complicated and costly to copyright owners. In the recent past, international sector-wide initiatives such as ACAP have absorbed significant time and resources on the part of rightsholders and users seeking to act ethically, only to be rejected by the tech industry. Current efforts of note include the W3C Text and Data Mining Rights Reservation Protocol.

As noted above, as a practical matter, a copyright holder may have no control over websites where its content is held. This is especially true where content is posted in violation of copyright or under a copyright exception.

There is certainly enough copyrightable material available under license to build reliable, workable, and trustworthy AI. Just because a developer wants to use “everything” does not mean it needs to do so, is entitled to do so, or has the right to do so. Nor should governments and courts twist or modify the law to accommodate them.

AI, Licensing, and The Path Forward
https://www.copyright.com/blog/ai-licensing-and-the-path-forward/
Mon, 27 Nov 2023 14:07:15 +0000

In the rapidly developing world of AI uses and discussions, copyright issues are key.

The post AI, Licensing, and The Path Forward appeared first on Copyright Clearance Center.

From Large Language Models (LLMs) to other research-based applications, AI technologies rely on millions of books, scholarly journals, and other curated publications. Responsibly using these works is a foundational part of the discussion, and solutions like voluntary collective licensing can provide an efficient and compelling way to enable authorized uses and reduce the risk of infringement related to technologies that are developed with machine-readable content.

On Thursday, October 12, 2023, CCC presented a Town Hall special program on voluntary collective licensing, its significance to research in many fields, and the role it can play to drive innovation in science and technology, including AI.


During the program, CCC's General Counsel Catherine Zaller Rowland and a panel of international legal experts including Prof. Daniel Gervais, Bruce Rich, and Carlo Scollo Lavizarri considered how voluntary collective licensing is a proven way to use large collections of copyrighted materials with permission, and why AI technologies must address important concerns over equity, transparency, and authenticity.

Digital Hollywood Focus on AI and Copyright
https://www.copyright.com/blog/digital-hollywood-focus-on-ai-and-copyright/
Mon, 13 Nov 2023 14:12:22 +0000

Since 1787, US copyright law has raced to keep up with innovation and change.

The post Digital Hollywood Focus on AI and Copyright appeared first on Copyright Clearance Center.

While the US Congress holds the power to legislate on copyright, the last update of the Copyright Act went into force in 1978, and the last major copyright legislation, the Digital Millennium Copyright Act, was passed in 1998.

In the meantime, shaping the evolution of copyright law has been left not to elected representatives but to appointed judges.


In September, the Authors Guild, John Grisham, Jodi Picoult, David Baldacci, and George R. R. Martin, along with 13 other authors, filed a class-action suit against OpenAI, the creator of ChatGPT, for copyright infringement of their works of fiction, on behalf of a class of such fiction writers.

For a recent Digital Hollywood virtual conference on AI and the creative industries, I moderated a panel program considering the copyright questions involved in such cases as well as the implications across all creative industries when AI systems use copyright-protected content in developing so-called generative AI solutions. Guest speakers included:

  • Matthew Asbell, principal with Offit Kurman in New York City;
  • James Sammataro, partner and co-chair of the music group and media and entertainment litigation practice at Pryor Cashman in Miami;
  • Pamela Samuelson, co-director of the Berkeley Center for Law & Technology; and
  • Scott Sholder, partner at Cowan, DeBaets, Abrahams & Sheppard in New York City.

Catching Up on AI and Other Topics With Roy Kaufman
https://www.copyright.com/blog/catching-up-on-ai-and-other-topics-with-roy-kaufman/
Thu, 09 Nov 2023 14:29:53 +0000

Roy Kaufman is Managing Director, Business Development and Government Relations at CCC, where he participates in a wide range of copyright and licensing conversations internally and with many industry groups. We recently caught up with Roy for an industry update on AI and other related topics.

Roy Kaufman is Managing Director, Business Development and Government Relations at CCC, where he participates in a wide range of copyright and licensing conversations internally and with many industry groups. We recently caught up with Roy for an industry update.

CCC: Hi, Roy. What kinds of conversations are you having inside CCC and with external stakeholders about generative AI?

RK: It is not just generative AI but also other forms of AI, text and data mining, open AI systems, and the like. CCC is participating in several public commentary processes currently underway before governmental bodies in the US, UK, EU, G7, and elsewhere. It is certainly a hot topic, and whether you look at government interventions or the spate of legal actions, it has to be seen as an important issue.

CCC: I would expect that there are also a #2, #3, and #4 on that list of priority topics, though. Am I right?

RK: Yes. I read the DC Circuit Court of Appeals holding in the ASTM v. Public.Resource.org case with a great deal of interest; I am talking with a bunch of folks about the impact it is having on Standards Development Organizations’ (SDOs) licensing, development, and sustainability practices. That case, too, leads in curious ways back to AI, as does almost every copyright case, including Warhol.

Meanwhile, we are involved with responses to the White House Office of Science and Technology Policy (OSTP)’s public access requirements. We are especially interested in the OSTP’s questions about persistent identifiers (PIDs) and other metadata. There is so much that funding agencies can do to promote scientific communication, and so many ways to get it wrong. At least with the discussions of PIDs and metadata there is a recognition that policy pronouncements do not in themselves enable change.

Another priority for copyright wonks is US Copyright Office modernization. I have been serving on a Library of Congress committee and recently participated in a discussion group in Congress. This has been ongoing for some time.

CCC: What were your takeaways from the Frankfurt Book Fair?

RK: Geopolitics aside, the big issue at Frankfurt was….AI. AI and technical standards, AI and Open Access, AI and publication ethics, AI in the publication process. AI is not new; as you know, at CCC we have been licensing text and data mining (TDM) for years. Publishers and educational technology (edtech) companies have been using AI for a long time. What is different is how it has now crashed into all conversations.

CCC: Last thoughts to leave the readers of Velocity of Content with for now?

RK: I would like to underscore something about the nexus of generative AI and copyright. Copyright in its modern form has been around since 1709, when the most significant relevant technological advances were the move from animal skins to plant-based paper in the late Middle Ages and the introduction of the printing press to Europe in the 15th century. Copyright law has navigated – to name a few – the introduction of photography, photo reprography, recorded music, radio, TV, silent films and talkies, eBooks, and MP3s. In each case, there was no doubt someone who sought to capitalize on the new technology without compensating rightsholders, arguing that the technology was new and that copyright would stifle innovation. AI is not new, and neither are the arguments that it is incompatible with fundamental copyright concepts such as advance consent, attribution, and compensation.

Kindle Authors To Have AI-Audiobooks
https://www.copyright.com/blog/kindle-authors-to-have-ai-audiobooks/
Fri, 03 Nov 2023 13:09:18 +0000

KDP authors can now choose from a selection of AI-generated narrators to create a machine-generated audiobook.

On Wednesday, Kindle Direct Publishing, Amazon’s self-publishing service, announced an “invite only” beta test for a service allowing authors to create audiobooks using virtual voice narration.

Under the new initiative, KDP authors with an eligible e-book on the KDP platform may choose from a selection of AI-generated narrators, then preview and customize a machine-generated audiobook.

“This program makes a lot of sense for a lot of self-published authors who would never be able to afford to create audio editions,” comments Andrew Albanese, Publishers Weekly executive editor.

The program and the technology will significantly increase the selection of audiobooks available from KDP authors, Albanese tells me.

“According to an Amazon spokesperson, only four percent of self-published titles with KDP have a companion audiobook available,” he explains.

CCC At Frankfurt Book Fair 2023
https://www.copyright.com/blog/ccc-at-frankfurt-book-fair-2023/
Mon, 30 Oct 2023 12:50:34 +0000

At the 2023 Frankfurt Book Fair last week, the halls were alive with the sounds of AI.

“Language interfaces are going to be a big deal.” That’s how Sam Altman, CEO of OpenAI, put it when the company launched ChatGPT last November.

Going to be a big deal? Definitely a big deal.

At the 2023 Frankfurt Book Fair last week, the halls were alive with the sounds of AI. In the Frankfurt Studio, I moderated a panel discussion, “Trained With Your Content,” considering what limits should be placed on training Large Language Models (LLMs) and how to address concerns over equity, transparency, and authenticity.

“Right now, the current situation is that the AI governance is far behind the AI capabilities, which is dangerous,” noted Dr. Hong Zhou, Director of Intelligent Services & Head of AI R&D, Wiley. “This has impacted the research and also the publishing, because it’s very hard for the people to manage all these AI capabilities.

“That’s why we need to create the legal framework to catch up to these technologies to have the response,” he explained. “I do have several concerns about this. The first concern, as everyone knows, is copyright infringement. Today, generative AI generates content which infringes on copyright without permission. This is a problem. Another concern, actually, is that AI can generate content that is similar to the original content but is not enough to be considered as copyright infringement. This is one scenario. Another scenario is it generates some content which infringes the copyright, but it’s hard to detect. In both cases, it’s very difficult for the copyright holders to enforce their rights.”

According to Dr. Namrata Singh, Founder and Director, Turacoz Group, the ICMJE (International Committee of Medical Journal Editors) has developed guidelines on the responsibility of scientific authors when using AI in their work.

“If you have used an AI tool, then you mention that in your methods section. You mention the name of the tool. You mention the version if it is there or the whole technology part behind it. This is where, I guess, the transparency works. But ultimately, the responsibility is on the author. But guidelines and recommendations do help us just to know what is right and what is wrong and what we can do and what we cannot do.”

The demand for AI tools in research and scholarly publishing raises copyright-related questions about the use of published materials that feed the tools. Carlo Scollo Lavizarri described how licensing solutions might meet that demand.

“These licenses can either be from segments of publishing, perhaps, that have large content that they can license, or it could be voluntary collective license, linking many-to-many situations. For example, you have many writers, many publishers on the one side, and you have many pieces of content on the other side used by different AI tools. So that is one such mechanism – voluntary collective licensing.”

2023 Frankfurt Book Fair Panel

How Generative AI Challenges Standards Publishers
https://www.copyright.com/blog/how-generative-ai-challenges-standards-publishers/
Thu, 12 Oct 2023 08:12:07 +0000

Learn about standards publishers and how they are responding to the needs of their stakeholders in an increasingly digital and connected world.

In early May, CCC hosted “Workflow of the Future: Sustainable Business Models,” the latest event in a series designed to help facilitate important conversations on critical topics related to standards. This event focused on standards publishers and how they are responding to the needs of their stakeholders in an increasingly digital and connected world. You can read a summary of the event here.

This post is the first in a series of opinion pieces picking up on the themes and topics discussed in the webinar, with a particular focus on the current and potential impacts of technology trends – in this case genAI – on the standards ecosystem. Today, I will focus on the perspective of the standards publisher.

In the webinar – a mere four months ago (although it feels much longer) – I described the rate of announcements around genAI as like watching a ‘tsunami’ coming towards the shore. Since then, the pace of change has remained relentless, although it now takes truly exceptional items to break through the news cycle. I’d argue that the AI wave is still coming in, and various parts of the content landscape are underwater. It remains to be seen whether this AI wave is solely destructive, or whether it also stimulates new growth in the standards ecosystem.

Examples of the pace of change are so numerous that I have had to rewrite this section several times. There have been increasing calls for regulation, including various preliminary legislative hearings across the globe; countless product and API releases; several AI-focused standards industry events; and inevitably a number of class action lawsuit filings over how the LLMs gather information. Regardless, the innovation curve is still accelerating as more and more organizations look to adopt genAI, either to reduce costs or in a quest for new forms of value.

During the webinar, Silona Bonewald claimed ‘the biggest competitor’ she fears is ‘time’. I’d reframe this slightly, as inertia. To many observers, this author included, the industry is almost pathologically averse to change. And the main change that is required is a change of mindset.

Industry initiatives like shared schemas, ontologies, or identifiers – even SMART – all of which aim to grow the world of standards usage, tend to have a frustratingly small number of active participants. We should be working together to meet users’ needs through digital means – and by ‘work’ I mean code, infrastructure, interfaces, schemas, talent. And we should embrace the FAIR principle of ‘as open as possible, as closed as necessary’. The world is increasingly connected, and our customers depend on us working together – we need to be part of that world, not a closed world based on how things were 20+ years ago.

With notable exceptions, one of the specific challenges for standards publishers is that their customers may not use, let alone derive value from, their product for months, if not years, after purchase. There is very limited feedback available from which to make decisions. And in the absence of data, we have inertia.

I’m not saying we all need to deploy chatbots or abandon the core principles of value through wisdom that most standards publishers uphold. But I am saying that if we don’t do more to acknowledge the world around us, then, as Silona also said, ‘we’re already dying’. Just as a generation of publishers was taught to mimic Amazon to succeed in ecommerce, we will need to at least approximate ChatGPT’s level of empathy and simplicity if we expect to survive in the future.
