Velocity of Content | Copyright Clearance Center

What Is (and Isn’t) Protected by Copyright?

CCC — Wed, 20 Dec 2023 07:24:13 +0000

Where copyright protection begins and ends

Employees consume and share copyrighted materials all day long. It’s just business. However, routine content exchanges, such as sharing published reports, articles, and other information found on the Web, have copyright implications that can expose companies to a greater risk of infringement. While you may know the basics of copyright, your colleagues and staff may not.

What is copyright?

In the U.S., copyright is a form of protection provided by the government to the authors of “original works of authorship, including literary, dramatic, musical, artistic, and certain other intellectual works.” This protection is available to both published and unpublished works in the U.S., regardless of the nationality or domicile of the author. Copyright protection exists from the time the work is created in a fixed, tangible form of expression. The copyright in the work of authorship immediately becomes the property of the author who created the work. Here is a breakdown of where U.S. copyright law protection starts and ends:

Protected by Copyright:

Literary works (not just The Grapes of Wrath or The Tipping Point, but all works expressed in writing both in print and digital form, however formally or informally recorded)
Computer software (considered to be literary works)
Pictorial, graphic, and sculptural works (e.g., paintings, drawings, carvings, photographs, clothing designs, textiles)
Architectural works (e.g., buildings themselves as well as blueprints, drawings, diagrams, and models)
Sound recordings (e.g., songs, music, spoken word, sounds, and other recordings)
Audiovisual works (e.g., live action movies, animation, television programs, and videogames)
Pantomimes and choreographic works (e.g., the art of imitating or acting out situations, and the composition of dance movements and patterns, including those accompanied by music)
Dramatic works and accompanying music (e.g., plays and musicals)

Not Protected by Copyright:

Works that have not been fixed in a tangible medium of expression (that is, not written, recorded, or captured electronically)
Titles, names, short phrases and slogans; familiar symbols or designs; mere variations of typographic ornamentation, lettering or coloring; mere listings of ingredients or contents
Ideas, procedures, methods, systems, processes, concepts, principles, discoveries, or devices, as distinguished from a description, explanation, or illustration
Works consisting entirely of information that is natural or self-evident fact, containing no original authorship, such as the white pages of telephone books, standard calendars, height and weight charts, and tape measures and rulers
Works created by the U.S. Government (that is, works prepared by federal officers or employees as part of their official duties)
Works for which copyright has expired; works in the public domain

Understanding the extent to which materials are protected by copyright can help you minimize the risk of infringement by well-intentioned employees. Make it easy for employees to get up to speed by checking out our Copyright Education programs.

This blog post is also available as a PDF.

*Much of the information in this post was drawn from content posted on the website of the U.S. Copyright Office and is based on the U.S. Copyright Act of 1976. The information appears here in an edited form. For the complete, unedited text, visit: www.copyright.gov.

The post What Is (and Isn’t) Protected by Copyright? appeared first on Copyright Clearance Center.

5 AI-Related Topics Every Information Professional Should Think About in 2024

Keri Mattaliano — Tue, 19 Dec 2023 08:13:58 +0000

Tools like ChatGPT have changed our expectations for how search should work. Regardless of the current risks and of where the technology and rights actually stand, more and more users now expect to ask a question in natural language and have AI-powered machines compose a thorough and correct response in little to no time. If expectations are changing this rapidly among adults now, imagine the young researcher 10 or 15 years from now who has been steeped in that reality for most of their life.

So, how can information professionals approach the changing environment caused by the rapid-fire advancements in AI technology to raise the profile of the information center and provide value? Consider the following five areas where you can play a role: 

1. Copyright questions about using AI in content

Information managers are often seen as copyright experts. Although you are likely not a copyright lawyer, you have a deep well of knowledge around copyright that can be employed to help your company reduce risk. I believe that role will only grow as we see more and more AI use cases.

Consider the risk when departments within the organization begin developing their own AI projects or licensing tools that leverage AI without knowing to ask questions like “what rights do I have around the use of copyrighted content to train this algorithm?”, “how do we accurately attribute open access (OA) content used to train an algorithm, as well as in the output?”, or “how is my own intellectual property protected in this tool?” Before these projects come to fruition, it’s important that you’re known within your organization as a leader and a resource for accurate copyright information, so you can have a seat at the table and help reduce the risk your company could incur.

Getting Started: CCC’s Intersection of AI and Copyright page serves as a resource for information on the responsible development and use of AI technologies with copyright-protected content. 

2. Licensing content for AI

Information managers are often already in charge of licensing subscriptions for scientific literature and databases for use across the organization. They are, therefore, uniquely positioned to also determine how best to license externally created copyrighted materials from publishers for use in AI projects. When the information center is the central hub of content licensing, information managers can evaluate needs across the organization and license efficiently, removing the licensing burden from siloed groups that are thinking only about their individual AI projects.

3. Company guidelines and strategic directives around technologies using AI

More and more, we are hearing from information managers that senior leadership within their organizations is setting goals and expectations that the technologies the organization adopts should leverage AI to improve efficiency and outcomes. This is perfectly reasonable given the landscape we’re in, where everyone is experimenting with ChatGPT and most technology vendors, for better or worse, are building some kind of LLM or generative AI into their tools to improve workflows. However, these directives may not account for the actual state and limitations of the technology, or for the risks of using AI.

While AI holds great promise for R&D if used responsibly, AI systems also have the potential to generate bad science, draw false or misleading conclusions, promote misinformation, and lead to harmful results. And by now we have all heard stories about hallucinations from Large Language Models (hallucinations being a fancy word for when the AI makes up facts). Many of these problems stem from the fundamental nature of generative AI as a text-prediction tool rather than a system with real knowledge. The quality, accuracy, or bias of the training data can affect the output (or more simply: garbage in, garbage out). Likewise, applying an LLM to a domain for which it lacks training can yield hallucinations. Techniques such as retrieval augmented generation, or RAG, are being explored to address these issues. Still, these limitations mean that using AI in the life sciences requires a large amount of human validation of the results, which reduces the efficiency AI promises. So, we’re seeing healthy skepticism and caution alongside the optimism for an AI-powered future.

In addition, we expect more and more organizations will adopt company guidelines on the use of AI, including what data you can (and more importantly cannot) use in projects, what you can do with the output, and the types of tools you can use.  Many organizations are starting to develop cross-functional AI groups, where legal, IT, and other stakeholders evaluate proposed use cases before green-lighting their use internally. 

Luckily, information managers are experts in information, certainly on the licensing and management side, but also in searching, synthesizing, and validating results, and I have spoken to several information managers who have demanded a seat at the table. We are in a key position to help evaluate the output of these tools to make sure they deliver on the promise of AI. This is an opportunity to raise the profile of the information center by helping different functional areas assess what they need from a tool and determine how to use data and content in a way that is copyright compliant and follows internal company guidelines. And, most importantly, information managers can help validate the output.

4. Budget ambiguity

We expect some budgeting issues around the use of content in AI. We’re seeing direct licenses negotiated by individuals working on single projects, or by groups with a targeted need, without consulting the information center. That means multiple groups may be negotiating with the same publishers without knowing it, and without other groups being able to take advantage of those agreements. Partly this comes down to who holds the budget. As mentioned previously, if the information center holds the budget, it can scan across the organization to see the full demand for a particular publisher and negotiate accordingly. But information center budgets would need to increase to accommodate this.

I think this is where a tight partnership with data science and some executive sponsorship really come into play. Simply put, this works best when the information center has the authority to manage the licensed content and licensing budget for AI. Alternatively, AI project negotiations can succeed when there are tight partnerships with data science and other functional areas, specifically partnerships in which those groups bring the information center into the negotiation process and support any necessary monetary investments through bill-back processes.

Mary Ellen Bates, a highly respected thought leader and consultant in the information management industry, recently conducted a research project for CCC analyzing how information professionals can partner with data professionals to provide intelligence to their clients in an increasingly complex and interconnected information environment. We believe that partnering with data science, with both teams leveraging their unique strengths, is a strategic and valuable path forward.

Check out our three-part series with Mary Ellen.

5. Stay current

There are a lot of responsibilities for information managers in this fast-changing AI landscape: be the AI-copyright expert, license content in new and rapidly changing ways, join cross-functional teams, and advocate for appropriate budget by partnering with stakeholders. You need to stay on top of rapid advancements in AI technologies so you can effectively evaluate vendors, the type of AI used, the use cases, and the potential risks. You essentially have to learn a whole new way of thinking and working, which has enormous possibilities. At a recent Pistoia Alliance conference in Boston, we heard the refrain several times from life science leaders that “your job isn’t going to be replaced by AI, but you will be replaced by someone willing to use AI.” While fear may be part of the motivation, I’ve talked to many information professionals who are taking on these tasks because they know it will help them stay relevant and help their company gain efficiency.

But you still have a day job! One of the most important things to recognize and advocate for is that you need the support and bandwidth to focus on how the information center can support the organization’s strategic AI goals. This is likely a tall order, especially for solo librarians and information centers already working at capacity. It will require selling key stakeholders and leadership internally on the potential benefits of this new, forward-thinking evolution of library services.

The post 5 AI-Related Topics Every Information Professional Should Think About in 2024 appeared first on Copyright Clearance Center.

In 2023, Voices Carry

Christopher Kenneally — Mon, 18 Dec 2023 14:17:44 +0000

In the final weeks of 2023, Velocity of Content is looking back at the past twelve months of programs.

Citizen. Journalist. Influencer. War crimes investigator. Leading figures in the global community of creators go by many titles. We read their words, and we watch their videos, scrolling through our content feeds in search of meaning – and for meaningful connection.


Journalism professor and book author Jeff Jarvis recalls that early in his own writing and publishing career, he wrote on typewriters and saw his stories set in hot metal linotype. His latest book, The Gutenberg Parenthesis, places us outside the era of print and beyond the world that print created. As transmission of knowledge and creativity shifts off the page and onto the screen, Jarvis urges us to celebrate, not mourn.

In late March 2023, National Public Radio announced the cancellation of several broadcast and podcast programs, as well as layoffs of 10% of its national staff. The NPR cuts were the deepest the network has made since the Great Recession in 2008. The network’s CEO blamed a budget deficit of $30 million. Executives said they were seeking to protect core services, and they pointed a finger at declines in corporate underwriting.

American news consumers are unfazed by hearing that programming on NPR and other nonprofit public media is made possible by for-profit businesses. But Victor Pickard, co-director of the Media, Inequality and Change Center at the Annenberg School for Communication at the University of Pennsylvania, wants to take a closer look.

TikTok challenges can range from the sublime to the ridiculous – and even the dangerous. There are challenges over art projects, school bathrooms, and sometimes fatally, boat jumping. Now, TikTok may have a challenge for publishers. In May, TikTok’s China-based parent company, ByteDance, sought a trademark for its own book publishing imprint, to be called 8th Note Press. The New York Times and others reported that the fledgling publisher is approaching self-published authors with offers for book deals. The advances aren’t large, but the implications for the industry are enormous.

Mark Gottlieb, vice president and literary agent with Trident Media Group, described the bookselling power that TikTok already has, and why a TikTok-owned publisher represents a digital-first challenge to traditional players.

At the World Expression Forum in Lillehammer, Norway, the International Publishers Association presented a Prix Voltaire Special Award honoring Ukrainian children’s book author and poet Volodymyr Vakulenko, who was abducted and murdered by Russian armed forces in March 2022, shortly after the full-scale invasion of Ukraine. Accepting the 2023 IPA Prix Voltaire Special Award on behalf of Volodymyr Vakulenko was the Ukrainian novelist Victoria Amelina, who received the Joseph Conrad Literary Award from the Polish Institute in Kyiv in 2021 and was a European Union Prize for Literature finalist in 2019. In May, Victoria Amelina told me why she ventured to Kapitolivka, in Ukraine’s Kharkiv region, where Volodymyr Vakulenko lived with his family, and how she discovered the author’s journal buried in the family garden.

A month after our interview, and immediately following the International Book Arsenal Festival in Kyiv where she spoke about Vakulenko, Victoria Amelina was fatally injured in a Russian missile attack on a restaurant in Kramatorsk in the Donbas region of eastern Ukraine. Yuliia Kozlovets, the book festival’s coordinator, told me that the national community of Ukrainian authors and publishers has struggled to make sense of her death.

Media, whether published by individuals or global corporations, has never been easier to make or more ubiquitous. Technology sees to that. Yet we must not take for granted how critical these media activities are to the joy of celebrating our humanity and to the responsibility of sustaining our freedom.

The post In 2023, Voices Carry appeared first on Copyright Clearance Center.

Amazon Urges Judge To Dismiss Antitrust Case

Christopher Kenneally — Fri, 15 Dec 2023 14:14:26 +0000

In September, many book publishers were eager to praise an antitrust suit filed against Amazon by the Federal Trade Commission. The e-tailer giant was accused of maintaining an illegal monopoly in online commerce, although the suit did not specifically call out books.

Now, Amazon has asked a federal judge to reject the suit.

“Amazon has slammed the FTC’s antitrust case as wrong on the facts and the law, and in a 31-page filing urged a federal court to dismiss it,” reports Andrew Albanese, Publishers Weekly executive editor.



The filing comes in response to the FTC’s blockbuster 172-page September lawsuit, supported by 17 state attorneys general. In its filing, Amazon states that it has “relentlessly innovated, and saved consumers money.” The FTC is ignoring “the facially procompetitive” effects of Amazon’s conduct, the company asserts.

“As complicated as antitrust law can be, Amazon’s response is pretty straightforward,” Albanese tells me.

The post Amazon Urges Judge To Dismiss Antitrust Case appeared first on Copyright Clearance Center.

Utilizing the Annual Copyright License Across Your Organization

Beth Johnson — Thu, 14 Dec 2023 14:23:21 +0000

Fast-paced organizations that rely on and invest heavily in R&D should not only regard published content as the heart of innovation, but also possess a deep appreciation of the system of copyright protecting this intellectual property. After all, many kinds of published literature, including news, blogs, books, journals, and standards — including the organizations’ own materials — are protected by copyright laws that place limits on how content can be used by others without the rightsholder’s permission.

Success is dependent on collaboration, networking, and sharing information. Unfortunately, from growing startups to established global companies, we hear all the time from organizations whose teams are not fully aware of how copyright applies to the content they rely on to accomplish their respective goals. Without knowing what permissions are needed to reuse content — and how to obtain them if needed — these teams can put their entire organization at risk of copyright infringement.

The Annual Copyright License (ACL) from CCC helps minimize an organization’s infringement risk by providing a consistent set of global reuse rights across millions of publications from thousands of rightsholders, complementing existing publisher agreements, subscriptions, and other content purchases. The license also helps increase organizational efficiency by reducing the time spent verifying rights and obtaining individual permissions, and by enabling employees to share content compliantly.

How Do Different Departments Use the Annual Copyright License?

Drug development is a lengthy and complex process that requires extensive collaboration among both internal and external stakeholders. Cross-functional groups within an organization, including research and development (R&D), clinical research, competitive intelligence, regulatory, marketing, and medical affairs, must work together and with external partners to advance drug compounds from discovery, to approval for use in patients, to physician education and promotion.

R&D

Literature review is an essential step in the research process; it provides context, informs methodology, maximizes innovation, and helps to avoid duplicative research. Researchers and investigators need to work from and share scientific literature for analysis and review as they identify new drug targets for development. This sharing of information increases efficiencies and accelerates the timeline for developing novel therapies and new indications for existing compounds.

Clinical Research

As clinical trials become more complex, contract research organizations (CROs) have become a trusted extension of the clinical research team assisting with trial design, patient recruitment, and trial management. In order for these collaborative projects to succeed, both the internal clinical research team and the CRO need to be able to share all documentation relevant to the project, including published literature.

“Certain projects require us to collaborate with project team members from different organizations, and we have to make sure to work compliantly here. It can be a very time-consuming process but with the Collaboration Amendment we can make sure that everything is legal, and collaboration can happen seamlessly.”

Teresa Silveira, Global Scientific Information Manager at BIAL (https://www.copyright.com/resource-library/case-studies/bial/)

Competitive Intelligence

Missed market intelligence can be extremely costly. Competitive intelligence (CI) is a key tool that can help organizations gain valuable insights into their markets, their competitors, and their customers to enable more effective decision-making and greater revenue. Valuable CI comes from collecting data from diverse sources, including competitor analysis, market research, regulatory updates, patent information, clinical trials, and published literature. With the ACL in place, these materials can be hosted on an organization’s CI site and accessed by staff to help analyze the market landscape to identify opportunities for drug development, improved product positioning, and enhanced customer satisfaction.

Legal/Regulatory Affairs

Meeting regulatory requirements for clinical trials and new drug submissions is an essential part of conducting clinical research. Medical literature is a key source of safety information about these products, as new types of adverse reactions may first come to light as published individual case reports or as part of published clinical studies. Regulatory Affairs groups need to work closely with other departments to centralize and organize the published literature that will be submitted to support a regulatory filing for a new drug or new indication. It is important that this team not only has the ability to share these documents internally but also permission to share them with appropriate government agencies as a required part of the regulatory submission package.

Marketing

To make physicians aware of new products, promotional literature is prepared for publication in journals and for use by sales representatives. Claims made in these materials must be supported by the outcomes of published clinical research and full-text copies of these supporting materials must be stored and available for reference.

Medical Affairs

One important role of the Medical Affairs team is to explain to potential healthcare prescribers the real-world applications of a drug through the dissemination of unbiased clinical and scientific information. They are responsible for communication with payors, patients, physicians, regulators, and government agencies, and frequently need to respond to unsolicited requests for information about a drug or device.

Drive Business Forward with the Annual Copyright License

A copyright compliance strategy that informs and meets the needs of employees across the enterprise sets up an organization for higher efficiency, improved collaboration, and a minimized risk of copyright infringement, ultimately helping to fuel innovation and new discoveries.

Interested in seeing real examples of how R&D teams use the Annual Copyright License and our other solutions? Check out our case studies featuring customers in life sciences, biotech, chemical, consumer goods, and more. Contact us about content management and licensing solutions for your organization.

The post Utilizing the Annual Copyright License Across Your Organization appeared first on Copyright Clearance Center.

Thoughts on the U.S. Executive Order on Artificial Intelligence

Roy Kaufman — Wed, 13 Dec 2023 08:24:06 +0000

This piece originally appeared in The Scholarly Kitchen on 4 December.

On October 30, the Biden Administration issued an Executive Order on “Safe, Secure, and Trustworthy Artificial Intelligence.” According to the Administration, “[t]he Executive Order establishes new standards for Artificial Intelligence (AI) safety and security, protects Americans’ privacy, advances equity and civil rights, stands up for consumers and workers, promotes innovation and competition, advances American leadership around the world, and more.”

I share my thoughts on the Executive Order below:

There has been significant governmental activity around AI, driven especially by the G7 Hiroshima process. In reading the Executive Order (EO), I was most interested in learning the Biden Administration’s approach on three topics: (1) copyright, (2) AI accountability, and (3) AI use in education.

The Executive Order kicked the can on copyright. The US Copyright Office (part of the Legislative Branch) is currently in the middle of a massive AI study process, and the Executive Order directs the head of the US Patent and Trademark Office (US PTO, part of the Executive Branch) to meet with the head of the USCO within six months of the Copyright Office’s issuance of any final report (traffic is bad in DC). At such time, the US PTO is directed to “issue recommendations to the President on potential executive actions relating to copyright and AI.” On the positive side, at least the EO acknowledged that copyright is relevant.

On accountability, as I noted previously in The Scholarly Kitchen, to reach its full potential AI needs to be trained on high quality materials and that information needs to be tracked and disclosed. While the EO could have said more on this topic, I was pleased to note that it includes language such as a mandate to the Secretary of Health and Human Services to create a task force whose remit includes ensuring “development, maintenance, and availability of documentation to help users determine appropriate and safe uses of AI in local settings in the health and human services sector.”

Finally, on education, I was happy to see the following:

To help ensure the responsible development and deployment of AI in the education sector, the Secretary of Education shall, within 365 days of the date of this order, develop resources, policies, and guidance regarding AI. These resources shall address safe, responsible, and nondiscriminatory uses of AI in education, including the impact AI systems have on vulnerable and underserved communities, and shall be developed in consultation with stakeholders as appropriate. They shall also include the development of an “AI toolkit” for education leaders implementing recommendations from the Department of Education’s AI and the Future of Teaching and Learning report, including appropriate human review of AI decisions, designing AI systems to enhance trust and safety and align with privacy-related laws and regulations in the educational context, and developing education-specific guardrails.

Students are not “one size fits all.” Students in my local school district speak 151 home languages other than English. Within each language group, including native English speakers, we have children from some of the wealthiest zip codes in America as well as a student homelessness rate of greater than 10%. In districts such as mine, which is diverse in terms of nearly every measure (including gender, racial, religious, and national origin), personalized and adaptive educational tools are needed. CCC’s work with schools and ed tech providers who license high-quality content with rights for AI uses is promising, and our experience shows how much districts in particular would benefit from more federal support. Let’s hope it is forthcoming.

The post Thoughts on the U.S. Executive Order on Artificial Intelligence appeared first on Copyright Clearance Center.

The Boston Globe Declares CCC a Top Place to Work in Massachusetts

CCC — Tue, 12 Dec 2023 07:39:54 +0000

CCC has been named one of the Top Places to Work in Massachusetts in the large employer category for the third year in a row in the 15th annual employee-based survey project from The Boston Globe.

“CCC is honored to have been named a Top Place to Work in Massachusetts based on the votes from our team members for the third year in a row,” said Tracey Armstrong, President and CEO, CCC. “As we have grown as an organization we continue to learn together, innovate together, and strive to maintain an inclusive, people-first environment, where we emphasize the importance of community and the difference that together, we can make in the world.”

CCC remains committed to continuous improvement in creating a diverse, equitable, and inclusive (DEI) workplace. CCC’s DEI Employee Resource Group launched a pilot Girls Who Code team at the elementary school level in 2023 and piloted the first chapter north of Boston last year. Its speaker series most recently hosted Dr. Kenann McKenzie-DeFranza, who talked about the importance of listening to understand each other, our customers, and our communities. CCC is a member of the Mass Technology Leadership Council (MassTLC), whose mission is to accelerate innovation, growth, and development of an inclusive tech ecosystem in Massachusetts through STEM pipeline initiatives, talent development, leadership, and advocacy. CCC is also a member of the North Shore Juneteenth Association, an emerging 501(c)(3) organization of community leaders creating awareness about Juneteenth, educating the community about positive aspects of Black American culture, and dismantling racism by using events and programming as tools of change.

The post The Boston Globe Declares CCC a Top Place to Work in Massachusetts appeared first on Copyright Clearance Center.

In Praise of the Title Verso

Christopher Kenneally — Mon, 11 Dec 2023 14:10:41 +0000

You may read a book front to back, cover to cover, and still miss the title verso. But not Richard Charkin, because that’s where he starts reading.

Title verso – by law and by tradition, every book you read will have one. “Verso” is Latin for reverse. The title verso is the text on the back of the title page, always including an ISBN catalog number, sometimes the publisher’s name and contact information, and occasionally details on printing history and typesetting.


Critically for Richard Charkin, however, the title verso is where to find who is the copyright holder. He recently wrote for Publishing Perspectives about his professional obsession with this “most boring bit of any book.”

“The fact that it’s only three words makes it more powerful than the 100-page documents you see as contracts in some cases. All rights reserved – it means all rights reserved, and rights means all media. It means everything. Of course, the publisher of that edition may not have all rights in that book. The author does, which is why I say I think in a way, it’s more important for the author.”

Richard Charkin has held senior posts at major publishing houses including Bloomsbury, Macmillan, and Oxford University Press. He is a former president of the International Publishers Association and the United Kingdom’s Publishers Association. Currently, he is founder and sole employee of Mensch Publishing.

The post In Praise of the Title Verso appeared first on Copyright Clearance Center.

AAP Responds To “Flawed” GenAI Arguments

Christopher Kenneally — Fri, 08 Dec 2023 14:09:19 +0000

The Association of American Publishers (AAP) has released its Reply Comments to the U.S. Copyright Office concerning a Notice of Inquiry regarding copyright law and artificial intelligence.

According to Andrew Albanese, Publishers Weekly executive editor, “the AAP told the Copyright Office that the tech industry needs to stop telling copyright owners to back off on their claims that the use of their works to create training datasets for Gen AI systems without permission is infringement.”

Albanese tells me that the AAP responded to tech industry assertions that respect for copyright is an obstacle to innovation by labeling them “nonsense.”

“It would be a grave error to repeat the past policy mistakes that allowed technology companies to achieve such an unhealthy, monopoly-like market dominance to the point that governments have struggled to curb their power,” the AAP statement also said.

The post AAP Responds To “Flawed” GenAI Arguments appeared first on Copyright Clearance Center.

The United States Copyright Office Notice of Inquiry on AI: A Quick Take

Roy Kaufman — Wed, 06 Dec 2023 14:20:06 +0000

This post originally appeared in The Scholarly Kitchen on 11/28/23.

Monday, October 30 was the final date for interested parties to submit comments on a comprehensive “Notice of inquiry and request for comments” issued by the United States Copyright Office entitled “Artificial Intelligence and Copyright.” The notice asks 34 questions about both copyright and technology, and some parties’ responses exceed 100 pages. More than 9,000 responses have been filed. On the assumption that readers might be interested in this topic but less interested in reviewing all the responses, I have pasted below a selection of questions and answers from Copyright Clearance Center’s (CCC’s) own response.

Does the increasing use or distribution of AI-generated material raise any unique issues for your sector or industry as compared to other copyright stakeholders?

AI-generated materials may both advance text publishing and hinder it. In sectors such as science, news, and book publishing, poor-quality AI materials can generate bad science, promote misinformation, and lead to harmful results. This is not to say that such results are inevitable with all AI, merely that they are a meaningful risk with respect to certain AI applications. AI can advance text publishing by providing tools for writing, checking, validating, and improving text-based works. It is also useful for primary research that may result in the creation of new content.

How or where do developers of AI models acquire the materials or datasets that their models are trained on? To what extent is training material first collected by third-party entities (such as academic researchers or private companies)?

In the text sector, developers of AI models, when acting lawfully, acquire materials and data sets from publishers, other rightsholders, websites that allow crawling, intermediaries, and aggregators (such as CCC). Significant amounts of content are available through licenses, including open licenses such as CC BY and CC BY-NC, and through the public domain. When acting unlawfully, AI developers obtain materials from pirate sites and so-called “shadow libraries,” or by downloading in violation of express terms and flags, among other means.

To what extent are copyrighted works licensed from copyright owners for use as training materials? To your knowledge, what licensing models are currently being offered and used?

Copyrighted materials are licensed for AI use directly by rightsholders and collectively through rights aggregators such as CCC. CCC’s collective licenses are non-exclusive, global, and fully voluntary. Our current AI-related offerings are focused on the corporate, research, academic and education markets.

Additionally, in science publishing, under “open access” business models, copyright owners employ open licensing, which sometimes allows reuse for AI under the terms of such licenses. According to one report, open access models accounted for 31% of articles, reviews and conference papers in 2021.

Are some or all training materials retained by developers of AI models after training is complete, and for what purpose(s)? Please describe any relevant storage and retention practices.

Humans communicate in natural language by placing words in sequences; the rules governing word order and the specific form of each word are dictated by the specific language (e.g., English). An essential part of the architecture of all software systems (and therefore AI systems) that process text is how to represent that text so that the system’s functions can be performed most efficiently.

Almost all large language models are based on the “transformer architecture,” which relies on the “attention mechanism.” Attention allows the AI technology to view entire sentences, and even paragraphs, at once rather than as a mere sequence of characters. This allows the software to capture the various contexts within which a word can occur.
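To make the idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside a transformer, written in plain Python. The three two-dimensional token vectors are made-up numbers, and a real model would first project its inputs through learned query, key, and value matrices, which this sketch omits by using the same vectors for all three. Each row of weights shows how much one token attends to every token in the sentence at once.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention: every query position scores every key
    position, and the output is a weighted mix of the value vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this token to every token in the sequence.
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Blend the value vectors according to the attention weights.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Toy vectors for a hypothetical three-token sentence (illustrative numbers only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = attention(x, x, x)  # self-attention: Q = K = V = x
for row in weights:
    print([round(w, 3) for w in row])
```

Because every row is computed against the whole sequence, each token’s representation depends on all the others, which is what lets the model capture context.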

Therefore, a key step in processing textual input in language models is splitting the user input into special “words” that the AI system can understand. Those special words are called “tokens,” and the component responsible for producing them is called a “tokenizer.” There are many types of tokenizers. For example, OpenAI and Azure OpenAI use a subword tokenization method called “Byte-Pair Encoding (BPE)” for their Generative Pretrained Transformer (GPT)-based models. BPE repeatedly merges the most frequently occurring pair of adjacent characters or bytes into a single new token, until a certain number of merges or a target vocabulary size is reached. The larger the vocabulary size, the more diverse and expressive the texts that the model can generate.
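The merge loop at the heart of BPE can be sketched in a few lines. This is a simplified, character-level illustration on a toy word-frequency table chosen for the example; production GPT tokenizers operate on bytes, apply pre-tokenization rules, and learn tens of thousands of merges, so the function names and data here are hypothetical.

```python
from collections import Counter

def get_pair_counts(corpus):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for word, freq in corpus.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(corpus, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = {}
    for word, freq in corpus.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: count} table."""
    corpus = {tuple(w): f for w, f in word_freqs.items()}  # start from characters
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(corpus)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (first seen wins ties)
        merges.append(best)
        corpus = merge_pair(corpus, best)
    return merges, corpus

def apply_bpe(word, merges):
    """Tokenize a new word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = next(iter(merge_pair({symbols: 1}, pair)))
    return list(symbols)

merges, corpus = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 4)
print(merges)
print(apply_bpe("lowest", merges))  # ['low', 'est']
```

Note that “lowest” never appears in the toy training data, yet it is split into the familiar subwords “low” and “est”, which is why subword tokenizers can handle words they have not seen before.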

Once the AI system has mapped the input text into tokens, it encodes each token as a number and converts the sequences it has processed (even up to multiple paragraphs) into vectors of numbers called “word embeddings.” These are vector-space representations of the tokens that preserve the natural language representation originally given as text. It is important to understand the role of word embeddings when it comes to copyright, because embeddings are the representations (or encodings) of entire sentences, paragraphs, and even documents in a high-dimensional vector space. It is through the embeddings that the AI system captures and stores the meaning of words and the relationships among them.
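At its simplest, going from token IDs to embeddings is a table lookup. The sketch below assumes a hypothetical three-word vocabulary and a random four-dimensional embedding table (a trained model learns these values), then averages the token vectors into a single vector for the whole phrase. Mean pooling is only one of several ways to derive a sentence-level embedding and is used here purely for illustration.

```python
import random

def build_embedding_table(vocab_size, dim, seed=0):
    """Toy embedding matrix: one row of `dim` floats per token ID.
    Real models learn these values during training; here they are random."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]

def embed(token_ids, table):
    """Look up one vector per token ID."""
    return [table[t] for t in token_ids]

def mean_pool(vectors):
    """Average token vectors into one fixed-size vector for the sequence."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

vocab = {"all": 0, "rights": 1, "reserved": 2}  # hypothetical toy vocabulary
table = build_embedding_table(len(vocab), dim=4)
ids = [vocab[w] for w in "all rights reserved".split()]
token_vectors = embed(ids, table)           # three vectors of length 4
sentence_vector = mean_pool(token_vectors)  # one vector of length 4
print(ids, [round(x, 3) for x in sentence_vector])
```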

Embeddings are used in practically every task that a generative AI system performs (e.g., text generation, text summarization, text classification, text translation, image generation, code generation, and so on).

Word embeddings are usually stored in vector databases, but a detailed description of all the approaches to storage is beyond the scope of this response, since a wide variety of vendors, processes, and practices are in use.
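Vector databases differ widely, but the basic operation they support can be illustrated with a toy, in-memory store: save vectors under an identifier, then return the identifiers whose vectors are most similar to a query. The class name, the cosine-similarity metric, and the brute-force search are all assumptions made for this sketch; production systems commonly use approximate nearest-neighbor indexes to scale.

```python
import math

class InMemoryVectorStore:
    """Hypothetical, minimal stand-in for a vector database: stores
    (id, vector) pairs and returns the ids closest to a query vector."""

    def __init__(self):
        self._items = {}

    def add(self, item_id, vector):
        self._items[item_id] = vector

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def query(self, vector, k=1):
        """Brute-force scan of every stored vector, best matches first."""
        scored = sorted(self._items.items(),
                        key=lambda kv: self._cosine(vector, kv[1]),
                        reverse=True)
        return [item_id for item_id, _ in scored[:k]]

store = InMemoryVectorStore()
store.add("doc-a", [1.0, 0.0, 0.0])
store.add("doc-b", [0.0, 1.0, 0.0])
store.add("doc-c", [0.7, 0.7, 0.0])
print(store.query([0.9, 0.1, 0.0], k=2))  # ['doc-a', 'doc-c']
```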

Under what circumstances would the unauthorized use of copyrighted works to train AI models constitute fair use? Please discuss any case law you believe relevant to this question.

U.S. law has no specific rules governing the use of copyrighted materials to train AI. Rather, such uses fall under the general copyright regime. Under U.S. law, copying copyrighted content to train AI can state a cause of action for infringement [Citing, Thomson Reuters Enters. Ctr. GmbH v. ROSS Intelligence Inc., 529 F.Supp.3d 303 (D. Del. 2021) (downloading and copying of Westlaw database for the purpose of training AI).] Thus, such activities require a license to be non-infringing unless they fall under the fair use exception.

Whether fair use applies is fact-dependent. Copying for purposes of training an AI will usually entail copying the complete work. Whether the copying is for commercial or non-commercial research purposes will be considered. The courts will also look very closely at market harm under the fourth factor. As stated by the Supreme Court in Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 590 (1994), “[the fourth factor] requires courts to consider not only the extent of market harm caused by the particular actions of the alleged infringer, but also ‘whether unrestricted and widespread conduct of the sort engaged in by the defendant … would result in a substantially adverse impact on the potential market’ for the original.” And, as reinforced by the recent Supreme Court decision in Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, 598 U.S. (2023), the impact of the infringing use on licensing is one of the key factors in determining market harm.

Instructive cases include those mentioned above as well as Fox News Network, LLC v. TVEyes, Inc., 883 F.3d 169 (2d Cir. 2018), where the Second Circuit Court of Appeals rejected a fair use defense for the allegedly transformative compilation of recorded broadcasts into text-searchable databases that allowed search and viewing of short excerpts. By contrast, the Second Circuit had previously considered the text mining of scanned books for non-commercial social science research in Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015), and held that copies made and used for a specific purpose involving snippets would likely fall under fair use.

There are currently multiple pending cases in the U.S. relating to use of copyrighted content for the development of AI systems. Congress has expressed interest in the issue by including language in the SAFE Innovation Framework that the Framework will “support our creators by addressing copyright concerns, protect intellectual property, and address liability.”

Should copyright owners have to affirmatively consent (opt in) to the use of their works for training materials, or should they be provided with the means to object (opt out)?

Copyright is, and should remain, an opt-in regime. Placing the burden of asserting rights on copyright holders is inequitable, burdensome, and largely impractical. Only those making copies know what they are copying in the first instance, so copyright owners are not in a position to opt out.

9.2. If an “opt out” approach was adopted, how would that process work for a copyright owner who objected to the use of their works for training? Are there technical tools that might facilitate this process, such as a technical flag or metadata indicating that an automated service should not collect and store a work for AI training uses?

There is good reason that copyright is an “opt-in” regime. Some AI developers have gathered content by routinely ignoring flags, copyright notices, and metadata. Thus, while there are protocols and flags that can be and are used by rightsholders and honored by ethical AI developers, they are no substitute for placing the responsibility for compliance on the user. Moreover, requiring flags and metadata assumes that the content resides on a server or website under the rightsholder’s control. This is not always true. For example, in the recent case of Am. Soc’y for Testing & Materials v. Public.Resource.Org, Inc., 82 F.4th 1262 (D.C. Cir. 2023), the Court of Appeals for the District of Columbia Circuit ruled that the non-commercial posting of technical standards incorporated by reference into law is fair use. It would be problematic to assume that an entity posting the standards over the objection of the copyright owners would take steps to reserve those owners’ AI rights.

Finally, for smaller creators, any obligation to adopt technical protection measures or flags is unfair and unduly burdensome.

Technical flags and metadata are useful for AI developers who act ethically, and they have another great value: where AI developers ignore them, they can provide evidence of willfulness.
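One widely used technical flag is robots.txt, standardized as the Robots Exclusion Protocol in RFC 9309, which lets a site ask specific crawlers not to collect its pages. Honoring it is voluntary, which is exactly the limitation described above. The sketch below shows how an ethical crawler might check it with Python’s standard urllib.robotparser module; the robots.txt contents and the example.com URLs are hypothetical, and GPTBot is used only as an example of a published AI crawler user-agent name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler entirely, and keep all
# other crawlers out of /private/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def may_collect(robots_txt, user_agent, url):
    """Return True if the robots.txt text permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(may_collect(ROBOTS_TXT, "GPTBot", "https://example.com/articles/1"))        # False
print(may_collect(ROBOTS_TXT, "SomeOtherBot", "https://example.com/articles/1"))  # True
```

A crawler that skips this check will still fetch the page, and nothing in the protocol prevents it; the signal only works when the collector chooses to respect it.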

What legal, technical, or practical obstacles are there to establishing or using such a process? Given the volume of works used in training, is it feasible to get consent in advance from copyright owners?

It is feasible to acquire advance consent of copyright owners. It is not feasible to place the burden on rightsholders to police their rights without knowing who is using their works without authorization and how the works are being used.

The burden of implementing technical measures, flags, and metadata may, depending on the sector, be involved, complicated, and costly for copyright owners. In the recent past, international sector-wide initiatives such as the Automated Content Access Protocol (ACAP) absorbed significant time and resources on the part of rightsholders and users seeking to act ethically, only to be rejected by the tech industry. Current efforts of note include the W3C Text and Data Mining Rights Reservation Protocol.
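As a rough illustration of how a crawler might read a rights reservation signal like the one defined by the W3C Text and Data Mining Rights Reservation Protocol (TDMRep), the sketch below checks an HTTP response header and, failing that, an HTML meta tag, both named tdm-reservation. It assumes the convention that a value of “1” means text and data mining rights are reserved and “0” means they are not, and it ignores other parts of the protocol (such as its site-wide JSON file and the tdm-policy property). Checking headers before page markup is a design choice of this sketch, not a rule taken from the specification.

```python
from html.parser import HTMLParser

class _TdmMetaFinder(HTMLParser):
    """Collects the value of <meta name="tdm-reservation" content="..."> if present."""

    def __init__(self):
        super().__init__()
        self.reservation = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if (a.get("name") or "").lower() == "tdm-reservation":
                self.reservation = (a.get("content") or "").strip()

def tdm_reserved(headers, html_text=""):
    """Return True if the response signals a TDM rights reservation.

    Assumption: "1" means reserved, "0" means not reserved. HTTP headers
    (matched case-insensitively) are checked first, then HTML meta tags.
    """
    for name, value in headers.items():
        if name.lower() == "tdm-reservation":
            return value.strip() == "1"
    finder = _TdmMetaFinder()
    finder.feed(html_text)
    return finder.reservation == "1"

print(tdm_reserved({"Content-Type": "text/html", "TDM-Reservation": "1"}))  # True
print(tdm_reserved({}, '<head><meta name="tdm-reservation" content="0"></head>'))  # False
```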

As noted above, as a practical matter, a copyright holder may have no control over websites where its content is held. This is especially true where content is posted in violation of copyright or under a copyright exception.

There is certainly enough copyrighted material available under license to build reliable, workable, and trustworthy AI. Just because a developer wants to use “everything” does not mean it needs to do so, is entitled to do so, or has the right to do so. Nor should governments and courts twist or modify the law to accommodate them.

The post The United States Copyright Office Notice of Inquiry on AI: A Quick Take appeared first on Copyright Clearance Center.
