ULIB

 

Vision

 

For the first time in history, all the significant literary, artistic, and scientific works of mankind can be digitally preserved and made freely available, in every corner of the world, for our education, study, and appreciation and that of all our future generations.

Up until now, the transmission of our cultural heritage has depended on limited numbers of copies in fragile media. The fires of Alexandria irrevocably severed our access to many of the works of the ancients. In a thousand years, only a few of the paper documents we have today will survive the ravages of deterioration, loss, and outright destruction. With no more than 10 million unique book and document editions before the year 1900, and perhaps 100 million since the beginning of recorded history, the task of preservation is much larger. With new digital technology, though, this task is within the reach of a single concerted effort for the public good, and this effort can be distributed to libraries, museums, and other groups in all countries.

Existing archives of paper have many shortcomings. Many other works still in existence today are rare, and only accessible to a small population of scholars and collectors at specific geographic locations. A single wanton act of destruction can destroy an entire line of heritage. Furthermore, contrary to popular belief, libraries, museums, and publishers do not routinely maintain broadly comprehensive archives of the considered works of man. No one can afford to do this, unless the archive is digital.

Digital technology can make the works of man permanently accessible to the billions of people all over the world. Andrew Carnegie and other great philanthropists in past centuries have recognized the great potential of public libraries to improve the quality of life and provide opportunity to the citizenry. A universal digital library, widely available through free access on the Internet, will improve the global society in ways beyond measurement. The Internet can house a Universal Library that is free to the people.

   
 

Mission

 

The mission is to create a Universal Library which will foster creativity and free access to all human knowledge. As a first step in realizing this mission, it is proposed to create the Universal Library with a free-to-read, searchable collection of one million books, available to everyone over the Internet. Within 10 years, it is our expectation that the collection will grow to 10 million books. The result will be a unique resource accessible to anyone in the world 24x7, without regard to nationality or socioeconomic background.

One of the goals of the Universal Library is to provide support for full text indexing and searching based on OCR (optical character recognition) technologies where available. The availability of online search allows users to locate relevant information quickly and reliably, thus enhancing students' success in their research endeavors. This 24x7 resource would also provide an excellent test bed for language processing research in areas such as machine translation, summarization, intelligent indexing, and information retrieval.

It is our expectation that the Universal Library will be mirrored at several locations worldwide so as to protect the integrity and availability of the data. Several models for sustainability are being explored. Usability studies would also be conducted to ensure that the materials are easy to locate, navigate, and use. Appropriate metadata for navigation and management would also be created.

   
 

Goals

 

The primary long-term objective is to capture all books in digital format. It is commonly believed that such a task is impossible, that it could take hundreds of years and might never be completed. Thus, a first step was to demonstrate feasibility by undertaking to digitize 1 million books (less than 1% of all books in all languages ever published). This was achieved in the 2006-2007 timeframe. We continue to digitize books at 50 scanning centers all over the globe to achieve the long-term objective. We believe such a project has the potential to change how education is conducted in much of the world.

The first major project of the Universal Library is the Million Book Digital Library project. Typical large high-school libraries house fewer than 30,000 volumes. Most libraries in the world have fewer than a million volumes. The total number of different titles indexed in OCLC's WorldCat is about 48 million. One million books, therefore, is more than the holdings of most high schools, is equivalent to the libraries at many universities, and represents a useful fraction of all available books.
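The comparison above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the constants are the approximate figures quoted in the text, and the function and variable names are made up for the example.

```python
# Figures are the approximate ones quoted in the text, not fresh measurements.
MILLION_BOOK_TARGET = 1_000_000
LARGE_HIGH_SCHOOL_LIBRARY = 30_000   # "fewer than 30,000 volumes"
WORLDCAT_TITLES = 48_000_000         # "about 48 million" titles


def ratio(part: int, whole: int) -> float:
    """Return part / whole as a plain float."""
    return part / whole


if __name__ == "__main__":
    vs_school = ratio(MILLION_BOOK_TARGET, LARGE_HIGH_SCHOOL_LIBRARY)
    of_worldcat = ratio(MILLION_BOOK_TARGET, WORLDCAT_TITLES)
    print(f"{vs_school:.0f}x a large high-school library")   # 33x
    print(f"{of_worldcat:.1%} of WorldCat titles")            # 2.1%
```

On these figures, one million books is roughly 33 times a large high-school collection and about 2 percent of the titles indexed in WorldCat.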

A secondary objective of this project will be to provide a test bed that will support other researchers who are working on improved scanning techniques, improved optical character recognition, and improved indexing. The corpus this project creates will be one to three orders of magnitude larger than any existing free resource.

   
 

Benefits

 

The principal benefit of the Universal Library will be to supplement the formal education system by making knowledge available to anyone who can read and has access. Libraries have played a vital role in the advancement of human society. Societal advance depends on young people having access to books via libraries and other means. We expect that making this unique web resource available free to everyone around the world will enhance the learning process.

Libraries are unevenly distributed around the world and within each country. In the U.S., the NCES Survey noted that in 1996, 3,408 of 3,792 institutions of higher education had libraries holding 806.7 million volumes. The 112 largest university libraries in the United States and Canada each have at least 1.8 million books. They are members of the Association of Research Libraries (ARL). Massachusetts has about 25 million volumes, New York about 31 million, and California about 40 million in their ARL libraries (Association of Research Libraries, 1999/2000). Other states, such as North and South Dakota, have no large libraries. A few large public libraries have several million volumes. However, most junior colleges, high schools, and public libraries have much smaller collections. Making this large knowledge repository available can revolutionize research at all levels of education and give a much-needed boost at minimal cost to our national educational infrastructure. This impact will be further enhanced given the convenience of online access, and the benefit of full text searching at word and phrase levels.

A secondary benefit of online search is to make locating the relevant information inside books far more reliable and much easier. Students' success in finding exactly what they seek will increase, and that success will make them more willing to perform research using this large resource. NCES reports that 84 percent of libraries around the country are open between 60 and 80 hours a week. This digital library would be open all 168 hours of the week, every week of the year. More than one individual will be able to use the same book at the same time, so popular works will never be checked out and unavailable to others.
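To put the opening-hours comparison in numbers, the sketch below divides the 168 hours in a week by the 60 to 80 weekly hours that NCES reports for most libraries. The function name is hypothetical and the calculation is only a rough illustration of the claim.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours in a week


def availability_gain(open_hours_per_week: float) -> float:
    """How many times more hours an always-open service offers."""
    if open_hours_per_week <= 0:
        raise ValueError("open hours must be positive")
    return HOURS_PER_WEEK / open_hours_per_week


if __name__ == "__main__":
    for hours in (60, 80):  # range reported by NCES in the text
        print(f"{hours} h/week -> {availability_gain(hours):.1f}x")
```

Under these assumptions an always-open digital library offers roughly 2.1 to 2.8 times the access hours of a typical physical library.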

This project will produce an extensive and rich test bed for use in further textual language processing research. It is hoped that at least 10,000 books among the million will be available in more than one language, providing a unique resource for example-based machine translation.

Many believe that information is now doubling every two years. Machine summarization, intelligent indexing, and information mining are tools that individuals will need to keep up in their disciplines, in their businesses, and in their personal interests. This large digitization project will enable extensive research in these areas.

   
 

People

 

Carnegie Mellon University

US Advisors
  • Dr. Robert Kahn, CNRI
  • Prof. John McCarthy, Stanford University
  • Dr. Ching-chih Chen, Co-Principal Investigator, China-US Million Book Project
  • Dr. Peter Chen, Executive Director, China-US Million Book Project
  • Dr. Daniel Greenstein, Executive Director of the Digital Library Federation
  • Dr. Brewster Kahle, President, Internet Archive
  • Dr. Victor Zue, Professor, LCS, MIT
  • Dr. Michael Lesk, National Science Foundation
  • Dr. Stephen Griffin, National Science Foundation
China
  • Dr. Pan Yunhe, President of Zhejiang University
  • Dr. Gao Wen, Deputy President of the Graduate School, Chinese Academy of Science
  • Dr. Yueting Zhuang, Assistant Dean of Computer Science Department of Zhejiang University
  • Dr. Jihai Zhao, Associate Librarian of Zhejiang University
  • Chen Haiying, Secretary of CADAL Project & Director of Digital Library R&D Center, Zhejiang University Libraries
  • Dr. Huang Tiejun, Institute for Digital Media of Peking University, Director of North Technical Center for China-US Million Book Project
  • Past Members

  • Mr. Guo Xinli, Vice General Director, Ministry of Education of China
  • Mr. Chen Jianping, Vice Director, State Planning Commission of China
  • Dr. Chi Huisheng, Vice President of Beijing University
  • Dr. Hu Dongcheng, Vice President of Tsinghua University
  • Dr. Xu Zhong, Vice President of Fudan University
  • Dr. Zhang Yibin, Assistant to the President, Nanjing University
India
  • Dr. A.P.J. Abdul Kalam, Founding Sponsor, Former President of India
  • Dr. N. Balakrishnan, Chairman, Division of Information Sciences, Indian Institute of Science
  • Dr. M D Tiwari, Director, Indian Institute of Information Technology, Allahabad
  • Mr. K.V. Ramanachary, Executive Officer of Tirumala Tirupati Devasthanams
  • Mr. Nadendla Manohar, MLA, Chairman, A.P Assembly Library Committee
  • Mr. Subramanyam Bhuman, TTD UDL
  • Dr. C V Jawahar, IIIT Hyderabad UDL
  • CDAC, Noida
  • Dr. A. B. Saha, CDAC, Calcutta
  • Kiran Kumar Vemuru, Director Planning, UDL India
  • Mr. Sridharan, Secretary, Arulmigu Kalasalingam College Of Engineering
  • Dr. Thangaraj, Principal, Arulmigu Kalasalingam College Of Engineering
  • Dr. Vaidhyasubramanian, Dean, Shanmugha Arts, Science, Technology & Research Academy
  • Asma Mannan, E. Veera Raghavendra, Rakesh Devineni, Jaya Krishna, MSIT-IIIT, Hyderabad
  • Past Members

  • Prof. M G K Menon, Chairman, Council of Indian Institute of Information Technology, Allahabad and Indian Institute of Technology-Bombay (Mumbai)
  • Dr. V S Arunachalam, Chairman CESTEP, Bangalore, Distinguished Service Professor, CMU
  • Mr. Yashwant Bhave, Joint Secretary, Ministry of Communication and Information Technology
  • Dr. Chaturvedi, Director, Ministry of Communication and Information Technology, Govt. of India
  • Dr. Ashok Kolaskar, Vice Chancellor, Pune University
  • Dr. P. Krishniah, Executive Officer of Tirumala Tirupati Devasthanams
  • Mr. Ajeya Kallam, Executive Officer of Tirumala Tirupati Devasthanams
  • Mr. A.P.V.N Sharma, Executive Officer of Tirumala Tirupati Devasthanams
  • Mr. Ajay Sawhney, Special Secretary for IT, Government of Andhra Pradesh
  • Dr. IV Subbarao, Secretary Education, Government of Andhra Pradesh
  • Dr. Om Vikas, Senior Director, Ministry of Communication and Information Technology, Govt. of India
  • Mr. Surendra Bagde, Joint CEO, Mumbai Industrial Development Corporation
Egypt
   
 

Partners

  China, Egypt, India
   
 

UDL Conference Proceedings and Publications

 

ICUDL is a series of annual international conferences on the Universal Digital Library. The goal of these conferences is to provide a forum for library and IT professionals to exchange comprehensive views on the recent developments and progress in digital library technology, to promote international cooperation in related fields, to advocate universal access to information, and to enhance the global impact of the Universal Digital Library. The following are proceedings from past conferences and selected UDL publications.

Publications

2007 ICUDL, Carnegie Mellon University, Pittsburgh, USA

2006 ICUDL, Bibliotheca Alexandrina, Alexandria, Egypt

2005 ICUDL, Zhejiang University, Hangzhou, China

   
 

Funding

 

Funding for the Million Book Project is coming from multiple sources. The National Science Foundation provided funding for equipment, India and China are providing manpower resources for scanning, indexing, and hosting, and various companies and foundations are providing partial support.

The National Science Foundation provided funding for scanners, computers, servers, and software. These resources from NSF are augmented by China and India, which are each providing about 2,000 man-years of effort over a five-year period as their contribution to this project. This represents a twenty-to-one relative contribution by China and India to this project. They will assist in the selection of documents, in software development, and in the digitizing of the books.

 
 

The Knowledge Conservancy

 

The Knowledge Conservancy will use the donations above to support the following activities:

  • Accepting undirected and directed gifts to digitize and make freely accessible the works of man. This includes gifts that guarantee specified numbers of books or specified collections under the rule that access is free to the people.
  • Funding the digitization of works and the ongoing upgrading of the digital works to the public. This funding can be directed to ongoing or new digitization projects. In addition, the funding will support one central master registry site that other sites will mirror.
  • Promoting awareness of these digital works, and their importance, to the public at large.
  • Selection and digitization of works may be at the discretion of the Conservancy, the projects it funds, or its donors.
    • For example, a donor might contribute money specifically for digitizing works in a particular subject area. Selection of works might also be made from lists drawn up by scholars, requests over the Internet, and books donated to the Conservancy.
  • Coordinating digitization efforts. The Knowledge Conservancy maintains a database to keep track of all book editions, music editions, art, and motion pictures that have been digitized and whether they have free access. This includes the capabilities to
    • determine through its catalog whether access to a known document is already available,
    • claim to be digitizing a document within a certain time period,
    • reference a document with a hyperlink that does not go bad,
    • reference the content of a document based on accepted practice in document tagging, and
    • allow machine interpretation of copyright restrictions and other restrictions on access and use.
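
The coordination capabilities listed above could be modeled along the following lines. This is a minimal sketch under stated assumptions, not a description of the Conservancy's actual system: the class names, fields, claim rules, and the `registry.example.org` address are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional


@dataclass
class Rights:
    """Machine-readable access terms (hypothetical fields)."""
    copyrighted: bool
    free_to_read: bool


@dataclass
class Record:
    """One registry entry for a book edition, music edition, artwork, or film."""
    work_id: str                          # stable identifier, hypothetical scheme
    title: str
    rights: Rights
    digitized: bool = False
    claimed_by: Optional[str] = None      # site currently digitizing the work
    claim_expires: Optional[date] = None  # end of the claimed time period


class Registry:
    """In-memory stand-in for the central master registry."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def add(self, record: Record) -> None:
        self._records[record.work_id] = record

    def is_available(self, work_id: str) -> bool:
        """True if the work is known, digitized, and free to read."""
        rec = self._records.get(work_id)
        return rec is not None and rec.digitized and rec.rights.free_to_read

    def claim(self, work_id: str, site: str, today: date, days: int) -> bool:
        """Reserve a work for digitization; fails while another site's claim is live."""
        rec = self._records[work_id]
        if rec.digitized:
            return False
        live = rec.claim_expires is not None and rec.claim_expires >= today
        if live and rec.claimed_by != site:
            return False
        rec.claimed_by = site
        rec.claim_expires = today + timedelta(days=days)
        return True

    def permalink(self, work_id: str) -> str:
        """Link built from the stable id rather than a file location."""
        return f"https://registry.example.org/work/{work_id}"
```

Keying every lookup, claim, and link on one stable identifier is what lets mirrors move files around without breaking references; the expiry date on a claim keeps an abandoned digitization effort from blocking a work indefinitely.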

Rationale for the creation of Knowledge Conservancy

For the first time in history, all the significant literary, artistic, and scientific works of mankind can be digitally preserved and made freely available, in every corner of the world, for our education, study, and appreciation and that of all our future generations.

Up until now, the transmission of our cultural heritage has depended on limited numbers of copies in fragile media. The fires of Alexandria irrevocably severed our access to many of the works of the ancients. In a thousand years, only a few of the paper documents we have today will survive the ravages of deterioration, loss, and outright destruction. With no more than 10,000,000 unique book and document editions before the year 1900, and perhaps 300 million since the beginning of recorded history, the task of preservation is much larger. With new digital technology, though, this task is within the reach of a single concerted effort for the public good, and this effort can be distributed to libraries, museums, and other groups in all countries.

Existing archives of paper have many shortcomings. Many other works still in existence today are rare, and only accessible to a small population of scholars and collectors at specific geographic locations. A single wanton act of destruction can destroy an entire line of heritage. Furthermore, contrary to popular belief, libraries, museums, and publishers do not routinely maintain broadly comprehensive archives of the considered works of man. No one can afford to do this, unless the archive is digital.

Digital technology can make the works of man permanently accessible to the billions of people all over the world. Andrew Carnegie and other great philanthropists in past centuries have recognized the great potential of public libraries to improve the quality of life and provide opportunity to the citizenry. A universal digital library, widely available through free access on the Internet, will improve the global society in ways beyond measurement. The Internet can house a Universal Library that is free to the people.

The mission of the Knowledge Conservancy is the preservation of the world's cultural, historical, and scientific works, and their free access to the world over the Internet.

Many projects are already underway to put works on-line for general reading, including projects like American Memory, Project Gutenberg, and the Universal Library. While these projects are making some headway at preserving the world's cultural heritage, they still face at least four major obstacles:

  • Digitization efforts are widespread but relatively small and too often associated with restricted access. Collection owners are often not motivated to provide free access, frequently because they are unaware of available techniques for recovering costs.
  • Funding for the digitization of information -- and for its ongoing preservation, maintenance, and delivery, once digitized -- is less than needed to make a large-scale digital library of millions of items available to the global public.
  • There are no global organizations seeking only to provide coordination of digitization efforts, sharing of limited resources, and global public awareness.
  • A large portion of the world's cultural heritage is copyrighted. Thus far, projects have largely avoided using copyrighted material, because of the difficulty of acquiring rights. Many works risk being lost in obscurity long before the copyright expires. The result is that important bodies of work may be unavailable in widely accessible form for a long time, and possibly forever.

The Knowledge Conservancy promotes the widespread duplication of digital collections at many sites, to increase accessibility and minimize the risk of single-point failure.

Conservancy policy is that a digital copyright may be retained by the collection owner, but the Conservancy will have a perpetual right to provide free viewing access to the material through its universal library. Furthermore, free viewing access will be automatically and perpetually granted to any site that agrees to the conditions of copyright and agrees to solely bear the cost of the mirroring. The copyright owner will always have a right to a link defined on any of his pages viewed, as will the Knowledge Conservancy.

The Knowledge Conservancy will actively support and endorse, but will not fund, money-making activities around the material that is free to read. Examples include ordering copies of a book or manuscript, paying to print, paying for high-fidelity access, ordering CD-ROMs or DVD-ROMs, ebook editions, selling supplementary materials, searching tools, indexing tools, multilingual tools, and portals that provide new media presentations of the material.

The Knowledge Conservancy will fund (a) digitizing a work and (b) providing metadata about a work including cataloging metadata, metadata about the structure of the content inside the work (e.g., chapters, stanzas, link maps for images), metadata that controls copyrights, and full text (symbolic) conversion. It will not fund, but will strongly endorse and otherwise support, interpretative efforts.

The Conservancy is not a standards-making organization, but will provide online cross-disciplinary support and assistance in order to improve quality in digitization, free-to-read presentations, and long-term preservation and migration of digital works. The Conservancy may charge fees for this support or provide referrals to organizations that charge fees for such support.

   
 

Sustainability

 

Sustainability is a long-term issue for this project; further research will be done on developing economic models to support this major contribution to education. Partial answers to this significant challenge are discussed below. Three establishments, which have potential for offering a sustainable model for this project, are the Library of Congress and similar national libraries, OCLC, and other commercial concerns.

The million-book project will be a public good and as such must have a suitable repository that will continue to make it available to the public at no charge. That responsibility belongs most clearly to the national library in each country. Having a network of national libraries mirroring the resource around the world would be an appropriate and desired outcome.

  1. Library of Congress
    In the United States, the Library of Congress would be the natural choice to act as the repository of all knowledge, because the national interest is so clearly served. However, the Library of Congress is not the national library of the United States, although many people assume that it is. In the LOC's own words in its mission statement: "THE FIRST PRIORITY of the Library of Congress is to make knowledge and creativity available to the United States Congress." It is only a lesser goal to make knowledge available to the public, and that is why the million-book project had to be undertaken outside of the scope of LOC. LOC is also the guardian of the Copyright Office and is apparently unwilling and/or unable to digitize anything to which there might be a copyright claim.

    In FY 2001, Congress appropriated $100 million for Digital Preservation, contingent on LOC's raising of $75 million in matching resources. The law allows the acceptance of gifts in kind as a part of the matching funding. We are exploring the possibility of pledging the million-book project to LOC as matching funding for the Digital Preservation initiative. Even if the value of the project were only assessed on its inputs (equipment and labor), the million-book project represents a significant investment.

  2. OCLC
    Another alternative might be for OCLC to maintain a free version of the resource. OCLC is a non-profit organization whose member libraries are committed to enhancing access to information. OCLC might cover its costs by charging member libraries a small fee when the million-book project is accessed through the 48 million-title database. For the millions of OCLC users, that convenience would be worth a small payment in an already existing fee relationship. OCLC's recent strategic planning initiatives identified the addition of more full text to the database, exploring archiving responsibilities, and becoming more international as important thrusts. OCLC would also be able to cover partial costs through some of the strategies listed below for publishers.

  3. Commercial alternatives
    The marketplace for electronic books is chaotic at this moment. Questia, designed to be an online source with at least 50,000 of the best books with sophisticated software to support searching and the creation of footnotes, marketed itself directly to students at a $20-30 monthly fee. Although the project was well capitalized and attracted a great deal of media attention, its operations have been significantly scaled back over the past year. Librarians have long observed that charging for resources in the academic environment reduces use. Students' desire for the convenience of online information sends them to the web and to the much-used electronic resources of their own libraries. That love of convenience apparently does not extend to purchasing Questia under current pricing models.

    During the same period, the company netLibrary has announced that it will provide new full textbooks online. NetLibrary marketed itself to libraries through consortia. Use of materials, thus, was at no direct cost to students and faculty. While students appreciated the convenience of being able to use the resource online, they had many complaints about its functionality; in particular, they resented the fact that books could only be printed one page at a time and that books were unavailable if another individual was using them. The economic models behind netLibrary charges also seemed to reflect an adherence to those of paper books rather than recognizing the economies of digital materials. The assets of netLibrary have since been sold to OCLC.

    At this time, the marketplace responses suggest that turning the million-book project into a private, revenue-generating source would not offer a sustainable model. JSTOR, Project Muse, and other digital journal projects that offer online materials with superior functionality and sustainability continue to flourish.

  4. Publishers
    Another commercial alternative might revolve around relationships with publishers. As publishers find that making the book available increases sales, they might be willing to support the project. The Universal Library website would support "buy" buttons, in return for a small share of the revenues from print-on-demand sales of out-of-print materials.