Top 5 Reasons Digitization Projects Fail

1. No one really thinks about why the item is being scanned

When people talk about scanning something or putting it online, their reasons for doing so can be shockingly vague.  People will say things like "to make it more available" or "to make it easier to search", but often what they really want is something completely different.  I once had a faculty member initiate a scanning project for photographs, only to find out that what he really wanted was to enlarge the images so he could see them better.  After the first few test items, and his complaints that he couldn't see the images well enough, it became obvious what he wanted.  We changed the project to scan the images at a higher resolution so that he could enlarge them and see the detail.

2. No one thinks about what, exactly, needs to be captured during the scan

When people first start out, they think of scanning as a pretty simple process of just taking a picture of the item and putting it online.  However, an item actually holds several layers of information that can be captured.  For a typed page, you may want to capture just the text on the page (which you could have someone transcribe), or the image of the text and its typesetting, or the paper itself in great detail, to show the fibers or the stains and damage the paper has sustained.  Each of these objectives requires a slightly different scanning standard to capture the required information.
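To give a feel for how much the choice of scanning standard matters, here is a minimal sketch that computes the uncompressed size of one scanned page. The formula (pixels times bits per pixel) is standard, but the page size, resolutions, and bit depths below are illustrative assumptions, not recommendations from this post or from any standard.

```python
def uncompressed_megabytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size, in megabytes, of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8 / 1_000_000

# Assumed example settings for a letter-size page (hypothetical values):
# a grayscale scan that is good enough to read the text...
text_only = uncompressed_megabytes(8.5, 11, dpi=300, bits_per_pixel=8)
# ...versus a color scan meant to show fine detail in the paper.
fine_detail = uncompressed_megabytes(8.5, 11, dpi=1200, bits_per_pixel=24)

print(f"Text-oriented scan: {text_only:.1f} MB per page")
print(f"Detail-oriented scan: {fine_detail:.1f} MB per page")
```

The same page can differ in size by well over an order of magnitude depending on what you decide to capture, which feeds directly into scanning time, storage, and preservation costs.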

3. People want "high quality" digital items, they want them quickly, and they don't want it to cost too much

You can't have your cake and eat it too, at least not in digital projects.  It is rare for a digitization project to be full of requirements, fast, and cheap all at once.  More than likely, the process will take many more months than people expect, and cost about three to four times what they thought.  Why is this the case? Because people have a tendency to focus on the scanning and the scanning equipment.  They think the most time-consuming part of the process is image capture, and that the most expensive part of the process is buying the scanner itself.  However, metadata creation and putting the items online typically take the most time, and the scanner itself is usually only one-third to one-quarter of the cost of a project.  The rest goes to human resources, the cost of setting up a system to host the items online, and the cost of securing the digital files for long-term preservation.
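The rule of thumb above can be turned around to get a rough total budget from a scanner quote: if the scanner is one-third to one-quarter of the project cost, the whole project costs roughly three to four times the scanner. This is only a back-of-the-envelope sketch, and the scanner price used here is a made-up example.

```python
def project_cost_range(scanner_price):
    """Rough total project cost, assuming the scanner is 1/3 to 1/4 of it."""
    low = scanner_price * 3   # scanner is one-third of the total
    high = scanner_price * 4  # scanner is one-quarter of the total
    return low, high

# Hypothetical scanner quote, for illustration only.
low, high = project_cost_range(10_000)
print(f"Plan for a total budget of roughly ${low:,} to ${high:,}")
```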

4. They don't experiment with a small number of items first

While it is commendable that people want to get started immediately and dive in, how you dive into a project makes a difference.  If you set up your process and standards, and then dive in by scanning 1,000 items, you may be horrified to find out that none of those items are going to work for their intended purpose and you have to do them all over again.  A better method of diving in is to start with one item and take it completely through the process, with an eye toward tweaking the process and testing the final product.  Make sure that what you're producing is indeed what is needed.  Then do a few more items, record how long each step took, and use those numbers to estimate the time and resources needed for the rest of the project.  You'll have a much better idea of how the rest of the project is going to go, and you won't have wasted a lot of time doing the wrong thing.
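As a minimal sketch of that last step, the timings from a handful of pilot items can be averaged per step and scaled up to the full collection. The step names and the minutes recorded below are invented for illustration; substitute whatever steps and timings your own pilot produces.

```python
from statistics import mean

def estimate_project_hours(pilot_minutes, total_items):
    """Scale average per-item pilot timings up to the whole project.

    pilot_minutes maps each process step to the minutes it took
    for each pilot item.
    """
    minutes_per_item = sum(mean(times) for times in pilot_minutes.values())
    return minutes_per_item * total_items / 60

# Hypothetical timings (in minutes) recorded for three pilot items.
pilot = {
    "scan": [4, 5, 6],
    "metadata": [12, 15, 18],
    "upload_and_check": [5, 5, 5],
}

hours = estimate_project_hours(pilot, total_items=1000)
print(f"Estimated effort for 1,000 items: {hours:.0f} hours")
```

In this made-up example, metadata takes three times as long as scanning, which is the kind of surprise a pilot is meant to surface before you commit to the full run.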

5. They avoid assessment, and avoid reprocessing

There is a tendency to celebrate the end of projects and then turn a blind eye to them while you focus on newer, more interesting things.  With this thinking, a project is successful because it is complete.  However, digitization projects need to be checked on occasionally to make sure they're still working.  Make sure to set up a web statistics package on the collection so you can monitor usage, and then check on it at least once a year.  Sometimes, what you find out is that something about the system or metadata is preventing the project from reaching its users.  At that point, some part of the project has to be reprocessed.  Don't fear reprocessing.  Often, fixing the problem will be faster than you imagine, and it's much better to have a working collection than a broken one.

This list is based on the book Digitizing Flat Media: Principles and Practices, a practical book about the nuts and bolts of digitization for those who are just starting out, available for purchase at Amazon.
