
Showing posts from October, 2008

Large Scale Digitization Projects Part 2: Project planning

Project planning and project management are useful for many things besides digital projects, but they are particularly useful for digital projects because those projects tend to have many different components. You have to worry about people, funding, schedules, and technology. Any one of those would be difficult to manage without a plan of attack; dealing with all of them at once requires a lot of attention and planning.

Project planning and project management can be summed up in 5 steps, which should be done in order. This is a brief overview of the project process, and it by no means covers the whole topic. If you are interested in reading more about project management, I highly suggest The McGraw-Hill 36-Hour Project Management Course by Helen S. Cooke and Karen Tate (ISBN 0-07-143897-1).

1) Context

Once you have been assigned a digital project, develop your context. Why is the project important? Who wants it done? Who is involved and why?

The Cost of Digitization

Libraries have many things to weigh before starting a new digitization project. Equipment for mass digitization is usually specialized and very expensive. Only a few companies make the equipment, which gives them strong control over the market and inflates prices. A library could pay anywhere from $40,000 to $300,000 for a single book scanner. Some book scanners cut the book out of its binding and then scan the pages individually. Keeping an expensive piece of equipment working past the purchase date usually means the library also has to pay for a yearly maintenance or warranty subscription, which adds to the overall price of the equipment. A good estimate is 10-15% of the purchase price every year. For a $300,000 machine, that comes to $30,000-$45,000 a year for as long as the machine is in use. In addition to the price of the equipment, the library must pay for server storage
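The maintenance arithmetic above is easy to extend over a machine's service life. The sketch below is a minimal Python illustration, assuming a flat yearly maintenance rate and a hypothetical five-year service life; it leaves out storage, staffing, and every other cost.

```python
def annual_maintenance(purchase_price: float, percent: float) -> float:
    """Yearly maintenance/warranty cost, given as a percentage of the purchase price."""
    return purchase_price * percent / 100


def total_cost_of_ownership(purchase_price: float, percent: float, years: int) -> float:
    """Purchase price plus maintenance for each year the machine stays in service.

    Assumes (hypothetically) the same maintenance rate every year.
    """
    return purchase_price + annual_maintenance(purchase_price, percent) * years


if __name__ == "__main__":
    price = 300_000
    service_years = 5  # hypothetical service life
    for pct in (10, 15):
        yearly = annual_maintenance(price, pct)
        total = total_cost_of_ownership(price, pct, service_years)
        print(f"{pct}%: ${yearly:,.0f} per year, ${total:,.0f} over {service_years} years")
```

With these assumptions, a $300,000 scanner costs $450,000 over five years at 10% maintenance and $525,000 at 15%.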

The Basics: Large scale Digital Project Management Part 1

I took a class last year on digital project management. Although I was impressed by the technical part of the class, and by the part where we designed a website to put the digital items on, I was surprised to find that the class had very little to do with large-scale projects (collections of over 100 items). It also completely ignored book digitization. This is odd, because 90% of my job is large-scale book digitization. There are a few things you need to consider when doing a large-scale digitization effort (especially if it’s books). What is the quantity of items to be digitized? A collection of 200 items is going to be treated differently than a collection of 14,000 items. How much variety is there in the group of items? Are they all books? All pictures? Are they mixed documents? Are they bound? If you find that the group has many subgroups, go ahead and divide them out and make eac
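One way to divide a mixed collection into subgroups is to tag each item in an inventory with its type and whether it is bound, then group on those fields. The following is a minimal Python sketch; the `Item` record, its fields, and the sample inventory are hypothetical, and a real project would likely track more attributes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical inventory record for illustration only.
    identifier: str
    kind: str    # e.g. "book", "photograph", "document"
    bound: bool


def split_into_subgroups(items):
    """Group items by (kind, bound) so each subgroup can be planned separately."""
    groups = defaultdict(list)
    for item in items:
        groups[(item.kind, item.bound)].append(item)
    return dict(groups)


if __name__ == "__main__":
    inventory = [
        Item("b1", "book", True),
        Item("b2", "book", True),
        Item("p1", "photograph", False),
        Item("d1", "document", False),
        Item("d2", "document", True),
    ]
    for (kind, bound), members in split_into_subgroups(inventory).items():
        print(f"{kind} ({'bound' if bound else 'loose'}): {len(members)} items")
```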

The Digital Assembly Line

This year, a colleague and I gave a presentation on how the digital lab at Texas Tech University ramped up its digitization efforts. It's called The Digital Assembly Line, and it lives in the Texas Digital Library repository. I happened to stumble upon it while doing a Google search.

The Basics: File size

File size is important when you are trying to make an image available online, because the image has to be of good enough quality to be useful but small enough to load in a browser without making the patron wait. Also, if you want to archive the images, you want a high-quality archival image that doesn't take up too much storage space. A few things affect file size:

The type of file affects file size. See the post about file types.

Whether you are saving in full color, grayscale, or black and white also affects it. Typically, full color images are going to be large files. Grayscale files are smaller, but still much larger than black and white. However, you have some options when saving in full color or grayscale.

8 Bit or 24 Bit Color/Grayscale

You can save a color or grayscale image as either 8 bit or 24 bit color or grayscale.

8-bit

8 bit means that there are 8 bits that describe each pixel. This means that you have 256 different combinations of bits to s
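The effect of bit depth on size can be estimated before scanning: uncompressed pixel data takes width × height × bits per pixel ÷ 8 bytes. This Python sketch assumes a letter-size page (8.5 × 11 inches) scanned at 300 DPI, which is 2550 × 3300 pixels, and ignores compression, file headers, and metadata, so real files on disk will differ.

```python
def uncompressed_size_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Size of the raw pixel data only: no compression, headers, or metadata."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # Assumed example: letter-size page at 300 DPI.
    width, height = 2550, 3300
    for label, bits in (("24-bit color", 24), ("8-bit grayscale", 8), ("1-bit black and white", 1)):
        size = uncompressed_size_bytes(width, height, bits)
        print(f"{label}: {size:,} bytes (~{size / 1_000_000:.1f} MB)")
```

For that page the raw data comes to 25,245,000 bytes at 24 bits, 8,415,000 bytes at 8 bits, and 1,051,875 bytes at 1 bit per pixel, so raw size scales directly with bit depth. An 8-bit color (palette) image has the same raw pixel data as 8-bit grayscale plus a small color table.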

The Basics: DPI (Dots Per Inch)

DPI (Dots Per Inch) in digital imaging means how many pixels make up each inch of the image, counted along its width and along its height. For example, if I scan a 1 inch by 1 inch piece of paper at 300 DPI, the image I scanned is 300 pixels wide and 300 pixels tall. If I looked at the digital image and realized that it did not show enough detail, I would scan it at 600 DPI. Now the digital image is 600 by 600 pixels, four times as many pixels as before. You can keep going up and up, but eventually adding more pixels does not gain you any more detail. Why? Because at some point you start picking up detail about the paper (or medium) that the item is on. You start to see the fibers in a piece of paper, or the dimples on its surface. Once you start picking up that level of detail, you are creating more “noise” in the image than is necessary, and you need to back off on the DPI. Most printed text, for example, is printed in 80-90 DPI on a page. However, this does not mean you can scan the image at
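Because DPI counts pixels along each inch of width and of height, doubling the DPI doubles both dimensions and quadruples the pixel count. A minimal Python sketch of that arithmetic:

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a scan: each linear inch becomes `dpi` pixels."""
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    for dpi in (300, 600):
        w, h = pixel_dimensions(1, 1, dpi)
        print(f"1 x 1 inch at {dpi} DPI: {w} x {h} = {w * h:,} pixels")
```

It prints 90,000 pixels at 300 DPI and 360,000 at 600 DPI for the 1 inch square.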

New Equipment: Atiz- Bookdrive

At work we are looking at the Atiz Bookdrive. The two Atiz products (the Bookdrive DIY and the Book snap) are very affordable options for digitization. Depending on the options it wants, an organization can get a setup for anywhere from $3,000 to $10,000. I haven't seen this equipment in person, but once I do, I'll be sure to post about it.

Digitization Equipment Review: The CopiBook

The CopiBook is an overhead scanner from a French company called i2s. It's sold in the United States by iimage Retrieval. Although they have many fine products, I am only reviewing the CopiBook. The machine runs off an internal computer and can output to a network, to another computer, or to a flash drive. The equipment is easy to use, and it is easy to train people on it. I can get someone working on it in 5 minutes for simple items, and in 10 minutes for more complex ones. The CopiBook is designed to work with ambient light. It does this by making a “light map” of the whole scanning area. If there is a shadow, it raises the brightness of the scan in that area. If there is excessive light in another area, it lowers the brightness there. The result is a very clean, evenly lit image. If you have problems with lighting, you can add light to the machine as needed or move the machine to a location where the light is more consistent. The image quality is wonderful. At 300
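The review doesn't say exactly how the CopiBook builds or applies its light map, so the following is only a sketch of the general idea using a generic technique called flat-field correction: divide each pixel by the light level measured at the same spot, then rescale to a common target brightness. It assumes, hypothetically, that the light map is a scan of a blank white sheet under the same lighting, and it works on plain lists of 8-bit grayscale values to stay dependency-free.

```python
def flat_field_correct(image, light_map, target=None):
    """Even out lighting in an 8-bit grayscale image using a light map.

    image and light_map are equal-sized lists of rows of 0-255 values.
    target is the brightness a blank area should end up at; by default
    it is the average level of the light map.
    """
    if target is None:
        levels = [v for row in light_map for v in row]
        target = sum(levels) / len(levels)
    corrected = []
    for image_row, light_row in zip(image, light_map):
        out_row = []
        for pixel, light in zip(image_row, light_row):
            # Shadowed spots (low light) are scaled up; bright spots are scaled down.
            value = pixel * target / light if light > 0 else pixel
            out_row.append(max(0, min(255, round(value))))
        corrected.append(out_row)
    return corrected
```

For a page whose left half sits in shadow (light map `[[100, 200]]`) and whose scan reads `[[100, 200]]`, the result is `[[150, 150]]`: the shadowed side is brightened and the bright side darkened until they match.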

Image Display: Image zooming

People have pointed out many times that it would be better if we could display images dynamically and let viewers zoom in. A quick and easy solution is Zoomify EZ (which is free), or the more expensive versions of Zoomify that have more features. Zoomify works with high resolution images relatively quickly. It saves the image as a mosaic of smaller images (tiles) at several different resolutions. The viewer only loads the few tiles needed for the part of the image currently on screen instead of the whole image, so there is very little lag. The speed is determined more by the computer and the internet connection than by the size of the image itself. With this software, the maximum zoom is determined by the resolution of the original image: it will let someone zoom in until the section they are viewing reaches the image's original DPI. Since the free version of Zoomify uses Flash, it also helps to prevent people from downloading the image. This can be a problem if your goal is to share the pictures. You will
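The tile mosaic described above is often called an image pyramid. The Python sketch below shows the idea under stated assumptions: 256-pixel square tiles, with each level half the size of the one above it until the whole image fits in a single tile. It is not Zoomify's actual file layout. The second function shows why there is little lag: a viewer only needs the tiles that overlap the visible window.

```python
import math

TILE_SIZE = 256  # assumed tile edge length in pixels


def tile_pyramid(width, height, tile_size=TILE_SIZE):
    """Return (width, height, columns, rows) for each level, smallest level first.

    Each level halves the one above it until the whole image fits in one tile.
    """
    levels = []
    w, h = width, height
    while True:
        cols, rows = math.ceil(w / tile_size), math.ceil(h / tile_size)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break
        w, h = math.ceil(w / 2), math.ceil(h / 2)
    levels.reverse()
    return levels


def tiles_in_view(left, top, right, bottom, tile_size=TILE_SIZE):
    """(column, row) indices of the tiles a viewer must fetch for one visible rectangle."""
    cols = range(left // tile_size, (right - 1) // tile_size + 1)
    rows = range(top // tile_size, (bottom - 1) // tile_size + 1)
    return [(c, r) for r in rows for c in cols]
```

For a 2550 × 3300 pixel scan this gives five levels with 130 tiles at full resolution, but an 800 × 600 pixel window at the top-left corner only needs 12 of them.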

Protecting Images from being downloaded on websites

There’s no 100% secure way to protect images. There is always going to be a way around it for advanced users, but they may represent a very small percentage of the population. The most common download prevention method is to add a bit of code to the website that prevents right-clicking. For example, there are scripts that display a custom message when a user tries to right-click on the page. These kinds of scripts can be bypassed by disabling JavaScript in the browser, or by using a browser such as Firefox that can be set to stop pages from disabling or replacing the right-click menu. A more secure way is to use Flash. More advanced users can still get to the image by combing through the HTML of the webpage, but for the average person this prevents downloads.

The Basics: Digital Imaging- File Format types

File Formats

What format you use is dictated by your goal. Libraries tend to want a small access copy of reasonable quality, and a high-quality archival copy of reasonable file size. To get a better idea of what the different formats are and what they do, look up articles on each of these:

· TIFF (Tagged Image File Format)
· JPEG
· JPEG2000
· PDF

JPEG is a good format for full color or grayscale images because it provides a nice balance between quality and file size. JPEG is lossy, so the file stays stable only as long as you do not edit and re-save it over and over; each re-save discards a little more detail. If you are going to be editing the file, save it as a TIFF first, then save it as a JPEG. JPEGs are perfect for access copies, and sometimes for archival copies. A full color or grayscale TIFF is huge as far as file size goes. A black and white TIFF, however, is generally smaller than a black and white JPEG, because a TIFF can store each pixel as a single bit and compress it with a scheme built for black and white images (such as CCITT Group 4), while JPEG has no true black and white mode and stores at least 8 bits per pixel. TIFF also provides the best quality and stability. You can use TIFFs for archival storage if you have a
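The rules of thumb in this post can be condensed into a small decision function. This is a hypothetical Python sketch that encodes only the guidance above (JPEG for color or grayscale access copies, TIFF for archival masters and for black and white), not a formal standard; the purpose and color-mode labels are made up for illustration.

```python
def suggest_format(purpose: str, color_mode: str) -> str:
    """Rule-of-thumb file format choice.

    purpose: "access" or "archival"
    color_mode: "color", "grayscale", or "bitonal" (black and white)
    """
    if purpose not in ("access", "archival"):
        raise ValueError(f"unknown purpose: {purpose!r}")
    if color_mode not in ("color", "grayscale", "bitonal"):
        raise ValueError(f"unknown color mode: {color_mode!r}")
    if color_mode == "bitonal":
        # A 1-bit TIFF is usually smaller than a JPEG of the same black and white page.
        return "TIFF"
    if purpose == "access":
        # JPEG balances quality and file size for color/grayscale access copies.
        return "JPEG"
    # Archival color/grayscale masters: TIFF for quality and stability, storage permitting.
    return "TIFF"
```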

The Basics: Software

Image editing software

Which software you use depends on what you are digitizing. If you are digitizing images, Photoshop or something similar is vital. The GIMP is a similar program that is free. Whenever possible, you want to avoid having to edit the images at all. You want your scan to have good lighting, good white balance, and a good crop box before you touch it with a program. This saves time and spares the image unnecessary tampering. Each image editing program uses its own algorithms, which is why the same function works differently in different programs. More expensive and popular software (like Photoshop) tends to have good algorithms that may produce better results. The basic functions you want in image editing software:

· color correction
· cropping
· brightness/contrast correction

Text image processing software

If you are scanning mostly text, then you need a totally different kind of program. If you are doing text, then you are likely doing books, newspapers
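Brightness and contrast controls work differently from program to program, but one common, simple version is linear: stretch or squeeze each value around mid-gray to change contrast, then shift it to change brightness. A minimal Python sketch on 8-bit grayscale values, assuming that linear formula (it is not Photoshop's or the GIMP's exact algorithm):

```python
def adjust_brightness_contrast(pixels, brightness=0, contrast=1.0):
    """Linear brightness/contrast adjustment on 8-bit grayscale values (0-255).

    contrast scales each value's distance from mid-gray (128); brightness is then
    added. Results outside 0-255 are clipped.
    """
    out = []
    for value in pixels:
        adjusted = (value - 128) * contrast + 128 + brightness
        out.append(max(0, min(255, round(adjusted))))
    return out
```

For example, a contrast of 1.5 turns the values 64, 128, 192 into 32, 128, 224, and anything pushed past 0 or 255 is clipped, which is one reason heavy correction after scanning loses detail.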

The Basics: Hardware

Let’s start with the basics. To understand digitization, you first need at least a passing understanding of the basic technology involved. Why? Because the individual parts of a computer can make the difference between a smooth workflow and a choppy one. Doing massive amounts of computation, such as processing hundreds of images at once, can be frustrating if your RAM is insufficient or if your network can’t handle the load. You can get a great overview from the following articles on howstuffworks.com. Don't forget to check out all the links at the top of each article to see more.

· How PCs work
· How Scanners work
· How Digital Cameras work
· How Servers work
· How Networks work
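To see how RAM limits batch processing, estimate how many uncompressed images fit in memory at once. This Python sketch assumes 24-bit letter-size pages at 300 DPI (2550 × 3300 pixels, about 25 MB each) and a hypothetical rule that only half of the RAM is available for image data; real programs may hold extra copies of each image while working on it.

```python
GIB = 1024 ** 3


def images_per_batch(ram_bytes, width_px, height_px, bytes_per_pixel=3, usable_fraction=0.5):
    """How many uncompressed images fit in the share of RAM set aside for them.

    usable_fraction is a hypothetical allowance that leaves room for the
    operating system and the image-processing software itself.
    """
    bytes_per_image = width_px * height_px * bytes_per_pixel
    usable = int(ram_bytes * usable_fraction)
    return usable // bytes_per_image


if __name__ == "__main__":
    for ram_gib in (1, 2, 4):
        n = images_per_batch(ram_gib * GIB, 2550, 3300)
        print(f"{ram_gib} GiB RAM: about {n} pages in memory at once")
```

Under those assumptions, 1, 2, and 4 GiB of RAM hold about 21, 42, and 85 pages respectively.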

The Digital Dungeon

My work consists of rapidly learning about every aspect of digitization for libraries. In the course of this fast-paced learning, I’ve realized that a lot of what I need to know is spread out over the vast information landscape. Very little of it is collected in a single, cohesive location that is easy to reference or discuss. So I wanted to start this blog as a place to review what I learn and offer it to the community for discussion and reference. In the coming weeks, I’ll be posting about the basic knowledge I’ve had to become familiar with. Most of it will be training documentation that I’ve written for the students I manage. After that, I’m going to start going through the problems and solutions that my coworkers and I have had to tackle in regard to digital library initiatives. I hope this blog is a useful and informative place for anyone interested in the topic.