Archived entries for libraries

Monster truck info

We have recently begun sending Biodiversity Heritage Library materials to the Internet Archive scanning pod at NYPL. We’re currently trying to get the workflow in place, so we recently purchased one of these Samson Book Carts to send stuff down. It’s perfect in a lot of ways: rugged, collapsible, huge capacity. Unfortunately, it’s also too tall (by about 4″) to fit in the van we’re using to transport books. I’ve been researching big book carts to no avail. If anyone knows of one similar to the Samson but a little shorter, I’d appreciate knowing about it. Thanks. Isn’t it interesting how 90% of digitization works out to be logistics?


“Email is for old people.”

More on the oversimplicity of “Digital Natives” etc. (The Googlization of Everything):

As Henry Jenkins writes, there is so much interesting stuff going on out there among age groups, among members of communities, and across oceans that flattening out everyone into “generations” or “natives” and “immigrants” is just false and useless.

It also has real-world implications. Once we assume that the kids out there love certain forms of interaction and hate others, we forge policies and design systems and devices that meet our presumptions. By doing so, we either pander to some marketing cliche or force an otherwise diverse group of potential users into a one-size-fits-all system that might not meet their needs.

Also see the first comment for the predictable “it is TOO” take on things, replete with the usual ageist assumptions and based mainly on hypotheticals and anecdotal evidence.


More on the Kindle and privacy

Snoop-friendly Kindle e-reader highlights privacy issues raised by feds’ attempts to get list of p-book buyers | TeleRead: Bring the E-Books Home:

Just when the Kindle is appearing with its own Trust Us approach—Amazon stores everything for itself and maybe unwittingly for Washington—D.C. comes along to remind us of the risk of Big Bro even without the Kindle. Via an AP story, we learn that federal prosecutors sought “the identities of thousands of people who bought used books through online retailer Amazon.com Inc.”

No word on how far they got in the used books. But some other highlights from the post:

Meanwhile Jeff Bezos and friend will be playing do-it-yourself snoops through a TOS specifically authorizing them to poke around your machine to see if you’ve been a good boy or girl. Naughty, naughty, naughty you’ll be if Jeff somehow finds you’ve been bypassing the DRM, and I doubt the punishment will be just a lump of coal. Away could go your Kindle service and book access—just read Amazon’s Terms of Use: “In case of such termination, you must cease all use of the Software and Amazon may immediately revoke your access to the Service or to Digital Content without notice to you and without refund of any fees.”

And since that Kindle’s got no other use but reading e-books that you get from Amazon, you then have a brick. An ugly, beige, $400 brick. But wait, there’s more:

Meanwhile here’s another gem from Jeff’s snoop-friendly terms of service: “The Device Software will provide Amazon with data about your Device and its interaction with the Service (such as available memory, up-time, log files and signal strength) and information related to the content on your Device and your use of it (such as automatic bookmarking of the last page read and content deletions from the Device). Annotations, bookmarks, notes, highlights, or similar markings you make in your Device are backed up through the Service. Information we receive is subject to the Amazon.com Privacy Notice.”

Which privacy policy is then quoted, vague enough that you could easily get sold out to the feds. One thing you can say for the paper book, Amazon can’t turn it off. As much as we might want to get over the pesky inconvenience that privacy poses to the growth of social marketing by denying it exists, there are real and serious consequences to doing so. Relabeling it “identity management” in order to productize it and reduce it to a purely technological problem won’t help either. Just because you might not care who knows what you read doesn’t mean they should find out.


Social metadata

What I Learned Today… » Blog Archive » The Return of Everything is Miscellaneous:

…Weinberger touches on the future of the ebook. He talked about how we could collect data from how people read books, the passages they highlight, where people read books and so much more using wireless enabled ebook readers (p.222) – and while it sounds like science fiction – we’re almost there. Kindle has the power of wireless technology – meaning that in theory, Amazon could connect to our readers and collect data. While this sounds scary and like a huge invasion of privacy – imagine the power that this data could provide. Some examples Weinberger has is that you could create a list of books that people most often read at the beach or a list of books people stopped reading 1/2 way through – how cool would that be?

Well, not very, because the only people I can think of who would find that data valuable are marketers. So I don’t think it would be that cool. And it is scary and a huge invasion of privacy. When the government starts asking Amazon for tracking data on where you and your Kindle were last Tuesday, you probably won’t think it’s very cool either. Especially if you can’t turn it off.


OCRopus Garden

Ars reviews Google’s OCRopus scanning software. We may play with this a bit internally. Everybody seems to use Abbyy, but everyone also seems to think that OCR pretty universally sucks, at least going by the anecdotal evidence I have heard. What I found especially interesting in this review was the huge difference in results between serif and sans-serif text:

The following examples show the typical output quality of OCRopus:


Tpo’ much is takgn, much abjdegi qngi tlpugh we arg not pow Wat strength whipl} in old days Moved earth and heaven; that which we are, We are; QpeAequal_tgmper of hqoic hgarts, E/[ade Qeak by Eirpe ang fqte, lgut strong will To strive, to Seek, to hnd, and not to y{eld.


Tho’ much is taken, much abides; and though We are not now that strength which in old days Moved earth and heaven; that which we are, we are; One equal temper of heroic hearts, Made weak by time and fate, but strong in will To strive, to seek, to find, and not to yield

Night and day. Of course almost everything we would possibly be hoping to OCR would be serif text. Ain’t it allus the way.
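
If we do end up playing with this internally, “night and day” could be turned into a number. Here is a minimal sketch, assuming we have a hand-checked transcription to compare against: it computes a character error rate as the edit distance between the OCR output and the reference, divided by the length of the reference. The function names are my own, nothing here comes from OCRopus or the Ars review, and a real evaluation would probably normalize whitespace and punctuation first.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match for free
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance divided by reference length (hypothetical helper).
    0.0 means a perfect match; values above 1.0 are possible for very bad output."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(ocr_text, reference) / len(reference)


if __name__ == "__main__":
    # A short fragment from each of the two samples quoted above.
    garbled = "to Seek, to hnd"
    clean = "to seek, to find"
    print(character_error_rate(garbled, clean))
```

Run over a whole page of each kind of text, this would give one figure per sample to compare, rather than an eyeball judgment.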


Wrighting the rong

While reading a Kevin Kelly post about an HG Wells novel that actually was credited in real scientific work, I saw this graphic:

World Set Free

And thought “Cool! A link to the book in the Internet Archive!” Alas, I was wrong. Not only was the image not linked to the IA copy – the image wasn’t linked to anything – the link later in the post was your standard Amazon Associate link. Disappointing. So I’ll right that wrong here:

World Set Free

Go forth and read freely.


OCR services?

As part of an IMLS grant we’re working on, I need to find a company that will OCR and double-key about 165k entries from the Index to American Botanical Literature. The entries are spread over a number of volumes. I already know about Digital Divide Data – they were the company we had originally approached about this project, but that was a while ago, and if there are any other companies people know of, I’d appreciate hearing from you. Thanks!

Now software questions

Scanners last time; this time it’s presentation software. Or is that digital library software? Collection management software? Our original pilot project went up on a very old version of Greenstone, and again I am having trouble turning up anything more than Greenstone and CONTENTdm. (Perhaps the google-fu is weak in this one.) Our Herbarium uses KE Software’s KE EMu for its collections, and while it seems strong in some areas, I have some reservations about its use for digital library collections, mainly that I can’t find a whole lot of libraries using it. (Also, it doesn’t appear to have any MARC support.) Again, is there something I am missing? Are people just using LAMP stacks for this? Are most installations just homegrown? Lots to learn…

ALA: what the hell is it?

I had originally written this as a comment on kgs’ ALA: what is to be done? post, but since it kept getting longer, I thought I’d post it here instead.

First, some background: I graduated lib school in 93, worked in software/web development till 2005. Started in a systems librarian job in April of last year.

That said, ALA makes me nuts. A few random things come to mind:

1. I joined ALA at a student rate when I was taking classes towards possibly getting school media specialist certification. (Dodged that bullet.) Despite contacting them to tell them I was no longer a student and even though I now pay non-student (read: full) membership fees, every single piece of mail they send me (at MPOW, no less) is addressed to “John Mignault, student.” This irritates me anew every time I see it.

2. The organization is enormous. I find it byzantine and incomprehensible, and I’ve programmed in PowerBuilder. There are too many fee-requiring sub-associations, divisions, councils, round tables, kaffeeklatsches, cells, and jamborees. There is too much crap to wade through, and most people don’t have the contacts that would make navigating the organization easier.

3. WRT #2, it’s insane that so many publications (I am looking at you, Library Technology Reports) are outrageously expensive, and not included in a LITA membership. 63 bucks for a single issue of a magazine? I work in a botanical library, fer Chrissakes. We’re strapped enough as it is – I can’t ask them to pay some outrageous sum of money for these publications, and I’m already into ALA for enough dough as it is. Why isn’t this stuff online? How can anyone take ALA seriously with regard to Open Access when they act like Elsevier?

4. When I saw the absolute fetish librarians have for listservs (three letters: R. S. S.), I decided I’d better get my subscriptions out of my personal gmail account and into another one for mailing lists only. I’ve been trying to unsubscribe from LITA-L for a couple of weeks now, only to continually get errors from the listserv processor.

5. Elections. ALA bugs the crap out of me to vote in elections. They send me postcards. They send me e-mails. Great, now I can vote for a bunch of people who I’ve never heard of whose position papers require a much greater degree of knowledge about the organization than I have. I can go by the biblioboogersphere, but they say things like “Vote for J. Random Librarian, because he/she *gets it.*” Well, I’m glad someone does, but I need more, you know?

6. Just read this passage from kgs’ post:

Council elects an Executive Board, which theoretically runs ALA, but delegates to the Executive Director of ALA, currently Keith Fiels (a good guy, but he also isn’t going to steer ALA anywhere EB isn’t taking it — and that’s correct behavior). Council nominates and elects EB. With a majority on Council, you theoretically have control of ALA (since you can elect the EB). There are just under 200 Councilors, so elect a slate of 100 Councilors and you have a majority. Yet it’s not that simple, either, because as the ALA website notes, “Council, the governing body of ALA [is] comprised of 183 members: 100 elected at large; 53 by chapters; 11 by divisions; 7 by roundtables; and 12 members of the Executive Board.” It’s not impossible that a slate couldn’t include chapter, divisional, or roundtable candidates, but it would require more effort, and since not all Councilors are elected at the same time, you can’t just run 100 at-large candidates. More likely than electing Councilors from chapters and divisions is first, to build a reform Council over several years, and second, that a strong Council EB slate would pick up additional votes outside the original reform slate.

My head hurts.

Anyway, that’s a start.

Book scanners

MPOW is struggling towards getting digitization off the ground, and one of the things I’ve been looking at is book scanners. We often scan rare or fragile (Italian) material, so smooshing down a book onto a flatbed isn’t acceptable. I was surprised at how few vendors there are to choose from. There’s Kirtas, which makes a high-end machine that can do up to 2400 pages an hour. I saw one demonstrated at the BookExpo at Javits last week, and it’s very cool. The book is held in a cradle, and the pages are turned by means of a puff of air. It works quite well, and it scores very high on the Neat-O Scale. It’s very expensive, though, and we don’t have the volume of material to justify buying one. We’ve done some outsourcing to Kirtas, and been pleased with the results, but it’s overkill for us.

Then there’s the Atiz BookDrive DIY. Most book scanners have the same basic setup: a scaffolding encloses a platen for the book along with mounts for 2 digital cameras pointed at either page of the book. Atiz sells you the scaffolding and lets you pick the cameras yourself, thus the DIY. Atiz also makes something called the BookDrive, which supposedly enables fully unattended scanning. It’s a fully enclosed unit (reminded me of a toaster oven) that turns the pages of the book via an arm with a mild adhesive on it. It gives me the willies to even consider that.

I love the Scribe scanners that the Internet Archive is using, at least in part because I agree so strongly with the ideology and goals of the project, but again, we don’t have the volume to qualify for an on-site Scribe, and we will probably be doing some outsourcing to NYPL’s Scribe station later this year.

We already use a Minolta book scanner and the Indus (ours are branded BookEye), so I know about those already. But I haven’t really been able to find anything else, and you’d think there’d be more out there. Anyone know of any others?



Copyright © 2004–2009. All rights reserved.
