January 2011 | the Little Projects of Shawn M. Jones

Copyright © is not enough

2011/01/25 23:25:03

Tonight’s Digital Libraries class covered Copyrights, Patents, Copylefts, and other intellectual property concepts. This is especially important in the digital libraries world because the laws surrounding intellectual property make preservation challenging.

An important take-away from tonight’s lecture was that merely placing “© 2010 Shawn M. Jones” at the bottom of my pages is not enough to protect them legally. A notice covering the site’s content must appear somewhere on the site. Several years ago, Lawrence Lessig helped create the Creative Commons licenses, which allow those on the Internet to share their work while retaining their copyright to it.

To provide a license for this blog, I’ve filled out the form on the Creative Commons web site and followed its directions for linking to the chosen license, like so:

the Little Projects of Shawn M. Jones by Shawn M. Jones is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

We’re quite lucky to live in an era where folks are willing to do the legal legwork necessary to make this happen. I’m happy that it’s even held up in court.

Finding a kernel on the web

2011/01/20 00:02:35

One of the great agonies of a human being is searching for that little kernel of knowledge that actually answers their question. Within a traditional library, one would ask the reference librarian to lead them to the documents that, hopefully, answer their question. On the web, we use search engines as if they were reference librarians. They are a poor substitute, but they are all we have for now.

Within a traditional library, the information about a book (metadata) is stored within some system (e.g. MARC), and this system is linked to some library classification (e.g. Dewey Decimal) for finding it on the shelf at a particular library. A whole profession exists for making this happen. Book metadata is chosen by professionals so that said book can be delivered to the person looking for the information within. These professionals (catalogers) are the gatekeepers of the whole system. Without them, the books might as well be strewn about.

On the web, there is no central authority. Every site is responsible for its own content. Search engines like Google use complex algorithms to try to find something that answers your question. Web site owners must take it upon themselves to not only ensure that their site stays consistent and correct, but also that it has metadata for these search engines to use to find them. Though Search Engine Optimization (SEO) is largely used to ensure potential customers find businesses, it is also important in helping users find the information they are looking for.

Now that I am aware of the importance of such metadata, I have installed a plugin for WordPress on my blog that generates Dublin Core metadata elements. These metadata elements are supposed to help others find articles like this one via search engines.

This plugin takes the existing metadata I was already supplying for each post, and places it in the header of the HTML at the top of the page, like so:

<meta name="DC.publisher" content="the Little Projects of Shawn M. Jones" />
<meta name="DC.publisher.url" content="/blog/" />
<meta name="DC.title" content="Finding a kernel on the web" />
<meta name="DC.identifier" content="/blog/?p=188" />
<meta name="DC.date.created" scheme="WTN8601" content="2011-01-19T23:52:11" />
<meta name="DC.created" scheme="WTN8601" content="2011-01-19T23:52:11" />
<meta name="DC.date" scheme="WTN8601" content="2011-01-19T23:52:11" />
<meta name="DC.creator.name" content="Shawn M. Jones" />
<meta name="DC.creator" content="Shawn M. Jones" />
<meta name="DC.rights.rightsHolder" content="Shawn M. Jones" />
<meta name="DC.language" content="en-US" scheme="rfc1766" />

I don’t really expect the search engine rankings to go up, but the real win here is that I’m helping others index my site in case I’ve actually provided exactly the information someone is looking for. In a way, this is a form of SEO, but it gets back to that cataloging spirit originally found in the library. There is no common list of tags or subjects for the web that we all must adhere to, but little steps like this bring us closer to finding the information we are looking for.

Take a look at the source for some of your favorite news sites, and you’ll probably see the same metadata in their headers too.
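As a rough sketch of how one might look for these tags without reading the page source by hand, the snippet below filters HTML for Dublin Core meta elements. The helper name list_dc_meta and the sample page are made up for illustration, the pattern assumes the name attribute comes first as in the plugin output above, and it relies on grep supporting -o (GNU and BSD grep both do).

```shell
#!/bin/bash
# Hypothetical helper: print the Dublin Core <meta> tags found in HTML on stdin.
# Assumes each tag sits on one line and starts with name="DC...." as shown above.
list_dc_meta() {
    grep -o -i '<meta name="DC\.[^>]*>'
}

# Made-up sample page built from tags shown earlier; for a live site, the
# page source could be piped in instead.
sample='<html><head>
<meta name="DC.title" content="Finding a kernel on the web" />
<meta name="viewport" content="width=device-width" />
<meta name="DC.creator" content="Shawn M. Jones" />
</head></html>'

# Prints only the DC.title and DC.creator lines; the viewport tag is skipped.
printf '%s\n' "$sample" | list_dc_meta
```

A proper HTML parser would be more robust than a regular expression, but for a quick peek at a page header this is usually enough.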

For further reading:

Metadata: The Foundations of Resource Description

Lil’ bit: scp and echo statements in your login scripts

2011/01/16 12:26:43

For years, I have used scp to transfer files between Unix/Linux machines. I noted quite a while ago that having an echo statement in my login script (in this case .bashrc) caused the file transfer to fail, so I removed all echo statements from login scripts, using them only for temporary debugging. Today, I believe I have found the way to have my cake (use scp to copy files) and eat it too (leave the echo statements in the login script).

If your .bashrc looks like this:

#!/bin/bash
echo "Running the .bashrc file"
 
# correct spelling mistakes
shopt -s cdspell
 
# save multiline commands in the command history
shopt -s cmdhist
 
# some useful aliases
alias c='clear'
alias ls='ls -p'
alias vi='vim'
 
echo "done with .bashrc file"

And you connect via SSH, you’ll see the following:

[me@otherhost]$ ssh myhost
me@myhost's password:
Running the .bashrc file
done with .bashrc file
[me@myhost me]$

This is fine for an SSH session. In fact, you might even ask questions of users as they log in, or dump an entire warning banner to the screen to protect yourself legally. Many interactive possibilities exist.

Now say you wish to copy a file from one server to another:

[me@otherhost]$ scp myfile myhost:~/
Running the .bashrc file

Once this is done, you only see the output from the first echo. What’s worse, the copy never happens. Some claim this is a bug. The fact that the bug was reported in 2000 and still exists in scp indicates to me that the scp authors don’t consider it a problem.
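A quick way to check whether a login script is likely to trip up scp is to run a do-nothing command over ssh and see whether anything comes back. The sketch below wraps that check in a helper; the name is_quiet is hypothetical, and it assumes that running a command over ssh reads the same login script that scp triggers, which can vary with how the remote shell is configured.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given command prints nothing on stdout.
is_quiet() {
    [ -z "$("$@" 2>/dev/null)" ]
}

# Against a real host this would be run as: is_quiet ssh myhost true
# Local demonstration with commands standing in for the remote shell:
is_quiet true && echo "quiet: scp should be safe"
is_quiet echo "Running the .bashrc file" || echo "noisy: scp is likely to fail"
```

If the ssh form reports output, some echo (or other chatty command) is still running in a non-interactive session.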

The solution is in testing for the existence of a terminal:

#!/bin/bash
if tty > /dev/null 2>&1; then
    echo "Running the .bashrc file"
fi
 
# correct spelling mistakes
shopt -s cdspell
 
# save multiline commands in the command history
shopt -s cmdhist
 
# some useful aliases
alias c='clear'
alias ls='ls -p'
alias vi='vim'
 
if tty > /dev/null 2>&1; then
    echo "done with .bashrc file"
fi

Now scp will work without any output to the screen. More importantly, your files will be copied:

[me@otherhost]$ scp myfile myhost:~/
me@myhost's password:
myfile                               100% 1896KB   1.9MB/s   00:00
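If there are many echo statements, repeating the test around each one gets tedious. One possible variation, offered as a sketch rather than the fix I actually used, is to wrap the test in a small function. The name login_echo is made up, and it uses the shell's built-in [ -t 1 ] test, which checks whether standard output is a terminal, where the tty command checks standard input.

```shell
#!/bin/bash
# Hypothetical helper: echo only when standard output is a terminal,
# so non-interactive sessions such as scp see no output at all.
login_echo() {
    if [ -t 1 ]; then
        echo "$@"
    fi
}

login_echo "Running the .bashrc file"

# ... the shopt and alias lines stay exactly as before ...

login_echo "done with .bashrc file"
```

In an interactive SSH session both messages appear as before; when output goes to a pipe or to scp, the function prints nothing.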

Thanks to the forums on Ars Technica for leading me to the answer to this issue. As that post existed in 2000, I now see why the openssh folks haven’t “fixed” scp.

This site is a blog

2011/01/10 04:33:20

I’ve resolved that what I’m really looking for in a web site is a blog.

I wanted the site to do the following things:

allow me to publish blog posts, which I was already doing with WordPress
allow me to publish the occasional article, which I seem to have little time or inclination to do, so I ditched Drupal
allow me to publish photos, for which I’m currently using Picasa Web Albums

I’m maintaining my own site on my own server for the following reasons:

Blogger was too slow to load on many of the networks I experimented with.
WordPress has more features than Blogger.
WordPress is open source, and I can learn about this nifty piece of software by running it myself.
I had already moved my mail services off of Gmail and onto my rented server, and wanted to consolidate my web services there as well.
I love messing with a Linux server in my spare time.

We’ll see how far this Intel Celeron 2.53 GHz machine with 1GB of RAM will get me.