EOF - Now Data Gets Personal
The main problem with data is that it's easy to copy. In fact, sending it from one place to another is essentially an act of replication. The mv command is alien to most people's experience of the Internet. They may use it every day (hardly realizing it) within their own filesystems, but between separate systems on the Net, their experience is that of replication, not relocation.
This alone makes control of one's data problematic, especially when the first-person possessive voice isn't quite right. “My” data often isn't. For example, take profile or activity data kept by a service provider. It's from you and about you, but you don't own it, much less control it. Transaction data is created by a buyer and a seller together, but the canonical form of the data is what's kept by the seller and provided (by copying) in the form of a bill or displayed on an encrypted personal Web connection. Sellers don't go much further than that. The idea of sharing that information in its raw form, either during a transaction or later on request by the buyer, is alien at best to most sellers' IT and legal departments. As John Perry Barlow put it in “Death From Above” (way back in 1995, w2.eff.org/Misc/Publications/John_Perry_Barlow/HTML/death_from_above.html), “America remains a place where companies produce and consumers consume in an economic relationship which is still as asymmetrical as that of bomber to bombee.” In fact, this is still true of the whole business world.
Yet, internetworking of that world brings a great deal of symmetricality to it, imposed by the architecture of the Internet and its growing suite of protocols. The bank that used to occupy the most serious building on Main Street—or a skyscraper in a big city—is now but one location among a trillion on the Web. Yours is another. The word “domain” applies to both of you, even if your bank's “brand” is bigger than yours. Of your own sense of place and power on the Net, the words of William Cowper apply (www.bartelby.com/41/317.html): “I AM monarch of all I survey; / My right there is none to dispute...”
Yet, as William Gibson famously said, “the future is here but not evenly distributed.” Bomber/bombee power asymmetries persist in the B2C (business-to-consumer) world of everyday retailing. When you buy something, the transaction data in most cases comes to you only in the form of a receipt from the seller and a bill from the credit-card company. Neither is offered in formats that allow you to gather data on the spot or later over a secure Net connection—not easily, anyway.
If we could collect that data easily, our self-knowledge and future purchases would be far better informed. In fact, collected data could go far beyond the transaction alone. Time, date, location, duration, sequence: those are obvious ones. How about other bits of data, such as those involved in dealings with airlines? For example, your “fare basis code” (HL7LNR, or some other collection of letters and numbers) contains piles of information that might be useful to you as well as the airline, especially as you begin to add up the variables over time.
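To make "adding up the variables over time" concrete, here is a minimal Python sketch of what a buyer-side transaction log might look like. Everything in it is an assumption made for illustration: the Purchase record, its field names, the spend_by_seller helper and the sample sellers and amounts are all invented, and no seller or card company offers data in this form today.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Purchase:
    """One buyer-side transaction record (hypothetical field names)."""
    seller: str
    amount_cents: int   # integer cents avoid floating-point rounding
    when: datetime      # time and date of the purchase
    location: str       # where the purchase happened


def spend_by_seller(purchases):
    """Return total spending per seller, in cents."""
    totals = defaultdict(int)
    for p in purchases:
        totals[p.seller] += p.amount_cents
    return dict(totals)


# Invented sample data, standing in for receipts collected over time.
purchases = [
    Purchase("Example Air", 32900, datetime(2009, 6, 1, 9, 30), "SBA"),
    Purchase("Corner Cafe", 450, datetime(2009, 6, 1, 12, 5), "Santa Barbara"),
    Purchase("Example Air", 28750, datetime(2009, 7, 14, 7, 15), "BOS"),
]

totals = spend_by_seller(purchases)
print(totals)
```

The point of the sketch is only that once the raw records sit on the buyer's side, questions such as "how much have I spent with this airline?" become a few lines of code rather than a pile of paper receipts.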
A marketplace is no better than the knowledge and practices that buyers and sellers both bring to it. But, while the Net opens many paths for increasing knowledge on both sides, most of the knowledge-gathering innovation has gone into helping sellers. Not buyers.
Today, that's changing. More and more buyers (especially the geeks among them) are getting around to helping themselves. In particular, two new development categories are starting to stand out—at least for me. One is self-tracking, and the other is personal informatics.
Compared to its alternative (basically, guessing), self-tracking is “know thyself” taken to an extreme. Alexandra Carmichael, for example, tracks 40 things about herself, every day. These include mood, chronic pain levels, sexual activity, food intake and so on. She's a star in the Quantified Self community (www.kk.org/quantifiedself), which is led by Gary Wolf and Kevin Kelly. Among topics at QS meetups are chemical body load, personal genome sequencing, lifelogging, self-experimentation, behavior monitoring, location tracking, non-invasive probes, digitizing body info, sharing health records, psychological self-assessments and medical self-diagnostics, to name a few.
Now, would any of these be extreme if they were easy and routine? Well, that's the idea. ListenLog (cyber.law.harvard.edu/projectvrm/ListenLog), one of the projects I'm involved with, doesn't make sense unless it's easy, and unless the data it yields is plainly valuable.
This brings us to personal informatics, which is a general category that includes self-tracking and extends to actions. All this data needs to live somewhere, and stuff needs to be done with it.
In the commercial realm, I see two broad but different approaches. One is based on a personal data store that might be self-hosted by the customer or hosted in a cloud operated by what we call a fourth-party service (so called because it serves the buyer rather than the seller, unlike third parties, which primarily serve sellers). As Iain Henderson (who leads this approach) puts it, what matters here “is what the individual brings to the party via their personal data store/user-driven and volunteered personal information. They bring the context for all subsequent components of the buying process (and high-grade fuel for the selling process if it can be trained to listen rather than shout).” The other approach is based on complete user autonomy, whereby self-tracking and personal relationships are entirely the responsibility of the individual. This is exemplified by The Mine! Project (themineproject.org/about), led by Adriana Lukas. As she puts it, the difference between the two approaches is providing vs. enabling (www.mediainfluencer.net/2009/04/enabling-vs-providing).
Either way, the individual is the primary actor. As distribution of the future evens out, the individual has the most to gain.
Doc Searls is Senior Editor of Linux Journal. He is also a fellow with the Berkman Center for Internet and Society at Harvard University and the Center for Information Technology and Society at UC Santa Barbara.