Talking Point: Could Linux Abandon Directories In Favour Of Tagging?

For a fairly scruffy looking guy, I have a surprisingly healthy approach to organising my files. However, I'm constantly pushing up against the limitations of a system that is based around directories. I'm convinced that Linux needs to make greater use of tagging, but I'm also beginning to wonder if desktop Linux could abandon the hierarchical directory structure entirely.

Why is it that web-based technology such as online bookmarking makes far greater use of tagging than the Linux desktop does? Directories for files are based on the way that humans have always organised items in the real world, using categories and sub-categories. Thanks to powerful computers and cheap, plentiful storage, tagging now offers a method of storage that isn't based on placing files in one place or another.

The word processor file that makes up this article is stored in /documents/articles/linux_journal/ but it could be even more efficiently organised if I could easily tag it as “documents”, “articles”, “linux journal” as well as “op ed”, “daft ideas”, “tagging”, “linux” and “web posts”. That way I could find it by browsing through all of the web posts I've made this year or all of the op-ed pieces I've ever written.

Some organisational situations illustrate the weakness of the hierarchical approach. For example, if I download some independent electronic dance music, where do I place it within a hierarchical file system? Does it go in /mp3/dance/electronica/independent or /mp3/independent/electronica/dance? Which system works best depends on whether the significant factor is that it is electronica or independently produced. This is where tagging comes into its own as it allows objects to be placed in more than one category at once.
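As a rough illustration of that point, here is a minimal Python sketch of a tag lookup. The `TagIndex` class, its method names and the file names are all hypothetical and not taken from any existing tool; the sketch only assumes that each file is stored once and carries a set of tags.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory index mapping tags to file paths."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)

    def tag(self, path, *tags):
        """Attach one or more tags to a file path."""
        for name in tags:
            self._files_by_tag[name.lower()].add(path)

    def find(self, *tags):
        """Return the sorted paths that carry every one of the given tags."""
        if not tags:
            return []
        sets = [self._files_by_tag.get(name.lower(), set()) for name in tags]
        return sorted(set.intersection(*sets))


index = TagIndex()
# The track is stored once but belongs to several categories at the same time.
index.tag("track01.mp3", "mp3", "dance", "electronica", "independent")
index.tag("track02.mp3", "mp3", "dance")

print(index.find("independent", "electronica"))
print(index.find("dance"))
```

The order of the tags in a query does not matter, which is exactly the question that the two competing directory layouts force you to answer up front.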

When dealing with files, there's a distinction to be made between the files that I normally care about and those that I only care about when I'm fiddling around inside Linux's innards. The default setup of most Linux distributions acknowledges this distinction as the files are stored either:

  • outside of the /home directory (files that I don't care about most of the time)
  • inside the /home directory but hidden (more files that I don't care about most of the time)
  • inside the /home directory and visible (these are the files that I care about)


It's this last category of files that is ripe for being moved over to a tagged system. Abandoning the directory system outside of the /home folder would mean not only designing a new operating system but also designing a new set of applications.

Application awareness could make tagging more useful, because as it stands, when I'm opening files or saving them, I can't use tagging most of the time. For one thing, application awareness could reduce the tagging workload. A word processor could set the tag of a file as a “text document” and perhaps offer me some pertinent tags from the system tag cloud to go with it. When I download a file within Firefox, I bet that it would be fairly easy for the developers to make it tag the file as “downloaded”. That way it keeps that information when I also decide that it belongs in the “video”, “trailer”, “film”, “science fiction” and “have watched” categories.
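To make the idea of automatic tagging a little more concrete, here is a small Python sketch. The `suggest_tags` function, the `DEFAULT_TAGS` mapping and the `source_app` parameter are assumptions made up for illustration; no existing application is known to work this way. It only relies on the standard library's `mimetypes` module to guess a file type from its name.

```python
import mimetypes

# Illustrative mapping from MIME major type to a starter tag (an assumption).
DEFAULT_TAGS = {
    "text": "text document",
    "audio": "music",
    "video": "video",
    "image": "picture",
}


def suggest_tags(filename, source_app=None):
    """Suggest starter tags from the file type and the saving application."""
    tags = set()
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type:
        major = mime_type.split("/")[0]
        if major in DEFAULT_TAGS:
            tags.add(DEFAULT_TAGS[major])
    # A browser could mark everything it saves as "downloaded".
    if source_app == "browser":
        tags.add("downloaded")
    return sorted(tags)


print(suggest_tags("notes.txt"))
print(suggest_tags("trailer.mp4", source_app="browser"))
```

The user would still add tags such as “science fiction” by hand; the application only supplies the ones it can work out for itself.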

Most people probably have a fairly fixed idea of what they think a file browser is, but a large proportion of applications are actually specialised file browsers. Why couldn't a tag-aware file browser suddenly switch into music browsing mode as soon as I select the music file tag? If it automatically switched to the details view, added an extra pane on the left hand side for an album view, gained a time elapsed counter in the status area along with some transport controls, you'd have a fairly good music player. Email clients are also specialised file browsers. In the classic three pane layout, the left area represents the folders, the top right hand pane shows the files, and the bottom right pane is a viewer. Click on the message and it opens a slightly specialised text editor.

Ubiquitous tagging for normal desktop use would be a way for desktop Linux to get ahead of the competition, and I have an idea that it would particularly appeal to people who weren't computer experts. Bear in mind that non-experts don't have any difficulty understanding tagging on the web.

I see the two main barriers to greater adoption of tagging on the desktop as the lack of a unified standard for metadata and the aforementioned lack of application awareness. I wonder which will be the first mainstream distribution or desktop environment to experiment with removing directories and going 100% tagging for end users?

The tagging image used as the icon for this article was created by Salvatore Vuono. Downloaded from Free Digital Photos.

______________________

UK based freelance writer Michael Reed writes about technology, retro computing, geek culture and gender politics.

Comments


Tagsistant FUSE filesystem

spurious's picture

You may want to investigate the Tagsistant project:
http://www.tagsistant.net/index.php

It's a tag-based "semantic" filesystem based on FUSE and SQLite. I have not had time to use it myself, so I can't attest to how well it functions.

Reminds me of Hoarders

GreyGeek's picture

Hoarders: folks who pile things all in one room (or house) until there is no room to walk around. They mentally tagged where each item was but after the passage of time they no longer can find a specific item. When a hoarder's stash is cleared up and a long lost item is discovered the response often is "Oh, I wondered where that was".

Another problem with abandoning directories is that of ownership and security, one of the hallmarks of Linux. It would put a huge burden on a file browser or a shell to show only those files a particular user has the right to see, modify or delete, since each file would have to be examined in order to decide.

I just used "find . -type f | wc -l" as root to discover that my Kubuntu 10.04 system has a total of 502,862 files! A SOHO or corporate LAN might contain hundreds or thousands of times as many files. Microsoft's "Active Directory" is already a slug when trying to show the files in a directory with only a fraction of that total. Having a Linux file browser scan/sort/display who knows how many millions of files residing in one "directory" (or on one HD) would make AD look like lightning. Obviously a heavily indexed database with fast response times would be necessary.
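For what it's worth, the kind of indexed database mentioned above might be sketched like this in Python with the standard `sqlite3` module. The table names, the index name and the helper functions are all invented for this example and are not taken from any existing tagging tool; the point is only that a tag lookup can go through an index instead of scanning every file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real index would live on disk
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
    CREATE TABLE tags  (file_id INTEGER NOT NULL REFERENCES files(id),
                        tag TEXT NOT NULL,
                        PRIMARY KEY (file_id, tag));
    -- The index lets a tag lookup avoid scanning every row.
    CREATE INDEX tags_by_name ON tags (tag, file_id);
""")


def add_file(path, tags):
    """Record a file and its tags (hypothetical helper)."""
    cur = conn.execute("INSERT INTO files (path) VALUES (?)", (path,))
    conn.executemany("INSERT INTO tags (file_id, tag) VALUES (?, ?)",
                     [(cur.lastrowid, t) for t in set(tags)])


def files_with_all(tags):
    """Return the paths that carry every tag in `tags`."""
    tags = sorted(set(tags))
    marks = ",".join("?" * len(tags))
    rows = conn.execute(
        f"SELECT f.path FROM files f JOIN tags t ON t.file_id = f.id "
        f"WHERE t.tag IN ({marks}) "
        f"GROUP BY f.id HAVING COUNT(DISTINCT t.tag) = ? "
        f"ORDER BY f.path",
        (*tags, len(tags)),
    )
    return [row[0] for row in rows]


add_file("/home/me/article.odt", ["documents", "articles", "linux journal"])
add_file("/home/me/letter.odt", ["documents"])
print(files_with_all(["documents", "articles"]))
```

Whether this stays fast with millions of files is an open question that the sketch does not answer; it only shows the shape of the lookup.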

My "tags" are my sub-directory names, and that paradigm works quite well for me, so I'll pass on what the author is proposing. Besides, as one comment already pointed out, there are apps that already allow for file tagging. No need to change the Linux file hierarchy structure.

sounds like a movement

Anonymous's picture

This sounds to me like a situation where we have tagging enthusiasts who are enthusiastic to the point of wanting to force people to use tags ("why can't the fools see the light?"). Since I don't see the either/or issue in regard to tags/directories, it's hard to see the need to force people to use tags.

I thought that the best plan for deciding where computing/Linux is going is that anything new should supplant something old by the fact that the vast majority abandon the old, not that the fans of the new abolish it. So show me the data on that.

A better way

Javier's picture

I don't think that Tags are that useful, using them would imply to have to memorize lots of tags. Since entropy is a fact and you cannot remember your old ideas forever, after 2 years you may also wonder "why did I tag that file that way". Of course the same can happen with a hierarchical structure, but the advantage there is that you only need to remember the broader subject and then go into more specific sub-categories by choosing among the options that you created, without having to remember them at all times. We all need to define our own standard for how to categorize things. Most OSs have implemented pre-established directories like Documents, Videos, Pictures, tmp, etc., but we still need to define for ourselves (perhaps with some expert's advice) how to categorize our files beyond those main categories. One useful tool would be, in addition to the well-known Ctrl-C Ctrl-V to copy and paste, an easy standard Ctrl-C Ctrl-L option to copy and paste symbolic links to the original files into many different places if those files fall under many different categories.

"I don't think that Tags are

xbackslashx's picture

"I don't think that Tags are that useful, using them would imply to have to memorize lots of tags"

I believe tags are as useful as you make them, just like directories.
If you tag your music files with 'music' and {artist name}, you do not have to remember those tags, it is only logical that you would tag them so.

Most of the time you apply logic to the naming of directories and subdirectories in which files are categorized.

Logic does not need to be remembered, it is something that comes naturally to us and thus using tags this way I believe can be as efficient as using directories (but one does not need to replace the other).

tagging is heavily suited to the GUI

Equitas's picture

Okay so you're talking of desktop so it's pretty much a given that it will be GUI-based but tagging and associated searching/finding of files is heavily biased towards the GUI. How do you backup or copy only the 1000 files that are tagged with "client name" using rsync, tar or equivalent?

Also you have a heavy presumption on documents here - productivity ones at that. That's fine but there are a host of other files I have under /home. Finally with a directory structure it's easier to implement permissions based on groups etc. How would you easily restrict access to one bunch of files by group if they were all stored in one directory and tagged only?

Okay so you're talking of

Anonymous's picture

"Okay so you're talking of desktop so it's pretty much a given that it will be GUI-based but tagging and associated searching/finding of files is heavily biased towards the GUI. How do you backup or copy only the 1000 files that are tagged with 'client name' using rsync, tar or equivalent?

"Also you have a heavy presumption on documents here - productivity ones at that. That's fine but there are a host of other files I have under /home. Finally with a directory structure it's easier to implement permissions based on groups etc. How would you easily restrict access to one bunch of files by group if they were all stored in one directory and tagged only?"

While I have some reservations, I don't see the issues you raised as being significant. Yes, rsync, tar and other tools would have to be updated to support tags but rsync -t "client name" would be trivial.
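As a sketch of what tag-aware archiving could look like, here is a short Python example built on the standard `tarfile` module. Note that rsync's existing `-t` option already means `--times`, so the flag suggested above should be read as hypothetical. The `archive_by_tag` function and the path-to-tags mapping are likewise assumptions made for illustration; a real tool would get its tags from wherever the system stores them.

```python
import os
import tarfile
import tempfile


def archive_by_tag(tag_index, tag, archive_path):
    """Write every file carrying `tag` into a gzip-compressed tar archive.

    `tag_index` is a hypothetical mapping of file path -> set of tags.
    Returns the sorted list of paths that were archived.
    """
    selected = sorted(p for p, tags in tag_index.items() if tag in tags)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in selected:
            tar.add(path, arcname=os.path.basename(path))
    return selected


# Demonstration with throwaway files in a temporary directory.
workdir = tempfile.mkdtemp()
paths = {}
for name, tags in [("invoice.txt", {"client name", "documents"}),
                   ("holiday.txt", {"personal"})]:
    full = os.path.join(workdir, name)
    with open(full, "w") as handle:
        handle.write(name)
    paths[full] = tags

archive = os.path.join(workdir, "client.tar.gz")
archived = archive_by_tag(paths, "client name", archive)
with tarfile.open(archive) as tar:
    print(tar.getnames())
```

The selection step is a one-line filter; the harder part in practice would be agreeing on where the tags live so that every tool can read them.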

Similarly, it should be easy to build a mechanism that allows you to set permissions based on tags. It would be equivalent to allowing you to set multiple groups on a file under the current system and to set permissions based on each group.

Have you tried using xattr?

Kalin's picture

I think extended attributes will answer some of your problems, check them out. Software like Beagle uses them, but you can use the user namespace for anything you like.
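A minimal sketch of storing tags in the user namespace, using Python's `os.setxattr` and `os.getxattr` (available on Linux only). The attribute name `user.tags` and the comma-separated encoding are assumptions chosen for this example, not a standard, and the filesystem has to support user extended attributes for the write to succeed.

```python
import os

# Hypothetical attribute name; Linux only requires the "user." prefix here.
TAG_ATTR = "user.tags"


def encode_tags(tags):
    """Serialise tags as a sorted, comma-separated UTF-8 byte string."""
    cleaned = {t.strip() for t in tags if t.strip()}
    return ",".join(sorted(cleaned)).encode("utf-8")


def decode_tags(raw):
    """Inverse of encode_tags: bytes -> sorted list of tag strings."""
    text = raw.decode("utf-8")
    return text.split(",") if text else []


def write_tags(path, tags):
    """Store tags on the file itself (Linux only; needs xattr support)."""
    os.setxattr(path, TAG_ATTR, encode_tags(tags))


def read_tags(path):
    """Return the tags stored on a file, or [] if none can be read."""
    try:
        return decode_tags(os.getxattr(path, TAG_ATTR))
    except OSError:
        # No such attribute, or the filesystem does not support xattrs.
        return []
```

This simple encoding breaks if a tag itself contains a comma. From a shell, the `setfattr` and `getfattr` tools from the attr package can read and write the same attribute.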

Software is/will be ported to use them. For example, see wget:
https://github.com/wertarbyte/wget/tree/xattrurl

Cheers,
Kalin.

Was about to mention it myself.

Ulrik's picture

I too think xattr + fast indexer is the way to go here.

The challenge, as I see it, is getting the indexer fast, lightweight and subtle enough for the user to always run it in the background. Most people I know turn off all indexers, be it Beagle, Tracker, or Windows Desktop Search, since they suck up a little too much RAM, and thrash the I/O cache a little too much.

I agree

Kasey Erickson's picture

I realized the strength and flexibility of tagging when first using labels within Gmail. I love the idea of having attributes on a file and then searching or sorting by attributes. The current Linux file system architecture has lots of mileage and is very powerful. I love it. Merging these two in an elegant way would be the challenge. Maybe just keeping them separate and adding this type of functionality to an arbitrary folder, say /home/{user}/taggable-data, would be a decent trade-off. If I want version control in a directory, I use git (git init....) and use the tool from there. Why not have a similar tool that uses the existing filesystem yet presents a DB view of the directory's contents? That tool could be a command line app or a file manager plugin that allows you to add a file, set/modify/delete its attributes, and search for a file. Apple has been doing this for some time with the way iPod songs are presented to users either on the computer or the iPod.

dear god no

Anonymous's picture

dear god no

Semantic Desktop

Michael Calabrese's picture

Isn't this what the "Semantic Desktop" is for? In KDE you can tag, rate, comment on your files.

Personally, I see tagging as too much effort. I have to do more work saving files. Then I have to remember what tags I used (more work). Keeping a good hierarchy keeps my search time low. If I need a broader search, I just use general search tools (find/grep type).

Semantic Desktop

Caesar Tjalbo's picture

This is what popped into my head too. Granted, I don't use it myself but the author should check it out. Apart from semantic desktop, he could use 'nepomuk' as a search term.

excellent notions, like cross-indexing

Anonymous's picture

Awesome ideas, I like them! Faster, sensible, and more like how we actually work. The traditional FS hierarchy is very limited and restrictive. The notion of file metadata with tags that cross multiple categories is closer to the paper-and-filing-cabinet world-- paper files can live in only one physical location, but can logically be in multiple categories. This is handled with cross-indexes, which assign multiple categories to single files.

why one or the other?

Renich's picture

I don't understand why both can't be useful.

I mean, it's not that you have to choose between filesystems and tags. You could do both. I mean, simplify the filesystem hierarchy a bit and store stuff properly tagged.

For example, you could just drop music into the Music folder and tag it accordingly.

One thing to think about would be duplicates. What about duplicates and/or untagged files?

Besides, you could already tag files if you're using Tracker, which permits tagging and all.

I think KDE even has this functionality built in. I wouldn't know; I'm a GNOME user. ;)

It's hard to be free... but I love to struggle. Love isn't asked for; it's just given. Respect isn't asked for; it's earned!
Renich Bon Ciric

http://www.woralelandia.com/
http://www.introbella.com/

tags in filename

Joe Bloggs's picture

You can put tags in the filename (separated by underscores) and use find to retrieve them.
I wrote a script which prompts for some tags and then fills a directory with symlinks to files matching those tags.
I only use this for certain files which I store in a single flat directory.
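The approach described above could be sketched roughly as follows in Python. This is a reconstruction under assumptions, not the commenter's actual script: the function names and file names are invented, and it takes the wanted tags as an argument rather than prompting for them.

```python
import os
import tempfile


def tags_of(filename):
    """Split 'linux_tagging_oped.odt' into {'linux', 'tagging', 'oped'}."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    return {part.lower() for part in stem.split("_") if part}


def link_matches(source_dir, target_dir, wanted):
    """Symlink every file in source_dir whose name has all wanted tags."""
    wanted = {t.lower() for t in wanted}
    os.makedirs(target_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(source_dir)):
        full = os.path.join(source_dir, name)
        if os.path.isfile(full) and wanted <= tags_of(name):
            link = os.path.join(target_dir, name)
            if not os.path.lexists(link):
                os.symlink(os.path.abspath(full), link)
            linked.append(name)
    return linked


# Demonstration with empty throwaway files in a single flat directory.
source = tempfile.mkdtemp()
target = os.path.join(tempfile.mkdtemp(), "linux-files")
for name in ("linux_tagging_oped.odt", "linux_kernel_notes.txt",
             "holiday_photos.txt"):
    open(os.path.join(source, name), "w").close()

print(link_matches(source, target, ["linux"]))
```

One limitation of this scheme is that a tag cannot itself contain an underscore.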

It's not directory based but inode based

Anonymous's picture

You can easily hardlink your files wherever you want. If you want to use tags then read the metadata that is included within the files.
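A short Python sketch of the hard link idea, using throwaway files in a temporary directory (the directory and file names are made up for the example). It shows one file appearing under two category directories while still being a single inode on disk.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "track01.mp3")
with open(original, "w") as handle:
    handle.write("placeholder audio data")

# Create two category directories and hard-link the same file into each.
for category in ("dance", "independent"):
    os.makedirs(os.path.join(workdir, category))
    os.link(original, os.path.join(workdir, category, "track01.mp3"))

dance_copy = os.path.join(workdir, "dance", "track01.mp3")
info = os.stat(dance_copy)
print(info.st_ino == os.stat(original).st_ino)  # compares inode numbers
print(info.st_nlink)  # number of directory entries pointing at the inode
```

Hard links cannot span filesystems and generally cannot point at directories, so this only works when everything lives on one partition.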
