Talking Point: Could Linux Abandon Directories In Favour Of Tagging?

For a fairly scruffy looking guy, I have a surprisingly healthy approach to organising my files. However, I'm constantly pushing up against the limitations of a system that is based around directories. I'm convinced that Linux needs to make greater use of tagging, but I'm also beginning to wonder if desktop Linux could abandon the hierarchical directory structure entirely.

Why is it that web-based technology such as online bookmarking makes far greater use of tagging than the Linux desktop does? Directories for files are based on the way that humans have always organised items in the real world, using categories and sub-categories. Thanks to powerful computers and cheap, plentiful storage, tagging now offers a method of storage that isn't based on placing files in one place or another.

The word processor file that makes up this article is stored in /documents/articles/linux_journal/, but it could be organised even more efficiently if I could easily tag it as “documents”, “articles”, “linux journal” as well as “op ed”, “daft ideas”, “tagging”, “linux” and “web posts”. That way I could find it by browsing through all of the web posts I've made this year or all of the op-ed pieces I've ever written.

Some organisational situations illustrate the weakness of the hierarchical approach. For example, if I download some independent electronic dance music, where do I place it within a hierarchical file system? Does it go in /mp3/dance/electronica/independent or /mp3/independent/electronica/dance? Which system works best depends on whether the significant factor is that it is electronica or independently produced. This is where tagging comes into its own, as it allows objects to be placed in more than one category at once.
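The idea of one file sitting in several categories at once can be illustrated with a minimal Python sketch. The file names, tag names and the `files_with` helper are all made up for illustration; this is not how any existing Linux tool stores tags.

```python
# Hypothetical in-memory tag store: each file name maps to a set of tags.
music = {
    "track01.mp3": {"mp3", "dance", "electronica", "independent"},
    "track02.mp3": {"mp3", "dance", "major label"},
    "track03.mp3": {"mp3", "ambient", "independent"},
}


def files_with(tags, store):
    """Return the sorted names of files that carry every tag in `tags`."""
    wanted = set(tags)
    return sorted(name for name, have in store.items() if wanted <= have)


# Unlike directory components, the order of the tags makes no difference.
print(files_with(["independent", "electronica"], music))
print(files_with(["electronica", "independent"], music))
print(files_with(["independent"], music))
```

Both orderings of the first query return the same track, which is exactly the choice that the two competing directory layouts force you to make up front.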

When dealing with files, there's a distinction to be made between the files that I normally care about and those that I only care about when I'm fiddling around inside Linux's innards. The default setup of most Linux distributions acknowledges this distinction as the files are stored either:

  • outside of the /home directory (files that I don't care about most of the time)
  • inside the /home directory but hidden (more files that I don't care about most of the time)
  • inside the /home directory and visible (these are the files that I care about)


It's this last category of files that is ripe for being moved over to a tagged system. Abandoning the directory system outside of the /home folder would mean not only designing a new operating system but also designing a new set of applications.

Application awareness could make tagging more useful, because as it stands, when I'm opening files or saving them, I can't use tagging most of the time. For one thing, application awareness could reduce the tagging workload. A word processor could set the tag of a file as a “text document” and perhaps offer me some pertinent tags from the system tag cloud to go with it. When I download a file within Firefox, I bet that it would be fairly easy for the developers to make it tag the file as “downloaded”. That way it keeps that information when I also decide that it belongs in the “video”, “trailer”, “film”, “science fiction” and “have watched” categories.

Most people probably have a fairly fixed idea of what they think a file browser is, but a large proportion of applications are actually specialised file browsers. Why couldn't a tag-aware file browser suddenly switch into music browsing mode as soon as I select the music file tag? If it automatically switched to the details view, added an extra pane on the left hand side for an album view, gained a time elapsed counter in the status area along with some transport controls, you'd have a fairly good music player. Email clients are also specialised file browsers. In the classic three pane layout, the left area represents the folders, the top right hand pane shows the files, and the bottom right pane is a viewer. Click on the message and it opens a slightly specialised text editor.

Ubiquitous tagging for normal desktop use would be a way for desktop Linux to get ahead of the competition, and I have an idea that it would particularly appeal to people who weren't computer experts. Bear in mind that non-experts don't have any difficulty understanding tagging on the web.

I see the two main barriers to greater adoption of tagging on the desktop as the lack of a unified standard for metadata and the aforementioned lack of application awareness. I wonder which will be the first mainstream distribution or desktop environment to experiment with removing directories and going 100% tagging for end users?

The tagging image used as the icon for this article was created by Salvatore Vuono. Downloaded from Free Digital Photos.

______________________

UK based freelance writer Michael Reed writes about technology, retro computing, geek culture and gender politics.

Comments


Won't work..

madtom1999's picture

Sounds negative I know but as with every other indexing system it will always be incomplete until every possible indexing option is exhausted.
This is a problem that people with computers have been trying to solve for 50 years, and librarians for hundreds.
The problem is not with tagging, files, inodes or URLs; it's the fact that the human part of the human-computer interface is a bit imprecise most of the time, and when it's precise Heisenberg pops up and you're looking in the wrong place again...

Not Tags vs Files but Tags AND Files

trydk's picture

I work with tagging professionally, and tagging rarely replaces a file hierarchy -- it rather complements it. Tagging is a tool to quickly find related information across space (i.e. file placement) and time.

Another aspect that some people in this discussion seem to forget is the time searching a huge repository takes. If I try to search my 1TB drive with documents spanning several decades of work, it will take almost forever. And what if I didn't use the right search terms? What if the search term was misspelled in the crucial document? Indexing is obviously a way around the time problem, but that does not solve the problems of spelling and furthermore introduces another problem of when and how to index, oh and the problems of space for the index, which can be rather substantial.

For tagging to work, though, four requirements must be met:

1. Completeness
2. Consistency
3. Effectiveness
4. Ease

1. Completeness means that the tags must describe the contents completely (at least within the taxonomy chosen, which is one of the biggest problems in non-business tagging, as few people have any idea of what taxonomy to use -- even if they knew what a taxonomy is). If the tagging is not complete, it will be difficult to find the wanted information later and you may have to resort to searching, which means no savings, really.

2. Consistency means that two documents relating to the same subjects should use the same tags. This is extremely difficult with manual tagging, as practice has shown that no two humans would tag the same document in the same way if the document has the least bit of complexity to it. (And the same person would not tag the same document the same way at different times.) Consistency also means that there must be a tag repository containing the "approved" tags with a thorough explanation of their meaning and use, which obviously would be a problem in the average person's daily use as nobody would actually read the explanation.

3. Effectiveness means that a tagging system must be efficient, i.e. changes in the documents should immediately be reflected in their tags, and changes in the underlying taxonomy (e.g. adding a new tag) should also be reflected immediately without taking too many system resources. This means that tagging should be considered dynamic and not static.

4. Tagging must be easy to use! If it takes almost as long to tag a document as to create its contents, nobody would do it.

In my experience, tagging is difficult, error-prone, time consuming ... and necessary. I think it is important that the people working on tagging in different systems should get together and start defining a general taxonomy that would cover the most obvious everyday use. When that is done, the systems could begin to automatically scan the documents and tentatively assign tags that users could approve or possibly change. The users should obviously be able to add their own tags too, but these should probably be separate from the "official" tags. And yes, I know there would be numerous problems with this approach still.

Oh, and tagging should be language independent, meaning that the tags should be represented in some language-independent format and presented in the user's chosen language, which means that I, with documents in several languages, could find my documents by tags, independently of their language.

tagsistant

Anonymous's picture

The following actually implements what was described as a fuse fs...

http://www.tagsistant.net/

Dynamic Folder Trees

Anonymous's picture

Dynamic Folder Trees

The basic idea is that you would have a virtual file system created from the tag names and the file would show up everywhere that it's relevant. Application awareness would be built in because applications already know how to deal with files. The user just has to navigate the tags.

The back end would have to do several things:
1. Manage real file names to prevent clashing names
2. Manage a database of files and their associated tags
3. Create a virtual file system based off the tags
4. Control after how many levels to start displaying files... Obviously most GUI programs would lag if you had all images/music within the root tag folder for images/music... assuming a very large collection.

The most major pitfall is naming the file and storing it on the real file system.
The most logical seems to be:
1. Store the files in a one level deep tree with the levels based off file type/main tag (e.g. image, music, document). obviously browsing this real directory with gui programs may have lag... but at least the file is not lost if the database becomes corrupt.
2. Name the real file with a key_value-descriptive_title. Again so it's not lost if the database corrupts

The great thing is that writes to the virtual file system using an application will automatically tag files. One odd thing to deal with in the back end is how to manage 'move' and 'copy' commands within the virtual file system... perhaps
1. move would erase all tags and repopulate them based on where it's moved to.
2. copy's function would have to remain the same for the user to not get confused... that being, a new real file is created with new tags and a new key value to where it was copied.
3. A new command would have to be created for cli users to actually append tags without actually duplicating the file.

To my understanding a tag based system, similar to this, using a virtual file system is already possible with the current kernel. The daemon just needs to be written. Special guis could come later.
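The bookkeeping side of this proposal can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `TagStore` class, the `path_tags` helper, the six-digit key format and the `add_tags` method are all hypothetical names, no real files are touched, and the virtual file system layer itself (which would probably be a FUSE daemon) is left out.

```python
import itertools


def path_tags(virtual_dir):
    """Turn a virtual directory such as '/music/dance' into a set of tags."""
    return {part for part in virtual_dir.split("/") if part}


class TagStore:
    """Toy model of the back end: real file names plus a tag database."""

    def __init__(self):
        self._keys = itertools.count(1)  # source of unique key values
        self.tags = {}                   # real file name -> set of tags

    def write(self, virtual_dir, title):
        # Real name is key_value-descriptive_title, so it stays readable
        # even if the tag database is lost.
        real_name = f"{next(self._keys):06d}-{title}"
        self.tags[real_name] = path_tags(virtual_dir)
        return real_name

    def move(self, real_name, virtual_dir):
        # 'move' erases the old tags and repopulates them from the target.
        self.tags[real_name] = path_tags(virtual_dir)

    def copy(self, real_name, virtual_dir):
        # 'copy' creates a new real file with a new key and new tags.
        title = real_name.split("-", 1)[1]
        return self.write(virtual_dir, title)

    def add_tags(self, real_name, *new_tags):
        # The extra command: append tags without duplicating the file.
        self.tags[real_name] |= set(new_tags)

    def listing(self, virtual_dir):
        wanted = path_tags(virtual_dir)
        return sorted(n for n, t in self.tags.items() if wanted <= t)


store = TagStore()
song = store.write("/music/dance", "night_drive.ogg")
store.add_tags(song, "independent")
dup = store.copy(song, "/music/favourites")
store.move(song, "/music/archive")
print(store.listing("/music"))
```

Note that in this sketch `move` also drops the "independent" tag added earlier, which is a direct consequence of the "erase all tags and repopulate" rule and may or may not be what a user expects.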

Nepomuk

Anonymous's picture

I am surprised you do not mention Nepomuk. It appears to be exactly what you are wishing for and it is already here. You can tag all of your files, if you have the time and inclination (a huge undertaking). Unfortunately, there presently does not appear to be any way to save all of this work, so when you reinstall the operating system (or move a file?), the information would likely be lost and you would have to retag your archives all over again.

Storedwares tabulation

Anonymous's picture

Access to the data/code hohlraum, (multiply)encrypted/NOT usually organised by a hierarchical file system with definite, (meaningful?) names is often supplemented by user tagging if one browses through a typical user space. As such tagging is omnipresent. One can only imagine what chaos would ensue after a storage failure. Look in lost+found after such an event, and recovery of tens of files/fragments (disk blocks?) (if possible) would be very time consuming. (Backup data!) One can also see that (language) translation would probably lead to unsatisfactory tagging. Professional "jargon" may also cause misinterpretations: In short, a real Tower of Babel. Best to stick with the traditional approach until we are all identical robots speaking a common language, having a uniform education and assigned to a beehive society without social mobility...usw.! Is one really looking at a REGISTRY? I was hoping to see that vanish once and for all.

Use Both

stunder's picture

I have a feeling that for years to come the current method and the tagging method within meta data will be supported. I would think that most readers of LJ would be from the group that sticks with a hybrid method of put it in its correct place (e.g. Documents, Music, Photos) and then tag the file itself. While users that aren't so eager to get dirty and actually access system type files could be converted to taggers quicker. I like Renich's idea of having the manager do some of the tagging for you if you insert the file in directory Documents/linux/magazines the file gets tagged as the directory was spelled out. That would help with my laziness.

I had a similar idea about 2 years ago with Music. I would like to have a music player that pulled tag info from .ogg files and automatically inserted files with certain tags into a playlist. Then I wouldn't have to build playlist or add songs to a playlist file but the player would notice tags and build the playlist for you. This is doable with ogg because of the openness of the tags and not mp3.

God loves a working man, don't trust whitey, and see a doctor and get rid of it.

Tagging is not enough

Avishay's picture

Tagging is one way of finding things, but it has its drawbacks. It must be accompanied by a way of gathering objects into groups. For example, if I have a bunch of pictures from my last vacation, I wouldn't want to tag each and every one of them. Rather, I would expect to see them all in one location.
I deliberately used the word "object" and not "file". The high level user, in my opinion, should be able to treat some kind of compound documents (think of a rich HTML e-mail, that comes with attached images), and leave the file abstraction to programmers. Only then can tagging and grouping work.

Windows 7

Anonymous's picture

MS is having another shot at it with Windows 7. The default file search/file explorer index unit is the "library" (the tag) rather than the folder.

By default, this doesn't work for me, because the default tags are 'music' 'video' etc. I'd want 'projects' 'applications' 'virtual machines' 'code' etc.

It does completely hide the 'folder' structure from the end user. I guess Desktop Linux lags behind the competition in this area.

for a third party file manager perhaps

phaedrus's picture

i don't see the value in adding tags to a filesystem in any way, but could see that some users might find value in a file manager that incorporates tagging. personally i would not want to label files, in the same way that i use device names or UUID rather than labels when referring to hardware. meta-data just adds a level of ambiguity that i don't need.

Directories vs. tags

spiralx's picture

I've had to use SharePoint at work for 2 years now, and if that's the way tags work, then I'm sticking with directories.

There is no way to just drag-and-drop files around. We have inherited a pile of historical stuff that I could sort out quite quickly if I could just 'pull and push', but I just can't.

We're in the process of dumping SharePoint, and spending a large sum of money and time to put all our files and folders back into the older format that we all know and can work with.

Smart reduction of the tag-cloud

Odo1's picture

How about the possibility to reduce the tag-search to the structure below a selectable folder?
If I like to see the pictures to the tags "Birthday", "Marina" and "fire" I'd only select the ~/pictures folder and if I want to hear/view the music/video to the tags "AC/DC" and "Hell" I'd select the folder ~/music or ~/video.
If I want to see the slideshow of the last vacation I'd select the real folder "~/pictures/own/20100815t0905 Scotland" via the file tools.

So I can have the best of two worlds, the tag-cloud and a folder hierarchy.
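A small Python sketch of this combined approach: a tag search that only looks below a chosen folder. The index contents and the `search` function are invented for illustration, and the paths are written relative to the home directory.

```python
from pathlib import PurePosixPath

# Hypothetical index: file path (relative to the home directory) -> tags.
index = {
    "pictures/own/party/img_001.jpg": {"Birthday", "Marina", "fire"},
    "pictures/own/party/img_002.jpg": {"Birthday", "Marina"},
    "video/clips/candles.ogv": {"Birthday", "fire"},
    "music/acdc/hells_bells.ogg": {"AC/DC", "Hell"},
}


def search(folder, tags, index):
    """Return files anywhere below `folder` that carry all of `tags`."""
    root = PurePosixPath(folder)
    wanted = set(tags)
    return sorted(
        path
        for path, have in index.items()
        if root in PurePosixPath(path).parents and wanted <= have
    )


print(search("pictures", ["Birthday", "Marina", "fire"], index))
print(search("music", ["AC/DC", "Hell"], index))
```

The folder narrows the candidate set first, so the same tag ("Birthday", "fire") can be reused in the pictures and video trees without the results mixing.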

yes...

Anonymous's picture

I like the approach of moving etc. of files while only moving a camouflage symlink and not the file itself. This idea of an XML-like file system is very much compatible with tagging.

It's a nice idea, but I feel

Anonymous's picture

It's a nice idea, but I feel it doesn't work for me. I've used tagging in web applications and image organizers and I don't like it. Whenever I had to search for something, I spent more time figuring out which tag I had used than I would have spent following a logical tree structure. I rarely have trouble finding files in directories. I for one wouldn't like a filesystem based on tags.

I've done that ...

Tim's picture

I've implemented such a tag-based filesystem in my thesis. It replaces directories with tags, thus there are no directories anymore. So, as an example, if you save a file in /music/dance/mp3/, you will also find it in /dance/mp3/music/, /mp3/dance/music/, /mp3/music/, /music/, .... and so on (btw. it also has a new kind of metadata system). It's very nice to use, but it's not ready for serious usage because it's way too young...

I think most people don't see the most important issue with tagging: the number of tags grows significantly over time. While new directories are hidden inside other directories, new tags are added to a large number of existing tags, so maybe an 'ls' will list you a few hundred tags someday.

Another major issue arises when files are ambiguous. For example, if one file called "file" is located in /a/b/c and another with the same name is located in /a/b/d/, listing the contents of /a/b/ will result in two files with the same name.
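Both effects described in this comment (any ordering of the tags reaches the same file, and listings can contain duplicate names) are easy to reproduce in a toy Python model. This is only an illustration with made-up helper names (`tag_set`, `listing`, `collisions`); it says nothing about how the thesis filesystem actually resolves the ambiguity.

```python
from collections import Counter


def tag_set(path):
    """Treat a path as an unordered set of tags."""
    return frozenset(part for part in path.split("/") if part)


# Every ordering of the same tags names the same location.
print(tag_set("/music/dance/mp3/") == tag_set("/mp3/dance/music/"))

# Hypothetical contents: (file name, tags it was saved under).
files = [
    ("file", tag_set("/a/b/c")),
    ("file", tag_set("/a/b/d")),
    ("notes", tag_set("/a/b")),
]


def listing(path, files):
    """Names visible at `path`: files whose tags include every tag in it."""
    wanted = tag_set(path)
    return sorted(name for name, tags in files if wanted <= tags)


def collisions(path, files):
    """Names that would appear more than once when listing `path`."""
    counts = Counter(listing(path, files))
    return sorted(name for name, count in counts.items() if count > 1)


print(listing("/a/b", files))
print(collisions("/a/b", files))
```

Listing /a/b shows "file" twice, while listing /a/b/c shows it once, so any real implementation has to decide how to present or rename the clashing entries.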

I solved these problems (as well as possible :)), but I think most people seem to forget about these issues when talking about a filesystem with tags.

If someone wants to know more or wants to test my filesystem, feel free to ask me here or via mail: lucidfs (at) timmjati.de

tags are already obsolete

clasqm's picture

Three words: Beagle. Spotlight. Google Desktop Search. OK, so that's five words, so sue me.

Why sit there laboriously adding tags to thousands of files? isn't that the kind of manual labour drudgery computers were supposed to liberate us from?

Rather let the computer read the actual content of the file and create a database of files and what they contain. This development is still in its infancy, but for text-based files, at least, it works: I can find any text file on my Mac in seconds. If your text-based file does not contain the data you need to find it, then you may need extra tutoring in prose composition :-)

Images are more of a challenge, but already there are applications that you can command to look for "the mostly red image that I created about two months ago": http://www.ironicsoftware.com/deep

I'm not aware of any effort to apply this to music, but in principle it should be possible to analyze an mp3 and then ask for "a piece in 3/4 time at a slow tempo" or whatever musicians need to search for.

Beagle does much the same for my PCLOS netbook. All of these work just fine with tags, but the tags are supplemental, not the main show.

Let the machine do the work!

symbolic links

John S. Holland's picture

When I need to categorize items in more than one way, I use symbolic links. That method enables me to have a correlation between my file structure and the physical arrangement of my hard disk. If you adopt tagging, I hope that you offer it as an option that I can reject.
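For readers who have not tried this, here is a minimal Python sketch of the symbolic link approach. The directory names and the `link_into_category` helper are hypothetical, and everything happens inside a temporary directory so nothing on the real disk is affected.

```python
import os
import tempfile


def link_into_category(real_path, category_dir):
    """Make the file at `real_path` also appear inside `category_dir`."""
    os.makedirs(category_dir, exist_ok=True)
    link_path = os.path.join(category_dir, os.path.basename(real_path))
    os.symlink(real_path, link_path)  # the link stores only the target path
    return link_path


def demo():
    with tempfile.TemporaryDirectory() as root:
        # The one real copy lives under mp3/dance.
        real = os.path.join(root, "mp3", "dance", "track.mp3")
        os.makedirs(os.path.dirname(real))
        with open(real, "w") as handle:
            handle.write("not really audio\n")

        # A second category gets a symbolic link back to the real file.
        link = link_into_category(real, os.path.join(root, "mp3", "independent"))
        return (
            os.path.islink(link),
            os.path.samefile(real, link),
            os.readlink(link) == real,
        )


print(demo())
```

The file keeps exactly one physical location, and each extra category costs only a tiny link. The trade-off is that the links break if the real file is later moved or renamed.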

Lobotomy Project

MadBob's picture

The last idea about "everything is a specialized filemanager" is one of the pillars of the Lobotomy Project concept ( http://lobotomy-project.org/wiki/Thoughts ).
That was just an intellectual exercise, I never produced some effective running code about that (apart some primitive proof of concept), but many of the components adhere to your proposal.

Tagging and those who can barely use a computer

Name already taken's picture

As I read this article I keep thinking of a certain person I know, who is not very computer savvy, that is constantly having problems finding saved documents.
The concept of a hierarchical file system is completely beyond their grasp. If everything could be poured into a single heap with tags that the computer would then use to pull out likely candidates for the item being looked for, this individual's time would be more efficiently used (as would mine, by not having to search for misfiled documents for this person).
This would also allow for more compact data storage in one of my own projects as I have a large group of pdf documents that need to be referenced by catalog number or any one of several other identifying characteristics.
I can see where a well developed tagging system would allow properly developed system software to begin to anticipate a user's next need based on the contextual relation of the previously searched items.
ED

Not a new idea, but a good one

Anonymous's picture

I remember a presentation from Microsoft about the future Windows Vista. They intended to implement a new file system, not hierarchical, based on a database. The whole disk would be a database for storing the data (no longer as files), and with some kind of tagging system. Later they abandoned this idea, I don't know why. The idea is good, but it certainly is not easy to implement, or it is not simple to be used by normal users, or not efficient. Maybe it's time to have another look at it.

directories as namespace

Anonymous's picture

Let's not forget that directories actually serve another important function of acting as a namespace for filenames. It allows us to have files with the same name but in different directories.
Someone mentioned using UUIDs for filenames to avoid name collisions but humans aren't good at identifying stuff using UUIDs. So perhaps we can use a tag (perhaps even a special tag) to identify files. However doing so doesn't really solve the name collision issue (remember that we WANT them to be able to collide as a feature) unless there are further rules and restrictions on such usage of the tags. We'll probably end up implementing a directory hierarchy using tags.

Why switch?

Anonymous's picture

Currently, with absolute paths, the representation of files and directories on the filesystem is simple and concise. If you write /home/user/Music/lost-in-space.ogg, you mean exactly that. There are no ambiguities. The vagueness of a tag based filesystem worries me, and I feel that if this were to be implemented, it should be at the filemanager level and be completely optional. Back in my M$ Windows days, I could write metadata for files with Explorer, but I never did because it was simply a waste of time as I never needed to search for files.

And for the record, I dump all of my music in ~/Music, but I can see why that might be an issue for people with larger music collections. If I want a song, I just search for it using my audio player.

This could be big!

Marcus Rhodes's picture

Imagine if Linux were to offer something Apple never even thought of, and M$ failed to deliver with WinFS, and even failed to execute elegantly with their briefcase, paperclip (wasn't that what it was called?), and 'Open an Office document' item on the Start menu.

And it needn't require a database. We could just add a layer to an existing file-system, and expand on the use of the kinds of tags music and video files already use. That way we could eliminate the need to think of a unique name, or combination of path and name, for a file. I mean, clearly, people have trouble with this anyway, like thinking of a subject for an e-mail, so let's eliminate it altogether. Offer a description field/tag instead. Most people could cope with that without straining their creativity muscle too much.

Locating an existing file could be reduced to establishing criteria via point-n-shoot instead of remembering filenames or navigating folders.

We could even expand on the 'Recent Documents' menu idea, adding 'Documents by you', Spreadsheets, Images, Videos, etc.

It already is

Adam Williamson's picture

"And it needn't require a database. We could just add a layer to an existing
file-system, and expand on the use of the kinds of tags music and video files
already use."

We already did. They're called xattrs. Look down a bit in the comment thread...

Not quite the same thing.

Marcus Rhodes's picture

xattrs are in the FS, not in the files themselves. Nor can they obviate filenames.

M$ was trying to adapt the Pick file-system to Windows as WinFS for this very reason, but they couldn't pull it off for undisclosed reasons. Were Linux to succeed where M$ failed, it could raise some eyebrows.

RE: Pick file system

GreyGeek's picture

The DOS based DBMS called "Advanced Revelation" used the Pick AMV system and I used it professionally for several years. It could easily combine other systems' "parent-child" table paradigms into a single table, where violations of the 3rd Normal Form were handled with multi-valued fields. Each table had a "dictionary" which stored the field names and symbolic definitions. Keeping the dictionary and its table synchronized, along with the multitude of indexing hashes needed to circumvent the speed issue, gave AREV problems when it was applied to larger or more complicated data sets. Even with the indexes, speed was always a problem.

AREV evolved into a GUI product called "Open Insight", but it was considerably less stable, especially in a Windows environment. I immediately abandoned it for better tools. Open Insight is still around but occupies a niche market space.

I can understand why Microsoft abandoned it. So did I. I can essentially do the same thing with PostgreSQL now and enjoy both speed and stability.

> xattrs are in the FS, not

Anonymous's picture

> xattrs are in the FS, not in the files themselves.
Right. Unix treats all files as "bags of bytes" and this concept is so central to it that I very much doubt you could use anything else but xattrs.
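To make the xattr discussion concrete, here is a hedged Python sketch of storing tags in an extended attribute. The attribute name `user.tags` and the comma-separated encoding are assumptions chosen for this example, not a standard; `os.setxattr` and `os.getxattr` are Linux-only, and the filesystem must support user extended attributes.

```python
import os

TAG_ATTR = "user.tags"  # hypothetical attribute name, not a standard


def encode_tags(tags):
    """Serialise tags as sorted, comma-separated UTF-8 (no commas in tags)."""
    return ",".join(sorted(set(tags))).encode("utf-8")


def decode_tags(raw):
    """Inverse of encode_tags; an empty value means no tags."""
    return {tag for tag in raw.decode("utf-8").split(",") if tag}


def write_tags(path, tags):
    """Store tags on the file; raises OSError if xattrs are unsupported."""
    os.setxattr(path, TAG_ATTR, encode_tags(tags))


def read_tags(path):
    """Return the tags stored on the file, or an empty set if there are none."""
    try:
        return decode_tags(os.getxattr(path, TAG_ATTR))
    except OSError:
        return set()  # attribute missing, or xattrs not supported here


print(encode_tags(["linux", "op ed", "linux"]))
print(decode_tags(b"linux,op ed"))
```

Because the attribute lives in the filesystem rather than in the file's bytes, the tags can be lost when the file is copied with a tool, or onto a filesystem, that does not preserve extended attributes, which is the objection raised earlier in this sub-thread.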

I can't really see doing away

seeker5528's picture

I can't really see doing away with directory structure.

Sure it's fine when all you need is a file browser, but when you want to actually manage files, copy, delete, backup, compare, need to find something when you are stuck at the command line, etc...

Instead of asking if we can do away with directory structure, we should be asking why programs that handle files of a particular type (image, video, etc...) don't provide an organize feature where you set up the hierarchy you want based on relevant tags you use, then choose the directory to use for them.

Similar to the way you tell Amarok the directories where you store your music ('/some/directory/cds_i_own/', '/some/directory/legal_downloads/', etc.), which then show up as collections; when you choose to import/organize a selection of files, you then choose the collection where you want to place it.

I use Amarok as an example because while most music management programs seem to have figured this hierarchy thing out, Amarok is the only one I can think of off the top of my head that lets you specify multiple directories and choose which to move the files to when you choose import/organize.

Somehow F-Spot and other image programs in that same category all seem to be screwed up in some way or another: screwing up the exif data without asking first, organization (or lack of), lack of an option to leave files in their original location, screwy importing, unable to view files without importing them first, etc...

As Zeitgeist, Nepomuk, and related technologies start to become more widely used maybe what to do about organization can be revisited, but sane organization needs to be there in the background even when tags, dates, file types, etc.. are used in apps and file browsers to sort/filter the display of files, if for no other reason so that when something happens you have a sane way to get to the stuff that's actually important to you instead of having a big F'ed up mess to wade through.

Later, Seeker

Actually, if you think that a

Anonymous's picture

Actually, if you think that a file belongs to multiple folders, you should make a hard link (ln without '-s'; the '-s' stands for symbolic, the opposite of hard). In fact, the original placement of a file is hard link number one. You may now call these links "tags", if you so choose.
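A link made with `ln` without `-s` is usually called a hard link, and its behaviour can be checked with a short Python sketch. The file names are invented and everything runs in a temporary directory.

```python
import os
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "dance.mp3")
        with open(first, "w") as handle:
            handle.write("not really audio\n")

        # Second name for the same data, like `ln dance.mp3 independent.mp3`.
        second = os.path.join(root, "independent.mp3")
        os.link(first, second)

        link_count = os.stat(first).st_nlink      # names pointing at the data
        same_file = os.path.samefile(first, second)

        # Removing one name leaves the data reachable through the other.
        os.remove(first)
        with open(second) as handle:
            survived = handle.read() == "not really audio\n"

        return link_count, same_file, survived


print(demo())
```

Unlike symbolic links, neither name is the "original". The limits are that hard links cannot cross filesystems and generally cannot be made to directories, so they cover only part of what tags are meant to do.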

Old news...

Callix's picture

Electronic Document Management (EDM) and/or Document Management Systems (DMS) have been around a long long time (see: http://en.wikipedia.org/wiki/Document_management_system).
A core feature of most of these systems is to abstract the underlying file system paradigm from users and allow for tagging (i.e. multiple logical locations for documents). What you describe is an integration of DMS into the desktop and can be handled in software. There is no need to change the underlying OS file system structure.

I completely agree with this,

Anonymous's picture

I completely agree with this, and I think this is the way things will develop; people must learn to use these tools. I used to have to go through directories to find PDF documents, of which I have thousands: e-books and reference papers. Now I use an e-book manager called Calibre, and I can tag to my heart's content and find things in a flash, and still get to keep my folders. There is no need to replace them; in fact, Calibre can automatically organise the underlying folder structure according to the database information. Some media players can do some of what Calibre does, and this is the case for most photo managers as well.

usefulness, examples, implementation

David Nessl's picture

The most widely used application today that showcases tagging is Gmail. Unfortunately, most Gmail users still mentally treat Gmail's labels as folders. Indeed, I still give most of my archived emails only a single tag/label, but when I do give a mail message multiple tags, it's quite useful later. Tagging, as a general facility, would be a great addition to a filesystem.

Although there are several background apps for Linux that implement tags in user-space, the problem is that these solutions are inherently brittle. Likewise, brute-force search has problems -- its indexes quickly get stale, and its periodic re-indexing doesn't scale for large amounts of data. (I've got a 1TB NAS at home, which isn't that unusual nowadays.) The true solution is to implement tags/labels as an extension to POSIX filesystems in the kernel. Someone has suggested using xattr, but that doesn't work on its own -- you need to be able to query the filesystem to find not only (a) what tags does this file have, but also (b) what files have this given tag (or tags). And trying to solve (b) using the `find` command isn't viable because it doesn't scale either.
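The two queries, (a) and (b), amount to keeping a forward map and an inverted index in step with each other. The Python sketch below shows the data structure only; the `TagIndex` class and its method names are hypothetical, it lives in memory, and it makes no claim about how a kernel-level implementation would look.

```python
from collections import defaultdict


class TagIndex:
    """Keeps file -> tags and tag -> files in sync on every update."""

    def __init__(self):
        self._tags_of = defaultdict(set)     # answers query (a)
        self._files_with = defaultdict(set)  # answers query (b)

    def tag(self, path, *tags):
        for name in tags:
            self._tags_of[path].add(name)
            self._files_with[name].add(path)

    def untag(self, path, name):
        self._tags_of[path].discard(name)
        self._files_with[name].discard(path)

    def tags_of(self, path):
        """(a) What tags does this file have?"""
        return set(self._tags_of.get(path, ()))

    def files_with(self, *tags):
        """(b) What files have all of these tags?"""
        if not tags:
            return set()
        candidates = [self._files_with.get(name, set()) for name in tags]
        return set.intersection(*candidates)


idx = TagIndex()
idx.tag("taxes/form.pdf", "2010", "taxes")
idx.tag("receipts/printer.pdf", "2010", "receipt")
print(sorted(idx.tags_of("taxes/form.pdf")))
print(sorted(idx.files_with("2010")))
print(sorted(idx.files_with("2010", "taxes")))
```

Query (b) becomes a set intersection instead of a scan over every file, which is the scaling property that a walk with `find` cannot provide. Keeping both maps updated on every change is the part that a user-space daemon struggles to guarantee.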

Gmail/GoogleDocs

jaqian's picture

Actually I think a better example would be GoogleDocs. Speaking for myself, I rarely tag an email; if I did, it would be with two tags at the most.

I use GoogleDocs as a backup of all my important documents (I use Picasa for the images, easier to resize). I find that I use multiple tags for documents there.

I agree with the author that tags are the way to go but I think the browser shouldn't do away with hierarchy but complement it. The biggest problem that I see though is that every filetype would have to implement IPTC or something similar as anything based on a database would be too fragile, you would have to backup both your files and the database.

IMO files should be OS (& database) independent. JPEGs are a good example of this. You can write all your tags to the file and have it read by any program that supports IPTC. So I can tag in Picasa and have it read by DigiKam, Lightroom, Flickr etc.

I would suggest that as software gets better at reading the contents of a file and at facial recognition, it could suggest tags based on content. The software could read a document and suggest JohnSmith and Invoice as tags. It could also suggest that you tag a photo BrianMurphy, EmmaCullen etc. based on previous tags and on recognising faces, e.g. facial tagging in Facebook.

Windows Vista/7 browser touches on this slightly... when you go from documents into the pictures you are automatically in a photo organiser.

Great article and even better idea.

folders vs. tagging -- a case for both

Saint DanBert's picture

I'm no expert in "Library Science" or "information science" or similar, but I feel certain that those folks have good things to say about both approaches to information storage and retrieval.

In an office (business) or similar setting, there is much to endorse a document --> folder --> drawer --> cabinet or similar paradigm. For example, my latest tax forms would likely be stored this way.

In contrast, a tag saying, "this year's (2010) taxes" would not only locate those tax forms, but would likely locate a huge number of receipts, transaction records, and other working documents that were used in preparation of those same tax forms ... as well as the forms themselves. When coupled with a "tag cloud" such an approach is powerful during analysis.

THEREFORE -- This reader (writer) believes that there is a place for both systems of information categorization and storage and retrieval.

Respectfully,
~~~ 0;-Dan

waiting for years

Anonymous's picture

I've been waiting for half my life for it to dawn on someone besides myself that this hierarchical folder structure is not a very good paradigm for data storage. Yet each new file system that comes out repeats the same tired old formula, as if no one has any real imagination anymore. Btrfs and ZFS are still based on this paradigm as far as I know. The same old tired and hackneyed paradigms persist on the Desktop too - all the desktops (KDE, Gnome, etc) are based on the same old basic WIMP paradigm of yore. When will someone do something genuinely innovative in IT?

It *was* done. Years ago.

fest3er8's picture

The concept of tagging files was thought of, and implemented years ago, in the late nineties, in a filesystem called BFS. The OS was called BeOS. It worked. Very well.

Alas, BeOS and Be, Inc., fell by the wayside. However, a group of enthusiasts has been resurrecting BeOS, implementing it as open source. It's called Haiku and can be found at http://www.haiku-os.org/. It's currently at Alpha-2 and is, by all reports, surprisingly useful and keeps getting better.

I used BeOS as my primary desktop system until it was just too far behind. Fortunately Debian had evolved enough to nearly replace it; I've been using Debian since for all my work. But as soon as Haiku is about ready for prime time, I will probably switch back.

Indexing/attributing/tagging were part of BFS from the beginning; part of the idea was to incorporate a database in the FS. I could create a search for certain parameters and save it. Every time I opened the search, it would have the up-to-date info in it. It still used the traditional hierarchical directory structure (THDS). THDS makes sense, performance-wise.
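The saved-search behaviour can be illustrated with a small sketch. This is not how BFS implements it; it is only a toy that stores the search criteria rather than the results and re-evaluates them on every open. The `SavedSearch` class and the dictionary-based `catalog` are assumptions made up for this example.

```python
class SavedSearch:
    """Hypothetical saved query: stores the criteria, not a snapshot of results."""

    def __init__(self, catalog, predicate):
        self._catalog = catalog      # shared reference to {name: attributes}
        self._predicate = predicate  # function: attributes -> bool

    def open(self):
        """Re-run the query against the catalog as it is right now."""
        return sorted(
            name
            for name, attrs in self._catalog.items()
            if self._predicate(attrs)
        )


# Example: a saved search for everything tagged with the genre "dance".
catalog = {
    "track1.mp3": {"genres": {"dance", "electronica"}},
    "track2.mp3": {"genres": {"folk"}},
}
dance = SavedSearch(catalog, lambda attrs: "dance" in attrs["genres"])
first = dance.open()

# A file added later shows up the next time the search is opened.
catalog["track3.mp3"] = {"genres": {"dance", "independent"}}
second = dance.open()
```

Re-scanning the whole catalog on each open is the naive approach; a filesystem with indexed attributes would consult the index instead.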

Imagine storing everything in a single directory (as the original MSDOS did). It would take a lot of disk I/O to get a file's info to open it and execute or edit it. The THDS is still a good way to store files. Finding files is a different matter. That's where indexing and tagging come into play. If you had an MP3 that fit five different genres, you would add all five tags. Search for a specific genre and all files fitting that genre would be found.

BFS and its indexing worked very well. It was one of the neater features of BeOS. They did it back in the nineties. And it's coming back to life as Haiku.

Folders vs Tags

amystko's picture

There is something in tagging that makes it interesting. I sometimes prefer tags to folders. If I have many items that fit into multiple folders, it is difficult to decide where to put them.
I can assign many tags to one item; for instance, the same file is 'private' and 'important' and 'project vacation'.
The key is to have a predefined set of tags. The tags might even form a hierarchy.

In my Thunderbird, I have totally abandoned the use of folders and use tags instead.
With a small add-on (tag-toolbar) and my predefined set of tags, it is easy and very quick.

trouble with tags

micah's picture

The trouble with tags is that you need to be quite careful to consistently label like files with like tags. This is a difficulty I encounter when blogging, using LibraryThing, or even tagging my mp3s. I don't really want this hassle on my Desktop, nor do I want to maintain some kind of authority file telling me what tags I can use, which is the only way of overcoming this hassle that I see.

A nice way that a lot of

Ewen's picture

A nice way that a lot of websites handle that is through a drop-down list of previously used tags, with a fallback to creating a new one. I think it works really well because you keep the consistency (you'd always select a tag from the list before you'd bother creating a new one) and also the flexibility (you can create a new tag if you need it).
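A rough sketch of that pattern, under some assumptions: the function names are invented for this example, and matching is done by ignoring case and extra whitespace, which is one possible rule among many. `suggest` plays the role of the drop-down list and `resolve` is the fallback that reuses an existing tag when one matches and only otherwise creates a new one.

```python
def normalise(tag):
    """Comparison key: case-insensitive, with runs of whitespace collapsed."""
    return " ".join(tag.casefold().split())


def suggest(prefix, known_tags):
    """Previously used tags that start with what the user has typed so far."""
    p = normalise(prefix)
    return sorted(t for t in known_tags if normalise(t).startswith(p))


def resolve(typed, known_tags):
    """Reuse an existing tag if one matches; otherwise register a new one."""
    wanted = normalise(typed)
    for t in known_tags:
        if normalise(t) == wanted:
            return t
    new_tag = " ".join(typed.split())  # keep the user's casing, tidy spacing
    known_tags.add(new_tag)
    return new_tag
```

Typing "LINUX" when "Linux" already exists therefore returns the existing tag instead of creating a near-duplicate, which addresses the consistency problem without a hand-maintained authority file.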

Why?

Adam Williamson's picture

"Why is it that web based technology such online bookmarking makes far greater use of tagging than the Linux desktop does?"

Because it's easier to write an article about what should happen than it is to write the code to make it happen.

(As previous commenters have pointed out, various groups have already been working on this for years, but it's not a simple problem to solve. The Semantic Desktop effort is probably the furthest along.)

Filenames are tags enough for

jvd's picture

Filenames are tags enough for me. Takes some discipline to find appropriate names, but that is the same discipline needed for making tags. Why exactly invent a new system where the solution demands just as much systematic attention to detail as the current system?

tagging for files -- very cool

James Richardson's picture

This is a very cool idea. I don't know how it would work in practice at a file system level. But with my limited experience with blogging, tag clouds make it easy for me to find what I am looking for. I have actually set up a WordPress blog for my internal network at the house. I store tidbits and hints I've downloaded from various places on the internet. I find it much easier searching through the tag cloud than having to search all over my hard drive with various find/grep incantations.

Obviously, I don't do my photos like this. I use Picasa for that. Picasa also lets me tag my photos so I can search on tags.

Tags integrated into the filesystem would be useful, at least for user files.

Business is the place with

wally's picture

Business is the place with the most need for tagging... in a directory system, a document out of the proper folder is lost. Therefore all documents need to have a unique identifier of some type, and that usually includes a project or theme name, a date, some description of the particular document, etc.
Therefore, most businesses have already tagged every document... but then they duplicate efforts and abandon the benefits of tagging by reverting to the directory hierarchy for storing the documents.
Worst of all is to then use a pre-defined set of subdirectories for all projects with overlapping or ambiguous names. The problem with that is that you cannot tell by looking at a folder whether it contains files or not and certainly cannot tell if it contains the files you want, so: click, click, click, click. What a time-waster!

The answer is BOTH!

G Johnson's picture

Keep the file system. It is efficient and clean for everything except finding files.

Add tags to find files, or even better, don't. Search.

The answer is to keep using the file system, as it has been refined over the last few decades. If you only care about tags, and do not like the naming structure, give the option to generate a file name (UUID) and drop it in the big pile of data directory under your home. Then use a full /home/user search, with matches on tags coming before matches on full text.

The best of both worlds, and without re-writing a single application or OS utility. Just improve the search and indexing within the file browser, and the job is done.
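As a minimal sketch of the ranking described above: it assumes an index already exists as a mapping from path to a (tags, text) pair, which is an invented structure for illustration, and it simply lists tag matches ahead of full-text matches.

```python
def search(term, documents):
    """Return matching paths, with tag matches before full-text matches.

    `documents` is a hypothetical index: {path: (set_of_tags, body_text)}.
    """
    needle = term.casefold()
    tag_hits, text_hits = [], []
    for path, (tags, text) in sorted(documents.items()):
        if any(needle == t.casefold() for t in tags):
            tag_hits.append(path)       # exact tag match ranks first
        elif needle in text.casefold():
            text_hits.append(path)      # plain substring match ranks second
    return tag_hits + text_hits
```

A real desktop search would need a proper full-text index rather than a substring scan; the sketch only shows the ordering rule.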

Erm, a lot of the comments

DavidR's picture

Erm, a lot of the comments seem to have missed the OP's point that the tags would only apply to visible files in the /home directory.

Also, the directory structure wouldn't be gotten rid of. Just enhanced.

So, I'd say it's a good idea. As long as it is implemented well, and not forced on us, then it could be extremely useful.

No, we didn't miss the TITLE

JamesL's picture

No, we didn't miss the TITLE that said "could Linux *ABANDON* directories in favor of tagging" (emphasis mine). He did not say augment, he did not say enhance, implement alongside, or any such statement. That title implies a complete replacement of directories with tags. What would work for both is to implement something like OS/2's "extended attributes"; probably the one thing left from the OS/2 Workplace Shell that would still be useful to open source.

too many trees

xbackslashx's picture

When I first ventured onto the world wide web, I did not really have a clue what to do with all the information that is just there. Being a self-educated individual, I got most of my information from the web, books, online fora and articles.

Wanting to keep track of the vast knowledge on the web, I started resorting to bookmarks, to the point that I now have several hundred of them in my Firefox profile. As with regular files, I decided to structure them in separate folders, and over time this became a huge forest where it was difficult to see the trees. Then I considered tagging my bookmarks, which made it a lot easier to find them again later, or even to be aware of them after a long time.

I definitely see a lot of merit in the tagging system for personal files, as it would make searching the forest a lot easier and less time consuming.

Delicious Bookmarks

jaqian's picture

I used to do the same, but now I store all my bookmarks in Delicious and use tagging there. I found it better, as I would otherwise have different bookmarks saved on my home and work computers; they are much easier to find this way.

Harken to 1932

Anonymous's picture

Funny thing is that the issues described by this article with current directory/file structures existed before computers. For example, where would you look for a memo concerning 1930s construction techniques for dam building? In a project paper-folder for the TVA, or perhaps in the "Internal Memoranda" folder filed by author? The difference then was that companies employed professionals to sort out that mess. Personally, I grew up with the Dewey Decimal system for libraries. It seems to be a better model overall. That system already has a "cross reference" component built in. Log onto your local library's online catalogue system. It is usually regional, and you can generally find your targeted media of interest in a few seconds or a couple of minutes. I, at the opposite end, can spend up to 40 minutes hunting and pecking on my PC for an Excel spreadsheet.
