wc—Word Count
The wc (word count) command is a very simple utility found in all Unix variants. Its purpose is counting the number of lines, words and characters of text files. If multiple files are specified, wc produces a count for each file, plus totals for all files.
When used without options, wc prints the number of lines, words and characters, in that order. A word is a sequence of one or more characters delimited by whitespace. If we want fewer than the three counts, we use options to select what is to be printed: -l to print lines, -w to print words and -c to print characters (strictly speaking, bytes). The GNU version of wc found in Linux systems also supports the long options format: --lines, --words and --bytes.
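As a minimal sketch of the three options, the following builds a small throwaway file and asks for each count separately. The variable name sample and the use of mktemp are assumptions made for illustration; reading from standard input keeps wc from printing a file name.

```shell
# Create a sample file with 3 lines, 6 words and 28 bytes.
sample=$(mktemp)
printf 'one two\nthree four five\nsix\n' > "$sample"

# $(( ... )) strips the leading blanks some wc versions print.
lines=$(( $(wc -l < "$sample") ))
words=$(( $(wc -w < "$sample") ))
bytes=$(( $(wc -c < "$sample") ))

echo "lines=$lines words=$words bytes=$bytes"   # lines=3 words=6 bytes=28
rm -f "$sample"
```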
When I applied wc to an earlier version of the LaTeX source file with this text, I received the following information from wc:
wc wc.tex
98 760 4269 wc.tex
This line means that the file had 98 lines, 760 words and 4269 characters (bytes). Actually, I seldom use wc alone. Due to its simplicity wc is mostly useful when used in combination with other Linux commands.
If a file comes from a system other than Linux (or Unix), such as DOS, there is an ambiguity, because a DOS line break is a combination of a carriage return and a line feed. Should -c count a line break as two characters or only one? The POSIX.2 standard dictates that -c actually counts bytes, not characters, and it provides the -m option to count characters. POSIX does not allow -m to be combined with -c. GNU wc did not support -m when this was written, although current releases do. In any case, we can always subtract the line count from the byte count to obtain the character count of a DOS file with each line break counted once. Here are two different ways to achieve this:
wc /dosc/autoexec.bat | awk '{print $3-$1}'
tr -d '\015' < /dosc/autoexec.bat | wc -c
The first solution uses awk to subtract the first field (the line count) from the third field (the byte count). The second solution uses tr to delete the carriage returns (char 15 in octal) from the input before feeding it to wc.
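To check that the two approaches agree, here is a small sketch run against a throwaway file with DOS line endings. The file contents and the names dosfile, by_awk and by_tr are made up for the example. Note that the awk version assumes every line ends in exactly one carriage return plus line feed.

```shell
# Two lines with CR+LF endings: 25 bytes in total, 2 of them carriage returns.
dosfile=$(mktemp)
printf 'first line\r\nsecond line\r\n' > "$dosfile"

# Method 1: byte count minus line count.
by_awk=$(wc "$dosfile" | awk '{print $3-$1}')

# Method 2: delete the carriage returns, then count bytes.
by_tr=$(( $(tr -d '\015' < "$dosfile" | wc -c) ))

echo "awk: $by_awk  tr: $by_tr"   # awk: 23  tr: 23
rm -f "$dosfile"
```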
Recently I used a CD-ROM writer that was connected to a machine that was slightly sick. Now and then a block of 32 consecutive bytes got corrupted while copying between different hard disk partitions. This caused quite a few CD-ROM backups to be damaged. Sometimes the damage affected a large file, and in this case, it was cheaper to keep the bad file and add a small patch file to the next backup. To decide whether we should make a new full backup of a corrupted file or just make a differential patch, we used the cmp command to detect the differences, followed by wc to count them:
cmp -l /original/foo /cdrom/foo | wc -l
The -l option to cmp provides a full listing of the differences, one per line, instead of stopping on the first difference. Thus, the above command outputs the number of bytes that are wrong.
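A quick way to convince yourself of this is to compare two small files that differ in a known number of bytes. The sketch below uses two temporary files (orig and copy are hypothetical names) of equal length that differ in three positions.

```shell
orig=$(mktemp)
copy=$(mktemp)
printf 'abcdefghij\n' > "$orig"
printf 'abXdefgYZj\n' > "$copy"   # bytes 3, 8 and 9 differ

# cmp -l prints one line per differing byte; wc -l counts those lines.
bad=$(( $(cmp -l "$orig" "$copy" | wc -l) ))

echo "differing bytes: $bad"   # differing bytes: 3
rm -f "$orig" "$copy"
```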
If we want to count how many words are in line 70 of file foo.txt then we use:
head -70 foo.txt | tail -1 | wc -w
Here, the command head -70 outputs the first 70 lines of the file, the command tail -1 (i.e., the number 1) outputs the last line of its input, which happens to be line 70 of foo.txt, and wc counts how many words are in that line.
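The same idea works for any line number. As a sketch, the following generates a 100-line test file in which line i holds i words, then extracts line 70 in two ways: with head and tail as above (written with the -n form of the options), and with sed -n, which is a different technique from the one in the text but gives the same result. The names f, n, with_head and with_sed are assumptions for the example.

```shell
# Test file: line i contains i copies of the word "w".
f=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 100; i++) { s = ""; for (j = 1; j <= i; j++) s = s "w "; print s } }' > "$f"

n=70
# First n lines, then the last of those.
with_head=$(( $(head -n "$n" "$f" | tail -n 1 | wc -w) ))
# Alternative: let sed print only line n.
with_sed=$(( $(sed -n "${n}p" "$f" | wc -w) ))

echo "head/tail: $with_head  sed: $with_sed"   # head/tail: 70  sed: 70
rm -f "$f"
```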
If our boss presses us to include in our monthly project report a count of the number of lines of code produced, then we can do it like this:
wc -l */*.[ch] | tail -1 | awk '{print $1}'
This assumes that all our code is in files with extension .h or .c, and that these files live in subdirectories one level deep from our current directory. If file depth is arbitrary, we use the following:
wc -l `find . -name "*.[ch]" -print` | \
tail -1 | awk '{print $1}'
Notice the use of back quotes around the find command, and single (normal) quotes in the awk command. The command find . -name "*.[ch]" -print outputs the *.c and *.h files located below the current directory, one per line. The back quotes cause that command to be executed; the shell then replaces each newline in its output with a blank and passes the result as arguments on the wc command line.
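Because the back-quoted output is split on blanks, file names containing spaces would confuse wc. One possible variant, sketched here on a throwaway directory tree, lets find hand the files to cat itself and counts the lines of the combined output. This is a different technique from the one above, not a replacement the author proposed, and the directory layout and the names top and total are invented for the example.

```shell
# Small test tree, including a directory name with a space in it.
top=$(mktemp -d)
mkdir -p "$top/src/deep dir"
printf 'int x;\nint y;\n' > "$top/src/a.c"
printf '#define Z 1\n'    > "$top/src/deep dir/b.h"
printf 'not code\n'       > "$top/notes.txt"

# find passes the matching files straight to cat; wc sees one stream.
total=$(cd "$top" && find . -name '*.[ch]' -type f -exec cat {} + | wc -l)
total=$(( total ))

echo "lines of code: $total"   # lines of code: 3
rm -rf "$top"
```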
If in good GNU style you mark all current bugs and dirty hacks in your source code with the word FIXME, then you can see how much urgent work is pending by typing:
grep FIXME *.c | wc -l
The grep outputs all lines that have a FIXME, and then we just have to count them.
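Keep in mind that this counts lines, not occurrences: a line with two FIXMEs is counted once. The sketch below shows the difference on two made-up source files. It relies on grep -o, which prints each match on its own line; that option is available in GNU and BSD grep but should be treated as an assumption on other systems.

```shell
dir=$(mktemp -d)
printf '/* FIXME: leak */\nint a; /* FIXME FIXME */\nint b;\n' > "$dir/a.c"
printf 'int c; /* FIXME */\n' > "$dir/b.c"

# Lines containing at least one FIXME.
fixme_lines=$(( $(grep FIXME "$dir"/*.c | wc -l) ))
# Individual occurrences of FIXME.
fixme_hits=$(( $(grep -o FIXME "$dir"/*.c | wc -l) ))

echo "lines: $fixme_lines  occurrences: $fixme_hits"   # lines: 3  occurrences: 4
rm -rf "$dir"
```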
As you can see, there is nothing special about the wc command; however, half of my shell scripts would stop working if that command were not available.

Comments
Word Count using Unix command
Hi all,
I tried to find the word count using below command.
$ cat yadav |grep -w 'IOBAS'
3
It showed 3, but the file yadav contains the text IOBAS 7 times. Please guide me on how to find the word count.
Thanks,
Shiv Kumar
line count and compare
check out the free line count and compare app bcscr at http://bcscr.sourceforge.net/
When you don't have Linux
I use a website for word and character count:
http://www.caseconvert.com/
Very easy :-)
..then you need to get it
;-)
Omarh, hope you don't mind. Thanks for the link.
But the summary on wc is great. Thank you Alexandre!
(PS: Your daughter must be 22 now. My goodness, how time flies!)