How to Use the wc Command in Linux

fatmawati achmad zaenuri/Shutterstock.com

Counting the number of lines, words, and bytes in a file is useful, but the real flexibility of the Linux wc command comes from working with other commands. Let’s take a look.

What Is the wc Command?

The wc command is a small application. It’s one of the core Linux utilities, so there’s no need to install it. It’ll already be on your Linux computer.

You can describe what it does in very few words. It counts the lines, words, and bytes in a file or selection of files and prints the result in a terminal window. It can also take its input from the STDIN stream, meaning the text you want it to process can be piped into it. This is where wc really starts to add value.

It’s a great example of the Linux mantra of “do one thing and do it well.” Because it accepts piped input, it can be used in multi-command incantations. As we’ll see, this little standalone utility is actually a great team player.

One way I use wc is as a placeholder in a complicated command or alias I’m cooking up. If the finished command has the potential to be destructive and delete files, I often use wc as a stand-in for the real, dangerous command.

That way, while I’m developing the command, I get visual feedback that each file is being processed as I expected. There’s no chance of anything bad happening while I’m wrestling with the syntax.
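Here’s a minimal sketch of that approach. It assumes the eventual goal is to delete every file ending in .tmp under the current directory; the pattern and the rm step are only examples. While the find expression is being worked out, wc -l stands in for rm, so the worst that can happen is a list of line counts.

```shell
# Dry run: wc -l stands in for the destructive command.
# The '*.tmp' pattern is a hypothetical example; adjust it to suit.
find . -type f -name '*.tmp' -exec wc -l {} +

# Once the list of files looks right, swap wc -l for the real command:
# find . -type f -name '*.tmp' -exec rm {} +
```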

As simple as wc is, there are still a few small quirks that you need to know about.

Getting Started With wc

The simplest way to use wc is to pass the name of a text file on the command line.

wc lorem.txt

Using wc with a file with one long line of text

This causes wc to scan the file, count the lines, words, and bytes, and write them to the terminal window.

Words are considered to be anything bounded by whitespace. Whether they’re words from a real language or not is irrelevant. If a file contains nothing but “frd g lkj”, it still counts as three words.
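You can check this by piping text straight into wc and asking only for the word count with the -w option. Tabs and newlines count as whitespace too, so both of these commands report three words.

```shell
# Any run of non-whitespace characters is a word, real or not.
echo "frd g lkj" | wc -w

# Tabs and blank lines separate words just as spaces do.
printf 'frd\tg\n\nlkj\n' | wc -w
```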

Lines are sequences of characters terminated by a newline character. It doesn’t matter if the line wraps around in your editor or in the terminal window: until wc encounters a newline, it’s still the same line.
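Strictly speaking, wc -l counts the newline characters themselves, so a final line with no newline at its end isn’t counted. Using printf makes the newlines explicit:

```shell
printf 'one\ntwo\n' | wc -l   # 2: both lines end with a newline
printf 'one\ntwo'   | wc -l   # 1: the last line has no newline, so it isn't counted
```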

Our first example found one line in the entire file. Here’s the content of the “lorem.txt” file.

cat lorem.txt

The content of the file with one long line

All of that counts as a single line because there are no newlines in it. Compare this to another file, “lorem2.txt”, and how wc interprets it.

wc lorem2.txt
cat lorem2.txt

Using wc with a file with many lines

This time, wc counts 15 lines because newlines have been inserted into the text to start a new line at specific points. However, if you count the lines with text in them, you’ll see there are only 12.

The other three lines are blank lines at the end of the file. These contain only newlines. Even though there is no text on these lines, a new line has been started, and so wc counts them.
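Here’s the same effect with piped input: three lines of text followed by three empty lines. If you’d rather leave empty lines out of the total, one option is to filter them out with grep first. Note that grep -v '^$' only drops lines that are completely empty, not lines that contain spaces or tabs.

```shell
# Three lines of text, then three lines that are just a newline.
printf 'a\nb\nc\n\n\n\n' | wc -l                  # 6
printf 'a\nb\nc\n\n\n\n' | grep -v '^$' | wc -l   # 3: empty lines removed first
```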

We can pass as many files to wc as we like.

wc lorem.txt lorem2.txt

Using wc with two files

We get the statistics for each individual file and a total for all of the files.

We can also use wildcards to select matching files instead of naming them explicitly. The *.? pattern matches any file whose extension is a single character, such as .c or .h files.

wc *.txt *.?

Using wc with wildcards

The Command Line Options

By default, wc displays the lines, words, and bytes in each file. It’s the same as using the -l (lines), -w (words), and -c (bytes) options.

wc lorem.txt
wc -l -w -c lorem.txt

Using wc with the lines, words, and bytes options

We can specify which combination of figures we’d like to see.

wc -l lorem.txt

wc -w lorem.txt

wc -c lorem.txt

wc -l -c lorem.txt

Using wc with combinations of options
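The order you give the options in makes no difference to the output. wc always prints the counts in the same order (lines, then words, then bytes), so both of these commands print the line count followed by the byte count. Short options can also be combined, as in wc -lc.

```shell
# Both print 1 line and 8 bytes, in that order.
printf 'one two\n' | wc -c -l
printf 'one two\n' | wc -l -c
```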

Special attention should be paid to the last figure, generated by the -c (bytes) option. Many people mistake this for a count of characters. It actually counts bytes. The number of characters and the number of bytes might well be the same, but not always.

Let’s look at the contents of a file called “unicode.txt.”

cat unicode.txt

The content of a file containing a non-Latin character

It has three words and a non-Latin alphabet character. We’ll let wc process the file with its default settings, which report bytes, and then we’ll do it again but request characters with the -m (characters) option.

wc unicode.txt
wc -l -w -m unicode.txt

Counting the bytes in a file and then counting the characters in the same file

There are extra bytes than there are characters.

Let’s take a look at the hex dump of the file and see what’s going on. The hexdump command’s -C (canonical) option displays the bytes in the file in lines of 16, with their plain ASCII equivalent (if there is one) shown at the end of the line. If there is no corresponding ASCII character, a period “.” is shown instead.

hexdump -C unicode.txt

A hexdump of a short file with a non-Latin character

In ASCII, a hexadecimal value of 0x20 represents a space character. If we count three values in from the left, we see the next value is a space character. So these first three values, 0x62, 0x6f, and 0x79, represent the letters in “boy.”

Hopping over the 0x20, we see another set of three hexadecimal values: 0x63, 0x61, and 0x74. These spell out “cat.” Hopping over the next space character, we see three more values for the letters in “dog.” These are 0x64, 0x6f, and 0x67.

Right behind the word “dog” we can see a space character, 0x20, and five more hexadecimal values. The last two are newline characters, 0x0a.

The other three bytes represent the non-Latin character, which we’ve ringed in green. It’s a Unicode character, and it takes three bytes to encode it in UTF-8. These are 0xe1, 0xaf, and 0x8a.
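If you want to check a byte sequence like this yourself and hexdump isn’t installed, the POSIX od command can print the same hexadecimal values. Here it shows the bytes for the three ASCII words from the example. The -An option suppresses the offset column, and -tx1 prints one byte at a time.

```shell
# Prints the bytes 62 6f 79 20 63 61 74 20 64 6f 67 0a
printf 'boy cat dog\n' | od -An -tx1
```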

So make sure you know what you’re counting, and remember that bytes and characters needn’t be the same. Usually, counting bytes is more useful because it tells you what is actually inside the file. Counting characters gives you the number of things represented by the contents of the file.
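You can see the difference without a special file. The word “café” is four characters, but in UTF-8 the “é” is stored as two bytes, written here as the octal escapes \303\251 so that any shell’s printf will produce them. One caveat: -m only counts multibyte characters correctly when your locale uses UTF-8; in the plain C locale, GNU wc treats each byte as a character. The locale name C.UTF-8 is an assumption, so substitute whichever UTF-8 locale your system provides.

```shell
# "café" plus a newline: 6 bytes.
printf 'caf\303\251\n' | wc -c

# Characters: 5 when run in a UTF-8 locale.
printf 'caf\303\251\n' | LC_ALL=C.UTF-8 wc -m
```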

RELATED: What Are Character Encodings Like ANSI and Unicode, and How Do They Differ?

Taking Filenames From a File

There’s another way to provide filenames to wc. You can put the filenames in a file and pass the name of that file to wc. It opens the file, extracts the filenames, and processes them as if they had been passed on the command line. This lets you store an arbitrary collection of filenames for reuse.

But there’s a gotcha, and it’s a big one. The filenames must be null terminated, not newline terminated. That is, after each filename there must be a null byte, 0x00, instead of the usual newline byte, 0x0a.

You can’t simply open an editor and create a file in this format. Usually, files like this are generated by other programs. But if you do have such a file, this is how you’d use it.

Here’s our file containing the filenames. Opening it in less shows the strange “^@” characters that less uses to indicate null bytes.

less source-files-list.txt

A file in less that contains null bytes

To use the file with wc, we need to use the --files0-from (read input from) option and pass in the name of the file containing the filenames.

wc --files0-from=source-files-list.txt

wc processing the file of null terminated filenames

The files are processed exactly as if they had been provided on the command line.
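One common way to produce a null-terminated list is find’s -print0 action. The sketch below assumes you want every C source file under the current directory; the *.c pattern is just an example. With GNU wc, giving - as the file name makes --files0-from read the list from standard input, so the intermediate file becomes optional.

```shell
# Write a null-terminated list of C source files (the pattern is an example).
find . -type f -name '*.c' -print0 > source-files-list.txt

# Hand the list to wc.
wc --files0-from=source-files-list.txt

# Or skip the list file and pipe the names straight in.
find . -type f -name '*.c' -print0 | wc --files0-from=-
```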

Piping Input to wc

A much more common, flexible, and productive way to send input to wc is to pipe the output of other commands into it. We can demonstrate this with the echo command.

echo "Count this for me" | wc
echo -e "Count this\nfor me" | wc

Using echo to send input to wc

The second echo command uses the -e (escaped characters) option to allow escape sequences like the “\n” newline formatting code. This injects a new line, causing wc to see the input as two lines.
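Bash’s built-in echo understands -e, but not every shell’s echo treats -e and escape sequences the same way. printf interprets “\n” itself, so it’s a more portable way to build the same two-line input. This reports 2 lines, 4 words, and 18 bytes, the same counts as the second echo command.

```shell
# Two lines: "Count this" and "for me", each ending in a newline.
printf 'Count this\nfor me\n' | wc
```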

Here’s a cascade of commands, each feeding its output into the next.

find ./* -type f | rev | cut -d'.' -f1 | rev | sort | uniq
  • find looks for files (-type f) recursively, starting in the current directory.
  • rev reverses the filenames.
  • cut extracts the first field (-f1) by defining the field delimiter to be a period “.” and reading from the “front” of the reversed filename up to the first period it finds. We’ve now extracted the file extension, still reversed.
  • rev reverses the extracted first field.
  • sort sorts them in ascending alphabetical order.
  • uniq lists the unique entries in the terminal window.

The list of unique extensions in the current directory tree

This command lists all of the unique file extensions in the current directory and any subdirectories.

If we added the -c (count) option to the uniq command, it would count the occurrences of each extension type. But if we want to know how many different, unique file extensions there are, we can add wc as the last command on the line and use the -l (lines) option.

find ./* -type f | rev | cut -d'.' -f1 | rev | sort | uniq | wc -l

Adding wc to count the unique extensions
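The uniq -c variant mentioned above is a one-word change. The final sort -rn is an optional extra that ranks the extensions from most to least common.

```shell
# Prefix each extension with the number of files that have it, most common first.
find ./* -type f | rev | cut -d'.' -f1 | rev | sort | uniq -c | sort -rn
```

One quirk worth knowing: a file with no period in its name, such as a Makefile in a subdirectory, has no extension to extract, so the pipeline reports part of its path (for example, /sub/Makefile) instead.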

RELATED: How to Use the Linux cut Command

And Finally

Here’s one last trick wc can do for you. It’ll tell you the length of the longest line in a file. Unfortunately, it doesn’t tell you which line it is. It just gives you the length.

wc -L taf.c

Getting the length of the longest line in a file with wc

Beware, though, that wc expands each tab to the next multiple of eight columns, so a tab at the start of a line counts as eight spaces. Viewed in my editor, there are three two-space tabs at the start of that line. Its real length is 124 characters. So the figure reported is artificially inflated.
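You can see the tab handling with a one-line input. In GNU wc (-L is a GNU extension), a tab followed by three characters is reported as 11 columns wide, because the tab jumps to column 8. If your editor uses narrower tab stops, expanding the tabs first with expand -t gives a figure closer to what you see on screen. The 2-column setting below is an assumption chosen to match the two-space tabs described above.

```shell
printf 'abc\n'   | wc -L                 # 3
printf '\tabc\n' | wc -L                 # 11: the tab advances to column 8
printf '\tabc\n' | expand -t 2 | wc -L   # 5: the tab becomes two spaces
```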

I’d treat this option with a big pinch of salt. And by that I mean don’t use it. Its output is misleading.

Despite its quirks, wc is a great tool to drop into piped commands when you need to count all sorts of values, not just the words in a file.

RELATED: 37 Important Linux Commands You Should Know
