Select an arbitrary column of text in UNIX

Contributed by qmchenry, October 6, 2003  
Tagged: UNIX

UNIX has wonderfully powerful text processing capabilities. There are numerous ways to solve the same problem. Frequently, for example, it is necessary to extract a single column of data from a text file or output stream. This recipe will present several solutions to this problem.


Many data files have data fields delimited by a single character like a tab or colon. To extract the full name field out of /etc/passwd, the fifth colon-delimited field, use:

cut -d : -f 5 /etc/passwd

The cut command allows a great deal of flexibility in cutting data. In this case, -d : directs cut to use a colon character as the delimiter, and -f 5 directs cut to extract only the fifth field. The field parameter makes cut extremely flexible: -f 2-5 extracts fields 2 through 5, and -f 1,3,7 extracts the first, third, and seventh fields.
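The field selections described above can be tried without touching a real password file. This is a minimal sketch: the record in the sample variable is a made-up entry in /etc/passwd format, and the variable names are only for illustration.

```shell
# Hypothetical record in /etc/passwd format (not taken from a real system)
sample='jdoe:x:1001:1001:Jane Doe:/home/jdoe:/bin/sh'

# Fifth colon-delimited field: the full name
full_name=$(printf '%s\n' "$sample" | cut -d : -f 5)

# A range of fields; cut keeps the delimiter between the selected fields
range=$(printf '%s\n' "$sample" | cut -d : -f 2-5)

# A list of individual fields
picked=$(printf '%s\n' "$sample" | cut -d : -f 1,3,7)

printf '%s\n' "$full_name" "$range" "$picked"
```

With this sample, the three lines printed should be the full name, fields 2 through 5 joined by colons, and the login name, user ID, and shell joined by colons.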

To extract a fixed set of columns, for example the column numbers 44 through 49 from a long directory listing (ls -l), use the following command:

ls -l | cut -c 44-49

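On many UNIX systems, these columns hold part of the modification date, although the exact positions depend on the ls implementation and on the width of the owner, group, and size columns. Like the -f parameter, the -c parameter accepts ranges and lists, such as -c 5,7,6,8. Note that cut always emits the selected characters in the order they appear in the input, so -c 5,7,6,8 produces the same output as -c 5-8.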
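A quick check on a fixed string shows how cut -c treats its list. The string and variable names below are made up for the example. The awk line is one possible way to get characters in a custom order, offered as an alternative rather than as part of the original recipe.

```shell
# Hypothetical ten-character line, so positions are easy to count
line='abcdefghij'

# Characters 5 through 8
in_order=$(printf '%s\n' "$line" | cut -c 5-8)

# The same positions listed out of order: cut still emits them in input order
listed=$(printf '%s\n' "$line" | cut -c 5,7,6,8)

# To really reorder characters, build the string explicitly, e.g. with awk substr
reordered=$(printf '%s\n' "$line" |
    awk '{ print substr($0,5,1) substr($0,7,1) substr($0,6,1) substr($0,8,1) }')

printf '%s\n' "$in_order" "$listed" "$reordered"
```

The first two results are identical because cut ignores the order of the list; only the awk version rearranges the characters.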

One of the trickier column extractions involves a variable amount of whitespace between fields. To extract the process ID (second) field from a process listing (ps -ef), cut will not work, because it treats every delimiter character as a field boundary, so a run of spaces yields a series of empty fields. Another powerful text manipulator in UNIX is awk, which by default treats any run of spaces and tabs as a single field separator. To extract the PID from ps -ef output, use:

ps -ef | head | awk '{print $2}'

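Awk is an incredibly powerful tool, and this is a trivial but useful application of it. The head command in the pipeline only limits the listing to its first ten lines; omit it to process every line.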
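The difference between awk and cut on space-padded columns can be seen on canned input, which keeps the result independent of whatever processes happen to be running. The two-line listing below is a made-up excerpt shaped like ps -ef output. The tr -s variant is an additional workaround shown for comparison and is not part of the original recipe.

```shell
# Hypothetical excerpt in the shape of ps -ef output (columns padded with spaces)
listing='root         1     0  0 09:00 ?        00:00:01 /sbin/init
jdoe      4321  4300  0 09:05 pts/0    00:00:00 -sh'

# awk splits on runs of whitespace, so $2 is the PID column
pids=$(printf '%s\n' "$listing" | awk '{print $2}')

# cut counts every single space as a delimiter, so field 2 is empty here
cut_try=$(printf '%s\n' "$listing" | cut -d ' ' -f 2)

# Squeezing repeated spaces with tr -s first makes cut usable again
squeezed=$(printf '%s\n' "$listing" | tr -s ' ' | cut -d ' ' -f 2)

printf 'awk:\n%s\ncut after tr -s:\n%s\n' "$pids" "$squeezed"
```

The tr -s approach assumes lines do not begin with spaces; with leading padding, the first field would be empty and every field number would shift by one, which awk handles without adjustment.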
