Select an arbitrary column of text in UNIX

Contributed by qmchenry
Tagged: UNIX

UNIX has wonderfully powerful text processing capabilities, and there are often several ways to solve the same problem. A common task, for example, is extracting a single column of data from a text file or output stream. This tech-recipe presents several solutions to that problem.


Many data files have data fields delimited by a single character like a tab or colon. To extract the full name field out of /etc/passwd, the fifth colon-delimited field, use:

cut -d : -f 5 /etc/passwd

The cut command offers a great deal of flexibility in cutting data. In this case, -d : directs cut to use a colon as the delimiter (without -d, cut splits on the tab character), and -f 5 directs cut to extract only the fifth field. The field list is what makes cut so flexible: -f 2-5 extracts fields 2 through 5, and -f 1,3,7 extracts the first, third, and seventh fields.
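The field options can be tried without touching the real /etc/passwd. This is a minimal sketch, assuming a small hypothetical passwd-style sample held in a shell variable (the user names and values are made up for illustration). The last command shows the tab default that applies when -d is omitted.

```shell
# Hypothetical passwd-style sample data (not real accounts).
sample='alice:x:1001:1001:Alice Example:/home/alice:/bin/sh
bob:x:1002:1002:Bob Example:/home/bob:/bin/bash'

# Fifth colon-delimited field: the full name.
printf '%s\n' "$sample" | cut -d : -f 5

# Fields 1 and 3 (user name and UID); cut rejoins them with the same colon.
printf '%s\n' "$sample" | cut -d : -f 1,3

# Fields 2 through 5.
printf '%s\n' "$sample" | cut -d : -f 2-5

# With no -d option, cut splits on tabs: this prints the 2nd and 4th fields.
printf 'a\tb\tc\td\n' | cut -f 2,4
```

Note that cut writes the selected fields joined by the input delimiter, so -f 1,3 prints alice:1001 rather than two separate columns.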

To extract a fixed range of character positions, for example characters 44 through 49 of each line of a long directory listing (ls -l), use the following command:

ls -l | cut -c 44-49

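On many UNIX systems, these positions hold the modification date, but the exact offsets depend on the widths of the owner, group, and size fields, so check them against your own ls -l output first. Like -f, the -c parameter accepts lists and ranges such as -c 1-10 or -c 5,7. Note that cut always writes the selected characters in their original input order: -c 5,7,6,8 prints characters 5 through 8 in sequence, exactly like -c 5-8, not in the order listed.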
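The ordering rule is easy to check on a fixed string. The sketch below uses an arbitrary sample string; the awk substr() approach is just one way to emit characters in a custom order, which cut itself cannot do.

```shell
# Arbitrary sample input: positions 1-10 hold the letters a-j.
line='abcdefghij'

# cut selects characters 5, 6, 7 and 8 but prints them in input order: efgh
printf '%s\n' "$line" | cut -c 5,7,6,8

# A range selects the same characters, so the output is identical: efgh
printf '%s\n' "$line" | cut -c 5-8

# To get a different order, build the string in awk instead.
# substr(s, start, len) returns len characters beginning at 1-based start.
printf '%s\n' "$line" | awk '{ print substr($0,5,1) substr($0,7,1) substr($0,6,1) substr($0,8,1) }'
```

The awk line prints egfh, the characters in the order 5, 7, 6, 8 that the list appears to request.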

One of the trickier column extractions involves a variable amount of whitespace between fields. cut will not work for extracting the process ID (the second field) from a process listing (ps -ef), because it treats every single space as a separate delimiter. Another powerful UNIX text manipulator, awk, by default treats any run of spaces and tabs as a single field separator. To extract the PID from ps -ef, use:

ps -ef | head | awk '{print $2}'

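Awk is an incredibly powerful tool, and this is a trivial but useful application of it. Here head merely limits the output to the first ten lines; the first line printed is PID, the header of the second column.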
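To see why cut breaks on whitespace-aligned output, compare it with awk on a small fixed sample laid out like ps -ef. The rows below are hypothetical, so the result does not depend on which processes happen to be running. The tr -s step is a different technique from the one in this recipe: it squeezes runs of spaces into one so that cut can be used after all, which only works when lines do not begin with spaces.

```shell
# Hypothetical ps -ef style rows: fields padded with a variable number of spaces.
rows='root         1     0  0 10:00 ?        00:00:01 /sbin/init
daemon     123     1  0 10:00 ?        00:00:00 /usr/sbin/atd'

# cut treats every single space as a delimiter, so field 2 is the empty
# string between the first and second spaces on both rows.
printf '%s\n' "$rows" | cut -d ' ' -f 2

# awk splits on runs of blanks by default, so $2 is the PID: 1 and 123.
printf '%s\n' "$rows" | awk '{print $2}'

# Alternative: squeeze repeated spaces with tr, after which cut works.
printf '%s\n' "$rows" | tr -s ' ' | cut -d ' ' -f 2

# Unlike cut, awk prints fields in whatever order you list them.
printf '%s\n' "$rows" | awk '{print $2, $1}'

# awk -F sets a single-character separator, like cut -d : -f 5.
printf '%s\n' 'alice:x:1001:1001:Alice Example:/home/alice:/bin/sh' | awk -F : '{print $5}'
```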

5 Comments


  1. Khushbooshah said on August 7, 2010

    I have 5 columns in a file. I want to display only columns 2 and 4 in one single command. How do I do that?

  2. Itismike said on August 25, 2010

    cat infile | cut -f 2,4 > outfile

  3. Anonymous said on March 11, 2011

    Did you know that if you press Alt while selecting with the mouse you can also select a block of text?

  4. Jerobadi said on April 15, 2011

    Apr 14 12:15 110414060000.ca_P_STARCOB_CA_837I.nr
    Apr 14 10:56 ACH_WCF_CTRL.tbl
    Apr 14 10:37 110413220000.ca_P_PR_HC_837P.nr
    Apr 14 08:14 dont_delete
    Apr 12 11:48 trigger
    Apr 11 07:40 Mail
    Apr 01 11:36 directory_names.txt
    Apr 01 11:34 presence_files.sh
    Feb 28 06:08 monitor_va088.pl
    Feb 28 06:07 monitor_l1l2.pl

    I wanted to list the files for the dates that range between Apr 01 and Apr 12. Could someone help?

  5. Anu said on November 9, 2011

    ps -ef | head | awk '{print $2}'
    This combination was very useful. I was struggling with grep to do this since I had not used awk.
