Select an arbitrary column of text in UNIX

UNIX has wonderfully powerful text-processing tools, and there are often several ways to solve the same problem. A frequent example is extracting a single column of data from a text file or output stream. This tech-recipe presents several solutions to that problem.


Many data files separate their fields with a single character such as a tab or a colon. To extract the full-name field (the fifth colon-delimited field) from /etc/passwd, use:

cut -d : -f 5 /etc/passwd

The cut command is very flexible. Here, -d : tells cut to use the colon as the delimiter (without -d, cut assumes fields are separated by tabs), and -f 5 tells it to extract only the fifth field. The field list is what makes cut so flexible: -f 2-5 extracts fields 2 through 5, and -f 1,3,7 extracts the first, third, and seventh fields.
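To see these field lists side by side, here is a small sketch run against a made-up passwd-style record (the jdoe account is hypothetical, not taken from a real system). Note that cut keeps the original delimiter between the fields it selects.

```shell
# Hypothetical /etc/passwd-style record: name:password:uid:gid:gecos:home:shell
line='jdoe:x:1001:1001:Jane Doe:/home/jdoe:/bin/sh'

printf '%s\n' "$line" | cut -d : -f 5       # prints: Jane Doe
printf '%s\n' "$line" | cut -d : -f 2-5     # prints: x:1001:1001:Jane Doe
printf '%s\n' "$line" | cut -d : -f 1,3,7   # prints: jdoe:1001:/bin/sh
```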

To extract a fixed range of character positions, for example positions 44 through 49 of a long directory listing (ls -l), use the following command:

ls -l | cut -c 44-49

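On many UNIX systems, these positions hold the modification date, but the exact positions shift with the width of the owner, group, and size columns, so check them against your own output first. Like -f, the -c option accepts lists such as -c 5,7,6,8. Note, however, that cut always writes the selected characters in their original input order, so -c 5,7,6,8 produces the same output as -c 5-8; it does not reorder them.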
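A quick sketch of both points: the first pair of commands shows that a shuffled -c list does not reorder output, and the second part shows a field-based alternative that survives changing column widths. The ls -l line is a made-up sample (owner "quinn", group "staff", and file name are hypothetical), and it assumes the common layout where the month, day, and time are fields 6, 7, and 8.

```shell
# Character lists are written in input order, so both commands print "efgh".
printf 'abcdefghij\n' | cut -c 5,7,6,8    # prints: efgh
printf 'abcdefghij\n' | cut -c 5-8        # prints: efgh

# Hypothetical ls -l line used as sample input; real column widths vary.
sample='-rw-r--r--  1 quinn  staff  1024 Apr 14 12:15 notes.txt'

# awk counts whitespace-separated fields rather than characters,
# so the date stays in fields 6-8 even if the owner or size widths change.
printf '%s\n' "$sample" | awk '{print $6, $7, $8}'   # prints: Apr 14 12:15
```

On a live system the equivalent would be ls -l | awk '{print $6, $7, $8}', keeping in mind that ls -l also prints a "total" line first.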

One of the trickier extractions involves fields separated by a variable amount of whitespace. To extract the process ID (the second field) from a process listing (ps -ef), cut will not work: it treats every single space as a delimiter, so a run of spaces produces empty fields. Another powerful UNIX text manipulator, awk, splits fields on runs of spaces and tabs by default, treating each run as a single separator. To extract the PIDs from ps -ef, use:

ps -ef | head | awk '{print $2}'

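Here, head simply limits the listing to its first ten lines, the first of which is the header, so the word PID is printed too. Awk is an incredibly powerful tool, and this is a trivial but useful application of it.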
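The difference between cut and awk is easiest to see on a fixed sample. The two-line listing below is hypothetical ps -ef style output (a header plus one process line); the sketch also shows skipping the header with NR > 1 and a tr-based workaround that lets cut cope with repeated spaces.

```shell
# Hypothetical ps -ef style output: a header line plus one process line,
# with columns padded by runs of spaces.
listing='UID        PID  PPID  C STIME TTY          TIME CMD
root      1234     1  0 10:00 ?        00:00:01 /usr/sbin/sshd'

# cut treats each single space as a delimiter, so on the process line
# field 2 is the empty string between the first two spaces.
printf '%s\n' "$listing" | tail -n 1 | cut -d ' ' -f 2    # prints an empty line

# awk splits on runs of blanks, so $2 is the PID column on every line.
printf '%s\n' "$listing" | awk '{print $2}'               # prints: PID, then 1234

# NR > 1 skips the header line and prints only the PID values.
printf '%s\n' "$listing" | awk 'NR > 1 {print $2}'        # prints: 1234

# Alternatively, squeeze repeated spaces with tr so cut sees single delimiters.
printf '%s\n' "$listing" | tail -n 1 | tr -s ' ' | cut -d ' ' -f 2   # prints: 1234
```

On a live system the same header filter applies directly, as in ps -ef | awk 'NR > 1 {print $2}'. The tr approach breaks down when a line begins with spaces, because cut then sees an empty first field.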

 

About Quinn McHenry

Quinn was one of the original co-founders of Tech-Recipes. He is currently crafting iOS applications as a senior developer at Small Planet Digital in Brooklyn, New York.

The Conversation


  • Khushbooshah

    I have 5 columns in a file… I want to display only columns 2 and 4 with a single command. How do I do that?

    • Itismike

      cut -f 2,4 infile > outfile
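The reply above relies on cut's default delimiter, which is a tab. If the five columns are instead separated by spaces (possibly several), a hedged alternative is awk. The input below is made up, and infile and outfile are the placeholder names from the reply.

```shell
# Made-up input: five columns separated by varying amounts of whitespace.
# awk prints fields 2 and 4, joined by a single space.
printf 'a1  b1 c1   d1 e1\na2 b2  c2 d2   e2\n' | awk '{print $2, $4}'
# prints:
# b1 d1
# b2 d2

# With real files (placeholder names from the reply above):
# awk '{print $2, $4}' infile > outfile
```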

  • Anonymous

    Did you know that if you press Alt while selecting with the mouse you can also select a block of text?

  • Jerobadi

    Apr 14 12:15 110414060000.ca_P_STARCOB_CA_837I.nr
    Apr 14 10:56 ACH_WCF_CTRL.tbl
    Apr 14 10:37 110413220000.ca_P_PR_HC_837P.nr
    Apr 14 08:14 dont_delete
    Apr 12 11:48 trigger
    Apr 11 07:40 Mail
    Apr 01 11:36 directory_names.txt
    Apr 01 11:34 presence_files.sh
    Feb 28 06:08 monitor_va088.pl
    Feb 28 06:07 monitor_l1l2.pl

    I want to list the files whose dates fall between Apr 01 and Apr 12. Could someone help?
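One way to approach this question, assuming the listing has exactly the four fields shown (month, day, time, name) and that file names contain no spaces, is to let awk test the month and day columns. The listing above is fed in through a here-document purely for illustration; on a real system you would pipe in the command that produced it.

```shell
# Keep rows whose month is Apr and whose day falls between 1 and 12,
# then print only the file name (field 4). Adding 0 forces a numeric
# comparison, so "01" is treated as 1.
awk '$1 == "Apr" && $2 + 0 >= 1 && $2 + 0 <= 12 {print $4}' <<'EOF'
Apr 14 12:15 110414060000.ca_P_STARCOB_CA_837I.nr
Apr 14 10:56 ACH_WCF_CTRL.tbl
Apr 14 10:37 110413220000.ca_P_PR_HC_837P.nr
Apr 14 08:14 dont_delete
Apr 12 11:48 trigger
Apr 11 07:40 Mail
Apr 01 11:36 directory_names.txt
Apr 01 11:34 presence_files.sh
Feb 28 06:08 monitor_va088.pl
Feb 28 06:07 monitor_l1l2.pl
EOF
# prints: trigger, Mail, directory_names.txt, presence_files.sh (one per line)
```

This filter ignores the year entirely, so it only makes sense when all the files of interest come from the same year.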

  • Anu

    ps -ef | head | awk '{print $2}'
    This combination was very useful. I was struggling to do this with grep, since I had not used awk…

  • venkatesh

    My text file looks like this:

    username expirydate
    -------------------
    xxxx 29-11-2012
    yyyy 29-11-2012

    I want to cut out these fields, store the values in variables, and then insert them from those variables into a table. Could you help with this?

    thanks & Regards,
    venkatesh
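A minimal sketch of the first half of this request, assuming the first two lines of the file are the header and the dashed rule, and that each data line holds exactly a username and a date separated by whitespace. How the values then reach a table depends entirely on the database client, so that step is left as a placeholder comment; users.txt is a hypothetical file name.

```shell
# Sample data in the layout from the comment. With a real file, replace the
# printf with: tail -n +3 users.txt   (users.txt is a hypothetical name)
printf 'username expirydate\n-------------------\nxxxx 29-11-2012\nyyyy 29-11-2012\n' |
tail -n +3 |
while read -r username expirydate; do
    # read splits each remaining line on whitespace into the two variables.
    echo "user=$username expires=$expirydate"
    # Placeholder: hand "$username" and "$expirydate" to your database
    # client here; the exact command depends on the database in use.
done
```

Because the loop sits at the end of a pipeline, bash and many other shells run it in a subshell, so the variables are only reliable inside the loop body.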