BSD tr - translate (or delete) characters
tr [-dCcsu] matchString replaceString < infile
tr matchString replaceString < file
The first character in matchString is translated into the first character in replaceString, the second into the second, and so on.
If matchString is longer than replaceString, the last character found in replaceString is duplicated until matchString is exhausted.
tr "aeiou" "_" implies tr "aeiou" "_____", such that:
echo abcdefghijklmnopqrstuvwxyz1234 | tr "aeiou" "_"
_bcd_fgh_jklmn_pqrst_vwxyz1234
-d matchString   Delete the characters specified in matchString.
-s matchString   Squeeze repeated occurrences of the characters specified in the last operand
                 (either matchString or replaceString) into a single occurrence, after deletion and translation.
-C               Complement the set of characters in matchString. For example, -C aeiou includes every character except a, e, i, o and u.
-c               Complement the set of byte values in matchString.
-u               Unbuffered output.
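A minimal sketch of the deletion, squeeze and complement options. The sample strings are made up for illustration, and the expected output is shown in the comments.

```shell
# -d: delete every character listed in matchString
printf 'hello world\n' | tr -d 'aeiou'      # hll wrld

# -s: squeeze runs of the listed character down to one
printf 'a   b    c\n' | tr -s ' '           # a b c

# -c with -d: delete everything that is NOT a digit or a newline
printf 'abc123def\n' | tr -cd '0-9\n'       # 123
```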
\   Quote or escape characters that are special to the shell:
<>&`$()";'|*=[]#?~
\a  alert character (usually BEL, a beep)
\b  backspace
\f  form feed
\n  newline
\r  carriage return
\t  tab
\v  vertical tab
A backslash followed by any other character maps to that character (so \\ specifies a backslash).
c1-c2 range of characters, inclusively.
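The escapes and the range form can be tried directly from the shell. This is a small sketch with made-up input; the single quotes keep the shell from interpreting the backslashes, so tr sees them unchanged.

```shell
# \t is interpreted by tr itself
printf 'a\tb\tc\n' | tr '\t' ','           # a,b,c

# \\ stands for one literal backslash
printf 'C:\\tmp\\x\n' | tr '\\' '/'        # C:/tmp/x

# c1-c2 is an inclusive range
printf 'deadbeef\n' | tr 'a-f' 'A-F'       # DEADBEEF
```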
To specify a byte by its hex value, use the shell's $'\xHH' quoting (supported by bash, ksh and zsh).
Using the mouse to highlight and copy characters in the x'C0' - … range, then pasting them into linux, causes 2 bytes to be inserted for each character received:
x'C3', followed by the character received ANDed with x'BF' (b'1011 1111'). This is the UTF-8 encoding of the character.
For example: pasting an uppercase A with a grave accent (x'C0') inserts x'C380' into the input stream.
Using the following translation to delete the x'C3' and change the x'80' to x'C0' works BUT …
Fix up characters x'C0' through x'CF' inserted from the console that came in as x'80' through x'8F':
tr -d $'\xc3' < 0 | \
tr $'\x80'-$'\x8f' $'\xC0'-$'\xCF' > 0fixed
> hexdump -C 0
43 30 20 c3 80 20 09 43 31 20 c3 81
> hexdump -C 0fixed
43 30 20 c0 20 09 43 31 20 c1
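The fix-up can be tried without a console paste by generating the bytes with printf. This is only a sketch under a few assumptions that are not in the original: octal escapes replace $'\x..' so it also runs in a plain sh, LC_ALL=C is set so that tr treats the input as single bytes, and od stands in for hexdump.

```shell
# "C0 " followed by the two-byte sequence c3 80, as in the dump above
printf 'C0 \303\200' |
  LC_ALL=C tr -d '\303' |                  # drop every c3 lead byte
  LC_ALL=C tr '\200-\217' '\300-\317' |    # 80..8f -> c0..cf
  od -An -tx1                              # bytes: 43 30 20 c0
```

Note that this deletes every x'C3' and remaps every x'80' through x'8F' byte, so it is only appropriate when those bytes come from this kind of paste.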
Using the mouse to copy and paste from some PDF files may cause the inclusion of the byte sequences
EF 82 B7 and E2 80 99
[:class:] all characters belonging to the character class.
When translating, the only character classes that may appear in replaceString
are `upper' and `lower' (linux).
upper   upper-case
lower   lower-case
alpha   alphabetic
digit
alnum   alphanumeric
punct   punctuation
blank
space
print
graph
tr "[:cntrl:]" "[:lower:]"
carriage return is shown as n,
line feed is shown as k,
TAB is shown as j
tr "[:cntrl:]" "[:lower:]" < cedar2.txt
nkIf RIDGEjcould tell its story it would start with a vast ice-cap that stretched down from thenkNorth Pole to here. When a wide river had previously cut deep channels into the sandstone and shale bedrock,nkThe melting glacier left millions of cubic yards of sand, gravel and stone.nkCedar Ridge is a pile of such glacial till.nkThe rounded boulders that
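Where classes other than upper and lower are rejected in replaceString, the three substitutions named above (TAB to j, line feed to k, carriage return to n) can be spelled out with octal escapes instead. A sketch with a made-up input line:

```shell
# TAB (011) -> j, LF (012) -> k, CR (015) -> n
printf 'a\tb\r\n' | tr '\011\012\015' 'jkn'; echo   # ajbnk
```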
[:xdigit:] hexadecimal
carriage return is shown as D (x'0D'),
line feed is shown as A (x'0A'),
TAB is shown as 9 (x'09')
tr "[:cntrl:]" "[:xdigit:]" < cedar2.txt
DAIf9CEDAR RIDGE could tell its story it would start with a vast ice-cap that stretched down from theDANorth
! " ` # ' $ % & ( ) * + , - _ . / \ | : ; < = > ? @
{ } ~ [ ] ^ 0-9, space, A-Z, a-z
upper and lower are ordered.
See the ctype(3) manual pages for details as to which characters are included in these classes.
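A few class-based invocations, as a sketch with made-up input. Classes other than upper and lower appear only in matchString here, so the commands do not depend on the replaceString restriction noted above.

```shell
# upper and lower may appear on both sides of a translation
printf 'Hello, World 42\n' | tr '[:lower:]' '[:upper:]'    # HELLO, WORLD 42

# classes can be combined inside one matchString
printf 'a1b2,c3!\n' | tr -d '[:punct:][:digit:]'           # abc

# turn each run of blanks (space, TAB) into a single underscore
printf 'one \t two\n' | tr -s '[:blank:]' '_'              # one_two
```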
[c*n] c repeated n times in replaceString.
If n is omitted or 0, it is interpreted as large enough to extend replaceString to the length of matchString.
If n has a leading 0, it is interpreted as octal; otherwise it is interpreted as decimal.
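The repeat construct can be sketched as follows. The first command reuses the vowel example from earlier with an explicit [_*] in place of the implicit padding; the second uses a made-up five-letter input.

```shell
# [_*] pads replaceString with _ up to the length of matchString
echo abcdefghijklmnopqrstuvwxyz1234 | tr 'aeiou' '[_*]'   # _bcd_fgh_jklmn_pqrst_vwxyz1234

# [x*3] stands for xxx, so a, b, c -> x, then d -> y and e -> z
echo abcde | tr 'abcde' '[x*3]yz'                          # xxxyz
```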
\ooo   the character with octal value ooo (1 to 3 octal digits).
To follow an octal escape with a digit as a character, left zero-pad the escape to the full 3 octal digits, e.g. \007.
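A short sketch of octal escapes, including the zero-padding rule. The octal values come from the tables at the end of this page; the input strings are made up.

```shell
# \055 is '-' and \137 is '_'
printf 'x-y-z\n' | tr '\055' '\137'        # x_y_z

# '\0601' is \060 (the character 0) followed by the digit 1, so a -> 0 and b -> 1
printf 'abba\n' | tr 'ab' '\0601'          # 0110
```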
[=equiv=] Represents all characters belonging to the same equivalence
class as equiv, ordered by their encoded values.
tr exits 0 on success, and >0 if an error occurs.
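The exit status can be checked from a script in the usual way. In this sketch the failing case is an assumption chosen for illustration: calling tr with no operands at all, which is a usage error.

```shell
# a successful run leaves 0 in $?
printf 'abc\n' | tr 'a-c' 'A-C' > /dev/null
echo "status: $?"                                      # status: 0

# tr with no operands is a usage error and exits non-zero
tr < /dev/null > /dev/null 2>&1 || echo "tr failed"    # tr failed
```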
The Mac OS X (Darwin) BSD version, as of 10.5.6, exits if it encounters a copyright symbol (x'A9', ©), a left double quote (x'93', “), or other bytes such as B8, D1, C0, CF, C2, D8, D4, with the message
tr: Illegal byte sequence
and an exit status of 1. This is easily corrected with:
tr "[:lower:]" "[:upper:]" < file † (not linux)
tr -d '\000' < file
tr -s ' ' < file
tr -cd "[:print:]" < file
uniq -d prints the words that were adjacent duplicates.
tr -s '[:punct:][:space:]' '\n' < file | \
tr '[:upper:]' '[:lower:]'| \
uniq -d
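The same pipeline can be tried on an inline sentence instead of a file. The sample sentence is made up for illustration.

```shell
printf 'This is is a test. The the end.\n' |
  tr -s '[:punct:][:space:]' '\n' |   # one word per line
  tr '[:upper:]' '[:lower:]' |        # fold case so "The the" matches
  uniq -d                             # prints is and the, one per line
```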
Mac files use CR (carriage return, x'0D') as the line terminator. Unix uses LF (line feed, x'0A'):
tr '\015' '\012' < macfile > Unixfile
DOS files have CR and LF characters at the end of each line and have a ^Z at the end of the file. For Unix, delete the CR and ^Z (x'1A'), leaving the LF:
tr -d '\015\032' < DOSfile > Unixfile
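Both conversions can be checked on generated input rather than real files. In this sketch printf stands in for the Mac and DOS files, and od is an added step used only to make the remaining bytes visible.

```shell
# CR-terminated lines: every CR becomes LF
printf 'one\rtwo\r' | tr '\015' '\012' | od -An -c

# CR LF lines with a trailing ^Z (octal 032): only the LFs remain as terminators
printf 'one\r\ntwo\r\n\032' | tr -d '\015\032' | od -An -c
```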
Convert 8-bit bytes to 7-bit ASCII by clearing the high-order bit:
tr "\200-\377" "\000-\177"
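A one-byte illustration of this mapping. The LC_ALL=C prefix is an assumption added here so that tr treats the input as single bytes regardless of the current locale.

```shell
# octal 301 is x'C1'; mapped down by octal 200 it becomes x'41', the letter A
printf '\301\n' | LC_ALL=C tr '\200-\377' '\000-\177'     # A
```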
tr -cs "[:alpha:]" "\n" < file
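Run on a made-up line, the command above yields one word per line: every character that is not a letter is turned into a newline, and runs of newlines are squeezed.

```shell
printf 'one, two; three\n' | tr -cs '[:alpha:]' '\n'
# one
# two
# three
```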
Remove diacritical marks from all accented variants of the letter e:
tr "[=e=]" "e" < file
perl extension -U
tr -CU "\0-\xFF" "" < file
tr -UC "\0-\x{FF}" "" < file
The LANG, LC_ALL, LC_CTYPE and LC_COLLATE environment variables affect the execution of tr as described in environ(7).
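Because the locale affects how tr reads its input, one commonly used way to make it accept arbitrary bytes is to run it in the C locale. This is a sketch based on that assumption, not something stated in the text above; the sample byte is made up.

```shell
# \351 is a lone Latin-1 byte (é), which is not valid UTF-8 on its own.
# In the C locale every byte is a character, so tr can process it.
printf 'caf\351\n' | LC_ALL=C tr -d '\200-\377'     # caf
```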
System V style ranges of the form [c-c]: tr [a-z] [A-Z] will work, as it will map the [
character in matchString to the [ character in replaceString. However, if the
shell script is deleting or squeezing characters, as in the command tr
-d [a-z], the characters [ and ] will be included in the deletion or compression list.
To make a-z represent the three characters a, - and z, use a\-z.
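The effect of the brackets when deleting is easy to see on a made-up line. The operands are quoted in this sketch so that the shell does not expand [a-z] as a filename pattern.

```shell
printf '[abc] XYZ\n' | tr -d '[a-z]'     # " XYZ"   ([ and ] are deleted too)
printf '[abc] XYZ\n' | tr -d 'a-z'       # "[] XYZ"
```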
The feature wherein the last character of replaceString
is duplicated if replaceString has fewer characters than matchString is permitted by
POSIX but is not required. Shell scripts attempting to be portable to
other POSIX systems should use the [#*] convention instead of relying on this behavior.
The -u option is an extension to the POSIX standard.
000 NUL  001 SOH  002 STX  003 ETX  004 EOT  005 ENQ  006 ACK  007 BEL
010 BS   011 HT   012 NL   013 VT   014 NP   015 CR   016 SO   017 SI
020 DLE  021 DC1  022 DC2  023 DC3  024 DC4  025 NAK  026 SYN  027 ETB
030 CAN  031 EM   032 SUB  033 ESC  034 FS   035 GS   036 RS   037 US
177 DEL
[:punct:] (octal)
041 ! 042 " 043 # 044 $ 045 % 046 & 047 '
050 ( 051 ) 052 * 053 + 054 , 055 - 056 . 057 /
072 : 073 ; 074 < 075 = 076 > 077 ?
100 @
133 [ 134 \ 135 ] 136 ^ 137 _
140 `
173 { 174 | 175 } 176 ~
See sed: stream editor for multiple character string manipulation.
BSD October 11, 1997