Collagen is a family of at least 29 structural proteins derived from over 40 human genes (Myllyharju & Kivirikko 2004). It is the main component of connective tissue and the most abundant protein in mammals, making up about 25% to 35% of whole-body protein content. A defining feature of collagens is the formation of trimeric left-handed polyproline II-type helical collagenous regions. The packing within these regions is made possible by the presence of the smallest amino acid, glycine, at every third residue, resulting in a repeating motif Gly-X-Y where X is often proline (Pro) and Y is often 4-hydroxyproline (4Hyp). Gly-Pro-Hyp is the most common triplet in collagen (Ramshaw et al. 1998). Collagen peptide chains also have non-collagenous domains, and the members of each collagen subclass share a common chain structure. Collagen fibrils are mostly found in fibrous tissues such as tendon, ligament and skin. Other forms of collagen are abundant in cornea, cartilage, bone, blood vessels, the gut, and intervertebral disc. In muscle tissue, collagen is a major component of the endomysium, constituting up to 6% of muscle mass. Gelatin, used in food and industry, is collagen that has been irreversibly hydrolyzed.
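The Gly-X-Y repeat described above can be illustrated with a short check. This is a minimal sketch, not a published tool: the function names and the example residue list are hypothetical, and residues are written as three-letter codes with "Hyp" standing for 4-hydroxyproline.

```python
from collections import Counter


def split_triplets(residues):
    """Split a list of three-letter residue codes into consecutive triplets."""
    if len(residues) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [tuple(residues[i:i + 3]) for i in range(0, len(residues), 3)]


def follows_gly_x_y(residues):
    """Return True if glycine occupies the first position of every triplet."""
    return all(triplet[0] == "Gly" for triplet in split_triplets(residues))


def most_common_triplet(residues):
    """Return the triplet that occurs most often in the sequence."""
    return Counter(split_triplets(residues)).most_common(1)[0][0]


# Hypothetical collagen-like fragment, invented for illustration only.
example = ["Gly", "Pro", "Hyp",
           "Gly", "Pro", "Hyp",
           "Gly", "Ala", "Hyp"]

print(follows_gly_x_y(example))
print(most_common_triplet(example))
```

A sequence in which any triplet starts with a residue other than glycine fails the check, which mirrors the imperfections in the Gly-X-Y repeat that interrupt the triple helix in some collagen types.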
On the basis of their fibre architecture in tissues, the genetically distinct collagens have been divided into subgroups. Group 1 collagens have uninterrupted triple-helical domains of about 300 nm, forming large extracellular fibrils. They are referred to as the fibril-forming collagens and comprise collagen types I, II, III, V, XI, XXIV and XXVII. Group 2 collagens are types IV and VII, which have extended triple helices (>350 nm) with imperfections in the Gly-X-Y repeat sequences. Group 3 are the short-chain collagens, which fall into two subgroups. Group 3A have continuous triple-helical domains (types VI, VIII and X). Group 3B have interrupted triple-helical domains and are referred to as the fibril-associated collagens with interrupted triple helices (FACIT collagens, Shaw & Olsen 1991). FACITs include collagen IX, XII, XIV, XVI, XIX, XX, XXI, XXII and XXVI plus the transmembrane collagens (XIII, XVII, XXIII and XXV) and the multiple triple helix domains and interruptions (Multiplexin) collagens XV and XVIII (Myllyharju & Kivirikko 2004). The non-collagenous domains of collagens have regulatory functions; several are biologically active when cleaved from the main peptide chain. Fibrillar collagen peptides all have a large triple-helical domain (COL1) bordered by N- and C-terminal extensions, called the N- and C-propeptides, which are cleaved prior to formation of the collagen fibril. The intact chain is referred to as a collagen propeptide; the term procollagen is reserved for the trimeric triple-helical precursor of collagen before the propeptides are removed. The C-propeptide, also called the NC1 domain, directs chain association during assembly of the procollagen molecule from its three constituent alpha chains (Hulmes 2002).
Fibril-forming collagens are the most familiar and best studied subgroup. Collagen fibres are aggregates or bundles of collagen fibrils, which are themselves polymers of tropocollagen complexes, each consisting of three polypeptide chains known as alpha chains. Tropocollagens are considered the subunit of larger collagen structures. They are approximately 300 nm long and 1.5 nm in diameter, with a left-handed triple-helical structure, which becomes twisted into a right-handed coiled-coil 'super helix' in the collagen fibril. Tropocollagens in the extracellular space polymerize spontaneously with regularly staggered ends (Hulmes 2002). In fibrillar collagens the molecules are staggered by about 67 nm, a unit known as D, which varies with the hydration state. Each D-period contains slightly more than four collagen molecules, so that every D-period repeat of the microfibril has a region containing five molecules in cross-section, called the 'overlap', and a region containing only four molecules, called the 'gap'. The triple helices are arranged in a hexagonal or quasi-hexagonal array in cross-section, in both the gap and overlap regions (Orgel et al. 2006). Collagen molecules cross-link covalently to each other via lysine and hydroxylysine side chains. These cross-links are unusual, occurring only in collagen and elastin, a related protein.
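The gap and overlap arithmetic follows from the molecule length and the stagger distance. The sketch below assumes a simplified one-dimensional stagger model in which molecules are offset from their neighbours by exactly one D-period; the function name and return keys are hypothetical, and the default inputs are the rounded values quoted above (300 nm and 67 nm).

```python
import math


def d_period_regions(molecule_length_nm=300.0, d_period_nm=67.0):
    """Estimate gap and overlap sizes in a simple 1D stagger model.

    Assumes each molecule is shifted by exactly one D-period relative
    to its neighbour, so a molecule spans a whole number of D-periods
    plus a remainder. The remainder is the overlap region.
    """
    ratio = molecule_length_nm / d_period_nm       # molecule length in units of D
    full_periods = math.floor(ratio)               # molecules crossing the gap
    overlap_nm = molecule_length_nm - full_periods * d_period_nm
    gap_nm = d_period_nm - overlap_nm
    return {
        "length_in_d": ratio,
        "molecules_in_gap": full_periods,
        "molecules_in_overlap": full_periods + 1,
        "overlap_nm": overlap_nm,
        "gap_nm": gap_nm,
    }


regions = d_period_regions()
for name, value in regions.items():
    print(f"{name}: {value:.2f}")
```

With these rounded inputs a molecule is about 4.5 D long, giving four molecules in the gap, five in the overlap, an overlap of about 32 nm and a gap of about 35 nm. The exact figures are sensitive to the input values, which are themselves approximations and vary with hydration.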
The macromolecular structures of collagen are diverse. Several group 3 collagens associate with larger collagen fibers, serving as molecular bridges which stabilize the organization of the extracellular matrix. Type IV collagen is arranged in an interlacing network within the dermal-epidermal junction and vascular basement membranes. Type VI collagen forms distinct microfibrils called beaded filaments. Type VII collagen forms anchoring fibrils. Type VIII and X collagens form hexagonal networks. Type XVII collagen is a component of hemidesmosomes, where it is complexed with alpha6beta4 integrin, plectin, and laminin-332 (de Pereda et al. 2009). Type XXIX collagen has recently been reported to be a putative epidermal collagen with highest expression in suprabasal layers (Soderhall et al. 2007). Collagen fibrils and aggregates, arranged in varying combinations and concentrations in different tissues, provide specific tissue properties. In bone, collagen triple helices lie in a parallel, staggered array with 40 nm gaps between the ends of the tropocollagen subunits, which probably serve as nucleation sites for the deposition of crystals of the mineral component, hydroxyapatite (Ca10(PO4)6(OH)2) with some phosphate. Collagen structure affects cell-cell and cell-matrix communication and tissue construction in growth and repair, and is changed in development and disease (Sweeney et al. 2006, Twardowski et al. 2007). A single collagen fibril can be heterogeneous along its axis, with significantly different mechanical properties in the gap and overlap regions, correlating with the different molecular organizations in these regions (Minary-Jolandan & Yu 2009).