Addressing the glycan complexity by using mass spectrometry: In the pursuit of decoding glycologic

Biomolecules often contain carbohydrates. Such molecules, namely glycoconjugates, play central roles in cell adhesion, tumor migration, and attachment of pathogens. The conjugation of “natural products” with glycans is not a template-dependent process but is achieved by multiple enzymatic reactions. Because of the nature of this synthetic process, glycans exist as complex mixtures of glycoforms, and thus their analysis is inevitably difficult. Mass spectrometry is one of the most informative, and thus important, tools for the structural analysis of glycans. Its obvious advantage is high sensitivity, with detection at low femtomole levels. A variety of ions can be obtained depending on the adducted cationic species. To obtain sequence information, gas-phase dissociation reactions are used. Among the various techniques, collision-induced dissociation (CID) has been frequently used in the past. Although the technique has proven useful, there are cases in which the conventional CID process is not suitable. This review highlights an emerging technique that focuses on the activation-energy difference between isomeric glycans and that will complement the current methods. Furthermore, the possible sources of useful structural information and the methodology for obtaining it are discussed.


Introduction
Nucleotides, amino acids, carbohydrates, and lipids are the essential elementary molecular units constituting biological systems. Among these, polymers and/or their conjugates directly drive biological events. Glycans, the oligomers of carbohydrates, linked to lipids and proteins, play pivotal roles in many biological phenomena. Since glycan synthesis is not template-dependent, a glycan structure cannot be deduced from a genetic template and must be determined experimentally. The fact that human glycans are composed of only nine key monosaccharides (Figure 1) may give the impression that the structural analysis is not difficult. The actual analysis, however, is far from easy, involving problems associated with sample amounts and, more importantly, the structural complexity arising from the presence of isomeric structures.
Mass spectrometry (MS) is advantageous in handling minute amounts of sample and in obtaining structure-related information. Therefore, it is one of the important techniques used in the structural analysis of glycans. Advanced MS equipment is becoming more accessible and is supporting research on glycomics and glycoproteomics. The position of glycan attachment can be determined by a sophisticated method using stable isotope labeling during N-glycanase digestion, but this approach does not elucidate the glycan structure [1,2]. Therefore, the glycan structure must be addressed independently after cleavage from the protein, or as a glycopeptide after proteolytic digestion [3-5].
MS provides the mass-to-charge ratio (m/z value), while more detailed structural information is obtained from MS/MS experiments. Collision-induced dissociation (CID) is the most frequently used technique to obtain fragmentation information [6-16]. The low-energy collision method is most often carried out using triple quadrupole (TQ) and quadrupole ion trap (QIT) instruments and yields relatively few fragment ions, which arise from the cleavage of relatively labile chemical bonds. Generally, it is very difficult to gather systematic information about the ions obtained from isomeric compounds because of the limited availability of reference compounds. Despite this difficulty, it is possible to distinguish isomeric glycan structures, including the linkage positions and, more importantly, the anomeric configurations [3,17-19]. Diastereomeric ions such as anomeric isomers can be easily distinguished by low-energy CID methods, which will be discussed in a later section. However, distinguishing isomers is not the same as assigning their structures: although the linkage position can be determined by analyzing the fragment ions obtained during CID experiments, the anomeric configuration cannot be determined unless the observer has a reference material of the exact structure, which is not practical.
In order to achieve "inner cleavages" of stable carbon-carbon linkages, high-energy CID is preferred [20-23]. The high-energy collision method provides a large variety of fragments and is often used in shotgun proteomics research, typically with time-of-flight (TOF)-TOF equipment [24,25]. However, the acetal structures present in glycans are prone to dissociation at rather low energy, and databases of the resulting complex MS/MS spectra have not been well developed; this has hampered research on glycoproteomics.

Review
Chemical aspects of glycan structure
The "diversity" of glycans might be described as follows. Among various biomolecules, glycans contain the largest amount of information in their structure within a given volume. The information is defined by three factors: 1) the combination of the chiral centers present in a constituting unit structure, 2) the functional groups in a unit, and 3) the linkage position between the units. These correspond, respectively, to the chiral centers present at C-2 to C-5 in individual monosaccharides such as glucose, mannose, and galactose, together with the anomeric configuration (α or β) of the glycosidic linkage between individual sugar units; the functional groups at each carbon atom, such as hydroxyl, N-acetyl, and sulfate; and the glycosidic linkage position of individual sugar units (see Figure 1). We now consider the amount of information possibly contained in an n-mer of carbohydrate units. If we assign one "bit" to a single chiral center, hexoses such as glucose and mannose, with five chiral centers, can take $2^{5}$ configurations. Thus, an n-mer of randomly selected hexoses can take $2^{5 \cdot n}$ configurations. This theoretical number includes D- and L-sugars as well. Furthermore, we should take into account the structural diversity caused by the linkage position, which adds an extra factor of $4 \cdot (n-1)$ to the theoretical number. Generally, glycan function is observed in molecules up to pentasaccharides, including branched structures, which makes the calculation of the amount of information more complex (Figure 2) [26]. On the other hand, the corresponding number for a peptide is based on the chirality of the amino acids (2), the number of different chiral side chains (19) plus glycine, which has no chirality, and the number of units (n).
Figure 2 (caption). a. A chiral center can be interpreted as "0" or "1". b. A six-carbon monosaccharide, hexose, has five chiral centers, so every hexose can be described by a combination of "0" and "1". c. Hexoses can hold a substituent group at each carbon atom. d. Monosaccharides can be interconnected through an acetal linkage called a glycosidic linkage. Glycans can form branched structures as well.
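The counting argument above can be made concrete with a short sketch. It rests on assumptions that are not stated explicitly in the text: the n-mer count is read as 2 to the power 5n, the linkage contribution is read as four possible positions for each of the n-1 linkages of a linear chain, and the peptide count is read as (2 × 19 + 1) per residue. Branching and substituent groups are ignored, so the numbers are only a lower-bound illustration.

```python
def hexose_stereo_count(n_units: int, chiral_centers: int = 5) -> int:
    """Stereochemical configurations of an n-mer, one 'bit' per chiral center."""
    return 2 ** (chiral_centers * n_units)


def linear_linkage_count(n_units: int, positions: int = 4) -> int:
    """ASSUMPTION: each of the n-1 linkages of a linear chain can use one of 4 positions."""
    return positions ** (n_units - 1)


def peptide_count(n_units: int) -> int:
    """ASSUMPTION: 19 chiral side chains in two configurations plus achiral glycine."""
    return (2 * 19 + 1) ** n_units


for n in (1, 2, 3):
    glycan = hexose_stereo_count(n) * linear_linkage_count(n)
    print(f"n={n}: linear hexose oligomers={glycan}, peptides={peptide_count(n)}")
```

Even under these simplified assumptions the glycan count outgrows the peptide count from the disaccharide onward, which is the qualitative point of the paragraph.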
The theoretical number gives the possible number of configurations, and only some of them have been synthesized and used by organisms. Life has evolved by using a small fraction of this capability space, which can also be interpreted to mean that life retains infinite possibilities for the future by taking advantage of the large amount of information that can be incorporated into glycan structures. It is easy to imagine that such complex and diverse glycan structures can be utilized in a variety of interactions with other biomolecules. Glycans attached to proteins and lipids play pivotal roles in many biological phenomena such as malignant transformation, immunity, differentiation, and infectious diseases [27-31]. Therefore, glyco-related compounds are candidates for drug seeds and for functional materials that impart interfacing capability to cells and tissues. The glycans themselves are susceptible to endogenous hydrolases, and thus they might not be suitable as drug candidates. However, replacement of the acetal oxygen atom with a sulfur atom or a methylene group results in resistance to hydrolyzing enzymes [32,33]. Furthermore, drug discovery based on the identified pharmacophore seems more promising [34,35]. The importance of utilizing combinatorial libraries is also recognized in the glycan research field, although their construction is still extremely challenging [27-29,31,32,34,36-50].
doi: 10.7243/2052-9341-5-3
The glyco-related library will be utilized as a source of seeds for new therapeutic drug candidates. Despite the difficulties in accessing such a library, it is considered extremely useful for structural investigation because structural information is contained in a logically arranged series of compounds. We will discuss this aspect in the section "Combinatorial library as the primary information source".

Dissociation methods in mass spectrometry focusing on the energy difference between structural isomers
A series of fragments, product ions, are produced from a precursor ion under the CID conditions. This means that the fragment possesses some structural features of the precursor. Therefore, it is possible to observe the dissociation of the precursor ion or the formation of the product ions.
For example, a singly positively charged linear glycan mainly produces a series of product ions as a result of the cleavage of relatively labile glycosidic linkages under low-energy CID conditions (Figure 3). A spectrum, which provides only m/z values, does not reveal whether structural isomers are present in the sample, nor, for a pure sample, which isomer it is. Structurally isomeric glycans produce very similar spectra consisting of a series of ions with the same m/z values. However, the heights of some of the signals are often different, making it possible to distinguish the isomeric compounds by comparing the spectra (Figure 4) [51,52]. This spectral matching method relies on the fact that different chemical bonds in isomeric compounds require different energies to dissociate, producing a distinguishable pair of spectra. To determine the exact structure rather than merely distinguish isomers, it is necessary to have a variety of reference compounds. Importantly, the product ions can also be used as references in spectral matching by performing further CID in the MS^n stages [53-55].
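The statement that isomers give the same series of m/z values can be illustrated by tabulating the glycosidic-cleavage ions of a linear, singly sodiated glycan. The sketch below is not taken from the review: the B/C/Y/Z labels follow the commonly used Domon-Costello nomenclature (assumed here to match Figure 3), the reducing end is assumed to be free (no aglycone R), and the function name and residue table are hypothetical. Monoisotopic residue masses are standard values.

```python
# Monoisotopic masses in Da (residue = monosaccharide minus water).
RESIDUE = {"Hex": 162.05282, "dHex": 146.05791, "HexNAc": 203.07937}
WATER = 18.01056
SODIUM_CATION = 22.98922  # Na minus one electron


def glycosidic_fragments(sequence):
    """Return (label, m/z) pairs for singly sodiated B, C, Y and Z ions.

    sequence lists residues from the non-reducing end to the reducing end.
    ASSUMPTION: linear chain with a free reducing end.
    """
    n = len(sequence)
    masses = [RESIDUE[r] for r in sequence]
    ions = []
    for i in range(1, n):
        left = sum(masses[:i])    # non-reducing-end side of the cleaved linkage
        right = sum(masses[i:])   # reducing-end side of the cleaved linkage
        ions.append((f"B{i}", left + SODIUM_CATION))
        ions.append((f"C{i}", left + WATER + SODIUM_CATION))
        ions.append((f"Y{n - i}", right + WATER + SODIUM_CATION))
        ions.append((f"Z{n - i}", right + SODIUM_CATION))
    return ions


for label, mz in glycosidic_fragments(["dHex", "Hex"]):  # e.g. Fuc-Gal
    print(f"{label}: {mz:.4f}")
```

Because the calculation uses only the composition on each side of the cleaved linkage, anomers and linkage isomers of the same sequence return an identical list, which is why m/z values alone cannot tell them apart.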
The key to spectral matching is the availability of the right reference for comparison. However, it is practically not possible to obtain all the reference samples or their information. In such instances, it is of extreme importance to extract as much information as possible using the available glycans.
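As an illustration of spectral matching, the sketch below scores an unknown spectrum against reference spectra that share the same m/z values but differ in signal heights. The review does not specify a similarity measure; cosine similarity is used here purely as an assumption, and the m/z keys, intensities, and isomer names are invented for the example.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity of two spectra given as {m/z: intensity} dictionaries."""
    keys = sorted(set(spec_a) | set(spec_b))
    a = [spec_a.get(k, 0.0) for k in keys]
    b = [spec_b.get(k, 0.0) for k in keys]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(unknown, references):
    """Return the name of the best-scoring reference and all scores."""
    scores = {name: cosine_similarity(unknown, ref) for name, ref in references.items()}
    return max(scores, key=scores.get), scores


# Hypothetical CID spectra: same product-ion m/z values, different heights.
references = {
    "isomer_A": {203: 100.0, 365: 40.0, 527: 10.0},
    "isomer_B": {203: 30.0, 365: 100.0, 527: 25.0},
}
unknown = {203: 95.0, 365: 45.0, 527: 12.0}

name, scores = best_match(unknown, references)
print(name, {k: round(v, 3) for k, v in scores.items()})
```

The approach only works when a matching reference is present in the set, which is exactly the limitation the paragraph above points out.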
Let us consider a pair of synthetic disaccharides consisting of an L-fucose (Fuc) and a D-galactose (Gal) (Figure 5). The CID of the sodiated α- and β-fucosyl galactosides of ROH (α- and βFuc-(1→2)-αGal-OR) results in the preferential formation of Y-ions as product ions (see Figure 3 for assignment). The Y-ion, [Gal-OR+Na]+ in this case, has been proven to be pure enough to be used as a reference for comparison with the ions produced from different precursor ions (see the following section) [56]. Therefore, this type of ion species is suitable as a reference in spectral matching, and the isomeric structure can be elucidated from such information. However, dissociation of the Fuc unit is the sole event in the CID experiments for both isomeric compounds, which makes it impossible to distinguish their spectra. Fucose belongs to the family of deoxy sugars. The absence of an electron-withdrawing substituent near the acetal position stabilizes the oxocarbenium intermediate and makes the glycosidic linkage of such a sugar more labile. Therefore, care must be taken when analyzing the structure of glycans containing deoxy sugars. Another important carbohydrate is sialic acid, which is also a deoxy sugar. This particular monosaccharide is deeply involved in a number of biological events; therefore, its structural elucidation is important [57-59]. Unlike the other deoxy sugars, sialic acid has a unique structure with nine carbon atoms, a carboxyl group at the C-1 position, and a carbonyl group at C-2 that forms the glycosidic linkage (Figure 1). Although this particular linkage is as labile as those of the other deoxy sugars, it becomes as stable as those of ordinary monosaccharides after esterification [60,61] or amidation [62,63] of the carboxyl group. Another important feature of sialic acid is that, when linked to the 3-position of an adjacent galactose, it can form a lactone structure, which can be used to distinguish the positional isomers of this particular monosaccharide [64].

Suitable ion species for the structural analysis
The most important issue in the analysis of glycans, although not often discussed, is whether the reference data being used are reliable. The data would be unreliable if obtained from the CID of impure ions. A pure reference compound is readily obtained by careful purification using techniques such as HPLC, and its structure can be determined by NMR. By contrast, it is difficult to decide whether the spectral data obtained from a product ion under CID conditions are reliable. Authenticity of data is imperative for accurate spectral matching, although this aspect has seldom been a topic of discussion. The C-ion species produced from a glycoside has a hemiacetal structure that might anomerize under CID conditions, judging by analogy with the behavior of hemiacetals in aqueous media. To address this issue, a new concept, stage-discriminated correlation of energy-resolved mass spectrometry, was introduced [56]. After a statistical evaluation of the MS/MS data obtained from the series of product ions, the A- and B-ions were found to be, in principle, mixtures of various structural isomers. On the other hand, the C- and Y-ions were suitable precursors for further MS/MS experiments. Interestingly, the hemiacetal structure (C-ion) does not undergo anomerization under low-energy CID conditions [65,66]. The C- and Y-ion species are often found in the MS/MS spectrum; hence, they are considered to be suitable reference ions. Furthermore, the anomeric configuration of a glycosidic linkage can be determined by comparing the MS/MS data obtained with the C-ion as a precursor. Taking advantage of synthetic chemistry, it was reported that carbohydrate C-ions could be generated stereospecifically in the gas phase from the corresponding 4-aminobutyl glycosides [67,68]. This particular glycoside with an amino group could also be utilized in preparing the required conjugates for biological investigations [69].

New parameter for use in structural analysis
Is there any structural information other than the m/z values and relative signal heights that is useful in glycan analysis? Relatively weak chemical bonds tend to dissociate considerably faster than others. For example, glycosidic linkages are labile; hence, we often observe product ions associated with their cleavage. Among them, the glycosidic linkages of fucose (Fuc) and sialic acid (Sia) are more labile and often dissociate prior to the other glycosidic linkages, as discussed above. In the dissociation of the individual ions, αFuc-(1→2)-αGal-OR and βFuc-(1→2)-αGal-OR, fucosyl cleavage is the sole event (Figure 5). In such a case, the spectral matching method cannot be applied. Hence, a new parameter is required to address ions containing a labile chemical bond. Usually, CID is carried out at a fixed collision energy. Varying the energy changes the signal heights of the product ions. Energy-resolved mass spectrometry (ERMS) is a method that collects a series of spectra over a range of collision energies and thereby provides information regarding the activation energy of a gas-phase dissociation reaction [70]. The dissociation of Fuc residues from the pair of precursor ions in the above case can be differentiated by ERMS (Figure 6). Since the method is very sensitive to structural differences, it has been used to distinguish structurally closely related glycans [71-75].
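How a series of spectra collected at increasing collision energy can separate two isomers that give the same single product ion is sketched below. The data are invented, the energy scale is in arbitrary units, and summarizing each curve by the energy at which half of the precursor has dissociated (called `e50` here) is an assumption made for illustration, not a parameter defined in the review.

```python
def survival_fractions(scans):
    """Convert ERMS scans to (energy, precursor fraction) points.

    scans: list of (collision_energy, precursor_intensity, [product_intensities]).
    """
    curve = []
    for energy, precursor, products in scans:
        total = precursor + sum(products)
        curve.append((energy, precursor / total))
    return curve


def e50(curve):
    """Energy at which the precursor fraction falls to 0.5 (linear interpolation)."""
    for (e0, f0), (e1, f1) in zip(curve, curve[1:]):
        if f0 >= 0.5 > f1:
            return e0 + (f0 - 0.5) * (e1 - e0) / (f0 - f1)
    return None  # the curve never crosses 0.5 in the scanned range


# Hypothetical ERMS data for two isomeric precursors (arbitrary energy units).
isomer_1 = [(0.6, 100, [0]), (0.8, 80, [20]), (1.0, 40, [60]), (1.2, 10, [90])]
isomer_2 = [(0.6, 100, [0]), (0.8, 95, [5]), (1.0, 70, [30]), (1.2, 30, [70])]

for name, scans in (("isomer_1", isomer_1), ("isomer_2", isomer_2)):
    print(name, e50(survival_fractions(scans)))
```

In this toy data set both isomers show the same product ion at every energy, yet their curves cross the half-dissociation point at different energies, which is the kind of difference ERMS exploits.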
ERMS is often carried out on a QIT MS apparatus but is not limited to a particular type of instrument. In the QIT system, the product ions produced by MS/MS are not subjected to further collisional excitation because only the precursor ion is activated while the product ions are simply contained inside the Q-field. As a result, the product ions remain without further dissociation, which is an advantage over ERMS carried out in the TQ system. However, the TQ system is also used to analyze the energy dependence of a precursor ion under CID conditions. In such a case, the product ions are subjected to further collisional excitation since all the ions drift through the Q-field, which results in "peak-shaped" growth-and-breakdown curves, unlike the "growth curves" usually observed in the QIT system. Based on the observation of individual product ions, it was reported that structural discrimination was possible in the LC-TQ MS system [76]. A further increase in the CID energy breaks the backbone peptide linkages and enables peptide sequence analysis within an ERMS experiment cycle [77]. Therefore, ERMS is quite a useful technique in glycoproteomics research as well. The advantages of ERMS carried out on the TQ-MS system are as follows: 1) isomeric glycan structures can be distinguished, 2) the site of glycan attachment is identified, and 3) the sequence of the peptide portion is obtained at the same time.
One disadvantage of ERMS on the TQ system could be that the MS/MS stage is limited to MS^2 by the instrument design, whereas the number of stages is not limited, in theory, in the QIT system. However, it was shown that mathematical treatment of the ERMS data obtained in the TQ system enabled the analysis of the relationships between the product ions. The data obtained in this way are considered equivalent to those obtained at the MS^n stage in the QIT system [78].
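The difference between QIT "growth curves" and TQ "peak-shaped" curves can be reproduced with a toy model. Everything in the sketch is an assumption made for illustration: dissociation is represented by a logistic function of collision energy, the midpoints and width are arbitrary, and the TQ case is modeled simply as product formation multiplied by product survival.

```python
import math


def dissociated_fraction(energy, midpoint, width=0.1):
    """Logistic stand-in for the fraction of an ion population that dissociates."""
    return 1.0 / (1.0 + math.exp(-(energy - midpoint) / width))


def product_yield_qit(energy, precursor_mid):
    """QIT-like case: only the precursor is activated, so products accumulate."""
    return dissociated_fraction(energy, precursor_mid)


def product_yield_tq(energy, precursor_mid, product_mid):
    """TQ-like case: products are also activated and can break down further."""
    formed = dissociated_fraction(energy, precursor_mid)
    survived = 1.0 - dissociated_fraction(energy, product_mid)
    return formed * survived


energies = [0.5 + 0.1 * i for i in range(16)]  # arbitrary units
for e in energies:
    qit = product_yield_qit(e, 1.0)
    tq = product_yield_tq(e, 1.0, 1.5)
    print(f"{e:.1f}  QIT={qit:.3f}  TQ={tq:.3f}")
```

In this model the QIT-like yield rises monotonically, while the TQ-like yield passes through a maximum and then falls as the product itself is consumed.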

Other methods useful in glycan analysis
So far, we have highlighted the collision-induced dissociation techniques used to obtain useful information in glycan analysis. Other important techniques are available for analyses involving glycans, a few of which are briefly summarized in this section.

Electron-transfer dissociation
Electron-transfer dissociation (ETD) is a promising fragmentation technique in which a radical anion transfers an electron to a peptide precursor cation, causing radical-based fragmentation at the N-Cα bond of the peptide. This process is orthogonal to the vibrational excitation process in CID-based fragmentation. The availability of suitable analytical equipment is also a critical factor for the successful application of ETD. Instruments equipped with both ETD and CID units are available and are advantageous for the analysis of the glycopeptides obtained after proteolytic digestion [7,8,13,14,79,80].

Infrared photodissociation
The mechanism of CID is based on the vibrational activation of ions of biomolecules, triggered by repeated collisions with inert gas molecules, until the transition state is reached and various product ions are formed. A similar activation of ions might be possible by irradiating the trapped ion cloud with a resonant infrared (IR) beam. Although the observed dissociation events have not been explained, it is possible to distinguish isomeric oligosaccharides by this method [81-84].

Ion-mobility mass spectrometry
Ion-mobility mass spectrometry (IMS/MS) has garnered much attention in the isomeric glycan analysis, in conjunction with MS/MS techniques. This technique is executed in a hybrid system with an ion-mobility separation functionality and the MS system. The first unit in this mass spectrometer is the IMS unit, where the ions travel through an electric field filled with inert gas. The unit does not provide sufficient energy for the ion dissociation but affects the traveling speed of the ion depending on the ion cross section. The ions of isomeric glycans differ in their molecular structures, and in turn, their cross sections. Therefore, the ions show different mobilities in an electric field. The technique can be advantageous in analyzing the isomeric glycans and their mixtures [85,86].
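The dependence of drift time on cross section can be sketched for the simplest case. The example assumes a uniform-field drift tube operated in the low-field limit, where the drift velocity is the mobility times the field and, for ions of the same mass and charge in the same gas, the mobility is inversely proportional to the collision cross section. The tube length, field, mobility, and cross-section values are hypothetical and do not come from the review.

```python
def drift_time_ms(length_cm, field_v_per_cm, mobility_cm2_per_vs):
    """Drift time in milliseconds for a uniform field: t = L / (K * E)."""
    velocity_cm_per_s = mobility_cm2_per_vs * field_v_per_cm
    return length_cm / velocity_cm_per_s * 1e3


def mobility_from_reference(ref_mobility, ref_ccs, ccs):
    """ASSUMPTION: same mass, charge and gas, so K scales as 1 / cross section."""
    return ref_mobility * ref_ccs / ccs


# Hypothetical isomer pair with cross sections of 180 and 185 square angstroms.
LENGTH_CM, FIELD_V_PER_CM = 80.0, 20.0
k_a = 300.0                                    # cm^2/(V*s) at operating pressure
k_b = mobility_from_reference(k_a, 180.0, 185.0)

t_a = drift_time_ms(LENGTH_CM, FIELD_V_PER_CM, k_a)
t_b = drift_time_ms(LENGTH_CM, FIELD_V_PER_CM, k_b)
print(f"isomer A: {t_a:.2f} ms, isomer B: {t_b:.2f} ms")
```

Under these assumptions the drift-time ratio equals the cross-section ratio, so a difference of a few percent in cross section appears as a difference of a few percent in arrival time.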

Combinatorial library as the primary information source
Glycans are composed of multiple carbohydrate units, and one aspect of the diversity of this family of molecules can be described by the combinatorial number calculated for an oligomer, as discussed in the section "Chemical aspects of glycan structure". Glycans are synthesized by the sequential reactions of enzymes; hence, only part of the combinatorial structures can exist. This fact indicates that the number of reference compounds is limited and forces us to investigate a glycan structure using limited information. This might seem sufficient when one focuses on molecules obtained from a particular organism, but it cannot be applied to the analysis of molecules from a wide range of natural sources. To address this issue, two methodologies can be considered: 1) establishing a method for determining the absolute configuration of a chiral center or 2) filling the entire possible structural space with reference data for determining the absolute glycan structure.
Further research is needed before we can establish a method for determining the absolute configuration, but we might be able to adopt the second methodology by utilizing a combinatorial oligosaccharide library. As already discussed, a great deal of effort has been devoted to generating combinatorial libraries, although the aim has been to find drug seeds. Such a library can be considered a "seed" of glycan structural information as well. A pioneering study in this direction used a series of synthetic trisaccharides and discussed the energy differences among them toward glycosyl dissociation [87]. Expanding the obtainable information not only generates more reference data, but also enables structural information to be abstracted even if the library is not complete. This might be possible because a combinatorial library consists of a logical series of molecules. Owing to the logical arrangement of structures, some missing data belonging to the absent compounds might be estimated. Several approaches have been undertaken to estimate the missing data.
One of the approaches focusing on the glycosyl dissociation using a small combinatorial library analyzed the effect of anomeric configurations on the dissociation. All the possible anomeric combinations of the sequence, Fuc-Gal-Glc-OR, were analyzed using ERMS, showing that anomeric configurations could be resolved [73]. Furthermore, investigation of all the combinations of anomers and linkage isomers of Fuc-Gal-OR has been reported, with the activation energies for all the 16 isomeric disaccharides being distinguishable, reflecting their structural features [71].
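The size of such small libraries follows from simple enumeration. The sketch below assumes that the 16 Fuc-Gal-OR isomers arise from two fucose anomers, four linkage positions on galactose (2, 3, 4, and 6), and two galactose anomers; this decomposition is an inference from the stated total, and the function names are hypothetical.

```python
from itertools import product

ANOMERS = ("α", "β")
LINKAGE_POSITIONS = (2, 3, 4, 6)  # assumed free hydroxyl positions of a galactopyranoside


def fuc_gal_isomers():
    """All anomer and linkage-position combinations of Fuc-Gal-OR."""
    return [
        f"{fuc}Fuc-(1→{pos})-{gal}Gal-OR"
        for fuc, pos, gal in product(ANOMERS, LINKAGE_POSITIONS, ANOMERS)
    ]


def anomeric_combinations(residues):
    """All anomeric combinations of a fixed sequence, linkage positions left unspecified."""
    return [
        "-".join(f"{a}{r}" for a, r in zip(combo, residues)) + "-OR"
        for combo in product(ANOMERS, repeat=len(residues))
    ]


print(len(fuc_gal_isomers()), "disaccharide isomers")
print(len(anomeric_combinations(["Fuc", "Gal", "Glc"])), "anomeric combinations of Fuc-Gal-Glc-OR")
```

Listing the members explicitly makes it easy to see which compounds are present in a library and which are missing, which is the starting point for estimating data for the absent ones.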
Here, an important finding is that there exists a correlation between the structure and the information obtained from the MS/MS experiments, which indicates that the assumption that some missing data might be estimated using a well-designed logical set of library compounds is reasonable.

Conclusion
MS has contributed to the investigation of glycans with diverse structures. In most cases, the amount of the diverse glycan related compounds isolated from natural sources is limited. This poses difficulties in the structural analysis. MS is advantageous in such instances because of its high sensitivity. Techniques capable of isolating such small amounts of samples are required, in addition to new preparation methods.
Although not included in this review, the recent advances in deriving glycan samples from glycoproteins and glycolipids and their use in the biological investigations are also worth mentioning [88,89].
Furthermore, vast amounts of information cannot be handled without informatics [90][91][92][93][94]. In order to obtain the product ions, a certain form of energy is required for exciting the precursor ion. Research on glycan analysis is certainly gaining momentum, suggesting that more products are being formed. With some impetus, structural analysis of glycans may reveal more important products.