Introduction to Tokens
In C++, a token is the smallest individual element of a program that is meaningful to the compiler. Tokens are the building blocks of any C++ program. The C++ compiler breaks down a program into these tokens during the lexical analysis phase of compilation.
Types of Tokens in C++
C++ has five types of tokens:
- Keywords
- Identifiers
- Literals
- Operators
- Punctuators/Separators
Let’s examine each type in detail:
1. Keywords
Keywords are predefined, reserved words in C++ that have special meanings to the compiler. They cannot be used as identifiers (such as variable names).
Common keywords in C++ include:
auto break case catch char class
const continue default delete do double
else enum explicit export extern false
float for friend goto if inline
int long mutable namespace new operator
private protected public register return short
signed sizeof static struct switch template
this throw true try typedef typeid
typename union unsigned using virtual void
volatile wchar_t while
Example usage:
int main() {
    bool condition = true;
    if (condition) {
        return 0;
    } else {
        return 1;
    }
}
In this example, int, bool, true, if, else, and return are keywords.
2. Identifiers
Identifiers are names given by programmers to entities such as variables, functions, classes, etc. They help in identifying and referring to these entities in the program.
Rules for Writing Identifiers in C++:
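- Identifiers can contain letters (both uppercase and lowercase), digits, and underscores (_)
- Identifiers must begin with either a letter or an underscore
- Identifiers cannot begin with a digit
- Keywords cannot be used as identifiers
- Identifiers are case-sensitive (myVariable and myvariable are different)
- No spaces or special characters (like !, @, #, $, %, etc.) are allowed, although some compilers accept $ as a non-standard extension
- The C++ standard sets no limit on length and treats every character as significant, although individual compilers may impose their own limits
- Names that contain a double underscore (__) or that begin with an underscore followed by an uppercase letter are reserved for the implementation, and names that begin with an underscore are reserved in the global namespace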
Valid Identifiers:
age
_count
totalAmount
user_name
myVariable1
_temp
firstName
Invalid Identifiers:
123num // Cannot start with a digit
my-name // Hyphen not allowed
class // This is a keyword
user name // No spaces allowed
total$amount // Special character $ not allowed
Naming Conventions:
While not enforced by the compiler, following standard naming conventions improves code readability:
- Camel Case: First letter of the first word is lowercase, and first letter of each subsequent word is uppercase
  - Example: studentName, totalAmount
- Pascal Case: First letter of each word is uppercase
  - Example: StudentName, TotalAmount
- Snake Case: Words are separated by underscores and all letters are lowercase
  - Example: student_name, total_amount
- Screaming Snake Case: Words are separated by underscores and all letters are uppercase (often used for constants)
  - Example: MAX_VALUE, PI_VALUE
3. Literals
Literals are fixed values that appear directly in the code and do not change during execution.
Types of Literals:
- Integer Literals:
  42 // Decimal (base 10)
  042 // Octal (base 8, starts with 0)
  0x2A // Hexadecimal (base 16, starts with 0x)
  0b101010 // Binary (base 2, starts with 0b, C++14 feature)
- Floating-Point Literals:
  3.14159 // Double by default
  3.14159f // Float (suffix f or F)
  3.14159L // Long double (suffix l or L)
  1.5e10 // Scientific notation (1.5 × 10^10)
- Character Literals:
  'A' // Single character
  '\n' // Newline escape sequence
  '\t' // Tab escape sequence
  '\0' // Null character
- String Literals:
  "Hello, World!"
  "C++ Programming"
  "Line 1\nLine 2" // With escape sequence
- Boolean Literals:
  true
  false
- Pointer Literal:
  nullptr // Represents null pointer (C++11 feature)
4. Operators
Operators are symbols that tell the compiler to perform specific mathematical or logical operations.
Categories of Operators:
- Arithmetic Operators: +, -, *, /, %
- Relational Operators: ==, !=, >, <, >=, <=
- Logical Operators: &&, ||, !
- Bitwise Operators: &, |, ^, ~, <<, >>
- Assignment Operators: =, +=, -=, *=, /=, %=, etc.
- Increment/Decrement Operators: ++, --
- Member Access Operators: ., ->
- Other Operators: sizeof, ?: (conditional), etc.
5. Punctuators/Separators
Punctuators or separators are symbols that help in grouping and separating parts of the program.
Common punctuators include:
{ } // Braces (for blocks of code)
( ) // Parentheses (for expressions and function calls)
[ ] // Brackets (for arrays)
; // Semicolon (statement terminator)
, // Comma (separator for parameters or expressions)
: // Colon (for labels and class access specifiers)
# // Pound/Hash (for preprocessor directives)
Example of Tokens in a C++ Program
Let’s break down a simple C++ program into tokens:
#include <iostream>

int main() {
    int sum = 10 + 20;
    std::cout << "Sum is: " << sum << std::endl;
    return 0;
}
Token breakdown:
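- Keywords: int, return
- Identifiers: main, sum, std, cout, endl
- Literals: 10, 20, 0, "Sum is: "
- Operators: =, +, <<
- Punctuators: #, (, ), {, }, ;, ::
The #include line is consumed by the preprocessor, which reads <iostream> as a single header-name token, so the < and > on that line are not separate tokens.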
Best Practices for Using Identifiers
- Use descriptive names: Choose names that clearly indicate the purpose of the identifier
  // Not descriptive
  int x = 5;
  // Descriptive
  int studentAge = 5;
- Consistency: Follow a consistent naming convention throughout your code
- Avoid excessively long names: While descriptive, extremely long names can make code harder to read
  // Too long
  int numberOfStudentsWhoPassedTheExamination = 25;
  // Better
  int studentsWhoPassedExam = 25;
- Avoid similar names: Avoid names that look similar and might cause confusion
  // Confusing
  int userInput = 10;
  int userinput = 20;
- Use meaningful names for loop variables: While i, j, and k are traditional for simple loops, more descriptive names can improve readability in complex situations
  // Traditional
  for (int i = 0; i < 10; i++) { ... }
  // More descriptive when needed
  for (int studentIndex = 0; studentIndex < totalStudents; studentIndex++) { ... }
Common Errors Related to Identifiers
- Using reserved keywords as identifiers
  int class = 10; // Error: 'class' is a keyword
- Beginning an identifier with a digit
  int 1count = 10; // Error: identifiers can't start with a digit
- Using special characters in identifiers
  int salary$ = 5000; // Not portable: '$' is not allowed in standard identifiers (some compilers accept it as an extension)
- Using already declared identifiers
  int value = 10;
  int value = 20; // Error: redeclaration of 'value' in the same scope
- Case sensitivity issues
  int Count = 10;
  count = 20; // Error: 'count' undeclared (Note: Count ≠ count)
Understanding tokens, especially identifiers, is crucial for writing syntactically correct and readable C++ code. Following the rules and best practices for identifiers will make your code more maintainable and less prone to errors.