Iterate Through Strings

A common task in computer programming is to analyze the contents of a string by using a loop to visit each character within it. The term “iterate through a string” is often used to describe this general programming pattern. As a super-simple example of this, say we wanted to write a program that printed out each character of a string separately. With a while-loop, this would look like:

string = 'The big brown fox'
i = 0
while i < len(string):
    print(string[i])
    i += 1

With a for-loop that goes through the indexes of the string, the same program would look like:

string = 'The big brown fox'
for i in range(0, len(string)):
    print(string[i])

In addition to these two options, you can also use a for loop to directly iterate through the characters of a string, rather than iterating through the indexes of the string. This looks like:

string = 'The big brown fox'
for char in string:
    print(char)

Take a close look at the difference between these two for-loops. In the latter, we are no longer specifying a range of numbers to loop through. In its place, we put the string variable directly. The for loop recognizes this, and repeats the indented code beneath it for each character in the string, in order. On the first iteration, the char variable is given the value T, on the next it is given the value h, then e, and so on. This is slightly simpler than the option that goes through the indexes, and is a handy feature to keep in mind. (NOTE: This feature of for-loops works with any object that is iterable, such as lists, dictionaries, sets, and strings. We won’t go into the details of what this means at this point. For now, just know that it works!) Generally, when a programmer says “iterate through the string,” any of these options would be acceptable. There are some specific cases where only one option will work, but for now, know that you have options.
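One case where the index-based loop is the natural choice is when the program needs to know where a character sits, not just what it is. As a minimal sketch (the function name find_positions is made up for this example and is not part of Python), the code below collects every index at which a given character appears:

```python
def find_positions(text, target):
    """Return a list of the indexes where target appears in text."""
    positions = []
    # Loop through the indexes so that we know the position of each match
    for i in range(len(text)):
        if text[i] == target:
            positions.append(i)
    return positions


print(find_positions('555-867-5309', '-'))  # prints [3, 7]
```

Looping directly over the characters would tell us that a dash was found, but not where it was found, which is why this sketch uses the indexes.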

With modifications, this programming pattern is useful for all kinds of text analysis in computer programs. Specifically, one thing it is useful for is determining whether a string is formatted in a particular way, or follows specific rules. This issue of validating the format of a string comes up a lot on websites where users have to create an account. Often, websites have rules about how the user profile information has to be formatted. You’ve probably all signed up for a website that has specific rules on what the password needs to be: minimum length of 8, at least one special character, at least one capital letter, etc. There are also rules about the formatting of inputs such as phone numbers or email addresses. A website might require you to enter your phone number in a very specific format, such as XXX-XXX-XXXX with dashes included. String iteration is useful for these kinds of problems.
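To make the password example concrete, here is one possible sketch of how string iteration could check those three rules. The function name check_password is hypothetical, and the definition of "special character" used here (anything that is not a letter or a digit, which includes spaces) is an assumption made for the example; a real website may define it differently.

```python
def check_password(password):
    """Return True if password meets the three example rules."""
    has_capital = False
    has_special = False
    # Visit each character once and record what we have seen so far
    for char in password:
        if char.isupper():
            has_capital = True
        elif not char.isalnum():
            # Assumption: "special" means neither a letter nor a digit
            has_special = True
    return len(password) >= 8 and has_capital and has_special


print(check_password('Secret!23'))  # long enough, has a capital and a '!'
print(check_password('secret123'))  # no capital letter, no special character
```

The loop only gathers facts about the string. The decision is made once at the end, after every character has been looked at.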

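Let's see how we can write a program that validates that a phone number is in this format (XXX-XXX-XXXX). More specifically, we need a program that asks a user for a phone number, and then checks to ensure that:

1. The string is exactly 12 characters long (10 digits plus 2 dashes).
2. The characters at indexes 3 and 7 are dashes.
3. Every other character is a digit.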

The first check is fairly straightforward. We can write an if statement to check for this case.

phone = input('Enter your phone number: ')
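if len(phone) != 12:
    print('Invalid phone number!')

The second check works the same way. This time the if statement looks at the two positions where the dashes belong:

if phone[3] != '-' or phone[7] != '-':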
    print('Invalid phone number!')

Now, we could do step 3 with an if-statement too. However, the code would get kind of ugly, due to the number of positions in the string we would have to check. For example, this code works but is not very pretty.

if phone[0].isdigit() == False or phone[1].isdigit() == False or phone[2].isdigit() == False or phone[4].isdigit() == False or phone[5].isdigit() == False or phone[6].isdigit() == False or phone[8].isdigit() == False or phone[9].isdigit() == False or phone[10].isdigit() == False or phone[11].isdigit() == False: 
    print('Invalid phone number!')

U. G. L. Y. Rather than having this one, gigantic if condition, let's check these digits with a loop instead.

for i in range(0, 12):
    if i != 3 and i != 7 and phone[i].isdigit() == False:
        print('Invalid phone number!')

There we go. Yes, it is one additional line of code, but ultimately it is neater than the previous option. Putting it all together, we should get something like this:

phone = input('Enter your phone number: ')

# Check to ensure that what the user gave us is 12 characters
if len(phone) != 12:
    print('Invalid phone number!')

# Ensure that the dashes are in the correct locations
if phone[3] != '-' or phone[7] != '-':
    print('Invalid phone number!')

# Ensure every other character is a digit
for i in range(0, 12):
    if i != 3 and i != 7 and phone[i].isdigit() == False:
        print('Invalid phone number!')

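This program should correctly tell us if a phone number was not formatted correctly. However, it is still flawed in a few ways. In particular:

(A) The program keeps running after it finds a problem, so it can print the error message several times, or even crash when a later check indexes past the end of a string that is too short.
(B) The program never tells us when the phone number is valid.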

For (A), we can fix this using the exit() function. If you call exit(), it will terminate (end) the program at the line of code that exit runs on, and not continue with the regular program flow. Thus, we can add a call to exit() inside each of the checks for a malformed number. If we do this, we can then solve (B) by putting one more print at the end of the program, since that line is only reached when every check has passed. What we end up with is:

phone = input('Enter your phone number: ')

# Check to ensure that what the user gave us is 12 characters
if len(phone) != 12:
    print('Invalid phone number!')
    exit()

# Ensure that the dashes are in the correct locations
if phone[3] != '-' or phone[7] != '-':
    print('Invalid phone number!')
    exit()

# Ensure every other character is a digit
for i in range(0, 12):
    if i != 3 and i != 7 and phone[i].isdigit() == False:
        print('Invalid phone number!')
        exit()

print('Valid phone number!')

As usual, try this program out yourself in an IDE of your choice and test it with various correct and incorrect cases. Does it work correctly?
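If typing each test case by hand gets tedious, one option is to wrap the same three checks in a function and run it over a list of sample inputs. The sketch below is only one way to do this. The function name is_valid_phone and the sample numbers are made up for the example, and it returns True or False instead of calling exit() so that several inputs can be checked in a single run.

```python
def is_valid_phone(phone):
    """Return True if phone looks like XXX-XXX-XXXX, where each X is a digit."""
    # 3 digits + dash + 3 digits + dash + 4 digits = 12 characters
    if len(phone) != 12:
        return False
    for i in range(len(phone)):
        if i == 3 or i == 7:
            # These two positions must hold the dashes
            if phone[i] != '-':
                return False
        elif not phone[i].isdigit():
            # Every other position must hold a digit
            return False
    return True


# Hypothetical sample inputs: one valid number and three malformed ones
for candidate in ['555-867-5309', '5558675309', '555-8675-309', '555-86a-5309']:
    print(candidate, is_valid_phone(candidate))
```

Only the first sample should print True. The others fail on length, dash position, and a non-digit character, in that order.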