Initializing User Defined Data Structures

by William F. Simpson

User friendly is not a term new programmers usually associate with C++. One of the darkest areas in the entire C++ jungle is the place where students are supposed to find out how to initialize data structures accessed by pointers. Consider Listing 1, a simple program that implements a dynamically allocated array as a class template.

Listing 1. Array class without a copy constructor and an overloaded assignment operator = .

1.  #include <iostream>
2.  using namespace std;
3.  template <class T>
4.  class array
5.              {
6.          private:
7.          T *a;
8.          int array_length;
9.          public:
10.         array(int i) {a=new T[i];array_length=i;};
11.         ~array(){delete [] a;};
12.         T & operator[](int i) const {return a[i];};
13.         };
14. int main(int argc, char* argv[])
15.         {
16.         array<int> w(1);
17.         array<int> x(1);
18.         w[0]=3;
19.         x=w;
20.         w[0]=5;
21.         cout << w[0] << " " << x[0] << endl;
22.         array<int>y=w;
23.         cout << w[0] << " " << y[0] << endl;
24.         w[0]=3;
25.         cout << w[0] << " " << y[0] << endl;
26.         array<int> z(w);
27.         cout << w[0] << " " << z[0] << endl;
28.         w[0]=5;
29.         cout << w[0] << " " << z[0] << endl;
30.         return 0;
31.         }

Not much is going on here. If every array were an independent object, lines 21, 23, 25, 27 and 29 would print:

5  3
5  5
3  5
3  3
5  3

It comes as a surprise to some people, however, that the actual output is:

5  5 
5  5 
3  3 
3  3 
5  5

When w[0] was changed to 5, x[0] also changed. This happened because each array object in Listing 1 holds a pointer to dynamically allocated memory. Line 19 of the program copies the members of w into x one by one, so the pointer stored in w is copied to x and both objects now refer to the same block of memory. This has the effect of making x an alias of w. In short, array x no longer exists as an independent data structure; it simply is another name for array w, and the block originally allocated for x is leaked. In many textbooks, this aliasing is called a shallow copy. The compiler-generated copy constructor and assignment operator always copy a pointer member this way. In classes that own no dynamically allocated data, the compiler-generated versions are all that is necessary. However, in Listing 1, reliance on the defaults is disastrous. The results are the same when an array is declared and initialized in one step with array<int>y=w; (line 22) or array<int> z(w); (line 26). There is an additional bit of news: if you run the program in a debugger, it not only prints the wrong answers but it also ends with a segmentation fault.
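The aliasing can be reproduced in isolation. The following is a minimal sketch rather than part of the original listings: shallow_array and shallow_copy_shares_storage are hypothetical names, and the struct deliberately has no destructor so that the shared block is freed exactly once, by hand.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical stand-in for the array class in Listing 1: one pointer, one length.
// It has no destructor on purpose, so nothing is deleted twice in this sketch.
struct shallow_array {
    int *a;
    int array_length;
};

// Returns true when a memberwise copy leaves both objects sharing one block.
bool shallow_copy_shares_storage() {
    shallow_array w{new int[1], 1};
    w.a[0] = 3;

    shallow_array x = w;   // compiler-generated copy: copies the pointer, not the ints
    w.a[0] = 5;            // write through w ...

    // ... and the change is visible through x, because x.a == w.a
    bool aliased = (x.a == w.a) && (x.a[0] == 5);
    std::cout << w.a[0] << " " << x.a[0] << "\n";   // prints "5 5"

    delete[] w.a;          // free the single block once; x.a now dangles
    return aliased;
}
```

Calling shallow_copy_shares_storage() from main prints the same "5 5" pair seen in the actual output of Listing 1.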

Thankfully, these types of problems can be solved easily by overloading the assignment operator = and adding a copy constructor, as demonstrated in Listing 2. First, overload the assignment operator =. Lines 24-36 define the operator overload. The first interesting statement is line 27. The test if(this!=&source) guards against self-assignment, such as a=a;. It avoids wasted work and, more importantly, prevents line 30 from deleting the very elements that lines 32-33 are about to copy. The remainder of the method releases the storage held by the array on the left side of the assignment, allocates a new block of the right length and then copies each element of the array on the right side of the assignment statement to the corresponding position in the array on the left side. This is what most textbooks call a deep copy.

Listing 2. Array class with a copy constructor and an overloaded assignment operator = .

1.  #include <iostream>
2.  using namespace std;
3.  template <class T>
4.  class array
5.        {
6.        private:
7.        T *a;
8.        int array_length;
9.        public:
10.        array(int i) {a=new T[i];array_length=i;};
11.        array(const array &);
12.        ~array(){delete [] a ;};
13.        T & operator[](int i) const {return a[i];};
14.        array<T> & operator=(const array &);
15.        };
16. template <class T>
17. array<T>::array(const array &source)
18.        {
19.        array_length=source.array_length;
20.        a=new T[array_length];
21.        for(int i=0;i<array_length;i++)
22.         a[i]=source.a[i];
23.        };
24. template <class T>
25. array<T> & array<T>::operator=(const array &source)
26.        {
27.        if(this!=&source)
28.                {
29.                 array_length=source.array_length;
30.                 delete [] a;
31.                 a=new T[array_length];
32.                 for(int i=0;i<array_length;i++)
33.                     a[i]=source.a[i];
34.                }
35.        return *this;
36.        };
37. int main(int argc, char* argv[])
38.        {
39.        array<int> w(1);
40.        array<int> x(1);
41.        w[0]=3;
42.        x=w;
43.        w[0]=5;
44.        cout << w[0] << " " << x[0] << endl;
45.        array<int>y=w;
46.        cout << w[0] << " " << y[0] << endl;
47.        w[0]=3;
48.        cout << w[0] << " " << y[0] << endl;
49.        array<int> z(w);
50.        cout << w[0] << " " << z[0] << endl;
51.        w[0]=5;
52.        cout << w[0] << " " << z[0] << endl;
53.        return 0;
54.        }
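The assignment operator in Listing 2 releases the old storage before it allocates the new block. A common variation, sketched below as an assumption rather than as the author's code, allocates and fills the new block first, so the object is left untouched if new throws. The class int_array and the function assign_then_modify are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical int-only variant of the array class in Listing 2.
class int_array {
    int *a;
    int array_length;
public:
    explicit int_array(int n) : a(new int[n]()), array_length(n) {}
    int_array(const int_array &source)
        : a(new int[source.array_length]), array_length(source.array_length) {
        for (int i = 0; i < array_length; i++)
            a[i] = source.a[i];
    }
    ~int_array() { delete[] a; }
    int &operator[](int i) { return a[i]; }

    int_array &operator=(const int_array &source) {
        if (this != &source) {
            // Allocate and fill the new block first; if new throws, *this is unchanged.
            int *fresh = new int[source.array_length];
            for (int i = 0; i < source.array_length; i++)
                fresh[i] = source.a[i];
            delete[] a;                        // only now release the old block
            a = fresh;
            array_length = source.array_length;
        }
        return *this;
    }
};

// Repeats lines 39-43 of Listing 2 and adds a self-assignment for good measure.
int assign_then_modify() {
    int_array w(1), x(1);
    w[0] = 3;
    x = w;                  // deep copy: x gets its own block holding 3
    int_array &alias = x;
    x = alias;              // self-assignment is caught by the guard and does nothing
    w[0] = 5;               // changes w only
    return x[0];            // still 3
}
```

The guard against self-assignment is kept, although with this ordering it only saves work; the copy would be correct without it.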

What is a copy constructor and why is it necessary? A copy constructor initializes a new object from an existing one, whereas the assignment operator overwrites an object that already exists. Otherwise, the copy constructor does essentially the same work as the overloaded assignment operator, except that it has no old storage to release. First, storage for the new object is allocated (line 20). Second, each element of the source object is copied to its corresponding element in the new object (lines 21-22). You can use a debugger to trace this program. The statement in line 45, array<int>y=w;, looks like an assignment, but it is an initialization: it invokes only the copy constructor, and the overloaded assignment operator is not called. The statement array<int> z(w); (line 49) invokes the copy constructor in the same way. Making these changes eliminates the unintentionally created alias we saw in Listing 1.
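Which member function runs for each statement can also be checked without a debugger. The sketch below uses a hypothetical class, traced, whose constructors and assignment operator each append one letter to a log; the names traced and trace_copies are assumptions made for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical tracing class: every special member records one letter.
struct traced {
    static std::string &log() { static std::string s; return s; }
    traced() { log() += "D"; }                                          // default constructor
    traced(const traced &) { log() += "C"; }                            // copy constructor
    traced &operator=(const traced &) { log() += "A"; return *this; }   // assignment
};

// Mirrors the declarations and the assignment in main of Listing 2.
std::string trace_copies() {
    traced::log().clear();
    traced w;       // D
    traced x;       // D
    x = w;          // A: x already exists, so this is an assignment
    traced y = w;   // C: initialization, copy constructor only
    traced z(w);    // C: copy constructor again
    return traced::log();   // "DDACC"
}
```

The single A in the log comes from x=w;. Neither traced y = w; nor traced z(w); touches the assignment operator.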

Finally, what about that segmentation fault? The copy constructor and assignment overload also fix that problem. The problem is caused by the way the destructor interacts with the shallow copies. When main returns, the destructor runs once for each array object. The first call deletes the shared block of memory. The next call, made for an object that holds the same pointer, tries to delete that block a second time. Deleting the same memory twice is undefined behavior: the program may crash with a segmentation fault, or it may appear to finish normally. That is consistent with the observation that the segmentation fault occurs in the debugger but not when the program runs on the command line.
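The double deletion can be made visible without triggering it. In the sketch below, a hypothetical class recording_array has a destructor that records the pointer it would have deleted instead of deleting it; the names recording_array, released and same_block_released_twice are assumptions for this illustration.

```cpp
#include <cassert>
#include <vector>

// Addresses that the destructors would have passed to delete [].
static std::vector<int *> released;

// Hypothetical class shaped like Listing 1, except that the destructor only records.
struct recording_array {
    int *a;
    explicit recording_array(int n) : a(new int[n]) {}
    ~recording_array() { released.push_back(a); }   // stands in for delete [] a;
};

// Returns true when two destructor calls target the same block of memory.
bool same_block_released_twice() {
    released.clear();
    int *block = nullptr;
    {
        recording_array w(1);
        recording_array y = w;   // shallow copy: y.a == w.a
        block = w.a;
    }                            // y is destroyed first, then w
    bool twice = released.size() == 2 &&
                 released[0] == block && released[1] == block;
    delete[] block;              // free the block exactly once
    return twice;
}
```

With the real destructor from Listing 1, the second of those two calls would be the invalid delete.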

C++ is a wonderfully powerful programming language. Some sections of the language, however, are not intuitive. The simple acts of copying and initializing data structures accessed by pointers can lead to surprisingly incorrect results. Whenever you need to create these types of data structures, always create a copy constructor and overload the assignment operator = . These two simple steps can save you hours of debugging time.

William F. Simpson, PhD, is an Associate Professor of Computer Science at Emporia State University.
