Initializing User Defined Data Structures

How to handle pointers to dynamically allocated memory when programming in C++.

The term user friendly is not the term new programmers usually associate with C++. One of the darkest areas in the entire C++ jungle is the place where students are supposed to find out how to initialize data structures accessed by pointers. Consider Listing 1, a simple program using dynamically allocated arrays coded with the Standard Template Library (STL).

Listing 1. Array class without a copy constructor and an overloaded assignment operator = .


1.  #include <iostream>
2.  using namespace std;
3.  template <class T>
4.  class array
5.              {
6.          private:
7.          T *a;
8.          int array_length;
9.          public:
10.         array(int i) {a=new T[i];array_length=i;};
11.         ~array(){delete [] a;};
12.         int & operator[](int i) const {return a[i];};
13.         };
14. int main(int argc, char* argv[])
15.         {
16.         array<int> w(1);
17.         array<int> x(1);
18.         w[0]=3;
19.         x=w;
20.         w[0]=5;
21.         cout << w[0] << " " << x[0] << endl;
22.         array<int>y=w;
23.         cout << w[0] << " " << y[0] << endl;
24.         w[0]=3;
25.         cout << w[0] << " " << y[0] << endl;
26.         array<int> z(w);
27.         cout << w[0] << " " << z[0] << endl;
28.         w[0]=5;
29.         cout << w[0] << " " << z[0] << endl;
30.         return 0;
31.         }

Not much is going on here. The output from lines 21, 23, 25, 27 and 29 should be:


5  3
5  5
3  5
3  5
5  3


But, it comes as a surprise to some people that the actual output is:



5  5 
5  5 
3  3 
3  3 
5  5


When w[0] was changed to 5, x[0] also was changed. This happened because the program uses pointers to dynamically allocated memory. The array variables used in the program in Listing 1 are pointers to dynamically allocated memory. Line 19 of the program copies the address of variable w to variable x. This has the effect of making x an alias of w. In short, array x really doesn't exist as an independent data structure. It simply is another name for array w. In many textbooks, this aliasing is called a shallow copy. The default constructor and assignment operator always produce a shallow copy. In situations where no dynamically allocated data structures exist, the default constructor and assignment operator are all that are necessary. However, in Listing 1, reliance on the defaults is disastrous. The results are the same even if array x is allocated and initialized in one step with array<int>=w; (line 23) or array<int>=x(w); (line 29). There is an additional bit of news: if you run the program in a debugger, it not only prints the wrong answers but it also ends with a segmentation fault.

Thankfully, these types of problems can be solved easily by overloading the assignment operator = and adding a copy constructor, as demonstrated in Listing 2. First, overload the assignment = operator. Lines 24-36 describe the operator overload. The first interesting statement is line 27. The if statement if(this!=& source) avoids wasting time on tautological statements, such as a=a;. The remainder of the method first creates a new instance of the array and then copies each element of the array on the right side of the assignment statement to the corresponding position in the array on the left side of the assignment statement. This is what most textbooks call a deep copy.

Listing 2. Array class with a copy constructor and an overloaded assignment operator = .



1.  #include <iostream>
2.  using namespace std;
3.  template <class T>
4.  class array
5.        {
6.        private:
7.        T *a;
8.        int array_length;
9.        public:
10.        array(int i) {a=new T[i];array_length=i;};
11.        array(const array &);
12.        ~array(){delete [] a ;};
13.        int & operator[](int i) const {return a[i];};
14.        array<T> & operator=(const array &);
15.        };
16. template <class T>
17. array<T>::array(const array &source)
18.        {
19.        array_length=source.array_length;
20.        a=new T[array_length];
21.        for(int i=0;i<array_length;i++)
22.         a[i]=source.a[i];
23.        };
24. template <class T>
25. array<T> & array<T>::operator=(const array &source)
26.        {
27.        if(this!=&source)
28.                {
29.                 array_length=source.array_length;
30.                 delete [] a;
31.                 a=new T[array_length];
32.                 for(int i=0;i<array_length;i++)
33.                     a[i]=source.a[i];
34.                }
35.        return *this;
36.        };
37. int main(int argc, char* argv[])
38.        {
39.        array<int> w(1);
40.        array<int> x(1);
41.        w[0]=3;
42.        x=w;
43.        w[0]=5;
44.        cout << w[0] << " " << x[0] << endl;
45.        array<int>y=w;
46.        cout << w[0] << " " << y[0] << endl;
47.        w[0]=3;
48.        cout << w[0] << " " << y[0] << endl;
49.        array<int> z(w);
50.        cout << w[0] << " " << z[0] << endl;
51.        w[0]=5;
52.        cout << w[0] << " " << z[0] << endl;
53.        return 0;
54.        }


What is a copy constructor and why is it necessary? Essentially, the copy constructor performs the same function as the overload of the assignment operator. First, a new instance of the object is created (line 20). Second, each element of the source object is copied to its corresponding element in the new object (lines 21-22). You can use a debugger to trace this program. The statement in line 48, array<int>y=w;, invokes the copy constructor as well as the overloaded assignment operator. Each element of the source instance of the array is copied to the corresponding element of the target instance of the array. The statement array<int>z(w); (line 54) needs to invoke only the copy constructor to achieve the same effect. Making these changes eliminates the unintentionally created alias we saw in Listing 1.

Finally, what about that segmentation fault? The copy constructor and assignment overload also fix that problem. The problem is caused by the way the deconstructor interacts with the variable that was a shallow copy. At program termination, the deconstructor first deletes the real variable. It then tries to delete the second variable. Because the shallow copy really created only one variable, however, and it was deleted by the first call of the deconstructor, the second call of the destructor has nothing to delete, thus causing a segmentation fault. It is interesting that the segmentation fault occurs only in the debugger; it does not occur when the program runs on the command line.

C++ is a wonderfully powerful programming language. Some sections of the language, however, are not intuitive. The simple acts of copying and initializing data structures accessed by pointers can lead to surprisingly incorrect results. Whenever you need to create these types of data structures, always create a copy constructor and overload the assignment operator = . These two simple steps can save you hours of debugging time.

William F. Simpson, PhD, is an Associate Professor of Computer Science at Emporia State University.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Re: Initializing User Defined Data Structures

caspitas's picture

Thank you very much for this article. :-) It has help me a lot with some wrong concepts that I had.

Caspitas
Webmaster from

Billetes de avion.ORG

Re: Initializing User Defined Data Structures

AndrewBlucher's picture

One comment in the article could mislead novices:
"Essentially, the copy constructor performs the same function as the overload of the assignment operator".

I see this misunderstanding among novices frequently. As can be seen in the code, the op= does more work: it destroys the old array. In addition, line 27 does more than just avoid wasting time; in the current example the result will be incorrect, usually with a memory access violation, because after delete-ing the array we then try to copy it's contents.

For more, see the FAQ on Assignment operators, at
http://www.parashift.com/c++-faq-lite/assignment-operators.html

Enjoy,
Andy

Re: Initializing User Defined Data Structures

Anonymous's picture

Even with the suggested "improvements", the array class template is very unsafe to use. There are many operations in the copy-assignment operator and the copy-constructor that can fail, resulting in an exception being thrown:
- memory allocations
- default construction of the T objects
- copy-assignment of the T objects

These cause both the copy-constructor and the copy-assignment operator to lose memory; additionally, the copy-assignment operator may leave its left-hand argument in an unusable (it can't even be correctly destructed) state.

Every good C++ book teaches about how to correctly implement copy-constructors and copy-assignment operators. The reviews at http://www.accu.org/ are helpful in finding a good book.

Re: Initializing User Defined Data Structures

Anonymous's picture

shouldn't the index operator on line 13 be
T& operator[](int i) const {return a[i];};

instead of
int & operator[](int i) const {return a[i];};

Severe Bug!

Anonymous's picture

The shown code has a very severe Bug in it!
In the assignment operator beginning at line 25, first delete is called on "a", then new is called, whose result is assigned to "a". Now, if new fails and consequently throws an exception, "a" points do deleted storage.

Furthermore, this type is performance-wise a bad choice, if parameterized with types which have a non-trivial default constructor, because the copy constructor and assignment operator first create default initialized objects, and then assign to these objects.

And those are not all aspects which could be optimized.

Re: Severe Bug!

Anonymous's picture

Yep, will rarely cause problems in academic code... ;-)

Easy to fix though, and for posterity here is one way:
Change:

29. array_length=source.array_length;
30. delete [] a;
31. a=new T[array_length];

To:

T * temp;
int len;
len = source.array_length;
temp = new T[len];
// now we are ok.
delete [] a;
a = temp;
array_length = len;

The performance issues need to be considered they cause a problem. To be fair, STL containers have the same problem. Of course, if anybody still makes array template classes by hand they should be hit over the head with a hardcover copy of Stroustrup. IMO you need a very good reason to not use the standard library.

Re: Severe Bug!

Anonymous's picture

I do not think that the STL containers suffer from this problem. It is very likely that implementations construct the final objects on uninitialized storage, so that default constructors are not called in this situation.

Re: Initializing User Defined Data Structures

Anonymous's picture

If you need the destructor, more often than not you will need a copy constructor and an assignment operator.

I like the article

Anonymous's picture

As an extreme novice at C/C++, found that rather interesting. Thanks - I'm sure it'll save me a lot of cursing and frustration later.

It does, however, further reinforce my general preference to stay as far as possible from C++ or C, in favour of working in Python. Sure, it's a bit slower - but you don't need to battle the memory management, you get advanced data structures and functional programming tools out of the box, etc.

Of course, none of that is any good if you're working with others on an existing C++ project...

Re: I like the article

Anonymous's picture

Of course, none of that is any good if you're working with others on an existing C++ project...

... and if you have to optimize for memory size or speed. That used to be a very uncommon thing in the last 2-3 years, but now gains more interest since a lot of embedded stuff is being developed.

Re: I like the article

Anonymous's picture

Amusingly, I just realised that the problem you describe is very similar to a common one people hit in Python, when working with most mutable data types and with functions.

>>> x = [1,2,3,4]
>>> y = x
>>> y.append(5)
>>> print x
[1,2,3,4,5,6]

so this advice is perhaps more universal than anticpated.

Re: Initializing User Defined Data Structures

Anonymous's picture

Whenever you need to create these types of data structures, always create a copy constructor and overload the assignment operator = .

Or just use std::vector...