The C++Course provides a general introduction to programming in C++. It is based on A.B. Downey's book, How to Think Like a Computer Scientist. Click here for details. 
Home File Input/Output and pmatrices The Set Data Structure  
The Set Data Structure
A data structure is a container for grouping a collection of data into a single object. We have seen some examples already, including pstrings, which are collections of characters, and pvectors which are collections on any type. An ordered set is a collection of items with two defining properties:
In addition, our implementation of an ordered set will have the following property:
Both pstrings and pvectors have an ordering; every element has an index we can use to identify it. Both none of the data structures we have seen so far have the properties of uniqueness or arbitrary size. To achieve uniqueness, we have to write an add function that searches the set to see if it already exists. To make the set expand as elements are added, we can take advantage of the resize function on pvectors. Here is the beginning of a class definition for a Set. class Set {private: pvector<pstring> elements; int numElements; public: Set (int n); int getNumElements () const; pstring getElement (int i) const; int find (const pstring& s) const; int add (const pstring& s); }; Set::Set (int n) { pvector<pstring> temp (n); elements = temp; numElements = 0; } The instance variables are an pvector of strings and an integer that keeps track of how many elements there are in the set. Keep in mind that the number of elements in the set, numElements, is not the same thing as the size of the pvector. Usually it will be smaller. The Set constructor takes a single parameter, which is the initial size of the pvector. The initial number of elements is always zero. getNumElements and getElement are accessor functions for the instance variables, which are private. numElements is a readonly variable, so we provide a get function but not a set function. int Set::getNumElements () const{ return numElements; } Why do we have to prevent client programs from changing getNumElements? What are the invariants for this type, and how could a client program break an invariant. As we look at the rest of the Set member function, see if you can convince yourself that they all maintain the invariants. When we use the [] operator to access the pvector, it checks to make sure the index is greater than or equal to zero and less than the length of the pvector. To access the elements of a set, though, we need to check a stronger condition. The index has to be less than the number of elements, which might be smaller than the length of the pvector. pstring Set::getElement (int i) const{ if (i < numElements) { return elements[i]; } else { cout << "Set index out of range." << endl; exit (1); } } If getElement gets an index that is out of range, it prints an error message (not the most useful message, I admit), and exits. The interesting functions are find and add. By now, the pattern for traversing and searching should be old hat: int Set::find (const pstring& s) const{ for (int i=0; i<numElements; i++) { if (elements[i] == s) return i; } return 1; } So that leaves us with add. Often the return type for something like add would be void, but in this case it might be useful to make it return the index of the element. int Set::add (const pstring& s){ // if the element is already in the set, return its index int index = find (s); if (index != 1) return index; // if the pvector is full, double its size if (numElements == elements.length()) { elements.resize (elements.length() * 2); } // add the new elements and return its index index = numElements; elements[index] = s; numElements++; return index; } The tricky thing here is that numElements is used in two ways. It is the number of elements in the set, of course, but it is also the index of the next element to be added. It takes a minute to convince yourself that that works, but consider this: when the number of elements is zero, the index of the next element is 0. When the number of elements is equal to the length of the pvector, that means that the vector is full, and we have to allocate more space (using resize) before we can add the new element. Here is a state diagram showing a Set object that initially contains space for 2 elements. Now we can use the Set class to keep track of the cities we find in the file. In main we create the Set with an initial size of 2: Set cities (2);Then in processLine we add both cities to the Set and store the index that gets returned. int index1 = cities.add (city1);int index2 = cities.add (city2); I modified processLine to take the cities object as a second parameter.


Home File Input/Output and pmatrices The Set Data Structure 