Monday, April 25, 2011

Collections in Java


Morpho butterfly   Collection

You will see the term used in three contexts:
  1. collection in data structure terms, is a group of elements. It includes Sets and Lists. It also includes Maps which don’t implement Collection.
  2. Collection is the interface implemented by the collection classes such as ArrayListTreeSet and HashSet implement. It defines the methods common to them all.
  3. Collections is a class of utility methods for working on collections, e.g. sorting and binary search, even though they only work on Lists.
Most of the Collection classes live in the java.util package.
In Java version 1.5 or later the Collection classes are generified to keep track of the class of the elements permitted to be contained in them.
All classes that implement Collection automatically implement the superinterface Iterable, allowing you to iterate over all their elements.

Gotchas

  • A common programming error is to use the interface Collection when you meant the class Collections and vice versa.
  • The sort method is not part of the Collection interface. It lives in the Collections class. So you cannot say myCollectionsort(). You must sayCollections.sortmyCollection ); However, sort will not work on Collections generally, only Lists.
  • Maps such as HashMap are not Collections, i.e. they don’t implement the Collection interface.

Examples of Collections

Some types of collections that implement Collection include the following:
Collection Classes and Interfaces
ClassImplements
or Extends
Collection
Interface
Interface or CollectionSynchronisedNotes
AbstractCollectionabstract class an interface to act as skeleton to build most classes that implement theCollection, interface.
java.lang.reflect.Arrayclass Reflection classes used to dynamically create Array objects where you don’t know anything about them at compile time. You probably were looking for Arraysinstead.
java.sql.Arrayinterface A class for storing and retrieving arrays in an SQL database. You probably were looking for Arrays instead.
ArrayListclassA non-thread-safe List that is marginally faster than thread-safe Vector. In versions prior to Java version 1.4, the difference was more pronounced — 3 to 4 times faster. Rolling your own non-Collection array is about 4 times faster thanVector. In earlier non-optimising JVMs, the difference was more like 40 times faster. The main overhead of using ArrayList and Vector is the casting needed to fish out the elements. ArrayList is like a [] array that automatically grows. Lookup by dense ints. Considerably faster than a HashMap.
Arraysclassstatic methods for sorting and searching ordinary arrays, e.g. binarySearch,fillsortasList converts an array into a List Collection you can then feed to methods expecting a Collection or List. Don’t confuse this with the two Array classes.
CollectioninterfaceThe interface that all collections implement that guarantee you can always add,removeclearsizeiteratorcontainstoArray etc. in a standard way. Don’t confuse this with the Collections class. toArray can be made smarter if you pass it an empty array to fill just the right size. Then you have an array of specific classes, not just generic Objects. Use it like this:
Dogs[] dogs =
( Dogs [] )dogArrayList.toArray
( new Dogs [dogArrayList.size()] );
The CollectiontoString method implemented by all Collections is a great debugging tool to dump out the entire contents of a Collection in a reasonably human-readable format enclosing each element in [].
Collectionsclassstatic methods for operating on collections, e.g. sortbinarySearchminmax,shuffle. Don’t confuse the Collections class with the Collectioninterface.
javax.swing.tree.DefaultTreeModelclassThis was written to support Swing’s JTree, but it will suffice as a generic tree collection.
HashMapclassMap, an unsynchronized Hashtable. Allows unordered lookup by key, usually a String. It is thread safe if you only do lookups. Otherwise you must synchronise externally, or use a Hashtable. No duplicates allowed.
HashSetclassSet, a collection of unique objects, without ability to lookup by key, usually a String. Not ordered. Often used to maintain a list of legal values, e.g. state abbreviations. You can quickly find out if a given value is in the legal list.
HashtableclassMap, a synchronised HashMap. Allows unordered lookup by key, usually aString. Note, this is spelled Hashtable not Hash Table.
IdentityHashMapclassIt is like a Hashtable. Used in manipulating graphs where you have to keep track of which nodes have already been visited. In Java version 1.4 or later. It compares with == instead of equals.
LinkedHashMapclassHashMap that also threads the objects together, usually in insertion order. This way you can retrieve the elements in insertion order with almost no more overhead than a HashMap. In Java version 1.4 or later.
LinkedListclassList that behaves like Vector, but implemented as a doubly linked chain of objects. Fast for insert/delete. Slow for indexing.
Listinterface specifies an ordered sequence of elements. Implemented by ArrayListVectorand LinkedList.
Mapinterface supports lookup of objects by unique key. Keys may or may not be ordered.
Queueinterface Java version 1.5 or later. Used for enqueing objects waiting to be processed. Has some very complex multi-thread abilities. It might be looked on more as threading synchronisation package than a collection.
RandomAccessinterface An interface does not do anything. It has no methods. It is just a marker that lets you know random access to a List is efficient, e. g. LinkedList would not implement it, whereas ArrayList would. In Java version 1.4 or later.
Setinterface a mathematical set of elements, with no duplication. No lookup by key.
SortedMapinterface Map interface that guarantees sorted keys, usually Strings. implemented byTreeMap.
SortedSetinterface Set interface that guarantees iteration will be in ascending order. Implemented byTreeSet
Stackclass pushdown stack, a LIFO (Last IFirst Out) queue. You can also fudge a simple LIFO stack with LinkedList using use only addFirstremoveFirst.
TreeMapclassSortedMap using a red-black tree. Allows ordered lookup by key, usuallyStrings. Does not allow duplicate keys, unless you cheat to fool it into thinking they are not really duplicates. Slower than HashMap.
TreeSetclassSortedSet using a red-black tree. Allows ordered access, but no access by key. Slower than HashSet.
VectorclassList. This is the same old Vector you are familiar with, now implementingCollection. It acts like a []- style array that automatically grows as needed. It is a synchronised ArrayList, i.e. suitable for use when more than one thread accesses the Vector.

Missing Collections

Collections that you would expect to find in Java prior to 1.5 but which are not included are:
  • FIFOQueue, a FIFO (First IFirst Out) buffering queue.
  • FIFOPriorityQueue, a FIFO queue with N levels of priority, so that the top priority items are normally taken first, but it is possible to service the lower priorities less frequently, e.g. serve four from top level, for every two from next level for eveyone from the level below, or with priority bumping if an item waits in a queue past a given threshold. You could use it for applications like airline boarding, emergency waiting rooms, computer task scheduling or even your personal todo list. You can fudge a simple FIFO queue with LinkedList using use only addFirstremoveLast.
  • SortedList that gives the illusion of always automatically staying sorted. I wrote one.
  • Cached collections that automatically drop elements not used in a long time to keep the collection from growing too large.
  • Persistent collections that automatically load from disk and save back to disk.
  • For other missing collections check out these sources:
    AmberArcher
    Jakarta Common Collections (includes bidirectional map)
    PCJ
    Trove4J
    Walt Disney Tea Trove (includes: FlyweightSet, PropertyMap, SortedArrayList)
In Java version 1.5 some of these deficiencies were rectified with the Queue interface.

Duplicate Keys

Nearly all the collections prevent you from storing duplicate keys. There are several ways around that:
  • Write your own collection that permit duplicates.
  • Invent tie breaker logic to create auxiliary tail ends to the key to artificially make all keys unique, e. g. First key "rover" goes in as "rover" and the next as "rover2" Before inserting, you must alway check if the key has already been added, and if so, try a tie breaker until you find one that has not.
  • Instead of storing data objects in your map, store a FIFOQueue of data objects that share the same key. This preserves FIFO order on duplicates. When there are no duplicate for a given keys, store a data object directly. You could similarly handle the problem with ArrayLists of duplicates instead of FIFOQueues. FIFOQueues have the advantage of not needing to estimate the number of duplicates for each key ahead of time.