GSIP 68 - Introduce GUAVA library as dependency

Overview

Introduce guava-libraries as a GeoServer core dependency and provide some general guidelines on when, why, and how to use them.

Proposed By

Gabriel Roldán

Assigned to Release

2.2.0.

State

Completed

Motivation

I've been using some of the Guava utilities for most of last year in other GeoServer-related projects. On the mailing list we decided a GSIP would be worthwhile, both as an introduction to its benefits and as a reference for other GeoServer developers.

Proposal

This proposal aims to introduce the Google guava-libraries as a core GeoServer dependency and to provide some guidelines and material for the progressive adoption of its utility classes, ranging from collection utilities to I/O, concurrency, primitive and String operations, cache facilities, and more.

What does it bring in?

In a nutshell, an excerpt from the Guava Explained wiki:

  • Basic utilities: Make using the Java language more pleasant
  • Collections: Guava's extensions to the JDK collections ecosystem. These are some of the most mature and popular parts of Guava.
  • Caches: Local caching, done right, and supporting a wide variety of expiration behaviors.
  • Functional idioms: Used sparingly, Guava's functional idioms can significantly simplify code.
  • Concurrency: Powerful, simple abstractions to make it easier to write correct concurrent code.
  • Strings: A few extremely useful string utilities: splitting, joining, padding, and more.
  • Primitives: operations on primitive types, like int and char, not provided by the JDK, including unsigned variants for some types.
  • Ranges: Guava's powerful API for dealing with ranges on Comparable types, both continuous and discrete.
  • I/O: Simplified I/O operations, especially on whole I/O streams and files, for Java 5 and 6.
  • Hashing: Tools for more sophisticated hashes than what's provided by Object.hashCode(), including Bloom filters.
  • EventBus: Publish-subscribe-style communication between components without requiring the components to explicitly register with one another.
  • Math: Optimized, thoroughly tested math utilities not provided by the JDK.

How easy is it to add?

It's on maven central, so just:

    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>11.0.1</version>
    </dependency>

What about size/footprint?

It's a single but sizable JAR, around 1.5 MB. To avoid increasing the size of our downloads too much, it looks like we could at least get rid of the following libraries (thanks Andrea):
1.6M aspectjweaver-1.6.8.jar
1.2M xercesImpl-2.7.1.jar

What are some concrete benefits for GeoServer?

The following are just some small concrete examples of using Guava utilities in GeoServer, and focus only on the bits that I got to use so far.

Caches.

We use a lot of caches, especially in core classes like CatalogImpl and ResourcePool.
Some are plain HashMaps, others are custom-crafted specializations of SoftValueHashMap. Some need to do additional clean-up when a resource is evicted from the cache.
ResourcePool has all these cases. Replacing those HashMaps and custom classes with Guava Cache lets us do more with less code:

  • Set cache capacity bound;
  • Entry expiration based on last access time or last write time;
  • Ability to use weak keys and/or soft value references;
  • Concurrency hints (the table is internally partitioned to try to permit the indicated number of concurrent updates without thread contention);
  • For the cases where resource clean-up needs to be done upon entry eviction, the cache population logic and the entry eviction hooks can be encapsulated in a single object, so related logic stays close together:
    
     ....
     // declared with its concrete type because it is used both as loader and as removal listener
     DataStoreLoader loader = new DataStoreLoader();
     LoadingCache<String, DataAccess> dataStoreCache = CacheBuilder.newBuilder()
       .concurrencyLevel(10)
       .expireAfterAccess(10, TimeUnit.MINUTES)
       .initialCapacity(10)
       .maximumSize(100)
       .softValues()
       .removalListener(loader)
       .build(loader);
    
     
     ....
     class DataStoreLoader 
                extends CacheLoader<String, DataAccess> 
                implements RemovalListener<String, DataAccess> {
    
         @Override
         public DataAccess load(String id) throws Exception {
           DataAccess dataStore = ....
           return dataStore;
         }
    
         @Override
         public void onRemoval(RemovalNotification<String, DataAccess> notification) {
             DataAccess da = notification.getValue();
             if (da == null) {
                 // with softValues() the value may already have been garbage collected
                 return;
             }
             try {
                 da.dispose();
             } catch (Exception e) {
                 LOGGER.log(Level.WARNING, e.getMessage(), e);
             }
         }
      }
    
  • Eliminates the need for the "double-checked locking" anti-pattern, so that every get method on cacheable contents becomes basically:
       public Foo getFoo(String someKey) throws ExecutionException {
          return fooCache.get(someKey);
       }
    

instead of

 public Foo getFoo(String someKey){
    Foo foo = fooCache.get(someKey);
    if( foo == null ){
       synchronized(fooCache){
         foo = fooCache.get(someKey);
         if( foo == null ){
            foo = ....
            fooCache.put(someKey, foo);
         }
       }
    }
    return foo;
 }
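
What makes the short version safe is that a miss loads the value at most once per key, however many threads ask for it at the same time. As a rough, JDK-only illustration of that guarantee (not Guava's implementation, and with no eviction, expiry or soft values), the sketch below relies on ConcurrentHashMap.computeIfAbsent, which assumes Java 8 or later. LoadOnceCache is a hypothetical name used only for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** JDK-only stand-in for a loading cache: each key is loaded at most once. */
class LoadOnceCache<K, V> {

    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    LoadOnceCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, running the loader atomically on a miss. */
    V get(K key) {
        // No null check and no synchronized block: computeIfAbsent applies
        // the loader at most once for an absent key, even under contention.
        return entries.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        LoadOnceCache<String, String> fooCache = new LoadOnceCache<>(key -> {
            loads.incrementAndGet();
            return "foo:" + key;
        });
        fooCache.get("a");
        fooCache.get("a"); // served from the map, the loader is not called again
        System.out.println("loads = " + loads.get());
    }
}
```

The getter body is a single call, as in the Guava version of getFoo; what this stand-in lacks is everything CacheBuilder configures (size bound, expiration, reference strength, removal listener).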

Here's a complete patch for using guava Cache in ResourcePool, and the clean version of it.

Although that patch is not strictly part of this proposal, it would be a good thing to have once/if this proposal is accepted.
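
For readers who want to see the idea behind the size bound and the eviction hook without the Guava jar at hand, here is a small JDK-only sketch built on LinkedHashMap. It is only an approximation: it is not thread-safe, it has no time-based expiration or soft values, and BoundedCache and its onRemoval callback are hypothetical names, not part of Guava or GeoServer. It assumes Java 8 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** JDK-only sketch: size-bounded, access-ordered map that reports evictions. */
class BoundedCache<K, V> extends LinkedHashMap<K, V> {

    private final int maximumSize;
    private final BiConsumer<K, V> onRemoval;

    BoundedCache(int maximumSize, BiConsumer<K, V> onRemoval) {
        super(16, 0.75f, true); // true = access order, least recently used entry first
        this.maximumSize = maximumSize;
        this.onRemoval = onRemoval;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        boolean evict = size() > maximumSize;
        if (evict) {
            // Clean-up hook: the place where a data store would be disposed.
            onRemoval.accept(eldest.getKey(), eldest.getValue());
        }
        return evict;
    }

    public static void main(String[] args) {
        BoundedCache<String, String> stores = new BoundedCache<>(2,
                (id, store) -> System.out.println("disposing " + id));
        stores.put("a", "store A");
        stores.put("b", "store B");
        stores.get("a");            // "a" becomes the most recently used entry
        stores.put("c", "store C"); // exceeds the bound, so "b" is evicted
    }
}
```

With CacheBuilder the same intent is expressed declaratively through maximumSize and removalListener, and the resulting cache is safe for concurrent use, which this sketch is not.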

Collection utilities and functional style programming

class GeoServerDataProvider<T>{
...
  @Before
  public Iterator<T> iterator(int first, int count) {
      List<T> items = getFilteredItems();

      // global sorting
      Comparator<T> comparator = getComparator(getSort());
      if (comparator != null) {
          Collections.sort(items, comparator);
      }

      // in memory paging
      int last = first + count;
      if (last > items.size())
          last = items.size();
      return items.subList(first, last).iterator();
  }

  @After
  public Iterator<T> iterator(int first, int count) {
      Iterable<T> items = getFilteredItems();

      // global sorting
      Comparator<T> comparator = getComparator(getSort());
      if (comparator != null) {
          items = Ordering.from(comparator).sortedCopy(items);
      }

      // in memory paging
      Iterator<T> iterator = items.iterator();
      Iterators.skip(iterator, first);
      return Iterators.limit(iterator, count);
  }

  @Before
  protected List<T> getFilteredItems() {
     List<T> items = getItems();

     // if needed, filter
     if (keywords != null && keywords.length > 0) {
         return filterByKeywords(items);
     } else {
         // make a defensive (shallow) copy anyway, the catalog does not do that for us
         return new ArrayList<T>(items);
     }
  }
  
  @After
  protected Iterable<T> getFilteredItems() {
      Iterable<T> items = getItems();

      // if needed, filter
      if (keywords != null && keywords.length > 0) {
          return filterByKeywords(items);
      } else {
          return items;
      }
  }

  @Before
  private List<T> filterByKeywords(List<T> items) {
      List<T> result = new ArrayList<T>();
      final Matcher[] matchers = getMatchers();

      List<Property<T>> properties = getProperties();
      for (T item : items) {
          ITEM:
          // find any match of any pattern over any property
          for (Property<T> property : properties) {
              Object value = property.getPropertyValue(item);
              // brute force check for keywords
              for (Matcher matcher : matchers) {
                  matcher.reset(String.valueOf(value));
                  if (matcher.matches()) {
                      result.add(item);
                      break ITEM;
                  }
              }
          }
      }

      return result;
  }

  @After
  private Iterable<T> filterByKeywords(Iterable<T> items) {
      final Matcher[] matchers = getMatchers();
      final List<Property<T>> properties = getProperties();
        
      Predicate<T> filter = new Predicate<T>() {
          @Override
          public boolean apply(T item) {
              for (Property<T> property : properties) {
                  Object value = property.getPropertyValue(item);
                  // brute force check for keywords
                  for (Matcher matcher : matchers) {
                      matcher.reset(String.valueOf(value));
                      if (matcher.matches()) {
                          return true;
                      }
                  }
              }
              return false;
          }
      };

      return Iterables.filter(items, filter);
  }

}
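
In the After versions above, Iterables.filter returns a lazy view instead of filling a new list, and paging skips and limits an iterator instead of calling subList. To make the paging step concrete, here is a JDK-only sketch of skip-then-limit over an iterator. It mirrors what the Iterators.skip and Iterators.limit calls are used for, but it is not Guava's code; LazyPaging and page are hypothetical names, and Java 8 or later is assumed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** JDK-only sketch of skip-then-limit paging over an iterator. */
class LazyPaging {

    /** Skips {@code first} elements, then exposes at most {@code count} more. */
    static <T> Iterator<T> page(final Iterator<T> source, int first, final int count) {
        // Skipping is eager: advance the source up to the page offset,
        // stopping early if the source runs out of elements.
        for (int i = 0; i < first && source.hasNext(); i++) {
            source.next();
        }
        // Limiting is lazy: elements are pulled only as the caller iterates.
        return new Iterator<T>() {
            private int served = 0;

            @Override
            public boolean hasNext() {
                return served < count && source.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                served++;
                return source.next();
            }
        };
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d", "e");
        Iterator<String> page = page(items.iterator(), 1, 3);
        while (page.hasNext()) {
            System.out.println(page.next());
        }
    }
}
```

One side effect of this style is that an offset past the end of the data simply yields an empty page, whereas the subList version would throw when first is greater than the list size.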

More functional style programming with functors

Before
class CatalogConfiguration implements org.geowebcache.config.Configuration{
...
    @Override
    public Iterable<GeoServerTileLayer> getLayers() {
        List<LayerGroupInfo> layerGroups = catalog.getLayerGroups();
        List<LayerInfo> layerInfos = catalog.getLayers();
        List[] sublists = { layerInfos, layerGroups };
        CompositeList composite = new CompositeList(sublists);
        LazyGeoServerTileLayerList tileLayers = new LazyGeoServerTileLayerList(composite, this);
        return tileLayers;
    }

    private static class CompositeList extends AbstractList<Object> {

        private final List<Object>[] decorated;

        @SuppressWarnings("unchecked")
        public CompositeList(List[] sublists) {
            this.decorated = sublists;
        }

        @Override
        public Object get(final int index) {
            int subIndex = index;
            List<Object> sublist;
            for (int i = 0; i < decorated.length; i++) {
                sublist = decorated[i];
                if (subIndex < sublist.size()) {
                    return sublist.get(subIndex);
                }
                subIndex -= sublist.size();
            }
            throw new IndexOutOfBoundsException();
        }

        @Override
        public int size() {
            int size = 0;
            List<Object> sublist;
            for (int i = 0; i < decorated.length; i++) {
                sublist = decorated[i];
                size += sublist.size();
            }
            return size;
        }
    }

    private static class LazyGeoServerTileLayerList extends AbstractList<GeoServerTileLayer> {

        private final List<Object> infos;

        private final CatalogConfiguration mediator;

        public LazyGeoServerTileLayerList(final List<Object> infos,
                final CatalogConfiguration catalogConfiguration) {
            this.infos = infos;
            this.mediator = catalogConfiguration;
        }

        @Override
        public GeoServerTileLayer get(int index) {
            Object object = infos.get(index);
            if (object instanceof LayerInfo) {
                return new GeoServerTileLayer(mediator, (LayerInfo) object);
            } else if (object instanceof LayerGroupInfo) {
                return new GeoServerTileLayer(mediator, (LayerGroupInfo) object);
            }
            throw new IllegalStateException();
        }

        @Override
        public int size() {
            return infos.size();
        }
    }
}
After
class CatalogConfiguration implements org.geowebcache.config.Configuration{
...
    @Override
    public Iterable<GeoServerTileLayer> getLayers() {

        Iterable<GeoServerTileLayer> layers = Iterables.transform(catalog.getLayers(),
                new Function<LayerInfo, GeoServerTileLayer>() {
                    @Override
                    public GeoServerTileLayer apply(LayerInfo layer) {
                        CatalogConfiguration mediator = CatalogConfiguration.this;
                        return new GeoServerTileLayer(mediator, layer);
                    }
                });

        Iterable<GeoServerTileLayer> layersGroups = Iterables.transform(catalog.getLayerGroups(),
                new Function<LayerGroupInfo, GeoServerTileLayer>() {
                    @Override
                    public GeoServerTileLayer apply(LayerGroupInfo layerGroup) {
                        CatalogConfiguration mediator = CatalogConfiguration.this;
                        return new GeoServerTileLayer(mediator, layerGroup);
                    }
                });

        return Iterables.concat(layers, layersGroups);
    }
}
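
The After version works because Iterables.transform and Iterables.concat return lazy views: nothing is copied, and the function runs only while the result is being iterated. The following JDK-only sketch shows that behaviour in miniature. It is an illustration under the assumption of Java 8 or later, not Guava's implementation; LazyViews and its two methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

/** JDK-only sketch of lazy transform and concat views over Iterables. */
class LazyViews {

    /** Applies fn to each element on demand; the source is never copied. */
    static <F, T> Iterable<T> transform(final Iterable<F> source,
            final Function<? super F, ? extends T> fn) {
        return () -> {
            final Iterator<F> it = source.iterator();
            return new Iterator<T>() {
                @Override
                public boolean hasNext() {
                    return it.hasNext();
                }

                @Override
                public T next() {
                    return fn.apply(it.next());
                }
            };
        };
    }

    /** Iterates over first and then second, again without copying. */
    static <T> Iterable<T> concat(final Iterable<? extends T> first,
            final Iterable<? extends T> second) {
        return () -> {
            final Iterator<? extends T> a = first.iterator();
            final Iterator<? extends T> b = second.iterator();
            return new Iterator<T>() {
                @Override
                public boolean hasNext() {
                    return a.hasNext() || b.hasNext();
                }

                @Override
                public T next() {
                    return a.hasNext() ? a.next() : b.next();
                }
            };
        };
    }

    public static void main(String[] args) {
        Iterable<String> layers = transform(Arrays.asList("roads", "rivers"), name -> "layer:" + name);
        Iterable<String> groups = transform(Arrays.asList("basemap"), name -> "group:" + name);
        for (String tileLayer : concat(layers, groups)) {
            System.out.println(tileLayer);
        }
    }
}
```

This is roughly the work that CompositeList and LazyGeoServerTileLayerList did by hand in the Before version, minus the instanceof dispatch.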

So that's it? Kind of a modern replacement for Apache Commons?

No! And maybe. There are lots of things that (IMHO) can be done better with Guava than with commons-collections. But Guava is way more than the collections utilities, and so is Apache commons-*. Each has utilities not present in the other, and there is some overlap. My personal preference is to use Guava from now on for all collection utility needs, as it's more modern, well designed, faithfully respects the Java collection contracts, leverages immutability and code clarity, is under active development, and is well supported. But Apache Commons is going to be around for sure, as there is a lot more to Commons than collections.

Also, the point of this proposal is to present Guava to you and recommend that you take your own tour, not only of the collection utilities, but also of the I/O, net, primitives, concurrent, etc.

Where can I find more information about Guava?

Googling gives as usual thousands of links. Here are some of the ones that seemed more appealing to me:

Feedback

Jody Garnett:

...Well I really like that set of capabilities; while it would represent an increased learning curve to work on GeoServer - it would be a win if we could remove a few more dependencies. We may need to duck back into GeoTools to make that happen; but that would perhaps not be a bad thing.

...We are welcome to peruse this library for GeoServer prior to that point. I also have some uDig code that used the earlier google collections library that I can fix up (and get some experience).

So you are getting two bits of feedback:

  • Yes - but not for GeoTools until after 8.0
  • A good trade if we cut down or out the other dependencies (coming from GeoTools)

Justin Deoliveira:

The guava library looks beautiful, no question there, and there is a lot of hype around it at the moment on all the java blogs. But as I mentioned before, and as jody mentioned i don't love the idea just lumping on another utility library. Obviously it leads to much nicer code, and has some functionality we don't have now but without a concrete problem it solves i don't see that as justification enough alone. It is already enough of a maze trying to look up the right utility class to use when you have to do something, this will make it worse.

I would actually be more in favor of a lower level effort at the geotools level to replace commons with guava. Obviously though that is a larger effort and by no means meant to block the proposal

Andrea Aime:

I feel the same, but at the same time I'm worried the code will turn into COBOL pretty soon if we don't do some effort to modernize it.
The situation with scripting languages and the various "java successors" seems like a grand royal mess that is not going to give us a clear successor to Java anytime soon, so we better try to get onto more compact/modern code and try to prolong the life of the code base as much as possible.

Of course once we adopt Guava we must make an effort to use it instead of commons wherever possible/makes sense to get some uniformity back.

Backwards Compatibility

As the proposal aims to add a new set of utilities to the class path for progressive adoption, no backwards compatibility issues are foreseen.

Voting

Andrea Aime: +1
Alessio Fabiani: +1
Ben Caradoc-Davies: +0
Gabriel Roldán: +1
Justin Deoliveira: +0
Jody Garnett: +1
Mark Leslie:
[~roba]:
Simone Giannecchini:

Links

Email Discussion
Jira

Added by Gabriel Roldán, last edited by Gabriel Roldán on Jan 26, 2012