Best Practice: Synchronization (Data API)

Synchronize the content of your API project to your server.

The Outdooractive Data API is often used to synchronize the content of an Outdooractive API project to another system. Initially, all the content of a project is simply copied. After that, updates to the content have to be loaded periodically (e.g. daily) to keep the content up to date.

The challenge is to notice changes as soon as possible and to apply updates at minimal cost (bandwidth, database, filesystem…).

This tutorial explains an implementation of the synchronization that uses four simple steps.

Setup

There are many different ways to persist the content read from the Outdooractive Platform in other systems:

  • use only simple caching
  • parse data and save it to a database
  • use data as part of a search index
  • save data to a file system
  • etc.

Persisting the content in the file system is a very simple approach and will be used to explain how the basic synchronization algorithm works. Virtual sample data will be used to show what happens in each step.

This tutorial describes how the tours of an Outdooractive API Project are synchronized but the algorithm is the same for POIs.

Persisting in the File System

Let’s start with a root folder for your project’s data:

/localdata/api-dev-oa/

Create two subfolders to separate the tour data XML files from the id lists that will be used to compute which tours have to be loaded:

/localdata/api-dev-oa/ids
/localdata/api-dev-oa/tours

The list of tours of the Outdooractive API project will be placed as an XML file and a CSV file into the “ids” folder:

/localdata/api-dev-oa/ids/tourids.xml
/localdata/api-dev-oa/ids/tourids.csv

As soon as the synchronization procedure starts, a temporary CSV file of the last run’s id list is placed here:

/localdata/api-dev-oa/ids/tourids.last-run.csv

The basic algorithm computes three additional lists of tours that have to be processed:

/localdata/api-dev-oa/ids/tourids.diff.new.csv
/localdata/api-dev-oa/ids/tourids.diff.modified.csv
/localdata/api-dev-oa/ids/tourids.diff.deleted.csv

Finally, each tour of your project will be persisted as an XML file in the /tours folder:

/localdata/api-dev-oa/tours/tour.1.xml
/localdata/api-dev-oa/tours/tour.2.xml
...

That’s the simple directory structure that will be filled with data later.
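
The folders described above can be created with a few lines of shell. This is only a minimal sketch: the function name setup_sync_dirs is an assumption of this example, and the root folder is passed as an argument so that the sketch can be tried in any location.

```shell
# Hypothetical helper (not part of the Outdooractive API):
# creates the "ids" and "tours" subfolders below the given root folder.
setup_sync_dirs() {
  root="$1"
  # -p creates missing parent folders and does not fail if they already exist
  mkdir -p "$root/ids" "$root/tours"
}

# Example call for the root folder used in this tutorial:
# setup_sync_dirs /localdata/api-dev-oa
```

Calling the function a second time is harmless, so it can stay at the top of a script that is run by a cron job.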

You might even keep track of changes using a version control system, e.g. a local git repository. As the XML files are all pretty-printed, the version control diffs let you inspect data changes.

Basic Algorithm

The synchronization is done in four simple steps:

  1. Get the current list of tour ids of your project from the Outdooractive Data API
  2. Compare this list with the tours in your file system
  3. Load and store modified and new tours
  4. Delete tours that are not part of your project any more

This algorithm works for an initial synchronization and for cron jobs that synchronize the data periodically (e.g. daily or every 6 hours).

Data API Endpoints

This tutorial uses the Outdooractive API Test Keys:

Test Project Key

api-dev-oa

Test API Key

yourtest-outdoora-ctiveapi

Use your own keys in your implementation.

The base url of an API Test project is:

http://www.outdooractive.com/api/project/api-dev-oa

The basic algorithm uses only two Data API endpoints to do its job.

The /tours endpoint to get the list of tour ids:

/tours?key=yourtest-outdoora-ctiveapi&lastModifiedAfter=01.01.1970

The parameter lastModifiedAfter ensures that the date of last modification of the tours is part of the response.

The /oois endpoint to get the XML data of one or more tours:

/oois/1373438,1397449?key=yourtest-outdoora-ctiveapi

The complete urls to these two endpoints are:

http://www.outdooractive.com/api/project/api-dev-oa/tours?key=yourtest-outdoora-ctiveapi&lastModifiedAfter=01.01.1970

http://www.outdooractive.com/api/project/api-dev-oa/oois/1373438,1397449?key=yourtest-outdoora-ctiveapi
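
To avoid repeating the project key and the API key in every command, the two URLs can be assembled from variables. The following is a sketch only; the variable and function names (PROJECT, API_KEY, BASE_URL, tours_url, oois_url) are assumptions of this example and not part of the Data API.

```shell
# Assumed names for this sketch; replace the test keys with your own keys.
PROJECT="api-dev-oa"
API_KEY="yourtest-outdoora-ctiveapi"
BASE_URL="http://www.outdooractive.com/api/project/$PROJECT"

# URL of the id list of all tours (including the lastModified dates)
tours_url() {
  printf '%s/tours?key=%s&lastModifiedAfter=01.01.1970\n' "$BASE_URL" "$API_KEY"
}

# URL of the XML data of one or more tours;
# pass the ids as a comma-separated list, e.g. "1373438,1397449"
oois_url() {
  printf '%s/oois/%s?key=%s\n' "$BASE_URL" "$1" "$API_KEY"
}
```

With the test keys, tours_url and oois_url "1373438,1397449" print exactly the two complete URLs shown above.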

Data Structure

The Outdooractive Data API serves the list of your project’s tour ids as XML or JSON representation. All examples in this article use virtual sample data with simple ids from 1 to 5 and lastModified dates at midnight.

A sample XML output of the /tours endpoint is:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<datalist xmlns="http://www.outdooractive.com/api/">
  <data id="1" lastModified="2015-01-01T00:00:00.000+01:00"/>
  <data id="2" lastModified="2015-01-10T00:00:00.000+01:00"/>
  <data id="3" lastModified="2015-06-20T00:00:00.000+01:00"/>
  <data id="4" lastModified="2015-08-30T00:00:00.000+01:00"/>
</datalist>

The same output in JSON representation (set the content type header of your request to json):

{
  "data": [
     { "id": 1, "lastModified": "2015-01-01" },
     { "id": 2, "lastModified": "2015-01-10" },
     { "id": 3, "lastModified": "2015-06-20" },
     { "id": 4, "lastModified": "2015-08-30" }
  ]
}

Let’s assume that we transform the XML or JSON output into a simple CSV file with the columns “id” and “lastModified” using “;” as field separator:

1;2015-01-01
2;2015-01-10
3;2015-06-20
4;2015-08-30

To make it easier to read, this tutorial prints the content of the CSV file in a formatted way:

id: 1 lastModified: 2015-01-01
id: 2 lastModified: 2015-01-10
id: 3 lastModified: 2015-06-20
id: 4 lastModified: 2015-08-30
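
For XML input that looks like the sample above, the transformation into CSV can be sketched with sed alone. The function name datalist_to_csv is an assumption of this example, and the sketch relies on each data element standing on its own line with the id attribute in front of lastModified, as in the sample; a real XML tool is the more robust choice.

```shell
# Hypothetical helper: reads a <datalist> XML document on stdin and
# prints one "id;lastModified" line per <data> element.
# Assumes one <data .../> element per line, with id before lastModified.
datalist_to_csv() {
  sed -n 's/.*<data id="\([0-9][0-9]*\)" lastModified="\([^"]*\)".*/\1;\2/p'
}

# Example call:
# datalist_to_csv < tourids.xml > tourids.csv
```

Note that this sketch keeps the complete timestamp of the lastModified attribute, whereas the listings in this tutorial show only the date part.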

A. Initial Synchronization

If your system uses the Outdooractive Data API to synchronize your project’s content then you will start with empty /tours and /ids folders. As soon as your synchronization implementation is ready you will start with an initial synchronization.

Let’s assume your initial synchronization is done at the following date:

2015-11-01

Your local folders are still empty:

/localdata/api-dev-oa/ids/   (empty folder)

/localdata/api-dev-oa/tours/ (empty folder)

Let’s apply the basic algorithm.

A.1. Get the current list of tour ids

Following the basic algorithm we grab the list of all tour ids (and lastModified dates) of your project from the Outdooractive Data API.

The virtual sample data corresponds to this list of ids and dates (printed in a readable format):

id: 1 lastModified: 2015-01-01
id: 2 lastModified: 2015-01-10
id: 3 lastModified: 2015-06-20
id: 4 lastModified: 2015-08-30

Now we know the ids of all tours of your project and the date of their last modification.

A.2. Compare the id lists

As it’s the initial synchronization there is no list of tour ids locally yet and we have nothing to compare. This means the list of ids of all locally existing tours, the list of modified tours and the list of deleted tours are all empty.

List of locally existing tours:

empty list

List of modified tours:

empty list

List of deleted tours:

empty list

The list of new tours equals the list grabbed from the Outdooractive Data API.

List of new tours:

id: 1 lastModified: 2015-01-01
id: 2 lastModified: 2015-01-10
id: 3 lastModified: 2015-06-20
id: 4 lastModified: 2015-08-30

A.3. Load and store modified and new tours

Load all tours with ids out of the list of new tours and save the XML response to the file system.

Loaded tours:

/localdata/api-dev-oa/tours/tour.1.xml
/localdata/api-dev-oa/tours/tour.2.xml
/localdata/api-dev-oa/tours/tour.3.xml
/localdata/api-dev-oa/tours/tour.4.xml

A.4. Delete tours that are not part of your project any more

It’s the initial synchronization. The list of deleted tours is empty and we have nothing to delete.

Initial Synchronization is done

We successfully synchronized 4 tours!

The initial synchronization is done and we are ready to synchronize updates.

B. Periodical Synchronization

Imagine we have already finished an initial synchronization (A.) successfully. Now we use the same script, based on the basic algorithm, in a cron job for a daily synchronization.

We assume that the first time this daily sync script runs is at the following date:

2015-11-02

The same 4 steps of the basic algorithm have to be done again.

B.1. Get the current list of tour ids

Let’s assume that the new list of tours (2015-11-02) from the Data API is:

id: 2 lastModified: 2015-01-10
id: 3 lastModified: 2015-11-02
id: 4 lastModified: 2015-11-01
id: 5 lastModified: 2015-11-02

Note

Always check whether the list of tours is empty. This can happen when the Data API responds with an error, and any other error that leads to an empty XML file or id list has to be caught as well. Otherwise the algorithm deletes all local tour data XML files, and the complete project data is loaded again during the next run of the cron job.
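
Such a check could look like the following sketch. The function name id_list_is_usable is an assumption of this example; it only tests that the CSV file exists, is not empty and contains at least one line that starts with a numeric id followed by ";".

```shell
# Hypothetical guard: succeeds only if the CSV file exists, is not empty
# and contains at least one line of the form "<numeric id>;<something>".
id_list_is_usable() {
  [ -s "$1" ] && grep -q '^[0-9][0-9]*;.' "$1"
}

# Example: abort the run before anything is compared or deleted.
# id_list_is_usable /localdata/api-dev-oa/ids/tourids.csv || {
#   echo "id list is empty or invalid, aborting" >&2
#   exit 1
# }
```

Placing the guard directly after the id list is created stops the run before the comparison and the delete step can do any damage.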

B.2. Compare the id lists

Now we look up the list of locally existing tours (id and lastModified):

id: 1 lastModified: 2015-01-01
id: 2 lastModified: 2015-01-10
id: 3 lastModified: 2015-06-20
id: 4 lastModified: 2015-08-30

The local list and the list from the Data API compared side by side:

Local tours (tourids.last-run.csv): From Data API (tourids.csv):     State:
=================================== ============================     =======
id: 1 lastModified: 2015-01-01                                       <- deleted
id: 2 lastModified: 2015-01-10      id: 2 lastModified: 2015-01-10
id: 3 lastModified: 2015-06-20      id: 3 lastModified: 2015-11-02   <- modified
id: 4 lastModified: 2015-08-30      id: 4 lastModified: 2015-11-01   <- modified
                                    id: 5 lastModified: 2015-11-02   <- new

The comparison results in three lists.

The list of new tours:

id: 5 lastModified: 2015-11-02

The list of modified tours:

id: 3 lastModified: 2015-11-02
id: 4 lastModified: 2015-11-01

The list of deleted tours:

id: 1 lastModified: 2015-01-01
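
The comparison itself can be sketched with awk, which looks up every id directly instead of comparing the two files line by line. This is a sketch under assumptions, not the implementation used by Outdooractive: the function name classify_tours and its three arguments are made up for this example, and both input files are expected to contain "id;lastModified" lines.

```shell
# Hypothetical helper: compares two "id;lastModified" CSV files and writes
# three one-column id lists: <prefix>.new.csv, <prefix>.modified.csv and
# <prefix>.deleted.csv.
classify_tours() {
  last_run="$1"   # list of the previous run (may be an empty file)
  current="$2"    # list just loaded from the Data API
  prefix="$3"     # e.g. /localdata/api-dev-oa/ids/tourids.diff

  # make sure all three result files exist, even if a list stays empty
  : > "$prefix.new.csv"
  : > "$prefix.modified.csv"
  : > "$prefix.deleted.csv"

  awk -F ';' -v new="$prefix.new.csv" -v mod="$prefix.modified.csv" \
      -v del="$prefix.deleted.csv" '
    # first file: remember the lastModified value of every local tour
    FILENAME == ARGV[1] { old[$1] = $2; next }
    # second file: classify every tour served by the Data API
    {
      seen[$1] = 1
      if (!($1 in old))       print $1 > new
      else if (old[$1] != $2) print $1 > mod
    }
    # local tours that are missing in the second file were deleted
    END { for (id in old) if (!(id in seen)) print id > del }
  ' "$last_run" "$current"

  # awk iterates over arrays in no particular order, so sort the deleted ids
  sort -n -o "$prefix.deleted.csv" "$prefix.deleted.csv"
}
```

Applied to the two lists of this example, the sketch writes id 5 to the list of new tours, ids 3 and 4 to the list of modified tours and id 1 to the list of deleted tours. With an empty file as the list of the previous run it also covers the initial synchronization, where every tour is new.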

B.3. Load and store modified and new tours

Now we concatenate the list of new and modified tours and iterate over the result list to load these tours over the Outdooractive Data API and to save the responses into XML files.

Afterwards the /tours folder holds one new file. Two of the existing files were updated (overwritten):

/localdata/api-dev-oa/tours/tour.1.xml
/localdata/api-dev-oa/tours/tour.2.xml
/localdata/api-dev-oa/tours/tour.3.xml (updated)
/localdata/api-dev-oa/tours/tour.4.xml (updated)
/localdata/api-dev-oa/tours/tour.5.xml (new)

B.4. Delete tours that are not part of your project any more

We delete all tours that are on the list of deleted tours created in step 2.

List of deleted tours:

id: 1 lastModified: 2015-01-01

Synchronization is done

The tours folder now contains 4 files and is synchronized with the project’s content on the Outdooractive Platform:

/localdata/api-dev-oa/tours/tour.2.xml
/localdata/api-dev-oa/tours/tour.3.xml
/localdata/api-dev-oa/tours/tour.4.xml
/localdata/api-dev-oa/tours/tour.5.xml

That’s it!
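
Whether the /tours folder really matches the id list can be double-checked after each run. The following sketch assumes the file naming of this tutorial (tour.<id>.xml); the function name verify_sync is an assumption of this example.

```shell
# Hypothetical check: compares the ids in the id list with the ids of the
# tour.<id>.xml files and reports whether both sets are identical.
verify_sync() {
  ids_csv="$1"     # e.g. /localdata/api-dev-oa/ids/tourids.csv
  tours_dir="$2"   # e.g. /localdata/api-dev-oa/tours

  # ids that should exist locally (first CSV column)
  expected="$(cut -d ';' -f 1 "$ids_csv" | sort)"

  # ids that do exist locally (taken from the file names)
  actual="$(
    for f in "$tours_dir"/tour.*.xml; do
      [ -e "$f" ] || continue
      f="${f##*/tour.}"
      echo "${f%.xml}"
    done | sort
  )"

  if [ "$expected" = "$actual" ]; then
    echo "in sync"
  else
    echo "out of sync"
    return 1
  fi
}
```

The check only compares ids. It does not detect a tour file whose content is outdated.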

C. Shell Scripts

This section lists prototype scripts that implement the synchronization algorithm described above.

C.1. Get the current list of tour ids

We move the CSV file with all tour ids that was created or updated in the last run to another place so that it can be used for the comparison later (skip this step during the initial synchronization):

mv /localdata/api-dev-oa/ids/tourids.csv /localdata/api-dev-oa/ids/tourids.last-run.csv
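
If the same script is supposed to handle the initial and the periodical synchronization, the move can be wrapped in a small check. This is a sketch only; the function name rotate_id_list is an assumption of this example. During the initial synchronization it creates an empty list of the last run, so that the later steps always find a file to compare with.

```shell
# Hypothetical helper: prepares the /ids folder for a new run.
rotate_id_list() {
  ids_dir="$1"   # e.g. /localdata/api-dev-oa/ids

  if [ -f "$ids_dir/tourids.csv" ]; then
    # periodical synchronization: keep the previous list for the comparison
    mv "$ids_dir/tourids.csv" "$ids_dir/tourids.last-run.csv"
  elif [ ! -f "$ids_dir/tourids.last-run.csv" ]; then
    # initial synchronization: start with an empty list of local tours
    : > "$ids_dir/tourids.last-run.csv"
  fi
}

# Example call:
# rotate_id_list /localdata/api-dev-oa/ids
```

An existing tourids.last-run.csv is never overwritten with an empty file, so a run that was interrupted after the move does not lose the list of local tours.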

Use wget to request the list of tour ids over http and to save the response to an xml file:

wget -O /localdata/api-dev-oa/ids/tourids.xml "http://www.outdooractive.com/api/project/api-dev-oa/tours?key=yourtest-outdoora-ctiveapi&lastModifiedAfter=01.01.1970"

The /ids folder now looks like:

/localdata/api-dev-oa/ids/tourids.xml          (new or modified)
/localdata/api-dev-oa/ids/tourids.last-run.csv (moved)

Let’s transform the XML file into a CSV file (e.g. using XSLT), as a semicolon-separated representation is the most minimal representation of a two-column list:

# change directory
cd /localdata/api-dev-oa/ids/

# use XSL transformation to CSV file
xsltproc data.xsl tourids.xml |
      # get rid of the indentation spaces (xsltproc output)
      tr -d ' ' |
      # get rid of first empty line of xsltproc output
      tail -n +2 |
      # sort by ids and save to the CSV file
      sort > tourids.csv

This XSL transformation is used to transform XML into CSV:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:oa="http://www.outdooractive.com/api/">

  <xsl:output method="text" indent="no"/>

  <xsl:template match="/oa:datalist/oa:data[@id]">
    <xsl:value-of select="@id" />
    <xsl:text>;</xsl:text>
    <xsl:value-of select="@lastModified" />
  </xsl:template>

</xsl:stylesheet>

Now we have two files (initial synchronization) or three files (periodical synchronization):

/localdata/api-dev-oa/ids/tourids.xml          
/localdata/api-dev-oa/ids/tourids.csv          (new)
/localdata/api-dev-oa/ids/tourids.last-run.csv

C.2. Compare the id lists

This bash script does the comparison and creates three CSV files:

# change the directory
cd "/localdata/api-dev-oa/ids/"

# create a diff file (context format without context lines)
diff -C 0 tourids.last-run.csv tourids.csv > tourids.diff

# extract all lines representing new ids out of diff file with help of grep
grep '^+ ' tourids.diff |

  # extract id out of line
  cut -d ' ' -f 2 | cut -d ';' -f 1 |

  # save ids of all new tours to a CSV file with one column
  sort | uniq > tourids.diff.new.csv

# do the same with deleted tours
grep '^- ' tourids.diff | cut -d ' ' -f 2 | cut -d ';' -f 1 | sort | uniq > tourids.diff.deleted.csv

# ... and modified tours (diff marks every line of a changed block with '!',
# so a new or deleted id directly next to a modified one also ends up here)
grep '^! ' tourids.diff | cut -d ' ' -f 2 | cut -d ';' -f 1 | sort | uniq > tourids.diff.modified.csv

Finally, six files are placed in the /ids folder (plus the intermediate tourids.diff):

/localdata/api-dev-oa/ids/tourids.xml
/localdata/api-dev-oa/ids/tourids.csv
/localdata/api-dev-oa/ids/tourids.last-run.csv

/localdata/api-dev-oa/ids/tourids.diff.modified.csv
/localdata/api-dev-oa/ids/tourids.diff.new.csv
/localdata/api-dev-oa/ids/tourids.diff.deleted.csv

C.3. Load and store modified and new tours

Now we concatenate the list of new and modified tours and iterate over the result list to load these tours:

# use the .csv files as input
cat /localdata/api-dev-oa/ids/tourids.diff.modified.csv \
    /localdata/api-dev-oa/ids/tourids.diff.new.csv |

    # split each line by ';' and output only id (first column)
    cut -d ';' -f 1 |

    # iterate over all lines
    while read -r i; do

       # xml file
       f="/localdata/api-dev-oa/tours/tour.$i.xml"

       # Data API endpoint Url
       u="http://www.outdooractive.com/api/project/api-dev-oa/oois/$i?key=yourtest-outdoora-ctiveapi"

       # http request to Data API
       # save response to .xml file
       wget -O "$f" "$u"

       # wait 3 seconds
       sleep 3

    done

During the initial synchronization, the bash script above executes the following individual commands:

# request tour with id 1
wget -O "/localdata/api-dev-oa/tours/tour.1.xml" "http://www.outdooractive.com/api/project/api-dev-oa/oois/1?key=yourtest-outdoora-ctiveapi";
sleep 3;

# request tour with id 2
wget -O "/localdata/api-dev-oa/tours/tour.2.xml" "http://www.outdooractive.com/api/project/api-dev-oa/oois/2?key=yourtest-outdoora-ctiveapi";
sleep 3;

# request tour with id 3
wget -O "/localdata/api-dev-oa/tours/tour.3.xml" "http://www.outdooractive.com/api/project/api-dev-oa/oois/3?key=yourtest-outdoora-ctiveapi";
sleep 3;

# request tour with id 4
wget -O "/localdata/api-dev-oa/tours/tour.4.xml" "http://www.outdooractive.com/api/project/api-dev-oa/oois/4?key=yourtest-outdoora-ctiveapi";

The folder /tours consists of 4 files now:

/localdata/api-dev-oa/tours/tour.1.xml
/localdata/api-dev-oa/tours/tour.2.xml
/localdata/api-dev-oa/tours/tour.3.xml
/localdata/api-dev-oa/tours/tour.4.xml

Note

If your project consists of a lot of tours, loading several tours within a single request speeds up the synchronization process. We recommend using 100 ids per request. Please implement a break of at least 3 seconds between successive requests (sleep 3):

wget -O "/localdata/api-dev-oa/ids/tours.1_2_3_4.xml" \
        "http://www.outdooractive.com/api/project/api-dev-oa/oois/1,2,3,4?key=yourtest-outdoora-ctiveapi"
sleep 3

Iterate through the XML response and split it into single tour XML files so that it matches the algorithm described in this article, which keeps one XML file per tour.
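
Grouping the ids into comma-separated batches can be sketched with xargs and tr. The function name batch_ids is an assumption of this example; it reads one id per line and prints one batch per line, ready to be appended to the /oois endpoint.

```shell
# Hypothetical helper: reads one id per line on stdin and prints
# comma-separated groups of at most $1 ids, one group per line.
batch_ids() {
  # xargs without a command echoes its arguments, at most $1 per line
  xargs -n "$1" | tr ' ' ','
}

# Example: build batches of 100 ids from the list of new tours.
# cut -d ';' -f 1 /localdata/api-dev-oa/ids/tourids.diff.new.csv | batch_ids 100
```

For the ids 1 to 5 and a batch size of 2, the function prints the three lines 1,2 and 3,4 and 5.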

C.4. Delete tours that are not part of your project any more

The last step of the basic algorithm deletes all tours from the file system that were removed from your project:

cd /localdata/api-dev-oa/tours/

cat ../ids/tourids.diff.deleted.csv|(while read i; do
   rm "tour.$i.xml";
done;)

Images

Use our Outdooractive Image Servers and integrate the images with the Outdooractive Image Urls directly into your webserver.

If you have to synchronize the images to your server it is important that you only load images if they were changed. You have to delete images that were removed from tours and POIs.

Database Prototype

A minimal database with one table (using PostgreSQL here) shows what a minimal implementation may look like if the tour data is held in a database:

CREATE TABLE tours (
   id           bigint PRIMARY KEY NOT NULL,
   lastModified text   NOT NULL,
   data         xml    NOT NULL
);

The tour id and the date of last modification are needed for comparison with the data from the Outdooractive Platform and are put into separate columns.

The tour data is placed as XML data into the column “data”. Modern database systems like PostgreSQL have an XML data type that allows the use of XPath inside SQL queries. This means that a simple query with one XPath expression works well to list all tour ids and tour titles.

The SQL queries used in a database prototype look roughly like this:

-- table to store tour ids and date of last modification temporarily
create table tmp_tours(
       id bigint primary key not null,
       lastmodified text not null
);

-- table of tours including xml data
create table tours(
       id bigint primary key not null,
       lastmodified text not null,
       data xml not null
);

-- make temp table empty
delete from tmp_tours;

-- grab id list from data api
-- put it into tmp_tours table
insert into tmp_tours(id,lastmodified) values(...)

-- find modified tours
select * from tmp_tours a
       inner join tours b
       on a.id=b.id and a.lastmodified<>b.lastmodified;

-- for each row:
-- grab the tour xml from the data api
-- update tours table
update tours set lastmodified='NEW-DATE', data='XML-BLOB' where id='id';

-- find new tours
select * from tmp_tours where id not in (select id from tours);

-- for each row:
-- grab the tour xml from the data api
-- insert into the tours table
insert into tours(id, lastmodified, data) values...

-- delete tours
delete from tours where id not in (select id from tmp_tours);

Note

Just file a support ticket if you are interested in a basic prototype of the synchronization in "any other" programming language or a full prototype using a PostgreSQL database.
Jens Schwarz