Aug 07 2012
 

Using Neo4j’s HA server over a VPN requires only slight modifications to the official getting started guide. This article describes the full process (and the problems) I initially encountered when setting up two machines located in separate data centers. The business logic required a dedicated Master that performs heavy computations to build the graph (using Neo4j’s embedded instance), while the Slave machine exposes services for clients to query against (using the standalone server). Let’s begin:

Step 1 – Setup the VPN
Use a simple master/slave static key setup with OpenVPN. After installing it on both machines, the Master should have the following minimal configuration:
(/etc/openvpn/server.conf)
dev tun
ifconfig 10.0.0.1 10.0.0.2
secret /etc/openvpn/static.key

while the Slave has the following:
(/etc/openvpn/client.conf)
remote xxx.xxx.xxx.xxx
dev tun
ifconfig 10.0.0.2 10.0.0.1
secret /etc/openvpn/static.key

The key was generated by issuing: openvpn --genkey --secret static.key (the same static.key must be present on both machines). Start the services on both machines with service openvpn start, then ensure that the Master can ping the Slave on 10.0.0.2 and that the Slave can ping the Master on 10.0.0.1.

Step 2 – Setup ZooKeeper
After connectivity has been established, we set up ZooKeeper, which Neo4j relies upon for distributed synchronization. Even though I am using the embedded instance on the Master, it does not start ZooKeeper automatically: the ZooKeeper packaged with the standalone server still needs to be downloaded and run separately. If you would like it started from within your own application, you can do it programmatically along the lines of https://svn.neo4j.org/components/ha/trunk/src/test/java/org/neo4j/ha/LocalhostZooKeeperCluster.java (see the sketch after the Slave’s configuration below). Use the following configuration for the Master:
(conf/coord.cfg)
tickTime=3000
initLimit=20
syncLimit=10
server.1=10.0.0.1:2888:3888
server.2=10.0.0.2:2889:3889
dataDir=data/coordinator/
clientPort=2181

and the following for the Slave:
(conf/coord.cfg)
tickTime=3000
initLimit=20
syncLimit=10
server.1=10.0.0.1:2888:3888
server.2=10.0.0.2:2889:3889
dataDir=data/coordinator/
clientPort=2182

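As mentioned above, ZooKeeper can also be launched from within your own application rather than via the neo4j-coordinator script. Here is a rough sketch of one way to do this (my own illustrative class, not the helper from the link above); it assumes the ZooKeeper jar shipped with Neo4j is on the classpath and reuses the same conf/coord.cfg:

import org.apache.zookeeper.server.quorum.QuorumPeerMain;

public class EmbeddedCoordinator {
    public static void main(String[] args) {
        // QuorumPeerMain.main() blocks while the coordinator is serving,
        // so run it on a background thread and let the rest of the
        // application continue.
        new Thread(new Runnable() {
            public void run() {
                QuorumPeerMain.main(new String[] { "conf/coord.cfg" });
            }
        }).start();
    }
}
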
Please note that you must listen on the interface that the VPN created and not use localhost! I had problems with this, which resulted in several different exceptions on both the Slave and the Master, including:

java.lang.RuntimeException: Tried to join the cluster, but was unable to
Caused by: java.lang.RuntimeException: Gave up trying to copy store from master
Caused by: org.neo4j.kernel.ha.zookeeper.NoMasterException: No master
java.lang.NullPointerException Internal shutdown of HA db[2] reference=HighlyAvailableGraphDatabase[/opt/neo4j-enterprise-1.7.2/data/graph.db, ha.server_id:2] MasterServer=null Internal shutdown
org.neo4j.kernel.InformativeStackTrace: Internal shutdown
Reevaluation ended in unknown exception java.lang.NullPointerException so shutting down null
Error in ZooClient.process Unable to read neo store header information
java.lang.RuntimeException: Unable to read neo store header information
java.nio.channels.ClosedChannelException
java.io.IOException: Connection reset by peer

All of these were the result of connectivity issues caused by not listening on the correct interface. The next step is to set the unique id that each server instance relies upon to determine who is who:

Master:
(data/coordinator/myid)
1

Slave:
(data/coordinator/myid)
2

Finally, run ZooKeeper on both Master and Slave: ./neo4j-coordinator. Ensure that ZooKeeper is working properly on both systems by looking at the Master’s log:
(data/log/neo4j-zookeeper.log)
INFO WorkerReceiver Thread org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 2 (n.leader), 0 (n.zxid), 1 (n.round), LEADING (n.state), 2 (n.sid), FOLLOWING (my state)
INFO QuorumPeer:/0:0:0:0:0:0:0:0:2181 org.apache.zookeeper.server.quorum.Learner - Getting a snapshot from leader
INFO QuorumPeer:/0:0:0:0:0:0:0:0:2181 org.apache.zookeeper.server.quorum.Learner - Setting leader epoch 1
INFO QuorumPeer:/0:0:0:0:0:0:0:0:2181 org.apache.zookeeper.server.persistence.FileTxnSnapLog - Snapshotting: 100000000

… and on the Slave’s log:
(data/log/neo4j-zookeeper.log)
INFO QuorumPeer:/0:0:0:0:0:0:0:0:2182 org.apache.zookeeper.server.ZooKeeperServer - Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 60000 datadir data/coordinator/version-2 snapdir data/coordinator/version-2
INFO QuorumPeer:/0:0:0:0:0:0:0:0:2182 org.apache.zookeeper.server.persistence.FileTxnSnapLog - Snapshotting: 0
INFO LearnerHandler-/10.0.0.1:43712 org.apache.zookeeper.server.quorum.LearnerHandler - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@594bfbf1
INFO LearnerHandler-/10.0.0.1:43712 org.apache.zookeeper.server.quorum.Leader - Have quorum of supporters; starting up and setting last processed zxid: 42949

Step 3 – Setup Neo4j
Now that the ZooKeeper instances are properly communicating with each other, the next step is to configure Neo4j to run in high availability mode. Edit the Master’s configuration files:
(conf/neo4j.properties)
online_backup_enabled=true
ha.server_id=1
ha.coordinators=10.0.0.1:2181,10.0.0.2:2182
ha.slave_coordinator_update_mode=none
ha.cluster_name=my.cluster
ha.pull_interval=10

(conf/neo4j-server.properties)
org.neo4j.server.database.mode=HA

… and the Slave’s configuration files:
(conf/neo4j.properties)
online_backup_enabled=true
ha.server_id=2
ha.coordinators=10.0.0.2:2182,10.0.0.1:2181
ha.slave_coordinator_update_mode=none
ha.cluster_name=my.cluster
ha.pull_interval=10

(conf/neo4j-server.properties)
org.neo4j.server.database.mode=HA

Note that ha.server_id corresponds to the id in myid. We set ha.slave_coordinator_update_mode=none to ensure that our Slave will never become master during an election. Once both configuration files have been created, we can start the Master (from Java, since it runs embedded) and then our Slave (./neo4j), in that order.
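
For reference, here is a minimal sketch of what starting the Master as an embedded HA instance might look like. It assumes Neo4j enterprise 1.7.x, where HighlyAvailableGraphDatabase accepts a store directory and a config map; the class name and store path here are only illustrative:

import java.util.HashMap;
import java.util.Map;

import org.neo4j.kernel.HighlyAvailableGraphDatabase;

public class EmbeddedHaMaster {
    public static void main(String[] args) {
        // Mirror the HA settings from conf/neo4j.properties above.
        Map<String, String> config = new HashMap<String, String>();
        config.put("ha.server_id", "1");
        config.put("ha.coordinators", "10.0.0.1:2181,10.0.0.2:2182");
        config.put("ha.cluster_name", "my.cluster");
        config.put("ha.pull_interval", "10");

        HighlyAvailableGraphDatabase graphDb =
                new HighlyAvailableGraphDatabase("data/graph.db", config);

        // ... heavy computations that build the graph go here ...

        graphDb.shutdown();
    }
}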

Voila! If all went well, the Master instance will be replicating information over to the Slave. If issues are encountered, be sure to check the logging in data/log/ and in data/graph.db/messages.log. I hope this helps!
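
As a quick sanity check (a hypothetical example, assuming the Slave’s standalone server listens on the default REST port 7474), you can hit its REST discovery endpoint and confirm that it responds:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SlaveSmokeTest {
    public static void main(String[] args) throws Exception {
        // The service root of the Neo4j server REST API on the Slave.
        URL url = new URL("http://10.0.0.2:7474/db/data/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");
        System.out.println("HTTP " + conn.getResponseCode());

        BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        reader.close();
    }
}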