Installation: Amazon Elastic MapReduce (EMR)


This document describes how to deploy ATSD on HBase with AWS S3 as the underlying file system.

Operational Advantages

  • Scale storage and compute layers independently to handle a variety of use cases, including the small cluster/large dataset scenario.
  • Dynamically adjust the number of region servers based on auto-scaling rules.
  • Simplified backup and recovery.
  • Reduced storage footprint. No need for 3x data replication or for the additional disk space required for HFile compactions.
  • Increased resilience based on AWS S3 reliability and durability.
  • Read-only cluster replicas.

The minimum cluster size supported by this installation option is two EC2 instances, one of which is shared by the HBase Master and ATSD.

Create S3 Bucket

Create the S3 bucket prior to installation. The bucket, named atsd in the example below, stores the hbase-root directory and contains both metadata and HFiles.

aws s3 mb s3://atsd

If necessary, the hbase-root directory is created by HBase when the cluster is started for the first time. The directory is not deleted when the cluster is stopped.

Check the contents of the bucket prior to launching the cluster.

aws s3 ls --summarize --human-readable --recursive s3://atsd
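The create-and-check steps above can be made repeatable. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the ensure_bucket function name is hypothetical. It relies on aws s3api head-bucket, which fails both when the bucket does not exist and when it is not accessible to the current credentials.

```shell
# Hypothetical helper: create the bucket only if it is not already accessible.
ensure_bucket() {
  bucket="$1"
  if aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1; then
    echo "exists"
  else
    aws s3 mb "s3://$bucket" >/dev/null && echo "created"
  fi
}

# Usage: ensure_bucket atsd
```

Because an existing hbase-root directory is reused by HBase, a check of this kind helps avoid launching a cluster against the wrong bucket by accident.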

Download Distribution Files

  • Amazon EMR 5.0.x - 5.3.x with HBase 1.2.x
curl -o atsd-cluster.tar.gz
  • Amazon EMR 5.12.x - 5.17.x with HBase 1.4.x
curl -o atsd-cluster.tar.gz

Refer to AWS EMR Release Matrix for more information.

Extract and Upload Co-processor File

tar -xvf atsd-cluster.tar.gz atsd/atsd-hbase*jar

The atsd-hbase.$REVISION.jar file contains ATSD HBase co-processors and filters.

Once the .jar file is uploaded to S3, the Java classes in this file are automatically available to all region servers when they are started.

aws s3 cp atsd/atsd-hbase.*.jar s3://atsd/hbase-root/lib/atsd-hbase.jar

Verify that the .jar file is stored in S3:

aws s3 ls --summarize --human-readable --recursive s3://atsd/hbase-root/lib
2017-08-31 21:43:24  555.1 KiB hbase-root/lib/atsd-hbase.jar

Total Objects: 1
  Total Size: 555.1 KiB

Store the atsd-hbase.$REVISION.jar in a directory identified by the hbase.dynamic.jars.dir setting in HBase. By default this directory resolves to hbase.rootdir/lib.


When uploading the .jar file to the hbase.rootdir/lib directory, the command removes the revision from the file name. This way, the coprocessors.jar setting in ATSD does not need to change when the .jar file is replaced.
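The renaming rule can be sketched as a pair of small functions. This is only an illustration: target_jar_name and upload_coprocessor are hypothetical names, and the sketch assumes that exactly one extracted file matches atsd/atsd-hbase.*.jar.

```shell
# Hypothetical helper: derive the revision-free target name from a local .jar path.
# Everything after the first dot in the file name is dropped, then .jar is appended.
target_jar_name() {
  file=$(basename "$1")
  printf '%s.jar\n' "${file%%.*}"
}

# Hypothetical helper: upload exactly one matching .jar under the stable name.
upload_coprocessor() {
  set -- atsd/atsd-hbase.*.jar
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one atsd/atsd-hbase.*.jar file" >&2
    return 1
  fi
  aws s3 cp "$1" "s3://atsd/hbase-root/lib/$(target_jar_name "$1")"
}

# Usage: upload_coprocessor
```

Refusing to upload when several revisions are present avoids silently copying an older .jar over a newer one.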

Launch Cluster

Copy the AWS CLI cluster launch command into an editor.

Specify the correct EMR release label, for example emr-5.3.1 for HBase 1.2.x or emr-5.17.1 for HBase 1.4.x.

export CLUSTER_ID=$(            \
aws emr create-cluster          \
--name "ATSD HBase"             \
--applications Name=HBase       \
--release-label emr-5.3.1       \
--output text                   \
--use-default-roles             \
--ec2-attributes KeyName=<key-name>,SubnetId=<subnet>     \
--instance-groups               \
  Name=Master,InstanceCount=1,InstanceGroupType=MASTER,InstanceType=m4.large     \
  Name=Region,InstanceCount=3,InstanceGroupType=CORE,InstanceType=m4.large       \
--configurations '[{
     "Classification": "hbase",
     "Properties": { "hbase.emr.storageMode": "s3" }
  },{
     "Classification": "hbase-site",
     "Properties": { "hbase.rootdir": "s3://atsd/hbase-root" }
  }]'              \
)
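Inline JSON inside a multi-line shell command is easy to break. As an alternative sketch, the two classifications can be written to a file and passed to the AWS CLI with the file:// prefix. The write_emr_configurations function name and the configurations.json file name are assumptions made for this example.

```shell
# Hypothetical helper: write the two EMR classifications to a JSON file.
write_emr_configurations() {
  cat > "$1" <<'EOF'
[
  {
    "Classification": "hbase",
    "Properties": { "hbase.emr.storageMode": "s3" }
  },
  {
    "Classification": "hbase-site",
    "Properties": { "hbase.rootdir": "s3://atsd/hbase-root" }
  }
]
EOF
}

# Usage (file name is an assumption):
#   write_emr_configurations ./configurations.json
#   aws emr create-cluster ... --configurations file://./configurations.json
```

Each classification is a separate object in the array: the hbase classification selects S3 storage mode, and hbase-site points hbase.rootdir at the bucket.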

Specify Network Parameters

Replace <key-name> and <subnet> parameters.

The <key-name> parameter corresponds to the name of the private key used to log in to cluster nodes.

The <subnet> parameter is required when launching particular instance types. To discover the correct subnet for your account, launch a sample cluster manually in the AWS EMR Console and review the settings using AWS CLI export.

--ec2-attributes KeyName=ec2-pkey,SubnetId=subnet-6ab5ca46,EmrManagedMasterSecurityGroup=sg-521bcd22,EmrManagedSlaveSecurityGroup=sg-9604d2e6    \

Specify Initial Cluster Size

Adjust EC2 instance types and total instance count for the RegionServers group as needed. Review AWS documentation for additional commands.

If needed, adjust the cluster size at runtime.

The minimum number of nodes in each instance group is 1. Therefore, the smallest cluster consists of two EC2 instances:

Name=Master,InstanceCount=1,InstanceGroupType=MASTER,InstanceType=m4.large        \
Name=Region,InstanceCount=1,InstanceGroupType=CORE,InstanceType=m4.large          \
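Adjusting the cluster size at runtime can be sketched with two AWS CLI calls: one to look up the ID of the CORE instance group, and one to change its instance count. The resize_core_group function name is hypothetical, and the sketch assumes the cluster has a single CORE group, as in the launch command above.

```shell
# Hypothetical helper: set the number of CORE (region server) nodes at runtime.
# $1 = cluster ID, $2 = desired instance count
resize_core_group() {
  cluster_id="$1"
  count="$2"
  group_id=$(aws emr list-instance-groups --cluster-id "$cluster_id" \
    --query "InstanceGroups[?InstanceGroupType=='CORE'].Id" --output text) || return 1
  [ -n "$group_id" ] || { echo "no CORE instance group found" >&2; return 1; }
  aws emr modify-instance-groups \
    --instance-groups "InstanceGroupId=$group_id,InstanceCount=$count"
}

# Usage: resize_core_group "$CLUSTER_ID" 5
```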

Enable Consistent S3 View

For long-running production clusters, enable EMR Consistent View, which identifies inconsistencies in S3 object listings and resolves them using retries with exponential timeouts. When this option is enabled, the HBase metadata is also stored in a DynamoDB table.

Checks are enabled by adding the Consistent setting to the launch command.

--emrfs Consistent=true,Args=[fs.s3.consistent.metadata.tableName=EmrFSMetadata]   \

Note that the EMR service does not automatically remove the specified DynamoDB table when a cluster is stopped. Delete the DynamoDB table manually after the cluster is shut down. When running multiple clusters concurrently, ensure that each cluster uses a different DynamoDB table name to avoid collisions (the default table name is EmrFSMetadata).
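One way to keep table names distinct is to generate the --emrfs option from a per-cluster table name and to delete that same table once the cluster is gone. The sketch below is illustrative only; the emrfs_option and delete_emrfs_table function names and the EmrFSMetadata_atsd_prod table name are assumptions.

```shell
# Hypothetical helper: build the --emrfs option for a given DynamoDB table name.
emrfs_option() {
  printf '%s\n' "--emrfs Consistent=true,Args=[fs.s3.consistent.metadata.tableName=$1]"
}

# Hypothetical helper: delete the metadata table after the cluster is shut down.
delete_emrfs_table() {
  aws dynamodb delete-table --table-name "$1"
}

# Usage (table name is an assumption):
#   emrfs_option EmrFSMetadata_atsd_prod
#   delete_emrfs_table EmrFSMetadata_atsd_prod
```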

Launch Cluster

Launch the cluster by executing the above command. The command returns a cluster ID and stores it as an environment variable.

Verify HBase Status

Log in to Master Node

Monitor cluster status until the bootstrapping process is complete.

watch 'aws emr describe-cluster --cluster-id $CLUSTER_ID | grep MasterPublic | cut -d "\"" -f 4'

Determine the public IP address of the HBase Master node.

export MASTER_IP=$(aws emr describe-cluster --cluster-id $CLUSTER_ID | grep MasterPublic | cut -d "\"" -f 4) \
&& echo $MASTER_IP
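As an alternative to parsing the JSON output with grep and cut, the same value can be selected with the AWS CLI --query option. This is a sketch under the assumption that the value of interest is the Cluster.MasterPublicDnsName field of the describe-cluster output, which is a DNS name rather than a numeric address; the master_public_dns function name is hypothetical.

```shell
# Hypothetical helper: print the master node public DNS name using a JMESPath query.
master_public_dns() {
  aws emr describe-cluster --cluster-id "$1" \
    --query 'Cluster.MasterPublicDnsName' --output text
}

# Usage:
#   aws emr wait cluster-running --cluster-id "$CLUSTER_ID"
#   export MASTER_IP=$(master_public_dns "$CLUSTER_ID") && echo "$MASTER_IP"
```

The aws emr wait cluster-running command blocks until the cluster reaches the running state, which can replace manual monitoring with watch.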

Specify the path to the private SSH key and log in to the node.

ssh -i /path/to/<key-name>.pem -o StrictHostKeyChecking=no hadoop@$MASTER_IP

Wait until HBase services are running on the HMaster node.

watch 'initctl list | grep hbase'
hbase-thrift start/running, process 8137
hbase-rest start/running, process 7842
hbase-master start/running, process 7987

Verify HBase version (1.2.3+) and re-run the status command until the cluster becomes operational.

echo "status" | hbase shell

Wait until the cluster initializes and the "Master is initializing" error is no longer displayed.

1 active master, 0 backup masters, 4 servers, 0 dead, 1.0000 average load
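Re-running the status command by hand can be replaced with a small polling loop. The sketch below assumes that a status line containing "active master", as in the sample output above, indicates an operational cluster; the wait_for_hbase function name and its two parameters are hypothetical.

```shell
# Hypothetical helper: poll the HBase status until an active master is reported.
# $1 = maximum number of attempts, $2 = seconds to sleep between attempts
wait_for_hbase() {
  attempts="$1"
  delay="$2"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if echo "status" | hbase shell 2>/dev/null | grep -q "active master"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage: wait_for_hbase 30 10 && echo "HBase is operational"
```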

Install ATSD

Log in to the server where you plan to install ATSD. For testing and development, you can co-install ATSD on the HMaster node.

ssh -i /path/to/<key-name>.pem ec2-user@$PUBLIC_IP

Verify that JDK 8 is installed on the server.

javac -version
java -version
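The version check can also be scripted. The sketch below assumes that a Java 8 runtime reports a version string starting with 1.8 in the first line of the java -version output, which is written to standard error; the is_java8 function name is hypothetical.

```shell
# Hypothetical helper: succeed if a "java -version" banner line reports Java 8.
is_java8() {
  case "$1" in
    *'version "1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: is_java8 "$(java -version 2>&1 | head -n 1)" || echo "JDK 8 not found" >&2
```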

Change to a volume with at least 10 GB of available disk space.

df -h
cd /mnt
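The free space requirement can be checked before changing directories. This is a minimal sketch that reads the Available column of the POSIX df -Pk output; the available_gb function name is hypothetical, and the result is rounded down to whole gigabytes.

```shell
# Hypothetical helper: print available space, in whole GB, for the volume holding a path.
available_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Usage: [ "$(available_gb /mnt)" -ge 10 ] && cd /mnt
```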

Download ATSD distribution files.

curl -o atsd-cluster.tar.gz
tar -xvf atsd-cluster.tar.gz

Set the path to Java 8 in the ATSD start script.

JP=`dirname "$(dirname "$(readlink -f "$(which javac || which java)")")"` \
  && sed -i "s,^export JAVA_HOME=.*,export JAVA_HOME=$JP,g" atsd/atsd/bin/ \
  && echo $JP

Set the path to the ATSD coprocessor file.

echo "coprocessors.jar=s3://atsd/hbase-root/lib/atsd-hbase.jar" >> atsd/atsd/conf/ \
  && grep atsd/atsd/conf/ -e "coprocessors.jar"

If installing ATSD on an HMaster node where ports are potentially already in use, change the default ATSD port numbers from 8081, 8082, 8084, 8088, and 8443 to 9081, 9082, 9084, 9088, and 9443, respectively.

sed -i 's/=.*80/=90/g; s/=.*8443/=9443/g' atsd/atsd/conf/ \
  && grep atsd/atsd/conf/ -e "port"
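Because the substitution is applied in place, it can be useful to preview its effect first. The sketch below applies the same two sed expressions to standard input without modifying any file; the preview_port_remap function name and the property names in the usage example are hypothetical.

```shell
# Preview the port substitutions without modifying any file.
# Reads properties from standard input and writes the result to standard output.
preview_port_remap() {
  sed 's/=.*80/=90/g; s/=.*8443/=9443/g'
}

# Usage (property names below are hypothetical):
#   printf 'http.port = 8088\nhttps.port = 8443\n' | preview_port_remap
```

Note that the first expression rewrites any value that contains 80, not only port settings, so reviewing the preview before running the in-place command is a reasonable precaution.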

Check memory usage and increase ATSD JVM memory to 50% of total physical memory installed in the server, if available.

nano atsd/atsd/conf/
JAVA_OPTS="-server -Xmx4000M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath="$atsd_home"/logs"
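The 50% rule can be turned into a quick calculation. The sketch below assumes a Linux server where /proc/meminfo reports MemTotal in kB; the half_memory_mb function name is hypothetical.

```shell
# Hypothetical helper: convert total memory in kB to a 50% heap size in MB.
half_memory_mb() {
  echo $(( $1 / 2 / 1024 ))
}

# Usage on Linux, where /proc/meminfo reports MemTotal in kB:
#   total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   echo "-Xmx$(half_memory_mb "$total_kb")M"
```

When ATSD shares the node with the HBase Master, the memory actually available may be lower than half of the physical total, so treat the result as an upper bound.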

Open the Hadoop properties file and specify the HMaster hostname.

nano atsd/atsd/conf/
# localhost if co-installing ATSD on HMaster
hbase.zookeeper.quorum =

Start ATSD.


Monitor startup progress using the log file.

tail -f atsd/atsd/logs/atsd.log

It can take ATSD several minutes to create tables after initializing the system.

2017-08-31 22:10:37,890;INFO;main;org.springframework.web.servlet.DispatcherServlet;FrameworkServlet 'dispatcher': initialization completed in 3271 ms
2017-08-31 22:10:37,927;INFO;main;org.eclipse.jetty.server.AbstractConnector;Started SelectChannelConnector@
2017-08-31 22:10:37,947;INFO;main;org.eclipse.jetty.util.ssl.SslContextFactory;Enabled Protocols [TLSv1, TLSv1.1, TLSv1.2] of [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2]
2017-08-31 22:10:37,950;INFO;main;org.eclipse.jetty.server.AbstractConnector;Started SslSelectChannelConnector@

Log in to the ATSD web interface on https://atsd_hostname:8443. Modify the URL if the port is customized.


Port Access

Ensure that the Security Group associated with the EC2 instance where ATSD is running allows access to the ATSD listening ports.

If necessary, add security group rules to open inbound access to ports 8081, 8082/udp, 8084, 8088, and 8443, or to ports 9081, 9082/udp, 9084, 9088, and 9443 if the default ports are redefined.
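The inbound rules can be added with the AWS CLI. The following sketch covers the default port set and assumes that every port other than 8082 uses TCP; the open_atsd_ports function name, the security group ID, and the CIDR range in the usage example are hypothetical.

```shell
# Hypothetical helper: open the default ATSD ports for one security group and CIDR.
# $1 = security group ID, $2 = source CIDR range
open_atsd_ports() {
  group_id="$1"
  cidr="$2"
  for rule in tcp:8081 udp:8082 tcp:8084 tcp:8088 tcp:8443; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$group_id" \
      --protocol "${rule%%:*}" \
      --port "${rule##*:}" \
      --cidr "$cidr" || return 1
  done
}

# Usage: open_atsd_ports sg-0123456789abcdef0 203.0.113.0/24
```

Restricting the CIDR range to known client networks is generally preferable to opening the ports to all addresses.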


ATSD requires a license file when connected to an HBase cluster.

Open the Settings > License page and generate a license request. Provide the license key to Axibase support.

Once the license file is signed by Axibase, start ATSD, open the Settings > License page and import the license.


Incompatible HBase Version

If ATSD is not compatible with the HBase instance, the following error is reported when ATSD is started.

2018-09-17 14:59:08,322 ERROR [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] master.MasterRpcServices: $FileDescriptor.internalBuildGeneratedFileFrom(
        at com.axibase.tsd.hbase.coprocessor.autogenerated.DeleteProtocol.<clinit>(

Check the HBase version and ensure that it matches the HBase version of the downloaded ATSD distribution (1.2.x or 1.4.x).

echo "status" | hbase shell

DNS Resolution

In case of UnknownHostException at startup, ensure that HMaster and HRegion servers are resolvable from the ATSD server.

Caused by:
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.<init>(
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.createBlockingRpcChannel(
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getClient(
    at org.apache.hadoop.hbase.client.ScannerCallable.prepare(
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(
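Name resolution can be checked from the ATSD server before restarting. The sketch below uses getent hosts, which consults the same resolver configuration as most applications on Linux; the check_resolvable function name and the host names in the usage example are hypothetical.

```shell
# Hypothetical helper: report which of the given host names do not resolve.
# Returns 0 only if every host name resolves.
check_resolvable() {
  failed=0
  for host in "$@"; do
    if ! getent hosts "$host" >/dev/null 2>&1; then
      echo "unresolvable: $host" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage (host names are assumptions):
#   check_resolvable ip-10-0-0-11.ec2.internal ip-10-0-0-12.ec2.internal
```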

Missing ATSD Coprocessor File

2017-09-01 13:44:30,386;INFO;main;com.axibase.tsd.hbase.SchemaBean;Set path to coprocessor: table 'atsd_d', coprocessor com.axibase.tsd.hbase.coprocessor.CompactEndpoint, path to jar s3://atsd/hbase-root/lib/atsd-hbase.jar
2017-09-01 13:44:30,387;INFO;main;com.axibase.tsd.hbase.SchemaBean;Set path to coprocessor: table 'atsd_d', coprocessor com.axibase.tsd.hbase.coprocessor.DeleteDataEndpoint, path to jar s3://atsd/hbase-root/lib/atsd-hbase.jar
2017-09-01 13:44:30,474;WARN;main;;Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'series.batch.size' defined in URL [jar:file:/mnt/atsd/atsd/bin/atsd.17245.jar!/applicationContext-properties.xml]: Cannot resolve reference to bean 'seriesPollerHolder' while setting bean property 'updateAction'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'seriesPollerHolder': Injection of resource dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'serverOptionDaoImpl': Injection of resource dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'schemaBean': Invocation of init method failed; nested exception is org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: No such file or directory: 'hbase-root/lib/atsd-hbase.jar' Set hbase.table.sanity.checks to false at conf or table descriptor if you want to bypass sanity checks

Check the coprocessors.jar setting.

grep atsd/atsd/conf/ -e "coprocessors.jar"

Check that the file is present in S3.

aws s3 ls --summarize --human-readable --recursive s3://atsd/hbase-root/lib
2017-08-31 21:43:24  555.1 KiB hbase-root/lib/atsd-hbase.jar

Total Objects: 1
  Total Size: 555.1 KiB

If necessary, copy the file.

aws s3 cp atsd/atsd-hbase.*.jar s3://atsd/hbase-root/lib/atsd-hbase.jar

Restart the HBase cluster, both HMaster and Region Servers, and then restart ATSD.