Tracking Expatriation with ChartLab, SQL, and Web Crawler in ATSD

Introduction

The Internal Revenue Service (IRS) of the United States releases quarterly reports tracking the number of expatriated citizens processed in the previous three-month period. The reports are announced via the Federal Register which is the primary publication medium of the federal government and used by a number of agencies as an outreach arm to the public.

Ostensibly, the tax bureau maintains these records instead of Citizenship and Immigration Services (CIS) because the IRS uses that information to adjudicate decisions regarding those expatriating citizens upon whom the Expatriation Tax is levied.

An expatriate must meet any one of the following criteria to be liable for the eponymous tax:

  • Average annual net income tax for the five years ending before the date of expatriation or termination of residency is more than a specified amount that is adjusted for inflation ($151,000 for 2012, $155,000 for 2013, $157,000 for 2014, and $160,000 for 2015).
  • Net worth is $2 million or more on the date of expatriation or termination of residency.
  • Failure to certify on Form 8854 that the citizen has complied with all U.S. federal tax obligations for the five years preceding the date of expatriation or termination of residency.

IRS Form 8854 available at is a statement which confirms that up to the date of your expatriation, you have complied with United States Federal Tax Code to the letter of the law. Failing to correctly complete this document means that you are obligated to pay the Expatriation Tax, even if you do not meet either of the previous criteria.

Previous articles tracking United States expatriation:

Fig 1. The top chart uses a time-offset setting to show a quarter-on-quarter comparison, while the annual totals are shown in the bottom chart.

ChartLab

Both of the above graphs are prepared with Axibase ChartLab, a visualization sandbox that features convenient syntax with different types of charts. The above visualizations may be modified to produce several projections of the same dataset stored in Axibase Time Series Database (ATSD).

A number of built-in statistical functions are already supported by ChartLab, and user-defined functions may be added to a local ATSD instance. You can follow these instructions to upload a user-defined function as a JavaScript file.

The above visualization uses the ChangeFromPreviousYear user-defined function. While the absolute value of U.S. expatriates is the greatest during 2016-2017, the greatest relative change is observed several years ago.

While the underlying function may be quite verbose:

value = var v = value('cpi'); var p = value('prev_cpi'); if(p!=null && v!=null) return (v / p - 1) * 100

The syntax required in the ChartLab configuration is simple:

value = fred.PercentChangeFromYearAgo('raw')

View the complete fred.js UDF library here.

Customized data monitoring in ATSD is possible using alert-expressions whereby user-specified parameters may be defined to trigger alarms based on incoming data.

Here, alert expressions are applied to static data, but they may be easily applied to dynamic data and used for systems monitoring as seen in this example which is monitoring one of the Axibase servers right now.

The alert-expression used here highlights quarters where the percent change from the previous year is greater than 50% in red, and quarters where it is less than -10% in green.

The syntax for the alert-expression above is shown here:

alert-expression = value > 50 ? 4000 : value < -10 ? -100 : 0
alert-style = if (alert > 50) return 'color:red'
alert-style = if (alert < -10) return 'color:green'

When using a two-parameter alert-expression, three arguments are needed: an upper bound, lower bound, and median value separated by : delimiter.

For more information about ChartLab syntax or to explore other features which may be used in the example above, see the complete ChartLab documentation.

SQL

In addition to ChartLab, the ATSD includes a web-based SQL Console which can be used for ad-hoc data exploration.

SELECT date_format(time, 'yyyy') AS "Year",
  count(value) AS "Year Total",
  count(value)-lag(count(value)) AS "Y-o-Y Change",
  100*(count(value)-lag(count(value)))/lag(count(value)) AS "Y-o-Y Change, %"
FROM "us-expatriate-counter"
  WHERE entity = 'us.irs' AND datetime <= '2018-01-01T00:00:00Z'
GROUP BY period(1 YEAR, END_TIME)
  ORDER BY period(1 YEAR, END_TIME)
| Year | Year Total | Y-o-Y Change | Y-o-Y Change, % |
|------|-----------:|-------------:|----------------:|
| 2000 | 430        | null         | null            |
| 2001 | 488        | 58.00        | 13.49           |
| 2002 | 503        | 15.00        | 3.07            |
| 2003 | 550        | 47.00        | 9.34            |
| 2004 | 631        | 81.00        | 14.73           |
| 2005 | 745        | 114.00       | 18.07           |
| 2006 | 279        | -466.00      | -62.55          |
| 2007 | 470        | 191.00       | 68.46           |
| 2008 | 229        | -241.00      | -51.28          |
| 2009 | 741        | 512.00       | 223.58          |
| 2010 | 1531       | 790.00       | 106.61          |
| 2011 | 1780       | 249.00       | 16.26           |
| 2012 | 932        | -848.00      | -47.64          |
| 2013 | 2999       | 2067.00      | 221.78          |
| 2014 | 3411       | 412.00       | 13.74           |
| 2015 | 4273       | 862.00       | 25.27           |
| 2016 | 5398       | 1125.00      | 26.33           |
| 2017 | 5128       | -270.00      | -5.00           |

The above query uses the LAG function to select the previous value, offset by one index position, when creating comparative tables like the one shown here.

Likewise, data may be tracked by quarter using the query below:

SELECT CEIL(CAST(date_format(time, 'M') AS NUMBER)/3) AS "Quarter", date_format(time, 'yyyy') AS "Year",
  count(value) AS "Quarter Total"
FROM "us-expatriate-counter"
  WHERE entity = 'us.irs' AND date_format(time, 'yyyy-MM') >= '2013-01' AND datetime <= now
GROUP BY period(1 QUARTER)
  ORDER BY period(1 QUARTER)
| Quarter | Year | Quarter Total |
|---------|------|---------------|
| 1       | 2013 | 679           |
| 2       | 2013 | 1129          |
| 3       | 2013 | 560           |
| 4       | 2013 | 631           |
| 1       | 2014 | 999           |
| 2       | 2014 | 576           |
| 3       | 2014 | 775           |
| 4       | 2014 | 1061          |
| 1       | 2015 | 1331          |
| 2       | 2015 | 459           |
| 3       | 2015 | 1426          |
| 4       | 2015 | 1057          |
| 1       | 2016 | 1154          |
| 2       | 2016 | 506           |
| 3       | 2016 | 1379          |
| 4       | 2016 | 2359          |
| 1       | 2017 | 1313          |
| 2       | 2017 | 1756          |
| 3       | 2017 | 1374          |
| 4       | 2017 | 685           |
| 1       | 2018 | 1098          |

The above query uses a CAST clause and CEIL function to rename each quarter.

Web Crawler

The data published by the Federal Register requires an intermediate ETL step to be available for analysis. This extraction-transformation-loading procedure is implemented by a web crawler built specifically for the task of tracking Federal Register publications for new expatriation data releases.

The Web Crawler operates according to this workflow:

The Web Crawler reads incoming data from the Federal Register and parses it into series commands, readable by ATSD, the database which hosts all the data used in this article and supports the background operations of SQL Console. A series command template is shown below:

series d:{iso-date} e:{entity} t:{tag-1}={val-1} m:{metric-1}={number}

Once the data is stored in the database, the date (d:) parameter may be referenced in ISO format, or modified to output human-readable date information such as that seen in the SQL section of this article. Tags (t:), metrics (m:), and entities (e:) are identifying features of a particular set of data. In the case of expatriation data here, the entity is the publishing body, the IRS and the metric is the number of expatriates. The raw data does not feature tag-level differentiation, but it could be something like us-born-citizens versus naturalized-citizens, if the data is tracked that specifically.

The complete list and operation instructions of other supported Axibase data crawlers is hosted here.

Conclusion

An unusually large number of expatriates renouncing their United States citizenship was observed during the last half-decade, with an exceptionally high number leaving the country during 2016-2107. While the most recent figures show that this number has markedly decreased, it still remains higher than it ever was pre-2010. There is probably not a very simple answer to the question of 'Why?' but BBC recently wrote a brief article discussing the Americans who chose to leave the country in response to the recent presidential elections.

Whatever the reason, it is unlikely that the roughly 5000 citizens leaving the country will have much of an impact on the overall demographics or population of the country as the United States is constantly host to some 750,000 newly naturalized citizens each year.

For more information about any of the data in this article, instructions on using any of the tools featured here, or suggestions and comments about articles you'd like to see in the future, raise an issue on the Axibase GitHub repository.