# Precision in Stata


## 0.1 Create Data for Example.

First, create some fake data.

`. clear
. set obs 1000
obs was 0, now 1000
. g x = 1.1
. list in 1/5, noobs
+-----+
| x |
|-----|
| 1.1 |
| 1.1 |
| 1.1 |
| 1.1 |
| 1.1 |
+-----+
. count if x ==1.1 // zero matches!!
0
`

### Precision of Stata storage formats

Stata isn't wrong. By default, Stata stores **x** as a float, which holds about 7 significant decimal digits, while the literal 1.1 in the comparison is evaluated in double precision (about 16 digits). Like most decimal fractions, 1.1 has no exact finite-digit binary representation, so its float and double approximations differ and the equality test fails. Rounding the literal to float precision with float(), or storing the variable as a double, fixes the issue. Note below how **x** is represented in hexadecimal and binary IEEE format vs. Stata general (16g) and fixed (f) format.

`.
. count if x == float(1.1)
1000
.
.
. **formats
. di %21x x //hex
+1.19999a0000000X+000
. di %16L x //IEEE precision
000000a09999f13f
. di %16.0g round(x, .1)
1.1
. di %4.2f round(x, .1)
1.10
. di %23.18f round(x, .1)
1.100000000000000089
.
`
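
To see why the two comparisons behave differently, it can help to print both sides of the comparison in hexadecimal. The following is a minimal do-file sketch, assuming a fresh session with Stata's default storage type (float); it reuses the variable name **x** from above.

```stata
clear
set obs 5
generate x = 1.1              // no type given, so x is stored as a float

* the same literal at the two precisions involved in the comparison
display %21x 1.1              // double: +1.199999999999aX+000
display %21x float(1.1)       // float:  +1.19999a0000000X+000 (what x holds)

count if x == 1.1             // float value vs. double literal: no matches
count if x == float(1.1)      // both sides at float precision: all rows match
assert x == float(1.1)        // exits with an error if any row differs
assert x != 1.1
```

The two hexadecimal values agree only in their leading digits, which is why the plain equality test finds nothing.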

### 0.1.1 Double formats

Storing the variable (now **y**) as a double fixes this issue. You could even change the default storage type for all new variables to double, but that bloats your dataset and is usually unnecessary: you really only need doubles for variables that require full precision or are being displayed in a table/graph.


`.
. g double y = 1.1
. count if y ==1.1 //works now.
1000
`
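
As a rough sketch of the options just described (assuming default settings and a fresh session; the variable names **w** and **x2** are made up for illustration), the snippet below stores one variable as a double and then temporarily switches the default type with set type. It also shows that recasting an existing float to double does not recover the lost digits, because the value was already rounded when it was first stored.

```stata
clear
set obs 1000

generate double y = 1.1       // only this variable is stored as a double
assert y == 1.1

set type double               // new variables now default to double
generate w = 1.1
assert w == 1.1
set type float                // restore Stata's default

generate x2 = 1.1             // stored as a float again
recast double x2              // widens the storage type, keeps the float-rounded value
assert x2 != 1.1
assert x2 == float(1.1)

describe y w x2               // all three are now listed as double
```

Choosing double at creation time is what matters; changing the type afterwards only changes how much space the already-rounded value takes.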

## 0.2 Solutions

Let's look at how to format stored results (such as r(mean)) on the fly. The hackish/kludgy solution we have used previously was to round the result, convert it to a string, and take a substring to truncate the value. This is not ideal.

`.
.
.
.
.
. g z = 999/_n
. qui su z, d
. di `"`r(mean)'"'
7.477985390007496
. di `"`=round(`r(mean)', 1.1)'"'
7.700000000000001
. di `"`=substr(`"`=round(`r(mean)', .01)'"', 1, 4)'"' //kludge using str
7.48
`

Instead, we should use one of the solutions below: either use the extended macro function 'display' to properly format and/or round these values (SOLUTION 1), or create variables with a proper display format (SOLUTION 2). Think of a display format as a 'mask' over the true (and accurate) stored value.

`.
.
. **SOLUTION 1: use extended function format**
. qui su z, d
. di `"`r(mean)'"'
7.477985390007496
. local r:display %3.2f `r(mean)'
. di `"`r'"' //use stored result
7.48
. local r:display %3.2f `=round(`r(mean)',.01)'
. di `"`r'"' //use calculated/rounded result
7.48
. g mean = `r(mean)'
. local r: display %3.2f `=mean'
. di `"`r' vs. `=mean'"' //use stored variables
7.48 vs. 7.477985382080078
`
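
The same extended macro function works with any numeric display format. Below is a small do-file sketch, assuming **z** is defined as above; the local macro names m2, wide, and tidy are made up for illustration. Only the text placed in the macro is formatted, and the stored result itself is untouched.

```stata
clear
set obs 1000
generate z = 999/_n
quietly summarize z

local m2   : display %4.2f r(mean)     // "7.48": the width fits exactly
local wide : display %9.2f r(mean)     // wider format: right-justified, padded with blanks
local tidy = strtrim("`wide'")         // strip the padding before using it in text

display "mean(z) = `m2'"
display "mean(z) = `tidy'"

assert r(mean) != 7.48                 // the stored result keeps its full precision
assert "`m2'" == "7.48"
assert "`tidy'" == "`m2'"
```

Trimming matters mostly when the formatted value is pasted into titles, notes, or table cells, where the padding from a wide format would show up as extra spaces.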


`.
. **SOLUTION 2: create precise, formatted variable or scalar**
. qui su z, d
. g double p1 = `r(mean)'
. di %3.2f `=p1[1]' //display without macro extension
7.48
.
. l p1 in 1
+-----------+
| p1 |
|-----------|
1. | 7.4779854 |
+-----------+
. *fix display format:
. format p1 %3.2f
. l p1 in 1 //fixed
+------+
| p1 |
|------|
1. | 7.48 |
+------+
.
`

Instead of macros or variables, we can also work with lightweight -scalar-s to get the same result.

`. *note:
. scalar s1 = `r(mean)'
. di %3.2f s1
7.48
. di s1
7.4779854
. assert `=s1' == p1 //true
`
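
One detail worth keeping in mind with scalars: a scalar can be filled directly from r(mean), without macro quotes, which copies the double-precision value itself rather than a printed decimal version of it. A minimal sketch follows, assuming the same **z** as above; the scalar name s_mean is made up, and scalar() is used so the name cannot be mistaken for a variable.

```stata
clear
set obs 1000
generate z = 999/_n
quietly summarize z

scalar s_mean = r(mean)                // copied directly, no macro expansion
display %4.2f scalar(s_mean)           // formatted for reading: 7.48
display %21x scalar(s_mean)            // the full stored value, in hexadecimal
assert scalar(s_mean) == r(mean)       // the scalar holds the full-precision result
```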

## 0.3 Further Reading

For more information on storage precision, check out these items written by William Gould of StataCorp:

HERE and also

HERE