Opened 12 years ago

Closed 12 years ago

# Problem with EvenlyDiscretizedFunc.getXIndex(double x)

Reported by: Owned by: Ned Field Peter Powers critical OpenSHA 1.3 sha

### Description

This is giving inconsistent mapping for double x values that are right on bin boundaries, and it comes down to the following code: (int) Math.round((x-minX)/delta).

For example, the following snippet:

double minX=6.05;
double delta=0.1;
double[] testVals = {6,6.1,6.2,6.3,6.4,6.5,6.6,6.7,6.8,6.9,7};
for(double x:testVals) {

int index = (int)Math.round((x-minX)/delta);
double binCenter = minX+delta*index;
System.out.println(x+"\tindex = "+index+"\tbinCenter="+binCenter);

}

gives the following results:

6.0 index = 0 binCenter=6.05
6.1 index = 0 binCenter=6.05
6.2 index = 2 binCenter=6.25
6.3 index = 3 binCenter=6.35
6.4 index = 4 binCenter=6.45
6.5 index = 5 binCenter=6.55
6.6 index = 5 binCenter=6.55
6.7 index = 7 binCenter=6.75
6.8 index = 8 binCenter=6.85
6.9 index = 9 binCenter=6.95
7.0 index = 10 binCenter=7.05

(nothing maps to the bin with index =1, and two values map to the bin with index 5)

Peter, it looks like you may have changed this bit of code from what it was originally?

This should have been caught by some Junit test.

### comment:1 Changed 12 years ago by Kevin Milner

Ned, we are both automatically CC'd on all trac ticket creations/modifications/comments. I expanded your test a little bit to show the actual values that are being rounded, and it looks to be a double precision problem:

[code]6.0 index = 0 binCenter=6.05 '(x-minX)'=-0.04999999999999982 '(x-minX)/delta'=-0.4999999999999982
6.1 index = 0 binCenter=6.05 '(x-minX)'=0.04999999999999982 '(x-minX)/delta'=0.4999999999999982
6.2 index = 2 binCenter=6.25 '(x-minX)'=0.15000000000000036 '(x-minX)/delta'=1.5000000000000036
6.3 index = 3 binCenter=6.35 '(x-minX)'=0.25 '(x-minX)/delta'=2.5
6.4 index = 4 binCenter=6.45 '(x-minX)'=0.35000000000000053 '(x-minX)/delta'=3.5000000000000053
6.5 index = 5 binCenter=6.55 '(x-minX)'=0.4500000000000002 '(x-minX)/delta'=4.500000000000002
6.6 index = 5 binCenter=6.55 '(x-minX)'=0.5499999999999998 '(x-minX)/delta'=5.499999999999998
6.7 index = 7 binCenter=6.75 '(x-minX)'=0.6500000000000004 '(x-minX)/delta'=6.5000000000000036
6.8 index = 8 binCenter=6.85 '(x-minX)'=0.75 '(x-minX)/delta'=7.5
6.9 index = 9 binCenter=6.95 '(x-minX)'=0.8500000000000005 '(x-minX)/delta'=8.500000000000005
7.0 index = 10 binCenter=7.05 '(x-minX)'=0.9500000000000002 '(x-minX)/delta'=9.500000000000002code

When the double precision representation is x.049999999999999y it rounds down, when it is x.50000000000000y it rounds up, even though both are representations of x.5. We could potentially get around this by subtracting a very small number from (x-minX) which would cause it to always be in the bin before, but this would mess up the M6 case below (it would now be outside of the bins). I'm not quite sure the best way to go forward.

### comment:2 Changed 12 years ago by Peter Powers

I'm grappling with this. Another problem with a small value correction is that the small value must be relative to the existing values in the distribution because you don't know in advance wether your rounding error is going to be in the 16th or higher place depending on the exponent of the double. I thought I had a workaround this morning but I could find ways to break it. Looking into using BigDecimal?.

### comment:3 Changed 12 years ago by Kevin Milner

Yeah I was afraid we would have to use BigDecimal? - which would seriously hurt performance. Keep me posted!

### comment:4 Changed 12 years ago by Peter Powers

Resolution: → fixed new → closed

I need to take a shower. Fixed in [9324].

I'm not keen on BigDecimal? and I loathe casting to (float), even though that can be made to work. The (dirty, but seemingly stable) solution for now is to use the same math as before but scale (multiply) the double value that is being used to determine the correct index by 1+(1e-14, just above the precision limit of doubles) prior to rounding. This pushes any values that are e.g. 4.9999999... when they should be 5.0 above the line and they are associated with the correct index. One side effect is that if your actual value to index is 4.9999999... it will be pulled into the next highest bin, but you probably ended up such a value in the first place as a result of double precision issues.

For reference, values that fall exactly on the boundary (midpoint) between two function values are associated with the higher value (a la Matlab) with the exception of a value on the upper bound of the highest function value, which is associated with the high value of the function (also a la Matlab). New jUnit tests were also added.

Note: See TracTickets for help on using tickets.