Monday 22 March 2010

missing the point: a proposal for an alternative

One of the problems (in my view) with digital imaging is that it seems to be mired in the past. People are stuck thinking in terms of representations of the image and forget that we also manipulate it.

Now in my darkroom daze, putting a negative on my enlarger and making a print was the way I made a picture. The negative was something I evaluated as an intermediate, not something I regarded as a goal in itself.

Today we (well most of us) use digital cameras, which record the data in numbers.
For example a 3072 x 2048 image from a 10D camera takes:
  • 1.3MB as a JPG
  • 6.7MB as a RAW
  • 18MB as an EXR file (which is using zip compression internally!)
The alternative is to render your RAW file as a 16 bit TIFF and store that, or to keep your RAW and store metadata files which record the processing needed to get it to where you like it (which is what Adobe Lightroom does).

The disadvantage of all of these systems except storing a JPG is that you do not have something which can be viewed without applying instructions to process it. This is particularly true of the EXR file, which is intended to hold a dynamic range well beyond that of a normal image.

While HDR can be a great tool for taking the relatively unphotographable and making a rendering of it, again we are not really storing a finished product that can be opened and viewed.

The disadvantage of 8 bits however is that it will not really support much fiddling before it starts to make the image quality fall apart. Changes such as
  • contrast fixes,
  • dodge and burn
  • colour space conversion
all erode the image. If you don't work carefully and keep your original, you'll end up with less quality, which is the very thing we're all trying to maximize in the first place.

I can see a few people fidgeting in their seat and wanting to say "what about 16 bit TIFF" ... and of course that's an alternative to the 8 bit representations we have in JPG (or BMP or TIFF or ...).

A 16 bit TIFF of the above image is 36MB, so it's actually the greediest to store, even if you can just open it.
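The 36MB figure is simply the uncompressed pixel data. As a rough check (my own arithmetic, assuming three colour channels, no compression and ignoring file headers), the sizes can be worked out like this:

```python
def uncompressed_size_mib(width, height, channels=3, bits_per_channel=8):
    """Return the raw pixel-data size in MiB (no header, no compression)."""
    total_bytes = width * height * channels * bits_per_channel // 8
    return total_bytes / (1024 * 1024)


# The 3072 x 2048 image from the 10D used in the list above.
size_8bit = uncompressed_size_mib(3072, 2048, bits_per_channel=8)
size_16bit = uncompressed_size_mib(3072, 2048, bits_per_channel=16)

print(f"8 bit RGB:  {size_8bit:.0f} MiB")
print(f"16 bit RGB: {size_16bit:.0f} MiB")
```

Doubling the bits per channel doubles the storage, whether or not the extra bits carry any real picture information.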

As it happens, as photographers we seldom want to do anywhere near the sorts of calculations that HDR or ray tracing graphic artists want to do. Most of our cameras are actually only capturing 12 or 14 bits in the first place, and even then (as I showed recently) we seem to be pushing the limits of their ability right now, with little real data added to our images between the 10D (2003) and the G1 (2009).

So it would seem we don't need to be storing more than we are generating, and we really don't need more than 8 bits for representation.

Let's get to the core of the problem as I see it:

We want a way to store digital picture information which is not a greedy space hog, but which allows us to make some alterations to the data in a non destructive way.

We picked an 8 bit representation for a number of reasons:
  • it's a convenient size for a computer, being one byte
  • 8 bits provides 256 levels of brightness which is enough tonality for the human eye
However over the last fifteen years digital imaging has gone from simply capturing an image to accepting that post processing is a requisite component. So formats have gradually evolved (often with no real intelligent guidance so much as marketing pressures), and we have moved from the older 8 bit representation towards 16 bit representations to allow us greater "precision" in making adjustments.

This is where ignoring maths at school has let the majority of us down, as we often fail to get the most important point of this.

The decimal point.

Now of course graphics programmers (especially CGI programmers) have been working with formats like Radiance and OpenEXR, which have been around for a while and offer the image processing person quite a powerful storage system, but not without the couple of drawbacks mentioned above.

Perhaps what we need is to combine the two criteria above
  • it's a convenient size for a computer, being one byte
  • 8 bits provides 256 levels of brightness which is enough tonality for the human eye

and look for another way to do the same.

Looking into the openEXR and Radiance formats the answer they used was to choose floating point numbers. This allowed them to move something, then move it back without significantly losing information.
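To show what "move it, then move it back" means in practice, here is a small sketch of my own (not taken from either format). It darkens every one of the 256 levels by half and then brightens it again, once with the intermediate result stored as a whole number and once with it kept as a floating point number. The function name and the choice of a simple multiply as the "edit" are assumptions for illustration only.

```python
import math


def round_trip_levels(scale, quantise):
    """Darken all 256 levels by `scale`, brighten back, count surviving levels."""
    survivors = set()
    for level in range(256):
        darkened = level * scale
        if quantise:
            # Stored as a whole number between the two edits (8 bit style).
            darkened = math.floor(darkened + 0.5)
        restored = darkened / scale
        # Rounding happens only here, when the value is finally displayed.
        survivors.add(min(255, math.floor(restored + 0.5)))
    return len(survivors)


integer_levels = round_trip_levels(0.5, quantise=True)
float_levels = round_trip_levels(0.5, quantise=False)

print("levels left after an integer round trip:", integer_levels)
print("levels left after a float round trip:   ", float_levels)
```

With whole number storage only about half of the original levels come back, while the floating point version returns all 256.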

There are 8 bit representations of fixed point numbers which have been around for decades, and which would allow us to keep our file sizes compact and give us far more flexible image formats. Since we really only need to keep those 256 levels we could use this sort of binary representation of our numbers and:
  • keep the file as "rendered" for display
  • facilitate far less destructive edits
  • keep the memory requirements and processing demands lower

I'm not sure how to push this forward, but I thought I'd start here




A comment by Tim has had me consider more carefully my wording in the above post. I've certainly left a few things "implied", and they perhaps require the "mind set" context of what I was thinking. Not knowing how to write that succinctly, I left ambiguity in my writing (and may have made logical errors on the possible ranges available in fixed point representations). His questions make a good framework to begin addressing those shortcomings in my post.


1) No data format can be viewed without processing at all. JPG needs a 'lot' of processing before being able to view it. I think you mean 'openable with most standard operating systems without any additional software installed'.

This is of course both right and wrong, as no "operating system" can indeed open most files without additional software installed.

Of course we all think of the entire suite of applications which gets installed on a modern computer as part of the operating system, but they are in fact separate applications. Browsers, text editors, email applications ... they are all there as "standard" when you buy a computer (be it Mac OS X or Windows).

By processing I meant doing something more than just opening and displaying. Of course JPG is a file which contains data which must be decoded (as one may decode a zip file) to expand it and then put data into memory for display. So it does require more processing than a BMP or a TIFF to open and display.

However that is not what I meant by processing. A RAW file can not simply be displayed. As I'm sure Tim is aware, it requires generating proper colour pixels on a grid, which is not what is recorded on the sensor. Demosaicing the RAW generates a full RGB pixel from the neighbouring recorded red, green and blue values (note that each sensor site records only a red, a green or a blue; the colour is created).

This data is recorded in a linear fashion, and must have a curve or gamma applied to it. That is yet more processing before it is fitted within the 8 bits that are used by output devices (like screens or printers).
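As a minimal sketch of that last step, assuming a 12 bit linear value and a plain power law gamma of 2.2 (real converters use more elaborate tone curves, and the function name here is my own), fitting a linear value into 8 bits looks something like this:

```python
def linear_to_8bit(raw_value, raw_bits=12, gamma=2.2):
    """Map a linear sensor value onto a gamma encoded 8 bit level.

    Assumes a simple power law curve; this is an illustration, not the
    curve used by any particular camera or RAW converter.
    """
    max_raw = (1 << raw_bits) - 1            # 4095 for 12 bit data
    normalised = raw_value / max_raw         # linear value in the range 0.0 to 1.0
    encoded = normalised ** (1.0 / gamma)    # apply the curve
    return round(encoded * 255)              # fit into 8 bits for display


black = linear_to_8bit(0)
half_scale = linear_to_8bit(2048)
white = linear_to_8bit(4095)

print("black:", black, " half of full scale:", half_scale, " white:", white)
```

Note that a linear value at half of full scale lands well above the middle of the 8 bit range, which is why a linear file looks dark and flat until the curve is applied.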


2) 8 bits might be enough to store the brightness range of the eye (arguably) but not enough to store which 8 bits in the full range of brightness from dark to pointing at the sun.

Exactly, and as I said, HDRI is a separate and distinct practice. I think that one does not normally need to record this sort of range: 8 bits seems to have been pretty good, and 12 is certainly enough for anything short of HDRI.

I don't want to be able to discern sun spots while holding shadow details on the underside of a leaf. I think that "accurately" is the key word here, and negatives have been holding detail and tonal range sufficient for our desires for rendering scenes, perhaps even exceeding digital captures in some ways.

Compared to opening a RAW file or worse a HDRI file almost nothing is done in processing a typical 8 bit JPG or BMP or TIFF file.

This would mean you couldn't accurately store a picture from a digital camera because you didn't have enough resolution (although you do have the range).


I think you've confused resolution (the ability to resolve two dots as being two dots, not a single blob) with dynamic range or scene brightness range. I'm not entirely certain that the floating point needs to have so much more precision to give 256 discernible steps. I should check that out.


3) The fixed precision 8bit encoding you linked to is actually a 32 bit encoding


oops ... I'll fix that link, thanks.

When you send a file to a printer, even if you've got it in sRGB, the file will be "bent" further to fit the printer's specific output profile.

Storing things as floating point would reduce the damage from this enormously.


4) If you were to use 8 bits to encode in an EXR style way (EXR is 16 bit), you would need to use 5 bits for number and 3 bits for exponent resulting in a wide brightness range but a very low fixed precision accuracy


Probably ... but (not having done the calculations) it would perhaps be quite sufficient to represent much more than 256 levels, with rounding happening at display time and not at edit time. I have not thought through how such a system would work in principle, and then again, since I don't want my intellectual property stolen and this is a blog post not a scientific paper, I would perhaps keep that to myself.
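To get a feel for the trade off Tim describes, the following sketch lists every value that one byte could hold under a deliberately simple, hypothetical layout: a 5 bit number multiplied by 2 raised to a 3 bit exponent. This layout is an assumption made for illustration. It is not how OpenEXR or Radiance encode values, and a real format would likely normalise the number and bias the exponent.

```python
def minifloat_values(mantissa_bits=5, exponent_bits=3):
    """Enumerate every value of a hypothetical code: mantissa * 2**exponent."""
    values = set()
    for exponent in range(2 ** exponent_bits):
        for mantissa in range(2 ** mantissa_bits):
            values.add(mantissa * 2 ** exponent)
    return sorted(values)


values = minifloat_values()
# Gap between neighbouring representable values, largest one found.
largest_step = max(b - a for a, b in zip(values, values[1:]))

print("distinct values:", len(values))
print("largest value:  ", values[-1])
print("largest step:   ", largest_step)
```

Under this naive layout the byte reaches 3968 but holds only 144 distinct values, with steps of 128 at the bright end, so the range grows while the fineness of the steps suffers, which is the point of the objection.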

The problem as I see it comes from the results of successive edits. For example, if you apply a different curve to the scene (or to parts of it with dodge and burn) you may end up with adjacent pixels receiving calculated values of, say, 165.2, 165.3, 165.4 and 165.5. Naturally these will all be written as 165, and you have now lost a tonal gradient and been left with a single tone.
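That example can be played out in a few lines. In this sketch I assume the fractional part is simply dropped on writing (which is what makes 165.5 become 165), and I invent a follow-up edit, a tenfold contrast stretch around 165, to show what each version has left to work with.

```python
calculated = [165.2, 165.3, 165.4, 165.5]

# Integer storage: assume the fraction is dropped when the value is written.
stored_as_integers = [int(v) for v in calculated]


def stretch(value, pivot=165, factor=10):
    """A hypothetical later edit: boost contrast around the pivot value."""
    return (value - pivot) * factor + pivot


# Apply the later edit to both versions, rounding only for display.
from_integers = [round(stretch(v)) for v in stored_as_integers]
from_floats = [round(stretch(v)) for v in calculated]

print("after integer storage:", from_integers)
print("after float storage:  ", from_floats)
```

The integer version has nothing left to stretch, so all four pixels stay the same tone, while the version that kept its decimals still shows four separate tones.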

5) Any newly invented image storage file type will suffer from your first mentioned problem of not being openable on all standard systems.

That is a problem ... as it is when introducing any new format.

2 comments:

Tim Parkin said...

I think you might have got the wrong end of the stick.

1) No data format can be viewed without processing at all. JPG needs a 'lot' of processing before being able to view it. I think you mean 'openable with most standard operating systems without any additional software installed'.

2) 8 bits might be enough to store the brightness range of the eye (arguably) but not enough to store which 8 bits in the full range of brightness from dark to pointing at the sun. This would mean you couldn't accurately store a picture from a digital camera because you didn't have enough resolution (although you do have the range).

3) The fixed precision 8bit encoding you linked to is actually a 32 bit encoding

4) If you were to use 8 bits to encode in an EXR style way (EXR is 16 bit), you would need to use 5 bits for number and 3 bits for exponent resulting in a wide brightness range but a very low fixed precision accuracy

5) Any newly invented image storage file type will suffer from your first mentioned problem of not being openable on all standard systems.

I do agree that using some form of HDR file type makes a lot of sense but you can't get away from the fact that most of these use 32bit storage and if you reduce this, you lose resolution for smooth photographic transitions.

obakesan said...

Tim

thanks for your comment. This was (as you can tell) an on the fly post. I don't want to argue for extending cameras into HDRI (although something like the superCCD would perhaps do that)

I recognise that JPG is also undergoing processing, but it is not quite the same as the tonemapping which high dynamic range content has to undergo.

it might of course even be that the storage is not beneficial in terms of a cost benefit analysis of storage space.

however I can't see any detriment to moving from integer to decimal representations of the images