Disposable Digital Camera Hacking

Low Level Flash Storage Format

[camera pic from pcworld.com article]

Introduction
The basic file format used is the MS-DOS FAT12 file system, with the Smart Media algorithm for wear leveling. Samsung has a lot of papers on Smart Media, but the most technical one (one that echoes many of the things I reverse-engineered below) is this smart media format summary -- it describes virtually every byte of both the Smart Media format and the FAT12 format.  The complete specifications require a signed NDA.

Since FLASH memory is reprogrammable only a limited number of times, the device can't be used like a normal disk drive with sectors at fixed addresses because the more-often-used sectors (such as directory entries and the FAT table) would quickly fail. As a result, you need to do a little work to find out what clusters are stored where.

One thing to keep in mind is how the flash programming and erase mechanism works: an erase sets all bits in a block to ones; programming a bit sets it to zero. You can reprogram a block without erasing it as long as you only clear bits. Therefore, you can keep adding zeros without a time-consuming (and life-consuming) erase cycle.
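The clear-bits-only rule can be modelled in a few lines. This is only a sketch of the behaviour described above, not of any particular flash chip; the function names are made up for illustration.

```python
ERASED = 0xFF  # state of every byte after an erase


def program(cell: int, value: int) -> int:
    """Programming can only clear bits, so the result is old AND new."""
    return cell & value


def can_program_without_erase(cell: int, value: int) -> bool:
    """True if `value` is reachable from `cell` by clearing bits only."""
    return (cell & value) == value


cell = program(ERASED, 0xF0)   # fresh byte: any value can be written
cell = program(cell, 0x3C)     # second write can only remove more ones
print(hex(cell))
```

Writing 0x0F over a byte that already holds 0xF0 would need bits set back to one, which is what forces an erase of the whole block.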

Spare (ECC) Data Format
Each 512-byte sector has an associated 16 bytes of extra data. The Smart Media format defines these bytes as:

byte:   0   1   2   3   4   5    6   7   8   9   a   b   c   d   e   f
value:  ff  ff  ff  ff  ff  ff*  10  15  cc  cc  cc  10  15  cc  cc  cc

Bytes 0-3 are the "User data area" and don't seem to be used. All were seen as 0xFF.
Byte 4 is the "User data status flag" - I haven't seen it used in the Dakota yet.
Byte 5 is the "Block Status Flag". The Flash data sheet says that this is programmed to a non-0xFF value if the block has been determined to be bad; I assume this was to match the Smart Media spec. My flash had all of these flags set to 0xFF, so I had no bad blocks (yet...)
Bytes 6-7 are the "Block Address-1" and bytes b-c are the "Block Address-2". From what I saw, they hold the same 16-bit value (10 15 in the example above).
Bytes 8-a are the "ECC Area-2" and bytes d-f are the "ECC Area-1"; both contain error correction codes.
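Here is a minimal sketch of picking those fields out of one 16-byte spare area. The bit layout of the block address (fixed bits 00010, ten address bits, then one even-parity bit) is an assumption; it is consistent with every block address value shown on this page (1015 for cluster 00a, 1008 for cluster 004, and so on), but it is not taken from the author's program. The function names are illustrative.

```python
def decode_block_address(raw: int):
    """Return (logical_block, parity_ok) for a 16-bit spare-area value.

    Assumed layout, high bit first: 0 0 0 1 0, ten address bits, one parity
    bit chosen so the whole 16-bit field has an even number of ones.
    """
    logical = (raw & 0x07FF) >> 1
    parity_ok = bin(raw).count("1") % 2 == 0
    return logical, parity_ok


def parse_spare(spare: bytes) -> dict:
    """Pick the fields described above out of one 16-byte spare area."""
    addr1 = int.from_bytes(spare[0x6:0x8], "big")
    addr2 = int.from_bytes(spare[0xB:0xD], "big")
    logical, parity_ok = decode_block_address(addr1)
    return {
        "block_ok": spare[0x5] == 0xFF,   # non-0xFF would mark a bad block
        "addresses_agree": addr1 == addr2,
        "logical_block": logical,
        "parity_ok": parity_ok,
    }


# The example row from the table; cc stands in for the ECC bytes.
example = bytes.fromhex("ff ff ff ff ff ff 10 15 cc cc cc 10 15 cc cc cc")
print(parse_spare(example))
```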

Conversion Program
I wrote a simple C program to read in flash data from disk and output a filesystem image. It seems to work on simple data, but I'll need to understand what happens with bad sectors better to see if it really works. It does a pretty good check and will tell you if the flash format doesn't match expectations (of course, I'd love to see your results if any warning messages come up), so please use it at your own risk. The file is flashdump2iso.c


Card Information System (CIS) / Partition Table
This is always located in cluster 0. I found it at 0x24000 in memory. It does not seem to contain the Smart Media CIS table, which makes it the only part of the spec missing from this implementation.

....
000241b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 80 01  |................|
000241c0  01 00 01 03 50 f3 29 00  00 00 d7 7c 00 00 00 00  |....P.)....|....|
....
000241f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
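The 16 bytes starting at 0x241be have the shape of a standard PC partition table entry, and the sector ends with the usual 55 aa signature. Reading them that way is an assumption (the page does not decode them), but under it the starting sector lands exactly on the boot sector address found by inspection in the next section. A rough sketch:

```python
import struct

SECTORS_PER_BLOCK = 32   # 0x4000-byte block / 512-byte sector
SECTOR_SIZE = 512

# The 16 bytes at 0x241be..0x241cd in the dump above.
entry = bytes.fromhex("80 01 01 00 01 03 50 f3 29 00 00 00 d7 7c 00 00")

# Classic partition entry: boot flag, CHS start, type, CHS end,
# then little-endian starting sector and sector count.
boot_flag, chs_first, part_type, chs_last, lba_start, num_sectors = \
    struct.unpack("<B3sB3sII", entry)

# Which logical block holds the partition's first sector, and where in it?
block, sector_in_block = divmod(lba_start, SECTORS_PER_BLOCK)
offset_in_block = sector_in_block * SECTOR_SIZE

print(f"type={part_type:#04x} start={lba_start} sectors={num_sectors:#x}")
print(f"logical block {block}, offset {offset_in_block:#x}")
```

Type 01 is the FAT12 partition type, and the sector count 0x7cd7 matches the total in the boot sector. Logical block 1 sits at flash address 30000 in this dump, so offset 0x1200 gives 31200.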


Boot Sector and BIOS Parameter Block (BPB)
Location: 31200 (found by inspection at this address in two different cameras). This can probably be reliably found from the partition table: the partition starts at sector 0x29 (41 = 32 + 9), which is sector 9 (offset 0x1200) of cluster 001 (marked 1002 in the ECC area).

00031200  e9 00 00<53 55 4e 50 4c  55 53 20>00 02.20.01 00  |...SUNPLUS .. ..|
00031210  02.00 01.d7 7c.f8.03 00. 10 00.04 00 29 00 00 00  |....|.......)...|
00031220  00 00 00 00 00 00 29 00  00 00 00 00 00 00 00 00  |......).........|
00031230  00 00 00 00 00 00 46 41  54 31 32 20 20 20 00 00  |......FAT12   ..|
00031240  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000313f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|

Interpretation: OEM name = SUNPLUS, 512 bytes per sector, 32 sectors per allocation unit (cluster), 1 reserved sector, 2 FAT copies, 256 file entries in root directory (at 32 bytes per entry, that yields 16 sectors), 0x7cd7 total sectors (16 MB), fixed media (f8), and one FAT takes 3 sectors.
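The same interpretation can be done mechanically with the standard FAT BPB field layout. This sketch decodes the first 32 bytes of the dump above and derives where the FATs and root directory should sit relative to the boot sector; the variable names are illustrative.

```python
import struct

BOOT_SECTOR_ADDR = 0x31200

# First 32 bytes of the boot sector, copied from the dump above.
bpb = bytes.fromhex(
    "e9 00 00 53 55 4e 50 4c 55 53 20 00 02 20 01 00 "
    "02 00 01 d7 7c f8 03 00 10 00 04 00 29 00 00 00"
)

# Fields start after the 3-byte jump instruction; all are little-endian.
(oem, bytes_per_sector, sectors_per_cluster, reserved, num_fats,
 root_entries, total_sectors, media, sectors_per_fat,
 sectors_per_track, heads, hidden) = struct.unpack_from("<8sHBHBHHBHHHI", bpb, 3)

fat1 = BOOT_SECTOR_ADDR + reserved * bytes_per_sector
fat2 = fat1 + sectors_per_fat * bytes_per_sector
root_dir = fat1 + num_fats * sectors_per_fat * bytes_per_sector
root_sectors = root_entries * 32 // bytes_per_sector

print(oem, bytes_per_sector, sectors_per_cluster, hex(total_sectors))
print(f"FAT1={fat1:#x} FAT2={fat2:#x} root={root_dir:#x} ({root_sectors} sectors)")
```

These derived addresses are only meaningful here because the boot sector, both FATs and the root directory all fall inside the same 0x4000-byte flash block; anything in a different logical block has to be located through the spare data.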

Fat tables (two copies)
Location: 31400 (found by inspection in two different cameras). This can probably be reliably found as part of cluster 001, in the sector right after the boot sector.

Future: description of how FAT12 works. Every set of three bytes (ab cd ef) represents two 12-bit entries (dab and efc). For example, the bytes 05 60 00 in the dump decode to 005 and 006. You can just follow my example and move the nibbles appropriately.

First copy:
00031400  f8 ff ff ff ff ff 05 60  00 07 80 00 09 f0 ff 00  |.......`........|
00031410  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Second copy:  (note size of fat table is 3 sectors, 0x600 bytes)
00031a00  f8 ff ff.ff ff ff.05 60  00.07 80 00 09 f0 ff 00  |.......`........|
00031a10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Interpretation:
FAT[0] = ff8 (media descriptor)
FAT[1] = fff EOC mark (end of cluster chain)
FAT[2] = fff EOC (cluster 002 is the one-cluster DCIM directory)
FAT[3] = fff EOC (cluster 003 is the one-cluster 100MEDIA directory)
FAT[4] = 005
FAT[5] = 006
FAT[6] = 007
FAT[7] = 008
FAT[8] = 009
FAT[9] = fff EOC
FAT[a] = 000 free
FAT[b] = 000 ... etc ...
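The nibble shuffling is easy to get wrong by hand, so here is a small sketch of the standard FAT12 unpacking applied to the first bytes of the FAT dump above. The function name is made up for illustration.

```python
def fat12_entries(fat: bytes, count: int):
    """Unpack the first `count` 12-bit entries from a FAT12 table."""
    entries = []
    for n in range(count):
        i = n + n // 2                      # byte offset = floor(n * 1.5)
        pair = fat[i] | (fat[i + 1] << 8)   # two bytes, little-endian
        # Even entries use the low 12 bits, odd entries the high 12 bits.
        entries.append(pair >> 4 if n & 1 else pair & 0xFFF)
    return entries


# First row of the FAT dump, followed by the all-zero second row.
fat = bytes.fromhex("f8 ff ff ff ff ff 05 60 00 07 80 00 09 f0 ff 00") + bytes(16)

for n, value in enumerate(fat12_entries(fat, 11)):
    print(f"FAT[{n:x}] = {value:03x}")
```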

Root Directory
Location: 32000 (found by inspection)

00032000  44 43 49 4d 20 20 20 20  20 20 20 10 00 00 00 00  |DCIM       .....|
00032010  00 00 21 00 00 00 00 00  21 00<02 00>00 00 00 00  |..!.....!.......|
--> Subdirectory "DCIM" can be found at cluster 002

Directory "DCIM"
Location: cluster 0x002 (as stated in root directory). Using the ECC data, we find this is at address 34000, which happens to follow the root directory.

00034000  2e 20 20 20 20 20 20 20  20 20 20 10 00 00 00 00  |.          .....|
00034010  00 00 00 00 00 00 00 00  21 00<02 00>00 00 00 00  |........!.......|
--> File "." at cluster 002 (that's this directory in this cluster)

00034020  2e 2e 20 20 20 20 20 20  20 20 20 10 00 00 00 00  |..         .....|
00034030  00 00 00 00 00 00 00 00  21 00<00 00>00 00 00 00  |........!.......|
--> File ".." cluster set to 0, meaning we're a child of the root directory.

00034040  31 30 30 4d 45 44 49 41  20 20 20 10 00 00 00 00  |100MEDIA   .....|
00034050  00 00 21 00 00 00 00 00  21 00<03 00>00 00 00 00  |..!.....!.......|
--> Directory "100MEDIA" can be found at cluster 003

Directory "100MEDIA"
Location: cluster 0x003 (as stated in the DCIM directory). Using the ECC data, we find this is at address 44000, which again happens to follow cluster 002.

00044000  2e 20 20 20 20 20 20 20  20 20 20 10 00 00 00 00  |.          .....|
00044010  00 00 00 00 00 00 00 00  21 00.03 00.00 00 00 00  |........!.......|
--> File ".", cluster = 0003

00044020  2e 2e 20 20 20 20 20 20  20 20 20 10 00 00 00 00  |..         .....|
00044030  00 00 00 00 00 00 00 00  21 00 02 00 00 00 00 00  |........!.......|
--> File "..", cluster = 0002

00044040  44 53 43 5f 30 30 30 33  4a 50 47.20.00 00 00 00  |DSC_0003JPG ....|
00044050  00 00 21 2e 00 00.00 00  21 2e.04 00.2c 57 01 00  |..!.....!...,W..|
--> File "DSC_0003.JPG", First cluster = 004, Size = 0001572c (it's stored LSB first) = 87852 bytes

00044060  e5 53 43 5f 30 30 30 37  4a 50 47 20 00 00 00 00  |.SC_0007JPG ....|
00044070  00 00 21 2e 00 00 00 00  21 2e.0a 00.a1 80 00 00  |..!.....!.......|
--> File "?SC_0007.JPG" (deleted, as marked by the starting 0xE5), First cluster = 00a, Size = 000080a1 (the bytes a1 80 00 00 read LSB first) = 32929 bytes
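The fields used above sit at fixed offsets in every 32-byte directory entry: name at 0, extension at 8, attributes at 11, first cluster at 26 and size at 28. A minimal sketch that decodes the two file entries from the dump (the function name and the returned dictionary are illustrative, not part of the author's program):

```python
import struct


def parse_dirent(raw: bytes) -> dict:
    """Decode the fields of one 32-byte FAT directory entry used on this page."""
    first_cluster, size = struct.unpack_from("<HI", raw, 26)  # little-endian
    deleted = raw[0] == 0xE5
    # A deleted entry has lost its first name byte; show it as "?".
    stem = (b"?" + raw[1:8] if deleted else raw[0:8]).decode("ascii").rstrip()
    ext = raw[8:11].decode("ascii").rstrip()
    return {
        "name": f"{stem}.{ext}" if ext else stem,
        "is_dir": bool(raw[11] & 0x10),
        "deleted": deleted,
        "first_cluster": first_cluster,
        "size": size,
    }


live = parse_dirent(bytes.fromhex(
    "44 53 43 5f 30 30 30 33 4a 50 47 20 00 00 00 00 "
    "00 00 21 2e 00 00 00 00 21 2e 04 00 2c 57 01 00"))
gone = parse_dirent(bytes.fromhex(
    "e5 53 43 5f 30 30 30 37 4a 50 47 20 00 00 00 00 "
    "00 00 21 2e 00 00 00 00 21 2e 0a 00 a1 80 00 00"))
print(live)
print(gone)
```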

File "DSC_0003.JPG"
This is the only non-deleted file I have stored in the camera. The directory entry tells us that the first cluster is 004; we use the FAT table to follow the chain and find the rest:

FAT[4] = 005
FAT[5] = 006
FAT[6] = 007
FAT[7] = 008
FAT[8] = 009
FAT[9] = fff - end of chain
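Following the chain is a simple loop. This sketch uses the decoded FAT values from the interpretation earlier on the page and stops at anything outside the normal cluster range (fff here); the function name is illustrative.

```python
# Decoded FAT entries 0..b, as listed in the interpretation of the FAT dump.
FAT = [0xFF8, 0xFFF, 0xFFF, 0xFFF, 0x005, 0x006,
       0x007, 0x008, 0x009, 0xFFF, 0x000, 0x000]


def follow_chain(fat, first):
    """Return the list of clusters in a file, starting from its first cluster."""
    chain, cluster = [], first
    # 002..fef are ordinary cluster numbers; ff8..fff end a chain.
    while 0x002 <= cluster <= 0xFEF:
        if cluster in chain:
            raise ValueError("loop in FAT chain")
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


print(follow_chain(FAT, 4))   # DSC_0003.JPG
print(follow_chain(FAT, 2))   # the DCIM directory
```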

We can translate these cluster numbers to block addresses, and then search through all of memory (argh!) to find the flash address:

             Block     flash      divided by cluster size of
cluster     address   address   0x4000 bytes (for use with dd)
   4    -->   1008     58000                 16
   5    -->   100b     4c000                 13
   6    -->   100d     2c000                  b
   7    -->   100e     38000                  e
   8    -->   1010     3c000                  f
   9    -->   1013     40000                 10

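The unix utility "dd" can be used to extract blocks from the data file. The sizes are written in decimal here (0x4000 = 16384, 0x16 = 22) because GNU dd does not accept hexadecimal numbers, and "skip" is the portable spelling of the BSD "iseek" operand:

    dd if=xxx.bin bs=16384 count=1 skip=22 of=out16

The output files can then be concatenated, or you could string a whole bunch of 'dd' commands together in one long command and redirect their combined output.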
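The same extraction can be sketched in Python, which also makes it easy to trim the last cluster to the file size from the directory entry. The addresses are the ones from the table above, with cluster 9 taken as 0x40000, which is where block 1013 appears in the spare-area map further down the page. The function name and file names are placeholders.

```python
CLUSTER_SIZE = 0x4000

# Flash addresses of clusters 4..9, in chain order.
DSC_0003_ADDRS = [0x58000, 0x4C000, 0x2C000, 0x38000, 0x3C000, 0x40000]


def extract_file(dump: bytes, addresses, size: int) -> bytes:
    """Concatenate one cluster per address, then trim to the directory size."""
    data = b"".join(dump[a:a + CLUSTER_SIZE] for a in addresses)
    return data[:size]


# Usage sketch (file names are placeholders):
#   dump = open("xxx.bin", "rb").read()
#   open("DSC_0003.JPG", "wb").write(extract_file(dump, DSC_0003_ADDRS, 87852))
```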


Results
This is the picture that was stored on my camera:

First recovered picture!
full size images: original and brightened

It's a shot that I took accidentally at my dining room table. I think the box in the middle is a battery holder powering the camera (with red and black leads coming out of the right side). In the background top & center looks like my soldering iron. The table is made up of tiles; you can see a seam running from the top left of the picture down to the bottom right. The lighting is dim incandescent, so that explains the red cast -- the first thing I did when I took the camera apart was to remove the flash so that I wouldn't shock myself.



Ok, that was fun and all, but very boring and stuff you can read in the Smart Media spec. It's time to knock it up a notch!


Deleted file "?SC_0007.JPG"
When a file is deleted, two things happen. First, the first byte of the file entry is set to "E5". Second, the clusters allocated to the file are freed in the FAT table. Most of the file entry is usually still in the directory, but of course can be overwritten if needed. Most importantly, note that the first cluster and the file size (which can tell you the number of clusters used) remain.

Deleted files can be recovered if you make 2 assumptions. The first is the missing first byte of the file name -- we can assume that it is 'D' here, although it really doesn't matter. The second is a much harder assumption: that the clusters were assigned in a determinable order.  Although the operating system can theoretically place the clusters in any order, we can assume that it fills from the front of the drive towards the end, filling in unused clusters. (You'll notice that's what happened with the non-deleted file).

So, let's recover this file. First notice that the size field (a1 80 00 00, LSB first) gives only 0x80a1 bytes -- very small for one of these pictures. Let's ignore that and see what happens. Second, see that the first cluster is 00a -- it's marked as free in the FAT, so it looks like the beginning hasn't been overwritten yet. Let's assume the file occupies cluster 00a on upwards -- all these clusters are free in the FAT.

Here's where the wear-leveling algorithm can help us in file recovery. Never-used clusters are never initialized with ECC data, so we can reliably tell which clusters have held data (without the ECC data, the best we could do is check whether the regular data was an all-erased "FF" pattern). Searching through all the ECC areas, we can find clusters 00a and beyond:

     Cluster number:   00a   00b   00c   00d   00e   00f   010   011   012
As stored w/ parity:  1015  1016  1019  101a  101c  101f  1020  1023  1025
        ECC address:  2400  2e00  3800  2800  2a00  3000  3200  3400  3600
       data address: 48000 5c000 70000 50000 54000 60000 64000 68000 6c000
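Rather than searching by hand, one pass over the spare data can build a cluster-to-address table. The sketch below uses the (ECC address, block address) pairs recorded in the notes on this page and relies on two observations from the same data: the data address is 32 times the ECC address (16 spare bytes per 512-byte sector), and the cluster number is the block address with its parity bit and fixed upper bits stripped. That bit layout is an assumption, though every value here agrees with it; the names are illustrative.

```python
# ECC (spare) address -> 16-bit block address stored there, from the notes.
SPARE_SCAN = {
    0x1200: 0x1001, 0x1600: 0x100D, 0x1800: 0x1002, 0x1A00: 0x1004,
    0x1C00: 0x100E, 0x1E00: 0x1010, 0x2000: 0x1013, 0x2200: 0x1007,
    0x2400: 0x1015, 0x2600: 0x100B, 0x2800: 0x101A, 0x2A00: 0x101C,
    0x2C00: 0x1008, 0x2E00: 0x1016, 0x3000: 0x101F, 0x3200: 0x1020,
    0x3400: 0x1023, 0x3600: 0x1025, 0x3800: 0x1019,
}


def build_block_map(spare_scan):
    """Map cluster (logical block) number -> flash data address."""
    table = {}
    for ecc_addr, raw in spare_scan.items():
        logical = (raw & 0x07FF) >> 1    # drop parity bit and fixed top bits
        table[logical] = ecc_addr * 32   # 512 data bytes per 16 spare bytes
    return table


block_map = build_block_map(SPARE_SCAN)
for cluster in range(0x00A, 0x013):
    print(f"{cluster:03x} -> {block_map[cluster]:05x}")
```

With the table in memory, each lookup is a dictionary access instead of a scan of the whole chip.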

We can further verify our assumptions by checking for the JPEG signature ("JFIF") in the first cluster. Eight clusters is 128KB - a little small, but a reasonable size. As I've said before, let's just try it.
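The signature check can be automated. This sketch assumes the camera writes an ordinary JPEG/JFIF header, that is, the ff d8 start-of-image marker followed shortly by the text "JFIF"; the exact header layout was not verified against the camera's files, so the test is deliberately loose.

```python
def looks_like_jfif(cluster: bytes) -> bool:
    """True if the data starts with a JPEG SOI marker and has JFIF nearby."""
    return cluster[:2] == b"\xff\xd8" and b"JFIF" in cluster[:16]


# A typical JFIF file start, and an erased (all FF) cluster for comparison.
header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01" + bytes(100)
erased = b"\xff" * 0x4000
print(looks_like_jfif(header), looks_like_jfif(erased))
```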

The result is a very dark picture. It's hard to tell if this is a picture I took with my finger on the lens (easy to do because there aren't many ways to hold the bare PCB without touching those fragile solder connections). There are artifacts typical of an actual imager (as opposed to just zeros stored in memory), but it's not known if the line at the right side is a file misalignment or another artifact of the imager (both very possible).


Actual Smart Media Implementation
Just because the data is stored in a recognizable format doesn't mean that the implementation is necessarily any good. There is still a trade-off to be made between good wear-leveling, unused block erasure (for security), and speed/power.

I've still got notes to decipher, so check back here... until then, here are my old notes:



Here's a little histogram count:

1019, 1016, 1015, 1013, 1010, 100e, 100d, 100b, 1008, 1007, 1004, 1002, 1001 seen 32 times.
0000 seen 39 times.


Here's another representation:

spare
 ecc addr: 1000  1200  1400  1600  1800  1a00  1c00  1e00 
data addr: 20000 24000 28000 2c000 30000 34000 38000 3c000
  cluster:                               0002
     ####: ----  1001  ----  100d  1002  1004  100e  1010 

 ecc addr: 2000  2200  2400  2600  2800  2a00  2c00  2e00
data addr: 40000 44000 48000 4c000 50000 54000 58000 5c000
  cluster:       0003                          0004?
     ####: 1013  1007  1015  100b  101a  101c  1008  1016
                       JFIF                    JFIF

 ecc addr: 3000  3200  3400  3600  3800  3a00
data addr: 60000 64000 68000 6c000 70000 74000
  cluster:           
     ####: 101f  1020  1023  1025  1019  ----



Data with a #### of 0 is written to pages starting at spare addresses:
 addr spare-addr  data
01000 0080        ff's
04000 0200        256 bytes of 00, then aa 55 ... ending with BAA037500003
08000 0400-04f0   sparse counting-like pattern
                  (0420-0460 have checksums of ff)
09200 0490-04e0   random-like data
09e00 04f0        00's
0c000 0600        256 bytes of 00, then aa 55 ... ending with BAA037500003
10000 0800-08f0   (0820-0860 have checksums of ff)
14000 0a00  \ same cksum
18000 0c00  /
20000 1000  \ same cksum
20200 1010  /



Remaining Items
Error correction is a big one - I don't have an example of what happens when the flash starts going south.
Data structure - There's probably some directory-like thing that allows access to a particular cluster without searching the entire chip.  I didn't see evidence of this in the Smart Media summary, but it would have to exist if the file system is to be efficient. (not that it has to be well designed, but I'm hoping it is)


