Disposable Digital Camera Hacking
Low Level Flash Storage Format
Introduction
The basic file format used is the MS-DOS FAT12 file system, with the Smart Media algorithm for wear leveling. Samsung has a lot of papers on Smart Media, but the most technical one (one that echoes many of the things I reverse-engineered below) is this Smart Media format summary -- it describes virtually every byte of both the Smart Media format and the FAT12 format. The complete specifications require a signed NDA.
Since FLASH memory can be reprogrammed only a limited number of times, the device can't be used like a normal disk drive with sectors at fixed addresses: the most frequently written sectors (such as directory entries and the FAT table) would quickly fail. As a result, you need to do a little work to find out which clusters are stored where.
One thing to keep in mind is how the flash programming and erase mechanism works: an erase sets all bits in a block to ones, and programming a bit sets it to zero. You can reprogram a block without erasing it as long as you only clear bits. Therefore, you can keep adding zeros without a time-consuming (and life-consuming) erase cycle.
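To make the erase/program rule concrete, here is a small C model of it. It is a simplification (real NAND parts program a whole page at a time), and the function names are hypothetical, not a real driver API. Programming behaves like a bitwise AND with the existing contents, so a write can skip the erase only if it never needs to turn a 0 back into a 1. That is also why a block can be retired by overwriting its 16-bit address word with 0000 without erasing it first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified flash model (hypothetical helpers): erase sets every bit
 * to 1, programming can only clear bits. */

/* Erase: every byte of the block becomes 0xFF. */
void flash_erase(uint8_t *block, size_t len)
{
    for (size_t i = 0; i < len; i++)
        block[i] = 0xFF;
}

/* Program a 16-bit word (e.g. a Block Address field): 1 bits in `data`
 * leave the cell alone, 0 bits clear it, so the result is old AND data. */
uint16_t flash_program(uint16_t old, uint16_t data)
{
    return (uint16_t)(old & data);
}

/* True if `data` can be written over `old` without an erase, i.e. no bit
 * that is currently 0 would have to become 1 again. */
bool can_program_without_erase(uint16_t old, uint16_t data)
{
    return (uint16_t)(data & (uint16_t)~old) == 0;
}
```

For example, an erased word (FFFF) can be programmed to 1015, and 1015 can later be programmed to 0000, but going from 1015 to 1016 would need an erase first.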
Spare (ECC) Data Format
Each 512-byte sector has an associated 16 bytes of extra
data. The Smart Media format defines these bytes as:
Byte  | 0  | 1  | 2  | 3  | 4  | 5   | 6  | 7  | 8  | 9  | a  | b  | c  | d  | e  | f
Value | ff | ff | ff | ff | ff | ff* | 10 | 15 | cc | cc | cc | 10 | 15 | cc | cc | cc
Bytes 0-3 are the "User data area" and don't seem to be used. All were
seen as 0xFF.
Byte 4 is the "User data status flag" - I haven't seen it used in the
Dakota yet.
Byte 5 is the "Block Status Flag". The Flash data sheet says that this is programmed to a non-0xFF value if the block has been determined to be bad; I assume this was to match the Smart Media spec. My flash had all of these flags set to 0xFF, so I had no bad blocks (yet...)
Bytes 6-7 are the "Block Address-1" and bytes b-c are the "Block Address-2". From what I saw, they hold the same 16-bit value, stored high byte first (the 10 15 in the table above is the value 1015):
- Bits 15-13 and 11 seem to be set only if the block has never been initialized; otherwise they are 0.
- Bit 12 set seems to indicate that the block is in use. When a block is no longer used, all 16 bits are programmed to 0000.
- Bits 10-1 hold the cluster number (you'll have to shift it down one bit: 1015 shifted right by one is 80a, whose low 10 bits give cluster 00a). Note that a cluster is 32 512-byte sectors, so you'll see this value in 32 separate ECC areas. Also note that this is only 10 bits, not the 12-bit clusters used by FAT12. I don't know if that's an oversight, or if the scheme changes past 10 bits.
- Bit 0 is a parity bit. It is set so that the total number of ones in the 16-bit word is even; for example, 1015 (binary 0001 0000 0001 0101) has four ones.
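The bit layout above can be captured in a pair of C helpers. This is a sketch based only on the values observed in this dump (bit 12 = in use, bits 10-1 = cluster, bit 0 = even parity); the function and enum names are hypothetical, and treating every other pattern as "unexpected" is my own assumption rather than something taken from the Smart Media spec.

```c
#include <stdint.h>

/* Number of 1 bits in a 16-bit word. */
static int popcount16(uint16_t w)
{
    int n = 0;
    for (; w != 0; w &= (uint16_t)(w - 1))
        n++;
    return n;
}

/* Hypothetical helper: build the Block Address word for a cluster using
 * the observed layout: bit 12 = in use, bits 10-1 = cluster, bit 0 = parity. */
uint16_t encode_block_address(unsigned cluster)
{
    uint16_t w = (uint16_t)(0x1000u | ((cluster & 0x3FFu) << 1));
    if (popcount16(w) & 1)
        w |= 1u;                    /* make the count of 1 bits even */
    return w;
}

typedef enum {
    ADDR_ERASED,      /* FFFF: block never initialized */
    ADDR_RETIRED,     /* 0000: block no longer used */
    ADDR_VALID,       /* in use, parity OK, cluster returned */
    ADDR_UNEXPECTED   /* anything else (possible corruption) */
} addr_state;

/* Hypothetical helper: classify a Block Address word; for ADDR_VALID the
 * cluster number is stored in *cluster. */
addr_state decode_block_address(uint16_t w, unsigned *cluster)
{
    if (w == 0xFFFF)
        return ADDR_ERASED;
    if (w == 0x0000)
        return ADDR_RETIRED;
    if ((w & 0xE800u) != 0 || (w & 0x1000u) == 0 || (popcount16(w) & 1))
        return ADDR_UNEXPECTED;     /* bits 15-13/11 set, not in use, or bad parity */
    *cluster = (w >> 1) & 0x3FFu;
    return ADDR_VALID;
}
```

With these rules, encode_block_address reproduces the values seen in the dump, such as 1001 for cluster 000, 1002 for 001, 1008 for 004, 100b for 005 and 1015 for 00a.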
Bytes 8-a are the "ECC Area-2" and bytes d-f are the "ECC Area-1"; both hold error correction codes.
Conversion Program
I wrote a simple C program to read in flash data from disk and output a filesystem image. It seems to work on simple data, but I'll need to understand what happens with bad sectors better to see if it really works. It checks the format pretty thoroughly and will tell you if the flash doesn't match expectations (of course, I'd love to see your results if any warning messages come up), but please use it at your own risk. The file is flashdump2iso.c.
Card Information System (CIS) / Partition Table
This is always located in cluster 0 (marked 1001 in the ECC area). I found it at 0x24000 in memory. It does not seem to contain the Smart Media CIS table, which makes that table the only part of the spec missing from this implementation.
....
000241b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 01 |................|
000241c0 01 00 01 03 50 f3 29 00 00 00 d7 7c 00 00 00 00 |....P.)....|....|
....
000241f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
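The 16 bytes starting at 0x241be have the layout of a standard PC (MBR-style) partition-table entry: a boot flag, a type byte (01 = FAT12), and an LSB-first starting sector and sector count, with the 55 aa signature at the end of the sector. A minimal C sketch of decoding it follows; the struct and function names are hypothetical. Applied to this dump, it gives a partition that starts at sector 0x29 (41) and is 0x7cd7 sectors long, so the partition ends exactly at sector 32000 (16,384,000 bytes).

```c
#include <stdint.h>

/* One 16-byte partition-table entry from the standard PC layout. */
typedef struct {
    uint8_t  boot_flag;     /* 0x80 = active */
    uint8_t  type;          /* 0x01 = FAT12 */
    uint32_t first_sector;  /* LBA of the partition's first sector */
    uint32_t num_sectors;   /* partition length in sectors */
} part_entry;

/* Read a little-endian ("LSB first") 32-bit value. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: decode the first entry of the table in `sector`
 * (the 512-byte sector that ends in 55 aa). */
part_entry parse_first_partition(const uint8_t *sector)
{
    const uint8_t *e = sector + 0x1BE;   /* first entry of four */
    part_entry pe;
    pe.boot_flag    = e[0];
    pe.type         = e[4];
    pe.first_sector = le32(e + 8);
    pe.num_sectors  = le32(e + 12);
    return pe;
}
```

The CHS fields (bytes 1-3 and 5-7 of the entry) are skipped here, since the LBA fields already say where the partition is.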
Boot Sector and BIOS Parameter Block (BPB)
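Location: 31200 (found by inspection at this address in two different cameras). This can probably be reliably found at offset 0x1200 (the tenth sector) of cluster 001 (marked 1002 in the ECC area): the partition table puts the start of the partition at sector 0x29, which is 9 sectors into this second 32-sector block.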
00031200 e9 00 00<53 55 4e 50 4c 55 53 20>00 02.20.01 00 |...SUNPLUS .. ..|
00031210 02.00 01.d7 7c.f8.03 00.10 00.04 00 29 00 00 00 |....|.......)...|
00031220 00 00 00 00 00 00 29 00 00 00 00 00 00 00 00 00 |......).........|
00031230 00 00 00 00 00 00 46 41 54 31 32 20 20 20 00 00 |......FAT12   ..|
00031240 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
....
000313f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
Interpretation: OEM name = SUNPLUS, 512 bytes per sector, 32 sectors per allocation unit (cluster), 1 reserved sector, 2 FAT copies, 256 file entries in the root directory (at 32 bytes per entry, that yields 16 sectors), 0x7cd7 total sectors (16 MB), fixed media (f8), one FAT takes 3 sectors, and 0x29 hidden sectors before the partition (the same 0x29 that appears in the partition table).
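Since these fields sit at fixed offsets and are stored LSB first, they are easy to read programmatically. The C sketch below (hypothetical names) decodes the fields listed above and derives the byte offsets of the FAT, the root directory, and cluster 2 relative to the boot sector. For this dump those come out to 0x200, 0xE00 and 0x2E00, i.e. 31400, 32000 and 34000. Adding them directly to 31200 only works here because the data area (cluster 2) happens to sit in the flash block right after cluster 001; in general each 0x4000-byte chunk has to be located through its ECC area.

```c
#include <stdint.h>

/* Little-endian ("LSB first") field readers. */
static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* The BPB fields discussed above, with their boot-sector offsets. */
typedef struct {
    uint16_t bytes_per_sector;     /* 0x0B */
    uint8_t  sectors_per_cluster;  /* 0x0D */
    uint16_t reserved_sectors;     /* 0x0E */
    uint8_t  num_fats;             /* 0x10 */
    uint16_t root_entries;         /* 0x11 */
    uint16_t total_sectors;        /* 0x13 (16-bit count) */
    uint8_t  media;                /* 0x15 */
    uint16_t sectors_per_fat;      /* 0x16 */
    uint32_t hidden_sectors;       /* 0x1C: sectors before the partition */
} bpb;

/* Hypothetical helper: decode the BPB from a 512-byte boot sector. */
bpb parse_bpb(const uint8_t *bs)
{
    bpb b;
    b.bytes_per_sector    = le16(bs + 0x0B);
    b.sectors_per_cluster = bs[0x0D];
    b.reserved_sectors    = le16(bs + 0x0E);
    b.num_fats            = bs[0x10];
    b.root_entries        = le16(bs + 0x11);
    b.total_sectors       = le16(bs + 0x13);
    b.media               = bs[0x15];
    b.sectors_per_fat     = le16(bs + 0x16);
    b.hidden_sectors      = le32(bs + 0x1C);
    return b;
}

/* Byte offsets relative to the start of the boot sector. */
uint32_t fat_offset(const bpb *b)
{
    return (uint32_t)b->reserved_sectors * b->bytes_per_sector;
}

uint32_t root_dir_offset(const bpb *b)
{
    return fat_offset(b) +
           (uint32_t)b->num_fats * b->sectors_per_fat * b->bytes_per_sector;
}

/* Cluster 2, the first data cluster, follows the root directory
 * (32 bytes per directory entry). */
uint32_t data_offset(const bpb *b)
{
    return root_dir_offset(b) + (uint32_t)b->root_entries * 32u;
}
```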
FAT tables (two copies)
Location: 31400 (found by inspection in two different cameras). This can probably be reliably found as part of cluster 001, right after the boot sector.
A full description of how FAT12 works is still to come. In short, every set of three bytes (ab cd ef) represents two 12-bit entries (dab and efc). You can just follow my example and move the nibbles appropriately.
First copy:
00031400 f8 ff ff ff ff ff 05 60 00 07 80 00 09 f0 ff 00 |.......`........|
00031410 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
Second copy (each FAT is 3 sectors, or 0x600 bytes, long, so this copy starts at 31400 + 600 = 31a00):
00031a00 f8 ff ff.ff ff ff.05 60 00.07 80 00 09 f0 ff 00 |.......`........|
00031a10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
Interpretation:
FAT[0] = ff8 (media descriptor)
FAT[1] = fff EOC mark (end of cluster chain)
FAT[2] = fff EOC (the DCIM directory fits in one cluster)
FAT[3] = fff EOC (the 100MEDIA directory fits in one cluster)
FAT[4] = 005
FAT[5] = 006
FAT[6] = 007
FAT[7] = 008
FAT[8] = 009
FAT[9] = fff EOC
FAT[a] = 000 free
FAT[b] = 000 ... etc ...
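The nibble shuffling can be written as a small C function. This is a sketch of the standard FAT12 packing (the name fat12_get is hypothetical): entry n starts at byte n*3/2; even entries take that whole byte plus the low nibble of the next byte, and odd entries take the high nibble plus the whole next byte. Run over the first copy above, it reproduces the interpretation list.

```c
#include <stddef.h>
#include <stdint.h>

/* Read FAT12 entry n. Each 3-byte group "ab cd ef" holds two entries:
 * the even one is 0xdab and the odd one is 0xefc. */
uint16_t fat12_get(const uint8_t *fat, unsigned n)
{
    size_t off = (size_t)n * 3 / 2;
    if (n & 1)
        return (uint16_t)((fat[off] >> 4) | (fat[off + 1] << 4));
    return (uint16_t)(fat[off] | ((fat[off + 1] & 0x0F) << 8));
}

/* Standard FAT12 special values. */
int fat12_is_free(uint16_t v) { return v == 0x000; }
int fat12_is_bad(uint16_t v)  { return v == 0xFF7; }
int fat12_is_eoc(uint16_t v)  { return v >= 0xFF8; }   /* end of chain */
```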
Root Directory
Location: 32000 (found by inspection)
00032000 44 43 49 4d 20 20 20 20 20 20 20 10 00 00 00 00 |DCIM       .....|
00032010 00 00 21 00 00 00 00 00 21 00<02 00>00 00 00 00 |..!.....!.......|
--> Subdirectory "DCIM" can be found at cluster 002
Directory "DCIM"
Location: cluster 0x002 (as stated in root directory). Using the ECC data, we find this is at address 34000, which happens to follow the root directory.
00034000 2e 20 20 20 20 20 20 20 20 20 20 10 00 00 00 00 |.          .....|
00034010 00 00 00 00 00 00 00 00 21 00<02 00>00 00 00 00 |........!.......|
--> File "." at cluster 002 (that's this directory in this cluster)
00034020 2e 2e 20 20 20 20 20 20 20 20 20 10 00 00 00 00 |..         .....|
00034030 00 00 00 00 00 00 00 00 21 00<00 00>00 00 00 00 |........!.......|
--> File ".." cluster set to 0, meaning we're a child of the root directory.
00034040 31 30 30 4d 45 44 49 41 20 20 20 10 00 00 00 00 |100MEDIA   .....|
00034050 00 00 21 00 00 00 00 00 21 00<03 00>00 00 00 00 |..!.....!.......|
--> Directory "100MEDIA" can be found at cluster 003
Directory "100MEDIA"
Location: cluster 0x003 (as stated in the DCIM directory). Using the ECC data, we find this is at address 44000. Unlike cluster 002, it does not sit right after its predecessor: the block immediately after cluster 002, at 38000, holds cluster 007.
00044000 2e 20 20 20 20 20 20 20 20 20 20 10 00 00 00 00 |.          .....|
00044010 00 00 00 00 00 00 00 00 21 00.03 00.00 00 00 00 |........!.......|
--> File ".", cluster = 0003
00044020 2e 2e 20 20 20 20 20 20 20 20 20 10 00 00 00 00 |..         .....|
00044030 00 00 00 00 00 00 00 00 21 00 02 00 00 00 00 00 |........!.......|
--> File "..", cluster = 0002
00044040 44 53 43 5f 30 30 30 33 4a 50 47.20.00 00 00 00 |DSC_0003JPG ....|
00044050 00 00 21 2e 00 00.00 00 21 2e.04 00.2c 57 01 00 |..!.....!...,W..|
--> File "DSC_0003.JPG", First cluster = 004, Size = 0001572c (it's stored LSB first) = 87852 bytes
00044060 e5 53 43 5f 30 30 30 37 4a 50 47 20 00 00 00 00 |.SC_0007JPG ....|
00044070 00 00 21 2e 00 00 00 00 21 2e.0a 00.a1 80 00 00 |..!.....!.......|
--> File "?SC_0007.JPG" (deleted, as marked by the starting 0xE5), First cluster = 00a, Size = 000080a1 (LSB first) = 32929 bytes
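The 32-byte directory entries can be decoded the same way. This C sketch (hypothetical names) pulls out the 8.3 name, the attribute byte (10 = directory, 20 = archive), the deleted marker, and the LSB-first first-cluster (offset 1a) and size (offset 1c) fields. On the DSC_0003 entry it yields cluster 004 and 0x0001572c = 87852 bytes.

```c
#include <stdint.h>

typedef struct {
    char     name[13];       /* "NAME.EXT", NUL-terminated */
    uint8_t  attr;           /* 0x10 = directory, 0x20 = archive */
    int      deleted;        /* first byte was 0xE5 */
    uint16_t first_cluster;  /* offset 0x1A, LSB first */
    uint32_t size;           /* offset 0x1C, LSB first */
} dir_entry;

/* Hypothetical helper: decode one 32-byte FAT directory entry. For deleted
 * entries the lost first character is shown as '?', as in the listing above. */
void parse_dir_entry(const uint8_t *e, dir_entry *d)
{
    int n = 0;
    for (int i = 0; i < 8 && e[i] != ' '; i++)          /* base name */
        d->name[n++] = (i == 0 && e[0] == 0xE5) ? '?' : (char)e[i];
    if (e[8] != ' ') {                                  /* extension */
        d->name[n++] = '.';
        for (int i = 8; i < 11 && e[i] != ' '; i++)
            d->name[n++] = (char)e[i];
    }
    d->name[n] = '\0';
    d->attr = e[11];
    d->deleted = (e[0] == 0xE5);
    d->first_cluster = (uint16_t)(e[0x1A] | e[0x1B] << 8);
    d->size = (uint32_t)e[0x1C] | (uint32_t)e[0x1D] << 8 |
              (uint32_t)e[0x1E] << 16 | (uint32_t)e[0x1F] << 24;
}
```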
File "DSC_0003.JPG"
This is the only non-deleted file I have stored in the camera. The
directory entry tells us that the first cluster is 004; we use the FAT
table to follow the chain and find the rest:
FAT[4] = 005
FAT[5] = 006
FAT[6] = 007
FAT[7] = 008
FAT[8] = 009
FAT[9] = fff - end of chain
We can translate these cluster numbers to block addresses, and then search through all of memory (argh!) to find the flash address:
cluster      block address   flash address   flash address / 0x4000 (for use with dd)
4       -->  1008            58000           16
5       -->  100b            4c000           13
6       -->  100d            2c000           b
7       -->  100e            38000           e
8       -->  1010            3c000           f
9       -->  1013            40000           10
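Rather than searching all of memory for each cluster, a converter can scan the spare areas once and build a cluster-to-address table. The C sketch below does that under stated assumptions: the spare areas are in their own buffer at 16 bytes per 512-byte sector (so a spare offset times 32 is the matching data offset, e.g. 2c00 and 58000), only Block Address-1 of each block's first sector is checked, and all names are hypothetical. It is not the code in flashdump2iso.c. A more careful version would also compare Block Address-2 and warn when two blocks claim the same cluster; here the later block simply wins.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE     512u
#define SPARE_SIZE      16u
#define SECTORS_PER_BLK 32u                             /* one FAT cluster */
#define BLOCK_SIZE      (SECTOR_SIZE * SECTORS_PER_BLK) /* 0x4000 */
#define MAX_CLUSTERS    1024u                           /* 10-bit cluster field */
#define NO_BLOCK        UINT32_MAX

/* True if the 16-bit word has an even number of 1 bits. */
static int even_parity16(uint16_t w)
{
    int n = 0;
    for (; w != 0; w &= (uint16_t)(w - 1))
        n++;
    return (n & 1) == 0;
}

/* Hypothetical helper: one pass over the spare data, filling map[cluster]
 * with the flash data offset of the block that holds that cluster.
 * Returns the number of blocks mapped. */
size_t build_cluster_map(const uint8_t *spare, size_t nblocks,
                         uint32_t map[MAX_CLUSTERS])
{
    size_t mapped = 0;
    for (size_t c = 0; c < MAX_CLUSTERS; c++)
        map[c] = NO_BLOCK;

    for (size_t b = 0; b < nblocks; b++) {
        /* Block Address-1 of the block's first sector, high byte first */
        const uint8_t *s = spare + b * SECTORS_PER_BLK * SPARE_SIZE;
        uint16_t w = (uint16_t)(s[6] << 8 | s[7]);

        if (w == 0xFFFF || w == 0x0000)
            continue;                   /* never used, or retired */
        if ((w & 0x1000u) == 0 || (w & 0xE800u) != 0 || !even_parity16(w))
            continue;                   /* unexpected; a real tool should warn */

        map[(w >> 1) & 0x3FFu] = (uint32_t)(b * BLOCK_SIZE);
        mapped++;
    }
    return mapped;
}
```

Keeping such a table in RAM turns each later cluster lookup into a single array access instead of a scan of the whole chip.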
The unix utility "dd" can be used to extract blocks from the data file:
dd if=xxx.bin bs=0x4000 count=1 iseek=0x16 of=out16
The output files can then be concatenated, or you could pipe a whole
bunch of 'dd' commands together in one long command.
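The dd pipeline can also be replaced by a short C routine that walks the FAT chain and copies each cluster's flash block. This sketch (hypothetical names) assumes the whole dump is loaded into memory and that `map` is a cluster-to-flash-offset table like the one above. Unlike concatenating whole dd blocks, it stops after the file size from the directory entry, so the unused tail of the last cluster is dropped (for DSC_0003, only 87852 - 5 x 16384 = 5932 bytes of cluster 009 belong to the file).

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 0x4000u      /* one 32-sector cluster */

/* FAT12 entry n (three bytes "ab cd ef" hold entries 0xdab and 0xefc). */
static uint16_t fat12_get(const uint8_t *fat, unsigned n)
{
    size_t off = (size_t)n * 3 / 2;
    if (n & 1)
        return (uint16_t)((fat[off] >> 4) | (fat[off + 1] << 4));
    return (uint16_t)(fat[off] | ((fat[off + 1] & 0x0F) << 8));
}

/* Hypothetical helper: write one file from the raw flash dump to `out`.
 *   flash: the whole data dump in memory
 *   map:   cluster -> flash data offset (UINT32_MAX = not found)
 *   first, size: taken from the file's directory entry
 * Returns the number of bytes written, or -1 on a broken chain or I/O error. */
long extract_file(const uint8_t *flash, const uint8_t *fat,
                  const uint32_t *map, unsigned nclusters,
                  unsigned first, uint32_t size, FILE *out)
{
    uint32_t left = size;
    unsigned c = first;

    for (unsigned steps = 0; left > 0; steps++) {
        if (steps >= nclusters || c < 2 || c >= nclusters || map[c] == UINT32_MAX)
            return -1;                   /* loop, invalid cluster, or unmapped */
        uint32_t n = left < BLOCK_SIZE ? left : BLOCK_SIZE;
        if (fwrite(flash + map[c], 1, n, out) != n)
            return -1;
        left -= n;

        uint16_t next = fat12_get(fat, c);
        if (next >= 0xFF8)               /* end of chain */
            break;
        c = next;
    }
    return (long)(size - left);
}
```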
Results
This is the picture that was stored on my camera:
It's a shot that I took accidentally at my dining room table. I think the box in the middle is a battery holder powering the camera (with red and black leads coming out of the right side). In the background, top and center, is what looks like my soldering iron. The table is made up of tiles; you can see a seam running from the top left of the picture down to the bottom right. The lighting is dim incandescent, which explains the red cast -- the first thing I did when I took the camera apart was to remove the flash so that I wouldn't shock myself.
OK, that was fun and all, but it's pretty boring, and it's all stuff you can read in the Smart Media spec. It's time to kick it up a notch!
Deleted file "?SC_0007.JPG"
When a file is deleted, two things happen. First, the first byte of the
file entry is set to "E5". Second, the clusters allocated to the file
are freed in the FAT table. Most of the file entry is usually still in
the directory, but of course can be overwritten if needed. Most
importantly, note that the first cluster and the file size (which can
tell you the number of clusters used) remain.
Deleted files can be recovered if you make 2 assumptions. The first is
the missing first byte of the file name -- we can assume that it is 'D'
here, although it really doesn't matter. The second is a much harder
assumption: that the clusters were assigned in a determinable
order. Although the operating system can theoretically place the clusters in any order, we can assume that it fills from the front of the drive towards the end, filling in unused clusters. (You'll notice that's what happened with the non-deleted file.)
So, let's recover this file. First notice that the size field (a1 80 00 00, read LSB first) gives only 0x80a1 = 32929 bytes, enough for just three clusters -- quite small. Let's ignore that and see what happens. Second, see that the first cluster is 00a -- it's marked as free in the FAT, so it looks like the beginning hasn't been overwritten yet. Let's assume the file occupies cluster 00a and upwards -- all these clusters are free in the FAT.
Here's where the wear-leveling algorithm can help us in file recovery. Never-used clusters are never initialized with ECC data, so we can reliably tell which clusters have held data (without the ECC data, we would have to check whether the regular data was still the all-erased "FF" pattern instead). Searching through all the ECC areas, we can find clusters 00a and beyond:
Cluster number:        00a    00b    00c    00d    00e    00f    010    011    012
As stored w/ parity:   1015   1016   1019   101a   101c   101f   1020   1023   1025
ECC address:           2400   2e00   3800   2800   2a00   3000   3200   3400   3600
Data address:          48000  5c000  70000  50000  54000  60000  64000  68000  6c000
We can further verify our assumptions by checking for the JPEG
signature ("JFIF") in the first cluster. Eight clusters is 128KB - a
little small, but a reasonable size. As I've said before, let's just
try it.
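The checks used in this recovery, how many clusters a size implies and whether a cluster starts like a JFIF file, plus the "fill from the front" guess, fit in a few lines of C. This is a sketch under the same assumption as the text (clusters handed out in ascending order) with hypothetical names; `fat_free` and `has_data` stand for per-cluster flags you would derive from the FAT and from the spare-area scan. As a sanity check, clusters_needed(87852) returns 6, which matches the six clusters (004-009) in DSC_0003's FAT chain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CLUSTER_SIZE 0x4000u   /* 32 sectors of 512 bytes */

/* Clusters needed to hold `size` bytes, rounded up (no overflow). */
unsigned clusters_needed(uint32_t size)
{
    return (unsigned)(size / CLUSTER_SIZE + (size % CLUSTER_SIZE != 0));
}

/* JFIF-style start of file: SOI (FF D8), APP0 marker (FF E0), a 2-byte
 * segment length, then the identifier "JFIF" followed by a zero byte. */
int looks_like_jfif(const uint8_t *p, size_t len)
{
    return len >= 11 &&
           p[0] == 0xFF && p[1] == 0xD8 && p[2] == 0xFF && p[3] == 0xE0 &&
           memcmp(p + 6, "JFIF", 5) == 0;
}

/* Hypothetical "fill from the front" guess for a deleted file: starting at
 * `first`, take clusters that are free in the FAT but still hold data.
 * Writes up to `want` cluster numbers to `out` and returns how many. */
unsigned guess_chain(unsigned first, unsigned want, unsigned nclusters,
                     const unsigned char *fat_free,
                     const unsigned char *has_data, unsigned *out)
{
    unsigned n = 0;
    for (unsigned c = first; c < nclusters && n < want; c++)
        if (fat_free[c] && has_data[c])
            out[n++] = c;
    return n;
}
```

Passing the cluster count implied by the directory entry as `want` bounds the guess; passing a large value takes every free-but-used cluster, which is closer to the "let's just try it" approach above.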
The result is a very dark picture. It's hard to tell if this is a picture I took with my finger on the lens (easy to do, because there aren't many ways to hold the bare PCB without touching those fragile solder connections). There are artifacts typical of an actual imager (as opposed to just zeros stored in memory), but it's not clear whether the line at the right side is a file misalignment or another artifact of the imager (both are quite possible).
Actual Smart Media Implementation
Just because the data is stored in a recognizable format doesn't mean that the implementation is necessarily any good. There is still a trade-off to be made between good wear leveling, erasure of unused blocks (for security), and speed/power.
I've still got notes to decipher, so check back here... until then,
here are my old notes:
Here's a little histogram count:
1019, 1016, 1015, 1013, 1010, 100e, 100d, 100b, 1008, 1007, 1004, 1002,
1001 seen 32 times.
0000 seen 39 times.
Here's another representation:
(#### is the Block Address value found in each block's spare area.)
ecc addr:   1000   1200   1400   1600   1800   1a00   1c00   1e00
data addr:  20000  24000  28000  2c000  30000  34000  38000  3c000
cluster:                                       0002
####:       ----   1001   ----   100d   1002   1004   100e   1010
ecc addr:   2000   2200   2400   2600   2800   2a00   2c00   2e00
data addr:  40000  44000  48000  4c000  50000  54000  58000  5c000
cluster:           0003                               0004?
####:       1013   1007   1015   100b   101a   101c   1008   1016
                          JFIF                        JFIF
ecc addr:   3000   3200   3400   3600   3800   3a00
data addr:  60000  64000  68000  6c000  70000  74000
####:       101f   1020   1023   1025   1019   ----
Data with a #### of 0 is
written to pages starting at spare addresses:
addr    spare-addr  data
01000   0080        ff's
04000   0200        256 bytes of 00, then aa 55 ... ending with BAA037500003
08000   0400-04f0   sparse counting-like pattern (0420-0460 have checksums of ff)
09200   0490-04e0   random-like data
09e00   04f0        00's
0c000   0600        256 bytes of 00, then aa 55 ... ending with BAA037500003
10000   0800-08f0   (0820-0860 have checksums of ff)
14000   0a00        \ same cksum
18000   0c00        /
20000   1000        \ same cksum
20200   1010        /
Remaining Items
Error correction is a big one - I don't have an example of what
happens when the flash starts going south.
Data structure - There's probably some directory-like thing that allows
access to a particular cluster without searching the entire chip.
I didn't see evidence of this in the Smart Media summary, but it would have to exist if the file system is to be efficient (not that it has to be well designed, but I'm hoping it is).