Thursday, November 22, 2012

What is a Software Defined Network

(This is mainly Scott Shenker's definition, but I wholeheartedly agree)

SDN is three abstractions aimed at extracting simplicity from the network control plane (which today is an ad-hoc mix of ACLs, middleboxes, DPI, and other functionality). They are the distributed state abstraction, the specification abstraction, and the configuration abstraction.

1. Distributed state abstraction -- logically centralized state
Network state is physically distributed over many, many switches, but that doesn't mean we always have to deal with it that way. This distributed state should be abstracted into a logically centralized task, where you work with a global network view, i.e., a data structure, not distributed state. That logically centralized task can then be handled however you like -- you could even implement it in a distributed way for scalability when appropriate. But that is a distributed systems problem, not a networking problem with inherently distributed state, and you are not forced to deal with network-scale complexity.


2. Specification abstraction (or network virtualization) -- simple network view
The control program should describe functionality, not how to realize it on the particular physical network. So what the control program sees should be a virtual network view that is only complex enough to express its intent, not as complex as the actual underlying physical network.
e.g., for an ACL problem, the program should only see an endpoint-to-endpoint network.


3. Configuration abstraction (or forwarding abstraction) -- hardware-oblivious forwarding specification.
The configuration abstraction should expose enough to enable flexible forwarding decisions, but it should NOT expose hardware details. (This is where OpenFlow comes in, but it only partially solves the problem: it assumes the switch is the unit of the forwarding abstraction, instead of, say, a fabric.) A toy sketch of how these abstractions might stack is below.
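To make the layering concrete, here is a minimal, purely illustrative Python sketch (my own, not from the talk): a network OS builds the global view from per-switch reports, a virtualization layer exposes a simplified view, and an ACL-style control program states policy against that view. All class and method names are invented; this is not any real controller's API.

# Toy sketch of the three SDN abstractions; nothing here is a real controller API.

class NetworkOS:
    """Distributed state abstraction: collect per-switch reports into one
    logically centralized view (here just an adjacency dict)."""
    def __init__(self):
        self.global_view = {}                    # switch -> set of neighbors

    def report_link(self, sw, neighbor):
        self.global_view.setdefault(sw, set()).add(neighbor)
        self.global_view.setdefault(neighbor, set()).add(sw)


class VirtualizationLayer:
    """Specification abstraction: hide the physical topology and expose only
    what the control program needs, e.g. endpoints on one big virtual switch."""
    def __init__(self, net_os):
        self.net_os = net_os

    def endpoints(self):
        return sorted(self.net_os.global_view)


class ACLProgram:
    """Control program: states *what* it wants (block A->B), not how to
    install rules on every physical switch along the path."""
    def __init__(self, virt):
        self.virt = virt
        self.blocked = set()

    def block(self, src, dst):
        self.blocked.add((src, dst))

    def allowed(self, src, dst):
        return (src, dst) not in self.blocked


nos = NetworkOS()
nos.report_link("s1", "s2")
nos.report_link("s2", "s3")
acl = ACLProgram(VirtualizationLayer(nos))
acl.block("s1", "s3")
print(acl.virt.endpoints(), acl.allowed("s1", "s3"))   # ['s1', 's2', 's3'] False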

All in all, SDN is NOT OpenFlow. SDN doesn't have to happen in a datacenter, either. SDN is just a reexamination of how we manage the control plane of our networks.


How to realize SDN (not that important, and you have probably seen this a dozen times...):

              control programs
    -----------------------------------------  (control program's network view, or virtualized network)
              virtualization layer
    -----------------------------------------  (centralized network view, i.e., one data structure)
     common distribution layer (network OS)
    -----------------------------------------  (physical, distributed network states)
          physical network + switches

Tuesday, November 13, 2012

Security in the Cloud

1. resource sharing among mutually distrustful customers
    cross-VM side-channel attacks
    proof-of-concept attack: the attacker and victim share the same core; the attacker tries to wake up as frequently as possible, fills the instruction cache, lets the victim run (which uses a portion of the cache), then wakes up again and measures access times to the previously cached data (so how the victim uses the cache is learned). This can let you learn the victim's secret key.
   For attacking across cores: force the scheduler to reschedule frequently, so you end up sharing a core with the victim a lot
   A DNA-reassembly technique is used to go from a partial, noisy secret key to the complete secret key (a toy sketch of the prime-and-probe pattern is below)
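Purely for illustration, a toy Python sketch of the prime-and-probe pattern the attack builds on. Python timing is far too coarse (and this buffer far too naive) to mount the real attack, which needs cycle-accurate timers and carefully constructed eviction sets; this only shows the shape of the loop.

# Toy illustration of the prime-and-probe loop; not a working attack.
import time

LINE = 64                              # pretend cache-line size
BUF_SIZE = 1 << 18                     # stand-in for a cache-sized buffer
buf = bytearray(BUF_SIZE)

def prime():
    for i in range(0, BUF_SIZE, LINE): # touch every "line" to fill the cache
        buf[i] = 1

def probe():
    timings = []
    for i in range(0, BUF_SIZE, LINE): # re-touch each line and time it; lines
        t0 = time.perf_counter_ns()    # evicted by the victim would show up slow
        _ = buf[i]
        timings.append(time.perf_counter_ns() - t0)
    return timings

prime()
time.sleep(0)                          # yield the core: "let the victim run"
samples = probe()
print("mean probe latency (ns):", sum(samples) / len(samples))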

2. pricing of fine-grained resources
    performance varies across different types of CPUs, and network performance varies too, so it is not very predictable
   you get either predictable but low performance, or high but unpredictable performance
   the loss comes from workload contention (Xen does a good job of CPU performance isolation, but is not so great for memory, disk, or network; not that there is much Xen can do anyway)
   thus the uniform abstraction fails
   and attackers have opportunities to interfere with others' workloads
 a. placement gaming:
    start multiple instances and shut down the ones that perform worse
   when seeing bad performance, just shut the VM down and launch a new one (a minimal sketch of this strategy is below)
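A minimal sketch of the placement-gaming strategy, assuming a hypothetical provider API: launch_instance(), benchmark(), and terminate() are made-up stand-ins, not real cloud calls.

# Sketch of placement gaming: launch several instances, keep only the fastest.
import random

def launch_instance():
    return {"id": random.randrange(10**6)}

def benchmark(inst):
    # In practice: run a short CPU/disk/network micro-benchmark on the instance.
    return random.uniform(0.5, 1.0)

def terminate(inst):
    pass

def placement_game(n_candidates=5, keep=1):
    candidates = [launch_instance() for _ in range(n_candidates)]
    ranked = sorted(candidates, key=benchmark, reverse=True)
    for inst in ranked[keep:]:         # shut down the worse placements
        terminate(inst)
    return ranked[:keep]               # keep only the best-performing instance(s)

print(placement_game())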

b. resource freeing attack:
   the attacker and victim both run Apache on the same physical machine, and both want more bandwidth
 the attacker can request a lot of dynamic pages from the victim, which is CPU intensive; while the victim is busy processing these requests, the bandwidth is free for the attacker to use

storage panel:

storage in the context of cloud and network (no slides....)

intersection of storage, sdn and computing
system management paradigm important
redefines what it is to sell IT
slaf? (an open source storage cluster stack built on commodity hardware)
software-defined storage aligned with SDN: volume management, virtual network + virtual storage devices

 I have no idea what he talked about for 5 minutes.....


research problems in software defined storage (NetApp)

trends in datacenter storage: heterogeneity, dynamism, sharing, and ???
software-defined storage: have storage components of all types/layers communicate seamlessly with each other, in a way that abstracts out details (like whether you support dedup or not)

He has a table of an SLO language (which is worth looking up later)
different stacks for different SLOs at different cost points
SLOs are a core idea and need to be standardized

storage management not in a single box, but at multiple layers
isolation for both performance and the failure/security case
failure handling when one storage service is composed of multiple components (Remzi covered the layered structure; it could be other structures)

now storage happens a little bit in the hypervisor, the virtual machine, and the application. which layer should do what? how do we coordinate the different layers?

new storage problems due to dynamism

want: data structure storage.


Evolution of software defined storage (VMware)
go back and look at how SDS has changed, since we already have software-defined storage
traditionally: hardware-defined storage realized by special-purpose boxes
clear boundaries between hardware/software, and a limited interface: block read/write
now interface is changing, and the enabling factors are:
1. commoditization of those big boxes
2. because CPUs advance, hosts are more powerful and can hold more intelligence, rather than, say, putting dedup inside the box
3. richer interface standards (like those offered by SCSI, say, XCOPY)
4. boxes simplifying, allowing applications to specify what they want
5. no distinction between server node and storage node anymore, just bricks (which have CPU, memory, flash, and disk) -- already happening at Google, Facebook, etc.

summary: software-defined storage is what happens when the distinction between server node and storage node blurs, and we have a disruptive, shared-nothing model (what does he mean by this???)

Storage Infrastructure Vision (Google)
Google needs multiple data centers
data consistency is a first-class citizen now; scalability, reliability, and pricing problems
Google infrastructure goal: not just look simple to outside users, but look simple to application programmers too. Complexity is managed by infrastructure operators, not application developers

Datacenter storage systems (Facebook)
challenges:
1. the software stack will be obsolete soon (more CPUs, not faster CPUs; faster storage; faster, flatter networks)
2. heterogeneous workloads -- but you could potentially have specific datacenters for specific things, so the workloads are not that different
3. heterogeneous hardware -- how to take advantage of different hardware profiles
4. dynamic applications -- need adaptive system

opportunities:
1. multi-threaded programming is clumsy -- new parallel programming paradigms
2. flash != hard disk -- new storage engine for flash
3. high speed network stack -- new network stack
4. dynamic systems will win big -- high throughput vs. low latency, space vs. time, data-temperature aware


WISDOM discussion (Microsoft)
COSMOS:
   a service internal to Microsoft, used for batch processing and analytics
   challenges:
         transient outliers can pummel performance, making it hard to reason about
         any storage node services multiple types of requests
         exploiting cheap bandwidth (a flat network), but storage always outpaces the network







SDS (kinda) work by Remzi's group and others

use flash as a cache (from Mike Swift's group)
flash is widely used as a cache
using the SSD's block interface is inefficient for a cache, because a cache is different from storage
new firmware in the SSD to get rid of the block mapping and use a unified address space
they also provide a consistent cache interface (which block is clean/dirty, which block has been evicted, etc.)
when doing garbage collection, there is no need to migrate data because it's a cache and the primary copy of the data lives somewhere else
we plan to propose new interfaces and virtual SSDs (so it could be more software-defined???) -- a rough sketch of such a cache interface is below
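A rough sketch (my interpretation, not the actual firmware interface) of what a cache-oriented SSD interface like the one described might look like, with explicit clean/dirty/evicted state and a garbage collector that drops clean blocks instead of migrating them.

# Hypothetical cache-oriented SSD interface; all names are invented.
from enum import Enum

class BlockState(Enum):
    CLEAN = "clean"
    DIRTY = "dirty"
    EVICTED = "evicted"

class CacheSSD:
    def __init__(self):
        self.blocks = {}                 # address -> (state, data)

    def write_cache(self, addr, data, dirty=False):
        state = BlockState.DIRTY if dirty else BlockState.CLEAN
        self.blocks[addr] = (state, data)

    def read_cache(self, addr):
        state, data = self.blocks[addr]
        if state is BlockState.EVICTED:
            return None                  # caller must refetch from primary storage
        return data

    def evicted_blocks(self):
        # The cache interface reports what was dropped so the host can react.
        return [a for a, (s, _) in self.blocks.items() if s is BlockState.EVICTED]

    def gc(self):
        # Garbage collection can simply drop clean blocks instead of migrating
        # them: the primary copy of the data lives elsewhere.
        for addr, (state, data) in list(self.blocks.items()):
            if state is BlockState.CLEAN:
                self.blocks[addr] = (BlockState.EVICTED, None)

ssd = CacheSSD()
ssd.write_cache(10, b"hot data")
ssd.gc()
print(ssd.evicted_blocks(), ssd.read_cache(10))   # [10] None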

combating ordering loss in the storage stack -- the no-order FS work (Vijay)
ordering is not respected in many layers
don't use ordering to ensure consistency
1. coerced cache eviction -- additional writes to flush the cache to ensure ordering
2. backpointer-based consistency -- verify backpointers when following pointers in the file system (see the sketch after this list)
3. (ongoing) inconsistency in virtualized storage settings
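A small sketch of the backpointer idea, with invented structures (the real no-order FS design is of course more involved): every block carries a pointer back to its owner, and reads verify that forward and back pointers agree instead of relying on write ordering.

# Toy backpointer-based consistency check; structures are illustrative only.

class Block:
    def __init__(self, data, owner_inode, offset):
        self.data = data
        self.back = (owner_inode, offset)   # backpointer written with the data

class Inode:
    def __init__(self, number):
        self.number = number
        self.pointers = {}                  # offset -> Block

    def read(self, offset):
        block = self.pointers.get(offset)
        if block is None:
            return None
        # Verify the backpointer before trusting the forward pointer.
        if block.back != (self.number, offset):
            raise IOError("inconsistent pointer detected at offset %d" % offset)
        return block.data

inode = Inode(42)
inode.pointers[0] = Block(b"hello", owner_inode=42, offset=0)
print(inode.read(0))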

de-virtualization for flash-based SSDs (Yiying) -- the nameless writes in SSDs
strip off the excess layers of block mapping: file system -> FTL -> physical block
store the physical block number directly in the file system (when the file system writes a block, no position is specified; the disk decides where to put the block and informs the file system where the block got written)

in a virtualized environment: a file system de-virtualizer (ongoing, I think)
the file system performs normal writes, but the fsdv does nameless writes and stores the physical mapping (a small sketch of the nameless-write idea is below)
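A small sketch of the nameless-write idea, with invented names: the file system hands the device data with no target address, the device picks the physical location and reports it back, and the file system records that physical address directly.

# Toy nameless-write flow; not the actual paper's interface.

class NamelessSSD:
    def __init__(self):
        self.flash = []                     # physical blocks, append-only here

    def nameless_write(self, data):
        self.flash.append(data)             # device chooses placement...
        return len(self.flash) - 1          # ...and reports the physical address

    def read(self, phys_addr):
        return self.flash[phys_addr]

class FileSystem:
    def __init__(self, device):
        self.device = device
        self.inode_map = {}                 # (file, offset) -> physical address

    def write(self, file, offset, data):
        phys = self.device.nameless_write(data)
        self.inode_map[(file, offset)] = phys   # store the physical address directly

    def read(self, file, offset):
        return self.device.read(self.inode_map[(file, offset)])

fs = FileSystem(NamelessSSD())
fs.write("a.txt", 0, b"no indirection layer")
print(fs.read("a.txt", 0))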


zettabyte reliability (Yupu)
add checksums at the file system/memory boundaries
also an analytical framework to reason about reliability -- the sum-of-probabilities model
future: data protection as a service? without modifications to the OS? (I am not quite sure how to realize this....)

low latency storage class memory
data-centric workloads are sensitive to storage latency
storage-class memory (phase-change memory, STT-RAM, etc.): persistent, low latency, byte addressable
Mnemosyne: persistent regions + consistency
                     persistence: so that data structures don't get corrupted
                     consistency: update data in a crash-safe way


Hardening HDFS (Thanh Do)
software defined way to ensure system reliability
1. selective 2-versioning programming
2. encode file system states using Bloom filters (a toy Bloom filter sketch is below)
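A toy Bloom filter standing in for the idea of compactly encoding file-system state (e.g. which blocks an inode owns) so a second version of the code can cross-check it; the parameters and hashing are illustrative only.

# Minimal Bloom filter used to encode "this inode owns this block" facts.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

owned_blocks = BloomFilter()
owned_blocks.add(("inode7", 1001))
print(owned_blocks.might_contain(("inode7", 1001)),   # True
      owned_blocks.might_contain(("inode7", 9999)))   # False (with high probability)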



software defined storage by Remzi (more like virtualized storage to me...)

today:
    big box storage dominates, but the selling point is just the software
    so go from software/hardware to software/vmm/hardware

problem:
    1. reliability
         when errors are propagated, where in the system are they handled? -- file systems simply lose errors during propagation! (something like 1 in 10)
         error handling is fundamentally hard (not because Linux is written in C, or because it is open source and thus poorly programmed)
         how do we reason about a system's error profile?

    2. isolation
         storage isolated from VM to VM
        question: how does an FS react to block failures? (type-aware fault injection) -- Remzi somehow linked this work to fault isolation =,= ----- you get a result matrix of how file systems react to different types of faults -- write errors are largely ignored by ext3 and other file systems, and sometimes they panic too (for ReiserFS)
        so if one VM does something funny, it might cause the underlying storage system to panic -- thus the need to isolate faults systematically.
   

    3. performance
       how does performance change when you stack storage systems (say, file systems) on top of each other, and how do we systematically reason about it

    4. application
   how applications use storage (something along the lines of Tyler's "a file is not a file" paper)


beyond virtualization:
composition of storage!


Questions:
1. have you seen correctness guarantees being broken?
    Of course! Disks lie. Apple just changed the semantics of fsync.

2. how about error propagation in layered storage?
    we are doing that work, but we have shown that error propagation within a single layer is already hard; you can imagine it is even worse across multiple virtualized storage layers

3. your vision of software-defined storage is quite different from SDN; how do you relate your work to SDN, and is there maybe an SDN analogy for how to manage storage?
     we are starting small, and on problems we have already seen.
     (I personally think Remzi doesn't have a good answer on SDN-flavored SDS, in terms of a storage control plane, yet...)


Future trends on storage

Future trends in hard drives:

1. areal density growth
    challenge: neighboring-block interference
                       SMR (overlapping tracks): can do sequential reads/writes and random reads, but can't do random writes
     heat-assisted magnetic recording to increase areal density (asymmetric temperatures for read and write; recording is done while heated, using a laser)


2. what about SSDs:
    SSDs are important for improving performance
    not a viable candidate for capacity, though (fabs are costly, but there is not that much revenue in it for the whole industry)


Future trends in NV memory (Fusion-io)
 multi-layer vision
multi-layer memory (less reliable) makes up the vast majority used in data centers

 how to effectively use flash:
  hierarchy of DRAM, flash, disk

 Fusion-io: an API to interact with flash directly, instead of through the traditional block interface
             an API more appropriate for the flash medium (transactional semantics, etc.)
            more memory-like semantics for flash instead of the traditional storage view of flash, and a corresponding API

      basic I/O: read/write
   transactional I/O: commit
    memory-like: the ability to chase a pointer (a sketch of these three tiers is below)
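A sketch of those three interface tiers layered over one device; this is not Fusion-io's actual API, and all names are invented: basic reads/writes are staged, commit makes the staged writes durable together, and "pointer chasing" follows references stored in the device.

# Hypothetical three-tier flash interface: basic, transactional, memory-like.

class FlashDevice:
    def __init__(self):
        self.store = {}          # committed state
        self.pending = {}        # writes staged inside a transaction

    # Basic I/O
    def read(self, key):
        return self.store.get(key)

    def write(self, key, value):
        self.pending[key] = value

    # Transactional I/O
    def commit(self):
        self.store.update(self.pending)     # all staged writes become durable together
        self.pending.clear()

    def abort(self):
        self.pending.clear()

    # Memory-like access: values can hold keys of other values ("pointers")
    def chase(self, key):
        value = self.read(key)
        while isinstance(value, str) and value.startswith("ptr:"):
            value = self.read(value[4:])    # follow the pointer one hop
        return value

dev = FlashDevice()
dev.write("node2", "payload")
dev.write("node1", "ptr:node2")
dev.commit()
print(dev.chase("node1"))                   # payload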

  challenges:
     1. reliability with low-cost/high-density media
     2. integration with existing software stacks, caches, tiering (flash is consumer-oriented, not data-center-oriented, so it is up to the software folks to make it work for data centers)
     3. system and data center implications - networking, scale out vs. scale up
   

Future trends in data protection/backup (Data Domain)
phase 0: tape
phase 1: deduplicated disk
              the difference between backup storage and primary storage (see their FAST '12 paper) -- you don't care about IOPS for backup
              so the disk should be optimized for backup purposes
phase 2: optimized deduplicated disk (the disk no longer behaves like a tape, and can do things differently than you would with tapes)
              new I/O interfaces to do backup
              incremental forever, virtual fulls (instead of a weekly full backup and daily incrementals)
 phase 3: integrated data protection (backup) silos
                now we end up doing multiple backups for everybody and at each layer (and you don't know how much your organization is spending on backups!)
phase 4: solve the problems of phase 3
               provide a data protection cloud (what is that?)
             
Why buy innovation (Microsoft Research; not a storage guy, works on HW acceleration for search engines)

what innovations in the data center?
1. a different scale-cost curve (lower cost for the same scale) for the same value -- but innovation has a fixed start-up cost (even at zero scale)
2. different (and greater than linear!) scale-value curve!
    e.g., new capability, competitive advantage, new business


university research typically focuses on category 1 but not category 2; industry typically wants category 2 a lot.
Why does academia try so hard to do little tweaks on performance but not think about 2???



Questions:
1. why increase areal density rather than just adding platters or RPM?
     there are limits on how many platters you can have and on RPM (energy, say)

2. what are the best mechanisms for the new interfaces?
     people like to write programs differently (some people like the memory model, while some people do I/O well)

3. when do these new media hit the mainstream?
     well, still in early phase.... see where they could go
   
4. for category 2 research, it's hard to measure new value, as opposed to measuring performance. How do we convince people our research has value?
    it's the decision of the community as a whole to reward research of type 2. Thirty years ago we had papers with fewer measurements than today's, so it is a culture problem of the community

5. why didn't you mention arrays? (disk arrays, flash arrays, memory arrays, etc.)
    RAIDs are well known
    people are building flash arrays, and there are some new things compared to traditional RAIDs

6. innovations in interfaces to storage: for years we have had read/write blocks; what has happened to change that? Or are we going to end up with read/write blocks anyway? (by Remzi)
    Data Domain: we got value by changing the interface
     Fusion-io: the media change, and the role of the open source community has also changed. even though new interfaces are not being picked up by old apps, they could be used by new apps
     Microsoft: if we could get a lot more performance, then it's worth changing the interface
    Data Domain: not just performance, but a lot of the time for new capabilities







SDN: industry view

Cisco:
    Missed what he was saying.....
    but one point: slicing is fundamental for networks
    and how to balance (academic) network flexibility with production isolation/security

NEC:
   OpenFlow's market potential is huge
   use cases:
    1. enterprise: departmental isolation/mobility
    2. campus: virtual circuits for research collaboration (say, GENI)
    3. service provider: network efficiency
    4. cloud: comprehensive virtualization (virtual networks working with VMs and other virtualized resources)

Google: (not a view, but what Google is providing)
   Google's SDN (OpenFlow) WAN -- to connect datacenters throughout the world
        they still use BGP and IS-IS (for backward compatibility)
        they also use OpenFlow to create (isolated?) large network test environments
        they use centralized control to push work to the edges as much as possible, because edge switches are cheap
        management can't be centralized entirely -- it's a distributed systems problem, but not a networking problem anymore
   All in all, Google likes to turn networking problems into distributed systems problems, which is what SDN does anyway
    but they want to aggregate network information and be able to understand and reason about network performance (as there is no formal way to debug/diagnose network performance problems yet)

Big Switch:
    what SDN is: a transition from closed big boxes to open, programmable solutions!
    enable network operators to develop applications (an easy, intuitive API; easy debugging)
    use case: Big Tap
           monitoring the network: use a separate tap network, dump a huge amount of traffic into this tap network, then analyze it (a dumb idea, but a great win for operators)
     middleboxes: focus on virtual paths, and middlebox-controller communication design (will need to integrate middleboxes into OpenFlow, I think)

questions:
   1. in a virtualized environment, where is the controller?
       Google: we are not VMware; we have a controller managing all layers
       Big Switch: we do not want to preclude traditional network vendors from this space

   2. is OpenFlow as powerful as the closed solutions?
       NEC: this is a matter of time; as use cases emerge, it will become more powerful and integrate more functionality (5-10 yrs)
       Google: SDN is not OpenFlow, but OpenFlow is a form of SDN; as long as you have external control, it doesn't have to be OpenFlow
        Cisco: software enables new things (verification, say)
        Google: how to have security in an open network environment is a challenge
        Big Switch: the controller will become an OS, so what the Google folks are talking about is really process isolation
   
       
   

Consistency First Approach for Routing (BGP revision)

Consistency:


consistency-first approach (over availability or performance)
1. necessary
2. feasible
3. powerful abstractions

Scatter: consistent storage manager??? (maybe some storage abstraction here????)

internet availability is not high: 2.0-2.6%
because an available physical path doesn't imply an available routing path

90% of outages last less than 700 seconds (due to transient network failures, maybe?)
so we need more robust routing protocols to avoid the short-term outages caused by the dynamics of the routing protocols themselves

bgp protocol:
1. opaque local policies
2. a distributed mechanism to update paths (with some delay, thus inconsistency)
which causes short periods of network unavailability

so the underlying cause of short-term outages is inconsistent global state

Consensus Routing:
consistency-first approach
decouple safety and liveness

safety: forwarding tables are always consistent and policy compliant:
apply route updates only after they have reached all dependent ASes
apply updates synchronously across ASes
mechanism:
1. run BGP but don't apply the updates
2. a distributed snapshot is taken periodically
3. ASes send the list of incomplete updates to consolidators
4. consolidators run a consensus algorithm to agree on the set of incomplete updates
5. consolidators flood the new routes, and forwarding tables are updated (Note: these are route-state updates, not route updates! which route to take is still each AS's own choice, and doesn't have to be exposed to the outside world)
(in essence, a two-phase-commit-like process; a small sketch is below)
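A very small Python sketch of that flow (simplified stand-ins, not the protocol's actual messages): updates are staged but not applied, a periodic consensus step decides which updates are complete for the epoch, and only those are applied everywhere.

# Toy consensus-routing epoch: stage updates, agree on what is complete, apply.

class AS:
    def __init__(self, name):
        self.name = name
        self.staged = {}        # prefix -> candidate route (learned, not applied)
        self.fib = {}           # forwarding table actually in use

    def receive_update(self, prefix, route):
        self.staged[prefix] = route          # step 1: run BGP but don't apply

    def apply(self, complete_prefixes):
        for prefix in complete_prefixes:     # step 5: apply only consistent updates
            if prefix in self.staged:
                self.fib[prefix] = self.staged.pop(prefix)

def consolidate(ases, incomplete_reports):
    # Steps 2-4: a snapshot plus consensus on which updates are still
    # incomplete; everything else is safe to apply in this epoch.
    incomplete = set().union(*incomplete_reports) if incomplete_reports else set()
    all_staged = {p for a in ases for p in a.staged}
    return all_staged - incomplete

as1, as2 = AS("AS1"), AS("AS2")
as1.receive_update("10.0.0.0/8", "via AS7")
as2.receive_update("10.0.0.0/8", "via AS7")
complete = consolidate([as1, as2], incomplete_reports=[set()])
for a in (as1, as2):
    a.apply(complete)
print(as1.fib, as2.fib)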

liveness: the routing system adapts to failures quickly
solution: use the old path, but dynamically re-route around the failed link (using existing techniques)


Scatter (a P2P consistent storage layer)

design pattern for the consistency-first approach:
1. separation of safety and liveness
2. consistency as a baseline guarantee: the trade-off is then between performance and availability (a more constrained design space)












Software defined network in datacenter (From AA's group)

Elastic Middleboxes in the Cloud: (Robert)

where is the bottleneck (app? bandwidth? middlebox?)

Fast bottleneck Identification:

Processing time (hard for packets in/out)
CPU/memory info (hard to decide how often to sample)
Open connections

Strato: capture MB and NW bottlenecks
Use a greedy heuristic (tentatively add a middlebox) --- doesn't work in complex MB topologies
Refinement: the most commonly used (overlapping) MBs first? (a rough sketch of the greedy heuristic is below)
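A rough sketch of the greedy scale-out heuristic as I understood it; measure_throughput() is a hypothetical probe, not anything from Strato itself.

# Greedy scaling: tentatively add one replica of each MB, keep the best addition.
import random

def measure_throughput(deployment):
    # Stand-in probe: throughput is limited by the least-replicated middlebox.
    return min(10 * n for n in deployment.values()) + random.random()

def greedy_scale(deployment, budget):
    for _ in range(budget):
        base = measure_throughput(deployment)
        best_mb, best_gain = None, 0.0
        for mb in deployment:                  # tentatively add one replica of each MB
            trial = dict(deployment)
            trial[mb] += 1
            gain = measure_throughput(trial) - base
            if gain > best_gain:
                best_mb, best_gain = mb, gain
        if best_mb is None:                    # no single addition helps: stop
            break
        deployment[best_mb] += 1
    return deployment

print(greedy_scale({"firewall": 1, "ids": 1, "proxy": 1}, budget=3))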





Middlebox scaling: (Aaron)
Move some of the middlebox control to the controller.
1. How is the logic divided?
    classify middlebox state, and define interfaces between middleboxes and the controller
    Action state + support state + tuning state
    Representing state: key (field1 = value1, field2 = value2, ...) + action (drop, forward, etc.)
   interfaces: get, remove, add state (a sketch of this interface is below)
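A sketch of that state representation and get/add/remove interface, with invented names (not a published API): state records are key fields plus an action and a kind tag, and a controller can pull them from one middlebox instance and push them to another.

# Hypothetical middlebox state records and controller-facing interface.

class MiddleboxState:
    def __init__(self, key, action, kind="action"):
        self.key = key           # e.g. {"src_ip": "10.0.0.1", "dst_port": 80}
        self.action = action     # e.g. "drop", "forward"
        self.kind = kind         # "action", "support", or "tuning" state

class Middlebox:
    def __init__(self):
        self.states = []

    def get_states(self, kind=None):
        return [s for s in self.states if kind is None or s.kind == kind]

    def add_state(self, state):
        self.states.append(state)

    def remove_state(self, state):
        self.states.remove(state)

# A controller migrating per-flow state from one instance to another:
old_mb, new_mb = Middlebox(), Middlebox()
old_mb.add_state(MiddleboxState({"src_ip": "10.0.0.1", "dst_port": 80}, "drop"))
for s in old_mb.get_states(kind="action"):
    old_mb.remove_state(s)
    new_mb.add_state(s)
print(len(new_mb.get_states()))   # 1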



NaPs: network-aware placement and scheduling in clusters (Yizheng):
motivation: little work has examined the interplay between CPU/memory resource sharing and network resource sharing (via TCP congestion control, etc.)

Quincy placement: put instances as close as possible (not optimal)
NaPs design:
a general framework to make the cluster network-aware (a minimal sketch is below):

Lowest level: an SDN controller to expose network state
Higher level: a cluster scheduler that talks to the SDN controller for network status, to worker nodes for workload info, and to storage for data placement information
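A minimal sketch of that layering; the inputs (link utilization from the SDN controller, load from worker nodes, data location from the storage layer) are faked here, and the scoring is only illustrative.

# Toy network-aware placement: rank candidate nodes by locality, link load, CPU load.

def rank_nodes(candidates, link_util, node_load, data_location):
    def score(node):
        # Prefer nodes near the data, on lightly loaded links, with spare CPU.
        locality = 0.0 if node == data_location else 1.0
        return locality + link_util.get(node, 0.0) + node_load.get(node, 0.0)
    return sorted(candidates, key=score)

candidates = ["node1", "node2", "node3"]
link_util = {"node1": 0.9, "node2": 0.2, "node3": 0.4}   # from the SDN controller
node_load = {"node1": 0.5, "node2": 0.6, "node3": 0.1}   # from worker nodes
data_location = "node3"                                  # from the storage layer
print(rank_nodes(candidates, link_util, node_load, data_location))  # node3 first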




 

Saturday, November 3, 2012

A script to modify the Samsung Galaxy S3's rootfs (including the init script)



Granted, you will need to modify it according to your own environment and install the appropriate software (mkbootimg, etc.).

After running this script, you can use ClockworkMod recovery (which should be in your recovery image if your phone is rooted, so you might not want to reflash recovery.img...) to flash the modified image (here only boot.img is modified) to your phone.

In this script, the new boot.img is located in /emmc/clockwork/mod in recovery mode.

You only need to restore boot.img to save time.

Reference:
how to unpack/repack boot images (including ramdisk):
http://android-dls.com/wiki/index.php?title=HOWTO:_Unpack%2C_Edit%2C_and_Re-Pack_Boot_Images
how to calculate an MD5:
http://www.mydigitallife.info/how-to-calculate-and-generate-md5-hash-value-in-linux-and-unix-with-md5sum/
Linux cpio facility:
man cpio
http://www.gnu.org/software/cpio/

Restore.sh:
# Repack a modified ramdisk into boot.img and push it back into the ClockworkMod backup directory.
cd modified3-2012-11-01.17.05.34/
mkdir tmp
cd tmp/
# Split boot.img into its kernel (zImage) and ramdisk pieces.
unpackbootimg -i ../boot.img
mkdir ramdisk
cd ramdisk
# Unpack the gzipped cpio ramdisk.
gunzip -c ../boot.img-ramdisk.gz | cpio -i
# Drop in the modified init script and the kernel module.
/bin/cp /scratch/suli/android/samsung/config/init.rc init.rc
/bin/cp /scratch/suli/android/samsung/programs/slowdevice/null_bd/null_bd.ko lib/modules/null_bd.ko
# Repack the ramdisk.
find . | cpio -o -H newc | gzip > ../newramdisk.cpio.gz
cd ..
rm -rf ramdisk
/bin/mv newramdisk.cpio.gz boot.img-ramdisk.gz
# Rebuild boot.img from the original kernel plus the new ramdisk.
mkbootimg --kernel boot.img-zImage --ramdisk boot.img-ramdisk.gz --pagesize 2048 --base 10000000 -o ../newboot.img
cd ..
rm -rf tmp
/bin/mv newboot.img boot.img
# Regenerate the backup's MD5 manifest so ClockworkMod will accept it.
rm -f nandroid.md5
md5sum boot.img  cache.ext4.tar  data.ext4.tar   recovery.img  system.ext4.tar > nandroid.md5
# Push the new image and manifest back to the phone.
adb push boot.img /sdcard/clockworkmod/backup/modified-2012-11-01.17.05.34/boot.img
adb push nandroid.md5 /sdcard/clockworkmod/backup/modified-2012-11-01.17.05.34/nandroid.md5