
Modelling

Emulsion Phase Equilibria Model

Design

Purpose

Our chlorophyll separation system relies on water-in-oil emulsions. However, working with emulsions can be tricky: what total solution composition of oil, water, and surfactant is optimal? Are there combinations of variables we need to avoid? To ensure that we can recover the processed canola oil and remove chlorophyll from it, we sought to rationally design our emulsion system. To do this, we used machine-learning classification algorithms to build models that predict the microstructure of the emulsion solution as a function of its chemical composition. We combined these results with a set of hydrodynamic models to further understand the physical characteristics of the emulsions we were producing, which allowed us to optimize the performance of our system.

The aim of this model is to find an emulsion that allows maximum removal of chlorophyll from the oil by identifying the optimal temperature and concentrations of oil, water, and surfactant. Supervised machine-learning classification methods are used to predict emulsion phase equilibria (the Winsor classifications) from previously gathered in vitro data in order to formulate optimal emulsion conditions. Through an iterative development process, we explored and implemented Support Vector Classification (SVC), k-Nearest Neighbours (kNN), and multilayer perceptron models to densely interpolate and extrapolate phase behaviour from the experimentally gathered data; a sketch of this interpolation step is shown below.
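To make the interpolation step concrete, here is a minimal sketch: train one of the candidate classifiers on measured compositions, then query it on a dense grid over the composition simplex. The data, grid resolution, and model choice below are placeholders for illustration, not our actual pipeline or results.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X_train = rng.dirichlet([1.0, 1.0, 1.0], size=61)   # placeholder (r_oil, r_water, r_surfct); rows sum to 1
y_train = rng.integers(1, 5, size=61)               # placeholder Winsor I-IV labels (1-4)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Dense grid over the simplex r_oil + r_water + r_surfct = 1.
step = 0.02
grid = np.array([(o, w, 1.0 - o - w)
                 for o in np.arange(0.0, 1.0 + step, step)
                 for w in np.arange(0.0, 1.0 - o + step, step)
                 if 1.0 - o - w >= -1e-9])
phases = model.predict(np.clip(grid, 0.0, 1.0))     # predicted phase at every grid point
print(grid.shape[0], "grid points classified")
```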

Problem Outline

HOW WE VISUALIZE EMULSION DATA

Description of Data

The data are four-dimensional, containing the three composition fractions of oil, water, and surfactant together with the equilibrium phase (the classification label). Our model seeks a function \(F \; : \; \vec{v} \longrightarrow y \) mapping a given vector to a phase class such that $$ \vec{v} = \begin{pmatrix} r_{oil} \\ r_{water} \\ r_{surfct} \end{pmatrix}, $$ $$ r_{oil} + r_{water} + r_{surfct} = 1, $$ $$ y \in \{\text{Winsor I}, \text{Winsor II}, \text{Winsor III}, \text{Winsor IV}\}. $$ The training data set was gathered in vitro, where each observed phase \(y_i\) serves as the classification label: $$ \{ (\vec{v}_i, y_i) \}, \quad i = 0, 1, 2, \ldots, 60. $$ Finding this function \(F\) was accomplished with Support Vector Classification (SVC) and \(k\)-Nearest Neighbours (kNN) formulations (see Theory).
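As a minimal sketch of this representation (with placeholder values, since the real \(\vec{v}_i\) and \(y_i\) come from the in vitro procedure), the simplex constraint can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# 61 placeholder composition vectors (r_oil, r_water, r_surfct); a Dirichlet
# draw guarantees each row sums to 1, matching the constraint above.
X = rng.dirichlet([1.0, 1.0, 1.0], size=61)
assert np.allclose(X.sum(axis=1), 1.0)

# Placeholder Winsor phase labels y_i (the real labels were measured in vitro).
phases = ["Winsor I", "Winsor II", "Winsor III", "Winsor IV"]
y = rng.choice(phases, size=61)
print(X[0], y[0])
```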

Results


Theory

Detailed formulation of \(k\)-Nearest Neighbours and Support Vector Classification

Support Vector Classification

Support Vector Classification (SVC) provides a classification approach which finds a hyperplane dividing two classes of vectors within a space. The goal is to find the maximum margin between the labelled data and generate parameters for a hyperplane that divides this margin. The optimization problem of generating a separating hyperplane between two classes holding \(n\) data points can be summarized as: $$ \max_{\beta_0, \beta_1, \beta_2, \beta_3, \epsilon_1, \ldots, \epsilon_n} \mathcal{M} $$ subject to $$ \beta_1^2 + \beta_2^2 + \beta_3^2 = 1, $$ $$ y_i(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}) \geq \mathcal{M}(1-\epsilon_i), $$ $$ \sum\limits_{i=1}^{n} \epsilon_i \leq \mathcal{C}, \quad \epsilon_i \geq 0, \quad y_i \in \{1, -1\}, $$ where \(\mathcal{M}\) is the size of the margin, the \(\beta_j\) are the parameters defining the hyperplane, \(y_i\) is the label of each vector (which can only be 1 or \(-1\)), and \(\epsilon_i\) is the error for each vector, constrained in total by the cost parameter \(\mathcal{C}\) (James et al. 2017).
Since we have four phase classes to separate, we applied the one-versus-one approach, where a division is constructed for each pair of classes; the optimization was therefore solved \(\binom{4}{2} = 6\) times, once for each distinct pair of the four classes. Since the data are not linearly separable, a non-linear radial basis function (RBF) was used as the kernel: $$ K(\vec{v}_0, \vec{v}_i) = e^{- \gamma \, \lVert \vec{v}_0 - \vec{v}_i \rVert^2 }, $$ where \(\vec{v}_0\) is the vector to be labelled and the kernel is evaluated against each training vector \(\vec{v}_i\) for this test observation. \(\gamma\) is a tunable parameter.
The second parameter, \(\mathcal{C}\), specifies the number of margin violations allowed by the separating hyperplane, allowing adjustment of the model's bias-variance trade-off. This trade-off is an important consideration in the approximation of any function: approximations that are more flexible have greater variance (they tend to follow the data closely) and low bias. A small value of \(\mathcal{C}\) means the separation tolerates few errors, which implies the model will be more flexible and may overfit.
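For illustration, a scikit-learn SVC wires together the pieces above: an RBF kernel with tunable \(\gamma\), a cost parameter, and one-versus-one multiclass handling (scikit-learn's SVC trains the \(\binom{4}{2} = 6\) pairwise classifiers internally). Data and hyperparameter values below are placeholders. Note that scikit-learn's C is a penalty on violations rather than a budget, so a large C there corresponds to a small \(\mathcal{C}\) in the formulation above.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.dirichlet([1.0, 1.0, 1.0], size=61)   # placeholder composition vectors
y = rng.integers(1, 5, size=61)               # placeholder Winsor labels (1-4)

# RBF kernel with tunable gamma; C here is a violation *penalty*, the
# inverse sense of the budget C in the text above. "ovo" exposes the
# pairwise (one-versus-one) decision values.
clf = SVC(kernel="rbf", gamma=5.0, C=1.0, decision_function_shape="ovo")
clf.fit(X, y)
print(clf.predict([[0.45, 0.45, 0.10]]))      # predicted phase for one composition
```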

\(k\)-Nearest Neighbours

The aim of a general classification model is to provide the likelihood that a new unlabelled vector lies within a class. The \(k\)-Nearest Neighbours method is a non-parametric approach which looks at the \(k\) nearest (in terms of distance) vectors within the space and assigns a label based on those closest neighbours. The probability that a vector \(\vec{v}\) as described above will be labelled with phase \(y\) can be calculated with kNN by: $$ \Pr( \, Y = y \, | \, X = \vec{v} \, ) = \frac{1}{k} \sum_{i \in \mathcal{N}}^{} I(\, y_i = y \,), $$ where \(i\) indexes the \(k\) nearest vectors in the neighbourhood \(\mathcal{N}\) and \(I\) is the indicator function, which outputs 1 if the label of the neighbour equals \(y\) and 0 otherwise (James et al. 2017).
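This probability is just the fraction of the \(k\) nearest neighbours carrying each label, which is what predict_proba returns for scikit-learn's KNeighborsClassifier with uniform weights. A minimal sketch with placeholder data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.dirichlet([1.0, 1.0, 1.0], size=61)   # placeholder composition vectors
y = rng.integers(1, 5, size=61)               # placeholder Winsor labels (1-4)

# With uniform weights, predict_proba is exactly the class-label fraction
# among the k nearest neighbours, matching the formula above.
knn = KNeighborsClassifier(n_neighbors=7).fit(X, y)
probs = knn.predict_proba([[0.45, 0.45, 0.10]])
print(dict(zip(knn.classes_, probs[0])))      # Pr(Y = y | X = v) per phase
```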


Appendix A

Procedure for in vitro data collection
