Thank you. Yes, RenView is very useful for post-processing OpenMKM output. The next tool we want to highlight today is petBOA. Let me share my screen and get started. petBOA is a Python tool that we built over the last couple of years for parameter estimation of black-box micro- and macrokinetic models. The key highlights of the tool are, first, that it uses Bayesian optimization - that is what makes it suitable for black-box optimization - and second, that it was written specifically for microkinetic models, in this case OpenMKM, but it can also work with other Python microkinetic models, or even arbitrary mathematical models for which optimization has to be done. The tool was built in Professor Vlachos's group at the University of Delaware and was also shaped by colleagues at a chemical company who have actually used it for this kind of task. The tool is on our GitHub page; the source code, the documentation, the examples - everything is on the GitHub page.

A little bit of history as to why this is done. It is not very common, but it is done nevertheless: researchers adjust microkinetic models so that they fit experimental data better, so that they get better parity with experimental data. The justification is that if you have the flexibility of changing the species energies within DFT error, which is typically 20-30 kJ/mol, then it is acceptable to fine-tune the DFT data that goes into the microkinetic model. It is a catch-all type of situation, where you are trying to lump all of the errors together: errors from DFT, errors in the model itself - that is, in the representation of the catalysis or the catalyst by the DFT simulation, the materials gap - experimental error, modeling error; you try to lump everything into one. It is not very commonly done, but people have done it in the past.

This parameter tuning is set up as an optimization problem. In the optimization problem, you are trying to minimize the disparity between model and experimental quantities of interest. These quantities of interest are typically turnover frequencies, predicted species mass fractions, or other auxiliary quantities like conversion and selectivity that are derived from MKM outputs.

Here are a couple of examples. The first is an older study of methanol synthesis on copper from CO2 and CO hydrogenation. Here is a snapshot of the list of reactions - 49 reactions. The MKM built for these reactions is parameterized using DFT data. They also have heats of reaction, activation energies, and some other customizations like omega, the proximity factor, which describes how similar the transition state is to the initial or final state; the actual activation barrier is calculated from the barrier coming from the transition-state calculation plus a proximity-factor correction. What they then did was take this MKM and try to fit it to experimental data. The way they went about changing the MKM to fit the experiments better is that they took the binding energies of each of the species - these are the intermediate species - and perturbed them by a small value from the DFT value. The values are not given here, but they are given in other tables.
For example, the binding energy of hydrogen on the surface might be lowered by, say, 0.07 eV - things like that. They also perturbed the omega factor, which effectively changes the activation energy of each of these elementary steps. When they did that, they got really good parity between the turnover frequencies predicted by the model and the observed rates of production - in this case, I think, of methanol and water; I think methanol is the filled circles and water the open circles. The x-axis is the experimentally observed turnover frequency and the y-axis is the MKM-predicted turnover frequency. That is one example.

The other example is again from the same group at the University of Wisconsin, where they came up with an algorithm for sequential optimization to fit microkinetic models to experimental data. They posed the optimization problem like this. The first term represents the disparity between a response coming from the model - which could be mass fractions or rates - and the same response from the experiment; it is basically a distance between the two. The second term is a penalization term, a regularization. What it does is this: since you are deviating from the predicted values, you want those deviations to be as small as possible - you do not want very large deviations from the ground truth of the DFT model - so you are also minimizing the parameter deviations. What are the deviations? They could be applied either to the energies of the species binding onto the surface or to the activation energies of the elementary steps themselves. It is basically the same idea as the previous paper, but they wrote down the actual mathematical formulation and used a gradient-based method: you minimize the objective function, the loss, which is the difference between the two responses, model and experiment. This part represents the actual MKM, equivalent to what is solved in the OpenMKM modeling; the properties of the MKM are set as constraints, such as the thermodynamic equilibrium constants. You also want to make sure the predicted activation energies are meaningful - they must be positive - and that thermodynamic consistency is not broken. Each of the deviations also has to stay within lower and upper bounds; you give bounds to the parameters.

They proposed an approach where they solve the MKM with a DAE solver (SUNDIALS), then do sensitivity analysis on the result and send it to an optimizer - in this case IPOPT, a well-known open-source gradient-based optimization tool. IPOPT proposes new parameters, which go back into the DAE solver, and this continues until you get good parity. The optimization here is basically minimizing the error between the predictions of the DAE solver - the model y - and the y coming from the experiment. They got really good parity; this is the same methanol data the earlier paper used. They did better than the initial paper, trying many hundreds of parameter variations in the optimization, while at the same time minimizing the deviations - trying to stay as close to the DFT values as possible.
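In schematic form - my own notation, summarizing the formulation just described rather than reproducing the paper's exact equations - the fitting problem looks like this:

```latex
\begin{aligned}
\min_{\Delta E}\quad
  & \sum_{i=1}^{N_{\mathrm{runs}}} w_i \,\bigl\| y_i^{\mathrm{model}}(\Delta E) - y_i^{\mathrm{expt}} \bigr\|^2
    \;+\; \lambda \,\| \Delta E \|^2 \\
\text{s.t.}\quad
  & \text{the MKM (DAE) equations define } y_i^{\mathrm{model}}(\Delta E), \\
  & \text{equilibrium constants remain thermodynamically consistent},\qquad E_a \ge 0, \\
  & \Delta E_{\mathrm{lb}} \;\le\; \Delta E \;\le\; \Delta E_{\mathrm{ub}},
\end{aligned}
```

where the deviations ΔE of the species binding energies (or activation energies) from their DFT values are the fitted parameters, y_i are the responses for run i, w_i are optional weights, and λ controls how strongly deviations from DFT are penalized.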
If you're interested, please look up that work. petBOA is based on a similar idea, except that instead of a gradient-based method it uses Bayesian optimization, which is gradient-free. It relies on something called NEXTorch, which is our in-house active learning and Bayesian optimization toolkit, built by Yifan and others; Yifan is actually presenting it, so please check that out. NEXTorch in turn depends on other open-source libraries like BoTorch. The idea is that petBOA leverages a couple of Python tools built in the group - pMuTT and NEXTorch - couples them with OpenMKM, and then does the parameter estimation. On our GitHub repository, petBOA is a very simple project: there is just one folder with the source code, plus a lot of examples - four examples, in fact. Some of them will be covered in a lot more detail tomorrow.

The general idea of how Bayesian optimization works is this: if you're trying to minimize an objective function, you initially generate certain inputs - values of the parameters you are changing or optimizing - and you evaluate the objective function at all of them. For this initial set of parameters, the so-called training data, you can use something like a design of experiments or Latin hypercube sampling to sample the points optimally, and with that training set you evaluate the objective function. The secret sauce is that the objective function is then represented by a surrogate model - in the case of NEXTorch and petBOA, a Gaussian process. Instead of minimizing the actual, full-blown objective function built on the underlying model, the MKM, you optimize the surrogate. Then, using something called an acquisition function, you come up with the next guess at which the objective function should be evaluated. This approach is used when the objective function is very expensive to evaluate. It could be experiments, which take time and money, or it could be an objective function that is mathematically very complex - maybe an MKM that takes five hours to run. Instead of running that over and over, as gradient-based methods do, you replace it with the Gaussian-process surrogate, which is much, much faster than the expensive model, and you do the optimization on the Gaussian process within every loop. With a traditional gradient-based method, if you needed ten steps in an optimization loop, that would be ten objective-function evaluations; here you only need one evaluation of the actual objective function, and the remaining nine are handled by the surrogate. That is where the benefit is.

How do we use petBOA to update OpenMKM models? OpenMKM, as we have seen before, uses input files: the reactor specification and the thermodynamic specification. The parameter updates - the perturbations in the framework presented a couple of slides ago - are done through the NASA-polynomial and Shomate-polynomial coefficients. What I mean is that by shifting the enthalpy coefficient of the NASA polynomial, or the corresponding coefficient of the Shomate polynomial, up or down, you effectively move the enthalpy of that species by that amount. The binding-energy deviations can be specified in eV, kcal/mol, or kJ/mol. That is how we change the microkinetic model: by changing the NASA or Shomate polynomials of a given species.
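As a small aside, here is why shifting one coefficient moves the species enthalpy. For a NASA-7 polynomial, the dimensionless enthalpy is H/(RT) = a1 + a2·T/2 + a3·T²/3 + a4·T³/4 + a5·T⁴/5 + a6/T, so adding ΔH/R to a6 raises H by ΔH at every temperature. The function below is my own illustration, not petBOA or pMuTT code, and the sample coefficients are placeholders.

```python
import numpy as np

R = 8.314462618e-3  # gas constant in kJ/(mol*K)

def shift_nasa_enthalpy(coeffs, delta_h_kj_mol):
    """Shift the enthalpy of a NASA-7 polynomial by delta_h (kJ/mol).

    coeffs: the 7 coefficients [a1..a7] for one temperature range.
    Since H/(RT) = a1 + a2*T/2 + ... + a5*T^4/5 + a6/T, adding delta_h/R
    to a6 raises H(T) by delta_h at all temperatures.
    """
    new = np.array(coeffs, dtype=float)
    new[5] += delta_h_kj_mol / R  # a6 is the sixth coefficient (index 5)
    return new

# Example: destabilize a surface species by +10 kJ/mol
# (the coefficients below are placeholders, only the shape matters).
low_T_coeffs = [2.5, 0.0, 0.0, 0.0, 0.0, -1000.0, 4.6]
print(shift_nasa_enthalpy(low_T_coeffs, 10.0))
```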
If you change a species in this way, the equilibrium constant of every reaction in which that species participates will change. And if that species is a reactant in a step, or is the transition state of a step, then the activation energy of that step will also change. In effect, you can change the thermodynamics and kinetics of all the elementary reactions just by changing the NASA polynomials of a given species. All of that is done using pMuTT, which has an extensive API for this; it was discussed at length earlier, and it is what we use to do the update.

The typical workflow is this. First, you need the Excel file that was shown before - and I'll also show it in a bit - which contains the species binding energies and everything else. We use that to build the objects representing all of the species, the reactions, and the phases. We need that Excel file initially; that is the starting point. Then you have to figure out which parameters to fit. This is where the reaction path analysis and the sensitivity analysis shown earlier help: it is not practical to fit every single species binding energy in the model, so you only want to fit the most important, most active ones. You pick the species and decide whether you are fitting binding energies or activation energies. Once you do that, you write a bit of Python to edit the OpenMKM input files. When you run parameter estimation against experiments, the experiments are done at different conditions: say the training data is ten runs, each at a different temperature, flow rate, pressure, or inlet composition. You want to change the reactor YAML to match each experimental condition, and you also need to change the thermodynamic specification file, obviously, because you are changing the species binding energies. Then, depending on what you are fitting - are you minimizing the error between mass fractions, or between turnover frequencies? - you add the loss function. This part is the Python programming, this is the decision on the parameters, and this is the raw input: the stock, ground-truth MKM from DFT. You can run a parameter estimation first with wider bounds on each parameter (the species binding energies), and once you get a solution, narrow the bounds - that is called bracketing. In a couple of iterations you can get to a reasonably accurate set of parameters.

As pseudocode, this looks like the following. You define a main function that decides which parameters you want to fit; once you decide, you read the parameter data. You also read the experimental data and see which runs are important and what you want to fit; you may add weights to the runs if some are more important than others, and you transform the experimental data into the form used for fitting. You select the parameters of interest, and then you run the Bayesian optimization - that is the main code. To run the Bayesian optimization, you have to provide something called a wrapper and a loss function. What the wrapper essentially does is connect OpenMKM with the objective function. A rough sketch of such a driver is shown below.
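Here is a minimal, runnable sketch of that structure under toy assumptions. Everything in it is illustrative: the "model" is a stand-in for writing OpenMKM inputs and running the solver, the data arrays are placeholders, and the optimizer is a generic SciPy routine standing in for the Bayesian optimization that petBOA does through NEXTorch.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Toy stand-in for the real wrapper: in petBOA this would write the thermo and
# reactor files for each run, call OpenMKM, and parse the predicted responses.
def run_model(deviations, conditions):
    T = conditions[:, 0]
    return np.exp(-2000.0 / T) * (1.0 + 0.02 * np.sum(deviations))

# Placeholder "experimental" runs (T in K, P in atm) and measured responses.
conditions = np.array([[600.0, 1.0], [650.0, 1.0], [700.0, 2.0]])
observed = np.array([0.036, 0.047, 0.059])
weights = np.ones_like(observed)

# Deviations of two species binding energies, bounded to +/- 20 kJ/mol.
bounds = [(-20.0, 20.0), (-20.0, 20.0)]

def loss(deviations):
    predicted = run_model(np.asarray(deviations), conditions)
    return float(np.sum(weights * (predicted - observed) ** 2))

if __name__ == "__main__":
    # Stand-in optimizer; petBOA would hand `loss` and `bounds` to NEXTorch instead.
    result = differential_evolution(loss, bounds, maxiter=25, seed=0)
    print("best deviations:", result.x, "loss:", result.fun)
```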
Whenever the objective function is evaluated with a new set of parameters, OpenMKM has to be called; that is basically what the wrapper does. Before that, as in any optimization process, when you come up with a new set of parameters at which you want to evaluate the objective function, you have to create the input files. We have two methods for this - one that creates the thermodynamic input file and one that creates the reactor YAML - and those are what I use. This is where the Python programming comes in, the logic for how you update the species binding energies. And then the loss function is basically this; this slide is just pseudocode.

I also have pseudocode for creating the thermodynamic file. How do I update the thermodynamic specification file? First, figure out which parameters are being changed. Then make a copy of the stock, default parameters coming from the Excel file, and perturb the species binding energies in the Shomate or NASA polynomials. Once you have done that, you cannot just run OpenMKM right away, because the entire reaction network now has updated rate constants and so on; you have to recalculate and rewrite a new input file, and that is done with pMuTT. We have made some optimizations here to make it fast: if you had to read the Excel file every single time you write an input file, the whole process would take a long time, so instead we take all of the thermodynamic quantities, keep them in Python memory, and reuse them. I will show how the optimization looks in the demo; this is just the pseudocode. Let me go to the demo.

Before I actually show the demo, I want to reiterate that all of this is on the GitHub page, and we have written extensive examples and documentation. There are example templates that one might begin with. The first example uses the Rosenbrock function, a very famous optimization test function that people use to evaluate optimizers. It has nothing to do with MKM kinetics, but if you have a mathematical function you can write in Python - linear, nonlinear, whatever, even ODEs - you can emulate that example and use petBOA to minimize your objective function. We also have Python-based macrokinetic models: overall reaction-rate models, or reduced rate-expression models, of the kind covered on the first day. You can write simple Python-based models and fit them with petBOA as well. The example I will cover today is the ammonia model - the same ammonia model we have been talking about since Tuesday. There are a couple of examples for that model using petBOA. There is also a full-blown model based on a literature model; it takes a long time to run, so I will not cover it in the demo, but anyone interested can always try it on their own.

The first thing we need is experimental data. For this ammonia model it is essentially this: I have 29 experimental runs, and at the end of each run I have mass-fraction data, which is what I want to fit (you could also have rate data and fit that). You also have the operating conditions of each run: the pressure, the temperature, and the flow rate or residence time. That is your experimental data, and what you are trying to do is update the MKM so that with these inputs you get a similar output from OpenMKM. A sketch of how such a table might be organized is shown below.
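Just to make the shape of that data concrete, here is a hypothetical layout in pandas; the column names and numbers are purely illustrative, and the actual example files in the repository have their own format.

```python
import pandas as pd

# Hypothetical experimental-data table: one row per run.
# Inputs: temperature, pressure, flow rate; responses: outlet mass fractions.
expt = pd.DataFrame(
    {
        "run_id":    [1, 2, 3],
        "T_K":       [600.0, 650.0, 700.0],
        "P_atm":     [1.0, 1.0, 2.0],
        "flow_sccm": [50.0, 50.0, 100.0],
        "y_NH3":     [0.012, 0.008, 0.015],   # measured mass fractions
        "y_H2":      [0.740, 0.744, 0.738],
    }
)

inputs = expt[["T_K", "P_atm", "flow_sccm"]]   # used to set up each reactor YAML
responses = expt[["y_NH3", "y_H2"]]            # compared against the OpenMKM output
print(inputs.shape, responses.shape)
```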
Once you have this, let's look at how you might do sensitivity analysis. We have already covered this in the previous demo, but there are two Python wrappers we have for OpenMKM to do sensitivity analysis: one local and one global, and for the global sensitivity analysis we use the SALib library. We have already seen the local sensitivity analysis, so I will skip it; basically, you go over every reaction or species among your elementary steps, perturb the pre-exponential, and figure out whether it matters to your loss function. When you compute something like a degree of rate control or a normalized sensitivity coefficient, the numerator is a response such as a rate or a mass fraction; here the sensitivity is computed for the actual loss function itself, which is the sum of squared errors between the value predicted by the model and the experimental value - the same loss function you use for the optimization. Typically, what I have seen is that the parameters that are sensitive for the model are also sensitive for the loss function, so you can choose either method to identify the important species.

The global sensitivity analysis is very similar, except that you use SALib to do it. The way it works is that you still evaluate the loss function at various parameter guesses: you select which parameters you want to evaluate the sensitivity for and give a range - in this case I am asking, within parameter bounds of minus five to plus five, which of these parameters are important. When you go to the actual results, the local sensitivity analysis tells me which species are important for changing the loss function: a lot of these species are not important, but some are, and those are the ones that would change your fit. It is similar with GSA. With GSA, the global sensitivity analysis, you not only see the effect of one variable at a time on the response, you can also look at interactions between multiple variables; if there are nonlinear relationships between several parameters and the response, those are also estimated. The output includes a total sensitivity index along with first-order, second-order, and higher-order indices. If the first-order sensitivity - which is roughly the analog of a linear, local sensitivity - is similar to the total-order index, there is essentially a direct relationship between that parameter and the response. If the first order and the total order are not similar, that means there are interactions among multiple parameters affecting the response. These two techniques can be used to identify which parameters are important; a sketch of what the global analysis looks like with SALib is shown below.
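Here is a small sketch of that kind of Sobol-based global analysis with SALib. The parameter names and the toy loss are placeholders; in the real workflow the loss would write the OpenMKM inputs, run the model for each sample, and compare against the experimental responses.

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Which binding-energy deviations to probe, and over what range (e.g. +/- 5).
problem = {
    "num_vars": 3,
    "names": ["dE_N", "dE_H", "dE_NH2"],   # hypothetical parameter names
    "bounds": [[-5.0, 5.0]] * 3,
}

# Saltelli sampling generates N * (2D + 2) parameter sets for D variables.
param_values = saltelli.sample(problem, 256)

def loss(x):
    # Placeholder for the real loss: write thermo/reactor files, run OpenMKM,
    # and return the sum of squared errors against the experimental data.
    return (x[0] - 1.0) ** 2 + 0.5 * x[1] ** 2 + 0.1 * x[0] * x[2]

Y = np.array([loss(x) for x in param_values])

Si = sobol.analyze(problem, Y)
print("first-order indices:", Si["S1"])   # one-at-a-time, roughly linear effect
print("total-order indices:", Si["ST"])   # includes interactions with other parameters
```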
Once you have identified which species are important and you are only changing those, we can look at the code in some detail, because this part matters: how do I change the parameters so that the MKM matches the experimental data? These are all of the species I could in principle fit. Let me go from the beginning. If you remember, you need four pieces: the main function, the loss function, the wrapper, and the ability to make new thermo files on the fly. Here is the main function. Initially, you need to define where your OpenMKM executable is, because Python has to be able to run OpenMKM in the background at each of these parameter values; that is what this path is. On your own machine, wherever you installed OpenMKM, that is the path you need to give. This is basically the wrapper: what I mean by wrapper is a Python way of running OpenMKM through Python's subprocess module, so that Python runs OpenMKM in the background. These are the settings for that.

This section is where you read the experimental data. You say that my inputs - what I change in an experiment - are pressure, temperature, and volumetric flow rate, and the response - what I measure and what I want my model to fit - is the mass fractions of ammonia and hydrogen. Once I have the inputs and outputs, this is where I get the stock model, the model with no perturbations. Let me show the example input; it is worth a look, and it is in the source code with the examples. This file basically resembles what Gerhard showed yesterday. It has everything that defines the model: the units sheet; the refs sheet, with the atomic references for each of the elements; the gas species and surface species that are part of the model - ammonia, nitrogen, hydrogen, and so on - and also the transition states; the reactor sheet, with the properties of the reactor; the BEP sheet, where the BEPs are defined; the phases sheet; and all of the reactions and lateral interactions. Basically, the entire MKM - all the data you need to build an OpenMKM input with pMuTT - is in this Excel sheet. That is what we need to read, load, and store in objects. We have a routine we call "load thermo objects" that runs essentially the same code Gerhard showed yesterday: it reads the units, the species, the reactions, the interaction data, and the BEPs, and you end up with Python objects holding the stock values of the model, the default DFT data. Once that is stored, whenever you need to update the XML or YAML file you take a given species - say N on the terrace or on a step surface - and change its NASA polynomial. Once the species is updated, the write-CTI function goes through every species, every reaction, and every BEP and writes the updated species in. That way we can make new thermodynamic files on the fly; if you read the Excel file every single time it is much slower, but with everything stored in memory it is very fast.

That is the initial part, where you read the file. The next part is where you decide which, and how many, parameters you want to change. You can specify it inside the code itself - I want to change these four species because they are important, the ones selected from global sensitivity analysis, or reaction path analysis, or whatever other method you used to identify sensitive species - or you can create a parameters Excel file, which the code can open for you. In that file you say which parameters you want to change, whether to change them, the default value of each parameter, and its lower and upper bounds. What this is essentially saying is that I want to deviate these four species between -20 kJ/mol and +20 kJ/mol; that is the deviation range I chose. All of that information is either written here in the code, or read with this one line where you pass that file. A sketch of what such a parameter table might look like is shown below.
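To make that concrete, here is a hypothetical version of such a parameter table built in pandas; the species names, column headings, and layout are illustrative, not the exact file format used in the petBOA examples.

```python
import pandas as pd

# One row per fitted parameter: a species binding-energy deviation in kJ/mol.
params = pd.DataFrame(
    {
        "name":    ["N(T)", "H(T)", "NH2(T)", "TS1_NH3(T)"],   # hypothetical species labels
        "fit":     [True, True, True, True],                    # whether this parameter is varied
        "default": [0.0, 0.0, 0.0, 0.0],                        # stock DFT value = zero deviation
        "lower":   [-20.0, -20.0, -20.0, -20.0],
        "upper":   [20.0, 20.0, 20.0, 20.0],
    }
)

bounds = list(zip(params["lower"], params["upper"]))   # handed to the optimizer
print(bounds)
```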
If you don't pass that file, you have to enter the bounds manually. In this case, for example, some species - here N on the terrace site and the transition state of N2 on the terrace - are held fixed, while these three species vary between bounds of -20 and +20. Once you do that, that is the definition of the parameters; this is the second part of the file. Then these pieces connect the OpenMKM model and the objective function: this piece of code hooks up the experimental data and the experimental response so that I can use them to evaluate the loss. Finally, this piece of code here is what calls the optimization. It needs a few things: the loss function where the loss is evaluated, the range of the parameters we just defined, how many times you want to evaluate the loss (the number of iterations), and some additional flags.

The next piece is the loss function itself. The way it works is that whenever the optimizer requests the objective function at a new set of parameters, the loss function reads those parameters, loads the default parameters, and then updates them using the methods that edit the reactor YAML and the thermodynamic file. Essentially, in each optimization iteration you update the reactor input file and the thermo file with the new set of parameters, and then you run OpenMKM - this one line uses Python to run OpenMKM. After you run it, you make sure the model actually converged: if the parameters are bad, sometimes OpenMKM does not converge, and you do not want to use that result. Only when the model has run do you compare the model predictions with the experimental values and estimate the loss. That is basically the loss function (a rough sketch of one is shown at the end of this section). You save that loss value, and this goes on and on. At the end of the code there is some post-processing where all of this data is saved for troubleshooting: for each run, what is my loss, what is my regularization penalty, what set of parameters did the optimizer use to evaluate my objective function, and so on. You save all of these results as output files for debugging. The output here is not formatted nicely, but in this case I have done a lot of runs - about 600 - and for each one I can see the loss, whether the loss is improving, whether the parameters are fitting better, and so on. You also get some plots: with the default stock model, OpenMKM does not really have good parity with the experiments, but with the updated set of parameters it does much better, which is the whole idea.

Finally, the process is repeated multiple times. Let me show this in the Chrome browser so you can see it better: the fit here uses the four parameters I showed, and for each run it shows you the fitted deviations. Sometimes you see a huge standard deviation, which means there is a lot of variation in that parameter across runs - it is probably not sensitive and not actually important for fitting the model to the experiments. For other parameters the standard deviation is very low and the mean is close across the optimization runs, which gives you confidence that this is a better fit for that parameter. This is why you need to be careful about which parameters you fit, and that is where sensitivity analysis and other methods like reaction path analysis come in handy. This is just one example.
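Here is a minimal sketch of what such a loss function could look like. The executable path, the command-line form, the file names, and the two helper callables are assumptions made for illustration; the actual petBOA examples have their own helpers for writing the OpenMKM inputs and reading its outputs.

```python
import subprocess
import numpy as np

OMKM_EXE = "/path/to/omkm"   # wherever the OpenMKM executable is installed (assumed)

def loss(deviations, runs, observed, write_inputs, read_response):
    """Sum-of-squared-errors loss over all experimental runs for one parameter guess.

    deviations    : binding-energy deviations proposed by the optimizer (kJ/mol)
    runs          : list of operating conditions, one entry per experimental run
    observed      : measured responses (e.g. mass fractions), one row per run
    write_inputs  : callable that writes the reactor and thermo files for one run
    read_response : callable that parses the OpenMKM output for one run
    """
    total = 0.0
    for i, run in enumerate(runs):
        write_inputs(deviations, run)                   # new reactor + thermo files
        result = subprocess.run(
            [OMKM_EXE, "reactor.yaml", "thermo.xml"],   # assumed invocation
            capture_output=True, text=True,
        )
        if result.returncode != 0:                      # OpenMKM did not converge/run
            return 1.0e6                                # large penalty for bad parameters
        predicted = np.asarray(read_response(run))
        total += float(np.sum((predicted - np.asarray(observed[i])) ** 2))
    return total
```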
There are a lot more examples on the GitHub repo - using Docker, using different optimization techniques including Bayesian optimization - and everything is described in the README file. You can follow the instructions to run each of the examples and modify them for your own purposes. That is basically a short demo. The code is a bit too involved to go through line by line, so I wanted to give you the highlights, but if anyone is interested I am happy to go over it one-on-one, or you can catch up with me later. That is all I have for petBOA; I am happy to answer any questions.

One question I got is: can petBOA be used with MKMs generated by software other than OpenMKM? Yes, it can. The only requirement is that you should be able to somehow change the microkinetic model's parameters on the fly. If the other software is a black-box MKM code, then you would need to change its input files the same way we have done for OpenMKM. If instead the equations or the code can be changed on the fly - if the model is in Python, for instance - then you do not need to touch input files at all; you can change the model directly in memory (a tiny sketch of that case appears at the end). So yes, it is possible, but you would have to redo a lot of these routines.

A different question I got: the surface species have been mentioned as the parameters, instead of thermodynamic parameters - can you please clarify? What I mean when I talk about the surface species is that you are not directly changing the heat of reaction or the activation energy of a reaction; you are changing the DFT-calculated potential energies of the species, just by a small amount within DFT error, and then you let that effect trickle down into the thermodynamics and kinetics. You are not directly changing the thermodynamics and kinetics; you are changing the DFT values that are used to calculate those quantities, such as rate constants and equilibrium constants. The hope is that if you do this correctly, everything stays consistent, and you have only changed the model within DFT error. If you directly change heats of reaction and activation energies without changing the species energies, there is a good chance you introduce inconsistencies. That is why we follow this procedure. I hope that makes sense; if not, please reach out to us on Slack or email.
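To make the in-memory case above concrete, here is a toy sketch of what "changing the model directly" means: the model is just a Python function, so an optimizer can perturb its parameters without writing any input files. The rate expression and the numbers are made up purely for illustration, and the SciPy optimizer stands in for whatever optimizer you would actually use.

```python
import numpy as np
from scipy.optimize import minimize

R = 8.314e-3  # kJ/(mol*K)

def rate_model(T, A, Ea):
    """Toy Arrhenius rate expression standing in for a Python (macro)kinetic model."""
    return A * np.exp(-Ea / (R * T))

# Illustrative "measurements": generated from the model itself plus noise.
T_data = np.array([500.0, 550.0, 600.0, 650.0])
rng = np.random.default_rng(0)
r_data = rate_model(T_data, A=1.0e6, Ea=80.0) * (1.0 + 0.05 * rng.standard_normal(4))

def loss(x):
    A, Ea = x
    return float(np.sum((rate_model(T_data, A, Ea) - r_data) ** 2))

# The optimizer changes (A, Ea) directly in memory; no input files are involved.
result = minimize(loss, x0=[5.0e5, 70.0], method="Nelder-Mead")
print(result.x)
```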
petBOA - Parameter Estimation with Bayesian Optimization
Talk given by Dr. Sashank Kasiraju (Computational Scientist, University of Delaware) at the Virtual Kinetics Lab (VLab) Online Workshop on 14th December 2023.
Workshop Details: https://dei.udel.edu/vlab_workshop/
- Department Name: Delaware Energy Institute