Okay, so this is our second-to-last lecture, the last one before Thanksgiving. Today I'm going to start by talking about full-text search indexing and explain what that is, and then we'll go on to the more production-grade version of it, the ELK stack. I'm going to start off with a little project that I've been working on, something you can get up and running really quickly that still works really well at a decent scale.

Does anyone know what full-text search indexing is? It's kind of a mouthful and it sounds niche, but it's one of the most important things in cloud computing. I'll give you an example. If you ever use Chegg and start typing a question: okay, see, I didn't even hit search yet and it's already suggesting things. And it's not giving me the whole post; it's kind of saying, here's a question, here's a snippet of an answer. But if I click on the post, it's going to give me a lot more information than what that little box showed me: the whole question, the answers, all the thumbs up and thumbs down on all the other answers. This is where full-text search indexing comes into play, because that whole post lives in a database somewhere.

Tweets aren't a very good example; think of an Instagram profile. You want to search for someone on Instagram, so you start typing their name. So here's an example: I start searching for someone I know, and it gives me people: their display name, their username, and their profile picture. I'm going to regret this, but if I click on his profile, only then does it start loading his entire profile.

So what is that? Full-text search indexing is basically saying: we index a little bit of text about something. In the Chegg example, they index the question, maybe the first part of the answer, and probably who posted it. Now, why is this important? Well, if you're searching for people on Instagram, or searching for a question on Chegg, you want search results really fast. A good example is Google. Our search results have the title of the website, the link, and then the first few sentences of the content. Google doesn't need to load the entire website to give you little snippets that ask, is this the most relevant thing to you? Is this? They're not loading the entire webpage for every search result. And that's where indexing comes into play, because we don't want to load the entire profile.
We just want a little bit of information that says, that's probably what you want. As a result, we get indexing that's really fast: we can look through little bits of information really quickly. And then, when the user says yes, that is what I'm looking for, we send them the entire thing from the database. Think of Instagram: send them the username, the display name, the profile picture, and if I say yes, that's the person I'm looking for, send everything, all the pictures. So we index this stuff so it's really fast to search, and then when we find the specific thing, we fetch everything.

So let's go ahead and play with this on our own. If you go to my GitHub, there's a repo for full-text search indexing; I've dropped a link in the class notes. >> In my day, we did full-text search in MySQL using stemming. >> Yeah, this one's just JavaScript, Node.js. (I just got a static shock and my laptop display went off for a second. Coincidence? There's some electrical engineering right there.)

Okay, so go ahead and grab this repo, and to create a similar environment, let's make a droplet: Ubuntu, a baby-sized droplet is fine, put it in New York, pick your SSH key. Then you're going to need Node.js installed; I'll go ahead and give you those commands again, and note that those two lines are separate commands, by the way; I'll make that explicit in the markdown. You can run this locally, but I think it's nice to run it over the Internet. If you ever come across a project that has some sort of search feature, you're going to want indexed search, because it makes everything a lot faster, and you can do the kind of thing I showed you on Chegg and Instagram, where users don't even have to hit enter: as they type, you start suggesting results.

So SSH in and drop in our install command. >> Blank droplet? >> Yeah, blank, just install Node.js. And I'd definitely encourage everyone to follow along on this one in particular, because full-text search indexing is one of those cloud computing concepts where either you get it or you don't; I feel like there's no half-understanding of it. Once you get Node installed, just git clone the repository; I'm sure everyone's done this at least once. git clone, the link I dropped, and you should have a folder with all the code. I'll wait a minute for people to catch up.

While people are catching up, I'll explain a little more about this specific project. There are a few major full-text search indexing engines. There's Elasticsearch, which is used in the ELK stack. There's something called Lucene, which was made by the Apache Foundation; Apache Lucene is kind of the thing you stuff into other products. MongoDB Atlas, for example, uses Apache Lucene in their cloud environment. There's one that's been around a long time called Solr, and there's another one called Lunr.
They're both spelled without the vowels, S-O-L-R and L-U-N-R. This library uses something called elasticlunr, which is like a Node.js version of Lunr, but optimized. Essentially, what my project does is wrap that library nicely with a REST API, and it also does multi-model indexing, so it doesn't restrict you to just one model; you can keep and search different models. And then I also added WebSocket endpoints, so a front-end or back-end client can socket in and just stream searches, which gets you search results on every keystroke. I'll bring that back up later. I actually tested this with 46 thousand random entries, and it's still instantaneous, and that's a lot of data.

>> You don't want to know the algorithms underneath? I like complexity theory and understanding the stuff beneath it. What about B+ trees? >> Well, it's more than that. A B+ tree is how regular database indexes work. Say I have a million records and I want to see which of them has a particular word. I can go one by one by one, but as the data scales, that becomes a really slow process; it's O(n): n entries, and a search has to walk through all of them to find matches. Instead, if I index a column in my database, say the ID, the index takes all the values in a certain range and puts them in one node of a tree. To find a record, take the ID, check which range it falls in, and descend into that branch; check again, descend again. Now it finds my record in O(log n) rather than O(n), and log n is practically instant for a million records. The database allocates an index entry so that it knows where to find a particular record in log n style.

Now, with full-text search, you've got whole documents, and the way they do the indexing is they go through the words and make what's called a stem: they reduce each word to just a few consonants, a normalization of that word, and build the index on that stem. So something like "freeze-dried ice cream," even with a misspelling, reduces down to the same baseline stems. For all the different words that share a stem, the index maps the stem to references, and when you search, your search terms get stemmed and looked up with a log n search on the stems. That way, if you misspell something, or it's not exactly a perfect fit, you still land on a stem-type representation that might have 200 different words associated with it. The stem entry then hydrates to an ID that acts like a key to the actual record. >> So the speed is still log n? >> Still log n; the index is basically fatter than a database's, so call it twice as slow as a database with an index, but you never notice "twice as slow" at log n speeds. It still ends up being faster than a query to a database server. >> Only if you're doing poor indexing on your queries. >> Yeah, which most people do. There's a book called "Use the Index, Luke," a decade old or so, that teaches you how to index database queries.
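To make the stem-index idea concrete, here's a minimal sketch, not what any particular engine does internally: the crude suffix-stripping stem function and the Map structure are purely illustrative assumptions (a real engine uses a proper stemmer like Porter's and a tree or sorted structure for its lookups).

```js
// Minimal stem-based inverted index sketch (illustrative only).

// Crude "stem": lowercase, strip non-letters, drop a few common suffixes.
function stem(word) {
  return word
    .toLowerCase()
    .replace(/[^a-z]/g, "")
    .replace(/(ing|ers?|ed|s)$/, "");
}

const index = new Map(); // stem -> Set of record IDs

function addDocument(id, text) {
  for (const word of text.split(/\s+/)) {
    const s = stem(word);
    if (!s) continue;
    if (!index.has(s)) index.set(s, new Set());
    index.get(s).add(id); // store only the reference, never the whole document
  }
}

function search(query) {
  const hits = new Set();
  for (const word of query.split(/\s+/)) {
    for (const id of index.get(stem(word)) ?? []) hits.add(id);
  }
  return [...hits]; // hydrate the full records from the database afterwards
}

addDocument(1, "freeze-dried ice cream");
addDocument(2, "buying a developer laptop");
console.log(search("develop")); // [2]: "developer" and "develop" share a stem
```

Note the index only ever hands back references; fetching the actual record is the separate hydration step described above.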
And it's also good to have the indexing separate, because then you don't have to worry about which database the data is stored in. Think on a massive scale, like Twitter: they use Cassandra, they use Mongo, they use Redis, multiple different databases, and if you used each database's built-in indexing, each would be indexed a different way. With a separate search index, it's unified: they search for a thing, and it says, it's in this database at this reference, go look it up. >> So with Elasticsearch, is it itself a database? >> Elasticsearch is itself a database that you make indexes on, yeah. And full-text search indexing offers a lot of features that traditional database queries don't. (What we're here to do right now: npm install.) Features like typo suggestion and correction. Token expansion, which is: if you search for the word "buy" (and I'll show you an example later, because I use buy and purchase), it'll also look for "buyer" and "buying," and sometimes even other forms of the word, like "bought." >> Stemming. >> Yeah, stemming, and then hydrating, right. And then a lot of them, ELK for example, will also take a word and find other words that mean the same thing, so if you search for "buy," it'll also look for everything that has "purchase" in it. >> I think you have to get the enterprise ELK tier to get that, as an extra. >> I built that in; it's called fuzzy searching, where you take a term and also look up other terms that mean the exact same thing.

Google does this with its own thing, word2vec. What that does is: if you've got a string of words in a row, it looks at context, say the three words before and the three words after, and asks, whenever I see these three before and those three after, how often do I see other words in that spot? It aggregates all the words that appear between "these here" and "those here" into a kind of vector space of closeness. When words show up many times with similar prefixes and suffixes of context, it says, these must be near-synonyms. That aggregation is what powers the "did you mean" and "looking for" features. It's a brilliant data-gathering process, and word2vec is so powerful it should be used in a lot of other contexts: you can query the machine to explore alternatives, so if my search doesn't find anything, it can say, you meant this; this is what you meant.

Okay, so everyone should have this installed and ready to go. Just run npm start. There are going to be a few things on startup: the first time, it generates an admin key, a client key, et cetera, and then it's listening on port 8080. I'm just going to go ahead and stop mine so I can show you where these things are. If I ls now, there are extra files: a JWT secret, an admin key, and a client key. These are important. If you've ever used a hosted service, a really popular one is Algolia. They make a fantastic product, I love it so much, but it gets kind of expensive at scale. Their product is entirely built for searching: you just hit their API to add an index to a model, and then they have front-end libraries.
And I believe they even have iOS and Android SDKs where you just plug in a component, pipe your search into it, and they return arrays of suggestions. They make it really easy, so you don't have to understand anything, but we want to understand stuff. And when I say expensive: if you want anything decent, you're starting at something like $500 a month for 500 million operations. You'd think 500 million operations is a lot, but every time you hit a key, that's an operation; cross that with a hundred users who each type 500 keys, and it starts getting really big really fast. So that's why it's useful to self-host. But back to this.

So you have a client key, an admin key, and a JWT secret. The admin key is what first creates an index model. A model is just the shape of an object that you want to index, and that'll make more sense when we walk through how to actually use this. This is pretty much how all full-text search engines operate: you make a model, then you add indexes to that model, and then you query that model for search results. Most of them work that way; Elasticsearch is kind of complicated, but if you dumb it down, those are the three steps, exactly. I've modeled this mostly off Algolia, just because I really like their product. So the admin key is for stuff like adding models, deleting models, adding indexes, deleting indexes. The client key is purely for querying: clients get unlimited access to query, but they can't do anything else. And the JWT secret I use for securing the WebSocket; I'll talk about that when we get there. >> We did cover JWTs in the security lecture, right? >> Yeah, there was a JWT lecture.

So, apologies; I'll go back and npm start it. You can see that the second time, it didn't regenerate those files. I'm going to use Postman, just because I like Postman, but any HTTP client works; you can't do everything with URLs in the browser like the last API we worked with, because these are POST requests. Okay, so if we look at the docs, and I did what I'd call pretty good documentation on this, scroll down to usage. Let's talk about the add-model API. If you want to add a model, you make a POST request to this endpoint, and the body basically has to look like this: you specify the model name and the fields. The fields are kind of the structure of the model, the different fields that you'll be setting on documents and using in queries. So let's go ahead and grab the example and stuff it in. I'm going to make a new tab and make it a POST. If you don't have Postman locally installed, just search online for an HTTP request tool; there are a bunch of them, and you can follow along with any of them, but I'm going to use Postman because it's easier and I know how to use it. So the URL is just going to be the IP of the server, colon 8080. I run this on port 8080 by default for two reasons: one, you don't need to run it as root, and two, if you were to use this in production, you'd want to stuff it behind something like nginx, which would handle an SSL cert for you.
And then, according to the documentation, the path is /models/add. And maybe you guys are thinking: why /models/add and not just /model-add? Because when you're developing, you sometimes forget to switch the prefix, and then you'd have /index/add and /model-add mixed up. Honestly, that was mostly for development: since I have /index/add, the matching prefix meant I didn't get confused when blasting through requests. Okay, so for the body, I'm going to select raw, and then JSON, which is probably easier if you're using a different client, and I'm going to grab that example body exactly as it was. It's going to complain, because technically in JSON everything should be wrapped in quotes; wait here. That's also not the correct admin key; you need your own admin key, or it will complain. It can be found by cat'ing the admin key file; sorry, I don't put a newline at the end, so it's everything before the prompt. >> What about using more or less? >> less puts you into the pager view; it's funny to me that less acts as a synonym for more, because "less is more," but more doesn't put you into that view. What's the difference? Less is more. Okay.

I'm going to put the key in there, and then I'm going to POST and pray. "Could not get a response": because I stopped it. Go ahead and start that bad boy back up. "Added model." Good, I'm glad.

>> What is the purpose of the models? >> Models are how you define indexes, so that you can add an index to a specific model. If you have indexes with different shapes all in one pile, it can slow down your search, and it can also confuse the library. Because there are two kinds of searching: general search, where you just say, search for anything that has this text in it, and query-time boosting, which is, search these fields specifically for this content, and this field matters twice as much as that one. >> What's it using under the hood, Elasticsearch? >> Elasticlunr. Not Elasticsearch; elasticlunr is based on Lunr, the full-text search engine, if anyone's interested.

Okay, so at this point everyone should have added a model. Just to prove it's there, we can change the path to list, and it should say we have the test model. Good news. You can also delete it, but I don't need to show you that. Right, now that we have a model, let's add a few indexes to it. So I'm going to create a new tab (oh my god, how many tabs do I have open) and grab the URL again. And according to the documentation, if we want to add an index, the endpoint is /index/add. Okay, we need a body; that's going to be a JSON body, and I'm also going to grab the example right from the docs, but with the correct admin key. And it's going to complain again; man, I should have wrapped these example values in quotes. The documentation saved my life. Okay, sorry, you need to have added the model first. Basically, this just says: we specify the model name, we specify the key, and then this is what the document looks like, and it goes along with the fields we specified when we added the model: name and description. So I'm going to go ahead and add this one: "index added," and I also get back an ID. That's central to how full-text search works, which I'll explain in a second. >> It says "message: unauthorized." >> Yes, you have to put in your admin key.
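For anyone following along outside Postman, here's roughly what those two admin calls look like from Node (18+ for global fetch, in an ES module for top-level await). The paths /models/add and /index/add are from the lecture; the exact body property names are my assumptions from how the docs are described:

```js
// Hedged sketch of the add-model and add-index requests.
const BASE = "http://YOUR_DROPLET_IP:8080"; // placeholder server address
const ADMIN_KEY = "paste-your-admin-key-here";

async function post(path, body) {
  const res = await fetch(`${BASE}${path}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  return res.json();
}

// 1. Create a model: a name plus the fields its documents will have.
console.log(await post("/models/add", {
  key: ADMIN_KEY,
  model: "test",
  fields: ["name", "description"],
}));

// 2. Add an index (a document) to that model, matching those fields.
console.log(await post("/index/add", {
  key: ADMIN_KEY,
  model: "test",
  document: { name: "Dan", description: "developer" },
}));
```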
Okay. [Football aside: banter about being a Jaguars fan at the bottom of the division, and which division rival to root for as a tiebreaker; the Texans, the Eagles, the Cowboys, and the Giants all come up. Anyway.]

Okay, so when we talk about how full-text search indexing works: you search for something, and what it returns, what Lunr basically does, is return an ID and a score. The score is how closely the suggested document relates to your search, so the higher the score, the more related it is. And then it gives you an ID, and it basically says: if you want the document, go look up this ID. It tries to be as fast as it can by, for example, if you only want the first three documents, still sending you all the references and all the scores, but not sending you the documents themselves, because documents can be massive; they can be megabytes. It doesn't want to waste time sending you all the documents, so it says: hey, here are all the references; you can say, great, I only want the first three; and then you go get the giant documents.

This ID in particular is just derived from the time the index was made. That's not the most robust approach. The industry standard is based on something Twitter created called snowflakes, which are 64-bit IDs where, roughly, the first 40-odd bits are the timestamp and the remaining bits encode things like the host and sequence information; there's a crazy amount of information they pack in to create one perfectly unique ID. I think it takes the datacenter into account too. >> So are your IDs hashes? >> Not exactly; I just use the npm uuid library, version 1, which makes its UUIDs off of time. If I make a few of these, you can see they're extremely similar, because it's a timestamp-derived process: it takes the time since the epoch, turns it into a number, and encodes it, which is why parts of the ID stay the same across rapid calls.

Okay, anyway, so I just made another index. Let's also create "developer," and "developing" as well; you'll see why it matters later. That's called token expansion, which we'll talk about. Okay, so I've made a few indexes.
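You can see the time-derived behavior for yourself with the uuid package; this little demo (an ES module, so the import works) just prints two v1 UUIDs generated back to back:

```js
// npm install uuid
// v1 UUIDs embed a timestamp plus node/clock fields, so IDs generated
// moments apart share most of their time-derived parts. (Twitter-style
// snowflakes pack timestamp + machine + sequence into 64 bits instead.)
import { v1 as uuidv1 } from "uuid";

console.log(uuidv1());
console.log(uuidv1()); // generated a moment later: largely overlapping fields
```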
Let's add another name. Another. Another name. I'm going to make another name; god, this is so bad. Let's just add some more names in here, just so we get more indexes. Now, if we want to see all of them: I put a big warning in the documentation that says don't use the list endpoint if you've got a lot of indexes, because it will send everything to you. But if we do list, it's just going to give me all the indexes that we made. So this is kind of what they look like in the response object, indexes, and then these are all the IDs. If you want to iterate over them, there are plenty of JavaScript methods for iterating over arrays and objects, objects of arrays, and arrays of objects. But that's mostly useless, because you never just want to fetch all of them.

So let's talk about querying. I'm going to make a new tab, grab this, new tab, make it a POST request, drop this in, body raw JSON, and then let's look at the documentation. So we've got a model and we've got indexes in it; now we want to search for this kind of stuff. The endpoint is just /q; I made it that short to minimize the number of characters on the front end. And let's just grab the example again (oh my god, I need to update these docs for next time; this is rough). I'm going to wrap all this stuff in quotes to begin with, and I'm also going to rip out fields, expand, fuzzy, and typoStrength, so we just have the basic query; I'll just type it out. Let me go grab the client key real quick. Oh, also, another thing about this library: any time you make a change to the models or the indexes, it writes it to disk. It's non-volatile, so we can restart it and it all comes back. And we're going to want the client key now, because now we're querying; this is effectively what people on the front end will see.

So what we're going to do now is make the POST request, with nothing extra; it's basically saying: from the model "test," rank the matches. So we're going to get everything that has "dan" in any of the fields, and look at the details: they're all the same score, because they all have the same amount of "Dan Goodman" across the name and description. So that's not very helpful on its own. But this is where some of the other features come into play. So let's talk about some of the other features.

First up: fields. This is an important one; this is called query-time boosting. Basically, what this says is: I care more about the description than I do about the name. So what does that look like? I'm going to say the field "description" gets a boost of 3. >> In a common field? >> Yeah, it's a field named fields. Okay, now notice this is also going to say I only care about description: a three-times boost doesn't really matter when you're searching one field, but it does solidify the segregation; it makes sure I only search in the description of objects. Okay, so the results look about the same; whatever, I didn't do a very good job of differentiating these entries, but you can see they have different scores now, around 2.7 or so.
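Reusing the hypothetical post helper from the earlier sketch, a boosted query against /q might look like this; again, the property names are assumptions based on the walkthrough:

```js
// Query with query-time boosting: description counts 3x as much as name.
const CLIENT_KEY = "paste-your-client-key-here"; // querying only needs this key

const results = await post("/q", {
  key: CLIENT_KEY,
  model: "test",
  query: "develop dan",
  fields: {
    description: { boost: 3 },
    name: { boost: 1 },
  },
  expand: true, // token expansion: develop -> developer, ...
});
// Assumed response shape: an array of { ref, score, doc } sorted by score.
```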
So we've got "other name," the "developer" one, and my name, and they all have the same score. Now, if we search for "develop dan," the ones with Dan are going to have a 2.7-something, but the ones with "another" and "Andy" only have a 0.5. And when I send them back, I also sort them. This is where things start getting very useful: this score is really how relevant a result was to the search. So I'm saying: if someone searches for "develop dan," chances are they're not looking for just any developer; they're probably looking for the Dan one. Right, so that's cool.

Now, I think I enable this by default: there's another feature called token expansion, the "expand: true" flag. That basically says: when I look for "develop," also expand that into "developer" and "developing," and there's more, like "developed." It basically takes the root of the word, and it'll also match stuff that comes after it; basically, anything where your term is the beginning of a word. Which is pretty cool for when you're searching and only have part of the word; a common one that I think of is people who don't know how to spell Lamborghini, so they just spell "lambo." That's token expansion; it's like the "did you mean" behavior. I, on the other hand, know how to spell Lamborghini.

There are other cool features, right? One is typo suggestion and correction. People misspell things a lot. Say you misspell "develop"; let me take the "dan" out of there so it doesn't match on that (did I just typo my typo? The more you know). If I search for this, I'm not going to get any results, but I misspelled it too aggressively; let's do something more realistic: "devellop." Right: no results. Oh no. But if you go into Google and type that in, you're going to get stuff related to "develop." So I have this one option called typoStrength, and I believe I put a warning on it. If you set the typoStrength field, it's going to attempt to detect typos, and the number you put after it determines how many suggestions it considers. It doesn't tell you the suggestions; it just ducks its head, does it, and returns you the results. But I'm going to make it 1, because I know the one thing it's going to go for is probably "develop." So if I add typoStrength 1 and hit search, it's like: oh, you probably meant "develop"; here's everything with "develop," and it still does the token expansion, because it's awesome.

Okay, so the other important feature is called fuzzy search, one of the most important things, and Google is really a good example of this: if you search for one thing, it's like, did you mean this too, and it'll start giving you results for things that mean the same thing. Fuzzy search is used a lot in ads. Popular car company: if I search for "Honda SUV," my guess is that I'm going to get ads for other car companies. I have ad block on; let me turn that off for a minute. Apparently it doesn't block Google ads. "Honda SUV"... now I'm starting to get ads; there we go. Okay, so now they're hitting me. So basically what they're saying is: "Honda SUV," "Toyota SUV," they mean close to the same thing; that's fuzzy search. It's not a great example of it; I'll show you a better one. This dataset is going to be useless for fuzzy search, so what I'm going to do is use a local dataset that I was building to search with. I'm just going to start my local one, which has a nicer set, and I think I already have it.
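Both of those knobs end up as extra fields on the same /q body. A hedged sketch, reusing the hypothetical post helper and client key from the previous sketches, with the property names assumed from the walkthrough:

```js
// Misspelled on purpose: typoStrength lets the server consider one
// dictionary suggestion ("develop"); fuzzy asks for one extra meaning
// per term (demonstrated on the buy/purchase dataset next).
await post("/q", {
  key: CLIENT_KEY,
  model: "test",
  query: "devellop",
  typoStrength: 1,
  fuzzy: 1,
  expand: true, // the corrected term still expands to developer, developing, ...
});
```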
Yeah, okay. So I'm going to show you all my indexes first; this had better not be that big. Okay, thank god. Okay, yeah, so the example I used for fuzzy search was buy and purchase, which effectively mean the same thing. So I have "buy," "buys," "buying," "purchase," "purchases," and so on. With this dataset, let's search for "purchase"... actually, first let's search for "buy." So what's it going to do? I have token expansion on, and I have fuzzy search set to "search for one extra meaning," which is basically what that number says; I honestly don't know why I didn't make it the same structure as typoStrength, but whatever. It's also going to do a typoStrength of 1, and typoStrength will only kick in if there's actually a typo: the first thing it does is check whether there's a typo, and only then will it actually try to fix it. So let me run the query. Okay, so I searched for "buy"; now let's look at the results. Remember, I searched for "buy," and the highest scored one is "purchase." Well, you see "buys," "buying," "purchase"; the purchase one's score is a bit high. That's because "purchase," a different word for the same meaning, is treated differently from "buys" and "buying," which are just expansions of the same token: the matching tense of the synonym gets the higher score. Now, if I search for "purchase," you know that's going to be up top, and instead of the buys coming up first, "purchase" and "purchases" rank above the buys, though both still score high. Right, but now let's say I typo it: "porchase." I'm still going to get purchase and purchases, but I also still get the buys and buying and buyer. That's because the first thing it does is the typo pass: it gets all the suggestions for the typo, and then it puts all the suggestions through the search.

So if you set a really high typo suggestion count and a really wide fuzzy search, it's going to multiply the queries; in my documentation I basically say: hey, don't put big numbers here, and here's the algorithm, so you can see what it's going to do. Because for the fuzzy search I use, and this is great for you guys to know about, the Datamuse API. The people behind it run a website for poems and stuff, but they have an amazingly generous and useful API, for stuff like /words?ml= "ringing in the ears," where the first suggestion is "tinnitus": that's probably what someone meant when they searched for it. So this is what I use in the backend; they give you something like 100 thousand queries a day. They also return scores with each suggestion, so you can tell how confident it is; I think it works kind of like how Google does it, but probably not as efficiently.
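The Datamuse API really is that simple to hit; ml stands for "means like." A quick sketch, assuming Node 18+ for global fetch:

```js
// Datamuse "means like" lookup: words ranked by similarity of meaning.
const res = await fetch(
  "https://api.datamuse.com/words?ml=" +
    encodeURIComponent("ringing in the ears")
);
const words = await res.json();
console.log(words.slice(0, 3));
// e.g. [ { word: "tinnitus", score: ... }, ... ] -- scores rank the meanings
```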
So now think about it: you have a 100-thousand-queries-a-day limit. For a single search, say you ask for ten typo suggestions when you spell something wrong, and you've also set a fuzzy strength of ten. What it's going to do, before it returns any data, is run your misspelling through the dictionary just to see whether you literally misspelled something, and do the typo detection: here are ten extra spellings. Now, for each of those ten spellings, plus your original term, it's going to do ten fuzzy lookups. So that's 11 terms times 10 lookups each: 110 queries for a single search. That's not only heavy on the server; you also just made 110 external queries, so one search took about 0.11% of your very generous daily limit. So be careful with how strong you make that stuff, because it will find everything.

>> If you flip that and go the other way (maybe I should just look at your library): if I have a bunch of typos in my descriptions, and then I search the correct word, how are you doing the typo search? By a histogram of the existing strings in the database? >> For the typo search I use typo-js, which basically just ships with an English dictionary. If I go into node_modules/typo-js/dictionaries, there's the English dictionary file; that's 62 thousand lines. It parses that really quickly and looks for what's close to your term in there, and the typoStrength says how many suggestions it returns based on closeness; it also gives you a strength score for each. So it does the typo detection for you; it's literally a wrapper for parsing those dictionary files and seeing which entries are most similar to the thing I misspelled. I could hand-write it, but I just use npm's typo-js and call its suggest method. Never reinvent the wheel unless you're making a better wheel. (Which also answers your question: suggestions come from the English dictionary, not from the strings stored in the database, so typos stored in your descriptions won't be corrected toward.) So if I make the typoStrength like 4 (I'd need to restart it), it's probably not going to find anything new, because I don't have any extra indexes that coincide with the other suggestions.
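Here's a hedged sketch of that typo-js usage (npm install typo-js); Typo loads a Hunspell-style dictionary, check tells you whether a word is spelled correctly, and suggest returns likely corrections:

```js
const Typo = require("typo-js");

const dictionary = new Typo("en_US"); // the bundled English dictionary

const term = "devellop";
if (!dictionary.check(term)) {        // first: is it actually a typo?
  const suggestions = dictionary.suggest(term);
  console.log(suggestions);           // e.g. [ "develop", ... ]
  // Each suggestion then gets fed back through the normal search path.
}
```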
Yeah. And then let me show you the last thing; the last thing's really cool. This server I have up currently has 40-something thousand indexes on it. In fact, I'll just SSH in as root; I wrote a little script to count. node test.js: this just reads the save file, the dump file from the in-memory cache, and tells me how many indexes there are. Okay, so I have 46,152 indexes in this thing, and that's what's currently running on the server; really, 46 thousand things we inserted. Now, this was my development server: when I was building this, I just let it run for like four hours with faker.js, which is an awesome library for randomly generating fake information, like names, addresses, phone numbers, and emails. I just had it spam the server with index-add requests full of new names.

So, if you look in the repository, there's simple.html, and this is just an example of how to use the WebSocket side. I'm not going to go through all of it. I use the raw WebSocket API rather than Socket.IO or anything, because raw WebSockets work across anything that supports them, including something like an Arduino running C. But basically, here's the important JavaScript part. What it does (the top is cut off here) is a fetch to the token-generation endpoint, which generates a new JWT, a JSON Web Token, and then it drops that token into the WebSocket URL. When you connect to the WebSocket, the server looks for that JSON Web Token and makes sure it's valid before letting you connect, so you can't just open random connections. And it's actually a lot nicer to search over a WebSocket connection, because even though you're sending small amounts of data each time, it's so much faster than a request per keystroke. So that's how I handle that security.

And then, basically, there's an onopen (I display "connected" on the HTML page), an onclose, and an onmessage, which basically keeps the search results updated. This is the important part: I listen on an input box, and any time there's an input event, I just send a new search over the WebSocket: search for this. And I just hard-coded in which model it searches for the demo. I'll kind of show you what this looks like. This is just an example front end, connecting to my server with the 46 thousand entries; this is what simple.html looks like as a webpage. Search will only happen if you give it at least three characters. So let me take a random name; I'll go with "Sam" (I'll talk about the pause in a second)... there. That just searched through all 46,152 indexes. Let's go with another name, so I start typing "Thomas"; my spelling's wrong; it needs more characters; search for Thomas, and you can see the results make sense. What's another name that starts the same, and then the way it ends specifies different names? "And-": Andrew, Alix, and Alexander, maybe. And as I keep typing, Alix and Alexandra and Alec and whoever else narrow down; it even handles names I didn't know were names. And I can also search by last name; there was a Glover in there. Okay. So basically, yeah, this is fast: "Dan," and there's Dans, there's Daniels. And it also sorts by score. These all have the same score, and that's because, with token expansion, anything that matches the literal token gets the exact same score as anything the token expands into; I was very aggressive with the token expansion. But you can see this loads a lot of people really quickly, so this is really efficient, and this front end was throwaway code I just slapped together. The little pause you see, I'll explain in a bit.
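The essential simple.html logic amounts to something like this; the token endpoint path, message shape, and renderResults helper are my assumptions standing in for the real file:

```js
// Fetch a short-lived JWT, open the socket with it, then fire a search
// on every keystroke once the input reaches three characters.
const res = await fetch("http://YOUR_SERVER:8080/auth/generate-token"); // path assumed
const { token } = await res.json();

const ws = new WebSocket(`ws://YOUR_SERVER:8080/?token=${token}`);
ws.onopen = () => console.log("connected");
ws.onclose = () => console.log("disconnected");
ws.onmessage = (msg) => renderResults(JSON.parse(msg.data)); // hypothetical helper

document.querySelector("#search").addEventListener("input", (e) => {
  if (e.target.value.length < 3) return;
  ws.send(JSON.stringify({ model: "people", query: e.target.value })); // shape assumed
});
```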
All right, so I'll blast through how a production-style npm project looks. You have your entry point, which is the file that runs when you type npm start. Mine literally just imports everything and then does server.listen, where I basically check whether the save files exist; each save file is a model (separate models live in separate files), and I load those files into memory and set them onto one object, so I can just say indexes dot the model name. There's a utility that saves the in-memory indexes to disk, and the converse of it that loads them back; it's kind of nice.

Now we can look at the different endpoints that I actually use. So this is what the model endpoints look like. I just have a function for writing the model to disk, so any time I make a modification, I just write it to disk. For add-model, you can see the first thing I do is check that the key is correct; I load the admin key into memory in that main index.js file, so the first thing is to see if the key matches. Next thing, I check whether the model already exists, because I don't want to overwrite stuff. And then, if we're good, I push the model into the list of models and create an object for it. Oh, and the ID: a ref needs to be specified in elasticlunr; you have to have an ID. If you don't specify one, they claim in the documentation that it'll do it for you, but just in case, if it doesn't exist in your body, I put it in there for you. And then I create the actual instance of the model with the elasticlunr library, and I have to run the model's addField method for each of the fields. I use Promise.all for that; it's the best way to asynchronously iterate through stuff, because otherwise JavaScript will just blast through the loop without waiting. It's a lot to swallow; you've got to really love Node.js and its asynchronous weirdness. And when I finish, I just write it all out to the file for that model. The models list endpoint just dumps you the entire thing, and that's fine security-wise, because you need a key to hit it anyway. Delete-model literally just deletes the file that saves it, and I also delete the object in memory, and then it tells you it deleted it. That's models.

Then we have indexes. This is not the crazy complicated one. Same pattern: write the model to disk; first thing, check that the model exists. Then the first thing I do is check for extra fields: if you have fields beyond what the model needs, I'm like, hey, you don't need these, get them out of here. Then I check for missing fields: hey, you didn't include this one; go away and give it to me next time. If we're all good, I take the ID, which is the date, and I do toString(36), base 36, which ends up being however many characters; that becomes the ID, and then I write the model out. For delete, I just look up the specific index and remove the document by its reference, and then I write the model out; basically every time I touch anything, I write the model out. And for list, I just send the whole object, documentStore.docs.

The big one is the querying endpoint; this one's a lot more involved. So there's a fuzzy-search function, essentially: it drops your term into the lookup, and then, for each of the results (this is all based on the fuzzy setting), I slice off 0 to however many you asked for, push them onto the new-queries array, and send those new queries back. And then here's the function for typo search; it's a lot shorter. It just calls suggest: what are your suggestions based on what I typed in? It'll be like, huh, these are the suggestions. Cool, give me only as many as I asked for. And then this is where it all goes down. First, I made it so you can use the admin key for querying if you want, because when I'm testing I don't want to keep switching keys; but if you have neither the client key nor the admin key, I tell you to go away. Then I check that the model name actually exists. Then I pull out the fields and the expansion settings, and I just do the initial search. And then I get the results and map them to real result objects, and this is the magic part: elasticlunr's search gets you a list of references with scores, and for each of those, I go ahead and fetch the document with documentStore.getDoc. I don't do any pagination right now; I just send you the whole thing.
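For reference, the underlying elasticlunr lifecycle being described looks roughly like this in plain Node (npm install elasticlunr); a minimal sketch using the library's documented API, with the demo's field names:

```js
const elasticlunr = require("elasticlunr");

// Build an index: declare the ref (ID) field and the searchable fields.
const index = elasticlunr(function () {
  this.setRef("id");
  this.addField("name");
  this.addField("description");
});

// Add documents; base-36 timestamp IDs, as in the lecture.
index.addDoc({ id: Date.now().toString(36), name: "Dan", description: "developer" });
index.addDoc({ id: "a1", name: "Andy", description: "another name" });

// Search returns only { ref, score } pairs, sorted by relevance...
const hits = index.search("develop", {
  fields: { description: { boost: 3 }, name: { boost: 1 } },
  expand: true, // token expansion: develop -> developer, ...
});

// ...so you hydrate the full documents from the document store yourself.
console.log(hits.map((h) => index.documentStore.getDoc(h.ref)));
```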
That's on my to-do list: probably add pagination on the server side. Then I check whether you specified typoStrength and it's greater than 0, and that there isn't a space in the query, because I don't even bother with typo search on spaces or sentences. Then I do the dictionary check, which basically says: check whether it's actually a typo. And only if it tells me there's a typo do I send it to the typo search. So we do the typo search, and then, as you can see, I do a very similar thing: I add all of those search results to a new array. Then I check for fuzzy search, run that, and add all the new search terms to the array. And then I take this array and perform searches on all of it: searches on all the typo suggestions and searches on all the fuzzy suggestions. I gather all of those result objects, and if there are fewer than one result, I say "no results found." Otherwise, I re-sort them, just in case: if it's just a straight search, they come back sorted, but once you start doing the typo stuff and the fuzzy stuff, it gets out of order, because I do it asynchronously, so whichever finishes first lands first. So I reorder in descending order by score; I use lodash for that, and lodash is great. And then I send the results.

>> How many come back, like your line 68 there: can I control that, or is that through elasticlunr? >> That's something on the server side. It usually doesn't matter too much, because when you do full-text search indexing, those indexes are pretty small compared to the database objects they reference, and the work is n log n for the sort, not linear in the documents. If you're Google, you get 10 million results or whatever and show the top ten. >> And elasticlunr handles that? >> Well, elasticlunr is not meant for scale; it's just dumb easy: install, addDoc. That's why, like I said, 46 thousand is still instantaneous, but if there were 10 million, at a non-trivial scale, I don't think anyone would use an npm library; they'd be like, what's the production thing that costs a lot? His class next term is where you see how this stuff scales out to the millions; I'm here to introduce the concepts.

All right, so here's the fun part: the WebSocket side of the server. It's active; the connection handler basically parses each message as a JSON object. Over the WebSocket I don't bother with typo detection, because that's really heavy, and the WebSocket path is the hot part of the search path; I keep it as lightweight as possible: I literally just run the search, with the query-time boosting and the expansion, and send your results. And then I have just some helper files. Okay, that's going to be utils.js; it basically just generates the keys for me. I have an HTTP helper that initializes the server and the WebSocket and the CORS stuff. Yeah, and this is where I do the authentication for the token: on the upgrade event (when you go from an HTTP request to a socket, that's called an HTTP upgrade), I check for the token query parameter. Node apparently likes to just give me the URL as a slash, a question mark, and then everything after it, which parses correctly once I get rid of that prefix. And then I just make sure the token's good; if it's not, socket.destroy(). When I generate the token, it's only valid for 12 hours; I do have that in the documentation.
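A hedged sketch of that upgrade-time check, using the ws and jsonwebtoken packages; the variable names and exact URL handling are assumptions, not the repo's code:

```js
const http = require("http");
const WebSocket = require("ws");
const jwt = require("jsonwebtoken");

const JWT_SECRET = "loaded-from-the-generated-secret-file"; // placeholder
const server = http.createServer();
const wss = new WebSocket.Server({ noServer: true });

server.on("upgrade", (req, socket, head) => {
  // Node hands us something like "/?token=eyJ..."; strip the "/?" prefix.
  const params = new URLSearchParams(req.url.replace(/^\/\?/, ""));
  try {
    jwt.verify(params.get("token"), JWT_SECRET); // throws if invalid or expired
    wss.handleUpgrade(req, socket, head, (ws) => wss.emit("connection", ws, req));
  } catch {
    socket.destroy(); // bad or missing token: drop the connection
  }
});

// The token endpoint mints them with a 12-hour expiry, roughly:
// jwt.sign({}, JWT_SECRET, { expiresIn: "12h" });

server.listen(8080);
```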
And this is the endpoint that generates the token; you can see the 12-hour expiration. There's one major security hole, which is that you really want WSS, the secure WebSocket protocol, and that's handled by nginx; I don't use HTTPS here either, because if you use nginx up front, it handles it. When you use something like nginx, it does what's called HTTPS, or SSL/TLS, termination: after traffic passes through nginx, anything sitting behind it just sees HTTP; we don't care whether it was SSL or not, because nginx already dealt with that side. On the client side I made it WS, not WSS, because I didn't bother with certs, and as it stands, that token is readable and reusable on the wire; but if you put nginx in front of it, you can use WSS. I just didn't bother with the certbot dance. And that is full-text search indexing: very non-scalable but quick; this was the concept lecture.

That's been an hour, so why don't we take a break, and when we reconvene, he's going to go over the ELK stack, which is how you can actually cluster this stuff. Netflix uses Elasticsearch. Discord uses Elasticsearch, and they index a billion messages a day. That's a lot, especially when each message can be thousands of characters, and you can also index pictures and video metadata, like they do. And they're a tiny company: they only had eight nodes in the cluster, which is impressively small. Think about that: eight servers ingesting a billion messages a day. What's a billion divided by 24 hours, divided into seconds? Let's see how many seconds that is: 1,000,000,000 / 86,400 is about 11,574 messages a second. Okay, so they're doing roughly 11,500 messages a second. That's a lot for only eight computers; that's impressive. So yeah, let's take a break and reconvene. This is the thing you can do today; if I had more time, I was going to do Redis too, but I think this is important if you do anything that has search, because people don't want to hit enter to search for stuff; they want results as they're typing, because we've become impatient. And the reason I made that library, too, is that, as you'll see when he starts talking about the ELK stack: ELK is great, and it's good at scale, but it's a pain to set up. If you just have a little side project, that GitHub repo takes 30 seconds or less. >> You'd probably want Elasticsearch, Logstash, and Kibana in Docker. >> Docker? That's cheating.

[Break; Q&A chatter.]

>> [Second instructor:] Like I was saying, since this class is a little bit on the improvised side, with two and a half professors and last-minute prep, it's not like we have a consistent narrative from the beginning to the end, where we say, these are the learning outcomes we're walking towards, and this builds on that, which builds on this. One of the things that I try to do in classes that I have spent a ton of time preparing is think very hard about how to get you over yourself in terms of self-efficacy, by which I mean getting you to the stage where you can imagine yourself doing this thing at the highest levels.
Or even beyond where you think you're capable. One of the things I generally think is true is that normal classes kind of train you to be passive in your career: they train you to take orders and follow orders. And one of the things I desperately care about is teaching you how to be the sort of person who can imagine the thing up and build it. If you were in the website class or the app class, you know the drill: here's an intense project; you do your intense project; you come out of the intense project and go, oh, I am more capable than I thought I was. And in this class, I think we missed the boat on an intense semester-long project. If I were to teach this class again and rebuild it from scratch, I think I'd say: all right, you're going to build Twitter, or you're going to build Instagram, throughout the semester, and just go, and use that to build out everything in the process.

So, to approximate my pedagogical goal, which I think about pretty much all day every day, how to get students to see themselves as more capable, and to recognize the speed with which you can actually produce things: honestly, once you get into the wild, after going through computer science or software engineering training, you expect the world to be clean, and you get there, and it's not. You inherit some legacy code held together with crap, and you realize things are way uglier than you thought. Which is actually good news: Wikipedia, for example, is a complete mess under the hood. That's great news for you, because it means you don't actually have to be that good to build something that serves billions.

I'm saying all this to say: to approximate what an all-encompassing project that you're fully responsible for does to you, I almost want to ask: can you imagine, for a second, taking all the tech that we've touched in this class so far, and asking, what is the largest number of lives you can reach and improve by utilizing it? What's the thing you could build with this that might improve a billion lives, if you did it really well and had a good marketing team? To go from a passive recipient to "oh, I could actually probably do that at scale" is the pedagogical goal I care about. And we probably let it down in this course just a tad by not having a hard enough assignment, the mini-projects aside. I wanted to do a project two, but last week ended with everything else piled on, and if I handed an intense project two to people now, they'd say it's too late. So I think, to some extent, that's a discipline: you need one deep-dive project to convince yourself that you can do it at scale, and I don't know that we quite got you there. If I asked, can you imagine yourself doing this at scale, and counted hands in the room, I'd maybe get 10% who think they probably could. That's not quite good enough for me. But whether or not you can imagine yourself using this tech to build the next Instagram, the next step is just: grab some people you love, get to a whiteboard, and start designing a service that can improve the lives of millions. Now you've got the tech background, and that's cool.
Like, if you've got just enough to start googling, now you can go to work on your process and your ideas and things like that, and then you reverse-engineer the tech behind it as money starts to flow in. What I'm saying is: the thing I care about is that you become people who move the needle for humanity, and here's tech that can empower you to do that. For me, this tech took me out of the poor side of town and made me an owner and a businessman or whatever; that's a big deal. And this is a really democratic source of power, because all you have to do to make money in this world is provide high-quality service to large numbers of people, and with this, you can provide high-quality service to large numbers of people for pennies a month. So the next step, now that we've led you toward the water, is that it's your job to go the other fifty yards, or whatever mixed metaphor you like, and say: okay, cool, I've got some capability; now I have to imagine how I can use it to improve lives. I just wanted to plant that seed before Thanksgiving.

>> [A student asks about deadlifting advice at the gym.] >> Sure, I'm game for that; deadlifts tomorrow, then bench.

So, I will say this: here's an extra credit. I think you need one more deep thing, and if you want extra credit (honestly, the grades in this class will be graded generously) that will guarantee you an A, you can do this thing. What I want you to do is take this second half and build a monitoring dashboard. Which is just to say: your boss is going to come to you and say, I want to know what's going on with our users; or, in a cybersecurity sense, I want an event monitoring system over my 20 thousand endpoints, where I can see, here's a severity ten, somebody is trying to exfiltrate data assets, or somebody is attempting a directory traversal on this endpoint. You need some sort of system where you can see that stuff and then delegate it to people to go investigate. So that's where we start: this half is about helping you build a good-looking dashboard. Full disclosure: I have not done this exact thing for a client. I did a Google version, in Data Studio using BigQuery instead of the elastic stack per se, which is fine; but that client ended up in lawyers and courts or whatever, which happens; cease-and-desists are kind of normal, if a little painful. So that one never made it into the hands of real people.

So, with that in mind, before I even jump into this, let's watch a video; I've never done this in a class before. >> [Video:] "This course focuses on Elasticsearch, but now I want to take a moment to talk about a few other technologies that are related to it." >> Let me see if I can make the audio go through the speakers. >> [Video:] "...Elasticsearch. Together with Elasticsearch, they form what's referred to as the Elastic Stack, so let's talk a bit about that. If you already know what the Elastic Stack is all about, or if you just care about Elasticsearch, then you're welcome to skip this lecture."
But I do recommend that you stick around. The Elastic Stack consists of technologies developed and maintained by the company behind Elasticsearch. We just talked about Elasticsearch, which is the heart of the Elastic Stack, meaning that the technologies I'm about to tell you about generally interact with Elasticsearch, although that's optional for some of them. However, there is a strong synergy between the technologies, so they are frequently used together for various purposes. Alright, let's begin by talking about something called Kibana. Kibana is an analytics and visualization platform which lets you easily visualize data from Elasticsearch and analyze it to make sense of it. You can think of Kibana as an Elasticsearch dashboard where you can create visualizations such as pie charts, line charts, and many others. You can plot your website visitors onto a map and show traffic in real time, for instance. You can aggregate website traffic by browser and find out which browsers are important to support based on your particular audience. Kibana is also where you configure the change detection and forecasting that I mentioned in the previous lecture. Kibana also provides an interface to manage certain parts of Elasticsearch, such as authentication and authorization. Generally speaking, you can think of Kibana as a web interface to the data that's stored in Elasticsearch. It uses the data from Elasticsearch and basically just sends queries using the same REST API that I previously mentioned; it provides an interface for building those queries and lets you configure how to display the results. This can save you a lot of time, because you don't have to implement all of this yourself. You can build dashboards where you place a number of metrics and visualizations. You can create a dashboard for system administrators that monitors the performance of servers, such as CPU and memory usage. You can then create a dashboard for developers, which monitors the number of application errors and API response times. Yet another dashboard could be one with KPIs — short for key performance indicators — for management, keeping track of how the business performs, such as the number of sales, revenue, et cetera. As you can see, you're likely to store a lot of different kinds of data in Elasticsearch, apart from the data that you want to search and present to your external users. In fact, you might not even use Elasticsearch for implementing search functionality at all; using it as an analytics platform together with Kibana is a perfectly valid and common use case as well. On your screen now you can see a couple of screenshots from the Kibana interface, giving you an idea of what it looks like. I cannot cover all of the features of Kibana here, because this is just a quick overview of the Elastic Stack. There is a public demo of Kibana, though, which has some pre-configured dashboards and sample data, so if you're curious to see what it looks like in action, be sure to check that out — there's a link to the demo attached to this lecture.
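(Pausing the video for a second: that point about Kibana just issuing REST API queries is worth seeing once. Here's a minimal sketch of the kind of request a Kibana visualization builds under the hood — the index name web-traffic and the browser field are made up for illustration.)

```
# Count website hits per browser -- roughly what a Kibana terms
# aggregation sends to Elasticsearch's _search endpoint.
curl -X GET "localhost:9200/web-traffic/_search?size=0" \
  -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "visits_per_browser": {
      "terms": { "field": "browser.keyword" }
    }
  }
}'
```

Everything Kibana draws is some dressed-up version of a response like that — which is also why the video says there's nothing Kibana does that you couldn't build yourself.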
Next up, we have a tool named Logstash. Traditionally, Logstash has been used to process logs from applications and send them to Elasticsearch — hence the name. That's still a popular use case, but Logstash has evolved into a more general-purpose tool, meaning that Logstash is a data processing pipeline. The data that Logstash receives is handled as events, which can be anything of your choice: log file entries, e-commerce orders, customers, chat messages, etc. These events are then processed by Logstash and shipped off to one or more destinations. A couple of examples could be Elasticsearch, a Kafka queue, an email message, or an HTTP endpoint. A Logstash pipeline consists of three parts, or stages: inputs, filters, and outputs. Each stage can make use of so-called plugins. An input plugin could be a file, for instance, meaning that Logstash will read events from a given file. It could also be that we're sending events to Logstash over HTTP, or we could look up rows from a relational database, or listen to a Kafka queue. >> I want to say something about the Kafka queue real quick. Because — yeah, on this one crazy project, we were supposed to be protecting Delaware water assets, like a big data center on wheels for the military, and we went over to Aberdeen, and they were helping us with a cloud stack like they would use for intelligence operations. You get data from a drone, you get data from whatever else is out in the field, and you paint it all into this sort of intelligence dashboard where somebody can see: given this row and that row, what do I need to know right now? Something like that. In that stack, the Kafka queue is a really cool thing. We had done pub/sub engineering before — like WebSockets at scale in an architecture — and a Kafka queue is like Iron.io, or Amazon SNS; every big vendor has a different copy of the same thing, and Kafka is the open-source version. A Kafka queue is basically a rock-solid thing that machines all over the world can sit listening to, waiting to handle some sort of payload. So in that military stack: as a new report comes in from a weather station, we run it through our machine learning — a neural net — it makes a flood-warning prediction, and then it drops that flood-warning prediction back into the system. That's a Kafka queue: new data comes in, the machine-learning worker runs the neural net on it, then drops the result back in to get processed downstream. It's a worker-management thing. So if I'm thinking of this as the ingestion point, I think of Elasticsearch as my data, my indexing, and I think of the queue as how I feed my workers for processing — when I need some CPU power somewhere, that's what I'm going to use. You've done this already with all the pub/sub stuff, but sometimes the variation on pub/sub is: I don't just want some clients to wake up, I want to allocate a GPU to do some password cracking or whatever else. When you want to manage CPU resources, you use a queuing system like Kafka or SNS; when you want to search, you use a database. >> There are a lot of input plugins, so chances are that you will find what you need. Okay — so while input plugins are how Logstash receives events, filter plugins are all about how Logstash processes them. Here we can parse CSV or XML or JSON, for instance >> and also do data enrichment, such as looking up an IP address and resolving its geographical location, or looking up data in a relational database. An output plugin is where we send the processed events to. Formally, those places are called stashes. You can see a couple of examples on the right-hand side of the diagram.
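(Pausing again: here's a minimal sketch of what a pipeline with all three stages looks like in Logstash's config format. The file path and index name are made up, and the grok pattern is the stock Apache one we'll play with at the end of class.)

```
# logstash.conf -- read an Apache access log, parse each line
# into fields, and ship the result to Elasticsearch.
input {
  file {
    path => "/var/log/apache2/access.log"   # hypothetical path
    start_position => "beginning"
  }
}

filter {
  grok {
    # COMBINEDAPACHELOG is a ready-made pattern shipped with Logstash
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "web-logs"   # hypothetical index name
  }
}
```

One input, one filter, one output — and each block is just a plugin plus its settings.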
So, in a nutshell: Logstash receives events from one or more inputs, processes them, and sends them to one or more stashes. You can have multiple pipelines running within the same Logstash instance if you want to, and Logstash is horizontally scalable. A Logstash pipeline is defined in a proprietary markup format that's similar to JSON. It's not only a markup language, however, as we can also add conditional statements and make a pipeline dynamic. Let's go through a simple example before moving on to the other components of the Elastic Stack. Suppose that we want to process access logs from a web server. We can instruct Logstash to read the access log line by line and process each line as an event. This is easy using the input plugin named file — but, as you will see later in this lecture, there's a handy tool named Beats that is better for this task. Once Logstash has received a line, we can process it. A line from an access log contains various pieces of information, but Logstash just receives the line as a text string, so we need to parse the string to make sense of it. In other words, we need to structure this piece of unstructured data. What we can do is write a so-called grok pattern — which is basically like a regular expression — to match pieces of text and store them within fields. This means that we can have a field for the status code, one for the request path, another for the IP address, and so on. This is very useful, because wherever we send the events after processing, we surely don't want to send one long text string. Suppose that we want to send the log entries to Elasticsearch: we want designated fields for each piece of information within the documents that we add to Elasticsearch. So for an access log entry, the document could look something like what you see on your screen now. That was an example of a common use case of Logstash: receive access log entries, process them, and send the results to Elasticsearch — or any other stash of your choice. Of course, Logstash can do much more than this, but this was just a quick overview. Next up, let's talk about a part of the Elastic Stack named X-Pack. X-Pack is actually a pack of features that add additional functionality to Elasticsearch and Kibana. It adds functionality in various feature areas, so let's go through the most important ones. First, we have security. X-Pack adds both authentication and authorization to both Kibana and Elasticsearch. In regards to authentication, Kibana can integrate with Active Directory and other technologies to provide authentication. You can also add users and roles and configure exactly what a given user or role is allowed to access. This is very useful, as different people might need different privileges. For example, a marketing department or management team should probably not be allowed to make any changes to data, but should have read-only access to certain data of interest to them. Next up, X-Pack enables you to monitor the performance of the Elastic Stack — meaning Elasticsearch, Logstash, and Kibana. Specifically, you can see the CPU and memory usage, disk space, and many other useful metrics, which enables you to stay on top of performance and easily detect any problems. What's more, you can even set up alerting and be notified if something unusual happens. Alerting is not specific to monitoring the Elastic Stack itself, as you can set up alerting for anything you want.
For example, you might want to be alerted if the CPU or memory usage of your web servers goes through the roof, or if there is a spike in application errors. Perhaps you want to stay on top of suspicious user behavior, such as a given user signing in from three different countries within the past hour. You can then be notified by email, Slack, or other channels when something goes wrong. With reporting, you can export Kibana visualizations and dashboards as PDF files. You can generate reports on demand or schedule them and then receive them directly in your email inbox. You might want daily or weekly reports of your company's key performance indicators, or of information useful to engineers or system administrators. Reports can also be triggered by specifying conditions, kind of like with alerting — you define rules for when the reports should be generated and delivered. You can even have the reports generated with your own logo, for a more professional look, or perhaps for sharing reports with customers. Apart from exporting visualizations and data as PDF files, you can also export data as CSV files, which could be useful for importing data into a spreadsheet, for instance. X-Pack is also what enables Kibana to do the machine learning that we talked about in the previous lecture — basically, the functionality is provided by X-Pack and the interface is provided by Kibana. Let's quickly recap what you can do with machine learning. First, we can do anomaly detection, such as detecting unusual changes in data based on what the neural network believes is normal. This can be tied together with alerting, so we could have a machine-learning job watching the number of daily visits to our website; if there is a significant drop — or increase, for that matter — this will be identified, and we can optionally be notified by email or something like that. The other thing we can do is forecast future values. This is especially useful for capacity planning, such as figuring out how many visitors our website is going to get in the future. This could be helpful for spinning up additional servers if we're not using auto-scaling, or for having more support agents available. An example could be a webshop that gets a lot more traffic in both November and December due to Black Friday and Christmas sales. Next, let's talk about a feature called Graph. Graph is all about the relationships in your data. An example could be that when someone is viewing a product on an e-commerce website, we want to show related products on that page as well. Or perhaps suggest the next song in a music-playing app such as Spotify, based on what the listener likes — for example, if you like the Beatles, there's a good chance that you also like Pink Floyd. But to make this work, it's important to distinguish between popularity and relevance. Suppose that a lot of people listen to Linkin Park, and they also enjoy listening to Mozart every now and then. That does not suggest that the two are related; the strong link between them is just caused by the fact that they are both relatively popular. For example, if you go out on the street and ask ten people if they use Google, most of them will say yes — but that doesn't mean they have anything else in common; that's just because Google is so popular with all kinds of different people. On the other hand, if you ask ten people if they use Stack Overflow, the ones who say yes do have something in common, because Stack Overflow is specifically related to programming.
So, essentially, what we're looking for is the uncommonly common, because that says something about relevance and not just popularity. The point is that purely looking at the relationships in data, without looking at relevance, can be misleading. That's why Graph uses the relevance capabilities of Elasticsearch when determining what is related and what isn't. Graph exposes an API that you can use to integrate this into applications. Apart from the aforementioned examples, another example could be sending out product recommendations in a newsletter based on a person's purchase history. Graph also provides a plugin for Kibana, where you can visualize the data as an interactive graph. This is very useful when you know what you're looking for, but it's perhaps even more useful when you don't: the UI lets you drill down, navigate, and explore the relations in your data that you maybe didn't know existed. So that's a quick introduction to Graph. It works out of the box, so you don't need to index data in a specific way or change anything to use it. The last feature of X-Pack is one named SQL. If you have worked with relational databases, then you are familiar with the query language SQL. In Elasticsearch, we query documents with a proprietary query language named the Query DSL — essentially a JSON object defining the query. The Query DSL is really flexible and you can do everything with it, but it might be a bit verbose at times. For developers who come from a background of working with relational databases, wouldn't it be easier to just work with SQL? That's possible: you can send SQL queries to Elasticsearch over HTTP, or alternatively use the provided JDBC driver. What Elasticsearch does is translate the SQL query into the Query DSL format behind the scenes, so internally the query is handled the same way after that translation. What's cool is that there is a Translate API, where we can send a SQL query to Elasticsearch and it will respond with the Query DSL that the SQL query was translated into. So if you need some help writing a Query DSL query, you can write it as SQL, translate it, and get a great starting point for your query. And that's a great way to get started if you really want to stick with SQL. Personally, I see this as a helper for development and for getting started; once you get familiar with the Query DSL, I do think that's what you should use — but that's just my personal opinion. And that's it for X-Pack.
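(Worth a quick concrete look — on recent Elasticsearch versions, that Translate API really is just one HTTP call. The index and field names here are hypothetical.)

```
# Ask Elasticsearch to translate SQL into the equivalent Query DSL.
curl -X POST "localhost:9200/_sql/translate" \
  -H 'Content-Type: application/json' -d'
{
  "query": "SELECT name, points FROM players WHERE points > 20 ORDER BY points DESC LIMIT 10"
}'
```

The response is the Query DSL JSON you would otherwise have written by hand — a nice crib sheet while you're still learning the DSL.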
Finally — right, now we just have one thing left to cover: something called Beats. Beats is a collection of so-called data shippers. They are lightweight agents with a single purpose that you install on servers, and which then send data to Logstash or Elasticsearch. There are a number of data shippers — called Beats — that collect different kinds of data and serve different purposes. For example, there's a Beat named Filebeat, which is used for collecting log files and sending the log entries off to either Logstash or Elasticsearch. Filebeat ships with modules for common log files, such as nginx, the Apache web server, or MySQL. This is very useful for collecting log files such as access logs or error logs. Another Beat worth mentioning is Metricbeat, which collects system-level and/or service metrics. You can use it for collecting CPU and memory usage for the operating system, and for any services running on the system as well. Metricbeat also ships with modules for popular services such as nginx or MySQL, so you can monitor how they perform. There are more Beats available, as you can see here, but Filebeat and Metricbeat are the ones most commonly used — so be sure to check out the documentation in case you need something other than what we just talked about. All right, so let's put all the pieces together now. The center of it all is Elasticsearch, which contains the data. Ingesting data into Elasticsearch can be done with Beats and/or Logstash, but also directly through Elasticsearch's API. Kibana is a user interface that sits on top of Elasticsearch and lets you visualize the data that it retrieves from Elasticsearch through the API. There's nothing Kibana does that you cannot build yourself, and all of the data it retrieves is accessible through the Elasticsearch API — that being said, it's a really powerful tool that will likely save you a lot of time, as you probably won't have to build your own dashboards from scratch. Then we have X-Pack, which enables additional features, such as machine learning for Elasticsearch and Kibana, or management of Logstash pipelines in Kibana. This is all referred to as the Elastic Stack. You might, however, have heard of something called the ELK stack before. That refers to Elasticsearch, Logstash, and Kibana, because these three tools are so frequently used together; the term originates from before Beats and X-Pack existed, creating what is now known as the Elastic Stack. So the Elastic Stack is a superset of the ELK stack, but these days you'll mainly refer to the Elastic Stack. As you can hopefully see, covering all of this in a single course is nearly impossible, as each of these tools provides so many capabilities, and if I were to include them all in the course, I'd just barely scratch the surface. Now that you know what the Elastic Stack is all about, let's take a moment to talk about some common architectures. >> So — I haven't done that before, played a video in a class; looking around while it ran, maybe it wasn't the best experience, I don't know. But I can say why I played it: there's a lot of different kinds of functionality in this stack that you might be able to imagine using >> I'm not gonna cover all of that. I'll show you how to get it spun up and get you going, and then the rest is for your imagination. When you drift off to sleep tonight, you think: oh, I wonder if I could use machine learning to predict when I've got a spike in my users or whatever else, so that I can get an alert and spin up servers or something — and you say, yep, okay, I could probably spend a weekend on that. (Question about some parking-ticket data — sure, if we can access it.) >> Okay, so hopefully there's value in that. Side note: every time I pause I see the ad — FKA twigs. Anybody know who that is? That's where J is; he's at an FKA twigs concert. I'd never heard of them. They are not the Beatles, I'm saying. All right, so let's do it. Let's do it. Go to DigitalOcean >> I made the web dev kids do this the other day, sorry. I will say this: spin up a droplet. For the droplet, do the $40-a-month one. Don't leave it on — shut it down within the next hour or so and it will cost less than $0.06 of your $350 credit >> So create that droplet, and pick Docker from the marketplace. Once you're in — here, in the class notes, is an ELK stack Docker project that we're going to clone; cd into it and docker-compose up. Sorry if the video was boring; hopefully it was valuable. Get to DigitalOcean, spin up a VM, do the $40-a-month one, and just remember to shut it down afterward.
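(The exact repo link is in the class notes; assuming it's a docker-elk style project, the whole spin-up on the droplet is something like this once Docker is present.)

```
# Clone the ELK stack Docker project from the class notes and start it.
git clone <link-from-class-notes> elk
cd elk
docker-compose up -d

# Verify the three containers are running:
# elasticsearch, logstash, and kibana.
docker ps
```

Once docker ps shows all three up, Kibana is waiting at http://YOUR_DROPLET_IP:5601.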
Yeah — if you want to do Google Cloud Platform instead, that's fine too, but this particular stack needs a good chunk of RAM, and you need to make sure you have Docker on there. I said 80 gigs at first — sorry, I was looking at the SSD disk size and got all mixed up. Four gigs of RAM is probably the minimum, maybe six; call it eight gigs of RAM plus Docker, and beyond that you're probably cool. If you can get to the stage where docker ps shows Kibana, Logstash, and Elasticsearch up and running, then we're cool. Then you need your IP address, and you go to your IP address, port 5601 >> and the username and password, I think, are elastic and changeme. Once you're in, I just want to take it for a little victory lap right off the bat. If you go to "load a data set and a Kibana dashboard," let's add some sample flight data and go look at that. That dumps some fake stuff into the Elasticsearch database so that I can go view a dashboard — and this would be a sample live dashboard that you could give the suits, the C-suite folks, to make them really think you're doing your job well. Beautiful stuff. And you can click to filter it: so I say, let me just see the flights from Atlanta to Toronto — there's one flight from Atlanta to Toronto, and it's $800. Seems like a lot for Atlanta to Toronto; it's not that far. You know, this graph is almost useless, but I just wanted to play. Great. Let's see — Logstash Airways and JetBeats are the two carriers. Here's all the flights coming out of Atlanta. And I can filter that by time: say the last day, or this week, refresh. So this week there were four or more flights coming out of Atlanta; over the last 24 hours, not a lot; in the month of November, a lot — 4,203 over everything. So it gives us a natural way to drill up and down, and this is the kind of thing the marketing house wants. And here is my web logs dashboard — probably more appropriate to this class than the flight one; I just think the flight one is cool. That map is the web logs, like the sample charts. Alright. Now, that makes us feel like we're really powerful — and now I'm going to make you feel really not powerful, by us doing it ourselves. Sorry. Here we go. In fact, the data I prepped is outdated now, so let's go to FanDuel. Salaries, like a CSV — yes, I think it was RotoGuru. FanDuel data. Yes, RotoGuru — here we go, rotoguru.com. And I can say: let me see all of the FanDuel data from this week or last week. Let's go into the CSV area. So here's some fresh data — because I prepped these notes last week, but I need to gamble this week, right? And you guys have to gamble. Once you get there, you might have to save this into a file — it's a little bit silly — select it all and save it as, like, week12.csv. So get some good gambling data. And here's the task: there's a little machine-learning button — click Machine Learning, Data Visualizer, Upload File — and you can see that we've got our data in there. This screen is almost useless; it's kind of like a machine-learning output, like if I were doing some predictions — oh, probably most values are "wide receiver."
Yep, thanks >> okay, there are more wide receivers than anything. So this screen is a little bit useless, and we're just gonna go straight to Import. And I'll call this, like, fanduel-week12. Now — I looked into this — you can do some crazy stuff here with this ingest pipeline: you could mash a bunch of fields together into one field, or turn a lat/long pair into a position on the globe, things like that. So you can do some cool stuff with the ingest pipeline if you wanna mess around there (I'll sketch what one of those can look like at the end of this bit). We want the simple import. There we go — got stuff: the fields, points per game, salary, points per game per dollar. Great. Visualize. Yeah. Okay, that's accurate — not the most useful graph, whatever — which is to say, most players got 0 points in most games, because they just sit on the bench; they're in the database, but they're not getting any points. Okay, by the way, if you don't know how to gamble with fantasy football: every player plays every week and gets some number of points, and every player has a salary. So, like, here's Patrick Mahomes, and his salary is — ****, that sounds low. Name, team, opponent — oh, salary: 196.42 does not sound like a salary to me. It must be 19,642 — it's off by a factor of something, whatever. Anyway, he'd cost me about $19 thousand, and I have $60 thousand with which to build my team. So you spend your $60 thousand across all the different players, and your goal is to get the most points you can with that — and you have to have a quarterback, a running back, things like that. Okay, so the things I care about are: how many points per game, how much does a player cost, and points per game per dollar that I spend — that last one is the valuable one to me. So: who's the most efficient use of my money right now? Here's the thing — I don't really know how to do this. I'm a newb at this, just like you are; I've not done a Kibana thing for a client. We did one product where we set up Kibana during development but chose something else to actually give the client, because handing them accounts and access would have been a mess. But if it were in-house, I'd totally use it. So: I want you to create a new visualization, and I think we should be able to do this with a data table. And honestly — I'll say this is a useless version of it right now — it just tells me there are 543 players in my database. So the job is to help me make some interesting visualizations from this. Here's a couple of visualizations to make — and this is a terrible data arrangement, especially because I didn't clean the data — but: find the top ten players by points per game per dollar. Like, make me a table of player and points per game per dollar, sorted; if getting exactly the top ten is easier in a spreadsheet, that's fine. Then make me a pie chart that shows which NFL teams are giving me the most points per game per dollar, or something like that. This is a Google-around exercise, obviously, not a test. But let's be clever; let's do this. So this is the exact opposite of that flight dashboard — there I click one button and, wow, I've got maps of flights into Atlanta; here I import my data and all it says is: 543. All right.
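(About that ingest pipeline option from a moment ago: here's a minimal sketch of one that merges a hypothetical lat/lon pair into a single location field at import time — the pipeline name and field names are all made up.)

```
# Define an ingest pipeline that combines lat/lon columns into one
# "location" field, which could then be mapped as a geo_point.
curl -X PUT "localhost:9200/_ingest/pipeline/fanduel-import" \
  -H 'Content-Type: application/json' -d'
{
  "description": "combine lat/lon into a single location field",
  "processors": [
    { "set":    { "field": "location", "value": "{{lat}},{{lon}}" } },
    { "remove": { "field": ["lat", "lon"] } }
  ]
}'
```

Every document that passes through the pipeline gets reshaped on the way in — that's the "cooperate fields together" idea from the import screen.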
How do I make that useful? Well — go make it useful >> (And that leads to the next thing, the grok debugger, but we'll get there.) We probably need to collaborate a bit here, so let's Google around or whatever. I think it's all about the buckets, so maybe we want to look at the bucket documentation in Kibana. Like — I think if I split rows, maybe by terms, by name >> in groups of size one... okay, that's a name. Maybe groups of size 500. Okay, cool, I got names. Alright, so I did Buckets, Split Rows, aggregation by Term, field was name. Order by count — maybe that wasn't the right one. So maybe I want to add a metric. Add a metric — what do I want my metric to be? My average points per game per dollar: the relative value column. Okay, okay, good progress. In fact, I'm going to kill the count — not the Sesame Street Count — and order by relative value. Bam. There we go. There we go. (I'll show the raw query version of this table after we look at the bar charts.) Okay, so now — New England has had the best dollar-per-point value in any given game. Yeah, that makes sense; they've been winning frequently and cheaply. The second-best in DFS by relative value — you can see it per game — the San Francisco defense is up there; Jeff Driskel, the quarterback for the Lions, on a middle-tier salary, actually did okay; and Lamar Jackson — even though he's at, like, a max salary, he's so freaking good he's still up there. Underpriced quarterbacks really rank well versus the rest: Russell Wilson, Dak Prescott. And Gardner Minshew — he's not going to play anymore, he's benched, but Gardner Minshew — I have the mug at home, Gardner Minshew with the mustache on it. I know, I know. Yeah, Matt Stafford's not playing anymore either, but not for the same reasons. Christian McCaffrey is there in game two. So maybe we should add another column. Let's add a metric — select an aggregation — is "top hit" correct? No, top hit doesn't feel right; I don't know what this aggregation is — average, count >> I'll say median... actually, I just want to show the team they're on. How do I add the team to this table? Is that a metric? Maybe it's not a metric — is it a bucket? Let's add another Split Rows sub-aggregation. Okay: Terms, select a field, team. A little bit worried about this size-5 business, but I think this is okay — it just shows me the team. Yeah, that's fine. That's fine. So: Save As "top relative value." Alright, that's enough screwing around with that. Maybe let's go make a pie chart — a pie chart, I thought, was gonna work. What would be a good pie chart for this data? >> I want to know which team is getting me the most points. So let's aggregate points, bucketed by team — let's see a pie chart of which team is gobbling up the most points per week in fantasy football. Is it the Chiefs? It's probably not the Patriots, 'cause they screw around with their running backs and stuff — but maybe it is; let's find out. Interesting data — this is building infographics, for gambling or just for understanding the world better >> Pie chart. Let's make a pie chart. I don't know what I'm doing, right? Split slices, term, team — but I definitely need a metric: slice size. I want to sum over points. Points. Okay, alright, here we go — oh, I think my bucket was capped at five; set it to 32. **** yeah, there we go. Here we go. Alright: Kansas City is number one. San Francisco is number two. Miami is the least. Washington — yeah, this is totally correct; these guys suck.
Denver, Cincinnati, Chicago — sorry, Bears fans — then Pittsburgh, the Giants... look, Jacksonville is only two steps ahead; we're all in the same boat here together, folks. Number one is Kansas City, then San Francisco, then Dallas, then New England, then Minnesota, Baltimore, Seattle, Tampa Bay, Arizona, the LA Chargers. Alright — that was less painful than I thought it would be, honestly. You know, maybe this is not the best data visualization for this. What is the best visualization for it — a bar chart? A histogram? Let's do that. Alright. I don't know about you, but I'm stoked by this. I've got a book at home I bought years ago called The Visual Miscellaneum — miscellaneum — from, like, 2010 or something, and it just has crazy infographics about our world: here's the Billion Dollar-o-Gram, great, and it helps you understand wealth in the world; I think there's even a chart in there of the author's moods while writing the book; here's left-versus-right government, whatever. So I've always been addicted to infographics, always wondering — how do I make that? Well, now you can make an infographic — there you go, interactive and cool — and you just give it to your boss and you feel like the hero. Okay, maybe we've still gotta do a little bit of ingesting, so we'll do some interesting things there. Or maybe the body language I'm reading in the room is: "okay, I can go play with that on my own and we can move on topic-wise." Said with that — the next thing to make is a bar chart. Let's make a bar chart. Here's what we want to know now. I wanted to do one that is: which team scores the most points, and then which one scores the most points per dollar — which team is the most efficient use of my gambling money. And those are two different questions. All right, here we go: Vertical Bar, FanDuel index. Y-axis: count — let's change that to sum of points. Okay, great. Buckets: X-axis — select an aggregation, Terms, field team, size 32. Boom — there it is, clear, okay, thirty-two bars, no problem. Alright. Now my real question is: who's the best team to spend my money on, by points per game per dollar? Detroit! Detroit has been the team most undervalued by the world — underpriced, and yet scoring the points. Exactly, exactly. Then San Francisco and New England — those defenses probably prop them up — the New York Giants, and Philly is number five. Last on the list is Cincinnati. Oh no — wait, there's a "team number N." Arizona, Kansas City, the LA Chargers — I think there must be something wrong here; it doesn't feel right to me. Cincinnati, Miami, Chicago... maybe there are, like, negative points per dollar per game or something. Let's try median points per game per dollar. Now it's not sorted. Oh — I think if I order here... oh, metric: 50th percentile. Oh, I see — I can't sort by median. Average, then: we'll do the average points per dollar and sort by that average. Nope — still there's this "number N," which I don't understand in my data set; probably an errant row. The LA Chargers, Kansas City, and Arizona are somehow screwed up, but out of the ones that are here: the New York Jets, Miami, Cincinnati, Pittsburgh, Carolina, Cleveland, Jacksonville — these are the teams not to gamble on. And at the top, same as before: Dallas, San Francisco, Seattle, Detroit.
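(As promised, here's roughly what that buckets-plus-metric table is doing under the hood as Query DSL. The index and field names are guesses at what the CSV import produced, so treat them as assumptions.)

```
# Average relative value (points per game per dollar) per team,
# ordered by that average -- the same table the Kibana UI built.
curl -X GET "localhost:9200/fanduel-week12/_search?size=0" \
  -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "by_team": {
      "terms": {
        "field": "team.keyword",
        "size": 32,
        "order": { "avg_value": "desc" }
      },
      "aggs": {
        "avg_value": { "avg": { "field": "relative_value" } }
      }
    }
  }
}'
```

The buckets are the Split Rows, the sub-aggregation is the metric, and the order clause is the sort we kept fighting with in the UI.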
All right — I'm putting my money on Detroit this weekend. By the way, when I gamble recently, I've just been gambling, like, a nickel and turning it into a dime and going "I'm a genius," right — just to annualize it. So it's not huge gambling; I'm doubling it every week, which is a great rate. And NFL gambling — we've got a guy for that, because he knows his stuff. Delaware is very friendly to this sort of economic activity. You're welcome — I chose to be here; most people don't choose Delaware, but I, like, made a ten-year plan to arrive in Delaware. Next step >> Next, the task here is to play with this one. Let's watch a two-minute video — it's only two minutes, not fifteen — it's a little smoother. This is kinda playing with the Elastic Stack side of it. >> Grok is a popular and very flexible tool for parsing log data into distinct fields. It's available in Elasticsearch ingest nodes as well as in Logstash. There is an extensive library of ready-made patterns available, for which you can find links in the documentation. In this video we will show you how to use the Kibana grok debugger to speed up the creation and testing of grok configurations. Go to Dev Tools in Kibana and select the grok debugger. In the top section we copy in a line of data that we will use during development — here, a sample access log from a Squid cache. If we look at the sample, we can see it starts with a number representing the timestamp, followed by a few spaces. We can use a NUMBER pattern to match the timestamp and the SPACE pattern to match any number of consecutive spaces. At the end we add a GREEDYDATA pattern that will match anything coming after. We then click the Simulate button to test it, and at the bottom we can now see that the timestamp has been successfully parsed. The next part of the log entry is the duration, which is an integer; we can again use a NUMBER pattern to match it, followed by a space. This correctly captures the duration. If we were to make a mistake — for example, by adding a second space — the pattern would no longer match and we would get an error. The next field is an IP address, so we can use the IP pattern to match and capture that. We can then use the WORD pattern to capture the cache result, followed by a slash as a separator; the status code, which is a positive integer, can be captured using the POSINT pattern. We test the pattern after each and every step to make sure we're on the right track. I think we now have a good understanding of how it works, so we will speed along and capture the number of bytes and the request method in a similar way. For fields that are followed by a space but do not contain one, we can use the NOTSPACE pattern — it matches anything that is not a type of whitespace, and it's very useful, so well worth remembering. Once we've added patterns for the rest of the fields, we can remove the GREEDYDATA pattern at the end, as there's nothing left to match. Look at that — our grok pattern is now complete >>
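(To recap the debugger workflow from that video: you grow the pattern one field at a time and re-simulate after each step. The sample line below is made up in the Squid style the video used, and the field names are mine.)

```
# Sample line pasted into the grok debugger (made up, Squid-style):
#   1510963010.143    542 10.0.0.5 TCP_MISS/200 3428 GET http://example.com/img.png

# Step 1 -- grab the timestamp, dump everything else into GREEDYDATA:
%{NUMBER:timestamp}%{SPACE}%{GREEDYDATA:rest}

# Step 2 -- peel off the duration and the client IP:
%{NUMBER:timestamp}%{SPACE}%{NUMBER:duration} %{IP:client}%{SPACE}%{GREEDYDATA:rest}

# Final -- cache result, status, bytes, method, URL; GREEDYDATA removed:
%{NUMBER:timestamp}%{SPACE}%{NUMBER:duration} %{IP:client} %{WORD:cache_result}/%{POSINT:status} %{NUMBER:bytes} %{WORD:method} %{NOTSPACE:url}
```

If a step stops matching, the field you just added is the one that's wrong — that's the whole trick.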
Okay — "nothing left to match." So here's the narrative, so you finally get it: you've built your product, you've got VMs scattered all over the world, and you need to know when things break on VM number 12. Or you've got VMs under load, and you don't want to spend a month with them all creeping up to 90 percent usage — which site has been eating VM number four? And then, on top of that, you've got security events. So you're taking all of the nginx or Apache logs from those machines and shipping them into someplace where you have your beautiful dashboards. We just worked on beautiful dashboards; what we didn't do is log data. Now we're going to parse log data into structured data that I can build my visualizations on, and then you just make the ingest pipeline with Logstash, which reads the log file line by line, new line by new line, and drops it as JSON into your database. Right — now you've got your whole thing, and that data could be coming from all of your VMs. I've actually gotta do this now, because I just migrated one of my products into a style where there are VMs for every different run — simulations — and I might have 50 VMs in any given hour and then kill them all. And I use student developers on the backend sometimes, and they mess things up, right — something breaks. And if I don't have the errors from that Node session stored anywhere other than on that VM, I'll never see them again. So what I need to do is go in there and pipe the error messages into an Elasticsearch somewhere else — like a little beacon beeping: here's an error message — so that I can see all the error messages and say: this machine, at this time, got an error message from this user, and this aggregate line goes up. Or just see the error message so I know what I need to go fix. I need to do that, because right now they're all hidden inside the VMs. And when stuff breaks — like it did when I was in Spain; mostly things went fine, but we had to run the thing, and one of the student errors crashed the code — I have no idea why. I've gotta go reproduce the bug somehow, and since I didn't have this installed, I'll never know. All right. Great. So here we go — this is all about imagination; it's a day of imagination. So let's do it. Here's my next thing for you: here is a classic Apache log line. Copy it. Go to Kibana, click the little wrench to get to Dev Tools, and go to the grok debugger. Do you guys know the word "grok"? Have you heard it before? No, it's not from Hitchhiker's — it's from Stranger in a Strange Land, by Robert Heinlein. It means to totally get something: "oh, I grok that idea." When you say you grok an idea, it means you've not only heard it but completely internalized it, and now you can speak it. A sci-fi term. All right — so I paused that YouTube video right here. Let's start the way he did: GREEDYDATA. %{GREEDYDATA:rest}, simulate. Okay, that's where we begin. Step one — I think there was an IP >> Nope. What's it called? Oh — I wonder if it's because this "3.3.3..." is too big. Yeah >> So: I copied and pasted some fake data, and that 3.3.3 octet is not part of a valid IP address, so I changed it to 233. Okay — but that's clever of them: they're effectively saying, somebody's faking your IP addresses, and I'd like to see that as a security event, because they're obviously putting in fake ones. Okay, let's have our cybersecurity folks mind that. So let's just putz around with this a little bit. What was it — NOTSPACE? What do you think "home" means here — agent name? >> No — is it NOTSPACE? This is definitely not SPACE... agent name? Maybe I try WORD. I'm annoyed already. Okay — so here's the goal with this one word.
This is going to be a real log line, and here's why that actually matters. I have my web server — I can show you — prompt.ninja, a live web server. SSH into ninja... super-secret really long password... alright. Yep, failed attempt. Logs: the frontend nginx access log for prompt.ninja. Okay, so this is a log file of web access attempts over the last 24 hours or however long on my prompt.ninja, and it says: this person was viewing it from mobile Safari, and they got a 404 because the browser was looking for an apple-touch-icon, which I definitely did not provide — prompt.ninja has no Apple icon — so then it just fetched the favicon instead. So there's a ton of requests even for one visit to prompt.ninja — including from an Android Pixel; seriously, somebody's browsing on an Android Pixel and it's still asking, "hey, is there a touch icon," things like that. So: if I set this up as a series of distributed prompt.ninja websites, with Apache running on each of those VMs, I want every one of those logs to live not on the temporary VM, but on my permanent indexing server, where I can go through all the log files and ask: how many people are coming to my thing from Malaysia? How many from Singapore? And that means that every time a line is appended to this file — which happens on every new request — I want that new line parsed and dropped into my database. That's what Logstash is made to do — or Filebeat. Okay, fine. So this is a sample Apache log. Now, Kibana probably has a pattern for these already, but what I did is grab a fake one off the internet and drop it in there, and now I'm going to practice what I'd do in real life to set this up — which is: read from the file, and for every new line that comes in, parse it into a JSON object with key-value pairs and drop it in my database. That's the play: parse it as it grows. So what I'm gonna do is say: hey, here's something to capture, and the type is an IP address — %{IP:ip}. Maybe I should name the key something silly instead; it doesn't communicate as much, but it at least demonstrates that the thing after the colon is not the type — it's the variable name. Alright — this is kind of like doing regular expressions. So this first pattern captures something. The next piece doesn't capture anything; it just says, hey, now I expect a space character. Then: take everything else and drop it into a string — that's "rest." That's essentially what the guy in the video did: go through it chunk by chunk until you've got them all. So the next thing in the line is this word, "home." I want to add another capture right here — but what does "home" mean there? I actually don't know the Apache log format, so let's look one up. "Apache log format" — here we go. So: this example is "frank," right — the first field is the IP address; then the hyphen means "this piece is not available" — that's the identity check — and then there's, like, a user ID. So in my line, the first piece was the identd — identifying their machine — and the next would've been the user ID, the name of the user. Alright, so my goal now is to say something like %{WORD:comp}, and then maybe a space again. Yes.
Alright — and I captured one; it came out as, like, "Pokemon," right? Now I'll catch the next thing — the username — which should just be the dash, or "frank," or whatever. Anyway, you do two or three of those, and you say: okay, if I knew I had to do this regularly, I could set it up with custom patterns and find or build one. Or — let's go one step better, because we're Google-era children, right? Let's Google it: "Logstash grok filter Apache access log." Great. Yes. Here we go — let's just see if that works. **** yeah. Alright, cool. (I'll paste a cleaned-up version of this at the end of this bit.) So rather than going through one by one — hey, client IP address; their identd, "home"; the username that was not given; the timestamp, meaning February first, great, thank you; a GET request to my banner-ads resource, that's what they're requesting, obviously; HTTP version 1.0; and the server responded with 200 — boom. Now I've got my Apache log parsed into all its fields. I drop that in, and now it's just like my FanDuel data: I can do visualizations on it. I can group it by IP address, or group by subnet on the IP address, and say: okay, cool, I now know all this behavior is coming from Iowa, or from Malaysia, or wherever. Right? Now you get it. So — we've got ten minutes left. I think I'm cool; you got questions or thoughts on this? Here's the thing on Metricbeat. I'm not going to do anything with it, but just take a look — here's how you could use it, and Metricbeat is kind of cool. I thought it was just more of the same, but it will measure memory, network usage, and CPU usage and drop them into your thing. So you've got all your different VMs, and you're worried about your VMs getting stressed out: you put Metricbeat on there, and it reports the current state of that VM every five seconds, or fifteen, or sixty, or whatever, straight to your dashboard (I'll drop a sample config at the very end). Alright, so the idea is: you pull data from all over the world, from all your crazy cloud architectures, tag it in some way, drop it in your database — and then your job is just to build pretty visualizations for people, like analysts. Okay. And Elasticsearch is — oh, by the way, like the full-text search indexing we did — I can't say this enough: if we go explore... what's the right place to explore this? Discover. There we go. I want to discover my FanDuel week 12 data. What that import does is: once that JSON object is dropped into Elasticsearch, it is super-indexed all the way, like a full-text search. So this is like what we built earlier, but super scalable — it can go to millions of records. So let me hunt in there for my Julio Jones or whoever — I'll just type "Jones"... no, I probably have to make it fancier: name equals "Jones." Okay, great — there's the one record for Jones. That's not the full-text-search version, but there is limited full-text search, and this stuff is hyper-indexed — you can make all the indexes you want. So the way people typically use this is: Elasticsearch is my index function on top of everything else.
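(Here's that googled Apache pattern, cleaned up. COMBINEDAPACHELOG is a stock pattern that ships with Logstash's grok; the sample line is made up.)

```
# Sample Apache access log line (made up):
#   233.9.12.4 home frank [01/Feb/2019:13:55:36 -0500] "GET /banner-ads HTTP/1.0" 200 2326 "-" "Mozilla/5.0"

# One ready-made pattern instead of building it field by field:
%{COMBINEDAPACHELOG}

# ...which expands to roughly this (simplified):
%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}
```

Same chunk-by-chunk idea as before — somebody just already did the chunking for the standard Apache format.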
Yeah — maybe the last little thing. I had one other video on here, which is basically about how Elasticsearch can serve you; I'm just going to jump straight to this screenshot. What this represents — and it could be one server, but say this company runs this group of three servers — is: here's a database server, with requests flowing back and forth to the database whenever a request comes in. This piece is Metricbeat — yeah, Metricbeat — which is going to report: how is this thing doing? Is the CPU on fire? Is it smoking? Is there bandwidth left? That drops into Logstash; Logstash does the grok thing we just did, produces JSON that's usable in Elasticsearch, and drops it into the database. The application can also route its own events into the same place: when the app issues a query to the database, you can say, hey — by the way, you might want to know about this JSON object too. Say someone's searching for, I don't know, **** about people with no legs — that's super important, because the search tells us something, so we say: okay, cool, that IP address is now blacklisted, no more of that. (I'm sorry — I didn't mean to call out anybody's particular addiction.) Anyway: if I want to alert on searches or whatever, that drops into Elasticsearch too, and here's my commodity dashboard to see who's here and who's searching for what. And then we can make different choices — or we say, you know what, that market is so large, we're gonna pivot our front-page blog to be about exactly that, so we make more cash. It's a tool; you do with it what you want. Alright — that's the kind of architecture you'd end up setting up with the ELK stack. It's monitoring. And for cybersecurity, this **** is everything — this is the detection part of knowing what bad things are happening. You need to know what bad things are happening on your 10,000 endpoints, and this is how you pay attention to that. Okay — question: what about Nessus? >> Nessus — right, that's a scanner, so it's going to scan your endpoints. But what I would do with it is take the Nessus scan results — maybe I've got it running automatically every 24 hours or something — run the results through Logstash, and drop them into my dashboard, so I can see them along with everything else. Because — I visited a JP Morgan security operations center, and they've got Splunk; basically, this is the open-source version of Splunk, and Splunk is the corporate version of this. Their Splunk dashboard looks just like a Kibana dashboard, and it has events coming in. Here's an event: somebody was trying to edit the log files of their cloud history — oh, that's a security event. Or they had threat intel telling them: hey, Iran is really ****** off right now, and we expect threat actors with these signatures to be doing more stuff, so we want extra controls on our point-of-sale systems in that region. So here's an event that took place over there, and now it's on the dashboard, and one of the cybersecurity people has to look at it, figure out what's going on, do the forensics, and handle it or pass it off to whoever's in charge. So this is the way to do that dashboard: dump my Nessus scans, or whatever else I have running automatically, into the same place, so my security folks have one screen to see everything >> Thanks >>
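(As promised, the Metricbeat side of that diagram is mostly configuration. A minimal sketch like this — the host address is an assumption — reports system metrics on an interval, which is exactly the every-N-seconds heartbeat described above.)

```
# metricbeat.yml -- minimal sketch: ship CPU/memory/network metrics
# from this VM straight to Elasticsearch every 15 seconds.
metricbeat.modules:
  - module: system
    metricsets: ["cpu", "memory", "network"]
    period: 15s

output.elasticsearch:
  hosts: ["http://YOUR_ELK_IP:9200"]   # assumption: your droplet's address
```

You could point the output at Logstash instead if you wanted to enrich the events on the way in.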
(On the gym:) Tomorrow I'm going at ten for half an hour. Typically it's Monday and Wednesday at two — Tuesday I went at, like, nine — but normally Monday and Wednesday at two o'clock, because that's when I'm useless: I've had my food, I'm digesting, whatever. So if I'm getting tired, I just go to the gym. Normally I just play racquetball or bike, but lately I'm working on single-rep maxes — I don't know, I feel like it's strength season >> I'd probably spend the weekend on it, yeah. Yeah — I messed up on the deadlift; somebody was nice enough to fix my form here. Oh yeah, that's good. (Question about the display and the video — at first I was like, wow.) Sure, sure — so your concern was lower bandwidth? Okay. If that's the problem — okay, good thinking. (A student shows their project.) Oh yeah — sorry, I was focused on the tsunami layout; that looks pretty good. Okay, I expect the hardest part from here is going to be navigating the streams. How extreme — oh, okay, maybe you need a React render there, and then you get that right; it means components. So what does that mean? Users, as created, get signed up with admin set to false by default — okay — and anybody set to true in the backend database... so anybody who is true can delete. Oh, I see — so if they have admin true in the database, they can then edit information. You do it in the database — yeah, that's good. Yeah, it's good in the database and everything.