All right, so this is a data science class, an introductory data science class with little to no requirements. So if you're interested in data science and you don't feel like you already have a very strong background to enable your data science practice, this is the place for you. If you do feel like you have a strong background, this might also be the place for you, though the class might be a little bit easier. Maybe that's what you were shooting for; that's also fine. This class is peculiar in a couple of ways. Probably the most significant one that I want to mention is that we're teaching this class jointly at the University of Delaware and at Lincoln University. Lincoln University is a historically Black college and university in Pennsylvania, very close to Delaware. The University of Delaware is the largest university in Delaware, and it's an R1 university. We're doing this under a National Science Foundation program to advance data science pedagogy, which is called the Harnessing the Data Revolution Data Science Corps. And the specific name of the program that we are running, which was awarded the data science grant by the National Science Foundation, is the Met Sciences Delaware and Mid-Atlantic Data Science Corps. Welcome to the program. Welcome to the class. The class is being taught online, as you know because you're joining here, and there will be students from both cohorts: the University of Delaware cohort as well as the Lincoln University cohort. This will add a few intricacies, like the handling of multiple Canvases. But for all intents and purposes, I would like to make sure that this functions as a single cohort, and that you can all work together regardless of your primary affiliation. We will also have some other guests from Lincoln University that will come to the class. Today you'll see Dr. Thomas, who is one of the PIs of the program, Dr. [inaudible] as well. You will probably see some more faculty over the course of the semester.
You will see a faculty member at Lincoln who will shadow me, which, if I understand correctly, is Dr. Ale, and who will take over the class on the Lincoln side in future semesters. That's to make sure that everybody knows who the people in this room are and that you're all aware of what is going on. This is the second year we teach this class. I think that is it as far as the very general introduction goes. My name is Caco. I'm a professor here at the University of Delaware. My primary affiliation is with the physics department, and I'm an astrophysicist by training. I study stars, galaxies, planets. In particular, I study things in the sky that change on time scales that are observable. You usually think of the sky as pretty much static and never changing, but in fact there are many phenomena that change very rapidly. Those are the things that I study, with supernovae, exploding stars, being probably the most common topic that I address in my research portfolio. But I have always worked with very large datasets. Astronomy is very well set up to be a leading discipline inside of data science because it has relied for a very long time on large datasets. And what a large dataset is, that's a time-dependent qualifier. Currently, a large dataset will be terabytes or petabytes of data. We do have datasets in astronomy that are really pushing the envelope in size, comparable to some of the other largest datasets in data science that you may study, like social media data and things like that. Obviously, in the past, a large dataset might have been much smaller in size. When I graduated, when I got my PhD, I studied about 5 TB of data, and that was really impressive. That was only maybe 20 years ago, maybe 15. What counts as a large dataset depends on your infrastructure and the time at which you're studying.
Astronomy, for historical reasons, has always been dealing with datasets that were large at the time. A lot of astronomers are very well equipped to make advances in data science, whether that means developing algorithms that are novel or developing applications of existing algorithms in novel ways to astrophysics. Because of that, I've built a data science skill set that more generally allows me to have an interdisciplinary research portfolio. I work with people in the geography department. I have an appointment at the Biden School, where I study social data science and I teach data science applied to social science disciplines. I have a portfolio of research even in medicine, et cetera, just because I have very solid data science and data handling skills, which is what I hope to teach you throughout the course of this class. The class is Foundations of Data Science for Everyone, with the emphasis on the idea that you don't need to be very proficient in any specific subdiscipline to be able to address data science problems. I don't expect you to be mathematicians or expert coders. We'll talk in more detail about what I expect so that you can evaluate whether that is consistent with your preparation. But I also expect that some of you will be stronger in some areas and weaker in others. Those may be technical areas, or maybe you can bring domain expertise. You can bring an understanding of specific kinds of datasets, whether that's social science datasets or natural science datasets; that also is part of data science. I don't expect anybody to have a particularly strong background in any of the specific areas. I'll work with you to make sure that you can advance in the areas in which you need more help, with my support, the support of your classmates, and the support of the other faculty that will visit us. And hopefully you will work with me and your classmates to lend a hand and help your classmates advance in the areas where you are stronger and they may need more help.
So far, so good? All right, let's give you some visuals. One thing that I want to say outright is that there is no book for this class. There's no single book for this class. That will be a little bit unsettling, I'm sure; I do understand that. There is a certain level of comfort that comes with: well, if I get distracted or miss the class or just don't want to listen today, I'll just read the book and that's fine. It doesn't work like that in this class. I think, ostensibly, that's because data science is not really advanced enough as a discipline to have a solid single reference that we can use to cover all the material that we will cover throughout the course. Most of the resources that... Can you hear me okay, by the way? I didn't ask. Okay. Most of the resources that you will use will start from the slide decks that I create. I will try to write summaries of what I'm telling you, but I'm not going to make any promises that they will be available all the time. In most cases, they will be more like bullet points of the topics that I've addressed, which can be included in the slides so that you can make sense of the specific topics. I will occasionally have some write-ups about the things that I am telling you, because we're also making this class into a book. Unfortunately, if you took it two years from now, you would have a book that follows the class. But as of now, we're just going to essentially create this content. You're going to follow the content on the slides, and I'm going to create the narrative as we go. The slides are going to be available to you at this website, which is Slides.com. I have a link. Let me see if it works, and put it in the chat and see if you all can access it. I don't remember if I made sure that the link is visible to everybody, so if you all can also click and tell me if it's visible. Yeah, it should be visible to you. Is it working? Yes.
They're available on this website. If you remind me regularly, because I will forget, I will also post them as PDFs. That will make it a little bit easier for you to navigate them, or maybe navigate them offline if you need to after the class. But they will be here. They will be created here, and they will be available to you. You'll be able to see them, probably even before they're finished, at a link that will look just like this, with a different number for the class week that we are doing. If it's the second week, it will be 02; the current week, the first week, is 01. There will be a lot of links today as we go over how the class works, how the grading works, and a little bit of Python coding. They will all be on these slides with QR codes and short links and all of that. Just keep track of these slides and you should have access to all the resources you need. You will see that there are a couple of other places that should make it easy for you to access resources as we go. All right. I thought we would start with discussing what data science is. I thought I would tease you a bit and ask you if you can write something about what you think data science is, with the following rule: ten words or less. Put whatever you think is a good definition of data science in the chat, and I'll give you 3 minutes for that. Mm, I'm liking the definitions. They're coming in. Very good. I like all of these four definitions. "Playing with data," that's a fun way to think about it. Good, I think I got enough answers that I'm going to stop the timer. But if you want to contribute more, feel free to continue contributing. I'm liking all of these definitions. They're all, in their own way, correct. I don't know that any of them is comprehensive.
But the point that I wanted to make all along is really that if you ask a room of experts at a conference on data science what the definition of data science is, you will get just as much diversity of definitions. There are so many conferences that start with "what is data science?" I'm going to give you my definition just so that we have a common understanding; it need not overwrite any of the other very good definitions that you all gave. I would like to define data science as the field of study that deals with the extraction of information from data within a domain context to enable interpretation and prediction of phenomena. I want to focus on this definition. Data science is a field of study. It's a discipline, the same way that math is a discipline, or the same way in which geography is a discipline. But in a sense it adds a layer of complexity. Because while statistics, for example, can be thought of in an abstract sense, evaluating the characteristics of collections of numbers outside of any domain, you really cannot think of data science in that sense. To talk about data science, you need to address both the domain and the data. The data in most cases are numbers; they're not always numbers, of course. The data are the collection of observations that we have, but we want to extract the information from them. And how to extract and interpret that information is a domain aspect. For this reason, this class will be entirely project based. What I mean by that is that we will always talk about a data science problem, a data science method, a machine learning method in some cases, which I'm sure most of you are eager to get to, artificial intelligence and machine learning, in the context of a problem. How do I extract this data and interpret it in the context of this problem?
In the context of this domain? We will not be sticking with a single domain. We'll do things in the physical sciences, we'll do things in the natural sciences. Part of my portfolio is in urban science, so I have a lot of datasets and a lot of exercises in applied urban data science. We'll work in different domains, and we will always think about the context in which the data arises and in which it should be interpreted. One thing that I didn't include here, but somebody actually put it in their answer and it's correct, and it should be here too: an aspect of data science is also data collection. It's not just the extraction of information from the data, but also how we collect data. That's something that we won't do. We won't go in the field, and we won't work with any instruments directly. But we will think a little bit about it when we think about noise characteristics and how to integrate them in your data analysis and things like that. Of course, as a field this also includes what some people call foundational data science, which means the development of new methods, entirely new statistical, machine learning, and artificial intelligence methods for the analysis of the data. We will not develop any new methods, I don't think, but if somebody has new methods that they want to develop, that would be an amazing output of this class. That's not expected. But we will apply methods in a way that is domain relevant. We'll apply a variety of methods. I'm hoping that we will cover a fair amount of machine learning in this class, but I want to make this distinction explicit: data science and machine learning are not synonymous words; they don't mean the same thing. Machine learning is a subset of the methods that are in the portfolio of data science, but data science is not limited to machine learning. Here is another Venn diagram that is useful to keep in mind, along with some terminology and taxonomy that I would like to clear up.
Before we start: I will sometimes use the term artificial intelligence more or less interchangeably with deep learning. I think this is a trend. When I hear artificial intelligence on the news, in most cases what people are thinking about are neural network models like GPT, or DALL-E to make images, right? Those are all neural networks, which are currently the frontier of artificial intelligence. Strictly speaking, artificial intelligence is a superset of neural networks, and machine learning is also a superset of neural networks and a subset of artificial intelligence. Artificial intelligence is the collection of methods that enable machines to make decisions that are not explicitly programmed. But you can do that in a couple of ways. You can have flow charts of decisions that say "if this, then that," and then let them combine in different ways to enable the machine to make an automated decision. Robotics still relies to some extent, to a large extent perhaps, on that. But currently, more commonly, artificial intelligence is based on machine learning, and even more specifically on deep learning. In the course of the class, I hope we'll cover a fair number of machine learning topics, all the way to deep learning. But before that, we have to have a reasonable foundation in descriptive and inferential statistics. In the first part of the semester, we won't touch machine learning. We will still work on data science and data handling. We'll learn a lot of things about how to extract data from files, put it together in ways that are meaningful, clean it, prepare it, and extract statistical information from it, which can be as simple as taking the mean and standard deviation or as complicated as doing hypothesis testing with sophisticated statistical tests. None of that is machine learning. We will then talk about several methods in machine learning. What I hope to cover starts with clustering, which is unsupervised learning.
If these words don't make any sense to you right now, don't worry. We'll see in excruciating detail what they mean. But if you have had exposure to this, this will give you an idea of the roadmap. Clustering, which is unsupervised learning, covers ways of grouping data in meaningful ways, generally to understand the data structure. That's the purpose. We will cover trees and tree methods: random forests and gradient boosted trees. These are some of the most powerful and most common methods for decision making, whether the application is regression, to predict values, or classification, to assign observations to classes. We'll do a little bit of natural language processing, which is super fun. It's a complex and extensive discipline in and of itself, and we'll cover a little bit of it. Then we'll get deeper into deep learning. For every one of these topics, we will do practical exercises. I will tell you how the methods work, whether they're statistical methods or machine learning methods. So there will be a lecture component of the class where I explain to you how the machinery works, then a lab component of the class where, partially together and then finishing up on your own as an assignment, we will solve a problem with the specific method that we are studying. Any questions? All right. Just to make it a little bit more scary, moving away from diagrams, I have this very complex visualization of all the various elements of data science. I don't intend it as humorous. I just want to give you the sense of how overwhelming a comprehensive overview of all the things that are addressed in data science could be. As far as what we will cover on this thing: to some extent, I will assume that we have a common understanding, even if perhaps not a very strong foundation, of some of it. And then I'm happy to work one on one with each of you to get you to the point where we actually do have this common understanding. Specifically, programming and statistics.
I will go fast on the very beginning of programming and the beginning of descriptive statistics: means, medians, things like that. I will go fast, but I don't want to leave you behind. Like I said, I will have some assumption that this is not entirely new. If it is, I can help you, and we can work together to make sure that you're not lost and you don't fall behind. We will not get to some things that are really core to data science; we just don't have the time to do everything. That includes how to really handle big data: we won't talk about high performance computing or anything like that. We will only use Python for coding. There are a lot of other languages and tools that we could work with that we are just going to ignore in the interest of time. In principle, everything that we do could be done with something as simple as Excel, although maybe excruciatingly. It will be done with Python. It could be done equivalently with many other coding languages. Then it could be done in very specific ways given the datasets and the environment that you're working in. For example, when you see things like Hadoop, Spark, and Storm, those are high performance computing tools that we won't be able to cover. In terms of what we will focus on, other than programming and statistics, which I mentioned but perhaps didn't state very explicitly: we will work a lot on data ingestion, that is, how you read in data and massage the data to get it into a shape, a structure, that is suitable for applying the method that you want to apply. Another thing that we'll think a lot about is visualization, although there may or may not be a single class on this. I may have a class on this during the weeks when one institution has a break and the other one doesn't. I sometimes have a whole week on visualizations, but mostly we will just think a lot about how to visualize the data and the results that we get from the data as we do the homework.
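To make "data ingestion" and "descriptive statistics" a little more concrete, here is a minimal sketch of reading in a small table, massaging it, and taking a mean and a median. It is only an illustration under stated assumptions: the star names and brightness values are made up, the helper name ingest is hypothetical, and it uses only Python's standard library rather than whatever tools the class actually ends up using.

```python
import csv
import io
import statistics

# Hypothetical raw data, standing in for a file you might be handed.
# Note the messy bits: stray whitespace and one missing measurement.
RAW = """star,brightness
A, 10.0
B, 12.5
C,
D, 11.5
"""


def ingest(text):
    """Read CSV text and return the brightness column as floats, skipping blanks."""
    reader = csv.DictReader(io.StringIO(text))
    values = []
    for row in reader:
        cell = row["brightness"].strip()  # remove stray whitespace
        if cell:  # skip rows where the measurement is missing
            values.append(float(cell))
    return values


values = ingest(RAW)
print("values:", values)
print("mean:", statistics.mean(values))
print("median:", statistics.median(values))
```

The point of the sketch is the order of operations: read the data in, clean it into a usable structure, and only then compute statistics on it.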
Visualization is core because the other aspect of data science that is really, really important, as it is in all disciplines, is the communication of the results of your analysis, which happens through verbal means, but largely through visualization. So even if we won't have a specific visualization boot camp or anything like that, we will be very thoughtful in the way we think about visualization throughout. Okay, fine. So let's talk about some administrative stuff about how this class will function. First of all, please go ahead and read the syllabus if you haven't done so yet. It will tell you what I just told you, and it will tell you additional things about the specific topics that we will address. I wanted to focus for a second on the learning outcomes. By the end of this class, you should be able to formulate an appropriate analysis plan for a research question; select, gather, and prepare the data for the analysis; and choose and apply machine learning methods to the data. The "choose" is important. I think there is just as much art in figuring out what is the right machine learning method or statistical method that you need to use for this data and problem. I think there's probably more meat in this word than in the "apply" word. The application of machine learning methods in Python, it turns out, is going to be rather easy. But the choices that you have to make to choose the model and choose what we call the hyperparameters of the model, the details of the model, are really the important part. I will grade you as follows. There will be a small percentage of the grade that will be associated with pre-class questions. What that means is as follows. At the beginning of each class, I will have, I don't want to call it a quiz because I don't want you to think about it as a quiz, a series of maybe typically three questions that ask very general things about what we covered in the previous class or few classes.
The reason why I'm doing it is because, like I said, this is a project based class. We need to work together throughout the semester. It wouldn't work if one of you decided that they can catch up at the end, right before the final. It wouldn't work because you couldn't work with other people and we couldn't do the problems together in class. This ensures to me that you are going over the material regularly. The questions, I promise you, will be simple and the fraction of the grade is small, but it's a check for you more than for me to make sure that you have covered the material. And if there is a question that you don't know how to answer, you know that you have to go back to that piece of material that we had covered and you had not absorbed. Okay. This also serves another purpose for me which is equally important, which is that you come to class and you come to class on time. This is taught online, but it's a synchronous class, except for a few lectures where, to accommodate travel, we might be asynchronous. You may have a couple of exceptions where you cannot attend class. As you have seen, I'm recording the classes, so you can watch the video afterwards, but it is expected that you will participate in the class, and that's why I have you do this quiz at the beginning. There won't be a make-up for this quiz. It's a small fraction of your grade, but there won't be a make-up. If you can't attend, you just miss the quiz. I will drop the lowest grade in each of the segments of the grading scheme. If you miss a quiz, it has no impact whatsoever. If you miss many quizzes, it has a little bit of impact. Okay, come on time, because it will be open at the beginning of the class and it will be closed five or 10 minutes into the class. Okay. Let me check if anybody has questions in the chat. All right. And then there will be a percentage of the grade that is based on participation. Now, we all have different levels of comfort with participation.
I myself cannot shut up; I have to ask questions all the time when I'm in a lecture, I just can't help it. I realize that that may not be your character; you may just be a more introverted person. I'm not asking that everybody turns into me. But I do want to see you contribute to the work. You can ask questions in class; that's an obvious way to participate. You can answer questions when I ask you to, like you did when you defined data science at the beginning of the class. That was brilliant. Everybody put in a definition. I really appreciate it. There will be breakout rooms. When we do the exercises in class, I will pop into the breakout rooms and make sure that everybody is talking to each other and that you're reasoning through the problem together. That's the point of the breakout room: that you exchange thoughts and opinions so that you build a better understanding than if you worked on your own. Keep in mind that sometimes you will be ahead of your classmates in a particular problem. You may have already done something like that, or it may just come naturally to you, or you may be particularly well slept and well fed that day. The point is not that you talk because you need to get help. The point is that you talk because that gives you a more explicit understanding of what you are doing. The way that I try to convey this is that if you ask anybody that teaches classes, any classes, any of your professors, and if they're honest, they will admit that really the moment that they learned the material is when they had to step up and teach it. Until then they knew how to do things, but they hadn't necessarily internalized them in the same way. Because the process of speaking about your reasoning actually makes your reasoning more explicit and more clear. In the breakout rooms, there will be three or four people. Talk to each other, tell each other what you're doing, code together. If you know how to do something, explain it to your classmates.
If you don't know how to do something, ask if anybody can explain it to you. I will pop into the breakout rooms and see who is participating and who's not. To whatever degree of comfort in speaking you have, I want to be able to assess that you're actually an active participant in the class. Okay. There will be homework. The homework is also a collaborative, project type of homework. Now, I'll cover in much more detail how the homework is assigned and graded, probably tomorrow; we probably won't get to it today. And also, I don't want to give you too much of this stuff, because it's hard to pay attention to these pretty boring details of the administrative aspects of the class. But suffice it to say that in many cases, the homework could be just like: okay, we started this exercise in class in the last 12 minutes, 20 minutes of the lecture. Of course, we couldn't finish it. Finish it up and upload it as your homework. You're absolutely welcome to work in groups. Actually, you are very much encouraged to work in groups on the homework and not do it by yourself. Why? Because, as I just told you, talking about things will make you understand them better, even if you know how to do stuff. Okay? The ideal group size is three to five people. A few fewer, if you're only two, that's fine. If it's a whole lot of you one day, that's fine. I will have a way for you to self-declare what your contribution was. You're not evaluated on what your contribution to the homework was. You are evaluated as a group. But every person has to have their own submission. Okay. These are collaborative projects, and because I'm banking a lot on collaboration and communication within the group, I'm very serious about making sure that the space that we share is healthy, and safe, and supportive, and collaborative for everyone. One of the ways in which I do that is to put a lot of emphasis on a code of conduct for the class.
The code of conduct: I've already pointed you to it in the message that I sent before we started, yesterday. But we're going to do a slightly bigger dive on it right now. I try to make it as easy as possible for you to get these links. I gave you the link to the slides; if you were able to get the slides, you're able to click on them. There's a QR code, and fingers crossed that I put the right QR codes in the right places. What I want you to do is go to the code of conduct, which will look like this. Read it carefully now. That shouldn't take more than 3 minutes; it's two pages. After you're done reading the code of conduct, I want you to go to the second link here. This is a link to a form. It will look different for me, but let me show it to you: a form like this, where you can answer a few simple questions about the code of conduct to demonstrate to me that you have in fact read the code of conduct and acknowledge the rules of engagement. I will also ask you your name and your pronouns. Once you correctly answer the questions on the code of conduct and submit the form, you will get an email. That email will have a link to step three of this preliminary exercise: Slack, to make the communication for the class easier, but also to give you another bit of familiarity with some of the tools of the data science industry, whether by industry we mean data science in academia or data science in the job market. We will use Slack for communication. Let me show you what that means by sharing another screen. Slack is something like this. It's an app for communication. Here you're seeing a different Slack team that I use for my research group, so these are all my students, et cetera. It looks a little bit like Discord. If you're a gamer, you probably know what I'm talking about when I say Discord, even if you may not know what Slack is. Very similar.
It's a rapid communication tool. Once you submit the form, your answers to the form about the code of conduct, you will get a link that will allow you to join the Slack. This is the Slack that we will use for communication. Once you are here, there's a bunch of channels. There's a general channel where we will have general communication: "Hello, this is the general channel for Foundations of Data Science for Everyone." There are a few other things. You can introduce yourself; I would appreciate it if you introduced yourself in "Hello, my name is". Sometime this week there will be a channel for quizzes where I can put the links to the quizzes. There will be a channel for each homework, where you can ask questions that are homework specific. There will be a resources channel where I will put a bunch of resources, books, links, things that we talk about. It will continue building as we go on. This, from that point on, will be the main way to communicate with me and hopefully with each other. Of course, you can also reach me by email. Of course, you can just pop into my office if you can find me. But this is the easiest and most rapid way to get me to answer questions, especially if they're time sensitive, like "how do I do this thing for the homework?" So I strongly prefer that you reach me like this. You can reach out to me on the appropriate channel, on the homework channel if it's a homework question, or as a direct message to me. How's it going with the information overload? So I'm going to give you five or 10 minutes to do the steps that I mentioned. I'm going to put back my slide with the QR codes. Then I should be able to see when you join my Slack, and I'll be able to see if 10 minutes is too much time, like I did before. And if you have any questions, please unmute yourself or put them in the chat. My computer's wi-fi went down and I had to leave and come back, so the link is no longer in the chat for me.
Could you just resend it? The link to the size? Yes. Is that it? Yeah. When you come back, I'll put it back in. It's come back k. Okay. Yeah. Do you have it? Thank you. Sure. 0. Oh. 0 b, and if you already made onto the slag, feel free to introduce yourself in the Hello, my name is channel. As per the description of the channel, I'm asking that you include your name, your pronouns, which I have not included, what you study, what your area of focus, and one thing that you are good at does not have to be related, so we can you're unable to join this slack unless you have a Udell e mail. Let me check on that. That should not be the case. I might have messed up something maybe because of the link. Okay. Hidden. Did I say your name correctly? Yeah, that's fine. All right. I will invite you directly by e mail. I should be able to fix it. But I don't but I don't know that I'm able to fix it by in real time. All right. And if anybody else has issues on doing any other stuff including joining this slack, I hope I got your name right. If you are also Yeah. You're also Lincoln, right? No, I'm from. You did. Oh, and you're not able to join. What is the the problem? What does he say? Something amiss with your imitation. Please contact your work Special Warner and ask them to resend it. I don't know why I say okay. I'm not sure. I'll add you to the snack. I'll do that. Do that right now. Let me get my canvas. Okay, let me not show everybody the canvas to everybody, but I should have your e mail on Canvas. Anybody else, any issues? Okay. Hearing nothing, I 12345 people. Good, good, good invite people. One is there and the other one is D, Mike, L, J. You need an invite? Two. Lincoln or U. D. So I know where to look for you. Okay. And the UD canvass is no and doesn't give me your e mail. Just ask me the mail. Could you put your e mail in a in a message to me on the chat? Because the UD caval won't won't give me your e mail directly. Thank you. Two. Okay. Three invitations. 
They've gone out, and I see most of the other people are on Slack already. One, two, three, four, five, six, seven, eight. I think we're probably there. If you were unable to do this and we were unable to finish, let me know and we will fix it offline. All right. So like I said, I'll give you much more detailed instructions about the grading, but I'm going to distill it a little bit into a couple of separate discussions so that it doesn't get too dull. I thought what we would talk about instead is some of the tools of the job that we're going to use, particularly for the coding. But before that, let me see if my poll is available. Let me just get the temperature of the room. I hope you're seeing a poll on Zoom right now. Just answer honestly; it's anonymous, you have no reason not to. What is your coding level, your level of familiarity with coding, and with coding in Python specifically? OK, I see five answers. I should get three more. OK. We're a bit all over the place. It's good to be all over the place, because that means that you can help each other. Those of you who know their way around or have dabbled can help those that are a little bit more behind, and I'm sure that they will return the favor when we deal with things that are more in their wheelhouse than yours. If you have answered that you are not familiar with coding, you will have some catching up to do. I am here for you. That's basically the punchline of the story. But we will very slowly talk about the 101 of coding, specifically, all of this, as I said, in Python. Python being a coding language. Let me go over here for a second. Python being a coding language. Essentially, if you really don't have any exposure to coding, a coding language means a way to talk to a computer and make it do things. What is the syntax that you use when you talk to it? Think about it as a language, right?
When you talk to a person in English and you want them to make a sandwich, you will ask them, please make me a sandwich. If the person is Italian and speaks no English, you will have to change that and ask them in Italian. I'm Italian, of course; that was an obvious one. Same thing with the computer. You have to decide how you're going to give it instructions, and there has to be something on the other side, an interpreter, that can interpret those instructions and turn them into commands that are executable for the computer. That said, Python is one of the most popular languages. Now, I have had this chart kicking around in my slides for some time. It's probably only fully up to date to 2018, and things move fast in this field. But the point is, this is the popularity of different languages, and you can see that Python is near the top, and it has been near the top for a long time. It's also ramping up in popularity. This popularity is measured based on how often people Google search questions about Python as a coding language. Python has a couple of really good properties for people to use it in data science, and for us specifically to focus on it as the chosen language for this class. You may disagree if you are not familiar with it, but compared to other languages, at least, it's quite intuitive and human readable. There are languages where it is really hard for a human to understand what's being written. Python tends to be very intuitive that way. It is open source, which means that there is a large community of people that continuously develop tools that make coding in Python easier, and that can be queried with questions. We'll see how you would do that in the next class. If you have a problem, there are many places on Google where you can go and ask a question, and you'll likely get a good answer.
You may get an answer that is rough around the edges, because the tech community is not always super tolerant and super welcoming to beginners. We'll address that, but you probably will get a good answer. It's not the most performant language, meaning it's not super fast per se. It is actually quite slow. But it does support integration with languages that are more performant. We won't get to that, but if you do continue in coding in data science and you end up having to perform difficult computational tasks, there are ways in which Python can be optimized for that. It's good specifically for data science because of the way in which the language is structured. It's structured with packages. A package is a collection of pieces of code that have been written by somebody else, such that if you want to do something like, say, develop a neural network, you don't have to start from scratch. You can start with pieces of code that have already been written and put them together kind of like a puzzle. We call them packages. The ones for neural networks are TensorFlow and PyTorch. Then there's packages for statistics, like statsmodels; for computations, like NumPy; for general machine learning, scikit-learn, or sklearn. I'm hearing some noises. Can you still hear me? OK. All right. I don't see any complaints. A demon? Sorry. If I go to your name, I got your email. I will add you later. OK. It's also one of the most desired languages in the data science community. If your objective is to move to industry as a data scientist, Python will be a great asset in your portfolio. I think this is also out of date, and Python has probably since taken over as the most desired skill in terms of coding languages on the job market, and the one that leads to the highest earnings. That's good news for you. I have resources on Python later, but let's go back to this slide. We will use Python, and we will use it specifically in the framework of Jupyter Notebooks. What do I mean by that?
One thing is the coding language that you use. Another thing is the environment in which you write it. Think of writing an essay. OK, you have to write an essay, and you can use a pen, presumably, or you can use Google Docs, or you can use Word, Excel. Each one of those environments has a specific flavor, some tools, ways in which it works. We're going to code Python inside of Jupyter Notebooks. Now I want you to go to the next QR code, so everybody should go and work with me now on Colab. You can click on the link if you have the slides; hopefully the QR code takes you to the right place. Let's just do a quick spot check. So it does, the QR code. Very proud of myself. When you go there, you should see something that actually will look different than this, because you haven't used it yet. The front page will be a little bit different. Let me put the link into another window. I can put this tab here. Can I? That's an incognito tab. So let's do it in incognito mode, so that I'm working exactly with what you see. You should see this. You'll see this unless you have already used Colab, in which case you'll see what I was seeing. You can sign in, and you can sign in with any Gmail account. If you are at UD... actually, I think both UD and Lincoln emails are actually Gmail at the root, so they should work. Or any other Gmail account. Once you have signed in, here, you should see something like this. At that point, you will be able to say that you want to open a new notebook, and that should look like that. It may not look quite like that; that's a setting, but it should be the same. Put a thumbs up, please, as a reaction on Zoom, so that I know that we have everybody. There is a two-step verification, probably, so I realize that it might take a second. I'm doing it on the side with my incognito window. OK. I know that some of you had already raised your thumb, but then they went down and others went up.
So can I get a thumb again from everybody that is on the notebook, so that I can help whoever isn't there? Two thumbs up. OK. Can somebody who's not there tell me what's going on? Two thumbs up. Four thumbs up, five thumbs up. Six, seven. All right. It's been a little bit hard to figure it out because the thumbs disappear too fast for my taste. But if you're having a problem, please unmute yourself and tell me. If you are uncomfortable unmuting yourself and telling me in front of everybody, just put a message in the chat and I will come and help you. Otherwise, I'm going to assume that we're all working together. This is going to be very easy, very basic. We're just going to write our first code in Python, very, very basic. Then we'll ramp up and get very fast for a minute, and then hopefully everybody will still be on board and we'll be able to go to more fun exercises very soon. The first piece of code that anybody writes in any language is hello world. We're going to name our notebook hello world accordingly. The introductory challenge in coding is to write hello world in that specific language. What are the syntax rules in Python to write hello world? Well, I said it's very intuitive. If I want to print hello world, I will just go to the cell. Each one of these is a cell. By default, I'm working with a code cell. I'll tell you a little bit more about that. This is a feature that is specific to notebooks, not to Python. If we were coding in Python in a script, it would not look like this. I'm going to run this cell. I can use my Play button here. Also, I'm going to tell you what the shortcuts are for doing things, and I strongly recommend you try to get familiar with the shortcuts. That will save you a lot of time over time. Shift-Return will run the cell. You'll never see me hover with the mouse and use Play, because Shift-Return will do that for me, and I don't have to leave the keyboard. As you see, now I see hello world.
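For anyone following along afterwards, the first code cell described above would look roughly like this. It is a minimal sketch; the exact capitalization of the string typed in class is an assumption.

```python
# The classic first program: pass a string to the built-in print function.
# In a notebook, run the cell with Shift-Return (or the Play button).
print("Hello world")
```

The output appears directly below the cell, which is what lets the code and its result be read together.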
What's happening here is that in this notebook environment I have cells, spaces where I can put my code. At the end of the cell, the set of instructions that I put in that cell gets executed. The output of that set of instructions gets included in the notebook. The reason why it's really good for me to use it for class is that I can see both your code and the result of your code. When you turn in homework, you will turn in, in a very specific way that we'll discuss next time, Jupyter Notebooks where I will be able to see the code that you wrote and the result that you got, whether that's a statement that gets printed, a plot that gets made, or a number that gets calculated. Here is the kicker: then there are also text cells that you can use. I'm going to click on plus Text; it will put a text cell above where I clicked. I can put a comment here that says, this was an exercise in printing hello world in Python. Now, I will be super demanding with this because, as you will see next time, when we'll be discussing in excruciating detail the way in which I will grade your assignments, it won't be so much like, oh, this code is really great, great style, plus points for having written it in a really good Pythonic way. I don't think this is the point of the class. The point of the class will be that your outputs are correct and consistent with what you were asked to code, and that you are able to interpret them correctly. I will ask, for example, that after every plot you include the caption of that plot in the form of a text cell like the one that I've used right here. That tells me: this is a plot of blah, price of houses as a function of the number of rooms, which shows that up to a certain limit, increasing the number of rooms increases the price of the houses, and then past that limit the dependency changes, as an example. Makes sense? That interpretation will be what I grade you on.
Get very familiar with the code, with the Colab environment, not only with the code cells but with the text cells, because that's where you show me not only that you can do what we're doing, but that you understand what you're doing. And that's really the ultimate goal. Does that make sense? OK, let's stick with Python for a minute. Let me get my cheat sheet of things that I wanted to tell you about Python. If you are new to coding, some things that I wanted to talk about are variables and different types of variables. What is a variable? In any coding language, we generally try to store values of things in named variables, so that we don't have to repeatedly type the numbers that we use or the strings that we created. More importantly, so that they can be modified and still referred to in the same way. Or data can be read into variables that we know the name of, so that if I want to read a new file, or if the file changes, I can still use the same variable name. In this very simple example of hello world, I have used two things. I've used a command. That command is print. A command is either a function or a method in Python, but let's just say that it's a function. A function in a coding language is something that does something. In this case, it prints. Typically, although it doesn't strictly have to, a function will take arguments. It does something that is different based on the argument that I give to the function. In the case of the print function, one of the simplest functions in Python, that argument is what I want it to print. In this case, I wanted it to print a string that says hello world. I gave it a string that says hello world. I could have created a variable that contains a string and then passed it to the function. Actually, it doesn't have to be a string. I could create a variable that contains a number, and it would have printed it all the same. But let's not get there yet. How do I create a variable in Python?
All I have to do is give it a name. Let's call this variable greeting. My greeting will be hello. Later, I may decide to change the greeting from hello to good night, depending on the time of day. Then maybe let's create another variable that says to whom I say hello. And we're going to call it... I'm going to create a word. I'm going to do it differently. I'm going to call this variable, if I want to say the greeting to all, I'm going to say, let's keep the capital. Then I'm going to create a variable that is the end of my string. Maybe I want to be a little bit more enthusiastic. I want to say hello world with an exclamation mark. Let's call it punctuation, for short. It will be an exclamation mark. Well, there's nothing now, because I haven't run my cell of code. I will run my cell of code. These are all variables. Now I can say print my variable greeting, and now Python knows that greeting corresponds to hello; it will print the content of the variable. Let's say that I wanted to print hello world. I can put together these greetings, and I can give more than one argument to the print function. This will print hello world exclamation mark, with a nice space in between all the things. I don't like that space between world and my exclamation mark. Because these are strings, I can actually concatenate them. This is a property of this particular kind of variable. This wouldn't work the same if the variable that I defined was a number. But since it's a string, I can concatenate it with the plus sign. Oh, I didn't mean to do that. I have that very helpful but, at the moment, intrusive help window. I want you to see what's above as well. I'm going to concatenate the two strings with the plus sign. For proper, clean Python coding, I'm going to always put spaces around the operators that I use. So if I put a plus sign, there's going to be a space before and a space after.
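The variable exercise above can be sketched as follows. Only the names greeting and punctuation are clearly audible in the recording; to_whom and to_whom_excited are hypothetical names chosen for this illustration, and the string values are assumptions.

```python
# Variables: a name on the left, a value on the right.
# `to_whom` and `to_whom_excited` are hypothetical names for this sketch.
greeting = "Hello"
to_whom = "world"
punctuation = "!"

# One argument: print shows the content of the variable.
print(greeting)

# Several arguments: print separates them with a space by default,
# which leaves an unwanted space before the exclamation mark.
print(greeting, to_whom, punctuation)

# Strings can be concatenated with +, which removes that space.
# Note the spaces around the operator, as recommended for clean code.
to_whom_excited = to_whom + punctuation
print(greeting, to_whom_excited)

# Reassigning a variable changes what later code sees under the same name.
greeting = "Good night"
print(greeting, to_whom_excited)
```

The point of naming the pieces is that the last line did not need to change when the greeting did.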
Now I've modified, I've actually modified the string by concatenating it with the exclamation mark string. All right, let's say that I want to say hello to me. Let's call the string peda. Let's create a new variable that is a string, that is my name. I'm going to associate it with the name of the variable, and now I can say the greeting to me instead of to the world. OK, so a couple of things that you've seen while I was doing this very quick gymnastics with strings. You've seen that when I invoke a function in Colab (this is not a Python property, this is a Colab property) I get a very nice help message here that tells me how to use the function. The things that I want to focus on: here, it gives you an example of how you can use it. You can use print with a value, which is the thing that we're passing it. There are other things that you can pass it. These are called named arguments, for example. You can pass a separator; a space is the default. We've seen it before: I passed two things to this function, and it put a separator in between them. But I could do that differently. I could set that variable, the separator, to a string. Now, instead of putting a space in between the variable greeting and this combined variable, me plus punctuation, it will put that string. That's how you use the help of these functions. The prompt gives you examples, and then if you scroll down, it will tell you what the meaning of those examples is. For example, here, sep: string inserted between values, default a space. So far, so good. All right. Other things that you've seen, perhaps, is that the Colab notebook is very helpful in helping me finish my thoughts, if there are variables that are in memory, like greeting, or things that are already in what I call my memory stack because they are native to Python.
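Going back to the separator argument described above, here is a small sketch of how sep changes the output of print. The value stored in me and the separator string are placeholders, since the values used in class are not recoverable from the recording.

```python
greeting = "Hello"
me = "Federica"   # placeholder value for the name variable
punctuation = "!"

# Default behavior: a single space is inserted between the arguments.
print(greeting, me + punctuation)

# The keyword argument `sep` replaces that space with any string.
# The separator ", " is an arbitrary choice for illustration.
print(greeting, me + punctuation, sep=", ")
```

Running help(print) in a cell shows the same description of sep that the Colab pop-up displays.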
Colab always knows that, just because I'm coding in Python, there are some functions that exist in the language, like everything else that is listed here: functions and variables that already exist in Python. Or, if I've imported one of the packages that I was telling you about earlier, one of the things that will make it very easy to build neural networks, there will be more things in my memory stack, things that have been defined inside of the code that I've now ingested here. The only thing that I created that starts with gre is greeting. It suggests that that may be how I want to complete my word. I can use Tab to do that. It's super helpful, because a lot of times there will be typos. I'm a disaster at typing; you will see more typos this semester than you've ever seen. It will drive you nuts, and trust me, it drives me nuts too. But we all have our limits. Autocomplete helps me mitigate this issue. It's not so much that it's a little bit annoying that I have to go fix my typos, for sure. It's more that if I write a complex piece of code that is many lines long and I have an error, it might not be obvious from the error message where the error is and that it is a typo. Avoiding typos really helps. Speaking of error messages, let's produce an error message. If I do something wrong, like for example here. What am I doing wrong here? Anyone? Why do you expect that this will cause a problem? There is no variable greed, right? I haven't defined greed, and it doesn't happen to be one of the things that are in my memory stack. I know that because it's white, and if it was one of the things that are known to Python, like print, it would be a different color. It will give me an error. The error will say, in this case, NameError: name is not defined. This is a very simple error, it's a small error, it's easy to interpret.
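The undefined-name error described above can be reproduced with the sketch below. In class the cell was simply run and the error displayed; the try/except wrapper here is an addition, used only so the example finishes cleanly and shows the error type and message that normally appear on the last line of the traceback.

```python
greeting = "Hello"

try:
    # Typo: `greed` was never defined, only `greeting` was.
    print(greed)
except NameError as error:
    # These two pieces are what the final line of the traceback reports.
    error_type = type(error).__name__
    error_message = str(error)
    print(error_type + ":", error_message)
```

Without the try/except, the notebook would stop at the print line and show the full traceback, with the same information at the bottom.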
Generally, when I have errors, the way in which I go about them is that I always go to the bottom of the error and read the specific error message, and then go back to the top, because it may tell me where the error is. Here, it's compact; these are the only things that are contained in this error, but there may be a lot of things in between. Those things will be the various pieces of how the error propagated in the memory of the computer. Generally speaking, if you encounter an error, go to the end. It tells you what kind of error it is. That's a name error, and it gives you an explanation for that kind of error. And then here is where it tells you in which line it was encountered. OK, so I can fix it. And let's see. Then the other thing that I wanted to tell you... what time do we finish this class, 5:15 or 5:20? Am I done? I think we might be done. I think it's 5:15. OK, it's done. So then we'll get back to properties of variables on Thursday. On Thursday we're going to talk about properties of variables. We're going to do a little bit more Python coding. We're going to talk in more detail about how I will grade you. I'll give you a reading assignment, and we'll talk about how you turn in the homework with GitHub. We'll give you a very quick assignment: create a repository and upload the notebook that we're working on together. Yeah, I'll see you on Thursday. If you want to go, feel free to go. If you have questions, feel free to stay and ask. Thank you, Dr. Bianco. Bye bye, Dr. An. Thank you. Thank you, Don. Bye. Bye. Bye.
FDSFE week 1 class 1
From Federica Bianco August 29, 2023
Zoom Recording ID: 94679079946
UUID: vMJ3RKZaRBWwcPdjg11XaA==
Meeting Time: 2023-08-29 07:50:47pmGMT