add better normalization. add link similarity.

matt
2023-05-07 22:07:26 -07:00
parent 3a6f97b290
commit 4bd9f46edd
7 changed files with 383 additions and 29 deletions


@@ -1,5 +1,29 @@
# Data Mining - CSCI 577
# Project Status Report IV
*2023-04-25*
This project report will take the form of an initial draft of the final report, making use of the template discussed in class and made available on Canvas. Minimally, this draft should include the following:
1. Data preparation
2. Policy for dealing with missing attribute values
3. If your project is one of classification, discuss:
    a. Intelligent discretization
    b. Identification of useless attributes
    c. Policy for violations of the adequacy condition and missing attribute values
4. If your project is one of clustering:
    a. Elimination of noise attributes
    b. Proper choice or development of distance measures
5. If your project is one of association rule analysis:
    a. What are the "market baskets"?
    b. How are thresholds for support and confidence developed?
6. In all cases, you should specify:
    a. What computational experiments you have conducted, or plan to conduct.
# Project Status Report III
*2023-04-18*
@@ -35,6 +59,10 @@ I will use the following suite of python tools to conduct my research:
> This progress should also provide a definitive description of your purpose and how you intend to conduct it.
> This should take the form of a detailed outline of the procedures you will undertake in exploring your dataset(s) and maximizing the knowledge that can be extracted from it.
The ultimate purpose of the project is to track the progress of political discourse as a function of time and publisher.
To that end, the project will classify the article titles in a dataset of titles and publications using a sentiment analysis language model.
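
As a rough illustration of tracking sentiment by time and publisher, the sketch below averages a per-title score for each (publisher, year) pair. Everything in it is an assumption made for illustration: the `score_title` function is a tiny word-list stand-in rather than the sentiment language model, and the row layout, outlet names, and titles are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in for the sentiment model: a tiny word lexicon.
POSITIVE = {"win", "growth", "peace"}
NEGATIVE = {"crisis", "attack", "scandal"}


def score_title(title: str) -> int:
    """Return (# positive words) - (# negative words) for one title."""
    words = title.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_by_publisher_year(rows):
    """Average title score for each (publisher, year) pair.

    rows: iterable of (publisher, 'YYYY-MM-DD', title) tuples (assumed layout).
    """
    buckets = defaultdict(list)
    for publisher, date, title in rows:
        # The year is the first four characters of an ISO date string.
        buckets[(publisher, int(date[:4]))].append(score_title(title))
    return {key: mean(scores) for key, scores in buckets.items()}


# Made-up example rows, not taken from the project dataset.
rows = [
    ("Outlet A", "2020-03-01", "Markets in crisis after attack"),
    ("Outlet A", "2020-09-12", "Peace talks resume"),
    ("Outlet B", "2021-01-20", "Growth returns"),
]
trend = sentiment_by_publisher_year(rows)
```

Swapping `score_title` for a call to an actual sentiment model would leave the grouping logic unchanged, which is the point of keeping the two steps separate.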
\newpage
# Project Status Report II