From 8783d7ae582417c97338a6e6e20265129810a2aa Mon Sep 17 00:00:00 2001
From: Lucky <66523959+l-ucky@users.noreply.github.com>
Date: Thu, 30 Nov 2023 23:50:42 -0400
Subject: [PATCH] Update README.md

---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index 58f8383..74c3b6d 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,8 @@
 # 4 Chan Webscraper, Version 2
 
+**Notice:**
+I will be reuploading all of the files with their dates in a dedicated column. I left the date column out and don't know how I overlooked this. Oops. I have 100+ CSVs to edit, but I'm busy with exams right now. Maybe over Christmas break.
+
 Consider doing your own data analysis. If you save your CSV, and make a pull request, I can add it to this repository for plotting word usage changes over time.
 
 **Highlights:**
@@ -13,6 +16,7 @@ Consider doing your own data analysis. If you save your CSV, and make a pull req
 - Differs from V1 by scraping all replies to OP, and has a much larger noise filter.
 - Sentiment analysis is also performed.
 - **X number of "posts by this ID" with graphical representation.**
+- Working on a "Word by Time Series" plot of word usage changes over time. I'll need to remove useless words to home in on important (i.e., relevant) words. For example, "woman" isn't a relevant word; it's just noise.
 
 **Limitations**
 - 4chan will recycle poster IDs, so they are not unique identifiers. Therefore, the data mining on n-pbtid isn't fully accurate for the upper-bound, but it should be closer to representative at the lower-bound.