}}
=The Coming-of-age Report=
''After 3 days+ of ungoverned reconnaissance, herein lies a comprehensive update of my beliefs regarding the Twitterverse, Twitter mining and the dream case that we seek from Twitter, here at McNair.''
<blockquote>'''Geo Visualization''' is the process of mapping tweets to a real map of the Earth. By applying '''tweet analytics''' and '''network visualization''' to it, we stand to gain an understanding of the geographical dimension of entrepreneurship activities in terms of people, organizations and events in particular places, for instance Palo Alto, CA or Austin, TX. When measured over time, we can observe the crests and troughs of activity in these places. This would be extremely promising, especially for the '''HUBS''' research project.</blockquote>
For simplicity, I will refer to the above aggregate as '''Viz&Ana'''
===Key Ideas===
*'''Viz&Ana: DaaS'''
**While exploring the web, I realized that DaaS firms focus on providing Twitter Viz&Ana services to businesses and individuals to enable data-driven decision-making. In other words, the Twitter data they mine is offered through a user interface that lets the client interpret Twitter as an observable phenomenon. Clients exercise their own judgment as to whether a marketing campaign or event organization is successful, and make decisions based on this Viz&Ana.
**To contribute to the research work at McNair, I would propose that we assemble tools and software in the spirit of a DaaS. In other words, Twitter Mining per se is not meaningful. Constructing a working system where researchers can observe the twitterverse, as if interpreting a primary source of data, is meaningful. For data scientists, running statistical analyses on outputs from this working system is meaningful.
*'''Portability & Flexibility'''
**This is the '''bit''' where we dream bigger than a run-of-the-mill SaaS, whose work ends when the Viz&Ana is delivered into the hands of the clients.
**Since the Viz&Ana is for research consumption, further research and analysis must be carried out on the graphs, maps and tables produced by the Viz&Ana. We would therefore do well to avoid blackbox scenarios where beautiful but inflexible graphs are produced but cannot be taken any further by the researchers. Open-source tools, a stronger backend and a good data management system are therefore important considerations when building our Viz&Ana system.
**In other words, I want data structures that can move between software packages, not just a poster to hang on the walls.
*'''"When measured over time..."'''
**Since Twitter represents the movement of trends, it is best interpreted as an organic body of knowledge that is contingent on the passage of time. Any Viz&Ana that we conduct on the Twitterverse must be able to be viewed and extracted (and further processed) as a function of '''time'''.
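The two ideas above, portability and time, can be sketched together. The following is only a minimal illustration under stated assumptions: the sample records, the field names <code>created_at</code> and <code>place</code>, and the helper names are all hypothetical, and a real pipeline would read mined tweets instead. It counts tweets per day and place, then serializes the result as CSV so the table can move on to other software.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample records (assumption); real input would be mined tweets.
TWEETS = [
    {"created_at": "2016-06-01T09:15:00", "place": "Palo Alto, CA"},
    {"created_at": "2016-06-01T17:40:00", "place": "Palo Alto, CA"},
    {"created_at": "2016-06-01T11:05:00", "place": "Austin, TX"},
    {"created_at": "2016-06-02T08:30:00", "place": "Austin, TX"},
]


def count_by_day_and_place(tweets):
    """Count tweets per (calendar day, place) pair."""
    counts = Counter()
    for tweet in tweets:
        day = datetime.fromisoformat(tweet["created_at"]).date().isoformat()
        counts[(day, tweet["place"])] += 1
    return counts


def to_csv(counts):
    """Serialize the counts as CSV text so other tools can read the table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["day", "place", "tweets"])
    # Sorting by (day, place) keeps the output in time order.
    for (day, place), n in sorted(counts.items()):
        writer.writerow([day, place, n])
    return buffer.getvalue()


counts = count_by_day_and_place(TWEETS)
csv_text = to_csv(counts)
print(csv_text)
```

Keeping the time bucket as an explicit column, rather than baking it into a rendered chart, is what lets the same table be re-plotted, filtered by date range, or joined with other data later.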
==Twitter Mining Tools==
===Blackboxes===
Legacy Viz&Ana software that started before the www revolution, such as Pajek, tends to be a blackbox whose functionality is developed by a dedicated team of commissioned engineers who know that their target audience is not likely to know code. Much Viz&Ana software, as you will see below, falls into this category.
===Modules and Scripts===
There is a large community of developers and researchers who are actively involved in developing open-source, free-to-use modules and scripts. Most of their work lies in one of the three aforementioned Twitter Mining approaches. I have not yet explored time-based or webhook-styled modules that we can harness, but am pretty sure that they exist.

The resources can all be built upon each other, with the help of intermediaries, to create the form of aggregate Viz&Ana that McNair needs. Having limited hands-on experience with different programming languages and with joining modules, I cannot offer optimal advice on how exactly to build them together efficiently. However, they all possess the capability.

''To be further inspected''

===='''Network Visualization'''====
*[https://pypi.python.org/pypi/sentiment_classifier Collection of R Packages] - see [[Field Notes]] for detail
*[https://sunlightfoundation.com/blog/2012/05/24/tools-for-transparency-a-how-to-guide-for-social-network-analysis-with-nodexl/ Intro to NodeXL] - see [[Field Notes]] for detail
*[http://nodexl.codeplex.com/ NodeXL Canon] - see [[Field Notes]] for detail
*[http://www.smrfoundation.org/scholarship/ Academic Scholarship on NodeXL] - see [[Field Notes]] for detail

===='''Tweet Analytics'''====
*[https://github.com/mayank93/Twitter-Sentiment-Analysis mayank93's Twitter Sentiment Analysis in python]
*[http://www.nltk.org/ Natural Language Processing Toolkit in python]
*[https://pypi.python.org/pypi/sentiment_classifier Sentiment Classifier in python]

===='''Geo Visualization'''====
*[https://github.com/ericfischer/datamaps ericfischer's Datamap in C] - see [[Field Notes]] for detail
*[https://www.mapbox.com/blog/visualizing-3-billion-tweets/ Geo Visualization on Mapbox] - see [[Field Notes]] for detail
==Dream Case==
=Field Notes=
