Saturday, December 13, 2014

fall 2014 -- the emergence of the scholars toolbox

So, classes are over and I finally have a chance to think about the business again.  I still have grading, but I'm taking a break for a moment, and typing out some ideas in this blogpost.

The business idea is to produce a researcher's toolbox, including data collection and analysis tools, such as


* everything you need to run a computer assisted telephone interviewing (CATI) lab,
               - a random phone number generator for CATI campaigns,
               - a full featured predictive dialing outbound call center with agent dial-in
               - full featured survey software


* tools for individual CATI, which are used for the recording and transcribing of face to face interviews, telephone interviews, and conference calls

* a full purpose survey product, which can be used for nearly anything, including automatic testing and grading.

* a mapping program for advanced display, upload and sharing of location based information.

* a website that transforms lyrics and associated content into easily accessible research data

* a series of animations of major sociological thinkers speaking about their ideas (total ~ 200 minutes)

* tools to facilitate importing data into Nvivo

* tools to facilitate teamwork and efficient use of resources


I've got several projects going.  I don't really do anything for these, other than come up with an idea and hire someone.  I do no management, and hiring happens pretty quickly.  Surprisingly, this takes very little time on my part.

I have a great team of very skilled developers working from their homes around the world.

Over the last year, I have completed some small projects, such as the catishack main site, built by web designer Moziful Islam from Dhaka, Bangladesh.  I also completed an automatic geocoding macro, and another macro that is very useful for importing data into Nvivo.  The macros were written by Dmitry Neduzhyi in Kyiv, Ukraine.

Oh, and when I say I've completed something I mean I've thought of something that would be useful, and I've hired someone to build it and they built it and it actually worked.


And when I say I'm working on something, or building something, that means I've hired someone to build my ideas.

So here's what's been brewing over the past few years . . .


1)  Shared Browser -- Building a browser that checks out passwords, so you can share access without actually sharing the password -- the password stays inside the browser, and access shuts down at the end of the checkout period.

Competitor:  Dashlane and one or two others.  I don't know that this will be a standalone product that is sold. I'm imagining that it will be an add-on for people that buy other services.  We'll see how useful it is once it is done.  The main developer is in Navsari, India.  His name is Preshit Desai. He is IBM Certified and holds a master's degree in computer science (2013).



2) Spatial Scholars -- I'm building a mapping program for the sharing of information on maps.  This allows the uploading of csv data with addresses and automatic geocoding, saving of maps, and use of others' maps and baselayer maps to create up to 15 layers of data, which can include audio, pictures, and video.  Users can save the map and share it with others.  I just hired someone for this job.  I got quite a few good candidates.  The project is called "spatial scholars" and was named by the lead developer in Multan, Pakistan, Awais Gillani Shah, who holds a postgraduate degree in Geographic Information Systems.


Competitor: PolicyMap and one or two others that are focused on social data. I paid $2000 for PolicyMap last year.  I'm imagining this will be done by the start of the spring 2015 semester, and that after at least a semester of testing, we may be able to sell access to the public, of course, at drastically lower prices -- maybe $200 per year -- and still most of that will be profit.




3)   Music Database Builder -- I'm still building the lyrics research site.  This uses the full 1 million unique song dataset from lyricfind (the only real source available) and combines that with five other music information services to provide a data repository.  The repository allows searches of lyrics and of other data not readily available with the lyrics (such as date produced and user comments), and produces output in csv database files that can be easily imported into Nvivo or even our mapping program that is under production.  I was able to hire a developer that seemed to have great experience but has yet to produce anything even though it's been several months.  He's the third or fourth person I've hired for this, and I went after him because he had experience with big projects, and wasn't busy, but he got busy with something else anyway. I paid a deposit upfront, so I have to be patient.  The lead developer is from Gurgaon, India.  His name is Pulkit Agrawal.

Competitor: No existing product lets users run multiple searches and produces results in database format for research purposes.  I doubt this site will ever sell access, since it is dealing with copyrighted material.  However, once it is up and running I do have permission to run advertising to defray expenses.




4) Statistics program.  The statistics program has fallen apart.   That guy strung me along for over a year, and didn't produce anything above the planning stage.  I let it go on too long before ending it.  He kept saying he was going to do it.   I think I may just take PSPP and rebrand it.  Why do I need to recreate the wheel?  And I could add the new visualizations, mapping, and all my qualitative stuff.  There are plenty of good free visualization programs that can be integrated.  Also, I would like to integrate my icati and survey and call center with the statistics program.  Main developer (retired): Mario Lacunza, in Lima, Peru.


Competitor: SPSS and several others.  Perhaps too much competition here to justify the expense of building something unique.  However, SPSS sells well for $1000 a copy, even though there are free alternatives that do the same thing.  There is such a wide need for this kind of tool that, even with the competition, there might be opportunity in the market.  I'm just unsure about spending the money here, as opposed to just rebranding one of the existing free alternatives to SPSS.





5)  The icati (Individual cati) is in pretty good shape.  We used it in all of my classes.  I sold id codes to the bookstore, and students each bought a 3 month subscription.  That was almost 200 users.  Each student did 5 interviews of 20 minutes each for the project, plus five very short 1 minute interviews in front of the library, and another five interviews during class.  The system records the interview and sends the recording via email.  It is also possible to use it to automatically transcribe.  We tested that to pretty good results.   It's not ready for prime time, but it's a neat proof of concept.    Ours is better than the rest because it actually relies on users joining and training the system to their voice, which is the preferred method for all professional-level transcription that expects precision for long passages of transcript.  There were some glitches with a 5 minute hangup on some students' accounts, but overall icati worked great.  And I now have the backend admin access too, so I can do renewals and everything on my own.  I also have a good list of stuff to do to improve the product in the future (we've already been through three rounds of revision).  I have a wonderful developer named Jehanzaib Younis from New Zealand who is leading the project.

Competitor: evoca (out of business 2014),  evoice, and several others (none focused on research).  Most of these are way overpriced.  The system I built has similar functionality to what others are charging 40-50 dollars per month for.  I can sell this at about 1/4 the cost of others and still make some profit.




6) Call center -- I need to get the call center moving.  It really hasn't been used in a while, and I'm not sure if it will be.  There are some people tied to faith in community that expressed interest in using it to do some call center work on the housing blight issue. I want to build a call center that is in the cloud and fully integrated with my survey product, mapping product, and social media.

Competitor: lots of competition in general -- genesys, promero, live ops engage -- but little competition for academic research market -- sawtooth (very expensive!).  Users can easily pay $150 per seat for many providers.  My current provider -- which I whitelabel as catishack -- charges me $70 per user, plus 1 cent per minute phone charges. I want to build my own so I can just pay the 1 cent per minute. Then I can offer the product at only $20 or $30 per seat and still make enough to cover expenses, etc.
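
To see how much room those per-seat prices leave, here is a rough back-of-the-envelope sketch in Python.  The $70 seat fee, the 1 cent per minute, and the $20 to $30 seat price come from the paragraph above; the monthly minutes per seat is purely an assumption for illustration, and server and development costs are ignored.

```python
# All amounts are in cents to avoid floating-point rounding.
PER_MINUTE_CENTS = 1          # phone charge per minute (from the post)
WHITELABEL_SEAT_CENTS = 7000  # current provider's monthly fee per user (from the post)


def monthly_cost_whitelabel(minutes):
    """Cost per seat with the current provider: seat fee plus minutes."""
    return WHITELABEL_SEAT_CENTS + PER_MINUTE_CENTS * minutes


def monthly_cost_own(minutes):
    """Cost per seat on a self-built system: minutes only (servers ignored)."""
    return PER_MINUTE_CENTS * minutes


def break_even_minutes(seat_price_cents):
    """Minutes per month at which phone charges use up the whole seat price."""
    return seat_price_cents // PER_MINUTE_CENTS


assumed_minutes = 1000  # hypothetical usage per seat per month
print(f"current provider: ${monthly_cost_whitelabel(assumed_minutes) / 100:.2f} per seat")
for price in (2000, 3000):
    margin = price - monthly_cost_own(assumed_minutes)
    print(f"${price // 100} seat: margin ${margin / 100:.2f}, "
          f"break-even at {break_even_minutes(price)} minutes")
```

Under that assumed usage a $20 seat leaves $10 per month, but an agent who is on the phone more than 2,000 minutes a month would cost more in phone charges than a $20 seat brings in, so heavy users may need the $30 price or a minutes cap.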





7) The number generator program never got finished because the developer got sick.  He ended up refunding a portion of the money, although the site, Guru, is still holding it.  I don't like them. I will get the money, eventually.  Then I have to figure out how to find someone that can build off this incomplete project.  Developer (retired), in Cairo, Egypt, Mohammed el Malt.

Competitor:  not many competitors here -- Genesys sampling systems.  There may be one or two others that provide number lists for research, but all of them are very expensive.  Once built, this system has no significant operating costs (other than server space). I can offer numbers at 1/50 of what existing providers charge.



8) Full featured survey program. The program has been working well, although I did get some weird notices that it was timing people out.  I didn't have time to look into that, but otherwise it seems to have worked well for our 700 person survey. I haven't really used it for video yet. Developers: Gabriel Jenek and Amit Kumar in Bokaro, India.

Competitor: SurveyMonkey and many others; most of them are way overpriced.   I can offer similar quality at 1/4 the price.


9) Full featured tool for online exams, testing, and building learning communities.  I will be able to offer this for free.


10) Animations of dead sociologists

------------------------------------------------------------------------------------------------------

SPRING 2015 UPDATE




SCHOLARS TOOLBOX

All software is in beta and undergoing regular updates; offerings still in initial development are in italics.

CATI -- computer assisted telephone interviewing

-          Call center software – dial in and out, recording, sms, ivr, TTS, survey, ASR (built from asterisk, at&t speech recognition transcribes both sides of the conversation!)

-          Survey software – hundreds of features, media capability, full security (built from limesurvey)

-          Randomnumberlist.com -- Random number generator – produces random list by city or area code of up to 50,000 numbers in one second; uses only valid area codes and prefixes; includes removal of do not call, business, fax, landline, cell.
o   Under construction– voter registry – includes phone number, address, demographics, etc.  integrated with random number generator so generator randomly produces numbers and all but those on the voter list are removed
o   Also, the user will be able to specify removal of numbers by location (city, state, zip), ethnic surname, and demographics for residential, business, and cell phone.   Business type will become another removal category (the database lists businesses by dozens of SIC/NAICS codes).  If a user wanted only aerospace businesses, for example, he would remove all but business, and all but aerospace.  The system will produce a random list, remove all but business, then within that, remove all but aerospace.  The only numbers left are our random numbers that also match those in the aerospace database.  Since all numbers are specific to a location, the user must specify the location of the business (say, to catch only aerospace businesses in New Jersey, in Omaha, or in the zip code 93210).  The user will also be able to specify two other removal criteria for businesses – the size of the business, and the ethnic surnames of business contacts.   Most of these are removal criteria where the user checks boxes to remove numbers that match the criteria, but some (like location) will require a textbox entry. The numbers remaining after removal are output to the user.
o   Most data from http://www.usbizdata.com
o   TOS specifies no appending lists, but we are just using their data to compare to our data.  We are not appending our data with theirs.  Our data is produced randomly and is proprietary.   We are using their data only in the backend, to compare our data to theirs, and only to limit how much of our data is shown to our users.  Our service only provides users phone numbers that have been randomly produced by our program.  Users  only can see our numbers, nothing more.  Our use of their data is totally back end.  It is not appending any list. It is not reselling their data.  It is just using their data for comparisons to our data, to help us understand and search our data better.  When the comparisons are made, we do not save the comparisons.  Their list remains intact and no part of it is merged with our data.  No part of their list is available to our users.
o   Excellent TOS http://www.atozdatabases.com/termsandconditions

-          Icati -- Voicemail recording software (for face to face interviews) – auto-email recordings to user, box account; ASR (built from asterisk; moving to at&t, perhaps)

-          mytranscripts.org - Transcribing software – ASR (at&t), dropbox and youtube integration, works with any file

-          Titan Naturally Speaking – speech recognition software with html5, html, android app (at&t)

-          text blaster  and email blaster – includes campaign manager, etc., for managing email and text campaigns.  Similar to trumpia - https://www.dropbox.com/s/np2w847w13kac4a/blaster.PNG?dl=0
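
The random number generator flow described above (generate random numbers from valid area codes and prefixes, then remove everything that fails the chosen criteria) can be sketched roughly as follows.  This is only an illustration, not the actual randomnumberlist.com code: the prefix table, the reference directory, and the field names are all made up for the example.

```python
import random

# Hypothetical table of valid prefixes per area code; a real system would
# load these from a maintained list.
VALID_PREFIXES = {"559": ["221", "278", "453"], "402": ["330", "444"]}


def generate_numbers(area_code, count, rng=None):
    """Return `count` distinct random 10-digit numbers for one area code."""
    rng = rng or random.Random()
    prefixes = VALID_PREFIXES[area_code]
    if count > len(prefixes) * 10000:
        raise ValueError("not enough distinct numbers for this area code")
    numbers = set()
    while len(numbers) < count:
        line = rng.randrange(10000)  # last four digits, 0000-9999
        numbers.add(f"{area_code}{rng.choice(prefixes)}{line:04d}")
    return sorted(numbers)


def keep_matching(numbers, directory, **criteria):
    """Remove all numbers except those whose directory record matches
    every criterion, e.g. kind="business", industry="aerospace"."""
    kept = []
    for number in numbers:
        record = directory.get(number)
        if record and all(record.get(k) == v for k, v in criteria.items()):
            kept.append(number)
    return kept


# Made-up reference records standing in for the purchased business data.
directory = {
    "5592210001": {"kind": "business", "industry": "aerospace", "state": "CA"},
    "5592210002": {"kind": "business", "industry": "retail", "state": "CA"},
    "5592210003": {"kind": "residential", "state": "CA"},
}
candidates = ["5592210001", "5592210002", "5592210003", "5592789999"]
businesses = keep_matching(candidates, directory, kind="business")
aerospace = keep_matching(businesses, directory, industry="aerospace")
print(aerospace)
```

Note that the directory is only consulted for comparison; the output list contains nothing but numbers that were already in the randomly generated input.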

 

Data sharing


-          Moviescholars.org – search, visualize and database download of movie metadata; integrated with html5 viewer – with browser plugin to allow our data skin to be used on any movie streaming site – and our database of movie trailers.  Full integration with social media, sms, mms, bulletin board for discussions, auto tagging, upload, download, etc.

-          Tvscholars – search, visualize and database download of television metadata; integrated with html5 viewer – with browser plugin to allow our data skin to be used on any movie streaming site – and our database of movie trailers.  Full integration with social media, sms, mms, bulletin board for discussions, auto tagging, upload, download, etc.

-          Tunescholars.org -- search, visualize and database download of lyrics and music metadata; integration with spatialscholars, timescholars

-          Newsscholars.org – combines bulletin board, blog, and rss feed aggregator; includes audio and video capability; produces a shared or private rss feed aggregator, including user comments for each feed.  The admin chooses which tag topics are on the frontpage, to encourage discussion of certain topics.  A user can build a small community themselves and be their own admin and control the highlighted discussions.
o   Use mongodb to auto create tables of tags.  Auto tag each feed with a 16 digit number based on source (first five digits), date (next six digits, which include month/year), and topic (last five digits – topics chosen by admin and also autotagged).   Autotagging by topic is done by auto-searching titles for admin-defined topics, and automatically tagging based on those.
o   On the frontpage the user can choose to display by topic, date or source.  These favorite tags are displayed on the frontpage with a count of how many feeds fit the tag, and a link to a page displaying all feeds with that tag.  None of this is particularly novel, with the possible exception of the auto-tagging, and all of it should be able to be built with open source tools.
o   Full integration with icati so the user can buy a number and have all interview links auto-upload to the user's account; the user can tag the uploaded audio, and then it will be auto-added as a comment on the feed.  Eventually we will allow the user to use ivr to type in the first part of the tag record number of the feed at the beginning of the call to determine to which feed to auto-post the interview url as a comment.
o   Integration with our mms client so the user can take a picture and record audio.  Integration with our full conferencing client so the user can record video interviews – all interviews posted as comments to the rss story.  Easy social media integration (autopost to all), and integration with sms and mms, to allow complete sharing.  Word clouds from Google charts for innovative searches.
o   Visual integration of feeds and comments with spatial scholars and timescholars.  Integration with NLP / Qualitative analysis tools.  “Auto-quantify the news” algorithm using smart NLP auto-searches of people, places, dates, major events, and sociological topics, quantifying them in crosstab tables.  Integration with Quantitative tool.  (Built from bulletin board, rss, blog, etc)
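
As a rough sketch of the 16 digit feed tag described for Newsscholars (five digits for source, six for date, five for topic), something like the following would do.  The lookup tables, the YYYYMM layout of the date digits, and the use of 00000 for "no topic found" are all assumptions for illustration; the plan above only fixes the digit counts.

```python
from datetime import date

# Hypothetical lookup tables; in the plan above the admin defines the topics.
SOURCE_IDS = {"nytimes.com": 1, "bbc.co.uk": 2}
TOPIC_IDS = {"housing": 10, "immigration": 11, "education": 12}
UNTAGGED = 0  # assumed topic id when no admin topic appears in the title


def topic_for_title(title):
    """Return the id of the first admin-defined topic found in the title."""
    lowered = title.lower()
    for topic, topic_id in TOPIC_IDS.items():
        if topic in lowered:
            return topic_id
    return UNTAGGED


def make_tag(source, published, title):
    """Build the 16 digit tag: 5 digits source, 6 digits date, 5 digits topic."""
    date_part = published.year * 100 + published.month  # YYYYMM (assumed layout)
    return f"{SOURCE_IDS[source]:05d}{date_part:06d}{topic_for_title(title):05d}"


def split_tag(tag):
    """Split a tag back into (source id, YYYYMM, topic id)."""
    return int(tag[:5]), int(tag[5:11]), int(tag[11:])


tag = make_tag("bbc.co.uk", date(2015, 3, 14), "Housing blight spreads downtown")
print(tag)             # 0000220150300010
print(split_tag(tag))  # (2, 201503, 10)
```

Because the source digits come first, the "first part of the tag" that a caller types into the ivr would narrow things down by source and then date, which may or may not be the most convenient order for picking a single feed.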

-          Countryscholars -- search, visualize and database download of country statistics.  Integration with NLP and statistics; integration with blog, socialmedia, sms, mms; integration with spatialscholars for auto-upload of countryscholars output.


-          Tunetext.org – search, listen, upload, share, discuss, download music and music playlists, with streaming lyrics and metadata, metadata randomization, and customizable audio/video TTS DJ to introduce the songs with metadata.  Integration with amazon; integration with spatialscholars, timescholars; browser plugin to allow play of music on any streaming site using our player (amazon, etc)  (built off ampjuke)

-          Movietext – search, view, upload, share, discuss, download movies via official trailers, playlists of trailers, and youtube embeds of movies/clips.  Full integration with social media, sms, mms.

-           Spatialscholars.com – upload, download, share, and visualize data on maps;  includes database, audio and video files

-          Timescholars – upload, download, share, and visualize data on timelines;  includes database, audio and video files

-          shareboard – like quora; will be integrated with other software (built off question2answer)

-          textscholars – search, upload, listen/view, and discussion of books, articles, and other scholarly material (much scraped from google scholar)

-          heritagescholars – genealogy site with audio, video upload, sharing, etc  (built from PhpGEDview, family connections, and Webtrees)

-          archivescholars – archive website built around social science disciplines, beginning with sociology.  All major areas of sociology have their own subdomain, and provide a forum for viewing, downloading, storing, sharing, and uploading of documents, pictures, audio, video, and structured databases (excel files).  Two main pages exist for each subdomain – primary documents (i.e., traditional online archival databases of written documents) and secondary documents (i.e., textbooks).  The archive for sociology provides demos for textbooks, and the sociology of culture and the sociology of social movements provide full demos, including both textbooks and primary documents.


Productivity

-          Hookus.org -- Web conferencing – audio/video conferencing and screen sharing with recording and no downloads (built on big blue button; working on webrtc version)

-          Statscholars.com -- Statistics – upload/download data, basic social statistics, annotated output, visualizations (built on apache statistics)

-          Qualitative analysis – Nvivo alternative, offers several major improvements.
§  oqan.org
§  OQAN
§  Online Qualitative Analysis Now

A dozen reasons why my qualitative data analysis solution will be better than Nvivo:

§  1) online format.  Install is a huge problem with the existing product, with no solution.  Users have to travel to a computer lab where Nvivo is installed, and fight for a physical computer.  Really?  Yuck!
§  2) sharing and online storage.  Nvivo charges for server edition which allows sharing and access to a shared nvivo file, but still each user must have nvivo installed on their computer.  There’s no real collaboration.
§  3) Collaboration – we not only deal with the issue of multiple users working on the same file, we encourage collaboration by automatically giving each user his or her own public and private web presence (own subdomain), with a responsive site integrated with the user’s social media, along with chat, webcam, dial-in or log-in conferencing, and shared whiteboard.  Each newly opened project is loaded into its own webpage with each of these tools, on the user’s subdomain.  The user can decide to make both the project webpage and the related data for the project public or private.  Public webpages are available on the web, and the link to each newly published page is presented on our webpage.  Public data is shared with all users in the shared projects folder in the user’s storage account.
§  4) Accessing data -- scraping.  Currently there is some primitive scraping allowed in the current nvivo but it repeatedly stops at a relatively small amount, and offers absolutely no ability for tweaking by the user.  Ours is built for news and comments from fifty top news sites, as well as access to the full feed from social media on twitter, youtube, and facebook.  Users can use our scraping tool to turn the web into database data in minutes.
§  5) Dataset Size.  Nvivo could not handle large datasets very well.  Sadly, Nvivo was not built for big data.  Ours can handle big data much better than even Nvivo server edition, which was not at all feasible for individuals or even nonprofit teams due to the extremely high price of nearly 10k (and each user still had to spend the $600 for the program).  Our solution was built with products built after the emergence of the big data revolution, meaning that we can handle datasets not ten times larger than Nvivo but at least one hundred times larger.  A regular computer running Nvivo would get bogged down by a dataset of 10,000 entries, where only one column contained qualitative data. Querying a dataset of 100 mb of data would crash nvivo on substandard computers, and with the best computer you could get to 700mb before things started acting up.  Our setup is better. Because all users query existing databases within a strong central datacenter built for big data search and retrieval, rather than accessing the databases on their weak personal machines, we can easily analyze 100 times the amount of data as Nvivo. A million datapoints or even a 10gb datafile --- these are big but not impossible to query with today’s big data tools.
§  6) Advanced file conversion – Nvivo does fine with 20th century file conversion, but what about simple things in today’s age, such as ASR for interview transcripts, and OCR for image files that contain text (e.g., pdf, old newspapers, books)?  We’ve got it covered!
§  7) Corpus of public data – Nvivo is a nice tool, but its utility is limited unless you know how to find good data.  We offer not only scraping, but also pre-made datasets of public documents in a shared folder within each user’s account.  This includes all major .gov datasets and Big Data solutions that include metadata from tens of thousands of movies and over one million songs.  It also includes over one million articles, books,  audio and video recordings in the public domain that are already imported into the program, so the user can begin analysis as soon as opening the program and choosing the correct file from the shared folder.   In Nvivo, inputting data can be a major pain, as can its inability to handle large data without crashing. These massive datasets are cleaned and entered into our analysis program by admin and defined as shared, and thus available in each user’s online account upon login.  All users have to do is open their account and click on the file they want to analyze.  If users want to combine multiple files, that’s no problem.  Most files are updated automatically by api, so the users always are assured of having the freshest available data.
§  8) Analyses.  Nvivo is a qualitative tool, while ours is a mixed methods tool.  We offer a basic statistical tool so you can recode your quantitative data and do all the major social science statistics, such as all the basic frequency and descriptive statistics, chi square, t-tests, and even multiple regression.  We also offer much better qualitative analysis than Nvivo.  Nvivo offers a few basic queries – matrix query which is just a matrix of results from a series of row and column variables, a text search query, and a word frequency query, which counts the frequency of all the words.  It’s actually only slightly more powerful than google in its ability to query.  The best thing about the nvivo queries is that they can be run on small slices of the data, based on attributes such as year or gender, such as doing a query on a series of years, or on only women.  Nvivo also offers basic auto coding where attribute values are automatically connected with codable values – again really just a simple matrix query with auto saving of output.  Our queries include these basics, but also include a massive amount of power through choice of functions provided by the major providers of NLP and machine learning, which can also be applied based on data slices.
§   9) Transcribing/subtitling tool – Nvivo has a transcribing tool but it is primitive. Ours is integrated with personal online storage and youtube, shared storage among our users, speech recognition, and can place the transcripts as subtitles to video.
§  10) Output – the output for nvivo is absurd.   You can output anything as a word document but the export of html pages that supposedly allows interaction with the dataset among non-Nvivo users just doesn’t work.  Our output saves natively as html5 files.  That means all our stuff is built into the browser.  People can access the full datasets and output without ever accessing or downloading any data.   Users choose whether to make their output pages public or private.  The original data can remain private, while users can have just the right amount of permitted access to query the data.  The project page created for each project is the same for public and private, but the public version can be seen by anyone while the user and those people with the site password are the only ones that can see private pages.  Public pages and shared data can be seen on the website and are shared in the cloud storage account, to which each user has access, to encourage sharing.
§  11) Visualization – Nvivo offers poor visualization.  We use open source visualization, and make the creation of visualizations a major part of the output, not an afterthought.  Because we offer statistics and visualization of the quantitative data as well, our solution is truly mixed method.  The auto-mapping is the most promising visualization feature, but timelines and many other mapping tools are promising as well.  Users can easily download any or all visualizations of the data.  Timeline and mapping data can also be exported to the spatialscholars and timescholars websites, for further visualization control.
§  12) Price and profit – Want to send a ton of your hard earned dollars to Australia, or donate a little to a California educational nonprofit?  Our system is based mostly on open-sourced tools, which means that we can offer a community based service that is free. It allows limited size of upload, limited sharing with other users, and no access to public data, but otherwise the free account includes all other functions.  Our unlimited plan is only $60 per year.  Private data is saved after cancel for one year for free, and longer with small payment.  At $60 per year for full access, it would take about twelve years to equal the purchase of one copy of Nvivo.  You might think, hey, but with Nvivo I own the product.  That’s right but after 6 years they will release a new version and you own an outdated product that has been mostly abandoned by the company, with no support, and no forward compatibility – no ability to open new Nvivo files.  All our files will be forward and backward compatible.  We will not create editions to make you buy a new version of our product.  Nvivo editions upgrade about every six years, when you need to buy a new edition.  In six years you can use one version of Nvivo for over $750 or you can use ours for $360.  Our clearly superior nonprofit solution is about half the price. Welcome to the new world of online qualitative analysis.
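
The price comparison in point 12 can be checked with a few lines of Python.  The $60 per year, the "over $750" license, and the roughly six year edition cycle are taken from the text above; the assumption that a Nvivo user buys a fresh license at the start of every cycle is mine, added only to show how the gap grows over time.

```python
OQAN_PER_YEAR = 60    # unlimited plan, dollars per year (from the text)
NVIVO_LICENSE = 750   # "over $750" per edition (from the text)
EDITION_YEARS = 6     # approximate Nvivo edition cycle (from the text)


def subscription_cost(years):
    """Total paid for the online plan after `years` years."""
    return OQAN_PER_YEAR * years


def nvivo_cost(years):
    """Total paid for Nvivo, assuming a new license every edition cycle."""
    editions = -(-years // EDITION_YEARS)  # ceiling division
    return NVIVO_LICENSE * editions


for years in (1, 6, 12):
    print(years, subscription_cost(years), nvivo_cost(years))
```

At six years this gives $360 against $750, which is the "about half the price" figure, and twelve years of the subscription ($720) is still just under the cost of a single Nvivo license.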

-          OfficeNow – word processor, presentations, and database, all online, tied to storage, easy collaboration, import and export from word (openoffice, google, etc)

-          mytranscripts.org -- Subtitle program – easily add subtitles to video, and integrate with youtube and dropbox

-          Textuary.org – standalone Sms texting from web to/from cell; MMS to be developed too; will be integrated with other software (built from Plivo)

-          Super converter -- convert from any audio/video/text format to any audio/video/text format                

o   http://vbridge.co.uk/2012/11/05/how-we-tuned-tesseract-to-perform-as-well-as-a-commercial-ocr-package/

               
-          licenseserve.com -- password server – for securely sharing passwords without exposing the password

-          Smartclass.org – Blackboard alternative for online learning (Moodle based)

-          Calendrical – online signup sheet/calendar/etc -- doodle on steroids (built on webcalendar)

-          bittyurl – turns any long url into a short one

-          mycloud – storage service akin to dropbox or box

-          nonprofitwork – jobsite for nonprofit job searches and volunteer work (built from jobberbase)

-          donatenow -  crowd funding site for nonprofits (built from ?)

-          audio/video editor – online audio/video editing program with basic editing tools (built from ?)

-          speech recognition – standalone ASR speech recognition site (built from at&t for longer passages, google for shorter ones, also uses open source sphinx)

-          Screen capture/screencast – alternative to camtasia

-          NLPNOW – standalone natural language processing site; user upload of data, and download of output; all major NLP tools compared.  Allows display/save/email/sharing of output. Also will be integrated into Qualitative Program, and several of the data sharing programs (e.g., moviescholars, tunescholars).  There’s a great schematic, here,


o   Here’s the job description,

o   This job is to develop an NLP engine that provides the user the ability to choose between all the major NLP providers and their various NLP functions, using a single source of data.  We will tentatively call this project NLP compare.

o     The data source used for the NLP compare functions comes from two places -- user upload on the NLP compare site, or data sent via api.

o   For the site, this is a simple site with user login, simple admin backend, user upload of data to be run through our NLP engine, and display/download of the NLP output.

o   For the api, anyone holding a valid api key could send us input, which we would then run through our NLP engine, and send them the output back via return api. 

o   This system would primarily use apis to incorporate the unique functions of various open source tools (gensim, CLiPS pattern, mbsp, carrot2, uima, gate, rxnlp, Lingpipe, Libshorttext, etc).  If in testing, functions from competing providers produce exactly the same output, then that should be noted and only one of the repeated NLP functions is included.   However, since each NLP provider computes NLP in different ways, they likely will produce different output for the same NLP function.  Thus, it makes sense to use all unique functions, where uniqueness is based on output, not the name of the NLP function.

o   The budget is negotiable, but because this is for the college classroom and other nonprofit educational use, thank you for keeping costs low.
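
The "uniqueness is based on output, not the name" rule in the job description could be tested along these lines.  This is a minimal sketch under my own assumptions: the provider functions here are stand-in lambdas rather than real api calls, and the labels and sample texts are invented for the example.

```python
def unique_functions(functions, sample_texts):
    """Keep one function per distinct output on the sample texts.

    `functions` maps a label such as "a.tokenize" to a callable.  Two
    functions count as duplicates only if they return equal output for
    every sample, regardless of what they are called.
    """
    seen = {}        # output signature -> label of first function producing it
    kept = {}        # label -> callable, for functions with new output
    duplicates = {}  # dropped label -> label it duplicates
    for label, func in functions.items():
        signature = tuple(repr(func(text)) for text in sample_texts)
        if signature in seen:
            duplicates[label] = seen[signature]
        else:
            seen[signature] = label
            kept[label] = func
    return kept, duplicates


# Stand-in "providers"; real ones would wrap calls to the NLP tools.
functions = {
    "a.tokenize": lambda t: t.split(),
    "b.tokenize": lambda t: t.split(" "),
    "c.words": lambda t: t.split(),
}
samples = ["the quick brown fox", "two  spaces here"]
kept, duplicates = unique_functions(functions, samples)
print(list(kept))   # ['a.tokenize', 'b.tokenize']
print(duplicates)   # {'c.words': 'a.tokenize'}
```

Here the two functions named "tokenize" are both kept because they disagree on the second sample, while "c.words" is dropped despite its different name.  Agreement on a handful of samples does not prove two functions always agree, so the test set would need to be reasonably broad.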

-          WriteNow – online grammar and plagiarism check, uses NLP for pos parsing, other open source tools for grammar and plagiarism.  Integrated with online storage.  Brings together best of existing tools,  http://elearningindustry.com/top-10-free-plagiarism-detection-tools-for-teachers, as well as top tools for grammar.

-          translateNow – standalone translation tool for uploaded/downloaded large documents, translation chat (built on Zanata, Hablator).  May be integrated into other products.

-          FTPNow – online FTP; will be integrated into other programs such as tunetext and others (built on MonstaFTP)

-          ScrapeNow – standalone online scraping software, turns websites into databases; scrapes major news sites, twitter, youtube, and facebook and outputs into database format.  Allows display/save/email/sharing of output. Will be integrated with other software (built on Jubatus, XMS, scrapy api, imacro api, import io, and others)

-          Scrapenparse – tools for scraping and parsing news sites, imdb, and others

-          machinelearningNow – standalone machine learning site that combines the best of existing tools (see http://butleranalytics.com/20-free-data-mining-software-platforms/), probably built on Jubatus, with other algorithms added.  Allows user to upload documents, display/save/email/share results.   Will be integrated into other products such as Qualitative Program, moviescholars, tunescholars.

-          PhotovoiceNow – standalone Android/iphone app that takes a picture, and records an audio description of the picture, saves to the user’s storage, and emails a link to the user.   Audio is also automatically transcribed by ASR, and added as a subtitle to the picture. The transcript is saved along with the picture and audio as one package.   A website is created that has the picture, audio, and transcript, and that is emailed to the user (and the user’s social media if directed).  The system saves pic, audio, and transcript together on the user’s external storage account (built from Mycloud).   A mobile ready site allows the user to log in to the account, access private recordings, use links to quickly share recordings with others on this site and on the user’s social media accounts, and view all shared recordings.  The frontend site includes a public display of users’ photovoice submissions that are designated by users as public, browsing of submissions and search of voice transcripts, and an embeddable flash/html5 player that shows the picture and plays the audio (frontend built with open source video xxxx).   This will be integrated into other software.




