1. How to create a singleton database connection in Python

    Here is an example that creates a singleton database connection to a MySQL server. The Python version used is 2.7.3.

    1. mysql --host=localhost --user=username --password=yourpassword

    2. mysql> create database dbname;

    3. mysql> use dbname;

    4. Create a table in MySQL using the CREATE TABLE command.

    The class DbConnection creates a singleton database connection to the MySQL server.

    To make a variable a singleton, add it in the __init__() of the inner class Singleton.

    import MySQLdb

    class DbConnection(object):
        """Proxy class: every instance shares one inner Singleton object."""
        _iInstance = None

        class Singleton(object):
            def __init__(self):
                # add singleton variables here
                self.connection = MySQLdb.connect("localhost", "root", "root", "testblog")

        def __init__(self):
            # create the shared Singleton only on first use
            if DbConnection._iInstance is None:
                DbConnection._iInstance = DbConnection.Singleton()

        def __getattr__(self, aAttr):
            # delegate attribute reads to the shared Singleton
            return getattr(self._iInstance, aAttr)

        def __setattr__(self, aAttr, aValue):
            # delegate attribute writes to the shared Singleton
            return setattr(self._iInstance, aAttr, aValue)

    class TestConnection(object):
        def get_connection(self):
            return DbConnection().connection

    To test, create some TestConnection objects and call get_connection() on each:

    print TestConnection().get_connection()
    print TestConnection().get_connection()
    print TestConnection().get_connection()

    [Sample output]

    <_mysql.connection open to 'localhost' at 93d6064>

    <_mysql.connection open to 'localhost' at 93d6064>

    <_mysql.connection open to 'localhost' at 93d6064>

    We can see that all the objects we create use the same database connection.

    References:

    http://www.developer.nokia.com/Community/Wiki/How_to_make_a_singleton_in_Python

  2. Tame your learning Machine - Database Connection and Classification using WEKA

    Dream

    Weka is a powerful machine learning and data mining tool developed at the University of Waikato, New Zealand. Weka packs the most popular classifiers, clusterers and associators (if you are wondering what these are, you should get your basic machine learning vocabulary right) into an easily usable Java package. So for all those undereducated machines out there, and their mahouts struggling to get them tamed, do take a peek into the world of Weka.

    The field of machine learning is far too wide for a single write-up like this to cover, so I will stick to a few areas that Weka developers come across almost every time they build a machine learning framework.

    One of the best features of Weka is that it can work directly with most popular databases. This saves you the trouble of porting the huge amounts of data you have accumulated into the native ARFF file format before you start working on it, not to mention the processing power you save by not regenerating your ARFF files every time the data in your database is updated. Another great advantage is that the datasets you need Weka to work on can be derived from the database using native database operations. For example, if your data is spread across different tables in a database, you can use a join query to consolidate all the necessary attributes and feed the result to Weka as a single table. That makes a lot of things much simpler to accomplish, and it makes a lot of sense too.

    Here’s how you connect a MySQL Server database to Weka on Linux and start writing code in Java. To make things a little clearer, this is the environment I use:

    • Ubuntu 11.04 Natty Narwhal.
    • Weka 3.6.0-3.
    • Eclipse Indigo – Enterprise Edition.

    The prerequisites for this particular example are:

    • You need a MySQL Server database, and its authentication credentials if any, on an accessible local server or on the Internet. We use the URL of this database to connect it to Weka.
    • It is assumed that the correct version of the JDK was installed along with Weka on your machine.

    EXEcute:

    Now let’s move into the steps:

    1. Download the MySQL Connector from this link.
    2. Extract the zip file; you will find a jar file named mysql-connector-java.jar. Copy it to the /usr/share/java/ folder, the same folder that holds weka.jar.
    3. Open a terminal and edit your .bashrc file to add the following line at the end: export CLASSPATH=/usr/share/java/mysql-connector-java.jar:/usr/share/java/weka.jar. This sets the classpath; you need sudo only if you edit a system-wide file instead of the .bashrc in your home directory.
    4. Change to the /usr/share/java/ directory and extract weka.jar there: sudo jar -xf weka.jar. This creates a folder named weka in /usr/share/java. Navigate to the weka/experiment folder, find the DatabaseUtils.props.mysql file, copy it to your home folder (~), and rename the copy to DatabaseUtils.props. This is the file in which we provide the database URL and point Weka to the appropriate driver.
    5. Edit the DatabaseUtils.props file in your home directory to include the appropriate driver and database URL. In our case we need to add the class name of the MySQL connector. Edit the following lines as shown.
    # JDBC driver (comma-separated list)
    jdbcDriver=com.mysql.jdbc.Driver
    # database URL
    jdbcURL=jdbc:mysql:[Your Database url here without any prefixes. eg(//yoururl:3306/testdb)]
    


    My DatabaseUtils.props file in the home directory looks like this, and you can use the same configuration. Note that the data types I have uncommented in the file are the ones I use in my database; comment out or uncomment entries to match the types you intend to use. It does not matter much at this point: if Weka does not understand a particular data type, it throws an undefined data type exception when you run a query, and you can then come back to this file and uncomment that data type:

    # Database settings for MySQL 3.23.x, 4.x
    #
    # General information on database access can be found here:
    # http://weka.wikispaces.com/Databases
    #
    # url: http://www.mysql.com/
    # jdbc: http://www.mysql.com/products/connector/j/
    # author: Fracpete (fracpete at waikato dot ac dot nz)
    # version: $Revision: 5836 $
    # JDBC driver (comma-separated list)
    #jdbcDriver=org.gjt.mm.mysql.Driver
    jdbcDriver=com.mysql.jdbc.Driver
    # database URL
    jdbcURL=jdbc:mysql://yoururl:3306/testdb
    # specific data types
    # string, getString() = 0; --> nominal
    # boolean, getBoolean() = 1; --> nominal
    # double, getDouble() = 2; --> numeric
    # byte, getByte() = 3; --> numeric
    # short, getByte()= 4; --> numeric
    # int, getInteger() = 5; --> numeric
    # long, getLong() = 6; --> numeric
    # float, getFloat() = 7; --> numeric
    # date, getDate() = 8; --> date
    # text, getString() = 9; --> string
    # time, getTime() = 10; --> date
    # the original conversion: =
    char=0
    varchar=0
    #longvarchar=0
    #binary=0
    #varbinary=0
    #longvarbinary=0
    #bit=1
    #numeric=2
    decimal=2
    #tinyint=3
    smallint=4
    integer=5
    int=5
    bigint=6
    #real=7
    #float=2
    #double=2
    #date=8
    #time=10
    #timestamp=8
     
    #mysql-conversion
    CHAR=0
    TEXT=0
    VARCHAR=0
    LONGVARCHAR=9
    BINARY=0
    VARBINARY=0
    LONGVARBINARY=9
    BIT=1
    NUMERIC=2
    DECIMAL=2
    FLOAT=2
    DOUBLE=2
    TINYINT=3
    SMALLINT=4
    #SHORT=4
    SHORT=5
    INTEGER=5
    INT=5
    BIGINT=6
    LONG=6
    REAL=7
    DATE=8
    TIME=10
    TIMESTAMP=8
    # other options
    CREATE_DOUBLE=DOUBLE
    CREATE_STRING=TEXT
    CREATE_INT=INT
    CREATE_DATE=DATETIME
    DateFormat=yyyy-MM-dd HH:mm:ss
    checkUpperCaseNames=false
    checkLowerCaseNames=false
    checkForTable=true
     
    # All the reserved keywords for this database
    # Based on the keywords listed at the following URL (2009-04-13):
    # http://dev.mysql.com/doc/mysqld-version-reference/en/mysqld-version-reference-reservedwords-5-0.html
    Keywords=\
    ADD,\
    ALL,\
    ALTER,\
    ANALYZE,\
    AND,\
    AS,\
    ASC,\
    ASENSITIVE,\
    BEFORE,\
    BETWEEN,\
    BIGINT,\
    BINARY,\
    BLOB,\
    BOTH,\
    BY,\
    CALL,\
    CASCADE,\
    CASE,\
    CHANGE,\
    CHAR,\
    CHARACTER,\
    CHECK,\
    COLLATE,\
    COLUMN,\
    COLUMNS,\
    CONDITION,\
    CONNECTION,\
    CONSTRAINT,\
    CONTINUE,\
    CONVERT,\
    CREATE,\
    CROSS,\
    CURRENT_DATE,\
    CURRENT_TIME,\
    CURRENT_TIMESTAMP,\
    CURRENT_USER,\
    CURSOR,\
    DATABASE,\
    DATABASES,\
    DAY_HOUR,\
    DAY_MICROSECOND,\
    DAY_MINUTE,\
    DAY_SECOND,\
    DEC,\
    DECIMAL,\
    DECLARE,\
    DEFAULT,\
    DELAYED,\
    DELETE,\
    DESC,\
    DESCRIBE,\
    DETERMINISTIC,\
    DISTINCT,\
    DISTINCTROW,\
    DIV,\
    DOUBLE,\
    DROP,\
    DUAL,\
    EACH,\
    ELSE,\
    ELSEIF,\
    ENCLOSED,\
    ESCAPED,\
    EXISTS,\
    EXIT,\
    EXPLAIN,\
    FALSE,\
    FETCH,\
    FIELDS,\
    FLOAT,\
    FLOAT4,\
    FLOAT8,\
    FOR,\
    FORCE,\
    FOREIGN,\
    FROM,\
    FULLTEXT,\
    GOTO,\
    GRANT,\
    GROUP,\
    HAVING,\
    HIGH_PRIORITY,\
    HOUR_MICROSECOND,\
    HOUR_MINUTE,\
    HOUR_SECOND,\
    IF,\
    IGNORE,\
    IN,\
    INDEX,\
    INFILE,\
    INNER,\
    INOUT,\
    INSENSITIVE,\
    INSERT,\
    INT,\
    INT1,\
    INT2,\
    INT3,\
    INT4,\
    INT8,\
    INTEGER,\
    INTERVAL,\
    INTO,\
    IS,\
    ITERATE,\
    JOIN,\
    KEY,\
    KEYS,\
    KILL,\
    LABEL,\
    LEADING,\
    LEAVE,\
    LEFT,\
    LIKE,\
    LIMIT,\
    LINES,\
    LOAD,\
    LOCALTIME,\
    LOCALTIMESTAMP,\
    LOCK,\
    LONG,\
    LONGBLOB,\
    LONGTEXT,\
    LOOP,\
    LOW_PRIORITY,\
    MATCH,\
    MEDIUMBLOB,\
    MEDIUMINT,\
    MEDIUMTEXT,\
    MIDDLEINT,\
    MINUTE_MICROSECOND,\
    MINUTE_SECOND,\
    MOD,\
    MODIFIES,\
    NATURAL,\
    NOT,\
    NO_WRITE_TO_BINLOG,\
    NULL,\
    NUMERIC,\
    ON,\
    OPTIMIZE,\
    OPTION,\
    OPTIONALLY,\
    OR,\
    ORDER,\
    OUT,\
    OUTER,\
    OUTFILE,\
    PRECISION,\
    PRIMARY,\
    PRIVILEGES,\
    PROCEDURE,\
    PURGE,\
    READ,\
    READS,\
    REAL,\
    REFERENCES,\
    REGEXP,\
    RELEASE,\
    RENAME,\
    REPEAT,\
    REPLACE,\
    REQUIRE,\
    RESTRICT,\
    RETURN,\
    REVOKE,\
    RIGHT,\
    RLIKE,\
    SCHEMA,\
    SCHEMAS,\
    SECOND_MICROSECOND,\
    SELECT,\
    SENSITIVE,\
    SEPARATOR,\
    SET,\
    SHOW,\
    SMALLINT,\
    SONAME,\
    SPATIAL,\
    SPECIFIC,\
    SQL,\
    SQLEXCEPTION,\
    SQLSTATE,\
    SQLWARNING,\
    SQL_BIG_RESULT,\
    SQL_CALC_FOUND_ROWS,\
    SQL_SMALL_RESULT,\
    SSL,\
    STARTING,\
    STRAIGHT_JOIN,\
    TABLE,\
    TABLES,\
    TERMINATED,\
    THEN,\
    TINYBLOB,\
    TINYINT,\
    TINYTEXT,\
    TO,\
    TRAILING,\
    TRIGGER,\
    TRUE,\
    UNDO,\
    UNION,\
    UNIQUE,\
    UNLOCK,\
    UNSIGNED,\
    UPDATE,\
    UPGRADE,\
    USAGE,\
    USE,\
    USING,\
    UTC_DATE,\
    UTC_TIME,\
    UTC_TIMESTAMP,\
    VALUES,\
    VARBINARY,\
    VARCHAR,\
    VARCHARACTER,\
    VARYING,\
    WHEN,\
    WHERE,\
    WHILE,\
    WITH,\
    WRITE,\
    XOR,\
    YEAR_MONTH,\
    ZEROFILL
     
    #The character to append to attribute names to avoid exceptions due to
    #clashes between keywords and attribute names
    KeywordsMaskChar=_
     
    #flags for loading and saving instances using DatabaseLoader/Saver
    nominalToStringLimit=50
    idColumn=auto_generated_id
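
    A quick sanity check can catch typos in this file before Weka is launched. The sketch below assumes the file can be read as a standard Java properties file, which is how the key=value lines and the backslash continuations above are laid out. The class PropsCheck and its method names are made up for illustration and are not part of Weka.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of Weka: loads a properties file and checks
// that the two connection settings edited above are present.
final class PropsCheck {

    // Reads key=value pairs; backslash line continuations are joined by Properties.load.
    static Properties load(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new IllegalStateException("could not read properties", e);
        }
        return props;
    }

    // True when a driver class is named and the URL uses the MySQL JDBC prefix.
    static boolean looksUsable(Properties props) {
        String driver = props.getProperty("jdbcDriver", "");
        String url = props.getProperty("jdbcURL", "");
        return !driver.isEmpty() && url.startsWith("jdbc:mysql://");
    }

    public static void main(String[] args) {
        String sample = "jdbcDriver=com.mysql.jdbc.Driver\n"
                      + "jdbcURL=jdbc:mysql://yoururl:3306/testdb\n";
        System.out.println(looksUsable(load(new StringReader(sample))));
    }
}
```

    To check the real file, pass a java.io.FileReader opened on the copy in your home directory instead of the StringReader.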
    

    Test:

    6. We are ready to roll now. To check that the configuration we have set up is correct, we will first run the Weka GUI. Open a terminal and type the following command to launch Weka in GUI mode: java -classpath $CLASSPATH:weka.jar weka.gui.Main

    7. From the Weka GUI window select Applications and then Explorer. In the Explorer window click the Open DB button. This brings up the Open database dialog. In the URL text box enter the full JDBC URL of your MySQL database (e.g. jdbc:mysql://yoururl:3306/testdb). If you need a username and password to connect to the database, click the User button and provide them there. Come back to the Open database window and click Connect. The info area at the bottom of the window should now show connection to your database = true. This is the green flag we have been looking for. We are ready to write code now.

    Refine and Activate:

    Open Eclipse Indigo EE and create a new Java project. Add weka.jar from the installation to the build path so that its classes are ready to be accessed. The following piece of code gives you an idea of how to use the database connection with, for example, a Conjunctive Rule classifier. You don’t have to bother about the database URL or drivers in code, since we have already taken care of those in the DatabaseUtils.props file. Note that the result of the select query is given directly as input to the classifier, which makes Weka very powerful to use with databases. It’s all about control.

    import weka.classifiers.rules.ConjunctiveRule;
    import weka.core.Instances;
    import weka.experiment.InstanceQuery;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.NumericToNominal;

    // Inside a method; outWriter is assumed to be a java.io.PrintWriter opened elsewhere.
    try {
       InstanceQuery query = new InstanceQuery();
       query.setUsername("UserName"); // username and password for your db
       query.setPassword("Password");
       query.setQuery("select C.calltype, C.phoneid, C.duration "
             + "from testdb.Timeline as T "
             + "inner join testdb.Call as C on T.timestamp = C.timestamp");

       // The result of this query is the table which Weka is going to
       // use for classification or prediction.
       Instances data = query.retrieveInstances();

       // Set the class attribute; in this case a random attribute is chosen.
       int nAttr = data.numAttributes();
       int index = (int) (Math.random() * nAttr);
       data.setClassIndex(index);

       // Some of my data in the database is numeric, but the classifier needs
       // to be told that it is actually nominal (like the month of a year).
       NumericToNominal nm = new NumericToNominal();
       String[] options = {"-R", "1-2"}; // treat attributes 1 to 2 as nominal
       nm.setOptions(options);
       nm.setInputFormat(data);
       Instances filteredData = Filter.useFilter(data, nm);

       // Build the classifier and print its output.
       ConjunctiveRule classifier = new ConjunctiveRule();
       classifier.buildClassifier(filteredData);
       outWriter.print(classifier);
    } catch (Exception e) {
       e.printStackTrace();
       outWriter.println("Exception: " + e.getMessage());
    }
    

    There you go..!!! Feel free to try out different classifiers both numeric and nominal until you find the right suitor for your scenario. Play around with Weka. More about using Classifiers in the next post. Have fun.

  3. Getting around Android memory blues

    I am sure that most Android developers have seen the error java.lang.OutOfMemoryError. I too have come across it many times, and subsequently dug deep into it. Since Friday deals with a huge amount of data (graphs, locations and maps, composite lists, bitmaps), I have had to spend a lot of time looking for the solution. While hunting for an answer I stumbled upon keywords like heap, MAT and so on; however, the solution was still not on the horizon. At last, I got a clear picture of memory leaks from the session by Patrick Dubroy at Google I/O 2011.

    Up until then, I (somehow) did not have enough faith in the Android garbage collector (GC) and used to manually call finalization and GC to free up memory. I am pretty sure that a considerable number of Android developers still use this approach.

    System.runFinalization();
    Runtime.getRuntime().gc();
    System.gc();
    

    I realized, eventually, that this approach slows down the entire application and hurts its overall performance. After going through the Google I/O 2011 session, my trust in the Android garbage collector was restored.

    The OutOfMemoryError occurs when the application tries to use more memory than Android allocates to it. Most of us will be familiar with the term heap size; it indicates the dynamic memory Android allocates to each application. The heap limit varies from 16MB (G1) to 48MB (Xoom). So, if we create an object using the “new” operator, the memory is allocated from the heap, and when the heap reaches its limit, the VM throws an OutOfMemoryError.

    Analyzing GC logs from Logcat

    Normally, Android frees memory when the GC runs. But in most cases the GC cannot collect all the objects, because other objects might still hold references to them. As long as an object has any live reference, the memory allocated for it will not be freed; it stays tied up until the reference is destroyed. Hence, the million dollar question: how do we know whether memory is being freed or whether it is leaking? For the answer you can check Logcat. I am sure that all of us have seen a message like the one below in our logcat.

    GC_EXPLICIT freed 5K, 51% free 4263K/8583K, external 1625K/2137K, paused 68ms

    This is the message logged when the GC runs in your application. Each word in the above line is significant, and by analyzing it we can detect memory leak(s). The first thing to note is GC_EXPLICIT at the start of the line. It conveys the reason the GC was called. The GC is called for different reasons, such as:

    • GC_FOR_MALLOC: The GC was triggered because there wasn’t enough memory left on the heap to perform an allocation; might be triggered when new objects are being created.
    • GC_EXPLICIT: The GC has been explicitly asked to collect, instead of being triggered by high water marks in the heap. Happens all over the place, but most likely when a thread is being killed or when a binder communication is taken down.
    • GC_CONCURRENT: GC called when the heap has reached a certain amount of objects to collect.
    • GC_EXTERNAL_ALLOC: GC called when the VM is trying to reduce the amount of memory used for collectable objects, to make room for more non-collectable.

    referred from here.

    GC_EXPLICIT happens when we call System.gc(). Calling the GC explicitly from the application hurts performance and is therefore not recommended.

    The next item in the log, freed 5K, is the amount of memory freed.

    The third one, 51% free 4263K/8583K, describes the heap as used/soft limit: 4263K is the current memory usage and 8583K is the soft limit for the next GC call.

    The fourth, and the most important one, external 1625K/2137K, shows the externally allocated memory. Bitmap and buffer allocations come under this, and we need to make sure that the memory allocated here stays under control. An OutOfMemoryError is usually caused by an external memory allocation failure, mostly around bitmaps and buffers. 1625K is the current memory usage and 2137K is the soft limit for the next GC call.

    The last one shows the pause time, i.e. the amount of time the UI paused while this GC ran. It depends on the number of objects that need to be freed in this GC call; the more objects, the more time it takes.
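
    To make the fields above concrete, here is a minimal sketch that splits one such log line into its parts with a regular expression. GcLogLine is a hypothetical helper, not an Android API, and the pattern only assumes the line layout shown in this post; other Android versions may log a different format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Android API): holds the fields of one GC log line.
final class GcLogLine {
    // Layout assumed from the sample lines in this post; the space before
    // "paused" is optional because some of the samples omit it.
    private static final Pattern FORMAT = Pattern.compile(
        "^(GC_\\w+) freed (\\d+)K, (\\d+)% free (\\d+)K/(\\d+)K, "
        + "external (\\d+)K/(\\d+)K,\\s*paused (\\d+)ms");

    final String reason;       // e.g. GC_EXPLICIT
    final int freedKb;         // memory freed by this GC
    final int percentFree;     // share of the heap that is free
    final int heapUsedKb;      // current heap usage
    final int heapLimitKb;     // second heap figure (the soft limit, as read in this post)
    final int externalUsedKb;  // bitmaps, buffers and other external allocations
    final int externalLimitKb; // second external figure
    final int pausedMs;        // pause time of this GC

    private GcLogLine(Matcher m) {
        reason = m.group(1);
        freedKb = Integer.parseInt(m.group(2));
        percentFree = Integer.parseInt(m.group(3));
        heapUsedKb = Integer.parseInt(m.group(4));
        heapLimitKb = Integer.parseInt(m.group(5));
        externalUsedKb = Integer.parseInt(m.group(6));
        externalLimitKb = Integer.parseInt(m.group(7));
        pausedMs = Integer.parseInt(m.group(8));
    }

    // Returns null when the line does not follow the expected layout.
    static GcLogLine parse(String line) {
        Matcher m = FORMAT.matcher(line.trim());
        return m.find() ? new GcLogLine(m) : null;
    }

    public static void main(String[] args) {
        GcLogLine gc = parse("GC_EXPLICIT freed 5K, 51% free 4263K/8583K, "
            + "external 1625K/2137K, paused 68ms");
        System.out.println(gc.reason + ": external " + gc.externalUsedKb
            + "K of " + gc.externalLimitKb + "K, paused " + gc.pausedMs + "ms");
    }
}
```

    Feeding successive lines through parse() and watching externalUsedKb is one simple way to see whether the external allocation returns to its earlier level.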

    Back to step one: how do we identify a memory leak from these logcat messages? Usually, when a new Activity is loaded the memory usage shoots up, and it goes back to normal once the Activity is created. Below is a sample logcat for reference.

    GC_EXPLICIT freed 22K, 50% free 2731K/5379K, external 1627K/2137K, paused 53ms
    GC_EXTERNAL_ALLOC freed 374K, 52% free 3231K/6727K, external 2856K/2864K, paused 40ms
    GC_EXTERNAL_ALLOC freed 93K, 51% free 3436K/6983K, external 9700K/9735K,paused 46ms //Activity called
    GC_EXPLICIT freed 929K, 47% free 4249K/7879K, external 3665K/2137K,paused 73ms
    GC_EXPLICIT freed 82K, 53% free 2787K/5831K, external 3625K/2137K,paused 57ms //Activity load complete 
    

    P.S. there is no memory leak in this illustration.

    If there is a memory leak, the memory allocation goes up and stays there, as shown below.

    https://gist.github.com/1226658

    In this case the external memory goes up from 1.6MB to 18MB, a clear indicator of a memory leak in the application. The reason for a memory leak is that the GC cannot free the memory allocated for the objects because of live external references. You can track this down with the Eclipse Memory Analyzer (MAT). You can install it in your Eclipse [install Eclipse MAT via the Eclipse update manager: select General Purpose Tools and install Memory Analyser (Incubation) and Memory Analyser (Charts)]. This Android blog post shows how to use MAT to find memory leaks.

    From the histogram view and the dominator tree you can pinpoint the class/object that is causing the memory leak and address the issue. Furthermore, the dominator tree can be used to check for duplicate instances of any of the activities (note: duplicate instances have an adverse effect on memory).

    The other thing we need to bear in mind is object allocation. Generally, when we create an object, the memory is allocated from the heap. Once the object is no longer in use, the GC runs and recovers the memory allocated for it. Alternatively, we can declare objects in such a way that they are freed as soon as the GC runs (thus speeding up the whole process). There are two ways of declaring objects:

    1. Hard object declaration (or hard reference).
    2. Soft/weak object declaration (or soft/weak reference).

    If we declare an object as Foo foo = new Foo(); it is called a hard declaration, the most widely practiced approach. Consider an example where the objective is to fetch the image of a contact from Android: the method decodes the image into a bitmap object and returns it. Here the bitmap is only a temporary object for holding the image, so there is no need for a hard declaration. In this case you can use a weak/soft reference for the bitmap.

    WeakReference<Bitmap> tempBitmap = new WeakReference<Bitmap>(BitmapFactory.decodeStream(inputStream));
    

    In the above line I declare the bitmap as a weak reference, so that when the GC runs, the bitmap object can be released from memory as long as nothing else holds a hard reference to it. In your application, if you declare an object which is used only temporarily, you can go with a weak or soft reference. You can find the difference between soft and weak references here.
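
    A weak reference has to be turned back into a hard one before the object is used, and the result must be checked for null because the GC may already have cleared it. Here is a minimal sketch of that pattern on a plain JVM; byte[] stands in for the Bitmap, and the class and method names are hypothetical.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class; byte[] stands in for android.graphics.Bitmap
// so that the sketch runs without the Android SDK.
final class WeakImageDemo {

    // Wraps the image data so that this holder alone does not keep it alive.
    static WeakReference<byte[]> wrap(byte[] pixels) {
        return new WeakReference<byte[]>(pixels);
    }

    // Returns the image size, or -1 when the reference has been cleared.
    static int sizeOrMinusOne(WeakReference<byte[]> ref) {
        byte[] pixels = ref.get(); // take a hard reference before using the object
        return pixels == null ? -1 : pixels.length;
    }

    public static void main(String[] args) {
        byte[] image = new byte[1024];
        WeakReference<byte[]> ref = wrap(image);
        // While "image" is still in use, the weak reference resolves normally.
        System.out.println(sizeOrMinusOne(ref) + " of " + image.length + " bytes reachable");
    }
}
```

    The null branch is where an Android app would decode the bitmap again instead of failing.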

    Before concluding, I would suggest going through the Google I/O 2011 session for a complete picture of memory leaks and MAT, with a video tutorial.

  4. We love [technology | sharing]

    Like every other tech startup, we learn a lot of new things every day: new paradigms, new ways to implement things, new snippets and a lot more. Many of these we find in the darkest corners of developer documentation, Stack Overflow, or hidden in source code repositories.

    Most of the problems we encounter are faced by many others as well, so as responsible denizens, we thought we should share our learnings and earnings as hackers. Follow us if you develop for Android, AWS, Python, Django, Natural Language Processing, Lucene, Weka, etc.